Deep Learning Project – Air Pollution Level Estimation using ANN
Program 1
# -*- coding: utf-8 -*-
"""Air Pollution Level Estimation_ANN.ipynb
Automatically generated by Colab.
Original file is located at
https://colab.research.google.com/drive/1iD91CUNvXYOx4WjCe9LgVvuKZv6sthM4
Air-Pollution Level Estimation (PM2.5) from Weather Conditions
Estimate or predict PM2.5 concentration (fine particulate matter in micrograms per cubic meter) based on weather and time-related features.
PM2.5 is a critical indicator of air quality and public health.
Accurate predictions help with issuing early warnings, health advisories, and urban planning.
The project also shows how machine learning and environmental data can drive real-world impact.
| Column | Description |
| --------- | ------------------------------------------------------ |
| No | Row index (1 to N) |
| year | Year of measurement (2010–2014) |
| month | Month of measurement (1–12) |
| day | Day of the month (1–31) |
| hour | Hour of the day (0–23) |
| pm2.5 | PM2.5 concentration (µg/m³); **target variable** |
| DEWP | Dew Point temperature (°C) |
| TEMP | Ambient air temperature (°C) |
| PRES | Atmospheric pressure (hPa) |
| cbwd | Combined wind direction (categorical: e.g. NE, NW, SE, cv) |
| Iws | Cumulative wind speed (m/s) |
| Is | Cumulative hours of snow |
| Ir | Cumulative hours of rain |
"""
import pandas as pd, numpy as np, joblib, matplotlib.pyplot as plt, seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping
# 1. Load & clean data
df = pd.read_csv("D:/scikit_data/global/beijing_pm25.csv")  # path to the file you saved (adjust to your location)
df = df[df["pm2.5"].notna()] # drop rows with missing target
df.isnull().sum()
df.head()
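The filter above only looks at the target column. A minimal sketch on a made-up frame (the values are illustrative, not taken from the real file) shows that `df[df["pm2.5"].notna()]` is equivalent to pandas' `dropna(subset=...)`, and that missing values in other columns survive it, which is why the `isnull().sum()` check is still worth running.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for the Beijing data (illustrative values only).
toy = pd.DataFrame({
    "pm2.5": [np.nan, 129.0, np.nan, 181.0],
    "TEMP": [-11.0, -4.0, -5.0, np.nan],
})

# Keep only rows whose target is present, as in the script.
kept = toy[toy["pm2.5"].notna()]

# Equivalent: drop rows that have NaN in the listed columns only.
kept_alt = toy.dropna(subset=["pm2.5"])

# NaNs in feature columns (here TEMP) are NOT removed by this filter.
print(kept.isnull().sum().to_dict())
```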
# Combine year, month, day and hour into a single datetime (handy, but not mandatory).
# This records exactly when each pollution reading was taken.
# Making this datetime the index of the DataFrame simplifies time-based handling.
df["datetime"] = pd.to_datetime(df[["year", "month", "day", "hour"]])
df.set_index("datetime", inplace=True)
df.head()
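As a small illustration with hypothetical rows that use the same column names as the Beijing file, `pd.to_datetime` assembles timestamps from columns named `year`, `month`, `day` and `hour`, and the resulting `DatetimeIndex` lets you select readings by a date string.

```python
import pandas as pd

# Hypothetical rows with the same column names as the Beijing file.
toy = pd.DataFrame({
    "year": [2010, 2010, 2014],
    "month": [1, 1, 12],
    "day": [2, 2, 31],
    "hour": [0, 23, 23],
    "pm2.5": [129.0, 150.0, 79.0],
})

# pd.to_datetime assembles one timestamp per row from the year/month/day/hour columns.
toy["datetime"] = pd.to_datetime(toy[["year", "month", "day", "hour"]])
toy = toy.set_index("datetime")

# With a DatetimeIndex, a date string selects every hourly reading on that day.
print(toy.loc["2010-01-02", "pm2.5"].tolist())
```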
# 2. Minimal feature engineering
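# Now that we have a full datetime index, we extract the hour of day (e.g., 11 AM)
# and the month (e.g., January), because air pollution often varies with time of day and season.
# (The raw file already has hour and month columns; this re-derives the same values from the index.)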
df["hour"] = df.index.hour
df["month"] = df.index.month
df.head()
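As raw integers, hour 23 and hour 0 look 23 units apart even though they are neighbours, and the same holds for December and January. One common refinement, not used in the original script and not tested here on this dataset, is a sine/cosine (cyclical) encoding. The sketch below assumes a DataFrame with `hour` and `month` columns; the `add_cyclical_features` helper and the `hour_sin`/`hour_cos`/`month_sin`/`month_cos` column names are hypothetical and would have to be added to `FEATURES` to be used.

```python
import numpy as np
import pandas as pd

def add_cyclical_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with sin/cos encodings of hour (period 24) and month (period 12)."""
    out = df.copy()
    out["hour_sin"] = np.sin(2 * np.pi * out["hour"] / 24)
    out["hour_cos"] = np.cos(2 * np.pi * out["hour"] / 24)
    # Months run 1..12, so shift to 0..11 before mapping them onto the circle.
    out["month_sin"] = np.sin(2 * np.pi * (out["month"] - 1) / 12)
    out["month_cos"] = np.cos(2 * np.pi * (out["month"] - 1) / 12)
    return out

toy = pd.DataFrame({"hour": [0, 1, 12, 23], "month": [1, 2, 7, 12]})
enc = add_cyclical_features(toy)

# Distances between hours on the unit circle.
pts = enc[["hour_sin", "hour_cos"]].to_numpy()
d_23_to_0 = np.linalg.norm(pts[3] - pts[0])  # neighbouring hours: small distance
d_12_to_0 = np.linalg.norm(pts[2] - pts[0])  # opposite hours: maximum distance
print(f"23h->0h: {d_23_to_0:.3f}, 12h->0h: {d_12_to_0:.3f}")
```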
FEATURES = ["DEWP", "TEMP", "PRES", "Iws", "Is", "Ir", "hour", "month"]  # independent variables (inputs)
TARGET = "pm2.5"  # dependent variable (what we predict)
X = df[FEATURES]
y = df[TARGET]
#X.head()
y.head()
# 3. Train / test split + standardisation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
joblib.dump(scaler, "scaler.joblib") # keep for later inference
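Standardisation turns each column into z = (x - mean) / std, where the mean and the population standard deviation (ddof=0, which is what `StandardScaler` uses) are computed on the training split only and then reused for the test split. A NumPy sketch on synthetic data (the arrays are made up, not the Beijing features) makes the fit/transform split explicit.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic two-column data, e.g. a temperature-like and a pressure-like feature.
X_train = rng.normal(loc=[10.0, 1015.0], scale=[5.0, 10.0], size=(200, 2))
X_test = rng.normal(loc=[10.0, 1015.0], scale=[5.0, 10.0], size=(50, 2))

# "fit": statistics come from the training split only.
mean = X_train.mean(axis=0)
scale = X_train.std(axis=0)  # population std (ddof=0)

# "transform": the SAME training statistics are applied to both splits,
# so no information from the test set leaks into preprocessing.
X_train_scaled = (X_train - mean) / scale
X_test_scaled = (X_test - mean) / scale

print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```

Only the training columns are guaranteed to have mean 0 and standard deviation 1; the test columns will be close but not exact, which is expected.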
# 4. Build & train the ANN
# optimizer="adam": Adam updates the weights during training, adapting the step size for each weight.
# loss="mse": Mean Squared Error measures how far predictions are from the true values.
model = Sequential([
    Dense(64, activation="relu", input_shape=(X_train_scaled.shape[1],)),
    Dense(32, activation="relu"),
    Dense(1)  # linear output for regression
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
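To get a feel for the network's size: a `Dense` layer with n_in inputs and n_out units has n_in × n_out weights plus one bias per unit. A worked sketch for the 8-feature, 64-32-1 network above (`dense_params` is a hypothetical helper; `model.summary()` should report the same totals):

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Parameters of a Dense layer: an n_in x n_out weight matrix plus one bias per unit."""
    return n_in * n_out + n_out

n_features = 8       # len(FEATURES) in the script
units = [64, 32, 1]  # Dense layer sizes used in the model

per_layer = []
n_in = n_features
for n_out in units:
    per_layer.append(dense_params(n_in, n_out))
    print(f"Dense({n_out}): {n_in} x {n_out} + {n_out} = {per_layer[-1]}")
    n_in = n_out  # this layer's output feeds the next layer

total = sum(per_layer)
print("Total trainable parameters:", total)
```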
history = model.fit(
    X_train_scaled, y_train,
    validation_split=0.1,
    epochs=50,
    batch_size=256,
    callbacks=[EarlyStopping(patience=5, restore_best_weights=True)],
    verbose=1
)
# validation_split=0.1: hold out 10% of the training data for validation.
#   Keras takes the last 10% of the arrays (before shuffling); since train_test_split already
#   shuffled the rows, this is effectively a random 10%.
#   The model does not train on this slice; it is scored on it after each epoch.
#   This helps detect overfitting, i.e. the model memorizing the training data instead of learning to generalize.
# batch_size=256: instead of training on the entire dataset at once, the model processes
#   256 samples at a time (one batch) and updates its weights after each batch.
#   Training with batches reduces memory usage and speeds up training.
#   The noise in mini-batch updates can also help the model generalize.
# callbacks=[EarlyStopping(...)]: a rule that stops training early once the model stops improving
#   (EarlyStopping monitors val_loss by default).
#   patience=5: if validation loss does not improve for 5 epochs in a row, stop training.
#   restore_best_weights=True: after stopping, restore the weights from the epoch with the
#   lowest validation loss (not from the last epoch).
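The early-stopping rule described above can be sketched in plain Python. This is a simplified model under stated assumptions (val_loss monitored, min_delta=0, no baseline), not Keras' own implementation; `early_stopping_epoch` is a hypothetical helper and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Hypothetical helper: return (stop_epoch, best_epoch) as 0-based indices.

    Training halts once `patience` consecutive epochs fail to beat the best
    validation loss seen so far; best_epoch is the epoch whose weights
    restore_best_weights=True aims to bring back.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Never triggered: training runs through every epoch.
    return len(val_losses) - 1, best_epoch

val_losses = [10.0, 8.0, 7.0, 7.5, 7.2, 7.1, 7.3, 7.05, 9.0]  # made-up values
stop, best = early_stopping_epoch(val_losses, patience=5)
print(f"stopped after epoch {stop}, best weights from epoch {best}")
```

Here the loss stops improving after epoch 2, so five non-improving epochs later (epoch 7) training halts and the epoch-2 weights are the ones to restore.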
# 5. Evaluate
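# MAE  = mean absolute error: the average size of the errors, in µg/m³
# RMSE = root mean squared error: also in µg/m³, but penalizes large mistakes more heavily
# R2   = coefficient of determination: the proportion of the target's variance the model explains (1.0 is perfect)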
y_pred = model.predict(X_test_scaled).flatten()
print("\nTest-set metrics")
print(f" MAE  : {mean_absolute_error(y_test, y_pred):.2f} µg/m³")  # µg/m³ = micrograms per cubic meter
print(f" RMSE : {np.sqrt(mean_squared_error(y_test, y_pred)):.2f} µg/m³")
print(f" R2 : {r2_score(y_test, y_pred):.3f}")
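The three metrics can be written directly in NumPy, which makes the formulas explicit: MAE is the mean of |ŷ - y|, RMSE is the square root of the mean of (ŷ - y)², and R² is 1 - SS_res / SS_tot. A sketch with made-up numbers (`regression_metrics` is a hypothetical helper; for a single target it should agree with the scikit-learn functions used above):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Hypothetical helper returning (MAE, RMSE, R2) for one target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))                      # average absolute error
    rmse = np.sqrt(np.mean(err ** 2))               # squaring weights large errors more
    ss_res = np.sum(err ** 2)                       # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variance around the mean
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

mae, rmse, r2 = regression_metrics([10, 20, 30, 40], [12, 18, 33, 37])
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
```

The errors are 2, -2, 3 and -3, so MAE is 2.5 while RMSE is slightly larger (sqrt(6.5) ≈ 2.55) because squaring gives the 3s more weight.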
# Training loss curve
plt.figure(figsize=(6, 4))
plt.plot(history.history["loss"], label="Train")
plt.plot(history.history["val_loss"], label="Val")
plt.xlabel("Epoch")
plt.ylabel("MSE")
plt.title("Training Loss")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
# Actual vs Predicted scatter
plt.figure(figsize=(6, 6))
sns.scatterplot(x=y_test, y=y_pred, alpha=0.3, color="blue")
# y = x reference line: points on it are perfect predictions; span the full data range
lim = max(y_test.max(), y_pred.max())
plt.plot([0, lim], [0, lim], color="darkorange")
plt.xlabel("Actual PM2.5 (µg/m³)")
plt.ylabel("Predicted PM2.5 (µg/m³)")
plt.title("Actual vs Predicted PM2.5")
plt.grid(True)
plt.tight_layout()
plt.show()
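# Save the model (legacy HDF5 format; recent Keras versions prefer the native .keras format)
model.save("pm25_ann.h5")
joblib.dump(FEATURES, "feature_order.joblib")  # column order the scaler and model expect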
# --------------------------------------------------------------
# 6. Simple console inference
# --------------------------------------------------------------
print("\n=== Quick PM2.5 Estimator ===")
new_vals = {}
for feat in FEATURES:
    new_vals[feat] = float(input(f"Enter {feat}: "))
row_df = pd.DataFrame([new_vals])[FEATURES]
row_scaled = scaler.transform(row_df)
pm25_est = model.predict(row_scaled)[0][0]
print(f"\nEstimated PM2.5 concentration: {pm25_est:.1f} µg/m³\n")  # µg/m³ = micrograms per cubic meter
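The console loop trusts whatever is typed. A hedged sketch of a small helper that builds the one-row frame in `FEATURES` order and rejects obviously invalid time values before scaling; `build_feature_row` and `VALID_RANGES` are hypothetical names, not part of the original script, and only the hour and month ranges are checked.

```python
import pandas as pd

# Same feature order as the script; the scaler and model expect exactly this order.
FEATURES = ["DEWP", "TEMP", "PRES", "Iws", "Is", "Ir", "hour", "month"]

# Hypothetical sanity ranges for the time features only.
VALID_RANGES = {"hour": (0, 23), "month": (1, 12)}

def build_feature_row(values: dict) -> pd.DataFrame:
    """Return a one-row DataFrame in FEATURES order, validating hour and month."""
    missing = [f for f in FEATURES if f not in values]
    if missing:
        raise KeyError(f"missing features: {missing}")
    for name, (lo, hi) in VALID_RANGES.items():
        if not lo <= values[name] <= hi:
            raise ValueError(f"{name}={values[name]} outside {lo}..{hi}")
    # The columns argument fixes the column order regardless of dict order.
    return pd.DataFrame([{f: float(values[f]) for f in FEATURES}], columns=FEATURES)

row = build_feature_row({"month": 1, "hour": 8, "TEMP": -4.0, "DEWP": -16.0,
                         "PRES": 1020.0, "Iws": 1.79, "Is": 0, "Ir": 0})
print(list(row.columns))
```

In the script, `row_df = build_feature_row(new_vals)` could stand in for the manual DataFrame construction before `scaler.transform(row_df)`.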

