{"id":146663,"date":"2025-08-22T12:03:15","date_gmt":"2025-08-22T06:33:15","guid":{"rendered":"https:\/\/data-flair.training\/blogs\/?p=146663"},"modified":"2025-08-22T12:03:15","modified_gmt":"2025-08-22T06:33:15","slug":"air-pollution-level-estimation-using-ann","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/air-pollution-level-estimation-using-ann\/","title":{"rendered":"Deep Learning Project &#8211; Air Pollution Level Estimation using ANN"},"content":{"rendered":"<h3>Program 1<\/h3>\n<p><a href=\"https:\/\/drive.google.com\/file\/d\/1fgrOsZygBa33JZKY_UKNQ_z1eYRIIo8j\/view?usp=sharing\" target=\"_blank\" rel=\"noopener\"><strong>Air Pollution Dataset<\/strong><\/a><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\"># -*- coding: utf-8 -*-\r\n\"\"\"Air Pollution Level Estimation_ANN.ipynb\r\n\r\nAutomatically generated by Colab.\r\n\r\nOriginal file is located at\r\n    https:\/\/colab.research.google.com\/drive\/1iD91CUNvXYOx4WjCe9LgVvuKZv6sthM4\r\n\r\nAir-Pollution Level Estimation (PM2.5) from Weather Conditions\r\n\r\nEstimate or predict PM2.5 concentration (fine particulate matter in micrograms per cubic meter) based on weather and time-related features.\r\nPM2.5 is a critical indicator of air quality and public health.\r\nAccurate predictions help in issuing early warnings, health advisories, and urban planning.\r\nShows how machine learning + environmental data can drive real-world impact.\r\n\r\n| Column    | Description                                            |\r\n| --------- | ------------------------------------------------------ |\r\n| No    | Row index (1 to N)                                         |\r\n| year  | Year of measurement (2010\u20132014)                            |\r\n| month | Month of measurement (1\u201312)                                |\r\n| day   | Day of the month (1\u201331)                                    |\r\n| hour  | Hour of the day (0\u201323)                     
                |\r\n| pm2.5 | PM2.5 concentration (\u00b5g\/m\u00b3); **target variable**           |\r\n| DEWP  | Dew Point temperature (\u00b0C)                                 |\r\n| TEMP  | Ambient air temperature (\u00b0C)                               |\r\n| PRES  | Atmospheric pressure (hPa)                                 |\r\n| cbwd  | Combined wind direction (categorical: e.g. NE, NW, SE, cv) |\r\n| Iws   | Cumulative wind speed (m\/s)                                |\r\n| Is    | Cumulative hours of snow                                   |\r\n| Ir    | Cumulative hours of rain                                   |\r\n\"\"\"\r\n\r\nimport pandas as pd, numpy as np, joblib, matplotlib.pyplot as plt, seaborn as sns\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn.preprocessing import StandardScaler\r\nfrom sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score\r\nfrom tensorflow.keras.models import Sequential\r\nfrom tensorflow.keras.layers import Dense\r\nfrom tensorflow.keras.callbacks import EarlyStopping\r\n\r\n# 1. Load &amp; clean data\r\ndf = pd.read_csv(\"D:\/scikit_data\/global\/beijing_pm25.csv\")          # path to the file you saved\r\ndf = df[df[\"pm2.5\"].notna()]                 # drop rows with missing target\r\ndf.isnull().sum()\r\ndf.head()\r\n\r\n# Combine Y-M-D-h into a Datetime index (handy, but not mandatory)\r\n# This tells us when each pollution reading was taken.\r\n# We make this datetime the index of our data for easier time-based handling.\r\ndf[\"datetime\"] = pd.to_datetime(df[[\"year\", \"month\", \"day\", \"hour\"]])\r\ndf.set_index(\"datetime\", inplace=True)\r\ndf.head()\r\n\r\n# 2. 
Minimal feature engineering\r\n# Now that we have a full datetime index, we extract:\r\n#   the hour of day (e.g., 11 AM) and the month (e.g., January),\r\n# because air pollution often changes with time of day and season.\r\n\r\n\r\ndf[\"hour\"]  = df.index.hour\r\ndf[\"month\"] = df.index.month\r\ndf.head()\r\nFEATURES = [\"DEWP\", \"TEMP\", \"PRES\", \"Iws\", \"Is\", \"Ir\", \"hour\", \"month\"] # independent variables (inputs)\r\nTARGET   = \"pm2.5\" # dependent variable (target)\r\n\r\nX = df[FEATURES]\r\ny = df[TARGET]\r\n#X.head()\r\ny.head()\r\n\r\n# 3. Train \/ test split  +  standardisation\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\r\n\r\nscaler = StandardScaler()\r\nX_train_scaled = scaler.fit_transform(X_train)\r\nX_test_scaled  = scaler.transform(X_test)\r\njoblib.dump(scaler, \"scaler.joblib\")   # keep for later inference\r\n\r\n# 4. Build &amp; train the ANN\r\n# optimizer=\"adam\": the Adam optimizer adjusts the model weights during training.\r\n# loss=\"mse\": we use Mean Squared Error to measure how far predictions are from true values.\r\n\r\nmodel = Sequential([\r\n    Dense(64, activation=\"relu\",  input_shape=(X_train_scaled.shape[1],)),\r\n    Dense(32, activation=\"relu\"),\r\n    Dense(1)                        # linear output for regression\r\n])\r\nmodel.compile(optimizer=\"adam\", loss=\"mse\", metrics=[\"mae\"])\r\n\r\nhistory = model.fit(\r\n    X_train_scaled, y_train,\r\n    validation_split=0.1,\r\n    epochs=50,\r\n    batch_size=256,\r\n    callbacks=[EarlyStopping(patience=5, restore_best_weights=True)],\r\n    verbose=1\r\n)\r\n\r\n# validation_split=0.1 --&gt; hold out 10% of the training data for validation.\r\n# That 10% is used to check how well the model is doing after each epoch.\r\n# It helps detect overfitting, i.e. the model memorizing the training data instead of learning to generalize.\r\n\r\n# batch_size=256\r\n# Instead of training on the entire dataset at once, the model processes 256 samples at a time. This is called a 
batch.\r\n# Training with batches reduces memory usage and speeds up training.\r\n# It also adds randomness that can help reduce overfitting.\r\n\r\n# callbacks=[EarlyStopping()]\r\n# This rule stops training early if the model stops improving.\r\n\r\n# patience=5: if validation loss does not improve for 5 epochs in a row, stop training.\r\n# restore_best_weights=True: after stopping, restore the model weights from the epoch when validation\r\n# loss was lowest (not from the last epoch).\r\n\r\n# 5. Evaluate\r\n# MAE = mean absolute error (average size of the errors)\r\n# RMSE = root mean squared error (penalizes bigger mistakes)\r\n# R2 = how much of the variance in the target our model explains\r\n\r\ny_pred = model.predict(X_test_scaled).flatten()\r\nprint(\"\\nTest-set metrics\")\r\nprint(f\" MAE   : {mean_absolute_error(y_test, y_pred):.2f} \u00b5g\/m\u00b3\")\r\nprint(f\" RMSE  : {np.sqrt(mean_squared_error(y_test, y_pred)):.2f} \u00b5g\/m\u00b3\")\r\nprint(f\" R2    : {r2_score(y_test, y_pred):.3f}\")\r\n\r\n# Training loss curve\r\nplt.figure(figsize=(6,4))\r\nplt.plot(history.history[\"loss\"], label=\"Train\")\r\nplt.plot(history.history[\"val_loss\"], label=\"Val\")\r\nplt.xlabel(\"Epoch\"); plt.ylabel(\"MSE\"); plt.title(\"Training Loss\")\r\nplt.legend()\r\nplt.grid(True)\r\nplt.tight_layout()\r\nplt.show()\r\n\r\n# Actual vs Predicted scatter\r\nplt.figure(figsize=(6,6))\r\nsns.scatterplot(x=y_test, y=y_pred, alpha=0.3, color=\"blue\")\r\nplt.plot([0,600], [0,600], color=\"darkorange\")   # reference line: predicted = actual\r\nplt.xlabel(\"Actual PM2.5 (\u00b5g\/m\u00b3)\")\r\nplt.ylabel(\"Predicted PM2.5 (\u00b5g\/m\u00b3)\")\r\nplt.title(\"Actual vs Predicted PM2.5\")\r\nplt.grid(True)\r\nplt.tight_layout()\r\nplt.show()\r\n\r\n# Save model\r\nmodel.save(\"pm25_ann.h5\")\r\njoblib.dump(FEATURES, \"feature_order.joblib\")\r\n\r\n# --------------------------------------------------------------\r\n# 6. 
Simple console inference\r\n# --------------------------------------------------------------\r\nprint(\"\\n=== Quick PM2.5 Estimator ===\")\r\nnew_vals = {}\r\nfor feat in FEATURES:\r\n    new_vals[feat] = float(input(f\"Enter {feat}: \"))\r\n\r\nrow_df     = pd.DataFrame([new_vals])[FEATURES]   # keep the training column order\r\nrow_scaled = scaler.transform(row_df)\r\npm25_est   = model.predict(row_scaled)[0][0]\r\n\r\nprint(f\"\\n Estimated PM2.5 concentration: {pm25_est:.1f} \u00b5g\/m\u00b3\\n\")\r\n<\/pre>\n<p>&nbsp;<span hidden class=\"__iawmlf-post-loop-links\" data-iawmlf-links=\"[{&quot;id&quot;:11,&quot;href&quot;:&quot;https:\\\/\\\/drive.google.com\\\/file\\\/d\\\/1fgrOsZygBa33JZKY_UKNQ_z1eYRIIo8j\\\/view?usp=sharing&quot;,&quot;archived_href&quot;:&quot;&quot;,&quot;redirect_href&quot;:&quot;&quot;,&quot;checks&quot;:[],&quot;broken&quot;:false,&quot;last_checked&quot;:null,&quot;process&quot;:&quot;done&quot;}]\"><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Program 1 Air Pollution Dataset # -*- coding: utf-8 -*- &#8220;&#8221;&#8221;Air Pollution Level Estimation_ANN.ipynb Automatically generated by Colab. 
Original file is located at https:\/\/colab.research.google.com\/drive\/1iD91CUNvXYOx4WjCe9LgVvuKZv6sthM4 Air-Pollution Level Estimation (PM2.5) from Weather Conditions Estimate or predict&#46;&#46;&#46;<\/p>\n","protected":false},"author":581,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[36],"tags":[35117,35118,35120,35119,35121,8431,33127,33128,20697],"class_list":["post-146663","post","type-post","status-publish","format-standard","hentry","category-machine-learning","tag-air-pollution-level-estimation","tag-air-pollution-level-estimation-using-ann","tag-air-pollution-level-estimation-using-deep-learning","tag-air-pollution-level-estimation-using-machine-learning","tag-deep-learning-air-pollution-level-estimation-using-ann","tag-machine-learning","tag-machine-learning-practical","tag-machine-learning-program","tag-machine-learning-project"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Deep Learning Project - Air Pollution Level Estimation using ANN - DataFlair<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/air-pollution-level-estimation-using-ann\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Deep Learning Project - Air Pollution Level Estimation using ANN - DataFlair\" \/>\n<meta property=\"og:description\" content=\"Program 1 Air Pollution Dataset # -*- coding: utf-8 -*- &quot;&quot;&quot;Air Pollution Level Estimation_ANN.ipynb Automatically generated by Colab. 
Original file is located at https:\/\/colab.research.google.com\/drive\/1iD91CUNvXYOx4WjCe9LgVvuKZv6sthM4 Air-Pollution Level Estimation (PM2.5) from Weather Conditions Estimate or predict&#046;&#046;&#046;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/air-pollution-level-estimation-using-ann\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-08-22T06:33:15+00:00\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Deep Learning Project - Air Pollution Level Estimation using ANN - DataFlair","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/air-pollution-level-estimation-using-ann\/","og_locale":"en_US","og_type":"article","og_title":"Deep Learning Project - Air Pollution Level Estimation using ANN - DataFlair","og_description":"Program 1 Air Pollution Dataset # -*- coding: utf-8 -*- \"\"\"Air Pollution Level Estimation_ANN.ipynb Automatically generated by Colab. 
Original file is located at https:\/\/colab.research.google.com\/drive\/1iD91CUNvXYOx4WjCe9LgVvuKZv6sthM4 Air-Pollution Level Estimation (PM2.5) from Weather Conditions Estimate or predict&#46;&#46;&#46;","og_url":"https:\/\/data-flair.training\/blogs\/air-pollution-level-estimation-using-ann\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2025-08-22T06:33:15+00:00","author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/air-pollution-level-estimation-using-ann\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/air-pollution-level-estimation-using-ann\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/c187795dc82ab948373cca526df7c445"},"headline":"Deep Learning Project &#8211; Air Pollution Level Estimation using ANN","datePublished":"2025-08-22T06:33:15+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/air-pollution-level-estimation-using-ann\/"},"wordCount":15,"commentCount":0,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"keywords":["air pollution level estimation","air pollution level estimation using ann","air pollution level estimation using deep learning","air pollution level estimation using machine learning","deep learning air pollution level estimation using ann","machine learning","machine learning practical","machine learning program","machine learning project"],"articleSection":["Machine Learning 
Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/air-pollution-level-estimation-using-ann\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/air-pollution-level-estimation-using-ann\/","url":"https:\/\/data-flair.training\/blogs\/air-pollution-level-estimation-using-ann\/","name":"Deep Learning Project - Air Pollution Level Estimation using ANN - DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"datePublished":"2025-08-22T06:33:15+00:00","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/air-pollution-level-estimation-using-ann\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/air-pollution-level-estimation-using-ann\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/air-pollution-level-estimation-using-ann\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Machine Learning Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/machine-learning\/"},{"@type":"ListItem","position":3,"name":"Deep Learning Project &#8211; Air Pollution Level Estimation using ANN"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. 
Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/c187795dc82ab948373cca526df7c445","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/2302ebc438084d2f1f993edc1996a0aae01332e81f3227cba8df0c48ec010ca4?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/2302ebc438084d2f1f993edc1996a0aae01332e81f3227cba8df0c48ec010ca4?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/2302ebc438084d2f1f993edc1996a0aae01332e81f3227cba8df0c48ec010ca4?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"DataFlair Team provides high-impact content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
We make complex concepts easy to grasp, helping learners of all levels succeed in their tech careers.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam6\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/146663","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/581"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=146663"}],"version-history":[{"count":4,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/146663\/revisions"}],"predecessor-version":[{"id":146680,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/146663\/revisions\/146680"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=146663"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=146663"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=146663"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}