

{"id":110938,"date":"2022-09-13T10:00:15","date_gmt":"2022-09-13T04:30:15","guid":{"rendered":"https:\/\/data-flair.training\/blogs\/?p=110938"},"modified":"2026-04-25T14:19:48","modified_gmt":"2026-04-25T08:49:48","slug":"web-scraping-using-python","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/web-scraping-using-python\/","title":{"rendered":"Learn Web Scraping using Python"},"content":{"rendered":"<p>Many of us have come across situations where we need to extract information from websites. We generally copy the required information by hand. But what if there is so much data that copying it is impractical? This is where web scraping comes into play, and in this tutorial we will learn to do it using Python.<\/p>\n<h3>What is Web scraping?<\/h3>\n<p>Web scraping is an automated method of extracting data from websites. Could we simply copy and paste instead? We could, but separating the data we need from everything else is tiring, and data on websites is usually unstructured. Web scraping collects this unstructured data and stores it in a structured form.<\/p>\n<p>Is it legal to scrape any website? Some websites allow web scraping while others don\u2019t. To see what a site permits, check its \u201crobots.txt\u201d file by appending \u201c\/robots.txt\u201d to the URL of the website that you want to scrape.<\/p>\n<h3>Applications of Web Scraping<\/h3>\n<p>We just talked about how we extract data by scraping. What can we do with that data? Here are some applications.<\/p>\n<p><strong>1. Price Comparison:<\/strong> This is used for comparing the prices of similar products from different online shopping websites.<\/p>\n<p><strong>2. Gathering emails:<\/strong> Have you all received marketing emails from a website you subscribed to? 
Do you think these emails are sent individually? Of course not! Email IDs are collected using web scraping, and bulk emails are then sent to them.<\/p>\n<p><strong>3. Scraping Social Media<\/strong> <strong>Content:<\/strong> Social Media websites are scraped to find out what\u2019s trending.<\/p>\n<p><strong>4. Research and Analysis:<\/strong> A large set of data (Statistics, Reviews, Temperature, etc.) is used for analysis and R&amp;D, for developing a model and testing.<\/p>\n<p><strong>5. Listings:<\/strong> Details of job openings, interviews, etc., are collected from different websites and made available in one place.<\/p>\n<h3>Why is Python Good for Web Scraping?<\/h3>\n<p>We will be using Python to do web scraping. It is well suited to the task for the following reasons.<\/p>\n<p><strong>1. Ease of Use:<\/strong> Python is simple to code in, with an easy-to-learn syntax.<\/p>\n<p><strong>2. Large Collection of Libraries:<\/strong> It has a huge collection of libraries that provide methods and services, making the coding task easier. It also has modules specifically for web scraping purposes.<\/p>\n<p><strong>3. Dynamically typed:<\/strong> Python does not require you to declare data types for variables. This saves time when writing code.<\/p>\n<p><strong>4. Small code, large task:<\/strong> Python\u2019s built-in functions and libraries let a small amount of code do large tasks.<\/p>\n<p><strong>5. Community:<\/strong> It has a huge community working on improvements and answering queries. These active communities help you whenever you are stuck.<\/p>\n<h3>Libraries for Web Scraping in Python<\/h3>\n<p>Python has libraries explicitly used for the purpose of web scraping. And these libraries come with multiple built-in functions, making scraping easy. 
These include<\/p>\n<ul>\n<li>Requests<\/li>\n<li>Beautiful Soup<\/li>\n<li>lxml<\/li>\n<li>Selenium<\/li>\n<\/ul>\n<h3>The requests Library in Python<\/h3>\n<p>Requests is a library used for making HTTP requests to a specific URL and getting the response. It also contains some built-in functions for managing both the request and the response.<\/p>\n<p>This library can be installed using the following command:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">pip install requests<\/pre>\n<h4>Making the request<\/h4>\n<p>Once installed, you can send an HTTP GET request to the required URL using the get() function. It returns a Response object holding what the server sent back, so this object can be used to get information like the URL, status code, content, etc., as shown in the code below.<\/p>\n<p><strong>Example of making an HTTP request:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\n\r\n# Making a GET request\r\nr = requests.get('https:\/\/data-flair.training\/')\r\n\r\n# print the Response object\r\nprint(r)\r\n\r\n# print the URL\r\nprint(r.url)\r\n\r\n# print the status code\r\nprint(r.status_code)\r\n\r\n# print the content of the response (as bytes)\r\nprint(r.content)<\/pre>\n<p><strong>Output<\/strong><\/p>\n<h3>Beautiful Soup<\/h3>\n<p>Beautiful Soup is a Python library specifically built for web scraping. It works with a parser to build a parse tree from HTML, which you can then navigate, search, and modify. It can also turn even invalid markup into a parse tree. 
However, it cannot fetch pages from web servers itself, so we use the requests library for that.<\/p>\n<p><strong>Fun facts about Beautiful Soup:<\/strong><\/p>\n<ul>\n<li>It was started by Leonard Richardson in 2004.<\/li>\n<li>It is named after a song sung in Alice\u2019s Adventures in Wonderland, referring to its ability to handle messy HTML.<\/li>\n<li>Beautiful Soup 4 is commonly used today for modern projects.<\/li>\n<\/ul>\n<p>This library can be installed using the following command:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">pip install beautifulsoup4<\/pre>\n<h4>Inspecting Website<\/h4>\n<p>Before extracting any information from a website, it is important to understand its structure. This shows how the data is laid out on the page and which elements hold the information you need. To do so, right-click on the page and select the Inspect option.<\/p>\n<p>After doing this, you get to see the Document Object Model (DOM) of the website, as shown below.<\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2022\/06\/image-of-dom-of-a-website.webp\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-111080\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2022\/06\/image-of-dom-of-a-website.webp\" alt=\"image of dom of a website\" width=\"1907\" height=\"934\" \/><\/a><\/p>\n<h3>Parsing HTML using Beautiful Soup in Python<\/h3>\n<p>Now, let\u2019s see how to get the HTML that makes up a web page using Beautiful Soup.<\/p>\n<p><strong>Example of parsing HTML using Beautiful Soup:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom bs4 import BeautifulSoup\r\n\r\n# Making a GET request\r\nr = requests.get('https:\/\/data-flair.training\/')\r\n\r\n# parsing HTML\r\nsoup = BeautifulSoup(r.content, 
'html.parser')\r\nprint(soup.prettify())\r\n<\/pre>\n<p><strong>Output<\/strong><\/p>\n<p>Here, we first make a request to the website using the get() function and send the content to the BeautifulSoup() class. Printing the resulting object with prettify() shows the HTML of the page, indented for readability.<\/p>\n<h3>lxml in Python<\/h3>\n<p>lxml is a Python parsing library that works with both HTML and XML. It is a fast, powerful, and easy-to-use library, especially when extracting data from large datasets. However, poorly formed HTML can impair its parsing. Even with this library, we first need to request the HTML using the requests library. This library can be installed using the command below:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">pip install lxml<\/pre>\n<p>Scraping with lxml works in the following way:<\/p>\n<p>1. We first request the page from the website as discussed above<br \/>\n2. And then using the fromstring() function in the html module of the lxml library, we get the tree object as shown below<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom lxml import html\r\n\r\n# Request the page, then build the element tree from its HTML\r\nresponse = requests.get('https:\/\/data-flair.training\/')\r\ntree = html.fromstring(response.text)<\/pre>\n<p>Then we call the tree\u2019s xpath() method with an XPath query, for example tree.xpath('\/\/h2\/text()'), to extract the required information from the website.<\/p>\n<h3>Selenium in Python<\/h3>\n<p>Nowadays, many websites are dynamic: they build their content in the browser with JavaScript. This poses a problem for libraries like requests, which only download the HTML and do not run scripts. 
This is where Selenium comes into play. It is an open-source browser automation tool (web driver) that renders web pages just like any browser.<\/p>\n<p><strong>It mainly requires three components:<\/strong><\/p>\n<ul>\n<li>Web Browser &#8211; Chrome, Edge, Firefox, and Safari<\/li>\n<li><a href=\"https:\/\/pypi.org\/project\/selenium\/\">Driver<\/a> for the browser<\/li>\n<li>The Python selenium package<\/li>\n<\/ul>\n<p>You can install the package using the command below:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">pip install selenium<\/pre>\n<p>Now we start by importing an appropriate class for the browser from the selenium package. Then we create an object of the class, passing the path of the driver executable through a Service object, which is how Selenium 4 expects it.<\/p>\n<p>After this, we will use the get() method to load any page in the browser, as shown below.<\/p>\n<p><strong>Example of loading a page using Selenium:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">from selenium.webdriver import Chrome\r\nfrom selenium.webdriver.chrome.service import Service\r\n\r\n# Selenium 4 takes the driver path through a Service object\r\ndriverObj = Chrome(service=Service(executable_path='\/driver\/path\/on\/your\/device'))\r\ndriverObj.get('https:\/\/data-flair.training\/')<\/pre>\n<p>Selenium also allows us to use CSS selectors and XPath to extract data from the websites. Let\u2019s see an example that gets all the blog titles using CSS selectors.<\/p>\n<p><strong>Example of getting all the blog titles using CSS selectors:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">from selenium.webdriver.common.by import By\r\n\r\nblog_titles = driverObj.find_elements(By.CSS_SELECTOR, 'h2.blog-card__content-title')\r\nfor title in blog_titles:\r\n    print(title.text)\r\ndriverObj.quit()  # closing the browser\r\n<\/pre>\n<p>Although Selenium can handle dynamic websites, it makes the web scraping process slow. The reason is that it must first execute the JavaScript code for each page before making it available for parsing. 
And because of this, it is not ideal for large-scale data extraction.<\/p>\n<h3>Comparison between the four web scraping libraries<\/h3>\n<table>\n<tbody>\n<tr>\n<td><\/td>\n<td><b>Requests<\/b><\/td>\n<td><b>Beautiful Soup<\/b><\/td>\n<td><b>lxml<\/b><\/td>\n<td><b>Selenium<\/b><\/td>\n<\/tr>\n<tr>\n<td><b>Purpose<\/b><\/td>\n<td><span style=\"font-weight: 400\">Making HTTP requests<\/span><\/td>\n<td><span style=\"font-weight: 400\">Parsing<\/span><\/td>\n<td><span style=\"font-weight: 400\">Parsing<\/span><\/td>\n<td><span style=\"font-weight: 400\">Browser automation and page rendering<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>Ease-of-use<\/b><\/td>\n<td><span style=\"font-weight: 400\">High<\/span><\/td>\n<td><span style=\"font-weight: 400\">High<\/span><\/td>\n<td><span style=\"font-weight: 400\">Medium<\/span><\/td>\n<td><span style=\"font-weight: 400\">Medium<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>Speed<\/b><\/td>\n<td><span style=\"font-weight: 400\">Fast<\/span><\/td>\n<td><span style=\"font-weight: 400\">Fast<\/span><\/td>\n<td><span style=\"font-weight: 400\">Very fast<\/span><\/td>\n<td><span style=\"font-weight: 400\">Slow<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>Ease of learning<\/b><\/td>\n<td><span style=\"font-weight: 400\">High<\/span><\/td>\n<td><span style=\"font-weight: 400\">High<\/span><\/td>\n<td><span style=\"font-weight: 400\">Medium<\/span><\/td>\n<td><span style=\"font-weight: 400\">Medium<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>Documentation<\/b><\/td>\n<td><span style=\"font-weight: 400\">Very good<\/span><\/td>\n<td><span style=\"font-weight: 400\">Very good<\/span><\/td>\n<td><span style=\"font-weight: 400\">Good<\/span><\/td>\n<td><span style=\"font-weight: 400\">Good<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>JavaScript Support<\/b><\/td>\n<td><span style=\"font-weight: 400\">None<\/span><\/td>\n<td><span style=\"font-weight: 400\">None<\/span><\/td>\n<td><span style=\"font-weight: 400\">None<\/span><\/td>\n<td><span style=\"font-weight: 
400\">Yes<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>CPU and Memory Usage<\/b><\/td>\n<td><span style=\"font-weight: 400\">Low<\/span><\/td>\n<td><span style=\"font-weight: 400\">Low<\/span><\/td>\n<td><span style=\"font-weight: 400\">Low<\/span><\/td>\n<td><span style=\"font-weight: 400\">High<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>Size of Project Supported<\/b><\/td>\n<td><span style=\"font-weight: 400\">Large and small<\/span><\/td>\n<td><span style=\"font-weight: 400\">Large and small<\/span><\/td>\n<td><span style=\"font-weight: 400\">Large and small<\/span><\/td>\n<td><span style=\"font-weight: 400\">Small<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3>Picking a web driver and browser<\/h3>\n<p>A Selenium-based scraper needs a browser to connect to the destination URL. We recommend that you use a regular browser (that is, not a headless one) for testing purposes, especially if you are a newcomer. Seeing the browser window makes troubleshooting and debugging simpler by giving a better understanding of the entire process.<\/p>\n<p>On the other hand, headless browsers can be used later on as they are more efficient for complex tasks. Here we will be using the Chrome browser. You can also use Firefox. In either case, download the web driver that matches your browser\u2019s version.<\/p>\n<p>You can do this by selecting the requisite package, downloading it, and unzipping it. Then copy the driver\u2019s executable file to any easily accessible directory.<\/p>\n<h3>Selecting an appropriate URL<\/h3>\n<p>Previously, we saw how to inspect a website and get a better understanding of its structure. Here are some more tips to help you pick a URL:<\/p>\n<ul>\n<li>It is very important to ensure that you are scraping public data and are not infringing third-party rights. 
Check the robots.txt file for guidance.<\/li>\n<li>Avoid data hidden in JavaScript elements, as these sometimes need to be scraped by performing specific actions and require a more sophisticated use of Python and its logic.<\/li>\n<li>Avoid image scraping because you can easily download them directly with Selenium.<\/li>\n<\/ul>\n<h3>Extracting information<\/h3>\n<p>There seems to be a lot of HTML content in the output. Don\u2019t worry, we also have methods to extract useful information, like the title.<\/p>\n<p><strong>Example of extracting data from the website:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom bs4 import BeautifulSoup\r\n\r\n# Making a GET request\r\nr = requests.get('https:\/\/data-flair.training\/')\r\n\r\n# Parsing the HTML\r\nsoup = BeautifulSoup(r.content, 'html.parser')\r\n\r\n# Extract the title text\r\npage_title = soup.title.text\r\n\r\n# Extract body of page\r\npage_body = soup.body\r\n\r\n# Extract head of page\r\npage_head = soup.head\r\n\r\n# print the result\r\nprint(page_title)\r\nprint(page_head)\r\nprint(page_body)\r\n<\/pre>\n<p><strong>Output<\/strong><\/p>\n<p>We can also get the tags and their details. 
Let\u2019s see how to get the details of the title and its tags.<\/p>\n<p><strong>Example of getting tags and tag information from the website:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom bs4 import BeautifulSoup\r\n\r\n# Making a GET request\r\nr = requests.get('https:\/\/data-flair.training\/')\r\n\r\n# Parsing the HTML\r\nsoup = BeautifulSoup(r.content, 'html.parser')\r\n\r\n# Getting the title tag of the page\r\nprint(soup.title)\r\n \r\n# Getting the name of the title tag\r\nprint(soup.title.name)\r\n \r\n# Getting the name of parent tag of the title\r\nprint(soup.title.parent.name)\r\n<\/pre>\n<p><strong>Output<\/strong><\/p>\n<div class=\"code-output\">&lt;title&gt;Attention Required! | Cloudflare&lt;\/title&gt;<br \/>\ntitle<br \/>\nhead<\/div>\n<p>The title shown here belongs to a Cloudflare challenge page, which is what the site returned to this request instead of its home page.<\/p>\n<h3>Selecting with Beautiful Soup in Python<\/h3>\n<p>Beautiful Soup has a select() method that takes a CSS selector and returns a list of the matching elements, such as headings.<br \/>\nLet\u2019s see an example that selects all the h2 headings and takes the second one from the obtained list.<\/p>\n<p><strong>Example of selecting a heading from a website:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom bs4 import BeautifulSoup\r\n\r\n# Making a GET request\r\nr = requests.get('https:\/\/www.amazon.in\/')\r\n\r\n# Parsing the HTML\r\nsoup = BeautifulSoup(r.content, 'html.parser')\r\n\r\n# select() returns a list; index 1 is the second h2\r\nsecond_head = soup.select('h2')[1].text\r\nprint(second_head)\r\n<\/pre>\n<p><strong>Output<\/strong><\/p>\n<div class=\"code-output\">Big Savings for Everyone<\/div>\n<p>What if we want to get all the headings or other components? Which construct do we use to access multiple elements one after another?<\/p>\n<p>That\u2019s right: loops. 
Let\u2019s now see how to get all the elements with a given tag.<\/p>\n<p><strong>Example of selecting all headings from the website:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom bs4 import BeautifulSoup\r\n\r\n# Making a GET request\r\nr = requests.get('https:\/\/data-flair.training\/')\r\n\r\nsoup = BeautifulSoup(r.content, 'html.parser')\r\nall_h1_tags = []\r\nfor element in soup.select('h1'):\r\n    all_h1_tags.append(element.text)\r\nprint(all_h1_tags)<\/pre>\n<p><strong>Output<\/strong><\/p>\n<div class=\"code-output\">[&#039;One more step&#039;, &#039;Please turn JavaScript on and reload the page.&#039;]<\/div>\n<h3>Finding components<\/h3>\n<p>This library also has other methods, like find(), which returns the first element matching a tag name and optional attributes, and find_all(), which returns every match. Let\u2019s dive deeper into them with some examples.<\/p>\n<h3>Finding by class<\/h3>\n<p>Let&#8217;s search based on the class. In the example below, we search for a div tag with the class \u2018a-section\u2019. We then find all the h2 headings inside that div and print them.<\/p>\n<p><strong>Example of finding elements by class:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom bs4 import BeautifulSoup\r\n\r\n# Making a GET request\r\nr = requests.get('https:\/\/www.amazon.in\/')\r\n\r\nsoup = BeautifulSoup(r.content, 'html.parser')\r\n# class_ (with underscore) is used because class is a Python keyword\r\ns = soup.find('div', class_='a-section')\r\ncontent = s.find_all('h2')\r\n \r\nprint(content)\r\n<\/pre>\n<p><strong>Output<\/strong><\/p>\n<h3>Finding by Id<\/h3>\n<p>HTML elements can carry attributes such as class and id, and an id is meant to be unique within a page. Now that we have searched by class, let\u2019s search by ID. 
It\u2019s the same as the above example we saw, except that we use the id parameter in the find() function to search.<\/p>\n<p><strong>Example of finding elements by ID:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom bs4 import BeautifulSoup\r\n\r\n\r\n# Making a GET request\r\nr = requests.get('https:\/\/data-flair.training\/')\r\n\r\n# Parsing the HTML\r\nsoup = BeautifulSoup(r.content, 'html.parser')\r\n\r\ns = soup.find('div',id=\"cf-wrapper\")\r\ncontent = s.find_all('p')\r\n\r\nprint(content)\r\n<\/pre>\n<p><strong>Output<\/strong><\/p>\n<h3>Extracting Text from the tags<\/h3>\n<p>If you see the above outputs, we see that tags also got included along with the text. But when it comes to real-life applications, the important part is the content. We will take the above example and see how we remove the tags from it.<\/p>\n<p><strong>Example of getting text from the tags:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom bs4 import BeautifulSoup\r\n\r\n\r\n# Making a GET request\r\nr = requests.get('https:\/\/data-flair.training\/')\r\n\r\n# Parsing the HTML\r\nsoup = BeautifulSoup(r.content, 'html.parser')\r\n\r\ns = soup.find('div',id=\"cf-wrapper\")\r\ncontent = s.find_all('p')\r\n\r\nfor line in content:\r\n    print(line.text)\r\n<\/pre>\n<p><strong>Output<\/strong><\/p>\n<p>Here we find all the paragraphs and then run a loop to print all the found elements. This avoids the appearance of tags in the output.<\/p>\n<h3>Extracting Links<\/h3>\n<p>In many of the cases, links get attached to the content on the website. And this is done using the &lt;a&gt; tag by giving the link to be attached to the \u2018href\u2019 attribute of the tag. This information is what we will use along with the find_all() function to extract the links. 
Let\u2019s see an example.<\/p>\n<p><strong>Example of getting links from a website:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom bs4 import BeautifulSoup\r\n\r\n\r\n# Making a GET request\r\nr = requests.get('https:\/\/data-flair.training\/')\r\n\r\n# Parsing the HTML\r\nsoup = BeautifulSoup(r.content, 'html.parser')\r\n\r\nfor link in soup.find_all('a'):\r\n    print(link.get('href'))\r\n<\/pre>\n<p><strong>Output<\/strong><\/p>\n<div class=\"code-output\">https:\/\/www.cloudflare.com\/5xx-error-landing<\/div>\n<h3>Extracting Image Information using Python<\/h3>\n<p>Websites also contain images, and the example below extracts information about them.<\/p>\n<p><strong>Example of extracting images from a website:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom bs4 import BeautifulSoup\r\n\r\n# Making a GET request\r\nr = requests.get('https:\/\/www.amazon.in\/')\r\n\r\nsoup = BeautifulSoup(r.content, 'html.parser')\r\n\r\nimages_list = []\r\n \r\nimages = soup.select('img')\r\nfor image in images:\r\n    src = image.get('src')\r\n    alt = image.get('alt')\r\n    images_list.append({\"src\": src, \"alt\": alt})\r\n     \r\nfor image in images_list:\r\n    print(image)<\/pre>\n<p><strong>Output<\/strong><\/p>\n<p>Here we select all the \u2018img\u2019 tags from the page, read the \u2018src\u2019 and \u2018alt\u2019 attributes of each image, and print them.<\/p>\n<h3>Scraping multiple Pages in Python<\/h3>\n<p>Scraping information from multiple pages by hand can be tedious. Combined with requests, Beautiful Soup can also work through multiple pages from the same website or from different URLs. We will see both cases.<\/p>\n<p>When we use a website with multiple pages, we take a base URL and run a for loop to go through each of the pages. 
Let\u2019s say we have a website with 10 pages, and we want to extract information from each page, say the title.<\/p>\n<p><strong>Example of running through multiple pages:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom bs4 import BeautifulSoup as bs\r\n\r\n# base URL of the paginated site (left blank here)\r\nURL = ''\r\n\r\n# range(1, 11) covers pages 1 to 10; the end value is excluded\r\nfor page in range(1, 11):\r\n\r\n    req = requests.get(URL + str(page) + '\/')\r\n    soup = bs(req.text, 'html.parser')\r\n    print(soup.title.text)\r\n<\/pre>\n<p>We can follow the same process to run through multiple URLs by storing them in a list.<\/p>\n<p><strong>Example of running through multiple URLs:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom bs4 import BeautifulSoup as bs\r\n\r\nURLs = ['https:\/\/www.amazon.in\/','https:\/\/www.flipkart.com\/']\r\n\r\nfor i in URLs:\r\n\r\n    req = requests.get(i)\r\n    soup = bs(req.text, 'html.parser')\r\n    print(soup.title.text)\r\n<\/pre>\n<p><strong>Output<\/strong><\/p>\n<div class=\"code-output\">Online Shopping site in India: Shop Online for Mobiles, Books, Watches, Shoes and More &#8211; Amazon.in<br \/>\nOnline Shopping Site for Mobiles, Electronics, Furniture, Grocery, Lifestyle, Books &amp; More. Best Offers!<\/div>\n<p>Here we store all the URLs in a list and then run through each URL in the list using a loop to extract the required information from each website.<\/p>\n<h3>Saving Information in CSV<\/h3>\n<p>We can also save the information obtained from the website(s) on our device, for example, in the form of a CSV. In the example below, we use a for loop to run through a list of URLs and get the title of each page. 
Finally, save the title and the URL in the form of a CSV.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom bs4 import BeautifulSoup as bs\r\nimport csv\r\n\r\nURLs = ['https:\/\/www.amazon.in\/gp\/bestsellers\/?ref_=nav_em_cs_bestsellers_0_1_1_2',\r\n       'https:\/\/www.amazon.in\/gp\/new-releases\/?ref_=nav_em_cs_newreleases_0_1_1_3',\r\n       'https:\/\/www.amazon.in\/gp\/movers-and-shakers\/?ref_=nav_em_ms_0_1_1_4',\r\n       'https:\/\/www.amazon.in\/finds?ref_=nav_em_sbc_desktop_foundit_0_1_1_27',\r\n        'https:\/\/www.flipkart.com\/mobile-phones-store?otracker=nmenu_sub_Electronics_0_Mobiles',\r\n        'https:\/\/www.flipkart.com\/laptops-store?otracker=nmenu_sub_Electronics_0_Laptops',\r\n        'https:\/\/www.flipkart.com\/books-store?otracker=nmenu_sub_Sports%2C%20Books%20%26%20More_0_Books',\r\n        'https:\/\/www.flipkart.com\/offers-store?otracker=nmenu_offer-zone'\r\n       ]\r\n\r\ntitles_list = []\r\nfor i in URLs:\r\n\r\n    req = requests.get(i)\r\n    soup = bs(req.text, 'html.parser')\r\n    d = {}\r\n    d['URL'] = i\r\n    d['Title Name'] = soup.title.text\r\n    titles_list.append(d)\r\n\r\nfilename = r'C:\\Users\\Sai Siva Teja\\Downloads\\titles.csv'\r\nwith open(filename, 'w', newline='') as f:\r\n    w = csv.DictWriter(f,['URL','Title Name'])\r\n    w.writeheader()\r\n     \r\n    w.writerows(titles_list)\r\n<\/pre>\n<h3>Python Web Scraping Output<\/h3>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2022\/06\/python-web-scraping-output.webp\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-111081\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2022\/06\/python-web-scraping-output.webp\" alt=\"python web scraping output\" width=\"1225\" height=\"671\" \/><\/a><\/p>\n<h3>Conclusion<\/h3>\n<p>Here we are at the end of the article on web scraping with Python. 
In this article, we were introduced to the concept of web scraping, sending HTTP requests, and extracting website data using Beautiful Soup in Python. Hope you enjoyed this article. Happy learning!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Many of us would have come across situations when we need to extract information from websites. We generally copy the required information from the website. 
What if the data is too large that it&#8217;s&#46;&#46;&#46;<\/p>\n","protected":false},"author":5,"featured_media":111082,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[46],"tags":[21063,26506,21082,22734,27133,27134,27135,27105],"class_list":["post-110938","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python","tag-beginner-python-project","tag-python-easy-project","tag-python-project","tag-python-project-for-beginners","tag-python-web-scraping","tag-web-scraping","tag-web-scraping-code","tag-web-scraping-with-python"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Learn Web Scraping using Python - DataFlair<\/title>\n<meta name=\"description\" content=\"Learn how to do web scraping with Python. Web scraping is an automated method of extracting data from websites.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/web-scraping-using-python\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Learn Web Scraping using Python - DataFlair\" \/>\n<meta property=\"og:description\" content=\"Learn how to do web scraping with Python. 
Web scraping is an automated method of extracting data from websites.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/web-scraping-using-python\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2022-09-13T04:30:15+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-04-25T08:49:48+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2022\/06\/python-project-web-scraping.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Learn Web Scraping using Python - DataFlair","description":"Learn how to do web scraping with Python. Web scraping is an automated method of extracting data from websites.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/web-scraping-using-python\/","og_locale":"en_US","og_type":"article","og_title":"Learn Web Scraping using Python - DataFlair","og_description":"Learn how to do web scraping with Python. 
Web scraping is an automated method of extracting data from websites.","og_url":"https:\/\/data-flair.training\/blogs\/web-scraping-using-python\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2022-09-13T04:30:15+00:00","article_modified_time":"2026-04-25T08:49:48+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2022\/06\/python-project-web-scraping.webp","type":"image\/webp"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/web-scraping-using-python\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/web-scraping-using-python\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/7f83c342f5d1632d6f7b4b0b0f447823"},"headline":"Learn Web Scraping using Python","datePublished":"2022-09-13T04:30:15+00:00","dateModified":"2026-04-25T08:49:48+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/web-scraping-using-python\/"},"wordCount":2378,"commentCount":0,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/web-scraping-using-python\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2022\/06\/python-project-web-scraping.webp","keywords":["Beginner Python Project","python easy project","Python project","python project for beginners","python web scraping","web scraping","web scraping code","Web Scraping with Python"],"articleSection":["Python 
Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/web-scraping-using-python\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/web-scraping-using-python\/","url":"https:\/\/data-flair.training\/blogs\/web-scraping-using-python\/","name":"Learn Web Scraping using Python - DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/web-scraping-using-python\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/web-scraping-using-python\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2022\/06\/python-project-web-scraping.webp","datePublished":"2022-09-13T04:30:15+00:00","dateModified":"2026-04-25T08:49:48+00:00","description":"Learn how to do web scraping with Python. Web scraping is an automated method of extracting data from websites.","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/web-scraping-using-python\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/web-scraping-using-python\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/web-scraping-using-python\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2022\/06\/python-project-web-scraping.webp","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2022\/06\/python-project-web-scraping.webp","width":1200,"height":628},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/web-scraping-using-python\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Python 
Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/python\/"},{"@type":"ListItem","position":3,"name":"Learn Web Scraping using Python"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/7f83c342f5d1632d6f7b4b0b0f447823","name":"DataFlair 
Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/4cf3a74600d131330b8c481d519afd1574093ed89f6d3396a95393ad223eb7cd?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/4cf3a74600d131330b8c481d519afd1574093ed89f6d3396a95393ad223eb7cd?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/4cf3a74600d131330b8c481d519afd1574093ed89f6d3396a95393ad223eb7cd?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"DataFlair Team creates expert-level guides on programming, Java, Python, C++, DSA, AI, ML, Data Science, Android, Flutter, MERN, Web Development, and technology. Our goal is to empower learners with easy-to-understand content. Explore our resources for career growth and practical learning.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam1\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/110938","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=110938"}],"version-history":[{"count":9,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/110938\/revisions"}],"predecessor-version":[{"id":147891,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/110938\/revisions\/147891"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/111082"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=110938"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?pos
t=110938"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=110938"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}