Solved: how to scrape multiple pages using Selenium in Python

The main problem with scraping multiple pages using Selenium in Python is handling pagination: a script that only loads the first page of results misses most of the data you want.

I am trying to scrape multiple pages using Selenium in Python. I have tried the following code, but it only scrapes the first page. How can I make it scrape all the pages?

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd

my_url = 'https://www.flipkart.com/search?q=iphone&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off'

driver = webdriver.Chrome()  # assumes chromedriver is on your PATH; pass a Service with your driver path otherwise

driver.get(my_url)  # navigates to my_url

content = driver.page_source  # HTML of the page currently open in the driver

soup1 = BeautifulSoup(content, 'html.parser')  # creates a BeautifulSoup object from that HTML

products = {}  # dictionary mapping product name -> product price

# Finds all links with class "Zhf2z-", which are product links on Flipkart
# search results pages. Inspect element on any of these pages to confirm the
# class name, or change this line according to your needs.
for a in soup1.find_all('a', href=True, attrs={'class': 'Zhf2z-'}):
    name = a['title']        # product name from the title attribute of the <a> tag
    price = a['aria-label']  # product price from the aria-label attribute of the <a> tag
    products[name] = price   # store the pair, keyed by product name

driver.quit()  # closes the browser to save resources

df = pd.DataFrame(list(products.items()), columns=['Product Name', 'Product Price'])  # converts dictionary into dataframe for easy manipulation

df['Product Price'] = [x[1:] for x in df['Product Price']]  # removes the leading currency sign from each price

df['Product Price'] = [x.replace(',', '') for x in df['Product Price']]  # removes commas from each price

print(df)  # prints the final dataframe containing all extracted data

This only covers the first page. To reach the rest, click the pagination button at the bottom right of the search results page with .click() to load more products into view, or navigate directly by changing the page number at the end of the URL, i.e. my_url = 'https://www.flipkart.com/search?q=iphone&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&page=' + str(pageNumber), and repeat the steps above until you reach the last page (for the second-to-last page, use str(totalPages - 1)). Here totalPages is the total number of pages for the search. You can find it programmatically by inspecting the pagination element, or manually: scroll to the bottom of any results page and count the pagination buttons; n buttons means n pages (this search shows 40 buttons, so 40 pages). Hope this helps! 🙂
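Putting that together, a minimal sketch of the multi-page loop might look like this. The Zhf2z- class name is taken from the code above, and totalPages = 40 is an assumption; count the pagination buttons on the live page to get the real value.

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd

base_url = ('https://www.flipkart.com/search?q=iphone&otracker=search'
            '&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&page=')
totalPages = 40  # assumption: count the pagination buttons on the live page
products = {}

driver = webdriver.Chrome()
for page in range(1, totalPages + 1):
    driver.get(base_url + str(page))  # load results page N directly by URL
    soup1 = BeautifulSoup(driver.page_source, 'html.parser')
    for a in soup1.find_all('a', href=True, attrs={'class': 'Zhf2z-'}):
        products[a['title']] = a['aria-label']  # product name -> price
driver.quit()

df = pd.DataFrame(list(products.items()), columns=['Product Name', 'Product Price'])
print(df)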

This code is an attempt to scrape multiple pages on the website Flipkart.com, though as written it only covers the first page. The first import pulls in the Selenium webdriver, which automates a real web browser. The second imports BeautifulSoup, a Python library for parsing HTML and XML documents. The third imports pandas, a Python library for data analysis.

The next block launches a Chrome driver, navigates to the Flipkart search URL, and grabs the HTML of the loaded page. It then parses that HTML with BeautifulSoup and extracts all links with the class “Zhf2z-”, which correspond to products on the search results page. For each link, it extracts the product name from the title attribute and the price from the aria-label attribute, and adds the pair to a dictionary called products, with the product name as the key and the product price as the value.

Finally, it converts the dictionary into a pandas dataframe (df) for easy manipulation, strips the leading currency sign and the commas from each price, and prints the final dataframe containing all the extracted data.
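As an aside, the same cleaning can be written with pandas' vectorized string methods instead of list comprehensions; this is a sketch assuming the raw prices look like '₹61,999':

df['Product Price'] = (df['Product Price']
                       .str.lstrip('₹')       # drop the leading currency sign
                       .str.replace(',', '')  # drop thousands-separator commas
                       .astype(int))          # convert to a numeric type for analysis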

Scraping

In Python, scraping is the process of extracting data from a web page or document using a script. Data can be collected by hand in a web browser, but a Python script lets you automate the process.
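For a static page, a minimal sketch looks like this (using example.com as a stand-in URL):

import requests
from bs4 import BeautifulSoup

resp = requests.get('https://example.com')      # fetch the raw HTML
page = BeautifulSoup(resp.text, 'html.parser')  # parse it
print(page.title.string)                        # extract the <title> text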

Best scrapers

There are many scraping libraries in Python, but some of the best include:

1. scrapy (see the sketch after this list)
2. BeautifulSoup
3. requests
4. selenium
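For example, a minimal Scrapy spider might look like the sketch below, using the practice site quotes.toscrape.com; run it with scrapy runspider quotes_spider.py -o quotes.json.

import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = ['https://quotes.toscrape.com']

    def parse(self, response):
        # yield one item per quote block on the page
        for quote in response.css('div.quote'):
            yield {'text': quote.css('span.text::text').get(),
                   'author': quote.css('small.author::text').get()}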

Selenium

Selenium is a web browser automation tool. It can automate tasks such as testing websites, capturing screenshots, and simulating user interactions, and it is widely used to write and run automated test scripts.
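For instance, capturing a screenshot takes only a few lines (a sketch, again assuming chromedriver is on your PATH):

from selenium import webdriver

driver = webdriver.Chrome()            # launch Chrome
driver.get('https://example.com')      # open the page to capture
driver.save_screenshot('example.png')  # write a PNG screenshot to disk
driver.quit()                          # shut the browser down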
