How do I extract web page content from the link below (Amazon Bestsellers: Smartphones)?

I want to extract the product name, product price, rating, and number of ratings for each product, across all the pagination pages.

Link: https://www.amazon.in/gp/bestsellers/electronics/1389432031/ref=amb_link_1?pf_rd_m=A1VBAL9TL5WCBF&pf_rd_s=merchandised-search-1&pf_rd_r=RW19Z5C6EWWZC047P34S&pf_rd_r=RW19Z5C6EWWZC047P34S&pf_rd_t=101&pf_rd_p=984e7ab0-2561-4b96-bfcf-e92188007f0a&pf_rd_p=984e7ab0-2561-4b96-bfcf-e92188007f0a&pf_rd_i=1389432031

I am unable to find the proper XPath for it. I extracted some phones, but the output included the whole description as well. I don't want all of that, just:
1. Product name
2. Product price
3. Rating of the product
4. Number of ratings

Please create a full workflow so that I can understand what steps should be followed, with the XPaths and everything explained in detail.

Solved

Web data extraction

Forum|Forum|7 months ago
July 19, 2025
8 replies
117 views

Prakash 355
Navigator | Tier 3

Best answer by Padmakumar


Padmakumar
Premier Pathfinder | Tier 7
July 21, 2025

Hi @Prakash 355,

Kindly provide more details about your concern.

Padmakumar

Prakash 355
Author
Navigator | Tier 3
July 21, 2025

I want to extract the product name, product price, number of ratings, and rating for all the smartphones on the given Amazon web page into an Excel sheet.
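A minimal Python sketch of the target output, for illustration only: it reads one page of simplified markup, selects every product container with one generic path, reads the four fields relative to each container, and writes the rows as CSV, which Excel opens directly. The class names (`product`, `name`, `price`, `rating`, `count`) and the sample values are hypothetical placeholders, not Amazon's real markup. The standard-library `xml.etree.ElementTree` only parses well-formed XML and supports a subset of XPath, so a live page would need an HTML parser or the bot's own capture actions. Pagination is not covered; the same function would be run once per page.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the class names are placeholders,
# not Amazon's real ones, and real pages are not well-formed XML.
SAMPLE_PAGE = """
<div id="list">
  <div class="product">
    <span class="name">Phone A</span>
    <span class="price">12,999</span>
    <span class="rating">4.2 out of 5 stars</span>
    <span class="count">1,024</span>
  </div>
  <div class="product">
    <span class="name">Phone B</span>
    <span class="price">8,499</span>
  </div>
</div>
"""

FIELDS = ["name", "price", "rating", "count"]


def extract_rows(page_xml):
    """Return one dict per product container, with '' for missing fields."""
    root = ET.fromstring(page_xml.strip())
    rows = []
    # One generic path selects every product container on the page...
    for product in root.findall(".//div[@class='product']"):
        row = {}
        for field in FIELDS:
            # ...and a relative path reads each field inside that container.
            node = product.find(f".//span[@class='{field}']")
            row[field] = node.text.strip() if node is not None and node.text else ""
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(extract_rows(SAMPLE_PAGE)))
```

Reading the fields relative to each container, rather than collecting four independent lists and pairing them by index, keeps a product that has no rating from shifting later ratings into the wrong row.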


Padmakumar
Premier Pathfinder | Tier 7
July 21, 2025

Ok, what is the issue you are facing in doing that?

Padmakumar

Prakash 355
Author
Navigator | Tier 3
July 23, 2025

@Padmakumar As shown in the image, I have captured each element properly, and, as in the image below, I entered generic XPaths for all the elements. However, I am getting an error at step 8. (I tried increasing the wait time, but it didn't help; it still throws an error.)

I need to write this captured data into the Excel columns shown below.

Please advise how I should build the flow so that it works properly.

Thanks,
Prakash Malshetti


Padmakumar
Premier Pathfinder | Tier 7
July 23, 2025

Thanks for replying with these details. Working with a dynamic web page is always tricky, but keep in mind that selecting object properties that are specific to one particular object on the page will cause problems when the bot has to work with many similar objects.

First, on line 8 I can see that properties other than DOMXPath are selected. These can confuse the bot when it tries to identify the object, because their values change when the page is refreshed or the browser is restarted. Avoid this, and select only the properties you can rely on throughout the execution.

Second, the best way to finalize the DOMXPath is to inspect the element in the browser and search for the exact DOMXPath you entered in each capture action (for example, with Ctrl+F in the Elements panel of the browser's developer tools, which accepts XPath expressions). If all the required objects are highlighted on the page, you can be confident that the bot will also be able to iterate through them when it runs.
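To illustrate the difference between a recorded, object-specific path and a generic one, here is a small Python sketch that counts the matches for each path against simplified sample markup. The `data-rank` attribute and the class names are hypothetical stand-ins for whatever unique and shared attributes the real page uses. Note that `xml.etree.ElementTree` supports only a subset of XPath and needs relative paths starting with `.//`, unlike the `//...` expressions you would type in the browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing. "data-rank" and the class names are
# placeholders, not Amazon's real attributes.
SAMPLE_PAGE = """
<div id="list">
  <div class="product" data-rank="1"><span class="name">Phone A</span></div>
  <div class="product" data-rank="2"><span class="name">Phone B</span></div>
  <div class="product" data-rank="3"><span class="name">Phone C</span></div>
</div>
"""

# A recorder-style path pinned to one item's unique attribute.
SPECIFIC_PATH = ".//div[@data-rank='1']/span[@class='name']"
# A generic path that keeps only the attribute shared by every item.
GENERIC_PATH = ".//div[@class='product']/span[@class='name']"


def matches(page_xml, path):
    """Return the text of every element the path selects."""
    root = ET.fromstring(page_xml.strip())
    return [node.text for node in root.findall(path)]


if __name__ == "__main__":
    for label, path in [("specific", SPECIFIC_PATH), ("generic", GENERIC_PATH)]:
        found = matches(SAMPLE_PAGE, path)
        print(f"{label}: {len(found)} match(es) -> {found}")
```

The specific path finds a single product, so a loop over it would only ever see one item; the generic path finds every product, which is what a loop over the page needs.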

Let me know if you need more details on this.

Padmakumar

Prakash 355
Author
Navigator | Tier 3
July 23, 2025

@Padmakumar right,

I tried disabling all the non-required properties, but the bot still waits on line 8 and can't extract anything.

We can't see the bot extracting anything or moving to the next step. It just waits and throws an error once the wait time runs out.

After creating the generic XPath, I tested it in the browser's Inspect view, and it highlighted all the specific fields we are extracting.

I am only having a problem executing the extraction at step 8.

I can provide the bot as a zip file; if you can check it thoroughly, that would help me sort this out.


Padmakumar
Premier Pathfinder | Tier 7
Answer
July 23, 2025

Here is a visual representation of what I meant. I customized the DOMXPath to select all model names on page 1. The screenshot also shows the match count.

[Screenshot: Output]

Note: The word "Sponsored" also appears at the top, because it is part of the selected class. It can be removed with simple string manipulation.
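One way to do that string manipulation, sketched in Python under the assumption that "Sponsored" shows up either as a separate entry in the extracted list or as a prefix of a name. The function name `clean_names` is hypothetical; in the bot, the same logic would be applied with its own string actions.

```python
def clean_names(raw_names):
    """Drop 'Sponsored' labels picked up by a class-based path and trim whitespace."""
    cleaned = []
    for text in raw_names:
        text = text.strip()
        # Remove a leading "Sponsored" label if it was merged into the same text.
        if text.startswith("Sponsored"):
            text = text[len("Sponsored"):].strip()
        # Skip entries that contained nothing but the label.
        if text:
            cleaned.append(text)
    return cleaned


if __name__ == "__main__":
    print(clean_names(["Sponsored", "Phone A", " Phone B "]))
```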

Padmakumar

Prakash 355
Author
Navigator | Tier 3
July 23, 2025

@Padmakumar Thanks for replying.

I will try this approach and let you know if I run into any issues.

Thanks,

Prakash Malshetti
