
How do I extract web page content from the link below?
Amazon bestsellers: smartphones
I want to extract the product name, product price, rating, and number of ratings for each product, across all the pagination pages.
Link:
https://www.amazon.in/gp/bestsellers/electronics/1389432031/ref=amb_link_1?pf_rd_m=A1VBAL9TL5WCBF&pf_rd_s=merchandised-search-1&pf_rd_r=RW19Z5C6EWWZC047P34S&pf_rd_r=RW19Z5C6EWWZC047P34S&pf_rd_t=101&pf_rd_p=984e7ab0-2561-4b96-bfcf-e92188007f0a&pf_rd_p=984e7ab0-2561-4b96-bfcf-e92188007f0a&pf_rd_i=1389432031
I am unable to find a proper XPath for it. I extracted some phones, but the result also includes the full description, which I don't want.
I just need: 1. product name, 2. product price, 3. product rating, 4. number of ratings.
Please create a full workflow so that I can understand in detail which steps to follow and which XPaths to use.

Hi @Prakash 355,

Kindly provide some details about your concern.


I want to extract the product name, product price, rating, and number of ratings for all the smartphones on the given Amazon page into an Excel sheet.



Ok, what is the issue you are facing in doing that?


@Padmakumar As shown in the image, I have captured each element properly, and as shown in the image below, I entered generic XPaths for all the elements. However, I am getting an error at the 8th step. (I tried increasing the wait time, but that did not help; it still throws the error.)

I need to extract this captured data into the Excel columns shown below.


Please advise how I should build the flow so that it works properly.

Thanks,
Prakash Malshetti



Thanks for replying with such detail. Working in a dynamic web environment is always tricky. Keep in mind that selecting object properties that are specific to one particular object on the web page will tend to cause issues.

Firstly, on line 8, I can see that properties other than DOMXPath are selected. These can prevent your bot from identifying the object, because their values change after a screen refresh or a browser restart. Avoid this practice and select only properties that you can rely on throughout the execution.

Secondly, the best way to finalize the DOMXPath is to inspect the element in the page and search with the DOMXPath you have specified in each capture action. If all the required objects are highlighted in the page, you can be confident that the bot will also be able to iterate through them when it executes.
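
To illustrate the difference between an object-specific path and a generic one, here is a minimal Python sketch. It is only an illustration and not part of the bot: the markup, class names, and product values are made up, and Python's built-in ElementTree supports only a subset of XPath. The idea is the same as searching in the browser inspector: count how many elements each path matches.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for a product grid.
# The real page structure and class names will differ.
SAMPLE = """
<div id="grid">
  <div class="item"><span class="name">Phone A</span><span class="price">9,999</span></div>
  <div class="item"><span class="name">Phone B</span><span class="price">12,499</span></div>
  <div class="item"><span class="name">Phone C</span><span class="price">7,299</span></div>
</div>
"""

root = ET.fromstring(SAMPLE)

# Position-specific path: tied to the first item only.
specific = root.findall("./div[1]/span[@class='name']")

# Generic path: matches the name inside every item.
generic = root.findall("./div[@class='item']/span[@class='name']")

specific_names = [el.text for el in specific]
generic_names = [el.text for el in generic]

# Compare the match counts, as you would in the inspector's search box.
print(len(specific_names), len(generic_names))
```

If the generic path matches as many elements as there are products on the page, it is a reasonable candidate for the loop.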

 

Let me know if you need more details on this.

 

 


@Padmakumar Right.

I tried disabling all the non-required properties, but the bot still waits on line 8 and cannot extract anything.

We cannot see the bot extracting anything or moving to the next step. It just waits and, once the wait time elapses, throws the error.

After creating the generic XPath, I tested it in the web page inspector, and it highlighted all the fields that we are extracting.

The only problem I have is executing the extraction at the 8th step.

I can provide the zip file; if you could check it thoroughly, that would help me resolve this.



Here is a visual representation of what I meant. I have customized the DOMXPath to get all the model names on page 1. You can also see the match count in the screenshot.

 

 

 

Output

 

 

Note: I am getting the word "Sponsored" at the top because it is also part of the selected class. It can be eliminated with simple string manipulation.
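
As a rough sketch of that string manipulation, the Python snippet below drops the "Sponsored" entry and any blank values from a list of captured strings. The function name `drop_sponsored` and the sample values are hypothetical; in the bot itself the same check would be done with its own string and condition actions inside the loop.

```python
def drop_sponsored(values, label="Sponsored"):
    """Return the captured strings without label entries or blanks."""
    cleaned = []
    for value in values:
        text = value.strip()
        # Skip empty captures and the "Sponsored" marker (case-insensitive).
        if not text or text.casefold() == label.casefold():
            continue
        cleaned.append(text)
    return cleaned


# Hypothetical captured values, including the unwanted first entry.
captured = ["Sponsored", "  Phone A ", "Phone B", ""]
names = drop_sponsored(captured)
print(names)
```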


@Padmakumar Thanks for replying. 

I will try this approach and let you know if I run into any issues.

 

Thanks, 

Prakash Malshetti