Question

Scraping URLs from Google Search Results

  • 14 August 2024
  • 2 replies
  • 32 views

I’m trying to build a bot in A360 that searches a string in Google and clicks on each of the links that are returned. The issue is that Google randomises the path each time, so it’s almost impossible to loop through the links using a counter in the DOMX path. I also looked at scraping the URLs from the page source, but it’s all embedded in JS. Any ideas?

 

The DOMX path is working consistently for me on Google searches. Try capturing the entire box at the top of a search result.

That has a DOMX path of: //div[@id='rso']/div[3]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]

The first div index increments with each result, so I’ve inserted a variable there: //div[@id='rso']/div[$nSearchResultsRow.Number:toString$]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]

Make sure the only things you are using for the object properties are the HTML Tag, the DOMX Path, and maybe the HTML HasFrame.

I’ve placed the recorder action in a loop that runs 5 times. I start with nSearchResultsRow equal to 3, since that seems to be the first row, and then increment it in the loop.
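A minimal sketch of that counter logic in Python (the real bot would build this with A360 Loop and String actions; the path shape is taken from the DOMX above, and `domx_for_row` is just an illustrative helper name):

```python
def domx_for_row(n: int) -> str:
    """Build the per-row DOMX path, substituting the loop counter
    for the result-row index; the trailing segments stay fixed."""
    return f"//div[@id='rso']/div[{n}]" + "/div[1]" * 6

# 5 iterations, starting at row 3 (the first result row)
for n in range(3, 8):
    print(domx_for_row(n))
```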

To get the URL, I’m grabbing the “HTML InnerText” property, which looks something like this:

Best Vegan Chocolates: Ideal for Plant-Based TreatsDallmann Confectionshttps://dallmannconfections.com › collections › vegan-c..

So you would need to use the string tools to isolate the URL out of there!
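The string logic could look something like this Python sketch (in A360 you’d replicate it with the String package actions; note Google shows the URL breadcrumb-style with “ › ” separators, and the trailing “..” in the example is Google’s own truncation):

```python
import re

def extract_url(inner_text: str) -> str:
    """Pull the displayed URL out of a search result's InnerText:
    take everything from 'https://' onward and turn the ' › '
    breadcrumb separators back into '/'."""
    match = re.search(r"https?://\S.*", inner_text)
    if not match:
        return ""
    return match.group(0).replace(" › ", "/")

text = ("Best Vegan Chocolates: Ideal for Plant-Based Treats"
        "Dallmann Confections"
        "https://dallmannconfections.com › collections › vegan-c..")
print(extract_url(text))
```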

Since the number of results will vary, you’ll need to error-trap the case where you run out of rows and have to click the “More Options” button to expand the results, and then the Next button. Note that clicking Next probably resets the rows, so you’ll need to set the variable back to 3 to start scraping again.


This is the first thing I tried, but the order of results can be random, i.e. if you enter a completely different search string, sometimes the DOMX index might be -1 and not follow an incremental order. I ended up using the headless-browser method instead: a REST GET with the search URL in the URI, then string manipulation to capture all the URLs.
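The string-manipulation step could be sketched like this in Python (in A360 the GET would come from the REST Web Service action and this logic would be built with String actions; the HTML sample and `extract_result_urls` name are illustrative, and real result pages may wrap links differently):

```python
import re
from urllib.parse import urlparse

def extract_result_urls(html: str) -> list[str]:
    """Pull absolute result URLs out of a search results page body
    returned by a plain GET, dropping Google's own links and dupes."""
    links = re.findall(r'href="(https?://[^"]+)"', html)
    seen, urls = set(), []
    for u in links:
        if "google." in urlparse(u).netloc or u in seen:
            continue
        seen.add(u)
        urls.append(u)
    return urls

# Illustrative page fragment, not a real Google response
sample = ('<a href="https://www.google.com/preferences">Settings</a>'
          '<a href="https://example.com/page1">Result 1</a>'
          '<a href="https://example.com/page1">dup</a>'
          '<a href="https://dallmannconfections.com/collections">Result 2</a>')
print(extract_result_urls(sample))
```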

Thanks for your suggestion.

