Quick Tip: Headless Web Scraping

  • 26 August 2020

In this session, we’ll look at two approaches for extracting text from a web application, including a less common use of the REST Web Services package to perform browser-less web scraping.

Video Recap:

  1. Recorder
    1. Use the recorder in conjunction with the Automation Anywhere Chrome Extension to highlight objects on a webpage and extract them.
    2. The recorder action does require an established browser session to interact with.
    3. Recorder can be a great option for use cases where some navigation through multiple pages is needed or objects on the page are being dynamically loaded at runtime.
  2. REST Get
    1. Use the REST Get method to read the full page's HTML without the need for a browser. The response is returned as a dictionary, where Body is the key that holds the full HTML text.
    2. The String Package can be subsequently used for extracting specific text from the REST response.
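
For readers who want to see the REST Get idea outside the bot editor, here is a minimal Python sketch of the same flow. It is not the Automation Anywhere action itself: `rest_get`, `extract_between`, the User-Agent string, and the sample HTML are all hypothetical names and values chosen for illustration. The dictionary with a `Body` key simply mirrors the response shape described above.

```python
from urllib.request import Request, urlopen


def rest_get(url, timeout=10):
    """Fetch a page without a browser; return a dict with the HTML under 'Body'."""
    # Hypothetical User-Agent value; some sites reject requests without one.
    request = Request(url, headers={"User-Agent": "quick-tip-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return {"Body": html}


def extract_between(text, start, end):
    """Return the text between the first `start` marker and the next `end` marker."""
    begin = text.find(start)
    if begin == -1:
        return None
    begin += len(start)
    finish = text.find(end, begin)
    if finish == -1:
        return None
    return text[begin:finish].strip()


# Stand-in for rest_get("https://example.com") so the sketch runs offline.
sample_response = {
    "Body": "<html><head><title> Order Status </title></head><body></body></html>"
}
page_title = extract_between(sample_response["Body"], "<title>", "</title>")
print(page_title)
```

Note that a plain GET only sees the HTML the server sends. Content that is loaded dynamically at runtime will not be in the Body, which is where the Recorder approach fits better.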

Bonus Tip

When using the REST Get method to return the full HTML of a page, or the Recorder Capture action to return the innerHTML of an object, consider pairing either one with the String Split action. String Split turns repeating HTML elements into a list that you can iterate through to extract the contents of each element, for example repeating divs made to look like a table, or Bootstrap-style cards that repeat across a page.
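
The split-then-loop idea can be sketched in Python as follows. This is an illustration under assumptions, not the String Split action itself: the card markup, the delimiter, and the helper name `extract_between` are made up for the example, and a real page needs a delimiter that matches its own repeating element.

```python
def extract_between(text, start, end):
    """Return the text between the first `start` marker and the next `end` marker."""
    begin = text.find(start)
    if begin == -1:
        return None
    begin += len(start)
    finish = text.find(end, begin)
    if finish == -1:
        return None
    return text[begin:finish].strip()


# Hypothetical HTML with two repeating card elements.
html = (
    '<div class="row">\n'
    '<div class="card"><h5>Widget</h5><p>$10</p></div>\n'
    '<div class="card"><h5>Gadget</h5><p>$25</p></div>\n'
    "</div>"
)

# Split on the opening tag of the repeating element. The first piece is
# whatever came before the first card, so it is skipped.
delimiter = '<div class="card">'
cards = html.split(delimiter)[1:]

# Loop through the list and pull the same fields out of every card.
rows = []
for card in cards:
    name = extract_between(card, "<h5>", "</h5>")
    price = extract_between(card, "<p>", "</p>")
    rows.append((name, price))

print(rows)
```

Splitting on the opening tag rather than the closing one keeps each list item starting at the same position inside a card, so the same start and end markers work on every item.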

Be sure to stay tuned for more Quick Tips!