In this session, we’ll look at some advanced error handling techniques that can help keep your bots up and running even when dependent applications misbehave.
Video Recap
- Global Debug/Log Flag
- Most bots that developers create ultimately end up running "headless" on a virtual machine ("headless" meaning no one is sitting there watching them once in production). Unfortunately, that means that when an error occurs, it's not always easy to know exactly what happened, what the machine state was, or what was actually on screen. One helpful approach is to build your bot so that it checks a global value indicating its log/debug level. The bot can then do more or less logging depending on that global log level.
- This additional logging may include writing out info messages to an audit log (as opposed to only errors) and may also include capturing/saving additional screenshots of the bot execution.
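The global log level idea can be sketched outside of any particular bot platform. The Python below is only an illustration, not the Error Handling package itself: `GLOBAL_LOG_LEVEL`, `configure_logging`, and `should_capture_screenshots` are hypothetical names, and it assumes the global value is a simple string such as "ERROR", "INFO", or "DEBUG".

```python
import logging

# Hypothetical global value; in a real bot this might come from a global
# variable, a config file, or a control-room setting.
GLOBAL_LOG_LEVEL = "DEBUG"

# Assumed mapping from the global value to a logging level.
LEVELS = {
    "ERROR": logging.ERROR,  # errors only (quiet production runs)
    "INFO": logging.INFO,    # errors plus info messages for the audit log
    "DEBUG": logging.DEBUG,  # everything, including extra screenshots
}


def configure_logging(level_name: str) -> logging.Logger:
    """Set the bot logger's level from the global flag (unknown values fall back to ERROR)."""
    logger = logging.getLogger("bot")
    logger.setLevel(LEVELS.get(level_name.upper(), logging.ERROR))
    return logger


def should_capture_screenshots(logger: logging.Logger) -> bool:
    """Only capture additional screenshots when running at debug level."""
    return logger.isEnabledFor(logging.DEBUG)


if __name__ == "__main__":
    logging.basicConfig(format="%(levelname)s %(message)s")
    log = configure_logging(GLOBAL_LOG_LEVEL)
    log.info("Opened purchase order list")  # written only at INFO or DEBUG
    if should_capture_screenshots(log):
        log.debug("Would save a screenshot here")
```

Keeping the decision in one place means the same bot can run quietly in production and verbosely while troubleshooting, just by changing the global value.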
- Retries Left Loop Concept
- When working with an application, bot dependency, or data that can sometimes be prone to error, it's important to understand the root cause of that failure and how a human (or bot) could recover from such an interruption. If the human approach is "well, I just refresh and try it again" - then there is no reason the bot can't do the same thing. Consider wrapping certain functions of the bot's code in a "retries left" loop for additional resiliency and a lower automation failure rate.
- Example: The bot logs into application XYZ and needs to extract a purchase order number from each purchase order. Sometimes application XYZ fails to load the new tab for a purchase order, but these load failures are typically resolved by refreshing the page and trying again. A "retries left" loop could be used for each purchase order to check whether the purchase order details have loaded successfully. If they did, great: move forward with extracting the details and taking other actions. If the purchase order details didn't display correctly, refresh the page, decrement "retries left" by 1, and try again, up to a fixed number of attempts (3 was used in the example).
- Be sure to log the outcome of each retry (along with the retry count) in testing and in production use so you can easily tell if the retry approach is actually adding value to the stability of the automation.
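The "retries left" pattern described above can be sketched roughly as follows. This is a minimal Python illustration rather than the bot actions used in the video: `load_ok`, `refresh`, and `extract` are hypothetical callables standing in for "did the purchase order details load?", "refresh the page", and "read the purchase order number".

```python
from typing import Callable, Optional


def extract_with_retries(
    load_ok: Callable[[], bool],
    refresh: Callable[[], None],
    extract: Callable[[], str],
    retries_left: int = 3,
    log: Callable[[str], None] = print,
) -> Optional[str]:
    """Try to extract a value, refreshing and retrying while retries remain."""
    attempt = 0
    while retries_left > 0:
        attempt += 1
        if load_ok():
            # Details loaded: log the outcome and move on with the extraction.
            log(f"attempt {attempt}: loaded successfully")
            return extract()
        # Load failed: use up one retry and log it with the remaining count.
        retries_left -= 1
        log(f"attempt {attempt}: load failed, retries left = {retries_left}")
        if retries_left > 0:
            refresh()
    # Out of retries; the caller decides how to handle the failure.
    return None


if __name__ == "__main__":
    # Simulated application that fails to load twice, then succeeds.
    outcomes = iter([False, False, True])
    po_number = extract_with_retries(
        load_ok=lambda: next(outcomes),
        refresh=lambda: print("refreshing page"),
        extract=lambda: "PO-0001",  # made-up value for the demo
    )
    print(po_number)
```

Returning `None` once the retries are used up keeps the loop itself simple; the surrounding error handling can then decide whether to skip that purchase order or fail the run.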
- Enhanced Notifications
- This one is a bit more open-ended, so let your creativity fly. How are you notifying stakeholders, support, or developers of bot run errors? Who should be notified for each error type? How do you get hold of them?
- Emails could be sent to let business stakeholders know of a bot's successes or failures.
- Text/Telegram/WhatsApp messages could be sent to the support team in the case of failures.
- Robo-dialed voice calls could be sent out if fatal errors are detected
- Use this one appropriately. I still have nightmares about the 3am calls at a previous employer, with the creepy robotic xMatters voice asking for me by name.
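One way to think about "who should be notified for each error type" is as a small routing table. The Python sketch below is only an assumption about how that might look: the severity names, the channel labels in `ROUTES`, and the `send` callable are all hypothetical placeholders for whatever email, messaging, or voice service you actually use.

```python
from typing import Callable, Dict, List

# Hypothetical routing table: severity -> channels to notify.
ROUTES: Dict[str, List[str]] = {
    "success": ["email:business-stakeholders"],
    "error": ["email:business-stakeholders", "message:support-team"],
    "fatal": ["email:business-stakeholders", "message:support-team", "voice:on-call"],
}


def route_notification(
    severity: str,
    message: str,
    send: Callable[[str, str], None],
) -> List[str]:
    """Send the message to every channel configured for this severity."""
    # Unknown severities are treated as ordinary errors in this sketch.
    targets = ROUTES.get(severity, ROUTES["error"])
    for target in targets:
        send(target, message)
    return targets


if __name__ == "__main__":
    # Stand-in sender that just prints; a real bot would call its
    # email/messaging/voice integration here.
    route_notification("fatal", "Bot stopped: application XYZ unavailable",
                       send=lambda target, msg: print(f"{target} <- {msg}"))
```

Reserving the voice channel for the "fatal" row keeps the robo-dialed calls limited to the cases that really need them.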
Resources
Want to learn more about using the Error Handling package? Check out the docs portal for additional details on using this package. If you haven't seen it yet, check out the QuickTip session on Error Handling Basics to get a firm foundation on using the Try, Catch, and Finally actions.