Building error-resilient robots with checkpoints

In RPA, the ability to automate repetitive tasks that span multiple applications is a great feature, but it also brings some challenges. Let’s say, for example, that your robot creates a new ticket in ServiceNow (or another IT service management system) and then needs to insert the ticket number into a report file, but this last step fails. If you have a retry mechanism in place, the whole execution restarts and a duplicate ticket is created in ServiceNow. This is of course just a basic example; in real-world applications the impact can be much greater, spanning multiple systems and possibly affecting end customers.

In this post, we’re going to propose a checkpoint-based approach that avoids these pitfalls, and even brings some extra performance and reporting features to the table.

What is a checkpoint in RPA?

You may be familiar with the checkpoint concept from gaming, where after finishing a level or even a more difficult sequence in a level, your game progress is saved automatically so you do not have to repeat that same part if you fail at a later stage. Similarly, in RPA, a checkpoint represents a point in the execution flow at which the state of the system is known.

In our ticketing example, a checkpoint might be defined right after the ticket is successfully created in ServiceNow. If a subsequent step fails and a retry mechanism is triggered, the checkpoint will ensure that the robot does not repeat the entire ServiceNow sequence, thus avoiding the generation of duplicate tickets.

How can we implement such checkpoints?

First let’s look at the structure of a checkpoint, which may consist of two parts:

  1. The Gate: we define this as a Boolean variable which, as the name indicates, acts like a gate. It starts out open (“true”); once the robot completes a certain part of the process, it closes the gate (i.e. sets the Gate_NameOfSubprocess variable to “false”). That way, if a retry is triggered, the robot knows not to execute that subprocess again because the gate is closed.

  2. The Data: this refers to the output data of the subprocess which will be used in the following subprocesses and therefore needs to be saved. This covers any type of data you need, ranging from primitive types such as integers or strings to collections such as dictionaries or tables.

Checkpoints

As depicted in the diagram, when a gate is already closed, the robot can continue with the following sequence in the flow by picking up the needed variables from the (previously saved) checkpoint data.
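
To make the gate-plus-data structure concrete, here is a minimal Python sketch. It is an illustration only: RPA platforms usually model this with their own variables and activities, and the names `Checkpoint`, `run_with_checkpoint` and `create_ticket`, as well as the ticket number, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Checkpoint:
    """One checkpoint: a gate flag plus the saved output of a subprocess."""
    gate_open: bool = True  # True = the subprocess still has to run
    data: Dict[str, Any] = field(default_factory=dict)


def run_with_checkpoint(checkpoint: Checkpoint,
                        subprocess: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Run the subprocess only while the gate is open; otherwise reuse saved data."""
    if checkpoint.gate_open:
        # If the subprocess raises, the gate stays open and a retry runs it again.
        checkpoint.data = subprocess()
        checkpoint.gate_open = False  # close the gate only after success
    return checkpoint.data


# Usage: the second call simulates a retry and skips the subprocess.
calls: List[str] = []


def create_ticket() -> Dict[str, Any]:
    """Hypothetical stand-in for the ServiceNow sequence."""
    calls.append("create_ticket")
    return {"ticket_number": "TICKET-TEST-1"}  # placeholder value


ticket_checkpoint = Checkpoint()
first = run_with_checkpoint(ticket_checkpoint, create_ticket)
second = run_with_checkpoint(ticket_checkpoint, create_ticket)
```

Closing the gate only after the subprocess has returned is the important design choice here: a failure inside the subprocess leaves the gate open, so the retry still executes it.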

Case study: Ticketing in ServiceNow

Let’s analyze the case we introduced at the beginning of the article. We must read an email using Outlook and extract the ticket details from it, such as the user who sent it, the subject and whether it was marked as high priority. Once we have the data, we want to create the ticket in ServiceNow and then add the case details to a report file that gives the responsible users an overview of the robot’s activity.

We have two places in this flow where checkpoints would be helpful:

  • one checkpoint to avoid repeating the data extraction from the email (the most time-consuming part of the process)

  • one checkpoint to avoid the creation of a duplicate ticket in ServiceNow
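
As a rough sketch, the flow with these two checkpoints and a simple retry loop could look like this in Python. The three step functions passed in are hypothetical stand-ins for the robot's real Outlook, ServiceNow and report activities, and the key names (`Gate_ExtractEmail`, `Data_ExtractEmail`, and so on) are assumptions that follow the Gate_NameOfSubprocess pattern described above.

```python
from typing import Any, Callable, Dict


def process_email(checkpoints: Dict[str, Any],
                  extract_email: Callable[[], Dict[str, Any]],
                  create_ticket: Callable[[Dict[str, Any]], str],
                  write_report: Callable[[Dict[str, Any], str], None]) -> None:
    """One attempt at the flow; gates that are already closed are skipped."""
    # Checkpoint 1: data extraction from the email.
    if checkpoints.get("Gate_ExtractEmail", True):
        checkpoints["Data_ExtractEmail"] = extract_email()
        checkpoints["Gate_ExtractEmail"] = False  # close the gate
    details = checkpoints["Data_ExtractEmail"]

    # Checkpoint 2: ticket creation in ServiceNow.
    if checkpoints.get("Gate_CreateTicket", True):
        checkpoints["Data_CreateTicket"] = create_ticket(details)
        checkpoints["Gate_CreateTicket"] = False  # close the gate
    ticket_number = checkpoints["Data_CreateTicket"]

    # Final step, not protected by a checkpoint in this sketch.
    write_report(details, ticket_number)


def run_with_retries(max_attempts: int,
                     extract_email: Callable[[], Dict[str, Any]],
                     create_ticket: Callable[[Dict[str, Any]], str],
                     write_report: Callable[[Dict[str, Any], str], None]) -> Dict[str, Any]:
    """Retry the whole flow; the in-memory checkpoints survive between attempts."""
    checkpoints: Dict[str, Any] = {}
    for attempt in range(1, max_attempts + 1):
        try:
            process_email(checkpoints, extract_email, create_ticket, write_report)
            return checkpoints
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the last attempt
    return checkpoints
```

If the report step fails on the first attempt, the second attempt finds both gates closed, reuses the saved email details and ticket number, and only repeats the report step.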

The process diagram will look like this:

Where do we store checkpoint data?

When it comes to the technical implementation, this of course depends on the RPA platform you are using. Generally, you have two options:

  • In-memory: you can simply use variables (a dictionary is most practical) to store the data for the execution of a single run

  • Persisted: an Excel spreadsheet stored on the robot’s C: drive, a network drive or a SharePoint location would allow you to access a checkpoint across multiple runs of the same process. This can help you in case of a server restart, for example.

Example of checkpoint data stored in an Excel file
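
For readers who want to try the persisted option outside an RPA platform, the following Python sketch stores the checkpoints in a JSON file. Note that this swaps the Excel spreadsheet for JSON purely to keep the example dependency-free; the file name and the functions `load_checkpoints` and `save_checkpoints` are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict


def load_checkpoints(path: Path) -> Dict[str, Any]:
    """Return the saved checkpoints, or an empty dict on the very first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def save_checkpoints(path: Path, checkpoints: Dict[str, Any]) -> None:
    """Write the checkpoints to a temporary file, then swap it into place."""
    tmp_path = path.parent / (path.name + ".tmp")
    tmp_path.write_text(json.dumps(checkpoints, indent=2), encoding="utf-8")
    # os.replace swaps the file in one step, so a crash during the write
    # does not leave a half-written checkpoint file behind.
    os.replace(tmp_path, path)
```

A robot would call `load_checkpoints` at the start of a run and `save_checkpoints` each time it closes a gate, so that a run started after a server restart sees the gates that were already closed.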

Added benefits of using checkpoints

Besides the main benefit of avoiding data inconsistencies, using checkpoints can be helpful in other ways too:

  • Resuming flows that handle large numbers of transactions: e.g., one of your application servers gets restarted and you want to continue where you left off once it’s back online.

  • When a transaction needs to be reverted due to an error: a checkpoint can tell you what data has been created/modified so that in case of a blocking error, a rollback can be performed.

  • Enhanced error logging: you can log transaction data from the relevant checkpoints in case of an error, helping you easily reproduce the scenario without the need to dig for test data.

  • Reporting: the checkpoint usually contains the most relevant data that can be used to send reports to the responsible business user.

  • Performance improvement: having multiple checkpoints that are passed means that the execution is significantly shorter in the event of a retry, since the gates are closed and the corresponding subprocesses are skipped.

  • Cost reduction for AI components: if your flow uses AI SaaS services, you most likely have a pay-per-use pricing model. A checkpoint right after the AI model returns the results would prevent the same request from being sent more than once.
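
The rollback and error-logging points can be sketched in Python as follows. This assumes each passed checkpoint stores its data under the subprocess name, and that you supply one undo action per subprocess that changes an external system; `handle_blocking_error`, the logger name and the `undo_actions` mapping are all hypothetical.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger("robot")


def handle_blocking_error(checkpoints: Dict[str, Any],
                          undo_actions: Dict[str, Callable[[Any], None]]) -> List[str]:
    """Log the checkpoint data, then undo passed checkpoints, newest first."""
    # Logging the checkpoint data makes the failing scenario easier to reproduce.
    logger.error("Blocking error; checkpoint data: %s", checkpoints)

    rolled_back: List[str] = []
    # Dicts keep insertion order, so reversing gives newest-first rollback.
    for name in reversed(list(checkpoints)):
        undo = undo_actions.get(name)
        if undo is not None:
            undo(checkpoints[name])  # e.g. cancel the ticket that was created
            rolled_back.append(name)
    return rolled_back
```

Subprocesses that only read data, such as the email extraction, need no undo action and are simply skipped during the rollback.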

Checkpoints are just one of the strategies we use to build bots that respond to errors in a controlled and efficient way. Be sure to keep an eye on our LinkedIn page for new articles on how we do that.
