Abstract. Robotic Process Mining (RPM) uses User Interface (UI) logs as a source of information for analyzing the processes that are to be automated. UI logs record user interactions with the graphical UI of an information system during the execution of a process, and they encapsulate a large amount of data. Prior research has proposed methods that interpret UI logs by exploiting the structured information available on-screen (e.g., the DOM tree of a Web page), which makes it easier for analysts to interpret the processes behind the logs. However, in environments where such structured information is not available (e.g., virtualized environments), understanding user actions and high-level activities through the elements that users interact with remains an unsolved challenge. This limitation hinders the application of RPM techniques in these environments and requires human intervention to analyze and understand the actions recorded in the UI logs. To address this challenge, we propose a framework that leverages screenshot-based techniques to generate semantic descriptions of user actions, which in turn enable accurate descriptions of high-level activities, relying solely on the information available in the UI logs. In an organizational context, this approach enables RPA analysts and process managers to analyze user interaction logs and better understand the business processes that are candidates for automation. We evaluate our approach using a manually labeled dataset of screenshots from realistic desktop applications. Our results demonstrate that the method can effectively generate semantic descriptions of user actions which, in turn, enable more precise descriptions of the high-level activities carried out by the user.
Rodríguez‐Ruiz et al. studied this question.