Introduction

Last modified by Erik Bakker on 2024/09/03 09:57

In this introduction to the RCA (root cause analysis) knowledge base, we describe a generic process that a support engineer can use to analyze incoming support tickets more effectively and, ideally, faster.

Should you have any questions, please get in touch with academy@emagiz.com.

1. Situation

When a ticket comes in and has been accepted, it is up to the support engineer to determine the cause of the problem described in the ticket. Within eMagiz, there is a predefined set of steps you can follow to understand the problem better and then draw conclusions. In the remainder of this introduction, we describe this step-by-step approach as a written reminder of how to go about solving a support ticket.

2. Problem

A ticket related to an eMagiz model is registered and needs to be solved.

3. Analysis

3.1 Step-by-step approach

3.1.1 Question the client

Before you even open eMagiz, your first task is to clarify the problem reported by the client. In this discussion, which can take place in person or digitally, the following questions help you learn as much as possible about the context of the problem. Note that the explanation the client gave when reporting the ticket may already answer some or all of these questions.

  • Can you, as the reporter, shed light on what you think the problem is? Does the client already have information (e.g. error messages, logging) that will help you understand the problem?
  • Has something changed in the weeks leading up to the moment the problems occurred?
  • Is the problem specific to a single flow (i.e. flow level) or does it appear generic (i.e. runtime level)?
  • Is the flow/runtime running in the eMagiz cloud or on-premise? (Or is it a Mendix system?)
  • In what environment is the problem occurring?
  • Can I get access to the model in which the problem is occurring?

3.1.2 Analyze problem in eMagiz

Now that it is clear what needs to be investigated and what the apparent problem is, it is time to log in to eMagiz (my.emagiz.com) and open the model in question. Using the information gathered in the first step, navigate to the Manage phase of eMagiz and select the environment in which the problem occurs (usually Production). The Manage phase offers many options that help you zoom in on the problem.

3.1.2.1 Flow level

For a flow-level problem, the following overviews are the most useful when analyzing the ticket:

  • Dashboard
    • Here you can view the error messages at flow level, so you can easily see whether there are errors in the reported flow or in any related flow.
  • Queue/HTTP/Event Streaming Statistics
    • Here you can see (depending on the pattern) whether messages actually reached a specific part of the process. Starting at the end of the process, you can work your way back to the starting point to find where the message broke down and whether it reached eMagiz in the first place.
  • Logging
    • When the two overviews above do not give you the clarity you need, look at the logging within eMagiz. Filter first on type Error and then on type Warning (within the time frame of the reported problem) to see whether any log entries shed light on the problem.
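The "work your way back" reasoning from the Queue/HTTP/Event Streaming Statistics step can be sketched in Python. This is a minimal illustration, not an eMagiz feature: it assumes you have noted, for each stage of the flow, whether the statistics show the message arriving there. The stage names and the find_breakdown helper are hypothetical.

```python
from typing import Optional


def find_breakdown(
    stages: list[str], received: set[str]
) -> tuple[Optional[str], Optional[str]]:
    """Locate where a message stopped travelling through a flow.

    stages:   ordered stage names from start to end of the process (hypothetical names).
    received: stages where the statistics show the message arriving.
    Returns (last stage reached, first stage not reached).
    A last stage of None suggests the message never reached eMagiz at all.
    """
    # Walk back from the end of the process towards the start.
    for i in range(len(stages) - 1, -1, -1):
        if stages[i] in received:
            # The stage right after the last one reached is where to look next.
            first_missing = stages[i + 1] if i + 1 < len(stages) else None
            return stages[i], first_missing
    # No stage saw the message: it may never have entered the process.
    return None, (stages[0] if stages else None)


# Hypothetical flow: the message shows up in "entry" and "onramp" only.
print(find_breakdown(["entry", "onramp", "routing", "offramp", "exit"], {"entry", "onramp"}))
```

Here the result ("onramp", "routing") points you at the routing part of the process. Walking backwards mirrors the advice above: the first stage (from the end) that did see the message bounds the search from one side.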

Should Manage not provide the answers, it is time to challenge the client's assumptions about the problem. You can do this with the help of the Deploy phase (Compare Releases, Start/Stop Flows, Properties) and the Create phase (Flow Designer). In these two phases, verify the following:

  • Is the expected flow version running?
  • Is the expected runtime running stably?
  • Has a property value been updated?
  • Does the current flow version in Create, which is also the deployed version, contain a configuration problem?
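The first and third checks come down to comparing what you expect with what is actually deployed. In eMagiz you do this with Compare Releases and the Properties overview; the Python sketch below only illustrates the comparison itself. It assumes you have written down the expected and deployed values as simple name-to-value mappings. The flow names, version strings and the find_mismatches helper are hypothetical.

```python
from typing import Optional


def find_mismatches(
    expected: dict[str, str], actual: dict[str, str]
) -> dict[str, tuple[str, Optional[str]]]:
    """Return every name whose actual value differs from the expected value.

    Works for flow versions as well as property values. A name missing from
    `actual` is reported with None, e.g. a flow that is not deployed at all.
    """
    mismatches: dict[str, tuple[str, Optional[str]]] = {}
    for name, expected_value in expected.items():
        actual_value = actual.get(name)
        if actual_value != expected_value:
            mismatches[name] = (expected_value, actual_value)
    return mismatches


# Hypothetical example: the client expects 1.0.3 of order-entry to be running.
expected_versions = {"order-entry": "1.0.3", "order-exit": "1.0.1"}
deployed_versions = {"order-entry": "1.0.2", "order-exit": "1.0.1"}
print(find_mismatches(expected_versions, deployed_versions))
# {'order-entry': ('1.0.3', '1.0.2')}
```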

With all this information, you should be able to solve 99% of all flow-related support tickets.

3.1.2.2 Runtime level

For a runtime-level problem, the following overviews are the most useful when analyzing the ticket:

  • Runtime Statistics
    • Here you can view statistics at runtime level: whether a runtime is actually sending metrics and whether it is or was in trouble. This helps you see quickly whether the runtime has a configuration or memory problem.
  • Logging
    • Once you have identified that a runtime is or was in trouble, search for corresponding Error or Warning logs at runtime level. These shed additional light on why the runtime is in trouble.
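The two Runtime Statistics signals above (is the runtime still sending metrics, and is it under memory pressure) can be expressed as a small check. The Python sketch below is illustrative only: the RuntimeSample fields, the five-minute silence window and the 90% heap threshold are assumptions chosen for the example, not values defined by eMagiz.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RuntimeSample:
    # Hypothetical snapshot of runtime statistics; field names are assumptions.
    last_metric_at: Optional[datetime]  # None if no metrics were ever received
    heap_used_mb: float
    heap_max_mb: float


def assess_runtime(
    sample: RuntimeSample,
    now: datetime,
    max_silence: timedelta = timedelta(minutes=5),  # assumed threshold
    heap_threshold: float = 0.9,  # assumed: flag at 90% heap usage or more
) -> list[str]:
    """Return a list of findings; an empty list means nothing suspicious was detected."""
    findings: list[str] = []
    # A runtime that has gone quiet may be down or misconfigured.
    if sample.last_metric_at is None or now - sample.last_metric_at > max_silence:
        findings.append("not sending metrics")
    # High heap usage hints at a memory problem.
    if sample.heap_max_mb > 0 and sample.heap_used_mb / sample.heap_max_mb >= heap_threshold:
        findings.append("memory pressure")
    return findings


now = datetime(2024, 9, 3, 10, 0)
print(assess_runtime(RuntimeSample(datetime(2024, 9, 3, 9, 50), 950, 1000), now))
# ['not sending metrics', 'memory pressure']
```

Any finding is a cue to move on to the Error and Warning logs for that runtime, as described above.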

Should Manage not provide the answers, it is time to challenge the client's assumptions about the problem. You can do this with the help of the Deploy phase. In this phase, check the status of the runtime: open the context menu of the runtime and select the option "Details".

Figure: "Details" option in the runtime context menu (Deploy > Architecture)

Selecting this option opens the "Status" tab, where you can see whether the runtime is running and how long it has been in its current state.

Figure: "Status" tab of the runtime details (Deploy > Architecture)

With all this information, you should be able to solve 95% of all runtime-related support tickets.

If you cannot resolve the problem with the information available in eMagiz alone, you can, assuming you have access, consult additional information in AWS and Portainer. These tools are not discussed further here, as only a limited set of users can access them.

4. Result

With the help of this structure and the RCA knowledge base entries, you can solve the vast majority of problems on your own or with a little help.

5. Suggested Additional Readings