Interpreting Queue Statistics
Last modified by Erik Bakker on 2024/02/20 13:53
{{container}}{{container layoutStyle="columns"}}(((

In this microlearning, we focus on how to read the information available at queue level for all integrations that use the messaging pattern.

Should you have any questions, please contact [[academy@emagiz.com>>mailto:academy@emagiz.com]].

== 1. Prerequisites ==
* Basic knowledge of the eMagiz platform

== 2. Key concepts ==
This microlearning centers on how to read the queue statistics and what you can learn from them.
By queue statistics, we mean the information at queue level that helps you interpret the data passing through that queue.

There are four parts to the queue statistics:
* Total messages in queue
* Total messages added to queue
* Number of consumers
* Data measurements

== 3. Interpreting queue statistics ==

In many cases, you want to validate your assumptions by checking the queue statistics. The queue statistics section in eMagiz is divided into four parts:

* Total messages in queue
* Total messages added to queue
* Number of consumers
* Data measurements

For example, when you want to verify how many messages arrived on a certain queue within a certain time window, you can use the queue statistics overview.

[[image:Main.Images.Microlearning.WebHome@crashcourse-messaging-interpreting-queue-statistics--complete-overview.png]]

Below, we explain each of these four parts in more detail.
That way, you can use the queue statistics to interpret what happened within your messaging flows at any given moment.
To help you track anomalies in your project, you can use eMagiz alerting. To learn more, take a look at the microlearning on that subject.

=== 3.1 Total messages in queue ===

The first metric we look at is the total messages in queue.
As the name implies, this metric tells us how many messages currently reside on the queue and have **yet to be** processed.

[[image:Main.Images.Microlearning.WebHome@crashcourse-messaging-interpreting-queue-statistics--total-messages-in-queue-flat-line.png]]

A flat-line, such as the one in the picture above, can mean two things:
* The consumers can keep up with the supply (i.e. the number of messages added to the queue is **not higher** than the number of messages the consumers can process)
* There is no supply (i.e. the number of messages added to the queue equals zero)

As this interpretation shows, one metric by itself never paints the whole picture.
Let's continue with a scenario in which the total messages in queue gradually increases.

[[image:Main.Images.Microlearning.WebHome@crashcourse-messaging-interpreting-queue-statistics--total-messages-in-queue-increase.png]]

This behavior can also mean two things:
* The consumers are **not** able to keep up with the supply (i.e. the number of messages added to the queue is **higher** than the number of messages processed from it)
* Nobody is processing the data on the queue (i.e. no consumer is consuming the data from the queue and processing it)
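
To make these readings concrete, here is a minimal sketch that simulates how the total messages in queue evolves per interval. It assumes you know, for each interval, how many messages were added and how many the consumers //could// process (their capacity). The function `simulate_queue_depth` and its inputs are hypothetical illustrations, not part of eMagiz.

```python
def simulate_queue_depth(added, capacity, start_depth=0):
    """Return the number of messages left on the queue after each interval.

    added[i]: messages added to the queue in interval i (hypothetical input)
    capacity[i]: messages the consumers could process in interval i
    """
    depth = start_depth
    history = []
    for new_messages, can_process in zip(added, capacity):
        available = depth + new_messages
        # Consumers can only process messages that are actually on the queue.
        processed = min(available, can_process)
        depth = available - processed
        history.append(depth)
    return history


# Flat-line: the consumers keep up, or there is no supply at all.
print(simulate_queue_depth([5, 5, 5], [10, 10, 10]))  # [0, 0, 0]
print(simulate_queue_depth([0, 0, 0], [10, 10, 10]))  # [0, 0, 0]
# Increase: the consumers cannot keep up, or nobody consumes (capacity 0).
print(simulate_queue_depth([5, 5, 5], [3, 3, 3]))     # [2, 4, 6]
print(simulate_queue_depth([5, 5, 5], [0, 0, 0]))     # [5, 10, 15]
```

Both flat-line cases produce identical output, which is exactly why this metric alone cannot tell them apart.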

The first of these two potential reasons is most likely temporary in nature.
When a sudden burst of data is supplied to the queue, the consumers need some time to process all of it.
In such a scenario, you see the metric increase at first and afterward decrease to zero (flat-line) again.

The second potential reason is more likely structural.
It happens when the consumer is broken or not active, which leads to a steady increase that won't return to zero until a user action corrects the behavior.
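
One rough way to tell a temporary burst from a structural problem is to estimate how long the backlog takes to drain. The sketch below assumes a constant supply rate and a constant processing rate per consumer, both hypothetical values you would estimate yourself. If the combined rate of the consumers does not exceed the supply rate (for example, because no consumer is active), the backlog never returns to zero.

```python
import math


def minutes_to_drain(backlog, consumers, rate_per_consumer, supply_rate):
    """Estimate the minutes until the total messages in queue returns to zero.

    backlog: messages currently on the queue
    consumers: number of active consumers
    rate_per_consumer: messages one consumer processes per minute (assumed constant)
    supply_rate: messages added to the queue per minute (assumed constant)
    """
    if backlog <= 0:
        return 0.0
    net_rate = consumers * rate_per_consumer - supply_rate
    if net_rate <= 0:
        # Structural: without a user action the backlog never drains.
        return math.inf
    return backlog / net_rate


print(minutes_to_drain(600, 1, 50, 20))  # 20.0 (temporary burst)
print(minutes_to_drain(600, 0, 50, 20))  # inf (no active consumer)
```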

=== 3.2 Total messages added to queue ===

The second metric we look at is the total messages added to queue.
As the name implies, this metric tells us how many messages were supplied to the queue within a given timeframe.

[[image:Main.Images.Microlearning.WebHome@crashcourse-messaging-interpreting-queue-statistics--total-messages-added-to-queue-standard.png]]

In the example shown above, a steady flow of data is supplied to this queue between minute 24 and minute 46.
The pattern in which data is supplied to your queue depends on the actual implementation of your flows.
The main expectation is that data is supplied to the queue at some point in time. In some cases, you expect data almost all the time.
In other cases, you expect data only between 9:00 and 17:00 on weekdays, and in yet other cases only in the evening, when it hinders nobody.

If this metric shows that data is supplied to the queue and the total messages in queue metric shows a flat-line at zero messages, everything works as expected.
If this metric shows that data is supplied to the queue and the total messages in queue metric shows the **exact** same increase, you can conclude that no data is being consumed from the queue.
If this metric shows that data is supplied to the queue and the total messages in queue metric shows a **less steep** increase, you can conclude that the consumers are **not** able to keep up with the supply.
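
These cases boil down to comparing how much each metric increased over the same time window. The following is a minimal sketch, assuming you read both metrics at the start and at the end of the window (for example, from the graphs) and that total messages added to queue behaves as a running counter that only increases. The function name and the returned verdicts are hypothetical.

```python
def diagnose_window(added_start, added_end, depth_start, depth_end):
    """Compare the increase of both metrics over one time window."""
    added = added_end - added_start    # increase in total messages added to queue
    growth = depth_end - depth_start   # change in total messages in queue
    if added == 0:
        return "no supply"
    if growth >= added:
        # Everything that arrived in the window is still on the queue.
        return "no data is being consumed"
    if growth > 0:
        # Part of the supply is processed, the rest piles up.
        return "consumers cannot keep up with the supply"
    return "consumers keep up with the supply"


print(diagnose_window(100, 400, 0, 0))    # consumers keep up with the supply
print(diagnose_window(100, 400, 0, 300))  # no data is being consumed
print(diagnose_window(100, 400, 0, 120))  # consumers cannot keep up with the supply
```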

Just as with the total messages in queue, the total messages added to queue may not change over a period, leading to a flat-line graph.
In itself, this means nothing. However, if you expected messages to be supplied to that queue within that time frame,
you need to analyze the complete chain of flows more thoroughly to find out what is causing the unexpected behavior.
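
Whether a flat-line is suspicious depends on when you expect data. This is a minimal sketch, assuming a hypothetical expectation of traffic on weekdays between 9:00 and 17:00, one of the supply patterns mentioned earlier in this section. `expects_supply` and `needs_investigation` are illustrative names. Replace the window with the one that matches your own flows.

```python
from datetime import datetime


def expects_supply(moment: datetime) -> bool:
    """Hypothetical window: Monday to Friday (weekday 0-4), from 09:00 up to 17:00."""
    return moment.weekday() < 5 and 9 <= moment.hour < 17


def needs_investigation(moment: datetime, messages_added: int) -> bool:
    """A flat-line is only suspicious when data was expected at that moment."""
    return messages_added == 0 and expects_supply(moment)


print(needs_investigation(datetime(2024, 2, 20, 10, 30), 0))  # True (Tuesday morning)
print(needs_investigation(datetime(2024, 2, 24, 10, 30), 0))  # False (Saturday)
```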

[[image:Main.Images.Microlearning.WebHome@crashcourse-messaging-interpreting-queue-statistics--total-messages-added-to-queue-flat-line.png]]

=== 3.3 Number of consumers ===

The third metric we look at is the number of consumers.
This metric tells us how many consumers are actively consuming data from that specific queue.

In most asynchronous (messaging and event streaming) flows, the expected number is 1 or 2 consumers. In API Gateway exit gates, the number can vary between 1 and 5, depending on the supply on the queue.
Knowing the expected number of consumers is crucial for correctly interpreting the reported number of consumers.

For the sake of this microlearning, let's assume that the expected number of consumers on the queue is 1.

[[image:Main.Images.Microlearning.WebHome@crashcourse-messaging-interpreting-queue-statistics--number-of-consumers-expected.png]]

With that in mind, the reported number of consumers can be 1 (i.e. what we expect), 0 (i.e. too low), or 2 and higher (i.e. too high).

If the reported number of consumers equals the expected number, data is processed as expected.
This means that each consumer processes one message at any given moment, assuming that a message is available to be processed.

However, if the number of consumers is too low or too high, you need to analyze the situation.

[[image:Main.Images.Microlearning.WebHome@crashcourse-messaging-interpreting-queue-statistics--number-of-consumers-too-low.png]]

As shown above, the number of consumers drops from 1 to 0 at a certain point in time. This could be caused by:

* A user stopped the queue, so that no data is processed anymore
* The runtime on which the queue is running is stopped (either by a user or by an automatic process)
* The complete eMagiz project is not running at the moment (either due to a user action or due to an automatic process)

Whatever the reason may be, you should investigate the cause to keep your environment running stably.

Apart from being too low, the consumer count can also be too high. This could be caused by:

* Incorrect configuration of the flow by a user, leading to an unexpected number of consumers
* Linking more than one system (for example, Test, Acceptance, and Production) to the same eMagiz environment
* Fringe situations that occur once in a while
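
Because the right number depends on the flow, it helps to check the reported consumer count against an expected range rather than a single value. For example, use 1 to 2 for a messaging flow or 1 to 5 for an API Gateway exit gate, as mentioned at the start of this section. The sketch below scans hypothetical (minute, consumer count) samples and reports the first moment the count leaves that range. The sample format is an assumption, not an eMagiz export.

```python
def first_consumer_deviation(samples, expected_min, expected_max):
    """Return (minute, verdict) for the first sample outside the expected range.

    samples: iterable of (minute, reported_consumers) pairs in time order
    Returns None when every sample lies within [expected_min, expected_max].
    """
    for minute, consumers in samples:
        if consumers < expected_min:
            return minute, "too low"
        if consumers > expected_max:
            return minute, "too high"
    return None


# Hypothetical samples; the expected number of consumers is 1, as assumed above.
samples = [(10, 1), (11, 1), (12, 0), (13, 0)]
print(first_consumer_deviation(samples, 1, 1))  # (12, 'too low')
```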

=== 3.4 Data measurements ===

The fourth and last metric tells us whether metrics (i.e. data measurements) are coming in.

[[image:Main.Images.Microlearning.WebHome@crashcourse-messaging-interpreting-queue-statistics--data-measurements.png]]

With the help of this metric, we can establish whether the queue in question is sending data measurements to the portal.
In the example shown above, the queue was not activated until sometime after the 14th minute of the hour.
This is consistent with the behavior of the other graphs.
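
To pinpoint when measurements were missing, you can list the minutes in a window for which no measurement arrived. This is a minimal sketch, assuming one measurement per minute and a list of the minutes at which measurements arrived. Both the input format and `missing_ranges` are hypothetical.

```python
def missing_ranges(received_minutes, start, end):
    """Return (first, last) minute ranges within [start, end] without a measurement."""
    received = set(received_minutes)
    ranges = []
    run_start = None
    for minute in range(start, end + 1):
        if minute not in received:
            if run_start is None:
                run_start = minute  # a gap begins
        elif run_start is not None:
            ranges.append((run_start, minute - 1))  # the gap ended one minute earlier
            run_start = None
    if run_start is not None:
        ranges.append((run_start, end))  # the gap lasts until the end of the window
    return ranges


# Hypothetical data: measurements only start arriving at minute 15 of the hour.
print(missing_ranges(range(15, 60), 0, 59))  # [(0, 14)]
```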

== 4. Key takeaways ==

* There are four parts to the queue statistics:
** Total messages in queue
** Total messages added to queue
** Number of consumers
** Data measurements
* Interpret them together, as that approach gives you the most context
* To assist in anomaly detection, use eMagiz alerting

== 5. Suggested Additional Readings ==

If you are interested in this topic and want more information, please read the help text provided by eMagiz when executing these actions.)))((({{toc}})))