Considering the impact of message size

Last modified by Erik Bakker on 2024/09/03 08:57

In this microlearning, we’ll dive into the impact of message size on your integration solutions. Understanding how the size of your messages can affect system performance is crucial for designing a robust integration model. We’ll explore key considerations, such as average and peak message sizes, and how these factors influence your system’s stability. We’ll also discuss strategies for managing large messages and the importance of early validation and performance testing. Let’s get started on how to effectively handle message size in your integration solutions.

Should you have any questions, please get in touch with academy@emagiz.com.

1. Prerequisites

  • Advanced knowledge of the eMagiz platform

2. Key concepts

This microlearning centers on considering the impact of message size.

Several aspects are relevant when considering message size:

  • What is the average size of the messages?
  • What is the peak size of a message?
  • What do you want to do with the message?
  • Could the delivering system reduce the message size in some way?

3. Considering the impact of message size

In this microlearning, we consider why it is important to determine early on how large the messages are that need to be exchanged. Having this information early lets you model your solution in a way that safeguards, as far as possible, against these messages harming the stability of the complete integration model. Because one solution can have a massive impact on the rest of your model, it is of utmost importance to consider all aspects.

Each integration pattern has specific characteristics in terms of size. For example, for Event Streaming, messages above 1 MB are not accepted on a topic. That is a hard limit. The limits for API Gateway and messaging are not set in stone in the same manner. However, that does not mean you can ignore the implications of sending large messages via any integration pattern. For data that passes through a flow running on eMagiz, we use certain assumptions to arrive at a solid and reliable number for the memory needed to run your eMagiz model. These assumptions are as follows:

  • Limited use of 'non-standard' flows. In other words, the number of components within a flow is comparable to the 'standard' setup (i.e., an autogenerated onramp).
  • Average message size of less than 100 KB
  • Standard number of consumers
  • Limited use of Java extensions such as Groovy scripts
  • Limited use of complex transformations
  • Limited use of XPath and SpEL expressions related to large messages

As you can see, message size is a factor in this. Furthermore, what you do with the message while it is processed in eMagiz also plays a role. More information on how to apply these assumptions, validate them against your solution, and determine whether memory adjustments are needed can be found in this microlearning.

So let's say you want to process messages with an average size of 50 KB via messaging, and you do not deviate from the standard in eMagiz because you call a REST endpoint that accepts the messages. In that case, no additional measures need to be taken.

However, when these messages are 250 KB under the same assumptions, you might need a little more memory. Note that because we use an average value per flow, a problem in one particular flow can go unnoticed: when you have 50 flows running, of which 49 are doing nothing, the remaining flow may not cause noticeable trouble until it starts transforming messages of 5 MB. The flip side is also true: when you only have two or three flows in your complete solution, the impact of one deviation from the standard is felt much sooner.
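
The averaging effect described above can be made concrete with a small calculation. The following is a minimal Python sketch, not eMagiz functionality: the flow names and sizes are made up, idle flows are simply counted as 0 KB, and the 100 KB limit is taken from the assumptions listed earlier.

```python
# Illustrative only: flow names and sizes are hypothetical, not taken from eMagiz.
ASSUMED_AVG_LIMIT_KB = 100  # from the sizing assumption "average message size of less than 100Kb"

def summarize(flow_sizes_kb):
    """Return (model-wide average, peak, names of flows above the assumed limit)."""
    sizes = list(flow_sizes_kb.values())
    average = sum(sizes) / len(sizes)
    peak = max(sizes)
    outliers = sorted(name for name, size in flow_sizes_kb.items()
                      if size > ASSUMED_AVG_LIMIT_KB)
    return average, peak, outliers

# 49 idle flows (simplified to 0 KB) plus one flow transforming 5 MB messages.
large_model = {f"idle-{i}": 0 for i in range(49)}
large_model["heavy"] = 5000  # 5 MB expressed in KB

# The same heavy flow in a model with only three flows.
small_model = {"idle-0": 0, "idle-1": 0, "heavy": 5000}

for label, model in (("50 flows", large_model), ("3 flows", small_model)):
    average, peak, outliers = summarize(model)
    print(f"{label}: average={average:.0f} KB, peak={peak} KB, above limit={outliers}")
```

In the 50-flow model the average only just reaches the assumed limit even though one flow is far above it, while in the three-flow model the same flow pushes the average well past the limit. This is why it helps to look at the peak per flow as well as the average.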

Regardless of whether you feel the pain directly, it is still relevant and necessary to register the deviations in the Capture phase of eMagiz. There you can define the expected average message size and write down the specific requirements of that particular integration. Having that information stored is beneficial not only now but also in the future.

Let us return to the examples for a minute and see what happens when we send large XML files that include pictures stored as base64. In such cases, we sometimes see messages of 5 to 10 MB. Does that mean we cannot handle them? No, things are not that simple. In these situations, it is best to consider whether sending the pictures as a separate message can alleviate the problem, as this can reduce the load on the system. Furthermore, when the only goal of the integration is to transport the messages from A to B, eMagiz does not need to load the XML into memory and, as a result, is far less impacted by the messages exchanged between systems.
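
To get a feel for why embedded pictures inflate a message, note that base64 represents every 3 bytes of binary data as 4 text characters. The sketch below is a hedged Python illustration of that arithmetic only. The picture size and the size of the XML without pictures are hypothetical, and the code does not reflect how eMagiz itself processes messages.

```python
import base64
import math

def base64_size(raw_bytes: int) -> int:
    """Length in bytes of padded base64 text (no line breaks) for raw_bytes of input."""
    return 4 * math.ceil(raw_bytes / 3)

# Hypothetical 3 MB picture; the content does not matter for the size.
picture = bytes(3_000_000)
encoded = base64.b64encode(picture)

# Hypothetical size of the XML message without any embedded picture.
xml_without_picture = 50_000

combined = xml_without_picture + len(encoded)
print(f"picture as base64: {len(encoded)} bytes")
print(f"XML with embedded picture: {combined} bytes")
print(f"XML sent separately from the picture: {xml_without_picture} bytes")
```

Under these assumptions, the embedded picture grows by roughly one third once encoded, and it dominates the size of the XML message. Sending it separately keeps the XML that needs transforming small.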

As we said at the beginning, the interplay between message size and what you want to do with the messages is at the heart of determining the impact on the eMagiz solution and whether what you envisioned will work without manual changes to the memory settings in eMagiz. We therefore cannot give you clear-cut rules and guidelines that say that when you do A and B, it will work (or vice versa).

Hopefully, this microlearning has given you some insight into how to consider the impact of your choices on the stability of your complete integration solution when running in Production. We do want to leave you with some general best practices, which are always relevant but especially so when dealing with processes that are heavy on memory consumption:

  • Performance tests: Make sure you test on the acceptance environment with a production setup to validate whether your solution holds up under the stress of real-life scenarios. One aspect of these performance tests is a load test to determine the number of messages, and the average size of each message, that the solution built and configured in eMagiz can handle.
  • If the performance test reveals issues, go back to the drawing board to see how you can improve the platform's stability.
  • Set up an efficient integration. When dealing with large messages, it is vital that the messages only go through strictly necessary components. So minimize complex transformations and connections.
  • Investigate the possibilities of using the Event Streaming integration pattern when dealing with high volumes of 'smaller' messages.
  • If you exceed the threshold of 1 MB and Event Streaming is not an option, we strongly advise using the asynchronous messaging pattern rather than a synchronous alternative where possible, as the latter consumes more memory.
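
The last two best practices can be read as a simple decision rule. The following Python sketch is a hypothetical helper for reasoning about that rule, not an eMagiz API. The function name, the return labels, and the interpretation of 1 MB as 1,000,000 bytes are all assumptions.

```python
# Assumption: "1MB" is read here as 1,000,000 bytes; the exact byte count is not given in the text.
EVENT_STREAMING_LIMIT_BYTES = 1_000_000

def suggest_pattern(message_bytes: int, synchronous_required: bool = False) -> str:
    """Suggest which integration pattern to investigate first for a given message size."""
    if message_bytes <= EVENT_STREAMING_LIMIT_BYTES:
        # Within the hard limit for a topic, so Event Streaming is worth investigating.
        return "event streaming possible"
    if synchronous_required:
        # Synchronous is the less preferred option because it consumes more memory.
        return "synchronous (expect higher memory use)"
    return "asynchronous messaging"

print(suggest_pattern(200_000))
print(suggest_pattern(5_000_000))
print(suggest_pattern(5_000_000, synchronous_required=True))
```

A rule like this is only a starting point. As noted above, what you do with the message matters as much as its size, so the outcome should still be validated with a performance test.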

4. Key takeaways

  • Validate assumptions on message size and what needs to happen with the data early on
  • If the assumptions of eMagiz are violated, perform your own memory calculations to see what you need
  • Always test rigorously on Acceptance, especially when dealing with processes that are heavy on memory consumption
  • Documentation is key

5. Suggested Additional Readings