State Persistence

Last modified by Danniar Firdausy on 2024/09/19 10:18

In this microlearning session, we will introduce you to the key concept of state persistence and provide instructions on setting it up in your landscape to support the implementation of the State Generation solution. In case you want to learn the basics of the State Generation functionality, please check out eMagiz State Generation.

Should you have any questions, please get in touch with academy@emagiz.com.

1. Prerequisites

  • Basic knowledge of the eMagiz platform
  • Basic knowledge of the eMagiz State Generation

2. Key concepts

In this microlearning, we will introduce you to the concept of state persistence and explain how to set it up in your landscape to support the implementation of the State Generation solution.

  • By state, we mean: a piece of information that describes a particular system or process at a specific point in time. 
  • By state persistence, we mean: a method of storing state data at a specific point in time so that it can be retained over a long period, even if the runtime storing that data shuts down or restarts. This ensures the state data remains available for future use.

3. State Persistence

The state generation solution works by storing state data at a specific point in time so that it can be used in a process at another point in time. Therefore, to set it up, you need to think about how you will store this state data. eMagiz provides a set of components (in the form of support objects and flow components) that you can use to set up the storage mechanism for your state data. When configuring these components, your requirements for persisting the state data also come into play. You can store your state data either in the memory or on the disk of your runtimes. In general, consider storing your state data on disk when you want it to survive shutdowns or restarts. In the following sections, we will discuss the components that eMagiz provides to set up the storage mechanism for your state data.

3.1 Infinispan Cache Manager

Let us first look at the Infinispan cache manager. This support object lets you configure whether the node that stores your state data (i.e., a metadata store or message store) runs stand-alone or joins a cluster. You can add this support object in the standard manner by searching for "infinispan cache manager".
 
intermediate-state-generation-infinispan-cache-manager.png

If you want this cache manager node to run stand-alone (i.e., not forming a cluster with other cache managers), you can leave the Cluster name empty. If you instead want this cache manager to form or join a cluster with other nodes under the same cluster name, fill in the cluster name, e.g., sttgnrt-cluster.
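
As a purely conceptual sketch (plain Python, not the eMagiz or Infinispan API), the effect of the Cluster name field can be thought of as grouping: cache managers that share a non-empty cluster name end up in the same cluster, while a cache manager with an empty name stays on its own. The function name group_by_cluster and the node names are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def group_by_cluster(cache_managers):
    """Group cache manager nodes by cluster name (conceptual model only).

    cache_managers maps a hypothetical node name to its configured
    cluster name. An empty name means the node runs stand-alone.
    """
    clusters = defaultdict(list)
    standalone = []
    for node, cluster_name in cache_managers.items():
        if cluster_name:
            clusters[cluster_name].append(node)
        else:
            standalone.append(node)
    return dict(clusters), standalone


clusters, standalone = group_by_cluster({
    "node-a": "sttgnrt-cluster",
    "node-b": "sttgnrt-cluster",
    "node-c": "",
})
```

Here node-a and node-b form one cluster because they share the name sttgnrt-cluster, and node-c runs stand-alone.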

3.2 Infinispan Metadata Store

Now that we have the cache manager, we can configure the next support object on our list, the Infinispan metadata store. You can add this support object in the standard manner by searching for "infinispan metadata store". This support object acts as the storage that stores your state data in the form of {"key":"value"} pairs.

The Infinispan Metadata Store, along with the Infinispan Cache Manager, is needed for the enrichment, change detection, and duplicate detection state generation solutions.

intermediate-state-generation-infinispan-metadata-store.png

Here, you need to provide it with a unique cache name, e.g., sttgnrt-cache. If you ever require another metadata store (or a message store, which is discussed in the following section), make sure it does not reuse the same cache name. Next to that, you can select whether this cache is set up as a simple cache, as well as whether it uses persistent storage.

As a rule of thumb, when you run this setup on a single runtime, you can configure it as a simple cache. This means:

  • The state data you store will only be kept in memory.
  • If the runtime or machine is restarted, the stored state data will be lost.

If you have a requirement to avoid this situation, or if you run this setup on multiple runtimes, you can set the simple cache option to "no". This means:

  • It allows you to enable the persistent storage option.
  • State data will also be persisted to disk, albeit at the expense of performance.
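
The difference between the two setups can be sketched in a few lines of plain Python. This is a conceptual model only, not how eMagiz or Infinispan implement persistence: the StateStore class, the JSON file, and the key order-1 are all assumptions made for the illustration. A store without a file path behaves like a memory-only cache, and a store with a file path writes every change to disk so that a new instance (standing in for a restarted runtime) can reload it.

```python
import json
import os
import tempfile


class StateStore:
    """Key-value state store; keeps data in memory, optionally also on disk."""

    def __init__(self, path=None):
        self.path = path  # None models a memory-only ("simple") cache
        self.data = {}
        if path and os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.data = json.load(f)  # reload state written before a restart

    def put(self, key, value):
        self.data[key] = value
        if self.path:
            # Writing on every put is the performance cost of persistence.
            with open(self.path, "w", encoding="utf-8") as f:
                json.dump(self.data, f)

    def get(self, key):
        return self.data.get(key)


def value_after_restart(path):
    """Store one entry, then simulate a restart by creating a fresh store."""
    StateStore(path).put("order-1", "SHIPPED")
    return StateStore(path).get("order-1")


with tempfile.TemporaryDirectory() as tmp:
    persisted = value_after_restart(os.path.join(tmp, "sttgnrt-cache.json"))
in_memory = value_after_restart(None)
```

In this sketch, persisted still holds the stored value after the simulated restart, whereas in_memory is None because the memory-only store started empty again.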

Another configuration that you need to set is the Lifespan duration that you can find under the Advanced tab, which specifies how long, in milliseconds, the states will be retained in the metadata store before they are marked as expired and removed from the cache.

The default value is -1, which means that no state data is ever discarded. Consequently, if you leave this setting at its default while simple cache is set to "yes", the state data keeps accumulating in the memory of the runtime until the runtime restarts and can eventually exhaust it. If simple cache is set to "no" and this setting is left at its default, the state data keeps accumulating on the storage of the runtime and can eventually fill it up.
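
To make the effect of the Lifespan setting concrete, here is a minimal sketch in plain Python. It models the behavior described above and is not the Infinispan implementation: the ExpiringStore and FakeClock classes are hypothetical, and the fake clock is only there so that the example does not have to wait for real time to pass.

```python
import time


class ExpiringStore:
    """Key-value store whose entries expire after lifespan_ms milliseconds."""

    def __init__(self, lifespan_ms=-1, clock=time.monotonic):
        self.lifespan_ms = lifespan_ms  # -1 means entries never expire
        self.clock = clock              # injectable so the sketch is testable
        self.entries = {}               # key -> (value, time stored in seconds)

    def put(self, key, value):
        self.entries[key] = (value, self.clock())

    def _expired(self, stored_at):
        if self.lifespan_ms < 0:
            return False
        return (self.clock() - stored_at) * 1000 >= self.lifespan_ms

    def get(self, key):
        item = self.entries.get(key)
        if item is None:
            return None
        value, stored_at = item
        if self._expired(stored_at):
            del self.entries[key]  # expired entries are removed from the cache
            return None
        return value

    def size(self):
        # Drop expired entries first, then report how many remain.
        for key in list(self.entries):
            self.get(key)
        return len(self.entries)


class FakeClock:
    """Manually advanced clock (in seconds) used instead of real time."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
bounded = ExpiringStore(lifespan_ms=60_000, clock=clock)
unbounded = ExpiringStore(clock=clock)  # default lifespan of -1
for i in range(3):
    bounded.put(f"key-{i}", "state")
    unbounded.put(f"key-{i}", "state")

clock.now = 61.0  # 61 seconds later
bounded_size = bounded.size()
unbounded_size = unbounded.size()
```

After 61 simulated seconds, the store with a 60,000 ms lifespan is empty again, while the store with the default of -1 still holds every entry it ever received. This is the growth you want to prevent by choosing a lifespan that matches your use case.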

When you are done, link this metadata store to the cache manager you have created.

3.3 Infinispan Message Store

Another storage option that eMagiz provides is the Infinispan message store. In contrast to the metadata store, the message store receives and stores the full payload of your message. Again, you can add this support object in the standard manner by searching for "infinispan message store".

The Infinispan Message Store, along with the Infinispan Cache Manager, is needed for the aggregation state generation solution.

intermediate-state-generation-infinispan-message-store.png

Similar to the metadata store, you need to provide a unique cache name here. As discussed above, if you are adding a message store to a flow and you already have a metadata store running in the same runtime container, make sure they do not reuse the same cache name. The rest of the configuration resembles that of the metadata store, e.g., whether this cache is set up as a simple cache and whether it uses persistent storage. When you are done, link this message store to the cache manager you have created.

3.4 Metadata Outbound Channel Adapter

Once you have set up the Infinispan Metadata Store, you can configure a flow component called the Metadata outbound channel adapter. This component is located under the "Outbound Channel Adapters" category. After assigning it a name, you can specify the operation that this component performs for each message it receives. In principle, this component stores data in the Infinispan metadata store in the form of {"key":"value"} pairs. It therefore executes the selected operation (i.e., Overwrite, Update, or Remove) on each incoming key-value pair. For more details on what each operation entails, please refer to the help text provided in the component.
 
intermediate-state-generation-enrichment-metadata-outbound-channel-adapter.png

Next to that, you need to specify the values to be assigned to the key-value pairs by means of SpEL expressions. Note that neither the key nor the value of a pair can be null. Therefore, the best practice is to assign a fallback value using the Elvis operator in case the values for these pairs are empty or null. In the example above, "fallbackKey" and "fallbackValue" are used as the fallback values. Adapt these values to your use case. When you are done, link this component to the Infinispan metadata store you created.
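
In SpEL, the Elvis operator is written as ?:, so a fallback takes the general form <expression> ?: 'fallbackKey'. The following plain Python sketch models that behavior to show why the resulting pair can never be null. It is an illustration, not the component's implementation: the helper names elvis and build_pair and the header names orderId and status are hypothetical.

```python
def elvis(value, fallback):
    """Model of SpEL's `value ?: fallback` (Elvis operator).

    Assumption: like SpEL, treat both None and the empty string as missing.
    """
    if value is None or value == "":
        return fallback
    return value


def build_pair(headers):
    """Build a non-null key-value pair from hypothetical message headers."""
    key = elvis(headers.get("orderId"), "fallbackKey")
    value = elvis(headers.get("status"), "fallbackValue")
    return key, value


complete = build_pair({"orderId": "order-1", "status": "SHIPPED"})
missing = build_pair({"orderId": "", "status": None})
```

When both headers are filled in, the pair is built from the actual values. When a header is empty or absent, the fallback is used instead, so the store never receives a null key or value.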

This flow component is needed for the enrichment and change detection state generation solutions to store the state data in your Infinispan metadata store. For the duplicate detection solution, this flow component is not needed, since that solution uses the Duplicate detector support object, which already provides a reference to an Infinispan metadata store.

4. Key takeaways

  • State Persistence refers to a method of storing state data at a specific point in time to retain it over long periods, ensuring data availability even after shutdowns or restarts.
  • The Infinispan metadata store stores state data in key-value pairs, while the Infinispan message store stores the full payload of messages. Both require a reference to an Infinispan cache manager and a proper configuration (e.g., unique cache names) to avoid conflicts and ensure data persistence.

5. Suggested additional readings

If you are interested in this topic and want more information, please read the help text provided by eMagiz and check out these links: