Topic and Topic Properties
In this microlearning, we will dive into the essentials of topics and their properties within the eMagiz platform. Here, you'll learn about the key aspects of topic configuration, including retention hours, retention bytes, and partitions. This knowledge is crucial for optimizing your event streaming setup and managing costs effectively.
If you have any questions along the way, feel free to reach out to us at academy@emagiz.com.
1. Prerequisites
- Basic knowledge of the eMagiz platform
- Understanding of the Event Streaming concept
2. Key concepts
This microlearning centers around topics and the properties of these topics.
- With topic, we mean: A category/feed name to which event records are stored and published.
- With topic properties, we mean: A piece of the whole configuration of a topic.
Knowing what topics are and which properties of these topics you can and should think about before creating them is a crucial part of successfully implementing an event streaming pattern via the eMagiz platform.
Regarding the properties of a topic, two crucial properties can significantly impact the cost of your implementation. These properties are:
- Retention Hours
- Retention Bytes
Below we will discuss a topic in more depth and zoom in on the topic properties, especially on the retention hours and the retention bytes.
After configuring your topic(s) the way you had in mind, you can check your work via the Design Architecture overview.
This overview shows if there is enough room available regarding GBs of Topic Storage.
3. Topic and Topic Properties
Within the eMagiz platform, you can use the Event Streaming pattern to help solve your business case. A crucial part of that solution is the topics and the accompanying properties.
In this section, we will discuss the following:
- What are topics, and how can I use them.
- Topic properties and their configuration.
3.1 What are topics, and how can I use them
Based on the lines you drew in Capture, eMagiz automatically generates a topic. In other words, for each line you drew in Capture, eMagiz will create an accompanying topic.
As a reminder, a topic is a category/feed name to which event records are stored and published.
All Kafka records are organized into topics. Producer applications write data to topics, and consumer applications read from topics.
Records published to the cluster stay in the cluster until a configurable retention period has passed.
Within eMagiz, you can use topics to temporarily store data to make sure that consumers can consume the data at a specific moment in time.
Four characteristics related to topics are:
- Publish/subscribe mechanism
- Asynchronous, realtime
- Dumb broker, smart consumer: Each subscriber can read at their own pace
- Retention
3.2 Topic properties and their configuration
Besides naming the topic, eMagiz also provides you with a set of default settings for your topic. Of these settings, you should not touch the default for replication, as its default value is correct. Three of these settings need a closer look from you. These settings are:
- Retention Hours
- Retention Bytes
- Partitions
These three settings determine the amount of GB in storage necessary on the eMagiz Event Streaming cluster to run all topics.
As you can imagine, the longer you retain data, and the more data you retain, the higher the costs. On top of that, when you increase the partitions, the amount of retained data configured in the retention bytes is multiplied by the number of partitions. So if you have a retention bytes setting of 500 MB and the number of partitions is two, then the configured retention on the topic level amounts to 1 GB.
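The multiplication described above can be sketched in a few lines of Python. This is a minimal illustration and not eMagiz code: the function name `topic_retention_bytes` is hypothetical, and decimal units (1 MB = 1,000,000 bytes) are assumed for readability.

```python
MB = 1_000_000   # assumption: decimal megabytes, for readability
GB = 1_000 * MB


def topic_retention_bytes(retention_bytes_per_partition: int, partitions: int) -> int:
    """Return the retention configured on topic level, in bytes.

    Retention Bytes applies per partition, so the topic-level figure is
    the per-partition value multiplied by the number of partitions.
    """
    if partitions < 1:
        raise ValueError("a topic has at least one partition")
    return retention_bytes_per_partition * partitions


# The example from the text: 500 MB per partition, two partitions.
total = topic_retention_bytes(500 * MB, partitions=2)
print(total / GB, "GB")  # prints: 1.0 GB
```

Doubling the partitions without lowering the retention bytes therefore doubles the storage the topic can claim.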
You have the option to change these properties per environment. To do so, navigate to Design Architecture and enter the "Start Editing" mode. Once in this mode, open the context menu on the Topic storage and select the option Edit storage. This will lead you to the following pop-up.
You can easily change the configuration of the topic for that particular environment by selecting one of the topics with a double click. Then, navigate to the Advanced tab in the following pop-up.
You can change the settings based on the expected throughput and retention time. More information on what these settings mean can be found below.
3.2.1 Retention Hours
Retention Hours is the number of hours data can reside on the topic before a FIFO principle kicks in that removes the oldest entry in the log first. The moment data has been on the topic longer than this threshold, it will automatically be deleted.
The default setting eMagiz provides you is 168 hours (7 days). For your use case, there might be no need to retain the data for such an extensive period. Instead, you might only want to retain the data for 72 hours (3 days), for example, because all consumers can pick up data within that timeframe, and all messages older than three days will be obsolete.
3.2.2 Retention Bytes
Retention Bytes is the number of bytes available per partition on that topic before a FIFO principle kicks in that removes the oldest entry in the log first. The moment a partition holds more bytes than the Retention Bytes setting, the oldest data will automatically be deleted.
The default setting eMagiz provides you is roughly 500 MB. This might be too low for you if you have millions of messages passing over your topic a day. If so, you need to adjust this setting here. Calculating the correct value is explained in sections 3.2.4 and 3.2.5.
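To make the interplay between Retention Hours and Retention Bytes more tangible, the toy model below applies both limits to a single partition. It is a simplified sketch under stated assumptions: the `Record` and `PartitionLog` classes are hypothetical, and records are dropped one by one, whereas Kafka itself removes data in larger chunks (log segments).

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Record:
    timestamp_hours: float  # moment the record was written, in hours
    size_bytes: int


@dataclass
class PartitionLog:
    """Toy model of one partition with time- and size-based retention."""

    retention_hours: float
    retention_bytes: int
    records: deque = field(default_factory=deque)

    def append(self, record: Record) -> None:
        self.records.append(record)

    def size_bytes(self) -> int:
        return sum(r.size_bytes for r in self.records)

    def enforce_retention(self, now_hours: float) -> int:
        """Drop the oldest records until both limits are met; return the number dropped."""
        dropped = 0
        # Retention Hours: remove records older than the threshold (oldest first).
        while self.records and now_hours - self.records[0].timestamp_hours > self.retention_hours:
            self.records.popleft()
            dropped += 1
        # Retention Bytes: remove the oldest records while the partition is too large.
        while self.records and self.size_bytes() > self.retention_bytes:
            self.records.popleft()
            dropped += 1
        return dropped


log = PartitionLog(retention_hours=72, retention_bytes=10_000)
for hour in range(0, 100, 10):  # one 2 kB record every 10 hours
    log.append(Record(timestamp_hours=hour, size_bytes=2_000))

dropped = log.enforce_retention(now_hours=100)
print(dropped, len(log.records), log.size_bytes())  # prints: 5 5 10000
```

In this run, three records are removed because they are older than 72 hours, and two more because the partition would otherwise exceed 10,000 bytes. Whichever limit is reached first determines when data disappears.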
3.2.3 Partitions
In Kafka, the data is stored in topics. The topic will further be divided into multiple partitions. The actual messages or the data will be held in the partition.
By default, your topic will be configured with one partition. We have chosen this default setting as you can increase the number of partitions on your topic, but you cannot decrease the number of partitions on a topic without deleting the topic (and losing all your data) first. On top of that, in most scenarios, only one consumer is consuming data, so increasing the partitions only has an effect if you want to increase throughput on the consumer side.
3.2.3.1 Consumer Group Consideration
If you have one or more consumer groups that each contain multiple consumers, you should consider increasing your number of partitions accordingly. A rule of thumb here is that when you have multiple consumers in one consumer group, you should increase the number of partitions to match this number. This could be the case when you run your event processor in a double lane setup or run your Mendix application multi-instance, for example. In those cases, one of the consumers in the group will consume data from partition A, whereas the second consumer will consume data from partition B. This way, each consumer only needs to process half the data, while the group as a whole still processes the complete data set.
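The rule of thumb can be illustrated with a small sketch. Within one consumer group, each partition is read by exactly one consumer, so with fewer partitions than consumers some consumers stay idle. The function `assign_partitions` and the consumer names are hypothetical, and the plain round-robin spread is a simplification of how Kafka actually assigns partitions.

```python
def assign_partitions(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Spread partition numbers over the consumers of one group, round-robin."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment: dict[str, list[int]] = {name: [] for name in consumers}
    for partition in range(partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


group = ["lane-1", "lane-2"]  # hypothetical consumers, e.g. a double lane setup

print(assign_partitions(1, group))  # {'lane-1': [0], 'lane-2': []}
print(assign_partitions(2, group))  # {'lane-1': [0], 'lane-2': [1]}
```

With one partition, the second consumer has nothing to read. With two partitions, both consumers share the load.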
3.2.4 Calculating Storage Capacity of a Topic
A calculation example of determining the configuration is shown below:
- 100000 messages per day
- 3 days retention
- 5 kB average size of a message placed on the topic
Results in 100000 * 3 * 5 kB = 1500000 kB, which equals 1.5 GB in Storage capacity per topic.
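The same calculation, written out as a small Python sketch so you can plug in your own numbers. The function name `topic_storage_bytes` is hypothetical, and decimal units (1 kB = 1,000 bytes, 1 GB = 1,000,000,000 bytes) are assumed.

```python
KB = 1_000            # assumption: decimal units
GB = 1_000_000_000


def topic_storage_bytes(messages_per_day: int, retention_days: int, avg_message_bytes: int) -> int:
    """Estimate the storage needed to hold all messages for the retention period."""
    return messages_per_day * retention_days * avg_message_bytes


needed = topic_storage_bytes(messages_per_day=100_000, retention_days=3, avg_message_bytes=5 * KB)
print(needed)       # prints: 1500000000
print(needed / GB)  # prints: 1.5
```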
3.2.5 Calculating Retention Bytes based on Storage Capacity
Based on the previous calculation, you can calculate the Retention Bytes on your topic.
- 1.5 GB Storage capacity
Based on the chosen setting for the number of partitions, you need to divide this number by the number of partitions to get to the correct value for the retention bytes. This is because the retention bytes are configured on a partition level and not on a topic level. So when we assume the default of one partition as configured in eMagiz, this will lead to the following calculation.
Results in 1.5 / 1 = 1.5 GB in Retention Bytes. As the name indicates, this value needs to be entered in bytes. For this example, taking 1 GB as 1,000,000,000 bytes, we end up with 1500000000 bytes. This number is then multiplied by the replication factor to arrive at the amount of storage you could theoretically occupy on our Event Streaming cluster.
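As a sketch, the division by the number of partitions and the multiplication by the replication factor could look as follows. Both function names are hypothetical, 1 GB is taken as 1,000,000,000 bytes (another convention gives a slightly different figure), and the replication factor of 3 is purely an illustrative assumption, not the eMagiz default.

```python
def retention_bytes_per_partition(topic_storage_bytes: int, partitions: int) -> int:
    """Divide the topic-level storage over the partitions, rounding up."""
    if partitions < 1:
        raise ValueError("a topic has at least one partition")
    return -(-topic_storage_bytes // partitions)  # ceiling division on integers


def cluster_storage_bytes(retention_bytes: int, partitions: int, replication_factor: int) -> int:
    """Storage the topic could theoretically occupy on the cluster."""
    return retention_bytes * partitions * replication_factor


storage = 1_500_000_000  # 1.5 GB, assuming 1 GB = 1,000,000,000 bytes

per_partition = retention_bytes_per_partition(storage, partitions=1)
print(per_partition)  # prints: 1500000000

# Illustrative replication factor of 3 (assumption, not the eMagiz default).
print(cluster_storage_bytes(per_partition, partitions=1, replication_factor=3))  # prints: 4500000000
```

With two partitions, the same topic-level storage would give a Retention Bytes value of 750000000 per partition.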
3.3 Check available topic storage
When you are finished configuring the properties per topic, you can validate whether the expected data storage fits within the available amount of topic storage based on your configuration.
You can do this with the help of Design Architecture.
As you can see in the picture above, the Design Architecture not only represents the runtimes needed to run your flows but also shows the amount of topic storage configured and recommended by eMagiz.
If we zoom in on the right-hand panel, you can see an entry relevant to Event Streaming at the bottom of that panel.
With this entry, you can easily see the amount of GB configured (based on the topic property settings) and see how much GB is still available based on your current contractual agreements.
If you exceed your contractually allowed storage, eMagiz will prevent you from deploying new topics.
3.4 Exclude topics per environment
Apart from configuring your topics, including topic properties, in Design Architecture, you can exclude a topic from a specific environment. This way, you can run a particular topic on one specific environment or on two of the three environments. This is an easy way to dynamically design your Event Streaming solution to fit your needs per environment. On top of that, it also saves storage capacity if you exclude topics from an environment.
Navigate to Design Architecture and enter the "Start Editing" mode to exclude a topic. Once in this mode, open the context menu on the Topic storage and select the option Edit storage. This will lead you to the following pop-up.
Here, you have a button called "Toggle Exclude." By selecting a topic and pressing the button, you exclude or include a topic (depending on the current state).
4. Key takeaways
- A topic is a category/feed name to which event records are stored and published:
- Retention on this topic is based on a FIFO principle (start at the beginning of the log)
- The amount of GB needed for your solution is the most significant cost driver
- Topics are automatically generated in eMagiz when you draw the line in Capture
- Think about your retention policy when implementing the Event Streaming solution to get a grip on the cost aspect of Event Streaming
- Check your configuration with the help of Design Architecture to make sure that the configured amount of GB is allowed under your current contract
5. Suggested Additional Readings
If you are interested in this topic and want more information on it, please read the help text provided by eMagiz when executing these actions and browse through the following links: