Wiki source code of Topic and Topic Properties

Last modified by Eva Torken on 2023/05/11 12:58

{{container}}{{container layoutStyle="columns"}}(((
In this microlearning, we will focus on topics and their properties.

Should you have any questions, please get in touch with [[academy@emagiz.com>>mailto:academy@emagiz.com]].

== 1. Prerequisites ==

* Basic knowledge of the eMagiz platform
* Understanding of the Event Streaming concept

== 2. Key concepts ==

This microlearning centers around topics and their properties. By topic, we mean a category/feed name to which event records are stored and published. By topic property, we mean one piece of the overall configuration of a topic.

Knowing what topics are, and which of their properties you can and should think about before creating them, is crucial to successfully implementing an event streaming pattern via the eMagiz platform.

Two topic properties in particular can significantly impact the cost of your implementation:

* Retention Hours
* Retention Bytes

Below, we will discuss topics in more depth and zoom in on the topic properties, especially retention hours and retention bytes.

After configuring your topic(s) the way you had in mind, you can check your work via the Design Architecture overview.
This overview shows whether there is enough Topic Storage (in GB) available.

== 3. Topic and Topic Properties ==

Within the eMagiz platform, you can use the Event Streaming pattern to help solve your business case. Topics and their accompanying properties are a crucial part of that solution.
In this section, we will discuss the following:

* What are topics, and how can I use them
* Topic properties and their configuration

=== 3.1 What are topics, and how can I use them ===

eMagiz automatically generates a topic for each line you draw in Capture.

As a reminder, a topic is a category/feed name to which event records are stored and published.

All Kafka records are organized into topics. Producer applications write data to topics, and consumer applications read from topics.
Records published to the cluster stay in the cluster until a configurable retention period has passed.

Within eMagiz, you can use topics to temporarily store data so that consumers can consume it at a moment that suits them.
Four characteristics of topics are:

* Publish/subscribe mechanism
* Asynchronous, real-time
* Dumb broker, smart consumer: each subscriber can read at its own pace
* Retention
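
To illustrate the "dumb broker, smart consumer" characteristic, the following Python sketch models a single-partition topic as an append-only list in which each consumer group keeps its own read position (offset). The ##TopicLog## class and its methods are hypothetical illustrations, not an eMagiz or Kafka API, and retention is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class TopicLog:
    """Hypothetical, in-memory stand-in for a single-partition topic."""

    records: list = field(default_factory=list)
    # Each consumer group keeps its own read position (offset).
    offsets: dict = field(default_factory=dict)

    def publish(self, record):
        # Producers only append records to the end of the log.
        self.records.append(record)

    def poll(self, group, max_records=1):
        # Reading does not remove records; the group just advances its own offset.
        start = self.offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch


topic = TopicLog()
for event in ["order-1", "order-2", "order-3"]:
    topic.publish(event)

print(topic.poll("billing", max_records=3))  # ['order-1', 'order-2', 'order-3']
print(topic.poll("shipping"))                # ['order-1']
```

Because every group tracks its own offset, a slow subscriber never holds back a fast one; both read the same records at their own pace.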

=== 3.2 Topic properties and their configuration ===

Besides naming the topic, eMagiz also provides a set of default settings for your topic. You should not change the default replication settings; their default values are correct. However, three of the settings need a closer look from you:

* Retention Hours
* Retention Bytes
* Partitions

These three settings determine how many GB of storage are needed on the eMagiz Event Streaming cluster to run all topics.
As you can imagine, the longer you retain data and the more data you retain, the higher the costs. On top of that, the retention bytes setting applies per partition, so the retention at topic level is multiplied by the number of partitions. For example, if you have a retention bytes setting of 500 MB and two partitions, the configured retention at topic level is 1 GB.
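
The multiplication in this example can be written as a small Python helper. It is a sketch under stated assumptions: it uses decimal units (1 MB = 1,000,000 bytes), it leaves out the fixed broker-level overhead described in the warning that follows, and the function name is hypothetical.

```python
MB = 1_000_000  # decimal megabyte, matching the 500 MB x 2 = 1 GB example


def topic_level_retention_bytes(retention_bytes_per_partition: int, partitions: int) -> int:
    """Retention at topic level: the per-partition limit times the partition count.

    Simplification: ignores the fixed broker-level overhead that eMagiz adds
    when calculating the configured topic storage.
    """
    if partitions < 1:
        raise ValueError("a topic needs at least one partition")
    return retention_bytes_per_partition * partitions


print(topic_level_retention_bytes(500 * MB, 2))  # 1000000000 (1 GB)
```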

{{warning}}Due to standard configurations at the broker level, a fixed number of MBs is added to the retention limit before data deletion kicks in. eMagiz takes this into account when calculating the configured topic storage of all your topics. {{/warning}}

You can change these properties per environment. To do so, navigate to Design Architecture and enter the "Start Editing" mode. Once in this mode, open the context menu on the Topic storage and select the option Edit storage. This opens the following pop-up.

[[image:Main.Images.Microlearning.WebHome@crashcourse-eventstreaming-topic-and-topic-properties--design-architecture-topic-storage-config.png]]

To change the configuration of a topic for that particular environment, double-click the topic. Then, navigate to the Advanced tab in the pop-up that opens.

[[image:Main.Images.Microlearning.WebHome@crashcourse-eventstreaming-topic-and-topic-properties--topic-properties-config-per-environment.png]]

You can change the settings based on the expected throughput and retention time. More information on what these settings mean can be found below.

==== 3.2.1 Retention Hours ====

Retention Hours is the number of hours data can reside on the topic before it is removed on a FIFO basis, starting with the oldest entries in the log. As soon as data has been on the topic for longer than this threshold, it is automatically deleted.
The default setting eMagiz provides is 168 hours (7 days). For your use case, there might be no need to retain data for such an extended period. Instead, you may only want to retain the data for 72 hours (3 days), for example, because all consumers pick up data within that timeframe and all messages older than three days are obsolete.
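
In Apache Kafka, time-based retention at topic level is expressed in milliseconds through the ##retention.ms## topic configuration, whose default of 604800000 ms equals the 7 days mentioned above. How eMagiz maps Retention Hours onto the underlying configuration is not described here, so the conversion below is a sketch under that assumption; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MS_PER_HOUR = 60 * 60 * 1000


def retention_hours_to_ms(retention_hours: int) -> int:
    """Express a retention period in hours as milliseconds (the unit of retention.ms)."""
    return retention_hours * MS_PER_HOUR


def is_expired(record_time: datetime, now: datetime, retention_hours: int) -> bool:
    """A record becomes eligible for deletion once it is older than the retention period."""
    return now - record_time > timedelta(hours=retention_hours)


print(retention_hours_to_ms(168))  # 604800000 (the 7-day default)
print(retention_hours_to_ms(72))   # 259200000 (3 days)

now = datetime(2023, 5, 11, 12, 0, tzinfo=timezone.utc)
four_days_ago = now - timedelta(days=4)
print(is_expired(four_days_ago, now, retention_hours=72))   # True
print(is_expired(four_days_ago, now, retention_hours=168))  # False
```

Note that Kafka itself deletes data per log segment rather than per record, so a record can stay on the topic somewhat longer than the threshold before it is actually removed.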

==== 3.2.2 Retention Bytes ====

Retention Bytes is the number of bytes available per partition on the topic before data is removed on a FIFO basis, starting with the oldest entries in the log. As soon as a partition holds more bytes than the retention bytes setting, the oldest data is automatically deleted.
The default setting eMagiz provides is roughly 500 MB. This might be too low if millions of messages pass over your topic per day. If so, you need to adjust this setting here. How to calculate the correct value is explained in sections 3.2.4 and 3.2.5.
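
The size-based FIFO behavior can be pictured with the following Python sketch of a single partition. The ##SizeBoundedPartition## class is a hypothetical model, not part of eMagiz or Kafka, and it simplifies reality: Kafka removes whole log segments rather than individual records, so actual deletion is coarser than shown here.

```python
from collections import deque


class SizeBoundedPartition:
    """Hypothetical model of one partition with a retention-bytes limit."""

    def __init__(self, retention_bytes: int):
        self.retention_bytes = retention_bytes
        self.records = deque()  # (key, size_in_bytes), oldest on the left
        self.size = 0

    def append(self, key: str, size_in_bytes: int) -> None:
        self.records.append((key, size_in_bytes))
        self.size += size_in_bytes
        # FIFO: drop the oldest records until the partition fits within the limit again.
        while self.size > self.retention_bytes and self.records:
            _, oldest_size = self.records.popleft()
            self.size -= oldest_size


partition = SizeBoundedPartition(retention_bytes=10)
for key in ["a", "b", "c", "d"]:
    partition.append(key, 4)

print([key for key, _ in partition.records])  # ['c', 'd']
print(partition.size)                         # 8
```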

==== 3.2.3 Partitions ====

In Kafka, data is stored in topics. Each topic is further divided into one or more partitions, which hold the actual messages.

[[image:Main.Images.Microlearning.WebHome@crashcourse-eventstreaming-topic-and-topic-properties--topic-properties-partitions-explanation.png]]

By default, your topic is configured with one partition. We have chosen this default because you can increase the number of partitions on a topic, but you cannot decrease it without first deleting the topic (and losing all its data). On top of that, in most scenarios only one consumer consumes the data, so increasing the number of partitions only has an effect if you want to increase throughput on the consumer side.
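
The "increase only" rule can be captured in a small validation helper. This is a hypothetical Python sketch of the rule as described above, not a function offered by eMagiz.

```python
def validate_partition_change(current: int, requested: int) -> int:
    """Return the new partition count, refusing a decrease.

    Partitions can be added to an existing topic, but reducing them would
    require deleting and recreating the topic.
    """
    if requested < 1:
        raise ValueError("a topic needs at least one partition")
    if requested < current:
        raise ValueError(
            f"cannot shrink from {current} to {requested} partitions; "
            "delete and recreate the topic instead (all data is lost)"
        )
    return requested


print(validate_partition_change(1, 2))  # 2
```

Starting at one partition keeps every option open: you can still grow later, whereas starting high cannot be undone without losing data.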

===== 3.2.3.1 Consumer Group Consideration =====

If a consumer group contains multiple consumers, you should consider increasing the number of partitions accordingly. A rule of thumb is to match the number of partitions to the number of consumers in the group. This could be the case when you run your event processor in a double lane setup or run your Mendix application with multiple instances, for example. In those cases, one consumer in the group consumes data from partition A, whereas the second consumer consumes data from partition B. This way, each consumer only processes half of the data, while the group as a whole still receives the complete data set.

[[image:Main.Images.Microlearning.WebHome@crashcourse-eventstreaming-topic-and-topic-properties--topic-properties-consumer-group-explanation.png]]

{{warning}}Note that increasing the number of partitions above three currently has no effect due to the configuration of the broker provided by eMagiz.{{/warning}}
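
The following Python sketch shows why matching partitions to consumers matters: within one group, each partition is read by exactly one consumer, so consumers beyond the number of partitions stay idle. It uses a simplified round-robin assignment for illustration; Kafka ships several assignment strategies (such as range and round-robin), and which one eMagiz uses is not specified here. The function and consumer names are hypothetical.

```python
def assign_partitions(partitions: list[str], consumers: list[str]) -> dict[str, list[str]]:
    """Spread partitions over the consumers of one group, round-robin style.

    Each partition goes to exactly one consumer in the group; consumers beyond
    the number of partitions receive nothing and stay idle.
    """
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        assignment[consumers[index % len(consumers)]].append(partition)
    return assignment


print(assign_partitions(["A", "B"], ["lane-1", "lane-2"]))
# {'lane-1': ['A'], 'lane-2': ['B']}
print(assign_partitions(["A"], ["lane-1", "lane-2"]))
# {'lane-1': ['A'], 'lane-2': []}
```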

==== 3.2.4 Calculating Storage Capacity of a Topic ====

The following example shows how to determine the required storage capacity:

* 100,000 messages per day
* 3 days of retention
* 5 kB average size of a message placed on the topic

This results in 100,000 × 3 × 5 kB = 1,500,000 kB = 1.5 GB of storage capacity per topic.
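
The same calculation as a Python function, a sketch that assumes decimal units (1 kB = 1,000 bytes, 1 GB = 1,000,000,000 bytes); the function name is hypothetical.

```python
def required_storage_bytes(messages_per_day: int, retention_days: int, avg_message_kb: int) -> int:
    """Bytes a topic needs to hold its full retention window (1 kB = 1000 bytes)."""
    return messages_per_day * retention_days * avg_message_kb * 1000


storage = required_storage_bytes(messages_per_day=100_000, retention_days=3, avg_message_kb=5)
print(storage)        # 1500000000
print(storage / 1e9)  # 1.5 (GB)
```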

==== 3.2.5 Calculating Retention Bytes based on Storage Capacity ====

Based on the previous calculation, you can calculate the Retention Bytes for your topic.

* 1.5 GB storage capacity

Depending on the chosen number of partitions, you need to divide this number by the number of partitions to get the correct value for the retention bytes. This is because retention bytes are configured at the partition level and **not** at the topic level. Assuming the default of one partition as configured in eMagiz, this leads to the following calculation.

1.5 GB / 1 = 1.5 GB in Retention Bytes. As the name indicates, this value needs to be entered in bytes: 1.5 GB corresponds to 1500000000 bytes (or 1610612736 bytes if you count 1 GB as 1024³ bytes). This number is then multiplied by the replication factor to get the total amount of storage you could theoretically write to the eMagiz Event Streaming cluster.
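
The division by partitions and the multiplication by the replication factor can be sketched in Python as follows. The function names are hypothetical, the sketch uses decimal units, and the replication factor of 3 in the usage example is only an illustrative value; the actual factor is part of the default replication settings that eMagiz manages for you.

```python
def retention_bytes_per_partition(storage_bytes: int, partitions: int) -> int:
    """Retention bytes are set per partition, so split the topic's storage over its partitions."""
    if partitions < 1:
        raise ValueError("a topic needs at least one partition")
    # Integer ceiling division, so the partitions together still hold the full amount.
    return (storage_bytes + partitions - 1) // partitions


def cluster_storage_bytes(retention_bytes: int, partitions: int, replication_factor: int) -> int:
    """Theoretical storage written to the cluster: per-partition limit x partitions x replicas."""
    return retention_bytes * partitions * replication_factor


storage = 1_500_000_000  # 1.5 GB from the previous calculation

print(retention_bytes_per_partition(storage, partitions=1))  # 1500000000
print(retention_bytes_per_partition(storage, partitions=2))  # 750000000
# replication_factor=3 is an example value only.
print(cluster_storage_bytes(1_500_000_000, partitions=1, replication_factor=3))  # 4500000000
```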

=== 3.3 Check available topic storage ===

When you have finished configuring the properties per topic, you can validate whether the expected data storage fits within the topic storage available to you.
You can do this with the help of Design Architecture.

[[image:Main.Images.Microlearning.WebHome@crashcourse-eventstreaming-topic-and-topic-properties--design-architecture.png]]

As you can see in the picture above, Design Architecture shows not only the runtimes needed to run your flows but also the amount of topic storage configured and recommended by eMagiz.

If we zoom in on the right-hand panel, you can see an entry relevant to Event Streaming at the bottom of that panel.
With this entry, you can easily see how many GB are configured (based on the topic property settings) and how many GB are still available under your current contractual agreements.

[[image:Main.Images.Microlearning.WebHome@crashcourse-eventstreaming-topic-and-topic-properties--design-architecture-topic-storage-available.png]]

If you exceed your contractually allowed storage, eMagiz will prevent you from deploying new topics.

[[image:Main.Images.Microlearning.WebHome@crashcourse-eventstreaming-topic-and-topic-properties--design-architecture-topic-storage-not-enough-available.png]]
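
Conceptually, this check sums the configured storage of all topics and compares it with the contracted amount. The Python sketch below mimics that comparison under stated assumptions: the per-topic input is the configured storage of each topic (for example, retention bytes × partitions), the topic names are hypothetical, and ##overhead_bytes_per_topic## is a placeholder for the fixed broker-level margin whose real value is not documented here, so it defaults to 0.

```python
GB = 1_000_000_000  # decimal gigabyte


def check_topic_storage(topic_storage_bytes: dict[str, int], contracted_gb: float,
                        overhead_bytes_per_topic: int = 0) -> tuple[float, bool]:
    """Return the configured storage in GB and whether it fits within the contract."""
    configured = sum(size + overhead_bytes_per_topic for size in topic_storage_bytes.values())
    configured_gb = configured / GB
    return configured_gb, configured_gb <= contracted_gb


topics = {"orders": 1_500_000_000, "invoices": 500_000_000}  # hypothetical topics
configured_gb, fits = check_topic_storage(topics, contracted_gb=5)
print(configured_gb, fits)  # 2.0 True
```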

=== 3.4 Exclude topics per environment ===

Apart from configuring your topics and their properties in Design Architecture, you can also exclude a topic from a specific environment. This way, you can run a particular topic on only one, or on two of the three, environments. This is an easy way to design your Event Streaming solution to fit your needs per environment. On top of that, excluding topics from an environment saves storage capacity.

To exclude a topic, navigate to Design Architecture and enter the "Start Editing" mode. Once in this mode, open the context menu on the Topic storage and select the option Edit storage. This opens the following pop-up.

[[image:Main.Images.Microlearning.WebHome@crashcourse-eventstreaming-topic-and-topic-properties--design-architecture-topic-storage-config.png]]

Here, you find a button called "Toggle Exclude." Selecting a topic and pressing this button excludes or includes the topic, depending on its current state.

{{info}}To put this change into effect, you must apply the changes to the environment. For more information on how to do that, check out this [[microlearning>>doc:Main.eMagiz Academy.Microlearnings.Crash Course.Crash Course Event Streaming.crashcourse-eventstreaming-create-your-topic||target="blank"]].{{/info}}
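
The effect of "Toggle Exclude" on the storage of one environment can be sketched as follows. This Python model is hypothetical (eMagiz does not expose these functions); it only shows that excluded topics stop counting towards that environment's storage.

```python
def toggle_exclude(excluded: set[str], topic: str) -> set[str]:
    """Flip a topic between excluded and included, like the Toggle Exclude button."""
    return excluded - {topic} if topic in excluded else excluded | {topic}


def environment_storage_bytes(topic_storage_bytes: dict[str, int], excluded: set[str]) -> int:
    """Only topics that are not excluded count towards the environment's storage."""
    return sum(size for topic, size in topic_storage_bytes.items() if topic not in excluded)


topics = {"orders": 1_500_000_000, "audit-log": 500_000_000}  # hypothetical topics
excluded_on_test = toggle_exclude(set(), "audit-log")
print(environment_storage_bytes(topics, excluded_on_test))  # 1500000000
print(toggle_exclude(excluded_on_test, "audit-log"))        # set()
```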

== 4. Key takeaways ==

* A topic is a category/feed name to which event records are stored and published:
** Retention on a topic is based on a FIFO principle (the oldest data in the log is removed first)
** The amount of GB needed for your solution is the most significant cost driver
** Topics are automatically generated in eMagiz when you draw a line in Capture
* Think about your retention policy when implementing the Event Streaming solution to get a grip on its costs
* Check your configuration with the help of Design Architecture to make sure that the configured amount of GB is allowed under your current contract

== 5. Suggested Additional Readings ==

If you are interested in this topic and want more information, please read the help text eMagiz provides when executing these actions and browse through the following links:

* [[Kafka explained>>https://www.cloudkarafka.com/blog/2016-11-30-part1-kafka-for-beginners-what-is-apache-kafka.html||target="blank"]]

)))((({{toc/}}))){{/container}}{{/container}}