Wiki source code of Duplicate Detection
Last modified by Danniar Firdausy on 2024/09/19 10:20
author | version | line-number | content |
---|---|---|---|
1 | {{container}}{{container layoutStyle="columns"}}((( | ||
2 | In this microlearning, we will introduce the relevant stateful components to configure the Duplicate Detection operation in the context of State Generation. In case you want to learn the basics of the State Generation functionality, please check out [[eMagiz State Generation>>doc:Main.eMagiz Academy.Fundamentals.fundamental-stateful||target="blank"]]. | ||
3 | |||
4 | Should you have any questions, please get in touch with [[academy@emagiz.com>>mailto:academy@emagiz.com]]. | ||
5 | |||
6 | == 1. Prerequisites == | ||
7 | |||
8 | * Basic knowledge of the eMagiz platform | ||
9 | * Basic knowledge of the eMagiz State Generation | ||
10 | |||
11 | == 2. Key concepts == | ||
12 | |||
13 | This microlearning centers on the Duplicate Detection operation within the State Generation functionality. | ||
14 | * By Duplicate Detection, we mean: an operation that tracks the flow of messages over time so that, when all or part of a new message matches all or part of a previously received message, the new message can be tagged as a duplicate and handled accordingly, for example by discarding it. | ||
15 | |||
16 | Before setting up the Duplicate Detection operation, consider the following questions: | ||
17 | |||
18 | * How should past messages be stored so that incoming messages can be compared against them? How long should these past messages be kept? | ||
19 | * Under which condition should an incoming message be tagged as a duplicate? Which part of the message will be compared to identify duplicates? | ||
20 | * When a duplicate is detected, what action should be taken, and how do you set up that action? | ||
21 | |||
22 | == 3. Setting up Duplicate Detection operation == | ||
23 | |||
24 | To configure the Duplicate Detection operation, you first set up the component that checks whether an incoming message is a duplicate, and then define the action to take when a duplicate is detected. If the state must survive runtime shutdowns or restarts, you can also set up storage to persist it. | ||
25 | |||
26 | === 3.1 Detecting Duplicates === | ||
27 | |||
28 | First, set up the component that is responsible for detecting duplicate messages and for either marking them as duplicates or discarding them. eMagiz provides a support object for this, the Duplicate Detector, which you can add to your flow in the [[standard manner>>doc:Main.eMagiz Academy.Microlearnings.Crash Course.Crash Course Platform.crashcourse-platform-create-support-objects-introduction||target="blank"]] by searching for "Duplicate detector". | ||
29 | |||
30 | [[image:Main.Images.Microlearning.WebHome@intermediate-state-generation-duplicate-detection-duplicate-detector.png]] | ||
31 | |||
32 | The idea here is to apply this support object to any component in your flow that has an input channel, so that the component checks whether the incoming message matches state data stored in the metadata store. As shown in the screenshot above, you do this by selecting the target flow component from the Endpoint dropdown menu. Next, in the Key expression field, you use a SpEL expression to define which part of the incoming message is compared with the keys of the state data already stored in the metadata store. Finally, you can either select a metadata store support object that you have created, if you require state persistence, or leave the field empty. An empty field defaults to an in-memory store, which may lose its data during runtime shutdowns or restarts. | ||
33 | |||
34 | Let us take the screenshot above as an example. We evaluate an incoming payload based on its {{code}}id{{/code}} field and compare it with the existing state data in the store that has the same {{code}}id{{/code}} as its key. Suppose a payload such as {{code language="json"}}{"id":"123","name":"John"}{{/code}} arrives. If no state data with the key "123" exists in the metadata store yet, and assuming that we set {{code}}payload.name{{/code}} as the Value expression, a new state data entry {{code}}<123:"John">{{/code}} is stored (the metadata store holds state data as key-value pairs; here 123 is the key and "John" is the value). If another payload such as {{code language="json"}}{"id":"123","name":"Doe"}{{/code}} arrives afterwards, it is considered a duplicate because the Duplicate Detector finds existing state data with the same key as the id field of the incoming payload. | ||
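
The example above can be sketched in a few lines of Python. This is only an illustration of the key-based comparison, not the eMagiz implementation: the class {{code}}InMemoryMetadataStore{{/code}}, the function {{code}}is_duplicate{{/code}}, and the two lambdas standing in for the Key expression and Value expression are all hypothetical names. The sketch also assumes that the first stored value is kept when a duplicate key arrives.

```python
from typing import Any, Callable, Dict, Optional


class InMemoryMetadataStore:
    """Hypothetical stand-in for a metadata store that holds key-value state data."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def put_if_absent(self, key: str, value: str) -> Optional[str]:
        """Store value under key only if the key is new; otherwise return the existing value."""
        existing = self._entries.get(key)
        if existing is None:
            self._entries[key] = value
        return existing

    def get(self, key: str) -> Optional[str]:
        return self._entries.get(key)


def is_duplicate(
    store: InMemoryMetadataStore,
    payload: Dict[str, Any],
    key_expression: Callable[[Dict[str, Any]], Any],
    value_expression: Callable[[Dict[str, Any]], Any],
) -> bool:
    """Return True when the key derived from the payload is already in the store."""
    key = str(key_expression(payload))
    value = str(value_expression(payload))
    return store.put_if_absent(key, value) is not None


# Stand-ins for the SpEL expressions payload.id and payload.name (assumption).
key_expression = lambda payload: payload["id"]
value_expression = lambda payload: payload["name"]

store = InMemoryMetadataStore()
first = is_duplicate(store, {"id": "123", "name": "John"}, key_expression, value_expression)
second = is_duplicate(store, {"id": "123", "name": "Doe"}, key_expression, value_expression)
```

Here {{code}}first{{/code}} is not a duplicate because the key "123" is new, while {{code}}second{{/code}} is flagged even though the name differs, since only the key is compared.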
35 | |||
36 | ==== 3.1.1 Handling Duplicates ==== | ||
37 | |||
38 | Once a message is considered a duplicate, the next step is to define the action to take. To do this, go to the Advanced tab of the Duplicate Detector support object, where you will find the Discard Channel configuration. See the screenshot below for an example. | ||
39 | |||
40 | [[image:Main.Images.Microlearning.WebHome@intermediate-state-generation-duplicate-detection-duplicate-detector-advanced.png]] | ||
41 | |||
42 | {{info}} | ||
43 | When a duplicate is detected, this configuration defines the following actions: | ||
44 | * Default Setting: If you leave this field empty (the default setting), the message will proceed to the output channel with a header called {{code}}duplicateMessage{{/code}} set to the boolean value {{code language="java"}}true{{/code}}. This header allows you to define your own action later in your flow. | ||
45 | * Other Channels: If you select any other channel (besides nullChannel), the message will be redirected to the selected channel with the {{code}}duplicateMessage{{/code}} header attached. | ||
46 | * nullChannel: If you select nullChannel, then the message will be discarded. | ||
47 | {{/info}} | ||
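
The three Discard Channel outcomes described above can be summarized as a small routing function. This is a minimal sketch under stated assumptions: messages are modeled as plain dictionaries, channels as lists in a dictionary, and the names {{code}}route_duplicate{{/code}} and {{code}}channels{{/code}} are hypothetical. Only the {{code}}duplicateMessage{{/code}} header and the {{code}}nullChannel{{/code}} name come from the text.

```python
from typing import Any, Dict, List, Optional

NULL_CHANNEL = "nullChannel"


def route_duplicate(
    message: Dict[str, Any],
    discard_channel: Optional[str],
    output_channel: str,
    channels: Dict[str, List[Dict[str, Any]]],
) -> Optional[str]:
    """Route a message that was already detected as a duplicate.

    Returns the name of the channel that received the message,
    or None when the message was discarded.
    """
    if discard_channel == NULL_CHANNEL:
        # nullChannel: the duplicate is dropped.
        return None

    # In both remaining cases the message is tagged with the duplicateMessage header.
    tagged = {
        "payload": message["payload"],
        "headers": {**message.get("headers", {}), "duplicateMessage": True},
    }
    # Empty setting: continue to the regular output channel.
    # Any other channel: redirect the message to that channel.
    target = discard_channel or output_channel
    channels.setdefault(target, []).append(tagged)
    return target


channels: Dict[str, List[Dict[str, Any]]] = {}
duplicate = {"payload": {"id": "123", "name": "Doe"}, "headers": {}}

default_target = route_duplicate(duplicate, None, "output", channels)
redirect_target = route_duplicate(duplicate, "duplicates", "output", channels)
discard_target = route_duplicate(duplicate, NULL_CHANNEL, "output", channels)
```

With the default setting, a later component in the flow can inspect the {{code}}duplicateMessage{{/code}} header and decide what to do with the message.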
48 | |||
49 | {{html}} | ||
50 | <!-- | ||
51 | ==== 3.1.2 Alternative Approach ==== | ||
52 | |||
53 | Lastly, if you need to compare whether any part of an incoming message resembles the value of the state data stored in the metadata store, you can use the Value expression field. Similar to the Key expression field, you can use a SpEL expression here to define which part of the incoming message should be compared with the stored state data. | ||
54 | --> | ||
55 | {{/html}} | ||
56 | |||
57 | === 3.2 Storage Mechanism for State Persistence === | ||
58 | |||
59 | As discussed above, if you require your state to persist (stored on disk instead of in memory), you need to link a metadata store to your Duplicate Detector support object. This means you also need to set up the following support objects, if you have not done so already: | ||
60 | * Infinispan cache manager | ||
61 | * Infinispan metadata store | ||
62 | |||
63 | Once you have done so, you can set the "Simple cache" option in your metadata store to "no" and then set the "Persistent" option to "yes". For more information on configuring these support objects and understanding their settings, please refer to this [[State Persistence>>doc:Main.eMagiz Academy.Microlearnings.Intermediate Level.State Generation.intermediate-state-persistence||target="blank"]] microlearning. | ||
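
To illustrate why persistence matters for duplicate detection, the sketch below contrasts a store that lives only in memory with one that writes its entries to disk. It does not use Infinispan: {{code}}VolatileStore{{/code}} and {{code}}FileBackedStore{{/code}} are hypothetical classes, a JSON file stands in for the persistent cache, and a runtime restart is simulated by creating a new store instance.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class VolatileStore:
    """Hypothetical in-memory store: its entries are lost when the instance is recreated."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def put_if_absent(self, key: str, value: str) -> Optional[str]:
        existing = self._entries.get(key)
        if existing is None:
            self._entries[key] = value
        return existing


class FileBackedStore(VolatileStore):
    """Hypothetical persistent store: entries are reloaded from a JSON file on startup."""

    def __init__(self, path: str) -> None:
        super().__init__()
        self._path = path
        if os.path.exists(path):
            with open(path, encoding="utf-8") as handle:
                self._entries = json.load(handle)

    def put_if_absent(self, key: str, value: str) -> Optional[str]:
        existing = super().put_if_absent(key, value)
        if existing is None:
            # Write the updated state to disk so that it survives a restart.
            with open(self._path, "w", encoding="utf-8") as handle:
                json.dump(self._entries, handle)
        return existing


state_file = os.path.join(tempfile.mkdtemp(), "state.json")

volatile = VolatileStore()
volatile.put_if_absent("123", "John")
volatile = VolatileStore()  # simulated restart: the state is gone
volatile_after_restart = volatile.put_if_absent("123", "Doe")

persistent = FileBackedStore(state_file)
persistent.put_if_absent("123", "John")
persistent = FileBackedStore(state_file)  # simulated restart: the state is reloaded
persistent_after_restart = persistent.put_if_absent("123", "Doe")
```

After the simulated restart, the in-memory store no longer knows the key "123", so the second message would not be flagged as a duplicate, whereas the file-backed store still returns the earlier entry.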
64 | |||
65 | == 4. Key takeaways == | ||
66 | |||
67 | * With State Generation, you can compare incoming messages to past state data to detect any changes that may trigger specific actions. | ||
68 | * To set up a State Generation - Duplicate Detection operation, you need to configure a Duplicate Detector support object that compares parts of incoming messages with the stored state data, and attach it to the flow component that should execute the detection. | ||
69 | * If you require state persistence, you need to set up an Infinispan Metadata Store and its Infinispan Cache Manager and then link the store to your Duplicate Detector. | ||
70 | * To speed up the setup of a State Generation - Duplicate Detection operation, we recommend using the store item that is [[available here>>doc:Main.eMagiz Store.Accelerators.Duplicate Detection - State Generation.WebHome||target="blank"]]. | ||
71 | |||
72 | == 5. Suggested additional readings == | ||
73 | |||
74 | If you are interested in this topic and want more information on it, please read the help text provided by eMagiz and the following microlearnings on related topics: | ||
75 | |||
76 | * [[eMagiz Store (Menu)>>doc:Main.eMagiz Store.WebHome||target="blank"]] | ||
77 | ** [[Accelerators (Navigation)>>doc:Main.eMagiz Store.Accelerators.WebHome||target="blank"]] | ||
78 | *** [[Duplicate Detection - State Generation (Explanation)>>doc:Main.eMagiz Store.Accelerators.Duplicate Detection - State Generation.WebHome||target="blank"]] | ||
79 | * [[Crash Courses (Menu)>>doc:Main.eMagiz Academy.Microlearnings.Crash Course.WebHome||target="blank"]] | ||
80 | ** [[Crash Course Platform (Navigation)>>doc:Main.eMagiz Academy.Microlearnings.Crash Course.Crash Course Platform.WebHome||target="blank"]] | ||
81 | *** [[Support objects - Introduction (Explanation)>>doc:Main.eMagiz Academy.Microlearnings.Crash Course.Crash Course Platform.crashcourse-platform-create-support-objects-introduction||target="blank"]] | ||
82 | * [[Intermediate Level (Menu)>>doc:Main.eMagiz Academy.Microlearnings.Intermediate Level.WebHome||target="blank"]] | ||
83 | ** [[State Generation (Navigation)>>doc:Main.eMagiz Academy.Microlearnings.Intermediate Level.State Generation.WebHome||target="blank"]] | ||
84 | *** [[State Persistence>>doc:Main.eMagiz Academy.Microlearnings.Intermediate Level.State Generation.intermediate-state-persistence||target="blank"]] | ||
85 | * [[State Generation (Search Results)>>url:https://docs.emagiz.com/bin/view/Main/Search?sort=score&sortOrder=desc&highlight=true&facet=true&r=1&f_space_facet=0%2FMain.&l_space_facet=10&f_type=DOCUMENT&f_locale=en&f_locale=&f_locale=en&text=%22state+generation%22||target="blank"]] | ||
86 | )))((({{toc/}}))){{/container}}{{/container}} |