Changes for page Volume Mapping (On-premise)
Last modified by Erik Bakker on 2024/08/26 12:37
From version 41.1
edited by Erik Bakker
on 2022/10/31 09:07
Change comment:
There is no comment for this version
To version 30.2
edited by Erik Bakker
on 2022/06/10 13:23
Change comment:
Update document after refactoring.
Summary
Page properties (3 modified, 0 added, 0 removed)
Details
- Page properties
- Title
@@ -1,1 +1,1 @@
-Volume Mapping (On-premise)
+novice-file-based-connectivity-characterset
- Default language
@@ -1,1 +1,0 @@
-en
- Content
@@ -1,9 +1,11 @@
 {{container}}{{container layoutStyle="columns"}}(((
+In some cases, you want to treat each unique part of your input file as its own message instead of processing the complete file as a single message. In this microlearning, we will learn how you can process a (large) file on a per-line basis.

-When you need to read and write files from an on-premise disk, you need to know the path in which the data is stored and make sure that the docker container in your runtime(s) running has access to this path. There are several ways of dealing with this challenge. First, this microlearning will discuss the various alternatives and best approaches in these scenarios.
-
 Should you have any questions, please contact [[academy@emagiz.com>>mailto:academy@emagiz.com]].

+* Last update: May 31st, 2021
+* Required reading time: 7 minutes
+
 == 1. Prerequisites ==

 * Basic knowledge of the eMagiz platform
@@ -10,99 +10,93 @@

 == 2. Key concepts ==

-This microlearning centers around learning how to set up your volume mapping correctly so you can exchange file-based data on-premise.
+This microlearning centers around learning how to process an incoming file per line.

-By volume mapping, we mean: Creating a configuration through which the docker container can read and write data on a specific path on an on-premise machine.
+By processing per line, we mean: Splitting up the input into discernable pieces that will each become a unique message.

-There are several options for volume mapping for your on-premise machine.
-* Volume
-* Bind mount
-* Temporary file system
-* Named pipe
+* Easy way of reading a file line by line and sending it to eMagiz (Low on memory)
+* Ability to process each line based on distinctive logic that is relevant on line level
+* Can be used for flat file as well as XML input files

-== 3. Volume Mapping (On-premise) ==
+== 3. Processing a File per Line ==

-When you need to read and write files from an on-premise disk, you need to know the path in which the data is stored and make sure that the docker container in your runtime(s) running has access to this path. There are several ways of dealing with this challenge. First, this microlearning will discuss the various alternatives and best approaches in these scenarios.
+In some cases, you want to treat each unique part of your input file as its own message instead of processing the complete file as a single message. In this microlearning, we will learn how you can process a (large) file on a per-line basis.

-There are several options for volume mapping for your on-premise machine.
-* Volume
-* Bind mount
-* Temporary file system
-* Named pipe
+To make this work in eMagiz you need to navigate to the Create phase of eMagiz and open the entry flow in which you want to retrieve the file from a certain location. Within the context of this flow, we need to add functionality that will ensure that each line is read and processed separately and will become its own unique message. To do so, first enter "Start Editing" mode on flow level. After you have done so, please add a file item reader message source to the flow. We will use this component to read and process our input file on a per-line basis.

-Below we will explain the differences between the various options available for your volume mapping. But before we do, we first explain how to set up this configuration within eMagiz. Then, you must navigate to Deploy -> Architecture on the model level. In this overview, you can access the Volume mapping per runtime deployed on-premise. To do so, you can right-click on the runtime to access the context menu.
+The first step would be to define the directory from which we read our messages. As always, reference the directory with the help of a property.
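To picture what processing a file on a per-line basis means outside of eMagiz, here is a minimal Python sketch. It is not the eMagiz file item reader: the function name `read_messages` is hypothetical and only illustrates the idea. The file is streamed, and each line is handed over as its own message, which is why this approach stays low on memory even for large files.

```python
from pathlib import Path
from typing import Iterator


def read_messages(path: Path, encoding: str = "utf-8") -> Iterator[str]:
    """Yield each non-empty line of the file as a separate message.

    Hypothetical helper for illustration only; it mimics the idea of a
    per-line reader and is not an eMagiz API.
    """
    with path.open("r", encoding=encoding) as handle:
        # Iterating over the handle streams the file line by line,
        # so memory use does not grow with the size of the file.
        for line in handle:
            message = line.rstrip("\r\n")
            if message:  # ignore empty lines
                yield message
```

Because the function is a generator, a caller can start handling the first message before the rest of the file has been read.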

-[[image:Main.Images.Microlearning.WebHome@novice-file-based-connectivity-volume-mapping-on-premise--volume-option-context-menu.png]]
+[[image:Main.Images.Microlearning.WebHome@novice-file-based-connectivity-processing-a-file-per-line--file-item-reader-directory.png]]

-When you click this option, you will see the following pop-up. In this pop-up, you can define the machine-level and runtime-level volumes. More on that later. This is the starting point for configuring your volume mapping. We will walk through each available option and explain how they work and should be configured.
+Secondly, just as when reading the file as a whole, ensure that you use a filter to retrieve only the correct files from the directory.

-[[image:Main.Images.Microlearning.WebHome@novice-file-based-connectivity-volume-mapping-on-premise--volume-mapping-pop-up.png]]
+=== 3.1 Item reader Type ===

-{{info}}Note that you should be in "Start editing" mode to make any changes to the configuration of your volume mapping.{{/info}}
+Now it is time to select our Item reader Type. As the help text of the eMagiz component suggests, there are two choices with this component. The first (and most frequently used) option is the Flat file item reader. With this option, you can read each line within the flat file input file and output it as a separate message. The second option is called the Stax event item reader. With this option, you can read your input XML and output messages on a per-record basis.

-=== 3.1 Volume ===
+[[image:Main.Images.Microlearning.WebHome@novice-file-based-connectivity-processing-a-file-per-line--item-reader-type-options.png]]

-To make this work in eMagiz you need to navigate to the Create phase of eMagiz and open the entry flow in which you want to archive the files. Within the context of this flow, we need to add functionality that will ensure that each input file is archived and cleaned up when older than three days. To do so first enter "Start Editing" mode on flow level. The first decision we have to take is how we are going to name the files within the archiving. The best practice, in this case, is the original filename + the current time as a suffix. You can define this by dragging a format filename generator (support object) to the canvas.
+Based on your choice the exact configuration will differ.

-[[image:Main.Images.Microlearning.WebHome@novice-file-based-connectivity-archiving--file-name-generator.png]]
+==== 3.1.1 Stax Event Item Reader ====

-After we have done this please add a file outbound channel adapter to the flow including an input channel. Ensure that you use a property for the directory that references another directory compared to the input directory to prevent creating an infinite loop.
+For the Stax event item reader, you need to define the name of the element on which you want to split the XML and define whether you want to throw an error in case no such element exists in the input file (by (de)selecting the option Strict). The default setting of eMagiz is advisable for this option.

-[[image:Main.Images.Microlearning.WebHome@novice-file-based-connectivity-archiving--archiving-config-file-outbound-basic.png]]
+[[image:Main.Images.Microlearning.WebHome@novice-file-based-connectivity-processing-a-file-per-line--stax-event-item-reader-config.png]]

-Now that we have configured the basics let us turn our attention to the advanced configuration. In the advanced tab of this component, we need to select the file name generator to ensure that the files are named correctly. In case you process each line separately you have to choose whether to save them as separate files in the archive or by appending them again. This can be achieved by selecting the correct Mode. In most cases, however, the default Mode of Replace will suffice.
+==== 3.1.2 Flat File Item Reader ====

-[[image:Main.Images.Microlearning.WebHome@novice-file-based-connectivity-archiving--archiving-config-file-outbound-advanced.png]]
+For the Flat File item reader, there are some more choices and configurations to be made. There are three options you can choose from:
+- Pass through line mapper
+- Default line mapper
+- Pattern matching composite line mapper

-The moment you are satisfied press Save. Now that we have configured this it becomes time to determine how we get the needed input to write to our archive. In the example we are using here we want to archive our input file so we need to ensure that the data we received is written to the archive as soon as possible. To do so place a wiretap on the first channel after retrieving the file. This will make sure that the message is archived before processed further. The result should be something as shown below. Note that this same piece of logic could be applied in other flows within the eMagiz platform in a similar manner.
+Each of these options has some advantages and disadvantages. Adhering to the best practices of eMagiz (i.e. no transformation in the entry) the best option would be to use the pass-through line mapper. As the name suggests, this option does nothing except give a string back to the flow on a per-line basis. However, choosing this option means that the actual transformation from that string to XML needs to happen later in the process (most likely in the onramp) with the help of a flat-file to XML transformer (more on that component in a later course).

-[[image:Main.Images.Microlearning.WebHome@novice-file-based-connectivity-archiving--archiving-result.png]]
+The other two options transform the input line into an XML output, so you win one step in the process. However, no standard eMagiz error handling is available when you start transforming data within the entry. So in case something goes wrong, analyzing the error will become more difficult. Furthermore, another potential disadvantage is that when one line fails, the processing of the rest of the file also halts.

-=== 3.2 Clean up the Archive ===
+For the remainder of this microlearning, we will assume that the option pass through line mapper is chosen.

-To ensure that the data is not kept indefinitely we need to clean up the archive. We do so to prevent problems with disk space but also to prevent data leaks of old data that could impact the privacy of others. Before we can set up the logic in eMagiz we need to talk to the customer to see what an acceptable term is within which the data is kept. In most cases, this is a week or two weeks. In this example, we have chosen three days.
+[[image:Main.Images.Microlearning.WebHome@novice-file-based-connectivity-processing-a-file-per-line--flat-file-item-reader-passthrough.png]]

-Now that we know the limit it is time to configure the components. We start with a composite file filter (support object). Within this filter, we at least define how old a file must be before it can be deleted (in milliseconds). If we turn three days into milliseconds we get 259200000. Furthermore, we at least define that we only want to delete regular files.
+As you can see, on the Basic level we are done. However, it is always good to check out the settings on the Advanced tab, especially in this case, to see if there are additional configuration options that could benefit us. The setting of most interest, in this case, is the Lines to Skip setting (default setting is 0). With this setting, you can define whether or not you want to process the header line(s) that exist within your input file. The remaining settings are (in most cases) fine the way eMagiz has set them up.
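The combination of the pass-through line mapper and the Lines to Skip setting described above can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the eMagiz implementation: the function name `pass_through_lines`, the parameter `lines_to_skip`, and the sample data are all hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator


def pass_through_lines(lines: Iterable[str], lines_to_skip: int = 0) -> Iterator[str]:
    """Skip the first `lines_to_skip` lines, then return every line unchanged.

    Hypothetical helper for illustration only; not an eMagiz API.
    """
    if lines_to_skip < 0:
        raise ValueError("lines_to_skip must be zero or positive")
    # islice drops the header line(s) lazily, without copying the input.
    for line in islice(lines, lines_to_skip, None):
        # Pass-through: no parsing or transformation happens here. Turning
        # the string into XML is left to a later step in the flow.
        yield line


# Hypothetical sample input with one header line.
sample = ["id;name", "1;Alice", "2;Bob"]
messages = list(pass_through_lines(sample, lines_to_skip=1))
```

With `lines_to_skip=1` the header line is dropped and the two data lines come out as untouched strings; with the default of 0 every line, including the header, becomes a message.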

-[[image:Main.Images.Microlearning.WebHome@novice-file-based-connectivity-archiving--file-list-filter-for-archive-cleanup.png]]
+[[image:Main.Images.Microlearning.WebHome@novice-file-based-connectivity-processing-a-file-per-line--flat-file-item-reader-passthrough-advanced.png]]

-Having done so we can add a file inbound channel adapter to the canvas including an output channel. Ensure that the property reference for the directory matches the one you have used before in the outbound channel adapter. Furthermore link the filter to the component and define the poller according to the best practice.
+=== 3.2 Poller ===

-[[image:Main.Images.Microlearning.WebHome@novice-file-based-connectivity-archiving--file-inbound-archive-cleanup.png]]
+Now that we have selected and configured the item reader type, it is time to fill in the last part of the configuration, the poller. For polling eMagiz offers three options:

-One thing we should not forget within this configuration is to set the Max messages per poll on the Advanced tab of the poller-configuration to a sufficiently high number (i.e. 50). If you forget to do so and you only check once a day it will mean that only one message will be deleted that day.
+- Fixed Delay Trigger
+- Fixed Rate Trigger
+- Cron Trigger

-[[image:Main.Images.Microlearning.WebHome@novice-file-based-connectivity-archiving--file-inbound-archive-cleanup-max-messages-per-poll.png]]
+Of these options, the cron trigger is used most frequently in eMagiz. The reason is that you can define this option via a property that you can alter without having to alter the flow version in Create.

-Now eMagiz will check on a set time interval whether there are files that are older than three days that are ready for deletion. One last step to go. This last step will ensure that all files that fit the bill will be deleted from the archive. Simply add a standard service activator to the canvas and define the following SpEL expression within the component: payload.delete().
+[[image:Main.Images.Microlearning.WebHome@novice-file-based-connectivity-processing-a-file-per-line--poller-config.png]]

-[[image:Main.Images.Microlearning.WebHome@novice-file-based-connectivity-archiving--archive-cleanup-deletion.png]]
+After finishing all these configuration steps we can press Save to save our work and ensure that we can process the input file on a per-line basis.

-This will ensure that each file that is retrieved will indeed be deleted from the archive.
-
 == 4. Assignment ==

-Configure an entry in which you build the archiving and the cleanup of the archiving.
+Configure an entry in which you define the component and configuration needed to process a file on a per-line basis.
 This assignment can be completed with the help of the (Academy) project that you have created/used in the previous assignment.

 == 5. Key takeaways ==

-* Archiving is used for audit purposes
-* Archiving is used for retry scenarios
-* Ensure that data is cleaned after a retention period to keep in control of the data
-* Don't forget the max messages per poll
+* Easy way of reading a file line by line and sending it to eMagiz (Low on memory)
+* Ability to process each line based on distinctive logic that is relevant on line level
+* Can be used for flat file as well as XML input files
+* Try to avoid complex transformations within the entry

 == 6. Suggested Additional Readings ==

-If you are interested in this topic and want more information on it please read the help text provided by eMagiz and check out the following store content:
+There are no suggested additional readings on this topic.

-* [[File Archiving>>doc:Main.eMagiz Store.Accelerators.File Archiving.WebHome||target="blank"]]
-* [[Delete Folder(s)>>doc:Main.eMagiz Store.Accelerators.Delete Folder(s).WebHome||target="blank"]]
-
 == 7. Silent demonstration video ==

 This video demonstrates how you could have handled the assignment and gives you some context on what you have just learned.

-{{video attachment="novice-file-based-connectivity-characterset.mp4" reference="Main.Videos.Microlearning.WebHome"/}}
+{{video attachment="novice-file-based-connectivity-processing-a-file-per-line.mp4" reference="Main.Videos.Microlearning.WebHome"/}}

 )))((({{toc/}}))){{/container}}{{/container}}