Character set
In this microlearning, we’ll cover how to define the character set in eMagiz to ensure proper data processing. A character set is essentially a collection of characters supported by software and hardware, and different systems may use different sets. eMagiz defaults to UTF-8, but you may need to adjust this if you're dealing with systems that use other character sets. We’ll guide you through configuring the character set for file-based connectivity to ensure seamless communication with external systems.
Should you have any questions, please contact academy@emagiz.com.
1. Prerequisites
- Basic knowledge of the eMagiz platform
2. Key concepts
This microlearning centers around learning how to define the character set to ensure that eMagiz processes the information correctly.
By character set, we mean the collection of characters that computer software and hardware use and support. Each character in the set is defined by a code, bit pattern, or natural number.
- Some external systems talk in a different character set
- eMagiz uses UTF-8 as its default character set and assumes every other system does the same
- In case of a mismatch, correct it at the point where you talk with the other system (i.e. entry or exit)
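To make the effect of a mismatch concrete, here is a minimal sketch in plain Python. It is not eMagiz configuration; the sample text and variable names are made up for illustration. It shows what happens when bytes written in windows-1252 are read as UTF-8, and the other way around.

```python
# Illustration only: plain Python, not eMagiz configuration.
text = "café"

# The external system writes the text as windows-1252 bytes.
sent_bytes = text.encode("windows-1252")

# Reading those bytes as UTF-8 fails, because the single byte 0xE9
# is not a complete UTF-8 sequence.
try:
    sent_bytes.decode("utf-8")
    mismatch_detected = False
except UnicodeDecodeError:
    mismatch_detected = True

# Reading with the character set the sender actually used restores the text.
restored = sent_bytes.decode("windows-1252")

# The reverse mismatch raises no error but silently garbles the text.
garbled = text.encode("utf-8").decode("windows-1252")

print(mismatch_detected, restored, garbled)
```

The second case is the more dangerous one in practice: nothing fails, but the receiving system ends up with corrupted characters.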
3. Character set
In some cases, the input you receive from, or the output you need to send to, an external party cannot handle all characters, or it is written in a specific character set. In this microlearning, we will learn how you can define the character set for file-based connectivity to ensure that you can process and deliver files according to the specifications.
Sometimes external systems only talk in a specific character set. To ensure that all data is communicated properly between eMagiz and the other system, we need to determine which character set that is and tell eMagiz about it via a component. That way eMagiz will deviate from its default (i.e. UTF-8) and process the file according to that character set. In practice, windows-1252 is the alternative we see most often. In various components that deal with file handling, you can define the character set on which eMagiz should act. Examples of such components are:
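As an illustration of a target that "cannot handle all characters", the following sketch uses plain Python rather than an eMagiz component. The helper name `encode_for_target` is hypothetical. It encodes text for a system that only accepts windows-1252 and fails for a character that this character set does not contain.

```python
# Illustration only: plain Python, not eMagiz configuration.
def encode_for_target(text: str, charset: str) -> bytes:
    """Encode text for an external system; raises if a character is unsupported."""
    # Python's default error handling is strict, so unsupported characters raise.
    return text.encode(charset)

# The euro sign exists in windows-1252, so this succeeds.
euro_bytes = encode_for_target("€ 10", "windows-1252")

# The Greek capital omega does not exist in windows-1252, so this fails.
try:
    encode_for_target("Ω", "windows-1252")
    omega_supported = True
except UnicodeEncodeError:
    omega_supported = False

print(euro_bytes, omega_supported)
```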
- File to string transformer
- Flat file to XML transformer
- File outbound channel adapter
In all these components you have the option to define the character set within the Advanced tab of the component. In this microlearning, we will use the File to string transformer to illustrate how that will look.
In the character set field on the Advanced tab, you can define the character set of your choice. To make this work in eMagiz, navigate to the Create phase and open the entry flow in which you retrieve the file. Within this flow, we need to add functionality that ensures the correct character set is used. To do so, first enter "Start Editing" mode on flow level. After that, open the File to string transformer, navigate to the Advanced tab, and fill in the correct character set. Once you have defined the correct character set, the only thing left to do is to Save the component. See the suggested additional readings section for the complete list of character sets that are supported by Java 8.
Congratulations, you have successfully learned how to specify the character set.
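Conceptually, setting the character set on a file-reading component is comparable to passing an explicit encoding when reading a file in code. The sketch below is plain Python, not the eMagiz implementation; the function name `file_to_string`, the file name, and the sample content are assumptions made for illustration.

```python
# Illustration only: plain Python, not the eMagiz File to string transformer.
import tempfile
from pathlib import Path

def file_to_string(path: Path, charset: str = "utf-8") -> str:
    """Read a file into a string, decoding its bytes with the given character set."""
    return path.read_text(encoding=charset)

with tempfile.TemporaryDirectory() as tmp:
    # Simulate a file delivered by an external system that writes windows-1252.
    sample = Path(tmp) / "order.txt"
    sample.write_bytes("Müller, Zoë".encode("windows-1252"))

    # Deviate from the UTF-8 default by naming the character set explicitly.
    content = file_to_string(sample, charset="windows-1252")

print(content)
```

Leaving `charset` at its UTF-8 default in this sketch would raise a decoding error for the same file, which mirrors why the character set has to be set at the point where the file enters the flow.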
4. Key takeaways
- Some external systems talk in a different character set
- eMagiz uses UTF-8 as its default character set and assumes every other system does the same
- In case of a mismatch, correct it at the point where you talk with the other system (i.e. entry or exit)
- eMagiz provides several components within which you can define the character set