{{container}}
{{container layoutStyle="columns"}}
(((

In this microlearning, you will learn how to set up a data pipeline between Mendix and AWS Redshift with the help of eMagiz.
Such a data pipeline lets you transfer large volumes of data between Mendix and AWS Redshift for data warehousing and BI analytics purposes.

Should you have any questions, please contact academy@emagiz.com.

== 1. Prerequisites ==

* Basic knowledge of the eMagiz platform
* Knowledge of AWS Redshift
* An AWS Redshift license
* Basic knowledge of the Mendix platform

== 2. Key concepts ==

This microlearning centers around configuring a data pipeline that transfers large volumes of data between Mendix and AWS Redshift.

With a data pipeline we mean: an integration pattern that can transfer large volumes of data between a specific set of source and sink systems.
With AWS Redshift we mean: an AWS offering that is best suited for data warehousing and gives us the ability to create condensed overviews based on various source tables.

To make this topic a little more concrete, we start with a business question to which we want an answer.
The business question is as follows:

* What were the total sales per event?

== 3. Data pipeline - Mendix to AWS Redshift ==

Imagine you are a global event organizer and want to track which events were hits and which were misses during the year.
By using a materialized view in AWS Redshift, you can easily combine the information from several source tables to create an overview of exactly what you want to see.
In this case, a table with two attributes would suffice:

* eventname
* total_sales

As a data pipeline always transfers data from a source to a sink system, we will start at the source of this data pipeline.
Afterward, we will continue with the transfer part and finish at the sink system.

=== 3.1 Setting up Mendix ===

To let the data pipeline retrieve data from your Mendix application, you need to publish your data via a Published OData service.

[[image:Main.Images.Microlearning.WebHome@intermediate-datapipelines-datapipeline-mendix-to-aws-redshift--publish-odata-service-mendix.png]]

The result of this action should look similar to this:

[[image:Main.Images.Microlearning.WebHome@intermediate-datapipelines-datapipeline-mendix-to-aws-redshift--publish-odata-service-mendix-result.png]]

To add resources to this OData service, you can press the Add icon on this screen or navigate to the domain model to add the relevant resources.
Regardless of the option you choose, the end result should look something like this:

[[image:Main.Images.Microlearning.WebHome@intermediate-datapipelines-datapipeline-mendix-to-aws-redshift--publish-odata-service-mendix-added-resources.png]]

Now we have successfully published the relevant data that we need to answer our question.

=== 3.2 Setting up eMagiz ===

Now that we have made the relevant information available, we need to make sure we can retrieve it via an eMagiz data pipeline.
Because we have two tables (Sales and Events), we need two data pipelines.

==== 3.2.1 Design ====

In Design, the only thing you need to do for a data pipeline is to select the option Data pipeline when editing the integration. There is no need to set up a data model for it. Don't forget to do the same for the other data pipeline that you need to configure.

==== 3.2.2 Create ====

After you have transferred the data pipelines to Create, you can edit the entry flows. You need to import the Store item for Redshift via the Store panel on the left-hand side. Please consult the eMagiz Store documentation: [[Mendix to Redshift store component>>url:https://docs.emagiz.com/bin/view/Main/eMagiz%20Store/Accelerators/Mendix%20to%20Redshift%20-%20Data%20Pipeline||target="blank"]].

As you can see, the complete setup is already created for you and the various parts have been nicely organized and separated. Let us do a quick walkthrough:

* Starting at the top left corner, we have the Job configuration.
* This configuration specifies the source system (Item reader), the transformation (Item processor), and the sink system (Item writer). All of them are pre-filled for you.
* Next to that, you have all the support objects needed to run the flow.
* One down, we have the job launch configuration.
* This configuration makes sure that the job, with certain parameters, is launched at a certain point in time. It is up to the user to determine this point in time.
* In the left corner, we have the job listener configuration.
* This piece of the flow waits for a signal that the job is complete and, upon completion, refreshes the materialized view (thereby updating the overview we want).
* In the bottom center, we have some specific support objects that are relevant for this particular data pipeline implementation.
* Last but not least, in the bottom right corner we have the functionality that automatically cleans up the job dashboard.

The only thing you need to do is rename the properties: replace message with the technical name of the integration you have just added (i.e., sale and event).

==== 3.2.3 Deploy ====

Although you don't need to build this flow yourself, you still need to fill in various property values to make sure that you retrieve the correct data from Mendix and send it to the correct table in AWS Redshift. A complete list of these properties can be found at the flow level (under Other -> Properties). Before you can activate your data pipeline, we first have to ensure that AWS Redshift is ready to receive the data.

=== 3.3 Setting up AWS Redshift ===

We will start by creating two separate tables (one for sales and one for events) to make sure that we can later fill and update them.
An example SQL query could be:

{{code language="sql"}}
CREATE TABLE IF NOT EXISTS sales_dp
(
    salesid int8 PRIMARY KEY,
    customer varchar(255),
    price decimal,
    eventid int8
);
{{/code}}
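
The materialized view defined below joins this sales table to an event table (public.event_dp), so that table has to exist as well. As a minimal sketch, assuming the event resource only exposes an id and a name, the statement could look like this:

{{code language="sql"}}
-- Hypothetical example: the event table referenced by the materialized view below.
-- Align the column names and types with the resources your Mendix OData service actually exposes.
CREATE TABLE IF NOT EXISTS event_dp
(
    eventid int8 PRIMARY KEY,
    eventname varchar(255)
);
{{/code}}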

Furthermore, we create a materialized view with the help of the following SQL query:

{{code language="sql"}}
CREATE MATERIALIZED VIEW tickets_mv AS
SELECT event.eventname, SUM(sales.price) AS total_sales
FROM public.sales_dp sales, public.event_dp event
WHERE sales.eventid = event.eventid
GROUP BY event.eventname;
{{/code}}

See below for an explanation of how this SQL query works.

[[image:Main.Images.Microlearning.WebHome@intermediate-datapipelines-datapipeline-mendix-to-aws-redshift--datapipeline-create-materialized-view.png]]
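
As described in the Create walkthrough above, the job listener refreshes this materialized view once a job completes. Should you want to verify the behavior yourself, you can also trigger the refresh manually with the standard Redshift command (shown here purely as an illustration):

{{code language="sql"}}
-- Recompute tickets_mv so it reflects the rows currently in sales_dp and event_dp
REFRESH MATERIALIZED VIEW tickets_mv;
{{/code}}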

=== 3.4 Running the data pipeline ===

Now that we have configured Mendix (source), eMagiz (transfer), and AWS Redshift (sink), it is time to run the data pipelines. Don't forget to configure the properties correctly.

To illustrate the effect of setting up your data pipeline via eMagiz to refresh the materialized view that can answer our question, we have two pictures: a before and an after picture.

[[image:Main.Images.Microlearning.WebHome@intermediate-datapipelines-datapipeline-mendix-to-aws-redshift--datapipeline-materialized-view-before.png]]

[[image:Main.Images.Microlearning.WebHome@intermediate-datapipelines-datapipeline-mendix-to-aws-redshift--datapipeline-materialized-view-after.png]]
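
If you prefer to check the result directly in Redshift rather than via the pictures above, a simple query against the materialized view answers the business question (shown as an illustration; the view name matches the one created earlier):

{{code language="sql"}}
-- Total sales per event, answering the business question from section 2
SELECT eventname, total_sales
FROM tickets_mv
ORDER BY total_sales DESC;
{{/code}}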

== 4. Key takeaways ==

* A data pipeline is useful when transferring large volumes of data without the need for transformation.
* With the help of materialized views in AWS Redshift, you can easily zoom in on and aggregate data for BI reporting after it has been loaded.
* Data pipelines are a standardized piece of software in eMagiz that can be implemented with ease.

== 5. Suggested Additional Readings ==

If you are interested in this topic and want more information, please read the help text provided by eMagiz and visit the following links:

* [[Materialized view overview>>https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-overview.html||target="blank"]]
* [[Materialized view SQL command>>https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-create-sql-command.html||target="blank"]]
* [[Published OData Service>>https://docs.mendix.com/refguide/published-odata-services||target="blank"]]
)))

((({{toc/}}))){{/container}}
{{/container}}