An End-To-End Tutorial for Beginners
In this story I will talk about one of the most popular ways to run data transformation tasks: batch data processing. This data pipeline design pattern is especially useful when data needs to be processed in chunks, which makes it a good fit for ETL jobs that run on a schedule. I will demonstrate how to build such a data transformation pipeline using MySQL and Athena, and we will use infrastructure as code to deploy it in the cloud.
Imagine that you have just joined a company as a Data Engineer. Their data stack is modern, event-driven, cost-effective, flexible, and scales easily to meet growing data volumes. External data sources and data pipelines in your data platform are managed by the data engineering team using a flexible environment setup with CI/CD GitHub integration.
As a data engineer, you need to create a business intelligence dashboard that displays the geography of company revenue streams, as shown below. Raw payment data is stored in a server database (MySQL). You want to build a batch pipeline that extracts data from that database daily, uses AWS S3 to store the data files, and uses Athena to process them.
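To make the idea concrete, here is a minimal sketch of what the daily extract-and-load step could look like. It assumes a hypothetical MySQL table called `payments` and a hypothetical S3 bucket called `my-datalake-bucket`; in the tutorial the actual resources are created with infrastructure as code, so this is only an illustration of the batch extract pattern, not the final implementation.

```python
# A minimal sketch of the daily extract step: pull yesterday's payments from
# MySQL and upload them to S3 as a date-partitioned CSV file that Athena can
# query later. Table, host and bucket names are hypothetical placeholders.
import csv
import io
from datetime import date, timedelta

import boto3
import pymysql


def extract_payments_to_s3(run_date: date) -> str:
    """Extract one day of payments from MySQL and upload the batch to S3."""
    connection = pymysql.connect(
        host="mysql.example.internal",  # hypothetical host
        user="etl_user",
        password="...",                 # use a secrets manager in practice
        database="sales",
    )
    try:
        with connection.cursor() as cursor:
            cursor.execute(
                "SELECT payment_id, country, amount, created_at "
                "FROM payments WHERE DATE(created_at) = %s",
                (run_date.isoformat(),),
            )
            rows = cursor.fetchall()
    finally:
        connection.close()

    # Write the rows to an in-memory CSV buffer.
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["payment_id", "country", "amount", "created_at"])
    writer.writerows(rows)

    # Upload the daily batch, partitioned by date, so Athena can scan only
    # the partitions it needs.
    key = f"payments/dt={run_date.isoformat()}/payments.csv"
    boto3.client("s3").put_object(
        Bucket="my-datalake-bucket",
        Key=key,
        Body=buffer.getvalue().encode("utf-8"),
    )
    return key


if __name__ == "__main__":
    # Process yesterday's batch, as a daily schedule typically would.
    extract_payments_to_s3(date.today() - timedelta(days=1))
```

In a scheduled setup this script would be triggered once a day (for example by a cron-like scheduler or an orchestration service), which is exactly the batch pattern this tutorial builds out.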
Batch data pipeline
A data pipeline can be thought of as a sequence of data processing steps. Because the stages are connected by a logical data flow, each stage produces an output that serves as the input to the following stage.
There is a data pipeline whenever there is data processing between points A and B.
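As a toy illustration of that "output of one stage feeds the next" idea, consider the sketch below. The stage functions and data are made up for demonstration; in a real pipeline they would be replaced by actual extract, transform, and load steps.

```python
# A toy pipeline with three chained stages; each stage consumes the output
# of the previous one. All data here is invented for illustration only.
def extract():
    # Stage 1: produce raw payment records.
    return [{"country": "US", "amount": 120.0}, {"country": "DE", "amount": 80.0}]


def transform(rows):
    # Stage 2: aggregate revenue per country.
    totals = {}
    for row in rows:
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    return totals


def load(totals):
    # Stage 3: deliver the result (here, simply print it).
    for country, revenue in totals.items():
        print(country, revenue)


# The pipeline: data flows from point A (extract) to point B (load).
load(transform(extract()))
```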
Data pipelines differ in their conceptual and logical nature. I previously wrote about this here [1]: