Backfilling Mastery: Elevating Data Engineering Expertise | by Naser Tamimi | Nov, 2023

November 17, 2023
by Naser Tamimi

DATA ENGINEERING

A go-to guide for data engineers wading through the backfilling maze

Imagine starting a new data pipeline and getting data from a source you’ve never parsed before (e.g., pulling info from an API or an existing Hive table). Now, you’re on a mission to make it seem like you collected this data ages ago. That’s one example of what we call data backfilling in data engineering.

But it’s not just about starting a new data pipeline or table. You could have a table that’s been gathering data for a while, and suddenly you need to change the data (for example, due to a new metric definition) or toss in more data from a new data source. Or maybe there’s an awkward gap in your data, and you just want to patch it up. All these situations are examples of data backfilling. The common thread is turning “back” in time and “filling” up your table with some historical data.

The following figure (Figure 1) shows a straightforward backfilling scenario. In this instance, a daily job retrieves data from two upstream sources (one for platform A and another for platform B). The dataset is partitioned first by date (‘ds’) and then by platform as a sub-partition. Unfortunately, data for the period from 2023-10-03 to 2023-10-05 is absent due to certain issues. To address this gap, a backfilling operation was initiated (the backfilling job started on 2023-10-08).
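To make the scenario concrete, here is a minimal Python sketch of how the gap could be found before any backfill runs. It is an illustration under assumptions, not the article's own method: the function names `expected_partitions` and `missing_partitions` are hypothetical, the 2023-10-01 to 2023-10-07 window is chosen only for the example, and the list of existing partitions is hard-coded where a real job would query the table's metadata.

```python
from datetime import date, timedelta

def expected_partitions(start, end, platforms):
    """Every (ds, platform) pair the table should hold from start to end, inclusive."""
    days = (end - start).days + 1
    return [
        ((start + timedelta(days=i)).isoformat(), platform)
        for i in range(days)
        for platform in platforms
    ]

def missing_partitions(existing, start, end, platforms):
    """Expected (ds, platform) pairs that are absent from the existing partitions."""
    have = set(existing)
    return [p for p in expected_partitions(start, end, platforms) if p not in have]

# Hypothetical table state: 2023-10-03 through 2023-10-05 never landed.
platforms = ["A", "B"]
landed_days = ["2023-10-01", "2023-10-02", "2023-10-06", "2023-10-07"]
existing = [(ds, platform) for ds in landed_days for platform in platforms]

to_backfill = missing_partitions(
    existing, date(2023, 10, 1), date(2023, 10, 7), platforms
)

for ds, platform in to_backfill:
    # A real backfill job would re-run the daily load for this partition here.
    print(f"backfill ds={ds} platform={platform}")
```

With the assumed table state, the loop lists six partitions to reload: three missing days for each of the two platforms. Working per (ds, platform) pair keeps each reload small and lets a failed partition be retried without touching the others.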

A brief heads-up before proceeding further: in data engineering, we normally encounter two scenarios: “backfilling” a table or “restating” a table. These processes share some similarities but differ in subtle ways. Backfilling is about populating missing or incomplete data in a dataset. It is commonly used to update historical data or fill gaps. Conversely, restating a table involves effecting substantial…
