Add One Line of SQL to Optimise Your BigQuery Tables | by Matt Chapman | Dec, 2023

December 8, 2023
by Matt Chapman
AI, Syndicated
274 Views

Clustering: A simple way to group similar rows and prevent unnecessary data processing

In my previous article, I explained how to optimise SQL queries using partitioning:

Now, I’m writing the sequel! (Dad joke, anyone?)

This article will look at clustering: another powerful optimisation technique you can use in BigQuery. Like partitioning, clustering can help you write more performant queries that are quicker and cheaper to run. If you want to develop your SQL toolkit and build those higher-level Data Science skills, this is a great place to start.

In BigQuery, a clustered table is a table that keeps similar rows grouped together in physical “blocks”.

For example, picture a table called user_signups that keeps track of all the people registering an account on a fictitious website. It’s got four columns:

registration_date: the date on which the user created an account
country: the country where the user is based
tier: the user’s plan (“Free” or “Paid”)
username: the user’s username

If we wanted, we could cluster the table by country so that users from the same country are stored nearby each other in the table:

Source link

Add One Line of SQL to Optimise Your BigQuery Tables | by Matt Chapman | Dec, 2023

Clustering: A simple way to group similar rows and prevent unnecessary data processing

About Us

Our Services

Latest QSOL IT News

Add One Line of SQL to Optimise Your BigQuery Tables | by Matt Chapman | Dec, 2023

Clustering: A simple way to group similar rows and prevent unnecessary data processing

Related Post

It was great to meet the team at

So long, Kiosk desktop icons

So long, Kiosk desktop icons

Just in time for test-time scaling, we have