Efficient Coding in Data Science: Easy Debugging of Pandas Chained Operations | by Marcin Kozak | Nov, 2023

PYTHON

How to inspect Pandas frames in chained operations without breaking the chain into separate statements

Marcin Kozak
Towards Data Science
Debugging chained Pandas operations without breaking the chain is possible. Photo by Miltiadis Fragkidis on Unsplash

Debugging lies at the heart of programming. I wrote about this in a separate article.

This statement is quite general and language- and framework-independent. Whenever you use Python, you need to debug, irrespective of whether you’re conducting data analysis, writing an ML software product, or creating a Streamlit or Django app.

This article discusses debugging Pandas code, or rather a specific scenario of debugging Pandas code in which operations are chained into a pipe. Such debugging poses a challenge. If you don’t know how to approach it, chained Pandas operations can seem far more difficult to debug than regular Pandas code, that is, Pandas operations using typical assignment with square brackets.

To debug regular Pandas code using typical assignment with square brackets, it’s enough to add a Python breakpoint and use the pdb interactive debugger. This would look something like this:

>>> import pandas as pd
>>> d = pd.DataFrame(dict(
...     x=[1, 2, 2, 3, 4],
...     y=[.2, .34, 2.3, .11, .101],
...     group=["a", "a", "b", "b", "b"]
... ))
>>> d["xy"] = d.x + d.y
>>> breakpoint()  # execution pauses here, so you can inspect d in pdb
>>> d = d[d.group == "a"]

Unfortunately, you can’t do that when the code consists of chained operations, like here:

>>> d = d.assign(xy=lambda df: df.x + df.y).query("group == 'a'")

or, depending on your preference, here:

>>> d = d.assign(xy=d.x + d.y).query("group == 'a'")
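The two chains give the same result here, but they are not interchangeable in general. In the lambda form, `assign()` calls the function on the frame as it exists at that point in the chain, while `d.x + d.y` is evaluated eagerly against the original `d` before the chain runs. The following is a minimal sketch that checks this using the example data; the variable names are illustrative only.

```python
import pandas as pd

d = pd.DataFrame(dict(
    x=[1, 2, 2, 3, 4],
    y=[.2, .34, 2.3, .11, .101],
    group=["a", "a", "b", "b", "b"],
))

# The lambda receives the intermediate frame produced by the chain so far.
with_lambda = d.assign(xy=lambda df: df.x + df.y).query("group == 'a'")

# Here d.x + d.y is computed from the original d before assign() runs.
without_lambda = d.assign(xy=d.x + d.y).query("group == 'a'")

# Identical here, because assign() is the first step of the chain.
pd.testing.assert_frame_equal(with_lambda, without_lambda)

# After an earlier step changes x, only the lambda sees the new values.
scaled_lambda = d.assign(x=d.x * 10).assign(xy=lambda df: df.x + df.y)
scaled_eager = d.assign(x=d.x * 10).assign(xy=d.x + d.y)  # uses the old x

print(scaled_lambda.loc[0, "xy"], scaled_eager.loc[0, "xy"])
```

This difference matters when you later add or reorder steps in a chain: the lambda form keeps each step tied to the frame it actually receives.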

In this case, there is no place to stop and inspect the intermediate data frame; you can only do so before or after the chain. Thus, one of the solutions is to break the main chain into two sub-chains (two pipes) in a…
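A minimal sketch of splitting the chain into two sub-chains, with the details filled in under our own assumptions rather than taken from the author: the first sub-chain ends in an assignment to a hypothetical `intermediate` variable, you inspect that frame, and the second sub-chain continues from it.

```python
import pandas as pd

d = pd.DataFrame(dict(
    x=[1, 2, 2, 3, 4],
    y=[.2, .34, 2.3, .11, .101],
    group=["a", "a", "b", "b", "b"],
))

# First sub-chain: every step up to the point you want to inspect.
intermediate = d.assign(xy=lambda df: df.x + df.y)

# Inspection point. In an interactive session you could call breakpoint()
# here and examine `intermediate` in pdb; printing it works as well.
print(intermediate)

# Second sub-chain: continue from the inspected frame.
result = intermediate.query("group == 'a'")
print(result)
```

The cost of this approach is an extra named variable and a chain that no longer reads as a single pipe, which is exactly the trade-off that inspecting frames inside the chain aims to avoid.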
