Can method chaining be done in-place in pandas?

Time:10-21

For instance, if I'm performing a series of actions like so:

df = df.groupby(['A', 'B', 'C']).sum().reset_index()

Could I instead perform these actions in-place without having to reassign to the same variable? I recognise that some of these functions have an 'inplace' argument, but how would this work when chaining functions? Similarly, is it possible when you're slicing and reassigning something?

df[to_float] = df[to_float].round(2).astype(str)

CodePudding user response:

Pandas offers several methods for building something that feels like a pipeline, a bit closer to the R/Tidyverse experience:

  • .filter() plays the role of a SELECT statement in SQL: it picks columns by label, and you can even use regex patterns to select some columns from a pandas.DataFrame.
  • .query() filters rows like a WHERE clause in SQL: df.query("col == 'value'"), or df.query("col == @external_variable") to reference a Python variable. (Backticks are for column names that are not valid Python identifiers, not for external variables.)
  • .assign() is used to create and transform columns, a bit like mutate() in the Tidyverse. It's not really Pythonic, because it sometimes involves nested lambda functions: df.assign(newcol=lambda x: x["column"].apply(lambda ...)). You can also use this method with user-defined functions, but passing arguments is not as intuitive as with .pipe().
  • .pipe() applies a function to a DataFrame. This is especially useful with custom functions, but you can also use regular functions (e.g. df.value_counts().pipe(pd.DataFrame)).
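As a rough illustration, the four methods above can be combined into a single chain. The DataFrame, its column names and the target_city variable below are made up for the example:

```python
import pandas as pd

# Hypothetical data, invented for this sketch.
df = pd.DataFrame({
    "city": ["Paris", "Lyon", "Paris", "Nice"],
    "sales_q1": [10.0, 20.0, 30.0, 40.0],
    "sales_q2": [1.0, 2.0, 3.0, 4.0],
})

target_city = "Paris"  # plain Python variable, referenced as @target_city in query()

result = (
    df
    .query("city == @target_city")                          # WHERE-like row filter
    .assign(total=lambda d: d["sales_q1"] + d["sales_q2"])  # add a derived column
    .filter(regex=r"^(city|total)$")                        # SELECT-like column choice
    .pipe(lambda d: d.reset_index(drop=True))               # any function that takes a DataFrame
)

print(result)
```

Each step returns a new DataFrame, so df itself is left untouched; only the final result is bound to a name.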

Of all these, df.assign() is the simplest way to avoid writing df['newcol'] = df.... It returns a new DataFrame rather than modifying the original, so one assignment remains, but you can chain these methods right after pd.read_csv() and the like to shape the data into what you want without multiple intermediate assignments.
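Applied to the two snippets from the question, this could look roughly like the sketch below. The CSV content and the value columns x and y are invented for the example, and io.StringIO stands in for a real file:

```python
import io

import pandas as pd

# Made-up CSV text standing in for a file on disk.
csv_text = (
    "A,B,C,x,y\n"
    "a,b,c,1.231,2.301\n"
    "a,b,c,1.112,0.002\n"
    "d,e,f,3.14159,2.71828\n"
)
to_float = ["x", "y"]  # columns to round and convert, as in the question

df = (
    pd.read_csv(io.StringIO(csv_text))
    .groupby(["A", "B", "C"])
    .sum()
    .reset_index()
    # One callable per column; c=col freezes the column name for each lambda.
    .assign(**{col: (lambda d, c=col: d[c].round(2).astype(str)) for col in to_float})
)

print(df)
```

The name df is bound only once, at the end of the chain, instead of being reassigned after every step.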

In my experience, writing custom functions is one of the better ways to use pandas in a sort of "pipeline"/"flow" manner (see the series of blog posts on "Modern Pandas" by one of the pandas core developers). The core idea is to write functions that take and return DataFrames, and to apply them to the original DataFrame with the .pipe() method.
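A minimal sketch of that idea, with two hypothetical helper functions (total_by and round_as_text are names made up here, not pandas functions) and invented data:

```python
import pandas as pd


def total_by(frame, keys):
    """Sum the numeric columns per group and return a flat DataFrame."""
    return frame.groupby(keys).sum().reset_index()


def round_as_text(frame, columns, decimals=2):
    """Return a new DataFrame with the given columns rounded and cast to str."""
    return frame.assign(
        **{c: frame[c].round(decimals).astype(str) for c in columns}
    )


# Hypothetical input data.
raw = pd.DataFrame({
    "A": ["a", "a", "b"],
    "B": ["x", "x", "y"],
    "value": [1.26, 2.0, 3.04],
})

tidy = (
    raw
    .pipe(total_by, keys=["A", "B"])                        # extra arguments go after the function
    .pipe(round_as_text, columns=["value"], decimals=1)
)

print(tidy)
```

Because each helper returns a new DataFrame, raw keeps its original float column, and the helpers can be reused or tested on their own.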

Hope this is close to what you are looking for!
