Pandas method chaining when df not assigned yet

Time:01-07

Is it possible to do method chaining in pandas when

  • no variable referring to the dataframe has been assigned yet,
  • AND the method needs to refer to the dataframe?

Example: here the dataframe can be referred to by its variable name.

import pandas as pd

df = pd.DataFrame({"a":[1,2,3], "b":list("abc")})
df = (df
      .drop(df.tail(1).index)
      #.other_methods
      #...
      )
df

Is it possible to do this without having assigned the dataframe to a variable name?

df = (pd.DataFrame({"a":[1,2,3], "b":list("abc")})
      .drop(??.tail(1).index)
      #.other_methods
      #...
      )
df

Thanks!

Answer:

You need some reference to the dataframe in order to use it in multiple independent places. That means binding a reusable name to the value returned by pd.DataFrame.

A "functional" way to create such a binding is to use a lambda expression instead of an assignment statement.

df = (lambda df: df.drop(df.tail(1).index)....)(pd.DataFrame(...))

The lambda expression defines an anonymous function whose parameter df is bound to whatever value is passed in as the argument; you then immediately call that function on the newly constructed dataframe.
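A minimal runnable sketch of that pattern, assuming the sample dataframe from the question and no further chained methods. Inside the lambda, the parameter df names the freshly built frame, so df.tail(1).index refers to that same object.

```python
import pandas as pd

# Build the frame and pass it straight into an anonymous function;
# the lambda's parameter df is bound to the new dataframe, so it can
# be referenced more than once without an assignment statement.
df = (lambda df: df.drop(df.tail(1).index))(
    pd.DataFrame({"a": [1, 2, 3], "b": list("abc")})
)

# Only the rows with index labels 0 and 1 remain.
print(df)
```

Further methods can be chained onto df.drop(...) inside the lambda body, where df is still in scope.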

Answer:

As a complement to @chepner's answer, note that some methods and indexers natively accept a function (such as a lambda), which pandas calls with the current dataframe in the chain:

Example with assign and loc:

import pandas as pd

(pd.DataFrame({"A":[1,2,3], "B":list("abc")})
 .assign(C=lambda d: d['A']*2)
 .loc[lambda d: d['B'] == 'a']
)

output:

   A  B  C
0  1  a  2
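Applied to the original question, a sketch (not taken from either answer above) using two existing pandas features: DataFrame.pipe, which calls a function with the current frame, and the callable form of .loc, here returning every index label except the last. Both assume the question's sample data; the names data, via_pipe and via_loc are illustrative only.

```python
import pandas as pd

data = {"a": [1, 2, 3], "b": list("abc")}

# pipe hands the intermediate dataframe to the lambda as d,
# so d.tail(1).index refers to the frame at this point in the chain.
via_pipe = (pd.DataFrame(data)
            .pipe(lambda d: d.drop(d.tail(1).index))
            )

# .loc also accepts a callable that receives the frame; returning
# all index labels but the last keeps every row except the final one.
via_loc = (pd.DataFrame(data)
           .loc[lambda d: d.index[:-1]]
           )

print(via_pipe)
print(via_loc)
```

pipe keeps the original drop(...) call unchanged, which makes it the closer fit when an existing expression already refers to the dataframe by name.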