Pandas method chaining when df not assigned yet

Is it possible to do method chaining in pandas when no variable referring to the dataframe has been assigned yet, AND the method needs to refer to the dataframe?

Example: here the dataframe can be referred to by its variable name.

import pandas as pd

df = pd.DataFrame({"a":[1,2,3], "b":list("abc")})
df = (df
      .drop(df.tail(1).index)  # drop the last row, referring to df by name
      #.other_methods
      #...
      )
df

Is it possible to do this without having assigned the dataframe to a variable name?

df = (pd.DataFrame({"a":[1,2,3], "b":list("abc")})
      .drop(??.tail(1).index)
      #.other_methods
      #...
      )
df

Thanks!

Answer:

You need some reference to the dataframe in order to use it in multiple independent places. That means binding a reusable name to the value returned by pd.DataFrame.

A "functional" way to create such a binding is to use a lambda expression instead of an assignment statement.

df = (lambda df: df.drop(df.tail(1).index)....)(pd.DataFrame(...))

The lambda expression defines a function whose parameter df takes whatever value is passed as the argument; you then immediately call that function on your original dataframe, so inside the body df refers to that dataframe.
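A runnable sketch of that immediately-invoked lambda, assuming the question's sample data and using the question's drop-the-last-row step as the only method in the chain:

```python
import pandas as pd

# The lambda's parameter `df` is bound to the freshly built dataframe,
# so the body can refer to it as many times as it needs.
df = (lambda df: (df
                  .drop(df.tail(1).index)  # drop the last row
                  # .other_methods ...
                  ))(pd.DataFrame({"a": [1, 2, 3], "b": list("abc")}))
print(df)
```

No name is assigned before the chain runs; the only binding is the lambda parameter, which exists just for the duration of the call.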

Answer:

As a complement to @chepner's answer, note that some methods and indexers accept a function (such as a lambda) natively; pandas calls it with the dataframe at that point in the chain:

Example with assign and loc:

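import pandas as pd

(pd.DataFrame({"A":[1,2,3], "B":list("abc")})
 .assign(C=lambda d: d['A']*2)   # d is the dataframe at this point in the chain
 .loc[lambda d: d['B'] == 'a']   # the callable returns a boolean mask
)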

output:

   A  B  C
0  1  a  2
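For the question's drop-the-last-row case, DataFrame.drop does not take a callable, but DataFrame.pipe does: it calls the given function with the dataframe produced so far. A minimal sketch, swapping in pipe and assuming the question's sample data:

```python
import pandas as pd

# DataFrame.pipe(func) calls func(df) with the dataframe produced so far,
# so the lambda parameter `d` names it mid-chain without any assignment.
df = (pd.DataFrame({"a": [1, 2, 3], "b": list("abc")})
      .pipe(lambda d: d.drop(d.tail(1).index))  # drop the last row
      # .other_methods ...
      )
print(df)
```

This keeps the chain reading top to bottom, whereas the outer-lambda approach wraps the whole chain in a function call.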