Is it possible to do method chaining in pandas when
- no variable referring to the dataframe has been assigned yet,
- AND the method needs to refer to the dataframe?
Example: here the dataframe can be referred to by its variable name.
import pandas as pd

df = pd.DataFrame({"a":[1,2,3], "b":list("abc")})
df = (df
.drop(df.tail(1).index)
#.other_methods
#...
)
df
Is it possible to do this without having assigned the dataframe to a variable name?
df = (pd.DataFrame({"a":[1,2,3], "b":list("abc")})
.drop(??.tail(1).index)
#.other_methods
#...
)
df
Thanks!
User response:
You need some reference to the dataframe in order to use it in multiple independent places. That means binding a reusable name to the value returned by pd.DataFrame.
A "functional" way to create such a binding is to use a lambda expression instead of an assignment statement.
df = (lambda df: df.drop(df.tail(1).index)....)(pd.DataFrame(...))
The lambda expression defines a function that binds whatever value is passed as its argument to the name df; you then immediately call that function on your original dataframe.
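A minimal runnable sketch of this pattern, applied to the question's dataframe. The lambda parameter is named d here only to avoid confusion with the outer df, and the chained .reset_index(drop=True) is just an illustrative stand-in for the question's #.other_methods placeholder:

```python
import pandas as pd

# The lambda binds the freshly built dataframe to the name `d`
# and is called immediately with that dataframe as its argument.
df = (lambda d: (d
                 .drop(d.tail(1).index)   # drop the last row, using the reference `d`
                 .reset_index(drop=True)  # stand-in for further chained methods
                 ))(pd.DataFrame({"a": [1, 2, 3], "b": list("abc")}))

print(df)
```

Inside the lambda, d can be used as often as needed, so any number of methods in the chain can refer back to the original dataframe.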
User response:
As a complement to @chepner's answer, note that some methods and indexers natively accept a function (such as a lambda), which is called with the dataframe at that point in the chain:
Example with assign and loc:
(pd.DataFrame({"A":[1,2,3], "B":list("abc")})
.assign(C=lambda d: d['A']*2)
.loc[lambda d: d['B'] == 'a']
)
output:
   A  B  C
0  1  a  2
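Applied to the question's own case, a sketch: DataFrame.pipe also accepts a function and calls it with the dataframe produced so far, so the original drop(...tail(1).index) call can stay inside the chain. As an alternative, .loc accepts a callable that here returns the index labels of all rows except the last one. The names df_pipe and df_loc are just illustrative:

```python
import pandas as pd

# pipe() calls the lambda with the current dataframe,
# so `d` refers to it without any prior assignment.
df_pipe = (pd.DataFrame({"a": [1, 2, 3], "b": list("abc")})
           .pipe(lambda d: d.drop(d.tail(1).index))
           )

# .loc with a callable: the lambda returns every index label
# except the last one, so all rows but the final one are kept.
df_loc = (pd.DataFrame({"a": [1, 2, 3], "b": list("abc")})
          .loc[lambda d: d.index[:-1]]
          )

print(df_pipe)
print(df_loc)
```

pipe is the more general option, since any method can be called on d inside the lambda; the .loc form only works when the desired rows can be expressed as labels or a boolean mask.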