I am trying to write a general function that subsets a DataFrame based on a list of column names passed as an argument. The function works when one column is specified, but fails when more than one is specified. I would like it to accept multiple columns as the argument.
import pandas as pd
testdb=pd.DataFrame({'first':[1,3,4],'second':[1,3,4],'last':[1,3,4],'static':[1,3,4]})
def subsetting(df,cols):
    print(df.loc[:, [cols, "static"]])
# this works
subsetting(testdb,'first')
# this does not work
subsetting(testdb,str('first','second'))
CodePudding user response:
I would design the API like this.
You can pass in a list (or tuple) of the columns
that you want to select from the DataFrame and then use
df.loc[:, [*cols, "static"]]
to unpack it into separate column names, e.g.:
>>> import pandas as pd
>>>
>>> testdb = pd.DataFrame(
... {"first": [1, 3, 4], "second": [1, 3, 4], "last": [1, 3, 4], "static": [1, 3, 4]}
... )
>>>
>>>
>>> def subsetting(df, cols):
...     print(df.loc[:, [*cols, "static"]])
...
>>>
>>> subsetting(testdb, cols=("first", "second"))
   first  second  static
0      1       1       1
1      3       3       3
2      4       4       4
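
One caveat with unpacking: a bare string is itself iterable, so subsetting(testdb, "first") would unpack into the single characters "f", "i", "r", ... and raise a KeyError. If you want the function to accept either one column name or several, a minimal sketch could look like the following. The `always` parameter is a hypothetical addition for illustration, and returning the subset instead of printing it is my own choice, not part of the original function.

```python
import pandas as pd


def subsetting(df, cols, always=("static",)):
    """Return the requested columns plus the always-included ones.

    `always` is a hypothetical parameter added for this sketch.
    """
    # A bare string is iterable, so [*"first"] would yield single characters;
    # wrap it in a list before unpacking.
    if isinstance(cols, str):
        cols = [cols]
    # dict.fromkeys keeps the first occurrence of each name in order,
    # so passing "static" explicitly does not duplicate the column.
    wanted = list(dict.fromkeys([*cols, *always]))
    return df.loc[:, wanted]


testdb = pd.DataFrame(
    {"first": [1, 3, 4], "second": [1, 3, 4], "last": [1, 3, 4], "static": [1, 3, 4]}
)

print(subsetting(testdb, "first"))              # single column name
print(subsetting(testdb, ["first", "second"]))  # several column names
```

Returning the DataFrame leaves it up to the caller whether to print it, assign it, or chain further operations.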