I have a dataframe which I want to group, filter columns by regex, and then sum.
My code looks like this:
import pandas as pd
df = pd.DataFrame({'ID':[1,1,2,2,3,3],
'Invasive' : [12,1,1,0,1,0],
'invasive': [1,4,5,3,4,6],
'Wild':[4,7,1,0,0,0],
'wild':[0,0,9,8,3,2],
'Crop':[0,0,0,0,0,0],
'Crop_2':[2,3,2,2,1,2]})
df.groupby(['ID']).filter(regex='(Invasive)|(invasive)|(Wild)|(wild)').sum()
The error message I get is:
DataFrameGroupBy.filter() missing 1 required positional argument: 'func'
I get the same Err message if groupby
comes after filter
Why does this happen? Where do I input the func
argument?
EDIT:
My Expected output is one column that has summed across the filtered columns and is grouped by ID. E.g.:
ID Output
0 1 29
1 2 27
2 3 16
CodePudding user response:
What you want to do doesn't make sense, groupby.filter
is to filter rows, not to be confused with DataFrame.filter
.
You likely want to filter the columns, then to aggregate:
df.filter(regex='(?i)(Invasive|Wild)').groupby(df['ID']).sum()
NB. I replaced (Invasive)|(invasive)|(Wild)|(wild)
by (?i)(Invasive|Wild)
, which means 'InvasiveOR
Wild` independently of the case.
Output:
Invasive invasive Wild wild
ID
1 13 5 11 0
2 1 8 1 17
3 1 10 0 5
edit:
the output that you show needs a further summation per row:
out = (df.filter(regex='(?i)(Invasive|Wild)')
.groupby(df['ID']).sum()
.sum(axis=1)
.reset_index(name='Output')
)
# or with summation before:
out = (df.filter(regex='(?i)(Invasive|Wild)')
.sum(axis=1)
.groupby(df['ID']).sum()
.reset_index(name='Output')
)
Output:
ID Output
0 1 29
1 2 27
2 3 16