I have a dataframe, for instance:
df = pd.DataFrame({'Host': {0: 'N',
1: 'B',
2: 'N',
3: 'N',
4: 'N',
5: 'V',
6: 'B'},
'Registration': {0: 'Registered',
1: 'MR',
2: 'Registered',
3: 'Registered',
4: '',
5: 'Registered',
6: 'Registered'},
'Val': {0: 'N',
1: 'B',
2: 'N',
3: 'N',
4: '',
5: 'V',
6: 'B'},
'Sum': {0: 100.0,
1: 0.0,
2: 300.0,
3: 150.0,
4: 0.0,
5: 0.0,
6: 20.0}})
I want to get the count, for each Host. Something like:
df.groupby("Host").count()
"""
Host Registration Val Sum
B 2 2 2
N 4 4 4
V 1 1 1
"""
But I want the count to be conditional per column. For example, in Sum I want to count only the rows greater than 0.0, and in the other columns only the non-empty values. So my expected output would be:
"""
Host Registration Val Sum
B 2 2 1
N 3 3 3
V 1 1 0
"""
Not sure how to do that. My best attempt has been:
df.groupby("Host").agg({'Registration': lambda x: (x != "").count(),
                        'Val': lambda x: (x != "").count(),
                        'Sum': lambda x: (x != 0).count()})
But this produces the same output as df.groupby("Host").count().
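I suspect the reason is that `count()` counts all non-null entries of the boolean mask, `True` and `False` alike, so the condition is ignored. A minimal check of that hunch:

```python
import pandas as pd

s = pd.Series(["a", "", "b"])
mask = s != ""        # boolean Series: [True, False, True]
print(mask.count())   # counts non-null entries -> 3
print(mask.sum())     # counts True entries -> 2
```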
Any suggestion?
CodePudding user response:
First your solution - to count `True` values use `sum` instead of `count`:
df = df.groupby("Host").agg({'Registration': lambda x: (x != "").sum(),
'Val':lambda x: (x != "").sum(),
'Sum': lambda x: (x != 0).sum()})
print (df)
Registration Val Sum
Host
B 2 2 1.0
N 3 3 3.0
V 1 1 0.0
Improved solution - create boolean columns before aggregation and then `sum` per group:
df = df.assign(Registration = df['Registration'].ne(""),
Val = df['Val'].ne(""),
Sum = df['Sum'].ne(0)).groupby("Host").sum()
print (df)
Registration Val Sum
Host
B 2 2 1
N 3 3 3
V 1 1 0
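A compact variant of the same idea, collecting the per-column conditions in a dict so adding new columns later is easy (a sketch; the `conds` name is mine):

```python
import pandas as pd

df = pd.DataFrame({
    "Host": ["N", "B", "N", "N", "N", "V", "B"],
    "Registration": ["Registered", "MR", "Registered", "Registered", "", "Registered", "Registered"],
    "Val": ["N", "B", "N", "N", "", "V", "B"],
    "Sum": [100.0, 0.0, 300.0, 150.0, 0.0, 0.0, 20.0],
})

# One condition per column, applied as boolean masks before the groupby
conds = {
    "Registration": lambda s: s.ne(""),
    "Val": lambda s: s.ne(""),
    "Sum": lambda s: s.gt(0),
}
out = df.assign(**{c: f(df[c]) for c, f in conds.items()}).groupby("Host").sum()
print(out)
```

Since the masked columns are boolean, the grouped `sum` directly yields the conditional counts.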