My goal is to filter rows where only one of the columns has a value that is not NaN. Once I have those rows filtered, I want to create a scorecard for how many instances this happened per column.
Example input:
index id pass shoot flick through-ball
22450 0123 NaN NaN NaN 600
22451 6565 NaN NaN NaN 625
22452 1212 123 NaN 454 NaN
22453 0101 NaN NaN 119 NaN
22454 1234 NaN 056 98 NaN
Expected result from filtering:
index id pass shoot flick through-ball
22450 0123 NaN NaN NaN 600
22451 6565 NaN NaN NaN 625
22453 0101 NaN NaN 119 NaN
Final expected output table:
actions unique_count
pass 0
shoot 0
flick 1
through-ball 2
CodePudding user response:
To filter the dataframe, we can count the not nan value along columns
cols = ['pass', 'shoot', 'flick', 'through-ball']
filtered = df[df[cols].notna().sum(axis=1).eq(1)]
print(filtered)
index id pass shoot flick through-ball
0 22450 123 NaN NaN NaN 600.0
1 22451 6565 NaN NaN NaN 625.0
3 22453 101 NaN NaN 119.0 NaN
We can loop the columns to get unique value count in each column
out = pd.DataFrame([[col, filtered[col].nunique()] for col in cols],
columns=['actions', 'unique_count'])
print(out)
actions unique_count
0 pass 0
1 shoot 0
2 flick 1
3 through-ball 2
CodePudding user response:
We can filter it
df = df[df.notna().sum(1)==3]
Out[262]:
index id pass shoot flick through-ball
0 22450 123 NaN NaN NaN 600.0
1 22451 6565 NaN NaN NaN 625.0
3 22453 101 NaN NaN 119.0 NaN
Then count
df.drop(['index', 'id'],axis=1).count().iloc[2:]
Out[267]:
pass 0
shoot 0
flick 1
through-ball 2
dtype: int64
CodePudding user response:
df.set_index(['index', 'id'], inplace=True)
out = df[df.count(axis=1).eq(1)]
print(out)
counts = out.count().reset_index(name='unique_count').rename(columns={'index':'actions'})
print(counts)
output:
pass shoot flick through-ball
index id
22450 123 NaN NaN NaN 600.0
22451 6565 NaN NaN NaN 625.0
22453 101 NaN NaN 119.0 NaN
actions unique_count
0 pass 0
1 shoot 0
2 flick 1
3 through-ball 2