Filter rows where only one of the column has a value and create column to append scorecard in pandas-CodePudding

My goal is to filter rows where only one of the columns has a value that is not NaN. Once I have those rows filtered, I want to create a scorecard for how many instances this happened per column.

Example input:

index   id      pass shoot flick through-ball
22450   0123    NaN  NaN   NaN   600
22451   6565    NaN  NaN   NaN   625
22452   1212    123  NaN   454   NaN
22453   0101    NaN  NaN   119   NaN
22454   1234    NaN  056   98    NaN

Expected result from filtering:

index   id      pass shoot flick through-ball
22450   0123    NaN  NaN   NaN   600
22451   6565    NaN  NaN   NaN   625
22453   0101    NaN  NaN   119   NaN

Final expected output table:

actions        unique_count 
pass           0
shoot          0
flick          1
through-ball   2

CodePudding user response：

To filter the dataframe, we can count the not nan value along columns

cols = ['pass', 'shoot', 'flick', 'through-ball']

filtered = df[df[cols].notna().sum(axis=1).eq(1)]

print(filtered)

   index    id  pass  shoot  flick  through-ball
0  22450   123   NaN    NaN    NaN         600.0
1  22451  6565   NaN    NaN    NaN         625.0
3  22453   101   NaN    NaN  119.0           NaN

We can loop the columns to get unique value count in each column

out = pd.DataFrame([[col, filtered[col].nunique()] for col in cols],
                   columns=['actions', 'unique_count'])

print(out)

        actions  unique_count
0          pass             0
1         shoot             0
2         flick             1
3  through-ball             2

CodePudding user response：

We can filter it

df = df[df.notna().sum(1)==3]
Out[262]: 
   index    id  pass  shoot  flick  through-ball
0  22450   123   NaN    NaN    NaN         600.0
1  22451  6565   NaN    NaN    NaN         625.0
3  22453   101   NaN    NaN  119.0           NaN

Then count

df.drop(['index', 'id'],axis=1).count().iloc[2:]
Out[267]: 
pass            0
shoot           0
flick           1
through-ball    2
dtype: int64

CodePudding user response：

df.set_index(['index', 'id'], inplace=True)
out = df[df.count(axis=1).eq(1)]
print(out)

counts = out.count().reset_index(name='unique_count').rename(columns={'index':'actions'})
print(counts)

output:

            pass  shoot  flick  through-ball
index id
22450 123    NaN    NaN    NaN         600.0
22451 6565   NaN    NaN    NaN         625.0
22453 101    NaN    NaN  119.0           NaN

        actions  unique_count
0          pass             0
1         shoot             0
2         flick             1
3  through-ball             2