Home > Net >  apply a function over each row to find unique values in multiple columns
apply a function over each row to find unique values in multiple columns

Time:01-19

My data frame looks like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({'ID':[0,1,2,3,4],
                   'date1': ['2020-12-01','2020-12-01',np.nan,'2018-12-01',np.nan],
                   'date2': ['2015-04-01','2015-04-01','2018-12-01','2018-12-01',np.nan],
                   'date3': [np.nan,'2013-12-01','2018-12-01','2018-12-01',np.nan]
})

I'm trying to apply a function like nunique() over each of the data columns for each ID to obtain then sum of distinct dates. I have tried using agg() function in groupby.

Resulted data frame would look like: enter image description here

CodePudding user response:

Use nunique on axis=1 after filtering out the ID column:

out = df[['ID']]
out['unique_sum'] = df.drop(columns='ID').nunique(axis=1)

Or with filter:

out = df[['ID']]
out['unique_sum'] = df.filter(like='date').nunique(axis=1)

Or, as chained commands:

out = (
 df[['ID']]
 .assign(unique_sum=(df.drop(columns='ID').nunique(axis=1)))
 )

Output:


   ID  unique_sum
0   0           2
1   1           3
2   2           1
3   3           1
4   4           0

CodePudding user response:

Apply Dateframe.nunique to selected date values:

df['uniq_sum'] = df.filter(like='date').nunique(axis=1)

   ID       date1       date2       date3  uniq_sum
0   0  2020-12-01  2015-04-01         NaN         2
1   1  2020-12-01  2015-04-01  2013-12-01         3
2   2         NaN  2018-12-01  2018-12-01         1
3   3  2018-12-01  2018-12-01  2018-12-01         1
4   4         NaN         NaN         NaN         0
  • Related