My data frame looks like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'ID':[0,1,2,3,4],
'date1': ['2020-12-01','2020-12-01',np.nan,'2018-12-01',np.nan],
'date2': ['2015-04-01','2015-04-01','2018-12-01','2018-12-01',np.nan],
'date3': [np.nan,'2013-12-01','2018-12-01','2018-12-01',np.nan]
})
I'm trying to apply a function like nunique()
over each of the data columns for each ID to obtain then sum of distinct dates. I have tried using agg()
function in groupby.
Resulted data frame would look like:
CodePudding user response:
Use nunique
on axis=1
after filtering out the ID column:
out = df[['ID']]
out['unique_sum'] = df.drop(columns='ID').nunique(axis=1)
Or with filter
:
out = df[['ID']]
out['unique_sum'] = df.filter(like='date').nunique(axis=1)
Or, as chained commands:
out = (
df[['ID']]
.assign(unique_sum=(df.drop(columns='ID').nunique(axis=1)))
)
Output:
ID unique_sum
0 0 2
1 1 3
2 2 1
3 3 1
4 4 0
CodePudding user response:
Apply Dateframe.nunique
to selected date values:
df['uniq_sum'] = df.filter(like='date').nunique(axis=1)
ID date1 date2 date3 uniq_sum
0 0 2020-12-01 2015-04-01 NaN 2
1 1 2020-12-01 2015-04-01 2013-12-01 3
2 2 NaN 2018-12-01 2018-12-01 1
3 3 2018-12-01 2018-12-01 2018-12-01 1
4 4 NaN NaN NaN 0