I have a pandas data frame with unique identifier, values for different measures for different dates as shown below
import pandas as pd
data = {
'ID' : [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3],
'Date' : ['15/01/2022', '15/01/2022', '15/01/2022', '2/2/2022', '2/2/2022', '2/2/2022', '22/01/2022', '22/01/2022', '22/01/2022', '25/12/2021', '25/12/2021', '25/12/2021', '17/02/2022', '17/02/2022', '17/02/2022', '7/1/2022', '7/1/2022', '7/1/2022'],
'Group' : ['M1', 'M2', 'M3', 'M1', 'M2', 'M3', 'M1', 'M2', 'M3', 'M1', 'M2', 'M3', 'M1', 'M2', 'M3', 'M1', 'M2', 'M3'],
'Value' : [0.78, 0.54, 0.36, 0.52, 0.68, 0.24, 0.36, 0.44, 0.17, 0.89, 0.58, 0.91, 0.25, 0.36, 0.42, 0.82, 0.56, 0.37],
'count_unique_date' : [2, 2, 2, 2, 2, 2, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3],
'n' : [1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3] }
df = pd.DataFrame(data)
df
My goal is to create a variable count_unique_date
that counts the number of unique dates for each ID
.
I would appreciate guidance on how to achieve this.
Thanks in advance!
CodePudding user response:
Use GroupBy.transform
with DataFrameGroupBy.nunique
and with lambda function:
df['count_unique_date'] = df.groupby('ID')['Date'].transform('nunique')
df['n'] = df.groupby('ID')['Date'].transform(lambda x: pd.factorize(x)[0]) 1
print (df)
ID Date Group Value count_unique_date n
0 1 15/01/2022 M1 0.78 2 1
1 1 15/01/2022 M2 0.54 2 1
2 1 15/01/2022 M3 0.36 2 1
3 1 2/2/2022 M1 0.52 2 2
4 1 2/2/2022 M2 0.68 2 2
5 1 2/2/2022 M3 0.24 2 2
6 2 22/01/2022 M1 0.36 1 1
7 2 22/01/2022 M2 0.44 1 1
8 2 22/01/2022 M3 0.17 1 1
9 3 25/12/2021 M1 0.89 3 1
10 3 25/12/2021 M2 0.58 3 1
11 3 25/12/2021 M3 0.91 3 1
12 3 17/02/2022 M1 0.25 3 2
13 3 17/02/2022 M2 0.36 3 2
14 3 17/02/2022 M3 0.42 3 2
15 3 7/1/2022 M1 0.82 3 3
16 3 7/1/2022 M2 0.56 3 3
17 3 7/1/2022 M3 0.37 3 3
Or:
df['Date'] = pd.to_datetime(df['Date'])
df['count_unique_date'] = df.groupby('ID')['Date'].transform('nunique')
df['n'] = df.groupby('ID')['Date'].rank('dense')