Home > front end >  Create a variable with number of unique Values by ID
Create a variable with number of unique Values by ID

Time:03-16

I have a pandas data frame with unique identifier, values for different measures for different dates as shown below

import pandas as pd

data = {
    'ID' : [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3],
    'Date' : ['15/01/2022', '15/01/2022', '15/01/2022', '2/2/2022', '2/2/2022', '2/2/2022', '22/01/2022', '22/01/2022', '22/01/2022', '25/12/2021', '25/12/2021', '25/12/2021', '17/02/2022', '17/02/2022', '17/02/2022', '7/1/2022', '7/1/2022', '7/1/2022'],
    'Group' : ['M1', 'M2', 'M3', 'M1', 'M2', 'M3', 'M1', 'M2', 'M3', 'M1', 'M2', 'M3', 'M1', 'M2', 'M3', 'M1', 'M2', 'M3'],
    'Value' : [0.78, 0.54, 0.36, 0.52, 0.68, 0.24, 0.36, 0.44, 0.17, 0.89, 0.58, 0.91, 0.25, 0.36, 0.42, 0.82, 0.56, 0.37],
    'count_unique_date' : [2, 2, 2, 2, 2, 2, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3],
'n' : [1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3] }

df = pd.DataFrame(data)
df

My goal is to create a variable count_unique_date that counts the number of unique dates for each ID. I would appreciate guidance on how to achieve this.

Thanks in advance!

CodePudding user response:

Use GroupBy.transform with DataFrameGroupBy.nunique and with lambda function:

df['count_unique_date'] = df.groupby('ID')['Date'].transform('nunique')
df['n'] = df.groupby('ID')['Date'].transform(lambda x: pd.factorize(x)[0])   1

print (df)
    ID        Date Group  Value  count_unique_date  n
0    1  15/01/2022    M1   0.78                  2  1
1    1  15/01/2022    M2   0.54                  2  1
2    1  15/01/2022    M3   0.36                  2  1
3    1    2/2/2022    M1   0.52                  2  2
4    1    2/2/2022    M2   0.68                  2  2
5    1    2/2/2022    M3   0.24                  2  2
6    2  22/01/2022    M1   0.36                  1  1
7    2  22/01/2022    M2   0.44                  1  1
8    2  22/01/2022    M3   0.17                  1  1
9    3  25/12/2021    M1   0.89                  3  1
10   3  25/12/2021    M2   0.58                  3  1
11   3  25/12/2021    M3   0.91                  3  1
12   3  17/02/2022    M1   0.25                  3  2
13   3  17/02/2022    M2   0.36                  3  2
14   3  17/02/2022    M3   0.42                  3  2
15   3    7/1/2022    M1   0.82                  3  3
16   3    7/1/2022    M2   0.56                  3  3
17   3    7/1/2022    M3   0.37                  3  3

Or:

df['Date'] = pd.to_datetime(df['Date'])

df['count_unique_date'] = df.groupby('ID')['Date'].transform('nunique')
df['n'] = df.groupby('ID')['Date'].rank('dense')
  • Related