My DataFrame looks like this:
Unique Id  Date
H1         2/03/2022
H1         2/03/2022
H1         2/03/2022
H1         3/03/2022
H1         4/03/2022
H2         9/03/2022
H2         9/03/2022
H2         10/03/2022
Expected DataFrame:
Unique Id  Date        Count
H1         2/03/2022   1
H1         2/03/2022   1
H1         2/03/2022   1
H1         3/03/2022   2
H1         4/03/2022   3
H2         9/03/2022   1
H2         9/03/2022   1
H2         10/03/2022  2
Repeated dates should be assigned the number 1; any other date should be assigned some other number.
I have tried multiple approaches without success. Please assist.
CodePudding user response:
There are several ways to do this. The main issue is that you need to treat the dates as datetime objects rather than strings; otherwise, in your second group, '10/03/2022' (October) sorts ahead of '9/03/2022' (September).
import pandas as pd
df = pd.DataFrame({'Unique_Id': ['H1', 'H1', 'H1', 'H1', 'H1', 'H2', 'H2', 'H2'],
'Date': ['2/03/2022',
'2/03/2022',
'2/03/2022',
'3/03/2022',
'4/03/2022',
'9/03/2022',
'9/03/2022',
'10/03/2022']})
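The point about string ordering can be checked directly. This is a minimal sketch that assumes the dates are month/day/year (the reading used in this answer); the variable names are only illustrative.

```python
import pandas as pd

raw = ['9/03/2022', '10/03/2022']

# Compared as text, '1' sorts before '9', so October lands first.
text_order = sorted(raw)
print(text_order)  # ['10/03/2022', '9/03/2022']

# Parsed as dates (assumed month/day/year), September lands first.
parsed = pd.to_datetime(pd.Series(raw), format='%m/%d/%Y')
date_order = parsed.sort_values().dt.strftime('%Y-%m-%d').tolist()
print(date_order)  # ['2022-09-03', '2022-10-03']
```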
Dense Rank
# transform keeps the original row index, so the result can be assigned straight back to df
df.groupby('Unique_Id')['Date'].transform(lambda x: pd.to_datetime(x).rank(method='dense'))
Cat Codes
# category codes start at 0, so add 1
df.groupby('Unique_Id')['Date'].transform(lambda x: pd.to_datetime(x).astype('category').cat.codes + 1)
Factorize
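# factorize numbers values in order of first appearance, so each group must already be sorted by date
df.groupby('Unique_Id')['Date'].transform(lambda x: x.factorize()[0] + 1)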
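To get the Count column from the question, the result has to be assigned back to the frame. The following is a sketch of the dense-rank variant, assuming month/day/year dates and a column named Count; it parses the dates once up front rather than inside a lambda.

```python
import pandas as pd

df = pd.DataFrame({
    'Unique_Id': ['H1', 'H1', 'H1', 'H1', 'H1', 'H2', 'H2', 'H2'],
    'Date': ['2/03/2022', '2/03/2022', '2/03/2022', '3/03/2022',
             '4/03/2022', '9/03/2022', '9/03/2022', '10/03/2022'],
})

# Assumption: dates are month/day/year, as in the explanation above.
parsed = pd.to_datetime(df['Date'], format='%m/%d/%Y')

# Dense-rank the parsed dates within each Unique_Id; equal dates share a rank.
df['Count'] = parsed.groupby(df['Unique_Id']).rank(method='dense').astype(int)

print(df)
```

For this sample the Count column comes out as 1, 1, 1, 2, 3 for H1 and 1, 1, 2 for H2, which matches the expected frame in the question.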
CodePudding user response:
Here is one way to do it, using groupby and transform.
The question says that repeated dates should get 1 and all other dates some other number, so I chose 2 for dates that occur only once.
# 1 if the date occurs more than once in the frame, otherwise 2
df['count'] = df.groupby('Date')['Date'].transform(lambda x: 1 if x.size > 1 else 2)
df
  Unique_Id        Date  count
0        H1   2/03/2022      1
1        H1   2/03/2022      1
2        H1   2/03/2022      1
3        H1   3/03/2022      2
4        H1   4/03/2022      2
5        H2   9/03/2022      1
6        H2   9/03/2022      1
7        H2  10/03/2022      2
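The same 1-or-2 labelling can probably be had without a lambda. This is only a sketch of an alternative, not the code from the answer above; it assumes, like groupby('Date'), that repeats are judged across the whole frame rather than per Unique_Id.

```python
import pandas as pd

df = pd.DataFrame({
    'Unique_Id': ['H1', 'H1', 'H1', 'H1', 'H1', 'H2', 'H2', 'H2'],
    'Date': ['2/03/2022', '2/03/2022', '2/03/2022', '3/03/2022',
             '4/03/2022', '9/03/2022', '9/03/2022', '10/03/2022'],
})

# keep=False marks every row whose Date occurs more than once, not just the later copies.
repeated = df.duplicated(subset='Date', keep=False)

# Repeated dates get 1, dates that occur only once get 2.
df['count'] = repeated.map({True: 1, False: 2})

print(df)
```

To judge repeats within each Unique_Id instead, pass subset=['Unique_Id', 'Date'].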