I have a pandas DataFrame like the one below:
Id | Date | Aid |
---|---|---|
0 | 2022-01-01 | one |
1 | 2022-01-01 | two |
2 | 2022-01-05 | one |
3 | 2022-01-06 | three |
4 | 2022-01-02 | one |
5 | 2022-01-01 | one |
I would like to add a column "Counter" which, for each row, contains the number of rows that have the same "Aid" and a "Date" earlier than or equal to that row's "Date".
For example, for row 4, "Counter" = 3 (rows 0, 4, and 5).
The result should be:
Id | Date | Aid | Counter |
---|---|---|---|
0 | 2022-01-01 | one | 2 |
1 | 2022-01-01 | two | 1 |
2 | 2022-01-05 | one | 4 |
3 | 2022-01-06 | three | 1 |
4 | 2022-01-02 | one | 3 |
5 | 2022-01-01 | one | 2 |
Is there a way to do this count in pandas?
The original DataFrame contains millions of rows, so efficiency is very important.
CodePudding user response:
Use rank with method='max' within each "Aid" group:
pd.to_datetime(df['Date']).groupby(df['Aid']).rank(method='max').astype('int')
output:
0 2
1 1
2 4
3 1
4 3
5 2
Name: Date, dtype: int32
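Why this gives the requested count: with method='max', tied values all receive the highest rank in their tie, which is the same as the number of values less than or equal to them. A minimal sketch of that equivalence on a single group, using a made-up Series (the values and the names s, ranks, and counts are illustrative only, not from the question):

```python
import pandas as pd

# Hypothetical values standing in for the dates of one "Aid" group.
s = pd.Series([1, 1, 2, 5])

# Rank with ties resolved to the highest rank of the tie.
ranks = s.rank(method="max").astype(int)

# Direct count of values <= each value (quadratic, for illustration only).
counts = s.apply(lambda v: int((s <= v).sum()))

print(ranks.tolist())   # [2, 2, 3, 4]
print(counts.tolist())  # [2, 2, 3, 4]
```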
Assign the output to a Counter column:
df.assign(Counter=pd.to_datetime(df['Date']).groupby(df['Aid']).rank(method='max').astype('int'))
result:
Id Date Aid Counter
0 2022-01-01 one 2
1 2022-01-01 two 1
2 2022-01-05 one 4
3 2022-01-06 three 1
4 2022-01-02 one 3
5 2022-01-01 one 2
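For reference, here is a self-contained sketch that rebuilds the sample data from the question, applies the rank approach, and compares it against a brute-force count. The names dates, result, and expected are chosen here for illustration, and the sketch assumes "Date" has no missing values (NaT would produce NaN ranks, which cannot be cast to int).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [0, 1, 2, 3, 4, 5],
    "Date": ["2022-01-01", "2022-01-01", "2022-01-05",
             "2022-01-06", "2022-01-02", "2022-01-01"],
    "Aid": ["one", "two", "one", "three", "one", "one"],
})

# Parse the dates once, then rank them within each Aid group.
dates = pd.to_datetime(df["Date"])
result = df.assign(
    Counter=dates.groupby(df["Aid"]).rank(method="max").astype("int")
)

# Brute-force check: count rows with the same Aid and Date <= this row's Date.
# This is quadratic and only meant to validate the result on small data.
expected = [
    int(((df["Aid"] == aid) & (dates <= d)).sum())
    for aid, d in zip(df["Aid"], dates)
]

print(result["Counter"].tolist())  # [2, 1, 4, 1, 3, 2]
```

The brute-force loop is only a correctness check; on millions of rows, the grouped rank is the part you would keep.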