How to add a row cell value based on other rows cell values in Pandas?

Time:12-07

I have a pandas DataFrame like the one below:

Id  Date        Aid
0   2022-01-01  one
1   2022-01-01  two
2   2022-01-05  one
3   2022-01-06  three
4   2022-01-02  one
5   2022-01-01  one

I would like to add a column "Counter" that, for each row, contains the number of rows with the same "Aid" and a "Date" less than or equal to that row's "Date".

For example, for row 4, Counter = 3 (rows 0, 4 and 5).

The result should be:

Id  Date        Aid     Counter
0   2022-01-01  one     2
1   2022-01-01  two     1
2   2022-01-05  one     4
3   2022-01-06  three   1
4   2022-01-02  one     3
5   2022-01-01  one     2
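As a reference point, here is a direct translation of that definition onto the sample data. It compares every row with every other row, so it is O(n^2): a sketch for checking faster methods on small inputs, not something to run on millions of rows.

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "Id": [0, 1, 2, 3, 4, 5],
    "Date": ["2022-01-01", "2022-01-01", "2022-01-05",
             "2022-01-06", "2022-01-02", "2022-01-01"],
    "Aid": ["one", "two", "one", "three", "one", "one"],
})
dates = pd.to_datetime(df["Date"])

# For every row, count the rows with the same Aid and a Date
# less than or equal to this row's Date (quadratic brute force).
df["Counter"] = [
    int(((df["Aid"] == aid) & (dates <= d)).sum())
    for aid, d in zip(df["Aid"], dates)
]
print(df["Counter"].tolist())  # [2, 1, 4, 1, 3, 2]
```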

Is there a way to do this count in pandas?

The original DataFrame contains millions of rows, so efficiency is very important.

Answer:

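Group the dates by "Aid" and use rank with method='max'. With 'max', tied dates all receive the highest rank of the tie, so each row's rank is the number of rows in its group whose Date is less than or equal to its own, which is exactly the requested count.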

import pandas as pd

pd.to_datetime(df['Date']).groupby(df['Aid']).rank(method='max').astype('int')

output:

0    2
1    1
2    4
3    1
4    3
5    2
Name: Date, dtype: int32
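To see how method='max' handles ties, here is rank applied to the dates of the "one" group alone (rows 0, 2, 4 and 5), next to the default method='average'. Under 'max' both 2022-01-01 rows get rank 2, which is the number of dates less than or equal to 2022-01-01; 'average' would give them 1.5 instead.

```python
import pandas as pd

# Dates of the "one" group, in row order 0, 2, 4, 5.
one = pd.to_datetime(pd.Series(["2022-01-01", "2022-01-05",
                                "2022-01-02", "2022-01-01"]))

# method="max": every member of a tie gets the highest rank of that tie,
# so each rank equals the count of values <= that value.
ranks = one.rank(method="max").astype(int)
print(ranks.tolist())        # [2, 4, 3, 2]

# The default method="average" splits ties instead.
print(one.rank().tolist())   # [1.5, 4.0, 3.0, 1.5]
```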

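To store the result in a "Counter" column, use assign (it returns a new DataFrame, so assign it back to df if you want to keep it):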

df.assign(Counter=pd.to_datetime(df['Date']).groupby(df['Aid']).rank(method='max').astype('int'))

result:

Id  Date        Aid     Counter
0   2022-01-01  one     2
1   2022-01-01  two     1
2   2022-01-05  one     4
3   2022-01-06  three   1
4   2022-01-02  one     3
5   2022-01-01  one     2
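An alternative sketch, using a different technique from the rank answer above: count the rows for each (Aid, Date) pair, take a running total over the dates within each Aid, and join those totals back onto the rows. Because it aggregates to unique pairs first, it might help when many rows share a date, but that is an assumption; it is worth timing both approaches on the real data.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": [0, 1, 2, 3, 4, 5],
    "Date": ["2022-01-01", "2022-01-01", "2022-01-05",
             "2022-01-06", "2022-01-02", "2022-01-01"],
    "Aid": ["one", "two", "one", "three", "one", "one"],
})
df["Date"] = pd.to_datetime(df["Date"])

# Rows per (Aid, Date); groupby sorts the keys, so dates are ascending
# within each Aid.
per_day = df.groupby(["Aid", "Date"]).size()

# Running total over the sorted dates within each Aid gives the number
# of rows with a Date less than or equal to each date.
cumulative = per_day.groupby(level="Aid").cumsum().rename("Counter")

# Left-join the totals back onto the original rows (row order is kept).
out = df.join(cumulative, on=["Aid", "Date"])
print(out["Counter"].tolist())  # [2, 1, 4, 1, 3, 2]
```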