I have a dataframe that looks like
ID Date feature
1 2020-05-01 2
1 2020-05-01 3
1 2020-05-01 4
2 2019-03-15 3
2 2019-03-15 2
3 2022-04-22 5
3 2022-04-22 8
3 2022-04-22 4
3 2022-04-22 2
4 2015-01-18 4
4 2015-01-18 6
4 2015-01-18 7
I sort it by time in descending order using df.sort_values('Date', ascending=False, inplace=True)
ID Date feature
3 2022-04-22 5
3 2022-04-22 8
3 2022-04-22 4
3 2022-04-22 2
1 2020-05-01 2
1 2020-05-01 3
1 2020-05-01 4
2 2019-03-15 3
2 2019-03-15 2
4 2015-01-18 4
4 2015-01-18 6
4 2015-01-18 7
and I want to add a new column n_ID@t-1 that holds the number of rows belonging to the last (i.e. chronologically previous) ID, so the desired output should look like
ID Date feature n_ID@t-1
3 2022-04-22 5 3 #since the last ID 1 has 3 rows
3 2022-04-22 8 3
3 2022-04-22 4 3
3 2022-04-22 2 3
1 2020-05-01 2 2 #since the last ID 2 has 2 rows
1 2020-05-01 3 2
1 2020-05-01 4 2
2 2019-03-15 3 3 #since the last ID 4 has 3 rows
2 2019-03-15 2 3
4 2015-01-18 4 0 #since there is no last ID
4 2015-01-18 6 0
4 2015-01-18 7 0
I know the .value_counts() and transform functions, but I don't know how to use them to create the new column.
CodePudding user response:
You can use groupby.size (or value_counts) without sorting to get the size of each group in its order of appearance, then shift that result so each ID is paired with the size of the previous group, and map it back onto the ID column:
s = df.groupby('ID', sort=False)['ID'].size()
# or
# s = df['ID'].value_counts(sort=False)
df['n_ID@t-1'] = df['ID'].map(s.shift(-1, fill_value=0))
output:
ID Date feature n_ID@t-1
5 3 2022-04-22 5 3
6 3 2022-04-22 8 3
7 3 2022-04-22 4 3
8 3 2022-04-22 2 3
0 1 2020-05-01 2 2
1 1 2020-05-01 3 2
2 1 2020-05-01 4 2
3 2 2019-03-15 3 3
4 2 2019-03-15 2 3
9 4 2015-01-18 4 0
10 4 2015-01-18 6 0
11 4 2015-01-18 7 0
intermediate s (groups appear in the order of the sorted dataframe, because sort=False):
ID
3 4
1 3
2 2
4 3
Name: ID, dtype: int64
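For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The dataframe is rebuilt by hand from the question, and the name prev_size is only an illustrative helper, not something from the original code. It assumes all rows of an ID share one date, so each ID forms a contiguous block after sorting.

```python
import pandas as pd

# Rebuild the example data from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4],
    "Date": ["2020-05-01"] * 3 + ["2019-03-15"] * 2
            + ["2022-04-22"] * 4 + ["2015-01-18"] * 3,
    "feature": [2, 3, 4, 3, 2, 5, 8, 4, 2, 4, 6, 7],
})
df["Date"] = pd.to_datetime(df["Date"])

# Newest first; mergesort is stable, so rows with equal dates keep their order
df = df.sort_values("Date", ascending=False, kind="mergesort")

# sort=False keeps the groups in order of first appearance (3, 1, 2, 4)
s = df.groupby("ID", sort=False).size()

# shift(-1) pairs each ID with the size of the next-older ID; the oldest gets 0
prev_size = s.shift(-1, fill_value=0)  # hypothetical helper name
df["n_ID@t-1"] = df["ID"].map(prev_size)

print(df)
```

The key point is sort=False: with the default sort=True the groups would come back ordered by ID (1, 2, 3, 4) rather than by date, and the shift would pair each ID with the wrong neighbour.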