I would like to create a count function based on a date. So it should start to count from 1 upwards till a new date is found in the dataset and then starts to count from 1 again, see example dataset and expected outcome below for an example:
data= pd.DataFrame(
[[Timestamp('2022-08-05'), 140, 120],
[Timestamp('2022-08-05'), 160, 155],
[Timestamp('2022-08-06'), 230, 156],
[Timestamp('2022-08-06'), 230, 155],
[Timestamp('2022-08-06'), 230, 160],
[Timestamp('2022-08-06'), 140, 130],
[Timestamp('2022-08-07'), 140, 131],
[Timestamp('2022-08-07'), 230, 170]],
columns=['date', 'power', 'heart rate'])
data_expected = pd.DataFrame(
[[Timestamp('2022-08-05'), 140, 120, 1],
[Timestamp('2022-08-05'), 160, 155, 2],
[Timestamp('2022-08-06'), 230, 156, 1],
[Timestamp('2022-08-06'), 230, 155, 2],
[Timestamp('2022-08-06'), 230, 160, 3],
[Timestamp('2022-08-06'), 140, 130, 4],
[Timestamp('2022-08-07'), 140, 131, 1],
[Timestamp('2022-08-07'), 230, 170, 2]],
columns=['date', 'power', 'heart rate', 'count'])
what would be the best way to approach this, with a for loop?
CodePudding user response:
From your DataFrame, we can use a groupby
on the column date
and the method cumcount
to get the expected result :
data['count'] = data.groupby(['date']).cumcount() 1
Output :
date power heart rate count
0 2022-08-05 140 120 1
1 2022-08-05 160 155 2
2 2022-08-06 230 156 1
3 2022-08-06 230 155 2
4 2022-08-06 230 160 3
5 2022-08-06 140 130 4
6 2022-08-07 140 131 1
7 2022-08-07 230 170 2
CodePudding user response:
data['count'] = data.groupby['date'].cumcount()