count function within for loop python-CodePudding

I would like to create a count function based on a date. So it should start to count from 1 upwards till a new date is found in the dataset and then starts to count from 1 again, see example dataset and expected outcome below for an example:

data= pd.DataFrame(
    [[Timestamp('2022-08-05'), 140, 120],
    [Timestamp('2022-08-05'), 160, 155],
    [Timestamp('2022-08-06'), 230, 156],
    [Timestamp('2022-08-06'), 230, 155],
    [Timestamp('2022-08-06'), 230, 160],
    [Timestamp('2022-08-06'), 140, 130],
    [Timestamp('2022-08-07'), 140, 131],
    [Timestamp('2022-08-07'), 230, 170]],
    columns=['date', 'power', 'heart rate'])

data_expected =  pd.DataFrame(
    [[Timestamp('2022-08-05'), 140, 120, 1],
    [Timestamp('2022-08-05'), 160, 155, 2],
    [Timestamp('2022-08-06'), 230, 156, 1],
    [Timestamp('2022-08-06'), 230, 155, 2],
    [Timestamp('2022-08-06'), 230, 160, 3],
    [Timestamp('2022-08-06'), 140, 130, 4],
    [Timestamp('2022-08-07'), 140, 131, 1],
    [Timestamp('2022-08-07'), 230, 170, 2]],
    columns=['date', 'power', 'heart rate', 'count'])

what would be the best way to approach this, with a for loop?

CodePudding user response：

From your DataFrame, we can use a groupby on the column date and the method cumcount to get the expected result :

data['count'] = data.groupby(['date']).cumcount() 1

Output :

    date        power   heart rate  count
0   2022-08-05  140     120         1
1   2022-08-05  160     155         2
2   2022-08-06  230     156         1
3   2022-08-06  230     155         2
4   2022-08-06  230     160         3
5   2022-08-06  140     130         4
6   2022-08-07  140     131         1
7   2022-08-07  230     170         2

CodePudding user response：

data['count'] = data.groupby['date'].cumcount()