Pandas create a column iteratively - increasing after specific threshold

Time:11-30

I have a simple table in which the Datetime column is already parsed correctly, and Diff is the gap in minutes to the previous row.

Datetime Diff
2021-01-01 12:00:00 0
2021-01-01 12:02:00 2
2021-01-01 12:04:00 2
2021-01-01 12:10:00 6
2021-01-01 12:20:00 10
2021-01-01 12:22:00 2

I would like to add a label/batch name which increases whenever the difference exceeds a specific threshold/cutoff time. The output (with a threshold of diff > 7) I am hoping to achieve is:

Datetime Diff Batch
2021-01-01 12:00:00 0 A
2021-01-01 12:02:00 2 A
2021-01-01 12:04:00 2 A
2021-01-01 12:10:00 6 A
2021-01-01 12:20:00 10 B
2021-01-01 12:22:00 2 B

Batch doesn't need to be 'A','B','C' - probably easier to increase numerically.

I cannot find a solution online, but I'm assuming there is a method to split the table on all values below the threshold, apply the batch label and concatenate again. However, I cannot seem to get it working.

Any insight appreciated :)

CodePudding user response:

True and False count as 1 and 0 when summed, so a cumulative sum over the boolean column df.Diff > 7 goes up by one at every row that exceeds the threshold:

df['Batch'] = (df.Diff > 7).cumsum()
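
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample table from the question and applies the cumulative sum. The variable name threshold is only illustrative, and the batch numbers start at 0 rather than at 'A'.

```python
import pandas as pd

# Sample data from the question; Diff is the gap to the previous row in minutes.
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2021-01-01 12:00:00",
        "2021-01-01 12:02:00",
        "2021-01-01 12:04:00",
        "2021-01-01 12:10:00",
        "2021-01-01 12:20:00",
        "2021-01-01 12:22:00",
    ]),
    "Diff": [0, 2, 2, 6, 10, 2],
})

threshold = 7  # minutes (illustrative name, not from the original answer)

# The comparison flags rows that start a new batch (True counts as 1);
# cumsum turns those flags into a running batch number.
df["Batch"] = (df["Diff"] > threshold).cumsum()

print(df["Batch"].tolist())  # [0, 0, 0, 0, 1, 1]
```

Every row after a gap larger than the threshold inherits the incremented number until the next large gap, which is exactly the batching asked for.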

CodePudding user response:

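You can achieve this by building a custom grouping key. groupby does not have to be given an existing column: it also accepts a computed Series as the key, and the batch is then simply the group number returned by ngroup(). Note that the key below puts rows into fixed 7-minute windows counted from the earliest timestamp, so it labels by elapsed time rather than starting a new batch at each gap longer than 7 minutes.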

from datetime import timedelta

# Key: number of whole 7-minute windows elapsed since the first timestamp
key = (df['Datetime'] - df['Datetime'].min()) // timedelta(minutes=7)
df['batch'] = df.groupby(key).ngroup()
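
As a rough check of what a fixed-window key does on the question's data, the sketch below runs the same idea on the sample timestamps. It uses pd.Timedelta instead of datetime.timedelta purely to keep the imports to pandas; the name window_key is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2021-01-01 12:00:00",
        "2021-01-01 12:02:00",
        "2021-01-01 12:04:00",
        "2021-01-01 12:10:00",
        "2021-01-01 12:20:00",
        "2021-01-01 12:22:00",
    ]),
})

# Whole 7-minute windows elapsed since the earliest timestamp:
# 0, 2, 4, 10, 20, 22 minutes -> 0, 0, 0, 1, 2, 3
window_key = (df["Datetime"] - df["Datetime"].min()) // pd.Timedelta(minutes=7)

# ngroup() numbers the groups in sorted key order.
df["batch"] = df.groupby(window_key).ngroup()

print(df["batch"].tolist())  # [0, 0, 0, 1, 2, 3]
```

On this data the 12:10 row already lands in a new window and 12:22 is split from 12:20, so the result differs from the A, A, A, A, B, B output the question asks for. Fixed windows are a good fit when batches are defined by clock time, not by gaps between rows.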

CodePudding user response:

You can use:

df['Batch'] = df['Datetime'].diff().dt.total_seconds().gt(7*60) \
                            .cumsum().add(65).apply(chr)
print(df)

# Output:
             Datetime  Diff Batch
0 2021-01-01 12:00:00     0     A
1 2021-01-01 12:02:00     2     A
2 2021-01-01 12:04:00     2     A
3 2021-01-01 12:10:00     6     A
4 2021-01-01 12:20:00    10     B
5 2021-01-01 12:22:00     2     B
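
The one-liner above packs several steps into a single chain. A sketch that spells them out, assuming the same sample timestamps (the intermediate names gap_seconds, new_batch and batch_number are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2021-01-01 12:00:00",
        "2021-01-01 12:02:00",
        "2021-01-01 12:04:00",
        "2021-01-01 12:10:00",
        "2021-01-01 12:20:00",
        "2021-01-01 12:22:00",
    ]),
})

# Gap to the previous row in seconds; the first row has no previous row, so it is NaN.
gap_seconds = df["Datetime"].diff().dt.total_seconds()

# True where the gap exceeds 7 minutes; the NaN in the first row compares as False.
new_batch = gap_seconds.gt(7 * 60)

# Running count of batch starts: 0, 0, 0, 0, 1, 1
batch_number = new_batch.cumsum()

# 65 is ord("A"), so 0 -> "A", 1 -> "B", and so on.
df["Batch"] = batch_number.add(65).apply(chr)

print(df["Batch"].tolist())  # ['A', 'A', 'A', 'A', 'B', 'B']
```

Because this works from the Datetime column directly, it does not need a precomputed Diff column. The letter mapping only gives 'A' to 'Z' for the first 26 batches; with more batches than that, keeping the plain integer from cumsum is probably simpler.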