Home > Mobile >  How to get previous min and max values in new columns for each group?
How to get previous min and max values in new columns for each group?

Time:10-28

In each new row of dataframe, I need to keep track of min and max values for a previous group of records.

  1. Create dataframe with input data:
import pandas as pd
columns = ['timestamp','groupid','value']
data =      [['2022-10-14 11:47:38',1000,200],
            ['2022-10-14 11:47:39',1000,210],
            ['2022-10-14 11:47:40',1000,220],
            ['2022-10-14 11:47:41',1000,230],
            ['2022-10-14 11:47:42',1001,240],
            ['2022-10-14 11:47:43',1001,250],
            ['2022-10-14 11:47:44',1002,260],
            ['2022-10-14 11:47:45',1002,270]]

df = pd.DataFrame(data=data,columns=columns)
print(df)
  1. Calculate min and max values for each group:
df['min'] = df.groupby('groupid')['value'].transform('min')
df['max'] = df.groupby('groupid')['value'].transform('max')
  1. Create new columns in dataframe:
df['pmin'] = 0
df['pmax'] = 0
df['ppmin'] = 0
df['ppmax'] = 0
df['ppmin'] = df.groupby('groupid')['pmin'].transform('first').shift(1)
df['ppmax'] = df.groupby('groupid')['pmax'].transform('first').shift(1)
df['pmin'] = df.groupby('groupid')['min'].transform('first').shift(1)
df['pmax'] = df.groupby('groupid')['max'].transform('first').shift(1)
print(df)

The code in step 3 fails to return expected result as shown below:

columns2 = ['timestamp','groupid','value','min','max','pmin','pmax','ppmin','ppmax']
data2 =      [['2022-10-14 11:47:38',1000,200,200,230,0,0,0,0],
            ['2022-10-14 11:47:39',1000,210,200,230,0,0,0,0],
            ['2022-10-14 11:47:40',1000,220,200,230,0,0,0,0],
            ['2022-10-14 11:47:41',1000,230,200,230,0,0,0,0],
            ['2022-10-14 11:47:42',1001,240,240,250,200,230,0,0],
            ['2022-10-14 11:47:43',1001,250,240,250,200,230,0,0],
            ['2022-10-14 11:47:44',1002,260,260,270,240,250,200,230],
            ['2022-10-14 11:47:45',1002,270,260,270,240,250,200,230]]
df = pd.DataFrame(data=data2,columns=columns2)
print(df)

CodePudding user response:

Build the dataframe of aggregation and shifts, and merge to get the final output:

grp = df.groupby('groupid')['value']
min_max = grp.agg(['min', 'max'])
pmin_max = min_max.add_prefix('p').shift(fill_value = 0)
ppmin_max = min_max.add_prefix('p').shift(2, fill_value = 0).add_prefix('p')
other = pd.concat([min_max, pmin_max, ppmin_max], axis = 1)
df.merge(other, on = 'groupid')

             timestamp  groupid  value  min  max  pmin  pmax  ppmin  ppmax
0  2022-10-14 11:47:38     1000    200  200  230     0     0      0      0
1  2022-10-14 11:47:39     1000    210  200  230     0     0      0      0
2  2022-10-14 11:47:40     1000    220  200  230     0     0      0      0
3  2022-10-14 11:47:41     1000    230  200  230     0     0      0      0
4  2022-10-14 11:47:42     1001    240  240  250   200   230      0      0
5  2022-10-14 11:47:43     1001    250  240  250   200   230      0      0
6  2022-10-14 11:47:44     1002    260  260  270   240   250    200    230
7  2022-10-14 11:47:45     1002    270  260  270   240   250    200    230

CodePudding user response:

Maybe not the prettiest solution:

df["tmp"] = (df["groupid"] != df["groupid"].shift()).cumsum()
grps = df.groupby("groupid")["value"]

df["min"] = grps.transform("min")
df["max"] = grps.transform("max")

mapper = df.drop_duplicates("tmp").set_index("tmp")[["min", "max"]]

tmp1 = df["tmp"] - 1
tmp2 = df["tmp"] - 2

df["pmin"] = tmp1.map(mapper["min"])
df["pmax"] = tmp1.map(mapper["max"])

df["ppmin"] = tmp2.map(mapper["min"])
df["ppmax"] = tmp2.map(mapper["max"])

print(df.fillna(0).drop(columns="tmp"))

Prints:

             timestamp  groupid  value  min  max   pmin   pmax  ppmin  ppmax
0  2022-10-14 11:47:38     1000    200  200  230    0.0    0.0    0.0    0.0
1  2022-10-14 11:47:39     1000    210  200  230    0.0    0.0    0.0    0.0
2  2022-10-14 11:47:40     1000    220  200  230    0.0    0.0    0.0    0.0
3  2022-10-14 11:47:41     1000    230  200  230    0.0    0.0    0.0    0.0
4  2022-10-14 11:47:42     1001    240  240  250  200.0  230.0    0.0    0.0
5  2022-10-14 11:47:43     1001    250  240  250  200.0  230.0    0.0    0.0
6  2022-10-14 11:47:44     1002    260  260  270  240.0  250.0  200.0  230.0
7  2022-10-14 11:47:45     1002    270  260  270  240.0  250.0  200.0  230.0
  • Related