Create new columns in pandas df by grouping and performing operations on an existing column

Time:06-05

I have a DataFrame that looks like this (minimal reproducible example):

import pandas as pd

thermometers = ['T-10000_0001', 'T-10000_0002', 'T-10000_0003', 'T-10000_0004',
                'T-10001_0001', 'T-10001_0002', 'T-10001_0003', 'T-10001_0004',
                'T-10002_0001', 'T-10002_0002', 'T-10002_0003', 'T-10002_0004']

temperatures = [15.1, 14.9, 12.7, 10.8,
                19.8, 18.3, 17.7, 18.1,
                20.0, 16.4, 17.6, 19.3]

df_set = {'thermometers': thermometers,
          'Temperatures': temperatures}

df = pd.DataFrame(df_set)

    thermometers  Temperatures
0   T-10000_0001          15.1
1   T-10000_0002          14.9
2   T-10000_0003          12.7
3   T-10000_0004          10.8
4   T-10001_0001          19.8
5   T-10001_0002          18.3
6   T-10001_0003          17.7
7   T-10001_0004          18.1
8   T-10002_0001          20.0
9   T-10002_0002          16.4
10  T-10002_0003          17.6
11  T-10002_0004          19.3

I am trying to group the readings by thermometer prefix (i.e. 'T-10000', 'T-10001', 'T-10002') and create new columns with the min, max and average temperature of each group. My final DataFrame should look like this:

  Thermometer  min_temp  average_temp  max_temp
0     T-10000      10.8          13.4      15.1
1     T-10001      17.7          18.5      19.8
2     T-10002      16.4          18.3      20.0

I tried writing a separate function, which I think requires a regular expression, but I can't figure out how to go about it. Any help would be much appreciated.
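To make the target numbers concrete, here is a minimal plain-Python sketch (standard library only, no pandas) that groups the readings from the thermometers and temperatures lists above by the part of each name before the first underscore and computes min, mean and max per group. Rounding the mean to one decimal is an assumption taken from the desired output; it is only a cross-check of the expected values, not a replacement for the pandas answers.

```python
from collections import defaultdict
from statistics import mean

# Same readings as the example DataFrame above.
thermometers = ['T-10000_0001', 'T-10000_0002', 'T-10000_0003', 'T-10000_0004',
                'T-10001_0001', 'T-10001_0002', 'T-10001_0003', 'T-10001_0004',
                'T-10002_0001', 'T-10002_0002', 'T-10002_0003', 'T-10002_0004']
temperatures = [15.1, 14.9, 12.7, 10.8,
                19.8, 18.3, 17.7, 18.1,
                20.0, 16.4, 17.6, 19.3]

# Collect readings per prefix: 'T-10000_0001' -> 'T-10000'.
groups = defaultdict(list)
for name, temp in zip(thermometers, temperatures):
    prefix = name.split('_', 1)[0]
    groups[prefix].append(temp)

# (min, mean rounded to one decimal, max) for each thermometer group.
summary = {prefix: (min(temps), round(mean(temps), 1), max(temps))
           for prefix, temps in groups.items()}

for prefix, (lo, avg, hi) in summary.items():
    print(prefix, lo, avg, hi)
```

No regular expression is needed for this key: splitting on the underscore and keeping the first piece is enough.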

Answer:

Use groupby with a key built by splitting each name on the delimiter _ and keeping the first part. Select the Temperatures column, then aggregate with whatever functions you need.

>>> (df.groupby(df['thermometers'].str.split('_').str.get(0))['Temperatures']
...    .agg(['min', 'mean', 'max']))

                      min    mean   max
thermometers                           
T-10000              10.8  13.375  15.1
T-10001              17.7  18.475  19.8
T-10002              16.4  18.325  20.0
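The question asks for columns named min_temp, average_temp and max_temp, with the group label as a regular Thermometer column rather than the index. A minimal sketch, assuming pandas 0.25 or later (which added named aggregation on SeriesGroupBy), that builds exactly that shape from the same split-based key; rounding to one decimal is an assumption based on the desired output in the question.

```python
import pandas as pd

thermometers = ['T-10000_0001', 'T-10000_0002', 'T-10000_0003', 'T-10000_0004',
                'T-10001_0001', 'T-10001_0002', 'T-10001_0003', 'T-10001_0004',
                'T-10002_0001', 'T-10002_0002', 'T-10002_0003', 'T-10002_0004']
temperatures = [15.1, 14.9, 12.7, 10.8,
                19.8, 18.3, 17.7, 18.1,
                20.0, 16.4, 17.6, 19.3]
df = pd.DataFrame({'thermometers': thermometers, 'Temperatures': temperatures})

# Group key: the part of each name before the first underscore.
key = df['thermometers'].str.split('_').str.get(0)

result = (
    df.groupby(key)['Temperatures']
      # Named aggregation: each keyword becomes an output column name.
      .agg(min_temp='min', average_temp='mean', max_temp='max')
      .round(1)
      # Move the group labels out of the index into a 'Thermometer' column.
      .rename_axis('Thermometer')
      .reset_index()
)
print(result)
```

Dropping .round(1) keeps the full-precision means (13.375 and so on) shown in the answer output.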

Answer:

Another approach with str.extract to avoid the call to str.get:

(df['Temperatures']
 .groupby(df['thermometers'].str.extract(r'(^[^_]+)', expand=False))
 .agg(['min', 'mean', 'max'])
 )

Output:

               min    mean   max
thermometers
T-10000       10.8  13.375  15.1
T-10001       17.7  18.475  19.8
T-10002       16.4  18.325  20.0
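For readers unfamiliar with the pattern ^[^_]+: ^ anchors the match at the start of the string, [^_]+ takes one or more characters that are not underscores, and wrapping it in parentheses makes it the single capture group that str.extract returns (as a Series when expand=False). A small sketch, using a hypothetical three-name sample, that checks the regex key agrees with the split-based key from the first answer:

```python
import pandas as pd

# Hypothetical sample of thermometer names.
names = pd.Series(['T-10000_0001', 'T-10001_0003', 'T-10002_0004'], name='thermometers')

# Regex key: everything from the start of the string up to the first underscore.
via_extract = names.str.extract(r'(^[^_]+)', expand=False)

# Split-based key from the first answer, for comparison.
via_split = names.str.split('_').str.get(0)

print(via_extract.tolist())
# Compare values as plain lists so differing string dtypes cannot affect the check.
print(via_extract.tolist() == via_split.tolist())
```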