Create new columns in pandas df by grouping and performing operations on an existing column

I have a dataframe that looks like this (Minimal Reproducible Example)

import pandas as pd

thermometers = ['T-10000_0001', 'T-10000_0002', 'T-10000_0003', 'T-10000_0004',
                'T-10001_0001', 'T-10001_0002', 'T-10001_0003', 'T-10001_0004',
                'T-10002_0001', 'T-10002_0002', 'T-10002_0003', 'T-10002_0004']

temperatures = [15.1, 14.9, 12.7, 10.8,
               19.8, 18.3, 17.7, 18.1,
               20.0, 16.4, 17.6, 19.3]

df_set = {'thermometers': thermometers,
         'Temperatures': temperatures}

df = pd.DataFrame(df_set)

Index	thermometers	Temperatures
0	T-10000_0001	15.1
1	T-10000_0002	14.9
2	T-10000_0003	12.7
3	T-10000_0004	10.8
4	T-10001_0001	19.8
5	T-10001_0002	18.3
6	T-10001_0003	17.7
7	T-10001_0004	18.1
8	T-10002_0001	20.0
9	T-10002_0002	16.4
10	T-10002_0003	17.6
11	T-10002_0004	19.3

I am trying to group the thermometers by prefix (i.e. 'T-10000', 'T-10001', 'T-10002') and create new columns with the min, max and average temperature reading for each group. My final data frame would look like this:

Index	Thermometer	min_temp	average_temp	max_temp
0	T-10000	10.8	13.4	15.1
1	T-10001	17.7	18.5	19.8
2	T-10002	16.4	18.3	20.0

I tried writing a separate function, which I think requires a regular expression, but I can't figure out how to go about it. Any help would be much appreciated.

CodePudding user response:

Use groupby on the part of each label before your delimiter _ (split on it and take the first piece), select the Temperatures column, then aggregate with whatever functions you need.

>>> df.groupby(df['thermometers']
...            .str.split('_')
...            .str.get(0))['Temperatures'].agg(['min', 'mean', 'max'])

                      min    mean   max
thermometers                           
T-10000              10.8  13.375  15.1
T-10001              17.7  18.475  19.8
T-10002              16.4  18.325  20.0
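
If you also want the column names from the question (min_temp, average_temp, max_temp) and a plain integer index, named aggregation is one way to get there. This is only a sketch: it assumes the labels and temperatures from the question, and the names `prefix` and `result` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'thermometers': ['T-10000_0001', 'T-10000_0002', 'T-10000_0003', 'T-10000_0004',
                     'T-10001_0001', 'T-10001_0002', 'T-10001_0003', 'T-10001_0004',
                     'T-10002_0001', 'T-10002_0002', 'T-10002_0003', 'T-10002_0004'],
    'Temperatures': [15.1, 14.9, 12.7, 10.8,
                     19.8, 18.3, 17.7, 18.1,
                     20.0, 16.4, 17.6, 19.3],
})

# Group key: everything before the first underscore, e.g. 'T-10000'.
# Renaming the key controls the name of the column produced by reset_index().
prefix = df['thermometers'].str.split('_').str.get(0).rename('Thermometer')

result = (df.groupby(prefix)['Temperatures']
            # keyword=function pairs name the output columns directly
            .agg(min_temp='min', average_temp='mean', max_temp='max')
            .round(1)          # one decimal place, as in the desired output
            .reset_index())    # turn the group key back into a regular column

print(result)
```

Drop the `.round(1)` call if you would rather keep the full-precision means.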

CodePudding user response:

Another approach with str.extract to avoid the call to str.get:

(df['Temperatures']
 .groupby(df['thermometers'].str.extract('(^[^_]+)', expand=False))
 .agg(['min', 'mean'])
 )

Output:

               min    mean
thermometers              
T-10000       10.8  13.375
T-10001       17.7  18.475
T-10002       16.4  18.325
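
To see what the pattern captures on its own: `^` anchors at the start of the string and `[^_]+` matches one or more characters that are not underscores, so the group holds everything before the first underscore. A small sketch, using a made-up three-element Series called `labels`:

```python
import pandas as pd

# Illustrative sample only, not the full data from the question
labels = pd.Series(['T-10000_0001', 'T-10001_0004', 'T-10002_0003'])

# With a single capture group, expand=False returns a Series instead of a DataFrame,
# which is what makes it usable directly as a groupby key
prefixes = labels.str.extract(r'(^[^_]+)', expand=False)

print(prefixes.tolist())  # ['T-10000', 'T-10001', 'T-10002']
```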