Home > Mobile >  How to group specific items in a column and calculate the mean
How to group specific items in a column and calculate the mean

Time:10-13

I am trying to figure out how to group specific tag names in a column and then to calculate the mean of the raw data that have the same time. My dataframe looks something like this except with 10000 rows:

tag_name time raw_data
happy     5      300
          8      340
angry     5      315
          8      349
sad       5      400
          8      480
etc.
.
.

I wish to keep the mean in the dataframe table, but I can't figure out how. I found that I can create a pivot table to calculate the mean, but I can't figure out how to single out the specific tag names I want. Below is what I have so far:

output = pd.pivot_table(data=dataset,index=['Timestamp'],columns=['Tag_Name'],values='Raw_Data',aggfunc='mean')

I am trying to get one of these outputs when I calculate the average of sad and happy:

1. optimal output:
tag_name    time raw_data  sad_happy_avg
happy        5      300     350
             8      340     410
sad          5      400
             8      480
angry        5      315
             8      349

2. alright output:
tag_name   happy   sad  avg
time
5           300    400  350
8           340    480  410

CodePudding user response:

Try as follows:

  • Use Series.isin to keep only "happy" and "sad", and apply df.pivot to get the data in the correct shape.
  • Next, add a column for the mean on axis=1:
res = df[df.tag_name.isin(['happy','sad'])].pivot(index='time', columns='tag_name', 
                                       values='raw_data')
res['avg'] = res.mean(axis=1)

print(res)

tag_name  happy  sad    avg
time                       
5           300  400  350.0
8           340  480  410.0

Your "optimal" output doesn't seem a very logical way to present/store this data, but you can achieve it as follows:

# assuming you have a "standard" index starting `0,1,2` etc. 
df['sad_happy_avg'] = df[df.tag_name.isin(['happy','sad'])]\
    .groupby('time')['raw_data'].mean().reset_index(drop=True)

print(df)

  tag_name  time  raw_data  sad_happy_avg
0    happy     5       300          350.0
1    happy     8       340          410.0
2    angry     5       315            NaN
3    angry     8       349            NaN
4      sad     5       400            NaN
5      sad     8       480            NaN
  • Related