I am trying to figure out how to group specific tag names in a column and then to calculate the mean of the raw data that have the same time. My dataframe looks something like this except with 10000 rows:
tag_name time raw_data
happy 5 300
8 340
angry 5 315
8 349
sad 5 400
8 480
etc.
.
.
I wish to keep the mean in the dataframe table, but I can't figure out how. I found that I can create a pivot table to calculate the mean, but I can't figure out how to single out the specific tag names I want. Below is what I have so far:
output = pd.pivot_table(data=dataset,index=['Timestamp'],columns=['Tag_Name'],values='Raw_Data',aggfunc='mean')
I am trying to get one of these outputs when I calculate the average of sad and happy:
1. optimal output:
tag_name time raw_data sad_happy_avg
happy 5 300 350
8 340 410
sad 5 400
8 480
angry 5 315
8 349
2. alright output:
tag_name happy sad avg
time
5 300 400 350
8 340 480 410
CodePudding user response:
Try as follows:
- Use
Series.isin
to keep only "happy" and "sad", and applydf.pivot
to get the data in the correct shape. - Next, add a column for the
mean
onaxis=1
:
res = df[df.tag_name.isin(['happy','sad'])].pivot(index='time', columns='tag_name',
values='raw_data')
res['avg'] = res.mean(axis=1)
print(res)
tag_name happy sad avg
time
5 300 400 350.0
8 340 480 410.0
Your "optimal" output
doesn't seem a very logical way to present/store this data, but you can achieve it as follows:
# assuming you have a "standard" index starting `0,1,2` etc.
df['sad_happy_avg'] = df[df.tag_name.isin(['happy','sad'])]\
.groupby('time')['raw_data'].mean().reset_index(drop=True)
print(df)
tag_name time raw_data sad_happy_avg
0 happy 5 300 350.0
1 happy 8 340 410.0
2 angry 5 315 NaN
3 angry 8 349 NaN
4 sad 5 400 NaN
5 sad 8 480 NaN