I have a dataframe where my data looks like the following atm.
date | value | name |
---|---|---|
2022-01-19 | 98 | audiobook_processed |
2022-01-19 | 69 | n_known_errors |
2022-01-19 | 2 | n_unknown_errors |
2022-01-21 | 14 | audiobook_processed |
2022-01-21 | 1 | n_known_errors |
2022-01-21 | 0 | n_unknown_errors |
2022-01-24 | 56 | audiobook_processed |
2022-01-24 | 1 | n_known_errors |
2022-01-24 | 2 | n_unknown_errors |
But I need to get the "total" amount of audiobook_processed
by subtracting n_know_errors
and n_unknown_errors
so the data would be
date | value | name |
---|---|---|
2022-01-19 | 27 | audiobook_processed |
2022-01-19 | 69 | n_known_errors |
2022-01-19 | 2 | n_unknown_errors |
2022-01-21 | 13 | audiobook_processed |
2022-01-21 | 1 | n_known_errors |
2022-01-21 | 0 | n_unknown_errors |
2022-01-24 | 53 | audiobook_processed |
2022-01-24 | 1 | n_known_errors |
2022-01-24 | 2 | n_unknown_errors |
My code:
from xxx.input import get_automation_xxx
def graph_visual_data(script_title):
data = get_automation_xxx(script_title)
data = data[data.name.isin(['audiobook_processed', 'n_unknown_errors', 'n_known_errors'])][['date','value','name']]
data.date = data.date.dt.date
data.value = data.value.astype(int)
data = data.groupby(['date','name'], as_index=False).sum()
data['sub_total'] = (data.name['audiobook_processed'] - (data.name['n_unknown_errors'] data.name['n_known_errors'])) <-- This fails...
fig = alt.Chart(data).mark_bar(size=10).encode(
x='date',
y='value',
color='name',
tooltip=['name', 'value']
).interactive()
st.altair_chart(fig, use_container_width=True)
I've tried to create a new table row with the combined result but with no luck... Anyone have a smart way of solving this?
Thanks in advance
CodePudding user response:
You could use slicing to modify your data in place:
m = df['name'].eq('audiobook_processed')
df.loc[m, 'value'] -= df['value'].mask(m).groupby(df['date']).transform('sum')
output:
date value name
0 2022-01-19 27.0 audiobook_processed
1 2022-01-19 69.0 n_known_errors
2 2022-01-19 2.0 n_unknown_errors
3 2022-01-21 13.0 audiobook_processed
4 2022-01-21 1.0 n_known_errors
5 2022-01-21 0.0 n_unknown_errors
6 2022-01-24 53.0 audiobook_processed
7 2022-01-24 1.0 n_known_errors
8 2022-01-24 2.0 n_unknown_errors
Alternative, you could pivot, apply the calculation and reshape:
(df.pivot('date', 'name', 'value')
.assign(audiobook_processed=lambda d: d['audiobook_processed'] - d[['n_known_errors', 'n_unknown_errors']].sum(1))
.stack()
.reset_index()
.rename({0: 'name'}, axis=1)
)
or with eval
:
(df.pivot('date', 'name', 'value')
.eval('audiobook_processed = audiobook_processed-n_known_errors-n_unknown_errors')
.stack()
.reset_index()
.rename({0: 'name'}, axis=1)
)