Home > Software engineering >  Subtract and add values from rows in dataframe
Subtract and add values from rows in dataframe

Time:02-15

I have a dataframe where my data looks like the following atm.

date value name
2022-01-19 98 audiobook_processed
2022-01-19 69 n_known_errors
2022-01-19 2 n_unknown_errors
2022-01-21 14 audiobook_processed
2022-01-21 1 n_known_errors
2022-01-21 0 n_unknown_errors
2022-01-24 56 audiobook_processed
2022-01-24 1 n_known_errors
2022-01-24 2 n_unknown_errors

But I need to get the "total" amount of audiobook_processed by subtracting n_know_errors and n_unknown_errors so the data would be

date value name
2022-01-19 27 audiobook_processed
2022-01-19 69 n_known_errors
2022-01-19 2 n_unknown_errors
2022-01-21 13 audiobook_processed
2022-01-21 1 n_known_errors
2022-01-21 0 n_unknown_errors
2022-01-24 53 audiobook_processed
2022-01-24 1 n_known_errors
2022-01-24 2 n_unknown_errors

My code:

from xxx.input import get_automation_xxx

def graph_visual_data(script_title):
    data = get_automation_xxx(script_title)
    data = data[data.name.isin(['audiobook_processed', 'n_unknown_errors', 'n_known_errors'])][['date','value','name']]
    data.date = data.date.dt.date
    data.value = data.value.astype(int)

    data = data.groupby(['date','name'], as_index=False).sum()
    data['sub_total'] = (data.name['audiobook_processed'] - (data.name['n_unknown_errors']   data.name['n_known_errors'])) <-- This fails...

    fig = alt.Chart(data).mark_bar(size=10).encode(
        x='date',
        y='value',
        color='name',
        tooltip=['name', 'value']
    ).interactive()
    st.altair_chart(fig, use_container_width=True)

I've tried to create a new table row with the combined result but with no luck... Anyone have a smart way of solving this?

Thanks in advance

CodePudding user response:

You could use slicing to modify your data in place:

m = df['name'].eq('audiobook_processed')
df.loc[m, 'value'] -= df['value'].mask(m).groupby(df['date']).transform('sum')

output:

         date  value                 name
0  2022-01-19   27.0  audiobook_processed
1  2022-01-19   69.0       n_known_errors
2  2022-01-19    2.0     n_unknown_errors
3  2022-01-21   13.0  audiobook_processed
4  2022-01-21    1.0       n_known_errors
5  2022-01-21    0.0     n_unknown_errors
6  2022-01-24   53.0  audiobook_processed
7  2022-01-24    1.0       n_known_errors
8  2022-01-24    2.0     n_unknown_errors

Alternative, you could pivot, apply the calculation and reshape:

(df.pivot('date', 'name', 'value')
   .assign(audiobook_processed=lambda d: d['audiobook_processed'] - d[['n_known_errors', 'n_unknown_errors']].sum(1))
   .stack()
   .reset_index()
   .rename({0: 'name'}, axis=1)
)

or with eval:

(df.pivot('date', 'name', 'value')
   .eval('audiobook_processed = audiobook_processed-n_known_errors-n_unknown_errors')
   .stack()
   .reset_index()
   .rename({0: 'name'}, axis=1)
)
  • Related