Home > OS >  Sum specific columns if they exist with dataframe
Sum specific columns if they exist with dataframe

Time:12-08

Background

I have a dataframe with multiple columns. I am trying to sum all the E and B values into a single column

t_start         t_end           B1  E1      B2  E2
1/11/2021 0:00  1/11/2021 0:05  0   2.03    0   9.01
1/11/2021 0:00  1/11/2021 0:05  0   2.03    0   9.01
1/11/2021 0:00  1/11/2021 0:05  0   2.03    0   9.01

Problem

Thing is, the file might have 10 E and B columns, (B1, B2, B3, etc).

What I've tried

I've tried the below, but it errors out if the columns don't exist

df['sum_b'] = df['b1']   df['b2']   df['b3'] ...

I've also tried using a group by solution which I saw here, but it doesn't work either, as it drops off my date columns

def left(s, amount):
    return s[:amount]

df.T.groupby([left(s, 1) for s in df.T.index.values]).sum().T)

Help Requested

If anyone knows how to make a dynamic sum formula which will add b and e columns, it would be much appreciated!

CodePudding user response:

You can use DataFrame.filter(regex=pattern), where pattern is a regular expression designed to pick out your desired columns by name.

To select columns whose name starts with B followed immediately by a digit (thus matching B1, B2, ..., B10, and so on):

df.filter(regex='^B\d')

To sum the values of all such columns:

df['sum_b'] = df.filter(regex='^B\d').sum(axis=1)
df['sum_e'] = df.filter(regex='^E\d').sum(axis=1)

CodePudding user response:

df.columns = df.columns.str.replace(r'\s*\d $', '', regex=True)
new_df = pd.concat([df[['t_start', 't_end']], df.T.drop(['t_start', 't_end']).groupby(level=0).sum().T], axis=1)

Output:

>>> new_df
          t_start           t_end    B      E
0  1/11/2021 0:00  1/11/2021 0:05  0.0  11.04
1  1/11/2021 0:00  1/11/2021 0:05  0.0  11.04
2  1/11/2021 0:00  1/11/2021 0:05  0.0  11.04
  • Related