Background
I have a dataframe with multiple columns. I am trying to sum all the B values and all the E values into single columns.
t_start          t_end            B1   E1    B2   E2
1/11/2021 0:00   1/11/2021 0:05   0    2.03  0    9.01
1/11/2021 0:00   1/11/2021 0:05   0    2.03  0    9.01
1/11/2021 0:00   1/11/2021 0:05   0    2.03  0    9.01
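For reference, a frame matching the sample above can be rebuilt like this (values are taken from the table; keeping the date columns as strings and the B/E columns numeric is an assumption):
import pandas as pd

# Reconstruction of the sample above; dates kept as strings (assumption)
df = pd.DataFrame({
    't_start': ['1/11/2021 0:00'] * 3,
    't_end': ['1/11/2021 0:05'] * 3,
    'B1': [0, 0, 0],
    'E1': [2.03, 2.03, 2.03],
    'B2': [0, 0, 0],
    'E2': [9.01, 9.01, 9.01],
})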
Problem
The thing is, the file might have up to 10 B and E columns (B1, B2, B3, etc.).
What I've tried
I've tried the below, but it errors out if the columns don't exist:
df['sum_b'] = df['b1'] + df['b2'] + df['b3'] + ...
I've also tried a groupby solution that I saw here, but it doesn't work either, as it drops my date columns:
def left(s, amount):
    return s[:amount]

df.T.groupby([left(s, 1) for s in df.T.index.values]).sum().T
Help Requested
If anyone knows how to make a dynamic sum formula which will add b and e columns, it would be much appreciated!
CodePudding user response:
You can use DataFrame.filter(regex=pattern), where pattern is a regular expression designed to pick out your desired columns by name.
To select columns whose names start with B followed immediately by a digit (thus matching B1, B2, ..., B10, and so on):
df.filter(regex=r'^B\d')
To sum the values of all such columns:
df['sum_b'] = df.filter(regex=r'^B\d').sum(axis=1)
df['sum_e'] = df.filter(regex=r'^E\d').sum(axis=1)
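On the sample data above this gives sum_b = 0.0 and sum_e = 11.04 (2.03 + 9.01) on every row, and because the regex drives the selection, nothing needs to change when B3/E3 and further numbered columns appear.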
CodePudding user response:
df.columns = df.columns.str.replace(r'\s*\d+$', '', regex=True)
new_df = pd.concat([df[['t_start', 't_end']], df.T.drop(['t_start', 't_end']).groupby(level=0).sum().T], axis=1)
Output:
>>> new_df
t_start t_end B E
0 1/11/2021 0:00 1/11/2021 0:05 0.0 11.04
1 1/11/2021 0:00 1/11/2021 0:05 0.0 11.04
2 1/11/2021 0:00 1/11/2021 0:05 0.0 11.04
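For clarity, the first line only strips the trailing digits from the column labels, producing duplicate names that the transposed groupby then aggregates; a quick check of that rename step, using the sample columns from the question, might look like this:
# Strip trailing digits from the labels, leaving duplicate B/E names
pd.Index(['t_start', 't_end', 'B1', 'E1', 'B2', 'E2']).str.replace(r'\s*\d+$', '', regex=True)
# Index(['t_start', 't_end', 'B', 'E', 'B', 'E'], dtype='object')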