I have a data frame as below:
A-open | A-close | A-total | G-open | G-close | G-total | F-open | F-close | F-total |
---|---|---|---|---|---|---|---|---|
1 | 4 | 0 | 3 | 5 | 0 | 8 | 2 | 0 |
I want to sum all open and close for each title, my desirable table is:
A-open | A-close | A-total | G-open | G-close | G-total | F-open | F-close | F-total |
---|---|---|---|---|---|---|---|---|
1 | 4 | 5 | 3 | 5 | 8 | 8 | 2 | 10 |
CodePudding user response:
You can extract the id before the dash and use it to groupby
sum
on the columns:
groups = df.columns.str.split('-').str[0]
d = df.groupby(groups, axis=1).sum().add_suffix('-total')
output:
A-total F-total G-total
0 5 10 8
to insert/replace as columns:
df[d.columns] = d
output:
A-open A-close A-total G-open G-close G-total F-open F-close F-total
0 1 4 5 3 5 8 8 2 10
NB. this is assuming the 'X-total' columns are initially absent or empty. If not empty:
df[d.columns] = d-df[d.columns]
CodePudding user response:
We can convert to a MultiIndex using str.rsplit
on the columns the swaplevel
so that the top level columns are "open", "close", and "total". The benefit of a MultiIndex is that we can use index alignment to do the computations and the computations can be done idiomatically:
# Create MultiIndex
df.columns = df.columns.str.rsplit('-', n=1, expand=True).swaplevel()
# Multi Index allows for idiomatic computations like this:
df['total'] = df['open'] df['close']
# Collapse MultiIndex
df.columns = [f'{b}-{a}' for (a, b) in df.columns]
df
:
A-open A-close A-total G-open G-close G-total F-open F-close F-total
0 1 4 5 3 5 8 8 2 10
Setup:
import pandas as pd
df = pd.DataFrame({
'A-open': [1], 'A-close': [4], 'A-total': [0], 'G-open': [3],
'G-close': [5], 'G-total': [0], 'F-open': [8], 'F-close': [2],
'F-total': [0]
})
CodePudding user response:
Get a list of the different sets of letter columns, by splitting all the column names by '-' and keeping the first letter/letters, putting the output into a set to remove duplicates:
letters = list({i.split('-')[0] for i in df.columns.to_list()})
print(letters)
{'A', 'F', 'G'}
Then you can loop through these, and calculate the total column for each, by adding the open and close:
for l in letters:
df[l '-total'] = df[l '-open'] df[l '-close']
Final df:
A-open A-close A-total G-open G-close G-total F-open F-close F-total
0 1 4 5 3 5 8 8 2 10