Sum some of Dataframe with special characters-CodePudding

I have a data frame as below:

A-open	A-close	A-total	G-open	G-close	G-total	F-open	F-close	F-total
1	4	0	3	5	0	8	2	0

I want to sum all open and close for each title, my desirable table is:

A-open	A-close	A-total	G-open	G-close	G-total	F-open	F-close	F-total
1	4	5	3	5	8	8	2	10

CodePudding user response：

You can extract the id before the dash and use it to groupby sum on the columns:

groups = df.columns.str.split('-').str[0]
d = df.groupby(groups, axis=1).sum().add_suffix('-total')

output:

   A-total  F-total  G-total
0        5       10        8

to insert/replace as columns:

df[d.columns] = d

output:

   A-open  A-close  A-total  G-open  G-close  G-total  F-open  F-close  F-total
0       1        4        5       3        5        8       8        2       10

NB. this is assuming the 'X-total' columns are initially absent or empty. If not empty:

df[d.columns] = d-df[d.columns]

CodePudding user response：

We can convert to a MultiIndex using str.rsplit on the columns the swaplevel so that the top level columns are "open", "close", and "total". The benefit of a MultiIndex is that we can use index alignment to do the computations and the computations can be done idiomatically:

# Create MultiIndex
df.columns = df.columns.str.rsplit('-', n=1, expand=True).swaplevel()

# Multi Index allows for idiomatic computations like this:
df['total'] = df['open']   df['close']

# Collapse MultiIndex
df.columns = [f'{b}-{a}' for (a, b) in df.columns]

df:

   A-open  A-close  A-total  G-open  G-close  G-total  F-open  F-close  F-total
0       1        4        5       3        5        8       8        2       10

Setup:

import pandas as pd

df = pd.DataFrame({
    'A-open': [1], 'A-close': [4], 'A-total': [0], 'G-open': [3],
    'G-close': [5], 'G-total': [0], 'F-open': [8], 'F-close': [2],
    'F-total': [0]
})

CodePudding user response：

Get a list of the different sets of letter columns, by splitting all the column names by '-' and keeping the first letter/letters, putting the output into a set to remove duplicates:

letters = list({i.split('-')[0] for i in df.columns.to_list()})
print(letters)
{'A', 'F', 'G'}

Then you can loop through these, and calculate the total column for each, by adding the open and close:

for l in letters:
    df[l   '-total'] = df[l   '-open']   df[l   '-close']

Final df:

   A-open  A-close  A-total  G-open  G-close  G-total  F-open  F-close  F-total
0       1        4        5       3        5        8       8        2       10