Home > Enterprise >  Sum some of Dataframe with special characters
Sum some of Dataframe with special characters

Time:12-02

I have a data frame as below:

A-open A-close A-total G-open G-close G-total F-open F-close F-total
1 4 0 3 5 0 8 2 0

I want to sum all open and close for each title, my desirable table is:

A-open A-close A-total G-open G-close G-total F-open F-close F-total
1 4 5 3 5 8 8 2 10

CodePudding user response:

You can extract the id before the dash and use it to groupby sum on the columns:

groups = df.columns.str.split('-').str[0]
d = df.groupby(groups, axis=1).sum().add_suffix('-total')

output:

   A-total  F-total  G-total
0        5       10        8

to insert/replace as columns:

df[d.columns] = d

output:

   A-open  A-close  A-total  G-open  G-close  G-total  F-open  F-close  F-total
0       1        4        5       3        5        8       8        2       10

NB. this is assuming the 'X-total' columns are initially absent or empty. If not empty:

df[d.columns] = d-df[d.columns]

CodePudding user response:

We can convert to a MultiIndex using str.rsplit on the columns the swaplevel so that the top level columns are "open", "close", and "total". The benefit of a MultiIndex is that we can use index alignment to do the computations and the computations can be done idiomatically:

# Create MultiIndex
df.columns = df.columns.str.rsplit('-', n=1, expand=True).swaplevel()

# Multi Index allows for idiomatic computations like this:
df['total'] = df['open']   df['close']

# Collapse MultiIndex
df.columns = [f'{b}-{a}' for (a, b) in df.columns]

df:

   A-open  A-close  A-total  G-open  G-close  G-total  F-open  F-close  F-total
0       1        4        5       3        5        8       8        2       10

Setup:

import pandas as pd

df = pd.DataFrame({
    'A-open': [1], 'A-close': [4], 'A-total': [0], 'G-open': [3],
    'G-close': [5], 'G-total': [0], 'F-open': [8], 'F-close': [2],
    'F-total': [0]
})

CodePudding user response:

Get a list of the different sets of letter columns, by splitting all the column names by '-' and keeping the first letter/letters, putting the output into a set to remove duplicates:

letters = list({i.split('-')[0] for i in df.columns.to_list()})
print(letters)
{'A', 'F', 'G'}

Then you can loop through these, and calculate the total column for each, by adding the open and close:

for l in letters:
    df[l   '-total'] = df[l   '-open']   df[l   '-close']

Final df:

   A-open  A-close  A-total  G-open  G-close  G-total  F-open  F-close  F-total
0       1        4        5       3        5        8       8        2       10
  • Related