I have a DataFrame with 9 columns: an id column plus 8 value columns (my real data is much larger). I want to take the value columns in consecutive groups of 4 and build a new DataFrame with 2 columns, each holding the row-wise sum of one group of 4 columns. I also want to keep the id column. Here is a simple example:
import pandas as pd
df = pd.DataFrame()
df['id'] = [1, 2, 3, 4]
df['a'] = [10, 0, 1, 3]
df['b'] = [-10, 0, 2, 2]
df['c'] = [0, 1, 3, 3]
df['d'] = [0, 0, 4, 4]
df['e'] = [10, 0, 1, 3]
df['f'] = [10, 0, 2, 2]
df['g'] = [0, -1, 0, 0]
df['h'] = [0, 0, 0, 0]
df
CodePudding user response:
You can use the underlying numpy array for an easy way to reshape:
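a = df.drop(columns='id').to_numpy()  # shape: (n_rows, 8)
# View each row as 2 groups of 4 columns, then sum within each group.
# The last axis must be the group size (4), not len(df); the two only
# coincide in this 4-row example.
df2 = pd.DataFrame(a.reshape((len(df), -1, 4)).sum(axis=2),
                   columns=['value1', 'value2'],
                   index=df['id']).reset_index()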
output:
   id  value1  value2
0   1       0      20
1   2       1      -1
2   3      10       3
3   4      12       5
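
If you would rather stay in pandas, or have more than two groups, the same idea can be sketched with a groupby over column positions. This is only a minimal sketch: the helper name `sum_column_groups` and its parameters `id_col` and `group_size` are made up for illustration, and it assumes the number of value columns is an exact multiple of the group size.

```python
import numpy as np
import pandas as pd


def sum_column_groups(df, id_col="id", group_size=4):
    """Hypothetical helper: row-wise sums over consecutive blocks of columns."""
    values = df.drop(columns=id_col)
    n_cols = values.shape[1]
    if n_cols % group_size != 0:
        raise ValueError("number of value columns must be a multiple of group_size")
    # Label each column position with its block number: 0,0,0,0,1,1,1,1,...
    block = np.arange(n_cols) // group_size
    # Transpose so the column blocks become row groups, sum each group,
    # then transpose back so there is one row per original row again.
    sums = values.T.groupby(block).sum().T
    sums.columns = [f"value{i + 1}" for i in range(sums.shape[1])]
    # Put the id column back in front (aligned on the original index).
    sums.insert(0, id_col, df[id_col])
    return sums


# Same example data as in the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "a": [10, 0, 1, 3], "b": [-10, 0, 2, 2], "c": [0, 1, 3, 3], "d": [0, 0, 4, 4],
    "e": [10, 0, 1, 3], "f": [10, 0, 2, 2], "g": [0, -1, 0, 0], "h": [0, 0, 0, 0],
})
result = sum_column_groups(df)
print(result)
```

On the example data this should give the same table as the numpy version. Transposing a very large frame can be slow, so the reshape approach is probably the better choice for big data.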