Home > database >  How to sum two consequent columns in pandas and retrieve one as result?
How to sum two consequent columns in pandas and retrieve one as result?

Time:01-17

I have a large dataset of 670 columns and 2856 rows. The idea is to sum two consequent rows and retrieve a single column and value as result. It's important to not have replacement in the way the fist column second, then the third the fourth not the second third.

Index ID1 ID2 ID3 ID4
First 0 1 0 1
Second 0 0 1 1

the result should be

Index ID12 ID34
First 1 1
Second 0 2

The example dataframe:

df = pd.DataFrame({"ID1" : [0,0,0,1,1,1] , "ID2" :[1,1,1,0,0,0], "ID3" : [0,1,1,1,0,1]},"ID4" : [0,0,0,0,0,0])
result = pd.DataFrame({"ID1/2" : [1,1,1,0,0,0] , "ID3/4" :[0,1,1,1,0,1]})

I have tried:

res = []
for i in range(len(df)):
              for j in range(1,len(df.columns),2):
                            res.append(data.iloc[i,j] data.iloc[i,j-1])
result = pd.DataFrame(res)

In R the result is:

result <- matrix(nrow = nrow(df), ncol = ncol(df),)
for (i in seq(1,ncol(df),2)){
  result[,i] <- df[,i] df[,i 1]
}
#Erasing the NAs columns
result <- result [,-seq(2,ncol(result ),2)]

CodePudding user response:

N = len(df.columns)
new_df = df.groupby(np.arange(N) // 2, axis="columns").sum()
new_df.columns = [f"ID{j}{j 1}" for j in range(1, N, 2)]
  • groupby every-2 columns and sum
    • floordivision of the range(len(columns)) by 2 gives the group numbers: 0, 0, 1, 1, 2, 2...
  • form new columns with stepping by 2 over range(N) to get 1, 3...

to get

>>> new_df

        ID12  ID34
Index
First      1     1
Second     0     2

CodePudding user response:

With magic slicing:

res = pd.DataFrame(df.values[:, ::2]   df.values[:, 1::2],
                   columns=df.columns[::2]   df.columns[1::2].str[2:], index=df.index)

         ID12  ID34
First       1     1
Second      0     2
  • Related