pandas: return mutated column into original dataframe

Time:02-18

I've searched the forum for this question, but I suspect I'm not phrasing it correctly, so here goes.

I have a large data set with many columns. Originally, I needed to sum, within groups defined by several variables, all columns whose names match a pattern. I was able to do so via:

cols = data.filter(regex=r'_name$').columns
data['sum'] = data.groupby(['id','group'],as_index=False)[cols].sum().assign(sum = lambda x: x.sum(axis=1))

Running this code gives me a modified dataframe grouped by my two factor variables (group and id), with all the matching columns plus the final sum column I need. Now, however, I want to put that final sum column back into the original dataframe. The code above assigns the entire modified dataframe to my sum column instead. I know this is achievable in R by simply adding .$sum at the end of a piped expression. Any ideas on how to get this in pandas?

The output I'm hoping for is just the addition of the final "sum" variable from the code above to my original dataframe.

Edit: To clarify, the code above returns the entire grouped dataframe (shown in a screenshot in the original post).

All I want returned is the column highlighted in yellow there, i.e. the sum column.

CodePudding user response:

Is this what you need?

data['sum'] = data.groupby(['id','group'])[cols].transform('sum').sum(axis = 1)
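
To see why this works, here is a minimal sketch on made-up data (the column names x_name and y_name and all values are hypothetical, chosen only to match the '_name$' pattern from the question). transform('sum') returns a frame with the same index as the original, so the row-wise sum lines up with data and can be assigned directly. For comparison, the sketch also shows one possible alternative: aggregate per group, then merge the totals back on the grouping keys. It sums only x[cols] in the aggregated frame, on the assumption that the grouping keys should not be part of the total.

```python
import pandas as pd

# Hypothetical toy data; names and values are assumptions for illustration.
data = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "group": ["a", "a", "b", "b"],
    "x_name": [1, 2, 3, 4],
    "y_name": [10, 20, 30, 40],
})

# Select the columns whose names end in "_name", as in the question.
cols = data.filter(regex=r"_name$").columns

# transform("sum") broadcasts each group's column totals back onto the
# original rows, so the result has the same index and length as `data`.
data["sum"] = data.groupby(["id", "group"])[cols].transform("sum").sum(axis=1)

# Alternative sketch: aggregate to one row per group, then merge the
# per-group total back onto the original rows by the grouping keys.
totals = (
    data.groupby(["id", "group"], as_index=False)[cols].sum()
        .assign(sum_merged=lambda x: x[cols].sum(axis=1))
        [["id", "group", "sum_merged"]]
)
merged = data.merge(totals, on=["id", "group"], how="left")

print(merged)
```

Both columns should hold the same per-group total on every row. The transform version is shorter and avoids the merge, which is probably why it suits this case.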