Ive attempted to search the forum for this question, but, I believe I may not be asking it correctly. So here it goes.
I have a large data set with many columns. Originally, I needed to sum all columns for each row by multiple groups based on a name pattern of variables. I was able to do so via:
cols = data.filter(regex=r'_name$').columns
data['sum'] = data.groupby(['id','group'],as_index=False)[cols].sum().assign(sum = lambda x: x.sum(axis=1))
By running this code, I receive a modified dataframe grouped by my 2 factor variables (group & id), with all the columns, and the final sum column I need. However, now, I want to return the final sum column back into the original dataframe. The above code returns the entire modified dataframe into my sum column. I know this is achievable in R by simply adding a .$sum at the end of a piped code. Any ideas on how to get this in pandas?
My hopeful output is just a the addition of the final "sum" variable from the above lines of code into my original dataframe.
Edit: To clarify, the code above returns this entire dataframe:
All I want returned is the column in yellow
CodePudding user response:
is this what you need?
data['sum'] = data.groupby(['id','group'])[cols].transform('sum').sum(axis = 1)