Python how to merge and sum rows in dataframe based on a grouping of words?-CodePudding

I have a dataframe whose index column contains a list of names, and a column which contains a list of numbers. I want to merge and sum certain columns based on their grouping, and replace the index with the first name.

E.g.

grouping_list = [
   [A, B, C],
   [D, E, F],
   ...
]

# Original Dataframe
index  count
B      1    
C      3
A      2
A      4
D      3
F      5
E      3


# Final Dataframe
index  count
A      10    
D      11

The index whose values are A, B and C will all be summed up in the count column, and it will be replaced by the first value in the list, giving a total of 10. The same is done for D, E and F.

I could only find answers for grouping of same valued rows, and not this. Does anyone have an idea of how to do this?

CodePudding user response：

You can do it like this:

map = {it:lst[0] for lst in grouping_list for it in lst}
df = df.assign(index=df['index'].map(map)).groupby('index').sum().reset_index()

Output:

  index  count
0     A     10
1     D     11

Explanation:

create a mapping from names in list to first name in list for grouping_list
use this mapping as an argument to Series.map() to update the index column
use groupby(), sum() and reset_index() to get rows for each first name encountered with the sum of the count column for corresponding rows from the input dataframe.

CodePudding user response：

You can first replace all the values to same value using dataframes replace method

replace method also accepts lists into inputs so lets loop over every list in grouping_list

grouping_list = [["A", "B", "C"], ["D", "E", "F"]]

# the value we want to replace is the first value in list
for array in grouping_list:
  dataframe.replace(to_replace=array, value=array[0], inplace=True)

then we want to group by index column and then sum all the count values

dataframe = dataframe.groupby("index").sum().reset_index()