Home > Mobile >  Python how to merge and sum rows in dataframe based on a grouping of words?
Python how to merge and sum rows in dataframe based on a grouping of words?

Time:08-22

I have a dataframe whose index column contains a list of names, and a column which contains a list of numbers. I want to merge and sum certain columns based on their grouping, and replace the index with the first name.

E.g.

grouping_list = [
   [A, B, C],
   [D, E, F],
   ...
]

# Original Dataframe
index  count
B      1    
C      3
A      2
A      4
D      3
F      5
E      3


# Final Dataframe
index  count
A      10    
D      11

The index whose values are A, B and C will all be summed up in the count column, and it will be replaced by the first value in the list, giving a total of 10. The same is done for D, E and F.

I could only find answers for grouping of same valued rows, and not this. Does anyone have an idea of how to do this?

CodePudding user response:

You can do it like this:

map = {it:lst[0] for lst in grouping_list for it in lst}
df = df.assign(index=df['index'].map(map)).groupby('index').sum().reset_index()

Output:

  index  count
0     A     10
1     D     11

Explanation:

  • create a mapping from names in list to first name in list for grouping_list
  • use this mapping as an argument to Series.map() to update the index column
  • use groupby(), sum() and reset_index() to get rows for each first name encountered with the sum of the count column for corresponding rows from the input dataframe.

CodePudding user response:

You can first replace all the values to same value using dataframes replace method

replace method also accepts lists into inputs so lets loop over every list in grouping_list

grouping_list = [["A", "B", "C"], ["D", "E", "F"]]

# the value we want to replace is the first value in list
for array in grouping_list:
  dataframe.replace(to_replace=array, value=array[0], inplace=True)

then we want to group by index column and then sum all the count values

dataframe = dataframe.groupby("index").sum().reset_index()
  • Related