I have a dataframe whose index column contains a list of names, and a column which contains a list of numbers. I want to merge and sum certain columns based on their grouping, and replace the index with the first name.
E.g.
grouping_list = [
[A, B, C],
[D, E, F],
...
]
# Original Dataframe
index count
B 1
C 3
A 2
A 4
D 3
F 5
E 3
# Final Dataframe
index count
A 10
D 11
The index whose values are A, B and C will all be summed up in the count column, and it will be replaced by the first value in the list, giving a total of 10. The same is done for D, E and F.
I could only find answers for grouping of same valued rows, and not this. Does anyone have an idea of how to do this?
CodePudding user response:
You can do it like this:
map = {it:lst[0] for lst in grouping_list for it in lst}
df = df.assign(index=df['index'].map(map)).groupby('index').sum().reset_index()
Output:
index count
0 A 10
1 D 11
Explanation:
- create a mapping from names in list to first name in list for
grouping_list
- use this mapping as an argument to
Series.map()
to update theindex
column - use
groupby()
,sum()
andreset_index()
to get rows for each first name encountered with the sum of thecount
column for corresponding rows from the input dataframe.
CodePudding user response:
You can first replace all the values to same value using dataframes replace method
replace method also accepts lists into inputs so lets loop over every list in grouping_list
grouping_list = [["A", "B", "C"], ["D", "E", "F"]]
# the value we want to replace is the first value in list
for array in grouping_list:
dataframe.replace(to_replace=array, value=array[0], inplace=True)
then we want to group by index column and then sum all the count values
dataframe = dataframe.groupby("index").sum().reset_index()