Grouping with Python-CodePudding

I have a dataset I am trying to group by some common values and then sum up some other values. The tricky part is I want to add some sort of weighting that keeps the largest number, I'll try to elaborate more below:

I've created a dummy data frame that is along the lines of my data just for example purposes:

df = pd.DataFrame({'Family': ['Contactors', 'Contactors', 'Contactors'], 
                   'Cell': ['EP&C', 'EXR', 'C&S'],
                    'Visits': ['25620', '626', '40']})

This produces a table like so:

So, in this example I would want all of the 'Contactors' to be grouped up by EP&C (as this has the highest visits to start with) but I would like all of the visits summed up and the other 'Cell' values dropped, so I would be left with something like this:

Could anyone advise?

Thanks.

CodePudding user response：

IIUC, you can use:

(df
 # convert to numeric
 .assign(Visits=pd.to_numeric(df['Visits']))
 # ensure the top row per group is the highest visits
 .sort_values(by=['Family', 'Visits'], ascending=False)
 # for groups per Family
 .groupby('Family', sort=False, as_index=False)
 # aggregate per group: Cell (first row, i.e top) and Visits (sum of rows)
 .agg({'Cell': 'first', 'Visits': sum})
)

output:

       Family  Cell  Visits
0  Contactors  EP&C   26286