Home > Net >  Find a row closest to the mean of a DataFrame column
Find a row closest to the mean of a DataFrame column

Time:02-26

I am working with pandas using a small dataset and I am stuck somewhere.

Here is the data after merging:

head of the dataframe

Using this data, the code below give the minimum area of each region and complete with the corresponding country name on the same line in the Dataframe obtained.

Area_min=Africa.groupby('Region').Area.agg([min])
Area_min['Country']=(Africa.loc[Africa.groupby('Region').Area.idxmin(), 'Names']).values
Area_min

And this one give the maximum population of each region and complete with the corresponding country name on the same line in Dataframe obtained.

Pop_max=Africa.groupby('Region').Population.agg([max])
Pop_max['Country']=(Africa.loc[Africa.groupby('Region').Population.idxmax(), 'Names']).values
Pop_max

Now I am trying to get the average population of each region and complete with the name of country having the population closest to the average of the corresponding group, on the same line in the Dataframe obtained.

The code below give the average population of each region but I am stuck on corresponding with the country name.

Pop_average=Africa.groupby('Region').Population.agg(['mean'])

I am thinking about .map() and .apply() function but I have tried without success. Any hint will be helpful.

CodePudding user response:

Since you're grouping by only one column, it's more efficient to do it once.

Also, since you're using idxmin anyway, it seems it's redundant to do the first groupby.agg, since you can directly access the column names.

g = Africa.groupby('Region')
Area_min = Africa.loc[g['Area'].idxmin(), ['Names', 'Area']]
Pop_max = Africa.loc[g['Population'].idxmax(), ['Names', 'Population']]

Then for your question, here's one approach. Transform the population mean and find the difference between the mean and the population and find the location where the difference is the smallest using abs groupby idxmin; then use the loc accessor like above to get the desired outcome:

Pop_average = Africa.loc[((g['Population'].transform('mean') - Africa['Population']).abs()
                          .groupby(Africa['Region']).idxmin()), 
                         ['Names','Population']]
  • Related