How do I return the rows of DataFrame where every Country in each Continent has a Population of less

Time:11-21

import pandas as pd

df = pd.DataFrame({
    "Continent": list("AAABBBCCD"),
    "Country": list("FGHIJKLMN"),
    "Population": [90, 140, 50, 80, 80, 70, 50, 125, 50]})

I want to return only the rows that belong to a Continent in which every Country has a Population of less than 100.

  Continent Country  Population
0         A       F          90
1         A       G         140
2         A       H          50
3         B       I          80
4         B       J          80
5         B       K          70
6         C       L          50
7         C       M         125
8         D       N          50

Every row in Continent A is removed because Country G has a population greater than 100. Every row in Continent C is removed because of Country M. I want the returned DataFrame to look like below:

  Continent Country  Population
3         B       I          80
4         B       J          80
5         B       K          70
8         D       N          50

I tried df[df["Population"] <= 100], but that filters row by row and I couldn't work out how to apply the condition per Continent.
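To illustrate why the row-wise attempt falls short, here is a minimal sketch using the DataFrame from the question. The variable name `attempt` is only illustrative. The filter drops G and M themselves but keeps the other rows of Continents A and C, which is not the desired result.

```python
import pandas as pd

df = pd.DataFrame({
    "Continent": list("AAABBBCCD"),
    "Country": list("FGHIJKLMN"),
    "Population": [90, 140, 50, 80, 80, 70, 50, 125, 50]})

# Row-wise filter: each row is judged on its own Population only
attempt = df[df["Population"] <= 100]

# Rows F and H (Continent A) and L (Continent C) are still present,
# even though their continents contain a country above 100
print(attempt)
```

The condition has to be evaluated once per Continent and then applied to every row of that Continent, which is what a grouped operation provides.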

CodePudding user response:

Here is one way to do it:

# Group by Continent; transform broadcasts the lambda's True/False result
# (is the group's max below 100?) to every row of that group
out = df[df.groupby(['Continent'])['Population'].transform(lambda x: x.max() < 100)]
out

  Continent Country  Population
3         B       I          80
4         B       J          80
5         B       K          70
8         D       N          50

CodePudding user response:

Here is another way to accomplish it:

import pandas as pd

df = pd.DataFrame({
    "Continent": list("AAABBBCCD"), 
    "Country": list("FGHIJKLMN"), 
    "Population": [90, 140, 50, 80, 80, 70, 50, 125, 50]})

df.loc[df.groupby(['Continent'])['Population'].transform('max') <= 100]

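I usually avoid passing a lambda to transform because it tends to be slow: pandas calls the Python function once per group, whereas the string 'max' uses a built-in implementation. The answer above also works; this is just another option. Note that the answer above tests < 100 while this one tests <= 100, which makes no difference for this data because no Population is exactly 100.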
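A third variant, offered only as a sketch and not taken from either answer, tests each row first and then asks whether all rows of a Continent pass. It avoids the lambda and makes the choice between strict and non-strict comparison explicit. The names `below`, `keep`, and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Continent": list("AAABBBCCD"),
    "Country": list("FGHIJKLMN"),
    "Population": [90, 140, 50, 80, 80, 70, 50, 125, 50]})

# Row-level test: is this Country's Population strictly below 100?
below = df["Population"].lt(100)

# Group-level test: True on every row of a Continent only if all its rows pass
keep = below.groupby(df["Continent"]).transform("all")

result = df[keep]
print(result)
```

Swapping `.lt(100)` for `.le(100)` would give the "at most 100" reading used in the question's own attempt.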
