import pandas as pd

df = pd.DataFrame({
    "Continent": list("AAABBBCCD"),
    "Country": list("FGHIJKLMN"),
    "Population": [90, 140, 50, 80, 80, 70, 50, 125, 50]})
I want to keep only the rows of continents in which every country has a population below 100. The full DataFrame looks like this:
  Continent Country  Population
0         A       F          90
1         A       G         140
2         A       H          50
3         B       I          80
4         B       J          80
5         B       K          70
6         C       L          50
7         C       M         125
8         D       N          50
Every row in Continent A is removed because Country G has a population greater than 100. Every row in Continent C is removed because of Country M. I want the returned DataFrame to look like below:
  Continent Country  Population
3         B       I          80
4         B       J          80
5         B       K          70
8         D       N          50
I tried df[df["Population"] <= 100]
but that only filters individual rows, and I couldn't work out how to apply the condition per Continent.
CodePudding user response:
Here is one way to do it:
# group by Continent
# transform broadcasts True/False to every row: True when the max of the row's group is below 100
out = df[df.groupby(['Continent'])['Population'].transform(lambda x: x.max() < 100)]
out
  Continent Country  Population
3         B       I          80
4         B       J          80
5         B       K          70
8         D       N          50
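To make it easier to see what the one-liner above does, here is a minimal sketch that splits it into steps. The names group_max and mask are introduced here for illustration only, and the sketch passes the string "max" to transform instead of a lambda so that the intermediate values can be inspected.

```python
import pandas as pd

df = pd.DataFrame({
    "Continent": list("AAABBBCCD"),
    "Country": list("FGHIJKLMN"),
    "Population": [90, 140, 50, 80, 80, 70, 50, 125, 50]})

# Largest population within each continent, repeated on every row of that continent.
# transform returns a Series aligned with df's index, unlike a plain aggregation.
group_max = df.groupby("Continent")["Population"].transform("max")

# Boolean mask: True for rows whose continent has a maximum population below 100.
mask = group_max < 100

# Keep only the rows where the mask is True.
out = df[mask]
print(out)
```

Because group_max has one value per original row, the mask can be used directly for boolean indexing without any merge back onto df.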
CodePudding user response:
Here is another way to accomplish it:
import pandas as pd
df = pd.DataFrame({
    "Continent": list("AAABBBCCD"),
    "Country": list("FGHIJKLMN"),
    "Population": [90, 140, 50, 80, 80, 70, 50, 125, 50]})
df.loc[df.groupby(['Continent'])['Population'].transform('max') <= 100]
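I usually avoid lambda in transform because it calls a Python function once per group, which gets slow on large data, whereas the string 'max' uses pandas' built-in implementation. The above answer also works; this is just another option. Note that this version uses <= 100, so a continent whose largest population is exactly 100 would be kept.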
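For completeness, the same result can probably also be reached without groupby at all, by starting from the row-level comparison the question already tried. This is only a sketch of a different technique (Series.isin), not what either answer proposes; the name over_limit is made up for this example, and the threshold follows the <= 100 convention used just above.

```python
import pandas as pd

df = pd.DataFrame({
    "Continent": list("AAABBBCCD"),
    "Country": list("FGHIJKLMN"),
    "Population": [90, 140, 50, 80, 80, 70, 50, 125, 50]})

# Continents that contain at least one country above the limit
# (over_limit is a hypothetical name used only in this sketch).
over_limit = df.loc[df["Population"] > 100, "Continent"]

# Drop every row whose continent appears in that list.
out = df[~df["Continent"].isin(over_limit)]
print(out)
```

This avoids both the lambda and the groupby, at the cost of being specific to a "no value above the limit" condition; the transform approach generalizes more easily to other per-group statistics.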