I have the following Pandas data frame df
:
import pandas as pd
df = pd.DataFrame({
'city': ['New York', 'New York', 'New York', 'New York', 'New York', 'New York', 'Los Angeles', 'Los Angeles', 'Houston', 'Houston', 'Houston', 'Boston', 'Boston', 'Boston', 'Boston'],
'airport': ['LGA', 'EWR', 'JFK', 'TEB', 'CWD', 'TTN', 'LAX', 'BUR', 'IAH', 'HOU', 'EFD', 'BOS', 'ACK', 'MVY', 'WST'],
'distance': [38, 32, 8, 78, 120, 180, 8, 19, 90, 78, 120, 9, 97, 72, 150]
})
df
city airport distance
0 New York LGA 38
1 New York EWR 32
2 New York JFK 8
3 New York TEB 78
4 New York CWD 120
5 New York TTN 180
6 Los Angeles LAX 8
7 Los Angeles BUR 19
8 Houston IAH 90
9 Houston HOU 78
10 Houston EFD 120
11 Boston BOS 9
12 Boston ACK 97
13 Boston MVY 72
14 Boston WST 150
I create a sorted pivot table using the following:
pivot_table = pd.pivot_table(df, index = ['city', 'airport'], values = 'distance')
sorted_table = pivot_table.reset_index().sort_values(['city', 'distance'], ascending=[1,0]).set_index(['city', 'airport'])
distance
city airport
Boston WST 150
ACK 97
MVY 72
BOS 9
Houston EFD 120
IAH 90
HOU 78
Los Angeles BUR 19
LAX 8
New York TTN 180
CWD 120
TEB 78
LGA 38
EWR 32
JFK 8
As you can see, some cities have more than 3 associated airports (e.g. Boston and New York).
How can I limit the number of results for city
to a maximum of three (3)?
The desired pivot table would look like this:
distance
city airport
Boston WST 150
ACK 97
MVY 72
Houston EFD 120
IAH 90
HOU 78
Los Angeles BUR 19
LAX 8
New York TTN 180
CWD 120
TEB 78
Thanks!
CodePudding user response:
df = pd.DataFrame({
'city': ['New York', 'New York', 'New York', 'New York', 'New York', 'New York', 'Los Angeles', 'Los Angeles', 'Houston', 'Houston', 'Houston', 'Boston', 'Boston', 'Boston', 'Boston'],
'airport': ['LGA', 'EWR', 'JFK', 'TEB', 'CWD', 'TTN', 'LAX', 'BUR', 'IAH', 'HOU', 'EFD', 'BOS', 'ACK', 'MVY', 'WST'],
'distance': [38, 32, 8, 78, 120, 180, 8, 19, 90, 78, 120, 9, 97, 72, 150]
})
pivot_table = pd.pivot_table(df, index = ['city', 'airport'], values = 'distance')
sorted_table = pivot_table.reset_index().sort_values(['city', 'distance'], ascending=[1,0]).set_index(['city', 'airport'])
limited_table = sorted_table.groupby('city').head(3)
print(limited_table)
distance
city airport
Boston WST 150
ACK 97
MVY 72
Houston EFD 120
IAH 90
HOU 78
Los Angeles BUR 19
LAX 8
New York TTN 180
CWD 120
TEB 78
CodePudding user response:
You can also do it like this without the creation of multiindex:
df.sort_values(['city', 'distance'], ascending=[True, False])\
.groupby(['city']).head(3)
Output:
city airport distance
14 Boston WST 150
12 Boston ACK 97
13 Boston MVY 72
10 Houston EFD 120
8 Houston IAH 90
9 Houston HOU 78
7 Los Angeles BUR 19
6 Los Angeles LAX 8
5 New York TTN 180
4 New York CWD 120
3 New York TEB 78