Pandas: How to limit the number of rows for a given index in a pivot table-CodePudding

I have the following Pandas data frame df:

import pandas as pd

df = pd.DataFrame({
'city': ['New York', 'New York', 'New York', 'New York', 'New York', 'New York', 'Los Angeles', 'Los Angeles', 'Houston', 'Houston', 'Houston', 'Boston', 'Boston', 'Boston', 'Boston'],
'airport': ['LGA', 'EWR', 'JFK', 'TEB', 'CWD', 'TTN', 'LAX', 'BUR', 'IAH', 'HOU', 'EFD', 'BOS', 'ACK', 'MVY', 'WST'],
'distance': [38, 32, 8, 78, 120, 180, 8, 19, 90, 78, 120, 9, 97, 72, 150]
})
df

    city          airport     distance
0   New York      LGA         38
1   New York      EWR         32
2   New York      JFK     8
3   New York      TEB         78
4   New York      CWD         120
5   New York      TTN         180
6   Los Angeles   LAX         8
7   Los Angeles   BUR         19
8   Houston       IAH         90
9   Houston       HOU         78
10  Houston       EFD         120
11  Boston        BOS         9
12  Boston        ACK         97
13  Boston        MVY         72
14  Boston        WST         150

I create a sorted pivot table using the following:

pivot_table = pd.pivot_table(df, index = ['city', 'airport'], values = 'distance')

sorted_table = pivot_table.reset_index().sort_values(['city', 'distance'], ascending=[1,0]).set_index(['city', 'airport'])


                            distance
       city    airport  
     Boston        WST           150
                   ACK            97
                   MVY            72
                   BOS             9
    Houston        EFD           120
                   IAH            90
                   HOU            78
Los Angeles        BUR            19
                   LAX             8
   New York        TTN           180
                   CWD           120
                   TEB            78
                   LGA            38
                   EWR            32
                   JFK             8

As you can see, some cities have more than 3 associated airports (e.g. Boston and New York).

How can I limit the number of results for city to a maximum of three (3)?

The desired pivot table would look like this:

                            distance
       city    airport  
     Boston        WST           150
                   ACK            97
                   MVY            72
    Houston        EFD           120
                   IAH            90
                   HOU            78
Los Angeles        BUR            19
                   LAX             8
   New York        TTN           180
                   CWD           120
                   TEB            78

Thanks!

CodePudding user response：


df = pd.DataFrame({
'city': ['New York', 'New York', 'New York', 'New York', 'New York', 'New York', 'Los Angeles', 'Los Angeles', 'Houston', 'Houston', 'Houston', 'Boston', 'Boston', 'Boston', 'Boston'],
'airport': ['LGA', 'EWR', 'JFK', 'TEB', 'CWD', 'TTN', 'LAX', 'BUR', 'IAH', 'HOU', 'EFD', 'BOS', 'ACK', 'MVY', 'WST'],
'distance': [38, 32, 8, 78, 120, 180, 8, 19, 90, 78, 120, 9, 97, 72, 150]
})
pivot_table = pd.pivot_table(df, index = ['city', 'airport'], values = 'distance')
sorted_table = pivot_table.reset_index().sort_values(['city', 'distance'], ascending=[1,0]).set_index(['city', 'airport'])

limited_table = sorted_table.groupby('city').head(3)
print(limited_table)

                     distance
city        airport          
Boston      WST           150
            ACK            97
            MVY            72
Houston     EFD           120
            IAH            90
            HOU            78
Los Angeles BUR            19
            LAX             8
New York    TTN           180
            CWD           120
            TEB            78

CodePudding user response：

You can also do it like this without the creation of multiindex:

df.sort_values(['city', 'distance'], ascending=[True, False])\
  .groupby(['city']).head(3)

Output:

           city airport  distance
14       Boston     WST       150
12       Boston     ACK        97
13       Boston     MVY        72
10      Houston     EFD       120
8       Houston     IAH        90
9       Houston     HOU        78
7   Los Angeles     BUR        19
6   Los Angeles     LAX         8
5      New York     TTN       180
4      New York     CWD       120
3      New York     TEB        78