Pandas sorting DF by values and indexes

Time:08-11

I'm trying to sort a list by frequency and then by name (pandas 1.3.2, Python 3.10).

First, I count the occurrences of each item in the list; then, where the counts are equal, the names must be ordered alphabetically.

I found that this only works when len(list) < 19. Magic...

Code:

import pandas
        
df_data = pandas.DataFrame({
                'data':
                    ['14209adobepremiere', 'adobe-flash-player', 'adobe-flash-player-cis', 
                     'adobe-photoshop-cc-cis', 'discord', 'discord', 'driverpack', 
                     'freeoffice', 'freeoffice2018', 'generals',
                     'tiktok-for-pc-cis', 'tlauncher', 'utorrent', 'viber', 
                     'winrar', 'zoom', 'zoom', 'zoom-client-for-conferences', 
                     'zoom-client-for-conferences-cis']
            })

with pandas.option_context('display.max_rows', None, 'display.max_columns', None):
    print(df_data['data'].value_counts().sort_index(
            ascending=True,
        ).sort_values(ascending=False))

Expected output (by count desc, then alphabetically asc):

discord                            2
zoom                               2
14209adobepremiere                 1
adobe-flash-player                 1
adobe-flash-player-cis             1
adobe-photoshop-cc-cis             1
driverpack                         1
freeoffice                         1
freeoffice2018                     1
generals                           1
tiktok-for-pc-cis                  1
tlauncher                          1
utorrent                           1
viber                              1
winrar                             1
zoom-client-for-conferences        1
zoom-client-for-conferences-cis    1
Name: data, dtype: int64

Real output (by count desc, but not alphabetically asc):

zoom                               2
discord                            2
14209adobepremiere                 1
tiktok-for-pc-cis                  1
zoom-client-for-conferences        1
winrar                             1
viber                              1
utorrent                           1
tlauncher                          1
generals                           1
adobe-flash-player                 1
freeoffice2018                     1
freeoffice                         1
driverpack                         1
adobe-photoshop-cc-cis             1
adobe-flash-player-cis             1
zoom-client-for-conferences-cis    1
Name: data, dtype: int64

Thanks in advance for any help.

CodePudding user response:

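I don't think you can chain .sort_index and .sort_values like this, because with the default settings the second sort is not guaranteed to keep the order produced by the first. One approach is to reset the index, sort by both columns, and then set the index again.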

# In pandas 1.3.x, reset_index() yields the columns 'index' (names) and 'data' (counts).
(df_data['data'].value_counts()
    .reset_index()
    .sort_values(['data', 'index'], ascending=[False, True])
    .set_index('index'))

                                data
index
discord                             2
zoom                                2
14209adobepremiere                  1
adobe-flash-player                  1
adobe-flash-player-cis              1
adobe-photoshop-cc-cis              1
driverpack                          1
freeoffice                          1
freeoffice2018                      1
generals                            1
tiktok-for-pc-cis                   1
tlauncher                           1
utorrent                            1
viber                               1
winrar                              1
zoom-client-for-conferences         1
zoom-client-for-conferences-cis     1
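
As an alternative sketch, the chained version from the question should also work if the second sort is asked to be stable. sort_values defaults to kind='quicksort', which does not promise to keep tied rows in their existing order; that is likely why short lists appeared to work (small inputs probably fall back to an order-preserving path) while longer ones did not. The helper name sort_by_count_then_name below is made up for illustration; kind='mergesort' is a documented option of Series.sort_values.

```python
import pandas as pd


def sort_by_count_then_name(values):
    """Count occurrences, then order by count (desc) and name (asc).

    Hypothetical helper for illustration only.
    """
    counts = pd.Series(values, name="data").value_counts()
    # Step 1: order alphabetically by name (the index).
    by_name = counts.sort_index(ascending=True)
    # Step 2: order by count with a stable algorithm, so that rows with
    # equal counts keep the alphabetical order from step 1.
    return by_name.sort_values(ascending=False, kind="mergesort")


names = [
    "14209adobepremiere", "adobe-flash-player", "adobe-flash-player-cis",
    "adobe-photoshop-cc-cis", "discord", "discord", "driverpack",
    "freeoffice", "freeoffice2018", "generals",
    "tiktok-for-pc-cis", "tlauncher", "utorrent", "viber",
    "winrar", "zoom", "zoom", "zoom-client-for-conferences",
    "zoom-client-for-conferences-cis",
]

result = sort_by_count_then_name(names)
print(result)
```

Unlike the reset_index approach, this keeps the result as a Series and does not depend on the column names that reset_index produces, which differ between pandas versions.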