How to display the order of number of elements in pandas dataframe-CodePudding

I am trying to handle the following dataframe

import pandas as pd
import io

csv_data = '''
ID,age,Year
100,75,2020
100,76,2021
200,64,2020
200,65,2021
200,66,2022
300,69,2020
300,70,2021
300,71,2022
300,72,2023
'''
df = pd.read_csv(io.StringIO(csv_data))
df = df.set_index(['ID', 'age'])
df
         Year
ID  age 
100 75  2020
    76  2021
200 64  2020
    65  2021
    66  2022
300 69  2020
    70  2021
    71  2022
    72  2023

The ID in this data frame represents the same person, and Year means the year of the visit.

This will be multi-indexed based on ID and age. At this point, is it possible to display from the ID with more age elements?

The ideal display is shown below.

        Year
ID  age 
300 69  2020
    70  2021
    71  2022
    72  2023
200 64  2020
    65  2021
    66  2022
100 75  2020
    76  2021

CodePudding user response：

Unless it's a requirement you immediately set_index after your .read_csv, then the following should be reasonably okay...

Get a count of how many times an ID occurs - we can use .value_counts for this which handily automatically sorts by descending order...

id_freq = df['ID'].value_counts()

Then index your DF and .reindex using the index of id_freq, eg:

df = df.set_index(['ID', 'age']).reindex(id_freq.index, level=0)

This'll give you:

         Year
ID  age      
300 69   2020
    70   2021
    71   2022
    72   2023
200 64   2020
    65   2021
    66   2022
100 75   2020
    76   2021

Might also have a side effect of being useful as you could also run id_freq.value_counts() to get a distribution of how many times each patient in general appears.

If you do have to index from the start, then you might as well provide it to .read_csv, such as:

df = pd.read_csv(io.StringIO(csv_data), index_col=['ID', 'age'])

Then similar to above, reindex but on the level values of the first level of the index, eg:

df = df.reindex(df.index.get_level_values(0).value_counts().index, level=0)

CodePudding user response：

You can first compute the length of each ID group using groupby and size, and then sort the ID values based on the length:

s = df.groupby('ID').size()
df.sort_values('ID', key=lambda i:s[i], ascending=False)

Output:

         Year
ID  age
300 69   2020
    70   2021
    71   2022
    72   2023
200 64   2020
    65   2021
    66   2022
100 75   2020
    76   2021

CodePudding user response：

You can sort based on two columns and set which column ascending and which not ascending like below:

>>> print(df.sort_values(['ID', 'age'], ascending=[False, True]))

         Year
ID  age      
300 69   2020
    70   2021
    71   2022
    72   2023
200 64   2020
    65   2021
    66   2022
100 75   2020
    76   2021

CodePudding user response：

You could sort these values like this:

df = df.sort_values(['ID', 'age'],
          ascending = [False, True])

CodePudding user response：

pandas sort_index() is useful for sorting dataframe by index values.

df.sort_index(ascending=[False, True])