Sort dataframe by value in a tuple

Time:02-17

Consider this dataframe:

     name  age                           scores
0   Alice    8                 (False, 0, 89.1)
1     Bob    7               (True, 136, 79.05)
2   Chuck    9                (True, 138, 75.0)
3   Daren   12               (True, 146, 77.25)
4   Elisa   13               (True, 146, 77.25)

Now I want to filter the dataframe so that it keeps only the rows with True in the first position of the 'scores' tuple, to obtain this dataframe:

     name  age                           scores
1     Bob    7               (True, 136, 79.05)
2   Chuck    9                (True, 138, 75.0)
3   Daren   12               (True, 146, 77.25)
4   Elisa   13               (True, 146, 77.25)

I have tried both of these:

df = df[df.scores[0] == True]

and

df = df.drop(df[df.scores[0] == False].index)

But I keep getting errors. Does anyone know of an efficient way to filter by value in a tuple? Thanks!

CodePudding user response:

You can try Series.str, whose indexing is applied element-wise and also works on tuples:

df.loc[df.scores.str[0]==True]
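For reference, here is a minimal self-contained sketch of that approach. It rebuilds the dataframe from the question; the variable names `mask` and `filtered` are only illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the dataframe from the question.
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Chuck', 'Daren', 'Elisa'],
    'age': [8, 7, 9, 12, 13],
    'scores': [(False, 0, 89.1), (True, 136, 79.05), (True, 138, 75.0),
               (True, 146, 77.25), (True, 146, 77.25)],
})

# .str[0] takes element 0 of every tuple in the column.
# Comparing with True turns the result into a boolean mask.
mask = df['scores'].str[0] == True

# Keep only the rows where the mask is True.
filtered = df.loc[mask]
print(filtered)
```

The comparison with True is kept so that the mask has a boolean dtype; `.str[0]` on its own returns an object-dtype Series.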

CodePudding user response:

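The error occurs because df.scores is a Series, not a tuple. df.scores[0] therefore selects the row with index label 0 (the whole tuple for Alice) rather than the first element of every tuple, so the comparison with True yields a single value instead of a boolean mask.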
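The following sketch illustrates that behaviour on the dataframe from the question. The names `first_row`, `comparison` and `raised` are hypothetical helpers added for the demonstration, and the exact exception type may depend on the pandas version, so the code catches any exception.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Chuck', 'Daren', 'Elisa'],
    'age': [8, 7, 9, 12, 13],
    'scores': [(False, 0, 89.1), (True, 136, 79.05), (True, 138, 75.0),
               (True, 146, 77.25), (True, 146, 77.25)],
})

# Label-based lookup: this is the tuple stored in row 0, not a column of booleans.
first_row = df['scores'][0]
print(first_row)

# Comparing a tuple with True gives one plain Python bool, not a mask.
comparison = (first_row == True)
print(comparison)

# Indexing the dataframe with that single bool is treated as a column lookup and fails.
raised = False
try:
    df[comparison]
except Exception as exc:
    raised = True
    print(type(exc).__name__, exc)
```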

Instead, you can use apply with a lambda that returns the first element of each tuple, as follows:

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Chuck', 'Daren', 'Elisa'],
    'age': [8, 7, 9, 12, 13],
    'scores': [(False, 0, 89.1), (True, 136, 79.05), (True, 138, 75.0), (True, 146, 77.25), (True, 146, 77.25)],
})

print(df)
#    name  age              scores
#0  Alice    8    (False, 0, 89.1)
#1    Bob    7  (True, 136, 79.05)
#2  Chuck    9   (True, 138, 75.0)
#3  Daren   12  (True, 146, 77.25)
#4  Elisa   13  (True, 146, 77.25)

df2 = df[df['scores'].apply(lambda x: x[0])]

print(df2)
#    name  age              scores
#1    Bob    7  (True, 136, 79.05)
#2  Chuck    9   (True, 138, 75.0)
#3  Daren   12  (True, 146, 77.25)
#4  Elisa   13  (True, 146, 77.25)