Consider this dataframe:
name age scores
0 Alice 8 (False, 0, 89.1)
1 Bob 7 (True, 136, 79.05)
2 Chuck 9 (True, 138, 75.0)
3 Daren 12 (True, 146, 77.25)
4 Elisa 13 (True, 146, 77.25)
Now I want to filter the dataframe to keep only the rows with True in the first position of the 'scores' tuple, to obtain this dataframe:
name age scores
1 Bob 7 (True, 136, 79.05)
2 Chuck 9 (True, 138, 75.0)
3 Daren 12 (True, 146, 77.25)
4 Elisa 13 (True, 146, 77.25)
I have tried both of these:
df = df[df.scores[0] == True]
and
df = df.drop(df[df.scores[0] == False].index)
But I keep getting errors. Does anyone know of an efficient way to filter by value in a tuple? Thanks!
CodePudding user response:
You can try using Series.str:
df.loc[df.scores.str[0]==True]
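As a minimal sketch of this approach, the dataframe below is rebuilt from the data in the question (the default integer index and the variable names first and filtered are assumptions made for illustration). It relies on .str[0] extracting the first element of each tuple, not only of strings:

```python
import pandas as pd

# Rebuilt from the question's data; a default 0..4 index is assumed
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Chuck', 'Daren', 'Elisa'],
    'age': [8, 7, 9, 12, 13],
    'scores': [(False, 0, 89.1), (True, 136, 79.05), (True, 138, 75.0),
               (True, 146, 77.25), (True, 146, 77.25)],
})

# .str[0] pulls the first element out of each tuple in the column
first = df['scores'].str[0]

# Comparing against True gives a boolean mask used to select rows
filtered = df.loc[first == True]
print(filtered['name'].tolist())  # ['Bob', 'Chuck', 'Daren', 'Elisa']
```

The comparison with True is kept because .str[0] returns an object-dtype Series here, and the comparison turns it into a proper boolean mask.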
CodePudding user response:
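The error occurs because df.scores is a Series, but you are trying to index it like a list or tuple (e.g., scores[0], scores[1], ...). With the default integer index, df.scores[0] looks up the row labeled 0 and returns that row's whole tuple, not the first element of every tuple.
Instead, you can use apply with a lambda here, as follows: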
import pandas as pd
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Chuck', 'Daren', 'Elisa'],
    'age': [8, 7, 9, 12, 13],
    'scores': [(False, 0, 89.1), (True, 136, 79.05), (True, 138, 75.0),
               (True, 146, 77.25), (True, 146, 77.25)],
})
print(df)
# name age scores
#0 Alice 8 (False, 0, 89.1)
#1 Bob 7 (True, 136, 79.05)
#2 Chuck 9 (True, 138, 75.0)
#3 Daren 12 (True, 146, 77.25)
#4 Elisa 13 (True, 146, 77.25)
df2 = df[df['scores'].apply(lambda x: x[0])]
print(df2)
# name age scores
#1 Bob 7 (True, 136, 79.05)
#2 Chuck 9 (True, 138, 75.0)
#3 Daren 12 (True, 146, 77.25)
#4 Elisa 13 (True, 146, 77.25)
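To make the cause of the error concrete, here is a small sketch that first shows what df['scores'][0] returns, then builds the mask with a list comprehension. The list comprehension is an alternative to apply that is not part of the answer above, and the names row0, mask and df3 are hypothetical:

```python
import pandas as pd

# Same data as above; a default 0..4 index is assumed
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Chuck', 'Daren', 'Elisa'],
    'age': [8, 7, 9, 12, 13],
    'scores': [(False, 0, 89.1), (True, 136, 79.05), (True, 138, 75.0),
               (True, 146, 77.25), (True, 146, 77.25)],
})

# Indexing the Series with 0 returns the tuple stored in the row labeled 0,
# not the first element of every tuple
row0 = df['scores'][0]
print(row0)  # (False, 0, 89.1)

# Build one boolean per row from the first tuple element and use it as a mask
mask = [bool(t[0]) for t in df['scores']]
df3 = df[mask]
print(df3['name'].tolist())  # ['Bob', 'Chuck', 'Daren', 'Elisa']
```

Because row0 == True evaluates to the single value False, the original df[df.scores[0] == True] ends up asking for a column named False, which is why it raises an error instead of filtering.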