Selecting specific values out of a column in pandas dataframe

Time:11-28

I have a column, 'state', that has the values 'failed', 'successful', and two or three other values.

I am trying to create a dataframe with only the rows that contain 'failed' and 'successful' in the 'state' column.

I have implemented the following code:

df = df[df['state'].str.contains('failed' or 'successful', na = False)]

but I am only receiving 'failed' rows, not 'successful'.

Any suggestions? I have used this same format on other datasets with success.

CodePudding user response:

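Because ("failed" or "successful") == "failed". Python's or operator short-circuits and returns its first truthy operand, so str.contains only ever receives "failed". See the Python documentation on short-circuit evaluation of boolean operations.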

CodePudding user response:

The issue is that the expression "failed" or "successful" evaluates to "failed": or returns its first truthy operand, and the non-empty string "failed" is truthy, so "successful" is never passed to str.contains.
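To see this outside of pandas, here is a minimal sketch in plain Python. The variable names pattern and fallback are only for illustration and are not part of the original code.

```python
# `or` returns its first truthy operand; it does not build an "either/or" pattern.
pattern = "failed" or "successful"
print(pattern)  # failed

# Only a falsy left operand (such as the empty string) lets the right one through.
fallback = "" or "successful"
print(fallback)  # successful
```

So by the time str.contains is called, Python has already reduced the argument to the single string "failed".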

What you really need to do is evaluate the column against two conditions, str.contains("failed") and str.contains("successful"), and combine the results. You can do this with the | operator, which performs an element-wise OR on the two boolean Series.

df[df["state"].str.contains("failed", na=False) | df["state"].str.contains("successful", na=False)]
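A small runnable sketch of the same idea, split into named masks. The sample data is made up for illustration (the asker's real DataFrame is not shown), and is_failed / is_successful are hypothetical names.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({"state": ["failed", "successful", "canceled", None, "live"]})

# Each call returns a boolean Series; na=False treats missing values as no match.
is_failed = df["state"].str.contains("failed", na=False)
is_successful = df["state"].str.contains("successful", na=False)

# `|` combines the two masks element-wise, unlike Python's `or`.
result = df[is_failed | is_successful]
print(result["state"].tolist())  # ['failed', 'successful']
```

Naming the masks is optional, but it makes each condition easy to inspect on its own.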

EDIT: As Henry mentioned below, you can get a more succinct answer by passing a regex to Series.str.contains, where | inside the pattern means "match either alternative".

df[df["state"].str.contains("failed|success", na=False)]
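One caveat worth checking on your own data: str.contains does substring matching, so the pattern also matches any longer value that happens to contain one of the alternatives. The sketch below uses made-up sample data (the value 'unsuccessful' is hypothetical) and shows Series.isin as an alternative when you want exact values only. That is a different technique from the regex above, not a restatement of it.

```python
import pandas as pd

# Hypothetical sample data; 'unsuccessful' is included to show the substring effect.
df = pd.DataFrame(
    {"state": ["failed", "successful", "unsuccessful", "canceled", None]}
)

# Regex alternation: keeps every row whose state contains either substring.
by_regex = df[df["state"].str.contains("failed|success", na=False)]
print(by_regex["state"].tolist())  # ['failed', 'successful', 'unsuccessful']

# Exact matching: keeps only rows whose state equals one of the listed values.
by_isin = df[df["state"].isin(["failed", "successful"])]
print(by_isin["state"].tolist())  # ['failed', 'successful']
```

If the column only ever holds a handful of fixed labels, both approaches return the same rows.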