Home > Blockchain >  How to check if a row of a Pandas dataframe has a cell with a specific value and if it does modify t
How to check if a row of a Pandas dataframe has a cell with a specific value and if it does modify t

Time:09-20

I have a dataframe df:

name age_5_9 age_10_14 age_15_19
Alice no bones broken no bones broken broke 1 bone
Bob no bones broken broke 2 bones no bones broken
Charles no bones broken no bones broken no bones broken

I would like to create a column broke_a_bone that is 1 when any of the rows has a value 'broke 1 bone' or 'broke 2 bones' in any of the columns age_5_9, age_10_14, or age_15_19; otherwise it should be 0.

It should look like this:

name age_5_9 age_10_14 age_15_19 broke_a_bone
Alice no bones broken no bones broken broke 1 bone 1
Bob no bones broken broke 2 bones no bones broken 1
Charles no bones broken no bones broken no bones broken 0

I tried to use .iterrows or .apply() but I just can't seem to make it work.

Thanks in advance.

CodePudding user response:

You can use filter to select the "age" columns, then check if any value per row is not (ne) "no bones broken".

Convert the resulting boolean to integer for 0/1:

df['broke_a_bone'] = (df.filter(like='age_').ne('no bones broken')
                        .any(axis=1).astype(int)
                     )

output:

      name          age_5_9        age_10_14        age_15_19  broke_a_bone
0    Alice  no bones broken  no bones broken     broke 1 bone             1
1      Bob  no bones broken    broke 2 bones  no bones broken             1
2  Charles  no bones broken  no bones broken  no bones broken             0

CodePudding user response:

I'd advise converting everything to numeric:

age_cols = df.filter(like='age').columns

df[age_cols] = df[age_cols].apply(lambda x: pd.to_numeric(x.str.replace('\D', '', regex=True)).fillna(0).astype(int))

print(df)

Output:

      name  age_5_9  age_10_14  age_15_19
0    Alice        0          0          1
1      Bob        0          2          0
2  Charles        0          0          0

Now, we can sum rows easily!

df['num_broken'] = df[age_cols].sum(axis=1)

print(df)

# Output:

      name  age_5_9  age_10_14  age_15_19  num_broken
0    Alice        0          0          1           1
1      Bob        0          2          0           2
2  Charles        0          0          0           0

CodePudding user response:

you can use np.where (faster)


a1 = df["age_5_9"].isin(['broke 1 bone', 'broke 2 bones'])
a2 = df["age_10_14"].isin(['broke 1 bone', 'broke 2 bones'])
a3 = df["age_15_19"].isin(['broke 1 bone', 'broke 2 bones'])

df['broke_a_bone'] = np.where((a1|a2|a3), 1, 0)

#or:
a1 = df["age_5_9"].eq(['no bones broken'])
a2 = df["age_10_14"].eq(['no bones broken'])
a3 = df["age_15_19"].eq(['no bones broken'])

df['broke_a_bone'] = np.where((a1&a2&a3), 0, 1)
  • Related