I have a dataframe df
:
name | age_5_9 | age_10_14 | age_15_19 |
---|---|---|---|
Alice | no bones broken | no bones broken | broke 1 bone |
Bob | no bones broken | broke 2 bones | no bones broken |
Charles | no bones broken | no bones broken | no bones broken |
I would like to create a column broke_a_bone
that is 1
when any of the rows has a value 'broke 1 bone' or 'broke 2 bones' in any of the columns age_5_9
, age_10_14
, or age_15_19
; otherwise it should be 0.
It should look like this:
name | age_5_9 | age_10_14 | age_15_19 | broke_a_bone |
---|---|---|---|---|
Alice | no bones broken | no bones broken | broke 1 bone | 1 |
Bob | no bones broken | broke 2 bones | no bones broken | 1 |
Charles | no bones broken | no bones broken | no bones broken | 0 |
I tried to use .iterrows
or .apply()
but I just can't seem to make it work.
Thanks in advance.
CodePudding user response:
You can use filter
to select the "age" columns, then check if any
value per row is not (ne
) "no bones broken".
Convert the resulting boolean to integer for 0/1:
df['broke_a_bone'] = (df.filter(like='age_').ne('no bones broken')
.any(axis=1).astype(int)
)
output:
name age_5_9 age_10_14 age_15_19 broke_a_bone
0 Alice no bones broken no bones broken broke 1 bone 1
1 Bob no bones broken broke 2 bones no bones broken 1
2 Charles no bones broken no bones broken no bones broken 0
CodePudding user response:
I'd advise converting everything to numeric:
age_cols = df.filter(like='age').columns
df[age_cols] = df[age_cols].apply(lambda x: pd.to_numeric(x.str.replace('\D', '', regex=True)).fillna(0).astype(int))
print(df)
Output:
name age_5_9 age_10_14 age_15_19
0 Alice 0 0 1
1 Bob 0 2 0
2 Charles 0 0 0
Now, we can sum rows easily!
df['num_broken'] = df[age_cols].sum(axis=1)
print(df)
# Output:
name age_5_9 age_10_14 age_15_19 num_broken
0 Alice 0 0 1 1
1 Bob 0 2 0 2
2 Charles 0 0 0 0
CodePudding user response:
you can use np.where (faster)
a1 = df["age_5_9"].isin(['broke 1 bone', 'broke 2 bones'])
a2 = df["age_10_14"].isin(['broke 1 bone', 'broke 2 bones'])
a3 = df["age_15_19"].isin(['broke 1 bone', 'broke 2 bones'])
df['broke_a_bone'] = np.where((a1|a2|a3), 1, 0)
#or:
a1 = df["age_5_9"].eq(['no bones broken'])
a2 = df["age_10_14"].eq(['no bones broken'])
a3 = df["age_15_19"].eq(['no bones broken'])
df['broke_a_bone'] = np.where((a1&a2&a3), 0, 1)