My original data frame is this
data = {'Patient_ID': ['A', 'B', 'C', 'D'], 'Vision_Difficulty': ['111', '111', '113', '114'],'Hearing_Difficulty': ['111', '111', '113', '114'], 'Hearing_Difficulty': ['112', '111', '112', '113'],'Moving_Difficulty': ['111', '111', '112', '111']}
df = pd.DataFrame(data)
It presents a set of patients and three types of difficulties. '111' indicates that the patient does not have any difficulties, whereas the other codes (112,113,114) indicates that they do.
What I want to do is I want to iterate over the three columns to find patients that have at least one type of difficulty and save the result to a new column "Difficulty_status" with (yes/no) values.
The desired output is as below
data = {'Patient_ID': ['A', 'B', 'C', 'D'], 'Vision_Difficulty': ['111', '111', '113', '114'],'Hearing_Difficulty': ['111', '111', '113', '114'], 'Hearing_Difficulty': ['112', '111', '112', '113'],'Moving_Difficulty': ['111', '111', '112', '111'], 'Difficulty_status':['yes','no','yes','yes']}
df_output = pd.DataFrame(data)
What I have achieved so far is this
df['Difficylty_status'] = ['yes' if x != '111' else 'no' for x in df['Vision_Difficulty']]
I want to generalize this code to check all three columns (Vision_Difficulty, Hearing_Difficulty, Moving_Difficulty)
CodePudding user response:
Use numpy.where
with test if equal by all columns with Difficulty
in columns names filtered by DataFrame.filter
and DataFrame.eq
, for test if all True
s use DataFrame.all
:
df['Difficulty_status'] = np.where(df.filter(like='Difficulty').eq('111').all(axis=1),
'no',
'yes')
Or use DataFrame.ne
, for test if at least one True
s use DataFrame.any
and swap yes, no
:
df['Difficulty_status'] = np.where(df.filter(like='Difficulty').ne('111').any(axis=1),
'yes',
'no')
print (df)
Patient_ID Vision_Difficulty Hearing_Difficulty Moving_Difficulty \
0 A 111 112 111
1 B 111 111 111
2 C 113 112 112
3 D 114 113 111
Difficylty_status
0 yes
1 no
2 yes
3 yes
EDIT: If need specify columns names for test difficulties use:
cols = ['Vision_Difficulty','Hearing_Difficulty','Moving_Difficulty']
df['Difficulty_status'] = np.where(df[cols].eq('111').all(axis=1), 'no','yes')
Or:
cols = ['Vision_Difficulty','Hearing_Difficulty','Moving_Difficulty']
df['Difficulty_status'] = np.where(df[cols].ne('111').any(axis=1), 'yes','no')
CodePudding user response:
data["difficulty_status"]="NA"
for i in range(len(data)):
if '111' in [data["Vision_Difficulty"][i],data["Hearing_Difficulty"][i],data["Moving_Difficulty"][i]]:
data["difficulty_status"][i]="no"
else:
data["difficulty_status"][i]="yes"
I'm pretty sure there are tonnes of other ways to do it, but let me know if this works.