Iterate over Multiple Columns to Find a Value then Create a New Column-CodePudding

My original data frame is this

data = {'Patient_ID': ['A', 'B', 'C', 'D'], 'Vision_Difficulty': ['111', '111', '113', '114'],'Hearing_Difficulty': ['111', '111', '113', '114'], 'Hearing_Difficulty': ['112', '111', '112', '113'],'Moving_Difficulty': ['111', '111', '112', '111']}  

df = pd.DataFrame(data)

It presents a set of patients and three types of difficulties. '111' indicates that the patient does not have any difficulties, whereas the other codes (112,113,114) indicates that they do.

What I want to do is I want to iterate over the three columns to find patients that have at least one type of difficulty and save the result to a new column "Difficulty_status" with (yes/no) values.

The desired output is as below

data = {'Patient_ID': ['A', 'B', 'C', 'D'], 'Vision_Difficulty': ['111', '111', '113', '114'],'Hearing_Difficulty': ['111', '111', '113', '114'], 'Hearing_Difficulty': ['112', '111', '112', '113'],'Moving_Difficulty': ['111', '111', '112', '111'], 'Difficulty_status':['yes','no','yes','yes']}

df_output = pd.DataFrame(data)

What I have achieved so far is this

df['Difficylty_status'] = ['yes' if x != '111' else 'no' for x in df['Vision_Difficulty']]

I want to generalize this code to check all three columns (Vision_Difficulty, Hearing_Difficulty, Moving_Difficulty)

CodePudding user response：

Use numpy.where with test if equal by all columns with Difficulty in columns names filtered by DataFrame.filter and DataFrame.eq, for test if all Trues use DataFrame.all:

df['Difficulty_status'] = np.where(df.filter(like='Difficulty').eq('111').all(axis=1), 
                                   'no',
                                   'yes')

Or use DataFrame.ne, for test if at least one Trues use DataFrame.any and swap yes, no:

df['Difficulty_status'] = np.where(df.filter(like='Difficulty').ne('111').any(axis=1),
                                    'yes',
                                    'no')

print (df)
  Patient_ID Vision_Difficulty Hearing_Difficulty Moving_Difficulty  \
0          A               111                112               111   
1          B               111                111               111   
2          C               113                112               112   
3          D               114                113               111   

  Difficylty_status  
0               yes  
1                no  
2               yes  
3               yes

EDIT: If need specify columns names for test difficulties use:

cols = ['Vision_Difficulty','Hearing_Difficulty','Moving_Difficulty']
df['Difficulty_status'] = np.where(df[cols].eq('111').all(axis=1), 'no','yes')

Or:

cols = ['Vision_Difficulty','Hearing_Difficulty','Moving_Difficulty']
df['Difficulty_status'] = np.where(df[cols].ne('111').any(axis=1), 'yes','no')

CodePudding user response：

data["difficulty_status"]="NA"
for i in range(len(data)):
    if '111' in [data["Vision_Difficulty"][i],data["Hearing_Difficulty"][i],data["Moving_Difficulty"][i]]:
        data["difficulty_status"][i]="no"
    else:
        data["difficulty_status"][i]="yes"

I'm pretty sure there are tonnes of other ways to do it, but let me know if this works.