data = pd.DataFrame({'id':[1, 2 , 3],
'question': ['first country visited?', 'first city visited?' , 'two cities we love?'],
'answer1': ['UK', 'Paris', 'CA'],
'answer2': ['US', 0.4, 'Paris'],
'answer3': ['CA', 'London', 'London'],
'correct': [['UK'], [0.4], ['London, Paris, 0.4']]
})
data:
id question answer1 answer2 answer3 correct
0 1 first country visited? UK US CA [UK]
1 2 first city visited? Paris 0.4 London [0.4]
2 3 two cities we love? CA Paris London [London, Paris, 0.4]
I am creating a new column to check if values in correct column are found in answer1 or answer2 or answer3 columns.
cols = data.filter(like='answer').columns
data['correct_column'] = data[cols].apply(lambda s: ','.join((m:=s.isin(data.loc[s.name, 'correct']))[m].index), axis=1)
output:
id question answer1 answer2 answer3 correct correct_column
0 1 first country visited? UK US CA [UK] answer1
1 2 first city visited? Paris 0.4 London [0.4 answer2
2 3 two cities we love? CA Paris London [London, Paris, 0.4]
I get an empty value in the third row. I have been trying for hours without success on my original data! Is there any better approach to achieve this? considering different data types in my original df like floats, int & Str ..
CodePudding user response:
Here is a longer version:
cols = data.filter(like='answer').columns
def app(s):
(m:=[s[col] in (data.loc[s.name, 'correct']) for col in cols])
return ', '.join(cols[m])
data['correct_column'] = data[cols].apply(app, axis=1)
data['correct_column']
and shorter version that will accomplish the same thing:
data['correct_column'] = data[cols].apply(lambda s: ', '.join(cols[(m:=[s[col] in (data.loc[s.name, 'correct']) for col in cols])]) , axis=1)
data['correct_column']
which will produce:
0 answer1
1 answer2
2 answer2, answer3
Name: correct_column, dtype: object