Home > Back-end >  Get columns names as values if a condition is met in rows
Get columns names as values if a condition is met in rows

Time:01-02

I have this example df:

data = pd.DataFrame({'id':[1,  2 , 3],

                   'question': ['first country visited?', 'first city visited?' , 'two cities we love?'],
                   'answer1': ['UK', 'Paris', 'CA'],
                   'answer2': ['US', 'New York', 'Paris'],
                   'answer3': ['CA', 'London', 'London'],
                   'answer4': ['JP', 'Toronto', 'Los Angeles'],
                   'correct': [['UK'], ['London'], ['London','Paris']]
                   })

gives:

    id  question                answer1 answer2    answer3  answer4     correct
0   1   first country visited?  UK        US         CA       JP        [UK]
1   2   first city visited?     Paris     New York   London Toronto     [London]
2   3   two cities we love?     CA        Paris      London Los Angeles [London, Paris]

I am trying to identify the columns names (answer1 or 2 .. etc) with the correct answer if found in the data['correct'] column in a new column called data['correct_column']

here what I did so far:

data['correct_column'] = data.loc[:,'answer1':'answer4'].isin(data['correct']).idxmax(1)

I got all the same result just the value answer1 in the data['correct_column'] I do not know why

desired output:

       id  question                answer1      answer2    answer3    answer4      correct              correct_column
0   1   first country visited?      UK          US           CA         JP          [UK]                   answer1
1   2   first city visited?         Paris       New York    London    Toronto       [London]               answer3
2   3   two cities we love?         CA          Paris       London    Los Angeles   [London, Paris]        answer3,answer2

CodePudding user response:

I see several ways to achieve this task:

Using apply:

cols = data.filter(like='answer').columns
data['correct_column'] = data[cols].apply(lambda s: ','.join((m:=s.isin(data.loc[s.name, 'correct']))[m].index), axis=1)

Using a more complex approach by exploding, checking identity and merging again per group:

cols = data.filter(like='answer').columns
df2 = data.explode('correct')
mask = (df2[cols].filter(like='answer').eq(df2['correct'].values, axis=0)
           .groupby(level=0).any()
        )
data.join(mask.mul(cols).where(mask).apply(lambda x: x.str.cat(sep=','), axis=1).rename('correct_column'))

output:

   id                question answer1   answer2 answer3      answer4          correct   correct_column
0   1  first country visited?      UK        US      CA           JP             [UK]          answer1
1   2     first city visited?   Paris  New York  London      Toronto         [London]          answer3
2   3     two cities we love?      CA     Paris  London  Los Angeles  [London, Paris]  answer2,answer3
  • Related