Create a new column A, where the value will be taken from a specific column based on the value in column C

Time:11-22

I want to create a new column A (verbatim) whose value is taken from a specific column, chosen according to the value in column C (Category).

The data I have:

ID  pos           neg  better_than_comp  less_well_than_comp  Category             code
1   good service       quick response    price and range      POSITIVE             Satisfied
2   good service       quick response    price and range      BETTER THAN COMP     Speed
3   good service       quick response    price and range      LESS WELL THAN COMP  Cost
4   good service       quick response    price and range      LESS WELL THAN COMP  Choice

Desired output:
ID  pos           neg  better_than_comp  less_well_than_comp  Category             code       verbatim
1   good service       quick response    price and range      POSITIVE             Satisfied  good service
2   good service       quick response    price and range      BETTER THAN COMP     Speed      quick response
3   good service       quick response    price and range      LESS WELL THAN COMP  Cost       price and range
4   good service       quick response    price and range      LESS WELL THAN COMP  Choice     price and range
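To make the example reproducible, here is a minimal sketch that builds the sample frame and the expected verbatim values in pandas. It assumes the neg column is empty in every row and stores it as an empty string; in the real (melted) data it may be NaN instead. The name expected_verbatim is purely illustrative.

```python
import pandas as pd

# Sample input from the question; neg is assumed to be '' in every row
df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'pos': ['good service'] * 4,
    'neg': [''] * 4,
    'better_than_comp': ['quick response'] * 4,
    'less_well_than_comp': ['price and range'] * 4,
    'Category': ['POSITIVE', 'BETTER THAN COMP',
                 'LESS WELL THAN COMP', 'LESS WELL THAN COMP'],
    'code': ['Satisfied', 'Speed', 'Cost', 'Choice'],
})

# Hypothetical name: the values the new verbatim column should hold, row by row
expected_verbatim = ['good service', 'quick response',
                     'price and range', 'price and range']
print(df)
```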

I've tried something like:

df['verbatim']=df['Category'].apply(lambda x: x['better_than_comp'] if 
                                           x == 'BETTER THAN COMP'
                                           else x['less_well_than_comp']
                                           if x=='LESS WELL THAN COMP'
                                           else x['pos'] if
                                           x=='POSITIVE'
                                           else x)

But I get an error: TypeError: string indices must be integers

The data is actually the result of melting another dataset, if that matters; that's why the values are repeated in columns 1:5.
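The TypeError comes from calling apply on the single Category column: Series.apply passes each cell to the lambda, and each cell here is a plain str, so x['better_than_comp'] tries to index a string with another string. A small sketch reproducing that (the names categories, element_types and error_message are illustrative only):

```python
import pandas as pd

categories = pd.Series(['POSITIVE', 'BETTER THAN COMP'])

# Series.apply hands the lambda one cell at a time, which is a plain str here
element_types = categories.apply(lambda x: type(x).__name__).tolist()
print(element_types)  # ['str', 'str']

# Indexing a str with a column name is exactly what fails in the attempt
try:
    'POSITIVE'['pos']
except TypeError as err:
    error_message = str(err)
print(error_message)
```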

Answer:

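Call apply on the whole DataFrame with axis=1. Each x is then a row (a Series indexed by column name) instead of a single Category string; indexing a string with a column name is what raised the TypeError. Use it like this: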

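# axis=1 passes each row to the lambda as a Series, so x['Category'] works
df['verbatim'] = df.apply(lambda x: x['better_than_comp']
                          if x['Category'] == 'BETTER THAN COMP'
                          else x['less_well_than_comp']
                          if x['Category'] == 'LESS WELL THAN COMP'
                          else x['pos']
                          if x['Category'] == 'POSITIVE'
                          else x['Category'],  # fallback: keep the category itself
                          axis=1)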
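The chained conditional grows quickly as categories are added. A variation on the same row-wise apply, using a hypothetical lookup dict named source_col that maps each Category value to the column to copy from, with the same fallback to the category itself. The sample frame includes only the columns the lambda reads.

```python
import pandas as pd

df = pd.DataFrame({
    'pos': ['good service'] * 4,
    'better_than_comp': ['quick response'] * 4,
    'less_well_than_comp': ['price and range'] * 4,
    'Category': ['POSITIVE', 'BETTER THAN COMP',
                 'LESS WELL THAN COMP', 'LESS WELL THAN COMP'],
})

# Hypothetical lookup: Category value -> column holding the verbatim text
source_col = {
    'POSITIVE': 'pos',
    'BETTER THAN COMP': 'better_than_comp',
    'LESS WELL THAN COMP': 'less_well_than_comp',
}

# Row-wise: read the mapped column, or fall back to the Category value
df['verbatim'] = df.apply(
    lambda row: row[source_col[row['Category']]]
    if row['Category'] in source_col else row['Category'],
    axis=1,
)
print(df['verbatim'].tolist())
```

Adding a category then means adding one dict entry rather than another if/else branch.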

Another option is np.select, which is vectorized and therefore usually much faster than a row-wise apply:

import numpy as np

condlist=[df['Category']=='BETTER THAN COMP',df['Category']=='LESS WELL THAN COMP',df['Category']=='POSITIVE']
choicelist=[df['better_than_comp'],df['less_well_than_comp'],df['pos']]
# rows matching no condition get np.select's default, which is 0
df['verbatim']=np.select(condlist,choicelist)
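np.select fills rows that match no condition with its default argument (0 unless given). A self-contained sketch that passes default=df['Category'] so unmatched rows keep their category, mirroring the apply version's fallback; this relies on np.select broadcasting an array default against the conditions. The extra 'NEUTRAL' row is hypothetical, added only to exercise the fallback.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'pos': ['good service'] * 5,
    'better_than_comp': ['quick response'] * 5,
    'less_well_than_comp': ['price and range'] * 5,
    # 'NEUTRAL' is a hypothetical extra category that matches no condition
    'Category': ['POSITIVE', 'BETTER THAN COMP', 'LESS WELL THAN COMP',
                 'LESS WELL THAN COMP', 'NEUTRAL'],
})

condlist = [df['Category'] == 'BETTER THAN COMP',
            df['Category'] == 'LESS WELL THAN COMP',
            df['Category'] == 'POSITIVE']
choicelist = [df['better_than_comp'], df['less_well_than_comp'], df['pos']]

# The first matching condition wins; unmatched rows take the default
df['verbatim'] = np.select(condlist, choicelist, default=df['Category'])
print(df['verbatim'].tolist())
```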