I want to create a new column A (verbatim), where the value will be taken from a specific column based on the value in column C (Category).
The data I have:
ID  pos           neg  better_than_comp  less_well_than_comp  Category             code
1   good service       quick response    price and range      POSITIVE             Satsfied
2   good service       quick response    price and range      BETTER THAN COMP     Speed
3   good service       quick response    price and range      LESS WELL THAN COMP  Cost
4   good service       quick response    price and range      LESS WELL THAN COMP  Choice
Desired output:
ID  pos           neg  better_than_comp  less_well_than_comp  Category             code      verbatim
1   good service       quick response    price and range      POSITIVE             Satsfied  good service
2   good service       quick response    price and range      BETTER THAN COMP     Speed     quick response
3   good service       quick response    price and range      LESS WELL THAN COMP  Cost      price and range
4   good service       quick response    price and range      LESS WELL THAN COMP  Choice    price and range
I've tried something like:
df['verbatim']=df['Category'].apply(lambda x: x['better_than_comp'] if
x == 'BETTER THAN COMP'
else x['less_well_than_comp']
if x=='LESS WELL THAN COMP'
else x['pos'] if
x=='POSITIVE'
else x)
But I get an error: TypeError: string indices must be integers
The data is actually a melt of another dataset, in case that matters; that is why the values are repeated in columns 1:5.
CodePudding user response:
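Series.apply passes each Category value to the lambda as a plain string, so x['better_than_comp'] tries to index a string, hence the TypeError. Use df.apply with axis=1 so that x is the whole row: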
df['verbatim'] = df.apply(
    lambda x: x['better_than_comp'] if x['Category'] == 'BETTER THAN COMP'
    else x['less_well_than_comp'] if x['Category'] == 'LESS WELL THAN COMP'
    else x['pos'] if x['Category'] == 'POSITIVE'
    else x['Category'],
    axis=1)
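To make the difference concrete, here is a minimal sketch (the two-row frame is made-up sample data, not the asker's dataset). It records what the lambda receives under Series.apply versus DataFrame.apply with axis=1, and reproduces the TypeError by indexing a string with a column name.

```python
import pandas as pd

# Made-up two-row frame, just for illustration
df = pd.DataFrame({
    "pos": ["good service", "good service"],
    "Category": ["POSITIVE", "BETTER THAN COMP"],
})

# Series.apply: the lambda sees one scalar per call (here, a str)
seen_by_series_apply = df["Category"].apply(lambda x: type(x).__name__).tolist()

# DataFrame.apply(axis=1): the lambda sees one row (a Series) per call
seen_by_row_apply = df.apply(lambda x: type(x).__name__, axis=1).tolist()

# Indexing a str with a column name raises the TypeError from the question
category = "POSITIVE"
try:
    category["pos"]
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
```

Because each row is a Series, x['Category'] and x['pos'] are ordinary label lookups inside the row-wise lambda.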
Another option is np.select, which is vectorized and usually much faster than a row-wise apply:
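import numpy as np
condlist = [df['Category'] == 'BETTER THAN COMP', df['Category'] == 'LESS WELL THAN COMP', df['Category'] == 'POSITIVE']
choicelist = [df['better_than_comp'], df['less_well_than_comp'], df['pos']]
# default= mirrors the final else of the apply version; np.select otherwise fills unmatched rows with 0
df['verbatim'] = np.select(condlist, choicelist, default=df['Category'])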
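For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the sample rows from the question and applies np.select. Two assumptions are mine: the neg column is filled with empty strings, and default= is set to the Category column so that unmatched rows behave like the final else of the apply version (np.select fills them with 0 if default is omitted).

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; neg is assumed to be empty
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "pos": ["good service"] * 4,
    "neg": [""] * 4,
    "better_than_comp": ["quick response"] * 4,
    "less_well_than_comp": ["price and range"] * 4,
    "Category": ["POSITIVE", "BETTER THAN COMP",
                 "LESS WELL THAN COMP", "LESS WELL THAN COMP"],
    "code": ["Satsfied", "Speed", "Cost", "Choice"],
})

# One boolean mask per category, paired by position with the column to copy from
condlist = [
    df["Category"] == "BETTER THAN COMP",
    df["Category"] == "LESS WELL THAN COMP",
    df["Category"] == "POSITIVE",
]
choicelist = [df["better_than_comp"], df["less_well_than_comp"], df["pos"]]

# Rows matching none of the masks fall back to their Category value (assumption)
df["verbatim"] = np.select(condlist, choicelist, default=df["Category"])
```

With these four rows, the verbatim column should match the desired output in the question.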