I have a pandas df, something like this:
col1  col2
ABC   [hello, hi, hey, hiya]
My task is to extract the first three words of col2 into a new column, joined with hyphens. Something like this:
col1  col2                    col3
ABC   [hello, hi, hey, hiya]  hello-hi-hey
This seemed simple enough, but whatever I try, I cannot get rid of the square brackets in the new column. Is this possible? Any help will be appreciated.
CodePudding user response:
Assuming a Series of lists, slice and join:
df['col3'] = df['col2'].str[:3].agg('-'.join)
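For a self-contained check of the list case, here is a minimal sketch. The sample frame (including the second row) is made up for illustration, and it swaps `.agg('-'.join)` for `.str.join('-')`, which pandas documents as joining list elements and which should give the same result here:

```python
import pandas as pd

# Hypothetical sample data: col2 holds real Python lists, not strings.
df = pd.DataFrame({
    "col1": ["ABC", "DEF"],
    "col2": [["hello", "hi", "hey", "hiya"], ["yo", "sup"]],
})

# .str[:3] slices each list to its first three items;
# .str.join("-") then joins the items of each list with a hyphen.
df["col3"] = df["col2"].str[:3].str.join("-")

print(df["col3"].tolist())
```

Rows with fewer than three items are simply joined as-is, since slicing past the end of a list is not an error.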
If col2 instead holds string representations of lists:
import re
df['col3'] = ['-'.join(re.split(', ', s[1:-1])[:3]) for s in df['col2']]
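The same idea can be sketched as a small helper that does not depend on the exact `', '` separator. The function name `first_three_joined` and its parameters are made up for illustration, and it assumes each value is a bracketed, comma-separated string with no commas inside the items:

```python
def first_three_joined(s: str, n: int = 3, sep: str = "-") -> str:
    """Join the first n items of a string such as '[hello, hi, hey, hiya]'."""
    # Drop the surrounding brackets, split on commas, trim stray whitespace.
    items = [part.strip() for part in s.strip()[1:-1].split(",")]
    return sep.join(items[:n])


print(first_three_joined("[hello, hi, hey, hiya]"))  # hello-hi-hey
```

It could then be applied to the column with `df['col3'] = df['col2'].map(first_three_joined)`.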
Output:
  col1                    col2          col3
0  ABC  [hello, hi, hey, hiya]  hello-hi-hey