I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'col1': ['aaaa', 'aabb', 'bbcc', 'ccdd'],
                   'col2': ['ab12', 'cd15', 'kf25', 'zx78']})
df
col1 col2
0 aaaa ab12
1 aabb cd15
2 bbcc kf25
3 ccdd zx78
I want to create 'col3' from the first two characters of 'col1' and the last two characters of 'col2', joined by a hyphen. The result I want is:
df
col1 col2 col3
0 aaaa ab12 aa-12
1 aabb cd15 aa-15
2 bbcc kf25 bb-25
3 ccdd zx78 cc-78
I tried to use a list comprehension, but I got this error: ValueError: Length of values (16) does not match length of index (4)
The code I used is:
df['col3'] = [x[0:2] + '-' + y[2:4] for x in df['col1'] for y in df['col2']]
Answer:
Use simple slicing with the str accessor, and concatenation:
df['col3'] = df['col1'].str[:2] + '-' + df['col2'].str[2:4]
Or, if you want the last two characters of col2:
df['col3'] = df['col1'].str[:2] + '-' + df['col2'].str[-2:]
Output:
col1 col2 col3
0 aaaa ab12 aa-12
1 aabb cd15 aa-15
2 bbcc kf25 bb-25
3 ccdd zx78 cc-78
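For reference, here is a self-contained sketch of the vectorized approach that can be run as-is. The str.cat variant is an alternative I am adding for comparison, not something the question requires, and the variable name col3_cat is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['aaaa', 'aabb', 'bbcc', 'ccdd'],
                   'col2': ['ab12', 'cd15', 'kf25', 'zx78']})

# First two characters of col1, a hyphen, then the last two characters of col2.
# The .str accessor applies the slice to every row at once.
df['col3'] = df['col1'].str[:2] + '-' + df['col2'].str[-2:]

# Alternative (illustrative): Series.str.cat joins two Series with a separator.
col3_cat = df['col1'].str[:2].str.cat(df['col2'].str[-2:], sep='-')

print(df['col3'].tolist())   # ['aa-12', 'aa-15', 'bb-25', 'cc-78']
print(col3_cat.tolist())     # same values as above
```

Both forms align the two columns by index, so each row only ever sees its own values.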
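Why your approach did not work
Two for clauses in one comprehension are nested loops, so every value of col1 was paired with every value of col2. That produced 4 x 4 = 16 values for a 4-row index, hence the ValueError. To pair the values row by row, you would have needed zip:
df['col3'] = [x[0:2] + '-' + y[2:4] for x, y in zip(df['col1'], df['col2'])]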
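A small sketch comparing the two comprehensions side by side may make the length mismatch easier to see. The names nested and paired are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['aaaa', 'aabb', 'bbcc', 'ccdd'],
                   'col2': ['ab12', 'cd15', 'kf25', 'zx78']})

# Two "for" clauses nest: each x is combined with every y (4 * 4 = 16 values).
nested = [x[0:2] + '-' + y[2:4] for x in df['col1'] for y in df['col2']]

# zip walks both columns in lockstep: one value per row (4 values).
paired = [x[0:2] + '-' + y[2:4] for x, y in zip(df['col1'], df['col2'])]

print(len(nested))   # 16
print(nested[:4])    # ['aa-12', 'aa-15', 'aa-25', 'aa-78']
print(len(paired))   # 4
print(paired)        # ['aa-12', 'aa-15', 'bb-25', 'cc-78']
```

Only the zip version has the same length as the dataframe index, so only it can be assigned to df['col3'].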