How can I split the values in a pandas column and store the parts in new columns? This is what I'm trying to do:
Original:
Data
0 0010-AAAA
1 0010-BBBB
2 0010-CCCC
3 0011-DDDD
4 0011-EEEE
Adding two columns:
Data col_2 col_3
0 0010-AAAA 0010 AAAA
1 0010-BBBB 0010 BBBB
2 0010-CCCC 0010 CCCC
3 0011-DDDD 0011 DDDD
4 0011-EEEE 0011 EEEE
CodePudding user response:
df[['col_2', 'col_3']] = df['Data'].str.split('-', expand=True)
df
Data col_2 col_3
0 0010-AAAA 0010 AAAA
1 0010-BBBB 0010 BBBB
2 0010-CCCC 0010 CCCC
3 0011-DDDD 0011 DDDD
4 0011-EEEE 0011 EEEE
CodePudding user response:
Looks like you need a split:
df[['col_2', 'col_3']] = df['Data'].str.split('-', n=1, expand=True)
output:
Data col_2 col_3
0 0010-AAAA 0010 AAAA
1 0010-BBBB 0010 BBBB
2 0010-CCCC 0010 CCCC
3 0011-DDDD 0011 DDDD
4 0011-EEEE 0011 EEEE
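A minimal sketch of why n=1 can matter, assuming a hypothetical value with a second dash (not in the original data). Without a limit, the split yields three columns and no longer fits a two-column assignment; with n=1 only the first dash is used:

```python
import pandas as pd

# Hypothetical data: the last value contains a second dash (an assumption for illustration).
df = pd.DataFrame({"Data": ["0010-AAAA", "0010-BBBB", "0011-CC-DD"]})

# Unlimited split: one column per piece, so three columns here.
unlimited = df["Data"].str.split("-", expand=True)

# n=1 splits on the first dash only, so there are always at most two columns.
df[["col_2", "col_3"]] = df["Data"].str.split("-", n=1, expand=True)

print(unlimited.shape)
print(df)
```

Both new columns hold strings, so leading zeros such as "0010" are preserved.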
If there is no dash, use a regex with str.extract.
In this case: one or more digits (\d+), followed by one or more non-digits (\D+):
df[['col_2', 'col_3']] = df['Data'].str.extract(r'(\d+)(\D+)')
output:
Data col_2 col_3
0 0010AAAA 0010 AAAA
1 0010BBBB 0010 BBBB
2 0010CCCC 0010 CCCC
3 0011DDDD 0011 DDDD
4 0011EEEE 0011 EEEE
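For reference, a self-contained sketch of the no-dash case, using a small frame built by hand as a stand-in for the real data. Each capture group in the pattern becomes one column of the result:

```python
import pandas as pd

# Stand-in data without a separator between the numeric and the letter part.
df_nodash = pd.DataFrame({"Data": ["0010AAAA", "0010BBBB", "0011DDDD"]})

# Group 1 captures the leading digits, group 2 the non-digits that follow.
df_nodash[["col_2", "col_3"]] = df_nodash["Data"].str.extract(r"(\d+)(\D+)")

print(df_nodash)
```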
Or even r'(\d+)\W*(\D+)' (digits / optional non-word characters such as a dash / non-digits) to handle both cases at once:
df[['col_2', 'col_3']] = df['Data'].str.extract(r'(\d+)\W*(\D+)')
example:
Data col_2 col_3
0 0010-AAAA 0010 AAAA
1 0010BBBB 0010 BBBB
2 0010-CCCC 0010 CCCC
3 0011DDDD 0011 DDDD
4 0011-EEEE 0011 EEEE
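A runnable sketch of the combined pattern on mixed input. The last row is a hypothetical value with no digits at all, added only to show that str.extract leaves NaN in both columns when the pattern does not match:

```python
import pandas as pd

# Mixed input: with dash, without dash, and one hypothetical non-matching value.
df_mixed = pd.DataFrame({"Data": ["0010-AAAA", "0010BBBB", "CCCC"]})

# \W* consumes an optional separator between the digit and non-digit groups.
pattern = r"(\d+)\W*(\D+)"
df_mixed[["col_2", "col_3"]] = df_mixed["Data"].str.extract(pattern)

print(df_mixed)
```

Rows that come back as NaN can then be inspected with df_mixed[df_mixed['col_2'].isna()] before deciding how to handle them.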