I have a dataframe as below:
--------
| Key|
--------
| x10x60|
|x1x19x33|
| x12x6|
| a14x4|
|x1x1x1x6|
|x2a23x30|
--------|
And I want the output like this:
The Key
column can be divide by each x
element and put into xa
/xb
/xc
/xd
by order, but if there are a
elements then place it into ta
/tb
/tc
/td
by order.
-------- ----- ------ ----- ----- ----- ---- ---- -----
| Key| xa| xb| xc| xd| ta| tb| tc| td|
-------- ----- ------ ----- ----- ----- ---- ---- -----
| x10x60| x10| x60| | | | | | |
|x1x19x33| x1| x19| x33| | | | | |
| x12x6| x12| x6| | | | | | |
| a14x4| | x4| | | a14| | | |
|x1x1x1x6| x1| x1| x1| x6| | | | |
|x2a23x30| x2| | x30| | | a23| | |
--------|----- ------ ----- ----- ----- ---- ---- -----
I tried substr()
or substring()
cannot have the output, but seems stop at the divide.
CodePudding user response:
Normally you can just use Series.str.split
with expand=True
, and pandas will auto-expand the substrings into columns.
But since you want to place the substrings into very specific columns, use Series.str.extractall
:
import string
m = df['Key'].str.extractall(r'([xa][0-9] )').reset_index() # match the x* and a* substrings
m['match'] = m['match'].map(dict(enumerate(string.ascii_lowercase))) # map 0,1,2... -> a,b,c...
m['match'] = np.where(m[0].str.startswith('x'), 'x', 't') m['match'] # add x or t prefix
out = df[['Key']].join(
m.pivot(index='level_0', columns='match', values=0) # reshape into wide form
.sort_index(key=lambda c: c.str.startswith('t'), axis=1) # put t columns at end
.fillna('')
)
Output:
>>> print(out)
Key xa xb xc xd ta tb
0 x10x60 x10 x60
1 x1x19x33 x1 x19 x33
2 x12x6 x12 x6
3 a14x4 x4 a14
4 x1x1x1x6 x1 x1 x1 x6
5 x2a23x30 x2 x30 a23