I have DataFrame in Python Pandas like below (of course in real DataFrame I have many more columns):
COL1 | COL2 | ... | COLn
-----------------------------------|-------|------|--------
ABC_20220830_CP_6M_BEFORE_100_200 |XXX | .... | ...
XXA_20220830_CP_6M_BEFORE_150_300 |AAA | .... | ...
KKTY_20220830_CP_6M_BEFORE_150_300 |TTT | .... | ...
OOP_20220830_CP_6M_BEFORE_500_600 |TYTT | .... | ...
And I would like to modify column "COL1" to have something like below based on following conditions:
- in COL1 the center of each value is the same, i.e: "20220830_CP_6M_BEFORE" only values before and after mentioned part of string could be different
- I need to create 2 columns based on values in "COL1":
- the first column "COL1a": from the begining of value to the "_20220830"
- the second column "COL1b": from "CP_6M_BEFORE_" to the end of value
COL1_a | COL1_b | COL2 | .... | COLn | COL1 |
---|---|---|---|---|---|
ABC_20220830 | CP_6M_BEFORE_100_200 | XXX | ... | ... | ABC_20220830_CP_6M_BEFORE_100_200 |
XXA_20220830 | CP_6M_BEFORE_150_300 | AAA | ... | ... | XXA_20220830_CP_6M_BEFORE_150_300 |
KKTY_20220830 | CP_6M_BEFORE_150_300 | TTT | ... | ... | KKTY_20220830_CP_6M_BEFORE_150_300 |
OOP_20220830 | CP_6M_BEFORE_500_600 | TYTT | ... | ... | OOP_20220830_CP_6M_BEFORE_500_600 |
How can I do that in Python Pandas ?
CodePudding user response:
Why not use CP_6M_BEFORE
as your delimiter?
You should be able to extract the first part of your string by using pandas's split
method:
mydelimiter = 'CP_6M_BEFORE'
df['COL1_a'] = df['COL1'].str.split(mydelimiter).str[0]
The second part you can build using your delimiter as a prefix:
df['COL1_b'] = mydelimiter df['COL1'].str.split(mydelimiter).str[1].astype(str)
If you are looking for a more robust solution, you may use the underscore (_
) as your delimiter and then create column COL1_a
and COL1_b
from the individual strings output by the split method.