How to divide values in one column to create 2 new columns from DataFrame in Python Pandas?-CodePudding

I have DataFrame in Python Pandas like below (of course in real DataFrame I have many more columns):

COL1                               | COL2  | ...  | COLn
-----------------------------------|-------|------|--------
ABC_20220830_CP_6M_BEFORE_100_200  |XXX    | .... | ...
XXA_20220830_CP_6M_BEFORE_150_300  |AAA    | .... | ...
KKTY_20220830_CP_6M_BEFORE_150_300 |TTT    | .... | ...
OOP_20220830_CP_6M_BEFORE_500_600  |TYTT   | .... | ...

And I would like to modify column "COL1" to have something like below based on following conditions:

in COL1 the center of each value is the same, i.e: "20220830_CP_6M_BEFORE" only values before and after mentioned part of string could be different
I need to create 2 columns based on values in "COL1":
- the first column "COL1a": from the begining of value to the "_20220830"
- the second column "COL1b": from "CP_6M_BEFORE_" to the end of value

COL1_a	COL1_b	COL2	....	COLn	COL1
ABC_20220830	CP_6M_BEFORE_100_200	XXX	...	...	ABC_20220830_CP_6M_BEFORE_100_200
XXA_20220830	CP_6M_BEFORE_150_300	AAA	...	...	XXA_20220830_CP_6M_BEFORE_150_300
KKTY_20220830	CP_6M_BEFORE_150_300	TTT	...	...	KKTY_20220830_CP_6M_BEFORE_150_300
OOP_20220830	CP_6M_BEFORE_500_600	TYTT	...	...	OOP_20220830_CP_6M_BEFORE_500_600

How can I do that in Python Pandas ?

CodePudding user response：

Why not use CP_6M_BEFORE as your delimiter? You should be able to extract the first part of your string by using pandas's split method:

mydelimiter = 'CP_6M_BEFORE'
df['COL1_a'] = df['COL1'].str.split(mydelimiter).str[0]

The second part you can build using your delimiter as a prefix:

df['COL1_b'] = mydelimiter df['COL1'].str.split(mydelimiter).str[1].astype(str)

If you are looking for a more robust solution, you may use the underscore (_) as your delimiter and then create column COL1_a and COL1_b from the individual strings output by the split method.