How to divide values in one column to create 2 new columns from DataFrame in Python Pandas?

Time:10-18

I have a DataFrame in Python Pandas like the one below (the real DataFrame has many more columns):

COL1                               | COL2  | ...  | COLn
-----------------------------------|-------|------|--------
ABC_20220830_CP_6M_BEFORE_100_200  |XXX    | .... | ...
XXA_20220830_CP_6M_BEFORE_150_300  |AAA    | .... | ...
KKTY_20220830_CP_6M_BEFORE_150_300 |TTT    | .... | ...
OOP_20220830_CP_6M_BEFORE_500_600  |TYTT   | .... | ...

I would like to split column "COL1" to get the result shown below, based on the following conditions:

  • in COL1 the middle of each value is always the same, i.e. "20220830_CP_6M_BEFORE"; only the parts before and after it can differ
  • I need to create 2 columns based on the values in "COL1":
    • the first column "COL1_a": from the beginning of the value through "_20220830"
    • the second column "COL1_b": from "CP_6M_BEFORE_" to the end of the value

COL1_a        | COL1_b               | COL2  | ...  | COLn | COL1
--------------|----------------------|-------|------|------|-----------------------------------
ABC_20220830  | CP_6M_BEFORE_100_200 | XXX   | .... | ...  | ABC_20220830_CP_6M_BEFORE_100_200
XXA_20220830  | CP_6M_BEFORE_150_300 | AAA   | .... | ...  | XXA_20220830_CP_6M_BEFORE_150_300
KKTY_20220830 | CP_6M_BEFORE_150_300 | TTT   | .... | ...  | KKTY_20220830_CP_6M_BEFORE_150_300
OOP_20220830  | CP_6M_BEFORE_500_600 | TYTT  | .... | ...  | OOP_20220830_CP_6M_BEFORE_500_600

How can I do that in Python Pandas?

CodePudding user response:

Why not use CP_6M_BEFORE as your delimiter? You should be able to extract the first part of your string by using pandas's split method:

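mydelimiter = 'CP_6M_BEFORE'
# The piece before the delimiter ends with "_", so strip it to get e.g. "ABC_20220830"
df['COL1_a'] = df['COL1'].str.split(mydelimiter).str[0].str.rstrip('_')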

The second part you can build using your delimiter as a prefix:

df['COL1_b'] = mydelimiter + df['COL1'].str.split(mydelimiter).str[1].astype(str)
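
Putting the two steps together, here is a minimal runnable sketch built on the sample rows from the question. The small `df` and the `parts` helper are only illustrative, and the `rstrip('_')` call is an addition that assumes COL1_a should not keep the trailing underscore left before the delimiter:

```python
import pandas as pd

# Sample data copied from the question (only two of the columns)
df = pd.DataFrame({
    "COL1": [
        "ABC_20220830_CP_6M_BEFORE_100_200",
        "XXA_20220830_CP_6M_BEFORE_150_300",
        "KKTY_20220830_CP_6M_BEFORE_150_300",
        "OOP_20220830_CP_6M_BEFORE_500_600",
    ],
    "COL2": ["XXX", "AAA", "TTT", "TYTT"],
})

mydelimiter = "CP_6M_BEFORE"

# Split once on the delimiter; expand=True returns a DataFrame with columns 0 and 1
parts = df["COL1"].str.split(mydelimiter, n=1, expand=True)

# Left piece looks like "ABC_20220830_", so drop the trailing underscore
df["COL1_a"] = parts[0].str.rstrip("_")

# Right piece looks like "_100_200", so put the delimiter back in front
df["COL1_b"] = mydelimiter + parts[1]

print(df[["COL1_a", "COL1_b", "COL2", "COL1"]])
```

Splitting once and reusing `parts` avoids running the same split twice, but the result should match the two separate assignments above.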

If you are looking for a more robust solution, you may use the underscore (_) as your delimiter and then create column COL1_a and COL1_b from the individual strings output by the split method.
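
One way the underscore idea could look, as a sketch only. It assumes the leading code (ABC, XXA, ...) never contains an underscore itself, so the first two underscore-separated pieces always form COL1_a and everything after them forms COL1_b. The sample `df` and the `parts` name are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "COL1": [
        "ABC_20220830_CP_6M_BEFORE_100_200",
        "KKTY_20220830_CP_6M_BEFORE_150_300",
    ],
})

# n=2 limits the split to the first two underscores:
# "ABC_20220830_CP_6M_BEFORE_100_200" -> "ABC", "20220830", "CP_6M_BEFORE_100_200"
parts = df["COL1"].str.split("_", n=2, expand=True)

# Rejoin the first two pieces for COL1_a; the remainder is COL1_b
df["COL1_a"] = parts[0] + "_" + parts[1]
df["COL1_b"] = parts[2]

print(df[["COL1_a", "COL1_b", "COL1"]])
```

This version does not depend on the literal text "CP_6M_BEFORE", so it would keep working if that middle part changed, as long as the position of the first two underscores stays the same.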
