Home > Net >  How to convert the dataframe column with delimiters into three columns?
How to convert the dataframe column with delimiters into three columns?

Time:11-03

How to split the last three values of the Reference Column dataframe into three columns such as Code, PRODUCT, TYPE as shown below

Df1:
    **Reference**
    Customer/FBG/COMC/APPLE/INTEREST/
    Customer/FBG/PORT/APPLE/INTEREST/
    Customer/FBG/PORT/APPLE/INTEREST/
    Customer/FBG/AUD/5397/APPLE/INTEREST/
    Customer/FBG/BDC/APPLE/FRA/
    Customer/FBG/DBG/APPLE/FRA/
    Customer/FBG/BDI/APPLE/INTEREST/
    Customer/FBG/BDR/APPLE/INTEREST/
    Customer/FBG/BDR/APPLE/INTEREST/
    Customer/FBG/SVZ/APPLE/FRA/
    Customer/FBG/SANT/APPLE/INTEREST/
    Customer/FBG/SAR/APPLE/FRA/
    Customer/FBG/BEA/494823/APPLE/INTEREST/
    Customer/FBG/CDA/APPLE/INTEREST/

Expected Output:

DF2:
    **CODE  PRODUCT TYPE**
    COMC    APPLE   INTEREST
    PORT    APPLE   INTEREST
    PORT    APPLE   INTEREST
    5397    APPLE   INTEREST
    BDC     APPLE   FRA
    DBG     APPLE   FRA
    BDI     APPLE   INTEREST
    BDR     APPLE   INTEREST
    BDR     APPLE   INTEREST
    SVZ     APPLE   FRA
    SANT    APPLE   INTEREST
    SAR     APPLE   FRA
    494823  APPLE   INTEREST
    CDA     APPLE   INTEREST

Tried code: df2[['CODE','PRODUCT','TYPE']]=df1['Reference'].str.split('/',-3,expand=True) , not working!

CodePudding user response:

Better use str.extract:

df[['CODE', 'PRODUCT', 'TYPE']] = df['Reference'].str.extract('([^/] )/([^/] )/([^/] )/?$')

With str.split you would have needed to slice afterwards, which is less efficient:

df[['CODE', 'PRODUCT', 'TYPE']] = df['Reference'].str.split('/', expand=True).iloc[:, -4:-1]

output:

                                  Reference    CODE PRODUCT      TYPE
0         Customer/FBG/COMC/APPLE/INTEREST/    COMC   APPLE  INTEREST
1         Customer/FBG/PORT/APPLE/INTEREST/    PORT   APPLE  INTEREST
2         Customer/FBG/PORT/APPLE/INTEREST/    PORT   APPLE  INTEREST
3     Customer/FBG/AUD/5397/APPLE/INTEREST/    5397   APPLE  INTEREST
4               Customer/FBG/BDC/APPLE/FRA/     BDC   APPLE       FRA
5               Customer/FBG/DBG/APPLE/FRA/     DBG   APPLE       FRA
6          Customer/FBG/BDI/APPLE/INTEREST/     BDI   APPLE  INTEREST
7          Customer/FBG/BDR/APPLE/INTEREST/     BDR   APPLE  INTEREST
8          Customer/FBG/BDR/APPLE/INTEREST/     BDR   APPLE  INTEREST
9               Customer/FBG/SVZ/APPLE/FRA/     SVZ   APPLE       FRA
10        Customer/FBG/SANT/APPLE/INTEREST/    SANT   APPLE  INTEREST
11              Customer/FBG/SAR/APPLE/FRA/     SAR   APPLE       FRA
12  Customer/FBG/BEA/494823/APPLE/INTEREST/  494823   APPLE  INTEREST
13         Customer/FBG/CDA/APPLE/INTEREST/     CDA   APPLE  INTEREST

CodePudding user response:

You can use pandas.Series.str.rsplit.

df[['CODE','PRODUCT','TYPE']] = df.pop('Reference').str.rsplit('/', 4, expand=True).iloc[:, -4:-1]
print(df)

Output:

      CODE PRODUCT      TYPE
0     COMC   APPLE  INTEREST
1     PORT   APPLE  INTEREST
2     PORT   APPLE  INTEREST
3     5397   APPLE  INTEREST
4      BDC   APPLE       FRA
5      DBG   APPLE       FRA
6      BDI   APPLE  INTEREST
7      BDR   APPLE  INTEREST
8      BDR   APPLE  INTEREST
9      SVZ   APPLE       FRA
10    SANT   APPLE  INTEREST
11     SAR   APPLE       FRA
12  494823   APPLE  INTEREST
13     CDA   APPLE  INTEREST
  • Related