How to split the last three values of the Reference Column dataframe into three columns such as Code, PRODUCT, TYPE as shown below
Df1:
**Reference**
Customer/FBG/COMC/APPLE/INTEREST/
Customer/FBG/PORT/APPLE/INTEREST/
Customer/FBG/PORT/APPLE/INTEREST/
Customer/FBG/AUD/5397/APPLE/INTEREST/
Customer/FBG/BDC/APPLE/FRA/
Customer/FBG/DBG/APPLE/FRA/
Customer/FBG/BDI/APPLE/INTEREST/
Customer/FBG/BDR/APPLE/INTEREST/
Customer/FBG/BDR/APPLE/INTEREST/
Customer/FBG/SVZ/APPLE/FRA/
Customer/FBG/SANT/APPLE/INTEREST/
Customer/FBG/SAR/APPLE/FRA/
Customer/FBG/BEA/494823/APPLE/INTEREST/
Customer/FBG/CDA/APPLE/INTEREST/
Expected Output:
DF2:
**CODE PRODUCT TYPE**
COMC APPLE INTEREST
PORT APPLE INTEREST
PORT APPLE INTEREST
5397 APPLE INTEREST
BDC APPLE FRA
DBG APPLE FRA
BDI APPLE INTEREST
BDR APPLE INTEREST
BDR APPLE INTEREST
SVZ APPLE FRA
SANT APPLE INTEREST
SAR APPLE FRA
494823 APPLE INTEREST
CDA APPLE INTEREST
Tried code: df2[['CODE','PRODUCT','TYPE']]=df1['Reference'].str.split('/',-3,expand=True) , not working!
CodePudding user response:
Better use str.extract
:
df[['CODE', 'PRODUCT', 'TYPE']] = df['Reference'].str.extract('([^/] )/([^/] )/([^/] )/?$')
With str.split
you would have needed to slice afterwards, which is less efficient:
df[['CODE', 'PRODUCT', 'TYPE']] = df['Reference'].str.split('/', expand=True).iloc[:, -4:-1]
output:
Reference CODE PRODUCT TYPE
0 Customer/FBG/COMC/APPLE/INTEREST/ COMC APPLE INTEREST
1 Customer/FBG/PORT/APPLE/INTEREST/ PORT APPLE INTEREST
2 Customer/FBG/PORT/APPLE/INTEREST/ PORT APPLE INTEREST
3 Customer/FBG/AUD/5397/APPLE/INTEREST/ 5397 APPLE INTEREST
4 Customer/FBG/BDC/APPLE/FRA/ BDC APPLE FRA
5 Customer/FBG/DBG/APPLE/FRA/ DBG APPLE FRA
6 Customer/FBG/BDI/APPLE/INTEREST/ BDI APPLE INTEREST
7 Customer/FBG/BDR/APPLE/INTEREST/ BDR APPLE INTEREST
8 Customer/FBG/BDR/APPLE/INTEREST/ BDR APPLE INTEREST
9 Customer/FBG/SVZ/APPLE/FRA/ SVZ APPLE FRA
10 Customer/FBG/SANT/APPLE/INTEREST/ SANT APPLE INTEREST
11 Customer/FBG/SAR/APPLE/FRA/ SAR APPLE FRA
12 Customer/FBG/BEA/494823/APPLE/INTEREST/ 494823 APPLE INTEREST
13 Customer/FBG/CDA/APPLE/INTEREST/ CDA APPLE INTEREST
CodePudding user response:
You can use pandas.Series.str.rsplit
.
df[['CODE','PRODUCT','TYPE']] = df.pop('Reference').str.rsplit('/', 4, expand=True).iloc[:, -4:-1]
print(df)
Output:
CODE PRODUCT TYPE
0 COMC APPLE INTEREST
1 PORT APPLE INTEREST
2 PORT APPLE INTEREST
3 5397 APPLE INTEREST
4 BDC APPLE FRA
5 DBG APPLE FRA
6 BDI APPLE INTEREST
7 BDR APPLE INTEREST
8 BDR APPLE INTEREST
9 SVZ APPLE FRA
10 SANT APPLE INTEREST
11 SAR APPLE FRA
12 494823 APPLE INTEREST
13 CDA APPLE INTEREST