I need to split the Product and Quantity
column. The new column name is Quantity
.
If you see the example below, some rows will begin with quantity information in [2] and others in [1]. Also, I cannot use [-] because in the below example, the second split on '-' will work, but 3rd and 4th row will be incorrect
Product and Quantity
ABC-BBC-Bottle- 1 - 30 mg
BBC-44-Capsule- 10 - 500mg
KKP-Bottle- 5 - 30 mg
R2B-Powder-500mg
I need help with below:
- How can I split when the position of '-' is not always the same in all rows?
- How do I store all the values either before or after [-]. I know I can use [-2] for after and [2] for before. But its not storing all the values after I split using [2] or before when I use [-2]?
Currently, it looks like the below.
df = source_df[['Product and Quantity']]
df['Quantity'] = df['Product and Quantity'].str.split('-').str[2]
The output looks like the below.
Quantity
Bottle
Capsule
5
500mg
I want it to look like the below.
Quantity
Bottle - 1 - 30 mg
Capsule - 10 - 500mg
Bottle - 5 - 30 mg
Powder - 500mg
CodePudding user response:
df['Quantity'] = df[0].str.replace(' ', '').str.findall('\w -\d*-*\d*mg').str[0].str.replace('-', ' - ')
Output:
Product and Quantity Quantity
0 ABC-BBC-Bottle- 1 - 30 mg Bottle - 1 - 30mg
1 BBC-44-Capsule- 10 - 500mg Capsule - 10 - 500mg
2 KKP-Bottle- 5 - 30 mg Bottle - 5 - 30mg
3 R2B-Powder-500mg Powder - 500mg
CodePudding user response:
Reliable method: use a regex!
regex = r'[^-] -((?:[^-] -){,2}[^-] )$'
df['Quantity'] = df['Product and Quantity'].str.extract(regex)
Output:
Product and Quantity Quantity
0 ABC-BBC-Bottle- 1 - 30 mg Bottle- 1 - 30 mg
1 BBC-44-Capsule- 10 - 500mg Capsule- 10 - 500mg
2 KKP-Bottle- 5 - 30 mg Bottle- 5 - 30 mg
3 R2B-Powder-500mg Powder-500mg