How to split and keep all the values from a dataframe in new column?-CodePudding

I need to split the Product and Quantity column. The new column name is Quantity.

If you see the example below, some rows will begin with quantity information in [2] and others in [1]. Also, I cannot use [-] because in the below example, the second split on '-' will work, but 3rd and 4th row will be incorrect

Product and Quantity
ABC-BBC-Bottle- 1 - 30 mg
BBC-44-Capsule- 10 - 500mg
KKP-Bottle- 5 - 30 mg
R2B-Powder-500mg

I need help with below:

How can I split when the position of '-' is not always the same in all rows?
How do I store all the values either before or after [-]. I know I can use [-2] for after and [2] for before. But its not storing all the values after I split using [2] or before when I use [-2]?

Currently, it looks like the below.

df = source_df[['Product and Quantity']]
df['Quantity'] = df['Product and Quantity'].str.split('-').str[2]

The output looks like the below.

Quantity

Bottle
Capsule
5
500mg

I want it to look like the below.

Quantity

Bottle - 1 - 30 mg
Capsule - 10 - 500mg
Bottle - 5 - 30 mg
Powder - 500mg

CodePudding user response：

df['Quantity'] = df[0].str.replace(' ', '').str.findall('\w -\d*-*\d*mg').str[0].str.replace('-', ' - ')

Output:

         Product and Quantity              Quantity
0   ABC-BBC-Bottle- 1 - 30 mg     Bottle - 1 - 30mg
1  BBC-44-Capsule- 10 - 500mg  Capsule - 10 - 500mg
2       KKP-Bottle- 5 - 30 mg     Bottle - 5 - 30mg
3            R2B-Powder-500mg        Powder - 500mg

CodePudding user response：

Reliable method: use a regex!

regex = r'[^-] -((?:[^-] -){,2}[^-] )$'
df['Quantity'] = df['Product and Quantity'].str.extract(regex)

Output:

         Product and Quantity             Quantity
0   ABC-BBC-Bottle- 1 - 30 mg    Bottle- 1 - 30 mg
1  BBC-44-Capsule- 10 - 500mg  Capsule- 10 - 500mg
2       KKP-Bottle- 5 - 30 mg    Bottle- 5 - 30 mg
3            R2B-Powder-500mg         Powder-500mg

regex demo