Python: changing the numerical values of a string based on calculations within its own string

I'm working with a dataframe of medicinal products. I have to extract the dosage from the name (a string) and then replace the dosage in the original product name with its reduced form.

Example of what I have:

Name
'Prenoxad 2mg/2ml solution for injection pre-filled syringes'

I want to have, stored in a new column:

Name_reduced
'Prenoxad 1mg/ml solution for injection pre-filled syringes'

Another example is having 250mg/5ml, and wanting to have 50mg/ml.

I want to do this for every product in the dataframe that needs to have its dosage reduced. Not all products have the dosage in their name, and also some products have different dosages in their name that don't need any reduction, for example:

Co-amoxiclav 250mg/125mg tablets

So I think the best approach is to apply the reduction only to products whose name contains '/', 'mg' and 'ml', since the reduction is only needed when both 'mg' and 'ml' are present. Names that already contain the exact string 'mg/ml' should be skipped, since that only occurs when the dosage is already in its reduced form.
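A minimal sketch of that filter in plain Python. The helper name `needs_reduction` is made up for illustration, and the check is a simple substring test, so it assumes lowercase units and could give a false positive when 'ml' appears elsewhere in the name:

```python
def needs_reduction(name):
    """Return True if the name looks like it holds an unreduced mg-per-ml dosage."""
    # All three markers must be present somewhere in the name...
    has_markers = all(token in name for token in ("/", "mg", "ml"))
    # ...and the dosage must not already be in its reduced 'mg/ml' form.
    return has_markers and "mg/ml" not in name


names = [
    "Prenoxad 2mg/2ml solution for injection pre-filled syringes",
    "Prenoxad 1mg/ml solution for injection pre-filled syringes",
    "Co-amoxiclav 250mg/125mg tablets",
]
for name in names:
    print(needs_reduction(name), name)
```

Only the first name passes the check; the other two are left alone.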

I can extract the section of the string that I want to use like this:

import re

txt = "Prenoxad 2mg/2ml solution for injection pre-filled syringes"
x = re.findall(r"\d.+/*\d.{2}", txt)
print(x)

#['2mg/2ml']

But I don't know what to do after this. What is the best way to implement this 'reduction'?

CodePudding user response：

Assuming a DataFrame as input, you can use a custom function with str.replace:

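def simplify(m):
    # m is the regex match: a quantity and a unit on each side of the slash
    q1, u1, q2, u2 = m.groups()
    q1, q2 = int(q1), int(q2)
    if {u1, u2} != {'mg', 'ml'}:
        # not a mg/ml pair (e.g. 250mg/125mg): leave the text unchanged
        return m.group(0)
    else:
        q = q1/q2
        # drop the decimal part when the ratio is a whole number
        if int(q) == q:
            q = int(q)
        return f'{q}{u1}/{u2}'

df['Name'] = df['Name'].str.replace(r'(\d+)(..)/(\d+)(..)', simplify, regex=True)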

output (as Name2 column for comparison):

                        Name                     Name2
0  Prenoxad 2mg/2ml solution  Prenoxad 1mg/ml solution

Used input:

df = pd.DataFrame({'Name': ['Prenoxad 2mg/2ml solution']})

CodePudding user response：

You can do it like this:

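import re

txt = "Prenoxad 2mg/2ml solution for injection pre-filled syringes"
x = re.findall(r"\d.+/*\d.{2}", txt)[0]
items = ['mg', 'ml', '/']
if all(item in x for item in items):
    # take the two quantities from the matched dosage, not from the whole name
    mg, ml = re.findall(r"(\d+)", x)
    ratio = float(mg)/float(ml)
    # str.replace returns a new string, so the result must be assigned back
    txt = txt.replace(x, f'{ratio}mg/ml')
txt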

Output:

'Prenoxad 1.0mg/ml solution for injection pre-filled syringes'
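Note that the float division leaves a trailing `.0`, whereas the question asks for `1mg/ml`. One possible way to drop it is sketched below; the helper name `format_ratio` is hypothetical and not part of the answer above:

```python
def format_ratio(ratio):
    """Render a float without a trailing '.0' when it is a whole number."""
    return str(int(ratio)) if float(ratio).is_integer() else str(ratio)


print(format_ratio(2 / 2))    # 1
print(format_ratio(250 / 5))  # 50
print(format_ratio(5 / 2))    # 2.5
```

The result of `format_ratio(ratio)` could then be used in the f-string in place of `ratio`.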

CodePudding user response：

First you need a function that properly processes your text:

import re

def reduce_name(txt):
    re_digits = r'(\d+)mg/(\d+)ml'
    x = re.findall(re_digits, txt)
    if len(x) > 0:
        # integer division: any remainder is discarded (see the notes below)
        reduced_value = int(x[0][0]) // int(x[0][1])
        reduced_txt = re.sub(re_digits, f'{reduced_value}mg/ml', txt)
        return reduced_txt
    else:
        return txt

And if you want to apply it to the entire column and store the result in a new column, you can do it like this:

df['Name_reduced'] = df['Name'].apply(reduce_name)

Please pay attention to the possible specific cases and adjust the code accordingly:

Check whether numeric values can contain thousands separators, e.g. 2,500mg/...
Check whether spaces can appear inside the text the regular expression has to match
Check whether the first value is always divisible by the second without a remainder
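A rough sketch of how those three cases might be handled together. This is one possible reading, not a definitive implementation: the names `DOSAGE` and `reduce_dosage` are made up here, a comma is assumed to be a thousands separator (never a decimal comma), and non-whole ratios are written as decimals with at most six significant digits, which may not be acceptable for every product list.

```python
import re
from fractions import Fraction

# Assumed format: numbers may carry thousands separators and a decimal part,
# and spaces may surround the units and the slash.
DOSAGE = re.compile(
    r"(\d[\d,]*(?:\.\d+)?)\s*mg\s*/\s*(\d[\d,]*(?:\.\d+)?)\s*ml\b"
)


def _reduce(match):
    # Strip thousands separators before converting to exact fractions.
    mg = Fraction(match.group(1).replace(",", ""))
    ml = Fraction(match.group(2).replace(",", ""))
    if ml == 0:
        return match.group(0)  # nothing sensible to compute; keep the text
    ratio = mg / ml
    if ratio.denominator == 1:
        return f"{ratio.numerator}mg/ml"   # whole number, e.g. 50mg/ml
    return f"{float(ratio):g}mg/ml"        # assumption: decimals are acceptable


def reduce_dosage(name):
    """Replace every 'Xmg/Yml' dosage in name with its per-ml form."""
    return DOSAGE.sub(_reduce, name)


print(reduce_dosage("Prenoxad 2mg/2ml solution"))        # Prenoxad 1mg/ml solution
print(reduce_dosage("Example 2,500 mg / 5 ml solution"))  # Example 500mg/ml solution
print(reduce_dosage("Example 5mg/2ml solution"))          # Example 2.5mg/ml solution
print(reduce_dosage("Co-amoxiclav 250mg/125mg tablets"))  # unchanged
```

Because the pattern requires 'mg' before the slash and 'ml' after it, names such as the Co-amoxiclav one are not touched, and names already in 'mg/ml' form do not match either.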

CodePudding user response：

You can use the fractions module to write a function that reduces the ratios. Then, you can use re.sub with that function as the replacement callback to find each ratio and replace it.

import re
from fractions import Fraction

def replace_ratio(ratio):
    # Fraction reduces the ratio to lowest terms, e.g. 120/6 -> 20/1
    fraction = Fraction(int(ratio.group(1)), int(ratio.group(2)))
    numerator = fraction.numerator
    # omit the denominator when it is 1, so 20/1 is written as 20mg/ml
    denominator = "" if fraction.denominator == 1 else fraction.denominator
    return f"{numerator}mg/{denominator}ml"

def process_text(text):
    return re.sub(r"(\d+)mg/(\d+)ml", replace_ratio, text)

print(process_text("Prenoxad 2mg/2ml solution for injection pre-filled syringes"))
# -> Prenoxad 1mg/ml solution for injection pre-filled syringes

print(process_text("Prenoxad 10mg/3ml solution for injection pre-filled syringes"))
# -> Prenoxad 10mg/3ml solution for injection pre-filled syringes

print(process_text("Prenoxad 120mg/6ml solution for injection pre-filled syringes"))
# -> Prenoxad 20mg/ml solution for injection pre-filled syringes