Pandas: Changing values of a column based on the string of another column-CodePudding

I have a dataframe with two columns, one with the name of several products[SKU Name] and the other with the units of these products[Quantity_SKU].

SKU Name	Quantity_SKU
Box chocolate 24 unds	1
Box chocolate 12 unds	1
Box Apple and Bananas 24 unds	1
Box Apple and Bananas 12 unds	1
Apple and Bananas	1
chocolate	1

The problem is that some sku have more than one unit, but in the units column it is only counting as one. I need to create a rule that when something like: "24 units" appears, in the units column it changes from number 1 to 24. What I thought quickly was to create a conditional, where this term appeared, it would create a new column[units] where it would add 24, for example. So I created a list comprehension, with a conditional of 24 und, but when I went to create another if, the syntax was wrong.

 df["units"] = [
24
if (
    (x == "Box chocolate 24 unds")
    or (x == "Box Apples e Bananas 24 unds")
)
 12 (
   if (x == "Box chocolate 12 unds)")
    or (x == "Box Apples e Bananas 12 unds")
)
else 800
for x in df["SKU Name"]

]

Can someone help me?

CodePudding user response：

If you want to purely use list comprehension, to include elif scenarios you have to put them inside another else and use if again:

df["unidades"] = [24 if (x == "chocolate 24" or x== "Bananas 24") \
                  else 12 if (x == "chocolate 12" or x == "Bananas 12") \ 
                  else 800 \
                  for x in df["SKU Name"]]

Or for a better more pythonic use of pandas, use column indexing and filtering:

mask = (df["SKU NAME"] == "chocolate 24") or (df["SKU Name"] == "Bananas 24")
df.loc[mask, "unidades"] = 24

And repeating the same filtering for 12 and so on. You can read more about pandas innate vectorized filtering and indexing on columns here

CodePudding user response：

You can use a regex match to extract the pattern of numbers followed by unds , and assign it. If there is no match it returns NaN, so you can set the default to 800 using fillna.

pattern = r'(\d ) unds'
df['units'] = df['SKU Name'].str.extract(pattern).fillna(800).astype(int)