Home > Enterprise >  Pandas: Changing values of a column based on the string of another column
Pandas: Changing values of a column based on the string of another column

Time:12-08

I have a dataframe with two columns, one with the name of several products[SKU Name] and the other with the units of these products[Quantity_SKU].

SKU Name Quantity_SKU
Box chocolate 24 unds 1
Box chocolate 12 unds 1
Box Apple and Bananas 24 unds 1
Box Apple and Bananas 12 unds 1
Apple and Bananas 1
chocolate 1

The problem is that some sku have more than one unit, but in the units column it is only counting as one. I need to create a rule that when something like: "24 units" appears, in the units column it changes from number 1 to 24. What I thought quickly was to create a conditional, where this term appeared, it would create a new column[units] where it would add 24, for example. So I created a list comprehension, with a conditional of 24 und, but when I went to create another if, the syntax was wrong.

 df["units"] = [
24
if (
    (x == "Box chocolate 24 unds")
    or (x == "Box Apples e Bananas 24 unds")
)
 12 (
   if (x == "Box chocolate 12 unds)")
    or (x == "Box Apples e Bananas 12 unds")
)
else 800
for x in df["SKU Name"]

]

Can someone help me?

CodePudding user response:

If you want to purely use list comprehension, to include elif scenarios you have to put them inside another else and use if again:

df["unidades"] = [24 if (x == "chocolate 24" or x== "Bananas 24") \
                  else 12 if (x == "chocolate 12" or x == "Bananas 12") \ 
                  else 800 \
                  for x in df["SKU Name"]]

Or for a better more pythonic use of pandas, use column indexing and filtering:

mask = (df["SKU NAME"] == "chocolate 24") or (df["SKU Name"] == "Bananas 24")
df.loc[mask, "unidades"] = 24

And repeating the same filtering for 12 and so on. You can read more about pandas innate vectorized filtering and indexing on columns here

CodePudding user response:

You can use a regex match to extract the pattern of numbers followed by unds , and assign it. If there is no match it returns NaN, so you can set the default to 800 using fillna.

pattern = r'(\d ) unds'
df['units'] = df['SKU Name'].str.extract(pattern).fillna(800).astype(int)
  • Related