I have a dataframe with two columns, one with the name of several products[SKU Name] and the other with the units of these products[Quantity_SKU].
SKU Name | Quantity_SKU |
---|---|
Box chocolate 24 unds | 1 |
Box chocolate 12 unds | 1 |
Box Apple and Bananas 24 unds | 1 |
Box Apple and Bananas 12 unds | 1 |
Apple and Bananas | 1 |
chocolate | 1 |
The problem is that some sku have more than one unit, but in the units column it is only counting as one. I need to create a rule that when something like: "24 units" appears, in the units column it changes from number 1 to 24. What I thought quickly was to create a conditional, where this term appeared, it would create a new column[units] where it would add 24, for example. So I created a list comprehension, with a conditional of 24 und, but when I went to create another if, the syntax was wrong.
df["units"] = [
24
if (
(x == "Box chocolate 24 unds")
or (x == "Box Apples e Bananas 24 unds")
)
12 (
if (x == "Box chocolate 12 unds)")
or (x == "Box Apples e Bananas 12 unds")
)
else 800
for x in df["SKU Name"]
]
Can someone help me?
CodePudding user response:
If you want to purely use list comprehension, to include elif
scenarios you have to put them inside another else
and use if
again:
df["unidades"] = [24 if (x == "chocolate 24" or x== "Bananas 24") \
else 12 if (x == "chocolate 12" or x == "Bananas 12") \
else 800 \
for x in df["SKU Name"]]
Or for a better more pythonic use of pandas, use column indexing and filtering:
mask = (df["SKU NAME"] == "chocolate 24") or (df["SKU Name"] == "Bananas 24")
df.loc[mask, "unidades"] = 24
And repeating the same filtering for 12 and so on. You can read more about pandas innate vectorized filtering and indexing on columns here
CodePudding user response:
You can use a regex match to extract the pattern of numbers followed by unds , and assign it. If there is no match it returns NaN, so you can set the default to 800 using fillna.
pattern = r'(\d ) unds'
df['units'] = df['SKU Name'].str.extract(pattern).fillna(800).astype(int)