I have a CSV file in which I need to change some values based on two other values. Normally I would use if statements, but that is not straightforward with pandas.
How can I change Cat_Tier_2
to Wireless, Corded, etc., based on whether the word "wireless" or "corded" appears in the description of that row only?
Name x brand mouse
Description I am a wireless mouse
Cat_Tier_1 Peripherals
Cat_Tier_2 Mouse
I can change Cat_Tier_1
to Mouse with
df["Cat_Tier_1"] = df["Cat_Tier_1"].str.replace("Peripherals", "Mouse")
but with pandas I can't use plain if
statements, e.g.
if "wireless" in df["Description"]:
    df["Cat_Tier_2"].str.replace("Mouse", "Wireless")
elif "corded" in df["Description"]:
    df["Cat_Tier_2"].str.replace("Mouse", "Corded")
else:
    pass
CodePudding user response:
This will do what your question asks:
df.loc[df.Description.str.contains('wireless'), 'Cat_Tier_2'] = 'wireless'
df.loc[df.Description.str.contains('corded'), 'Cat_Tier_2'] = 'corded'
Input:
            Name            Description   Cat_Tier_1  Cat_Tier_2
0  x brand mouse  I am a wireless mouse  Peripherals       Mouse
Output:
            Name            Description   Cat_Tier_1  Cat_Tier_2
0  x brand mouse  I am a wireless mouse  Peripherals    wireless
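The two lines above match case-sensitively, and on an object column with missing descriptions the mask can contain NaN, which makes the loc assignment raise an error. The following is a minimal sketch of a more defensive variant, assuming you want "Wireless" to count as a match and rows without a description to stay unchanged. The sample rows and the names is_wireless and is_corded are made up for illustration.

```python
import pandas as pd

# Made-up sample data: mixed case and one missing description
df = pd.DataFrame({
    "Name": ["x brand mouse", "y brand mouse", "z brand mouse"],
    "Description": ["I am a Wireless mouse", "I am a corded mouse", None],
    "Cat_Tier_1": ["Peripherals"] * 3,
    "Cat_Tier_2": ["Mouse"] * 3,
})

# case=False matches "Wireless" as well as "wireless";
# na=False treats a missing description as "no match"
is_wireless = df["Description"].str.contains("wireless", case=False, na=False)
is_corded = df["Description"].str.contains("corded", case=False, na=False)

# Overwrite Cat_Tier_2 only on the rows where the mask is True
df.loc[is_wireless, "Cat_Tier_2"] = "wireless"
df.loc[is_corded, "Cat_Tier_2"] = "corded"

print(df[["Description", "Cat_Tier_2"]])
```

The row with no description keeps its original Cat_Tier_2 value.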
CodePudding user response:
Use Series.str.contains
to build a boolean mask and pass it to DataFrame.loc so that only the matching rows are changed,
if you need to replace a substring:
m1 = df["Description"].str.contains('wireless')
m2 = df["Description"].str.contains('corded')
df.loc[m1, "Cat_Tier_2"] = df.loc[m1, "Cat_Tier_2"].str.replace("Mouse", "Wireless")
df.loc[m2, "Cat_Tier_2"] = df.loc[m2, "Cat_Tier_2"].str.replace("Mouse", "Corded")
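To see how this differs from overwriting the whole cell, here is a small self-contained sketch of the same steps on made-up data. The "Gaming Mouse" and "Keyboard" rows are assumptions added only to show that str.replace swaps just the substring and that rows outside the masks are left alone.

```python
import pandas as pd

# Made-up sample data for illustration
df = pd.DataFrame({
    "Description": ["I am a wireless mouse", "I am a corded mouse", "I am a keyboard"],
    "Cat_Tier_2": ["Gaming Mouse", "Mouse", "Keyboard"],
})

# Boolean masks: True where the description contains the keyword
m1 = df["Description"].str.contains("wireless")
m2 = df["Description"].str.contains("corded")

# Only the substring "Mouse" is swapped; the rest of the value is kept
df.loc[m1, "Cat_Tier_2"] = df.loc[m1, "Cat_Tier_2"].str.replace("Mouse", "Wireless")
df.loc[m2, "Cat_Tier_2"] = df.loc[m2, "Cat_Tier_2"].str.replace("Mouse", "Corded")

print(df)
```

Here "Gaming Mouse" becomes "Gaming Wireless" rather than just "Wireless", which is the practical difference from assigning a fixed value through loc.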
CodePudding user response:
There are multiple ways to deal with this.
If you still want to use if-else, or expect to build more complex logic later, one option is pandas apply,
which calls a function of your choice on each value or row, for example:
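def func(row):
    # Read both columns from the current row
    description = row["Description"]
    tier = row["Cat_Tier_2"]
    if "wireless" in description:
        return tier.replace("Mouse", "Wireless")
    elif "corded" in description:
        return tier.replace("Mouse", "Corded")
    # No keyword found: keep the existing value
    return tier

# axis=1 passes each row to func, so it can see Description and Cat_Tier_2
df["Cat_Tier_2"] = df.apply(func, axis=1)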
But you only need this if you have very complex logic. This problem can be solved more simply with loc,
as jezrael suggested.
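As a rough check that the two routes agree, the sketch below runs a row-wise apply and the loc approach on the same made-up data and compares the results. The function name retag_row and the sample rows are hypothetical, chosen only for this illustration.

```python
import pandas as pd

def retag_row(row):
    # Hypothetical row-wise helper: pick the new Cat_Tier_2 from Description
    if "wireless" in row["Description"]:
        return row["Cat_Tier_2"].replace("Mouse", "Wireless")
    elif "corded" in row["Description"]:
        return row["Cat_Tier_2"].replace("Mouse", "Corded")
    return row["Cat_Tier_2"]

# Made-up sample data for illustration
df = pd.DataFrame({
    "Description": ["I am a wireless mouse", "I am a corded mouse", "I am a keyboard"],
    "Cat_Tier_2": ["Mouse", "Mouse", "Keyboard"],
})

# Route 1: Python-level loop over rows via apply
via_apply = df.apply(retag_row, axis=1)

# Route 2: vectorised masks with loc, working on a copy of the column
via_loc = df["Cat_Tier_2"].copy()
m1 = df["Description"].str.contains("wireless")
m2 = df["Description"].str.contains("corded")
via_loc.loc[m1] = via_loc.loc[m1].str.replace("Mouse", "Wireless")
via_loc.loc[m2] = via_loc.loc[m2].str.replace("Mouse", "Corded")

print(via_apply.tolist())
print(via_loc.tolist())
```

Both give the same values here. The apply route calls the function once per row in Python, so the mask-based route is usually the better default on large frames.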