I'd like to update values in a column ['D'] of my data frame based on the substrings contained in column ['A']
I have dataframe df, where column D is an exact copy of column A. Column A has list of different cereal names. But I want to check if A contains a certain word and if so, then column D updates to show that word.
For example, If A contains the word Bran then D should update to show Bran. And if column A contains none of the criteria we are looking for then it should return Other.
name | Type of Cereal |
---|---|
Bran Cereal | Bran |
Fiber Cereal | Fiber |
Nut Cereal | Nut |
So far I have tried this, but I am not able to change the rest of the columns without any criteria words to 'Other'
df.loc[df['Type of Cereal'].str.contains('Fiber'), 'Type of Cereal'] = 'Fiber'
df.loc[df['Type of Cereal'].str.contains('Bran'), 'Type of Cereal'] = 'Bran'
df.loc[df['Type of Cereal'].str.contains('Nut'), 'Type of Cereal'] = 'Nut'
I have also tried creating a function, but I'm not sure how to loop through each value in column A
def conditions1(s):
CerealTypeFiber = df['name'].str.contains('Fiber').sum()
CerealTypeBran = df['name'].str.contains('Bran').sum()
if CerealTypeFiber != None:
return 'Fiber'
elif CerealTypeBran != None:
return 'Bran'
else:
return 'Other'
df['Cereal Category'] = df.apply(conditions1, axis=1)
This just updates the whole column to Fiber.
Any thoughts/suggestions?
CodePudding user response:
Other way using pandas.Series.str.extract
:
df["Type of Cereal"] = df["name"].str.extract("(Bran|Fiber|Nut)").fillna("Other")
print(df)
Output (Note 4th row in case of no match):
name Type of Cereal
0 Bran Cereal Bran
1 Fiber Cereal Fiber
2 Nut Cereal Nut
3 Something Other
CodePudding user response:
You might be able to get away with just using np.where
with str.extract
here:
df["Type of Cereal"] = np.where(df["name"].str.contains('cereal'),
df["name"].str.extract(r'^(.*) Cereal$'),
"Other")