Update different values in a column based on multiple conditions-CodePudding

I'd like to update values in a column ['D'] of my data frame based on the substrings contained in column ['A']

I have dataframe df, where column D is an exact copy of column A. Column A has list of different cereal names. But I want to check if A contains a certain word and if so, then column D updates to show that word.

For example, If A contains the word Bran then D should update to show Bran. And if column A contains none of the criteria we are looking for then it should return Other.

name	Type of Cereal
Bran Cereal	Bran
Fiber Cereal	Fiber
Nut Cereal	Nut

So far I have tried this, but I am not able to change the rest of the columns without any criteria words to 'Other'

df.loc[df['Type of Cereal'].str.contains('Fiber'), 'Type of Cereal'] = 'Fiber'

df.loc[df['Type of Cereal'].str.contains('Bran'), 'Type of Cereal'] = 'Bran'

df.loc[df['Type of Cereal'].str.contains('Nut'), 'Type of Cereal'] = 'Nut'

I have also tried creating a function, but I'm not sure how to loop through each value in column A

def conditions1(s):

    CerealTypeFiber = df['name'].str.contains('Fiber').sum()

    CerealTypeBran = df['name'].str.contains('Bran').sum()
    
    if CerealTypeFiber != None:
        return 'Fiber'
    
    elif CerealTypeBran != None:
        return 'Bran'
    
    else:
        return 'Other'
    
df['Cereal Category'] = df.apply(conditions1, axis=1)

This just updates the whole column to Fiber.

Any thoughts/suggestions?

CodePudding user response：

Other way using pandas.Series.str.extract:

df["Type of Cereal"] = df["name"].str.extract("(Bran|Fiber|Nut)").fillna("Other")
print(df)

Output (Note 4th row in case of no match):

           name Type of Cereal
0   Bran Cereal           Bran
1  Fiber Cereal          Fiber
2    Nut Cereal            Nut
3     Something          Other

CodePudding user response：

You might be able to get away with just using np.where with str.extract here:

df["Type of Cereal"] = np.where(df["name"].str.contains('cereal'),
                                df["name"].str.extract(r'^(.*) Cereal$'),
                                "Other")