I need to loop through a dataframe containing a list of strings and return a column based on the ord-CodePudding

I would like to loop through the "FRUIT" column in the dataframe below and output a boolean value in another column,called "STATUS" based on the ordering of the values in the "FRUIT" column. Whenever the string "PINE" is positioned before the word "ORANGE" in the list I need the value in the STATUS column to be "TRUE" otherwise the value would be "FALSE"

Initial dataframe

I tried the code below but did not get the expected result:

datadf = {'ID': ['1000', '1001', '1002', '1003','1004','1005','1006'], 'FRUIT': [["ORANGE","ORANGE"],["ORANGE","PINE"],["PINE","ORANGE"],["PINE","PINE","ORANGE"],["ORANGE","PINE","ORANGE"],["ORANGE","PINE"],["ORANGE","ORANGE","PINE"]]} 

def FRUIT_STATUS(datadf):
    counter=0
    for i in range(len(datadf['FRUIT'])):
        if ("PINE" in datadf['FRUIT'] ):
            return "TRUE" 
    else:
        return "FALSE" 
    
datadf['STATUS'] = datadf.apply(FRUIT_STATUS, axis = 1)

The Final dataframe should look like this:

Desired Dataframe

CodePudding user response：

One possibility would be to use regular expressions. Note that this also will return True if somethings else is between PINE and ORANGE. Depending on what you want exactly, you can adjust the regular expression.

import pandas as pd

datadf = {'ID': ['1000', '1001', '1002', '1003','1004','1005','1006'], 'FRUIT': [["ORANGE","ORANGE"],["ORANGE","PINE"],["PINE","ORANGE"],["PINE","PINE","ORANGE"],["ORANGE","PINE","ORANGE"],["ORANGE","PINE"],["ORANGE","ORANGE","PINE"]]} 

df = pd.DataFrame(datadf)

df['STATUS'] = df.FRUIT.astype(str).str.contains(r'PINE.*ORANGE')