pandas: match substring from a column in dataframe with another dataframe column

Time:09-23

I have two dataframes like the following, but with more rows:

import pandas as pd

data = {'First': [['First', 'value'], ['second', 'value'], ['third', 'value', 'is'], ['fourth', 'value', 'is']],
        'Second': [['adj', 'noun'], ['adj', 'noun'], ['adj', 'noun', 'verb'], ['adj', 'noun', 'verb']]}

df = pd.DataFrame(data, columns=['First', 'Second'])

data2 = {'example': ['First value is important', 'second value is imprtant too', 'it us goof to know']}

df2 = pd.DataFrame(data2, columns=['example'])

I wrote a function that checks whether the first word of each string in the example column can be found in the First column of the first dataframe and, if so, returns the string:

def reader():
    for l in [l for l in df2.example]:
        if df["First"].str.contains(pat=l.split(' ', 1)[0]).any() is True:
            return l

However, I realized that it would not work because each entry in the First column of df is a list of strings rather than a single string, so I made the following modification:

def reader():
    for l in [l for l in df2.example]:
        df['first_unlist'] = [','.join(map(str, l)) for l in df.First]
        if df["first_unlist"].str.contains(pat=l.split(' ', 1)[0]).any() is True:
            return l

However, I still get None when I run the function, and I cannot figure out what is wrong here.

Update:

I would like the function to return the first two strings in the example column: 'First value is important' and 'second value is imprtant too'.

Answer:

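The check '.any() is True' never succeeds. Series.any() returns a NumPy boolean (numpy.bool_), which is equal to True but is not the same object as Python's True, so the identity test with 'is' fails and the function falls through and returns None. In addition, 'return l' inside the loop would stop at the first match, whereas you want every matching string. Here is a revision that tests truthiness directly and collects all matches:

def reader():
    # Join each list into one comma-separated string (once, outside the loop)
    df['first_unlist'] = [','.join(map(str, words)) for words in df.First]
    matches = []
    for l in df2.example:
        first_word = l.split(' ', 1)[0]
        # Rely on truthiness instead of comparing with 'is True'
        if df["first_unlist"].str.contains(first_word, regex=False).any():
            matches.append(l)
    return matches

reader()
# ['First value is important', 'second value is imprtant too']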
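
Here is a minimal sketch of the two points at play, assuming pandas and NumPy are installed. The first part shows why an identity test against True fails for a NumPy boolean, which is the type Series.any() typically returns. The second part is an alternative that matches whole tokens rather than substrings. That is an assumption about the intent, and the names known_tokens, first_words, and matches are illustrative rather than taken from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'First': [['First', 'value'], ['second', 'value'],
              ['third', 'value', 'is'], ['fourth', 'value', 'is']],
})
df2 = pd.DataFrame({
    'example': ['First value is important',
                'second value is imprtant too',
                'it us goof to know'],
})

# A NumPy boolean compares equal to True, but it is a different object
# from Python's True singleton, so an identity test with `is` fails.
flag = np.bool_(True)
identity_check = flag is True    # False
truthiness_check = bool(flag)    # True

# Exact-token alternative (assumed intent; the question uses substring
# matching): gather every token that occurs in df.First, then keep the
# rows of df2 whose first word is one of those tokens.
known_tokens = set(df['First'].explode())
first_words = df2['example'].str.split().str[0]
matches = df2.loc[first_words.isin(known_tokens), 'example'].tolist()

print(matches)
# ['First value is important', 'second value is imprtant too']
```

Matching on whole tokens avoids accidental hits where the first word happens to be a fragment of a longer token, and it avoids rebuilding a joined column on every loop iteration. The comparison is case-sensitive, as in the original code.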