Find the string value from a list in a dataframe column and append the string value as a column

Time:08-07

I have a list of names and a dataframe with a column of free-form text. I want to scan that column and, whenever a row contains a string from the list, put that string in a new column of the dataframe.

So far I have only found ways to fill the new column with a binary (0/1) or True/False value.

  import pandas as pd

  sys_list = ['AAAA', 'BBBB', 'AD-12', 'B31-A']
  data = {'text': ['need help with AAAA system requesting help', 'AD-12 crashed, need support', 'fuel system down', '/BBBB needs refresh']}

  df = pd.DataFrame(data)

The desired result is:

    text                                          System
0   need help with AAAA system requesting help    AAAA
1   AD-12 crashed, need support                   AD-12
2   fuel system down                              0
3   /BBBB needs refresh                           BBBB

I have tried:

  # gives True or False values
  pattern = '|'.join(sys_list)
  df['System'] = df['text'].str.contains(pattern)

  # gives 0 or 1
  df['System'] = [int(any(w in sys_list for w in x.split())) for x in df['text']]
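A vectorized variant of the first attempt (a sketch, not part of the original thread): swapping str.contains for str.extract returns the matched text instead of a boolean. The names are passed through re.escape as a precaution in case a future entry contains regex metacharacters, and fillna(0) is only there to mirror the 0 in the desired output.

```python
import re

import pandas as pd

sys_list = ['AAAA', 'BBBB', 'AD-12', 'B31-A']
data = {'text': ['need help with AAAA system requesting help',
                 'AD-12 crashed, need support',
                 'fuel system down',
                 '/BBBB needs refresh']}
df = pd.DataFrame(data)

# One alternation pattern; the outer parentheses form the capture group
# that str.extract returns. re.escape guards against regex metacharacters.
pattern = '(' + '|'.join(map(re.escape, sys_list)) + ')'

# expand=False returns a Series; rows with no match get NaN,
# which fillna(0) replaces to match the desired output.
df['System'] = df['text'].str.extract(pattern, expand=False).fillna(0)
print(df)
```

One design difference from looping over sys_list: the regex engine reports the match that occurs earliest in the text, not the one that comes first in the list.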

Answer:

import pandas as pd

sys_list = ['AAAA', 'BBBB', 'AD-12', 'B31-A']
data = {'text': ['need help with AAAA system requesting help', 'AD-12 crashed, need support', 'fuel system down', '/BBBB needs refresh']}

df = pd.DataFrame(data)

def f(s):
    # Return the first entry of sys_list (in list order) found anywhere in s
    for symbol in sys_list:
        if symbol in s:
            return symbol
    # No entry of sys_list occurs in s
    return 0

df['System'] = df.text.apply(f)
print(df)

This prints:

                                         text System
0  need help with AAAA system requesting help   AAAA
1                 AD-12 crashed, need support  AD-12
2                            fuel system down      0
3                         /BBBB needs refresh   BBBB

Remark: f returns only the first entry of sys_list (in list order) that occurs in a string, so it assumes each row mentions at most one symbol. It is also a plain substring test, so 'AAAA' would match inside a longer token such as 'AAAAB'.
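If a row can mention more than one symbol, one option (a sketch, not from the original answers) is str.findall, which returns every non-overlapping match per row. The fifth row below is a hypothetical example added only to show the multi-match case, and joining with ", " is an arbitrary choice.

```python
import re

import pandas as pd

sys_list = ['AAAA', 'BBBB', 'AD-12', 'B31-A']
df = pd.DataFrame({'text': [
    'need help with AAAA system requesting help',
    'AD-12 crashed, need support',
    'fuel system down',
    '/BBBB needs refresh',
    'AAAA and B31-A both down',  # hypothetical row with two symbols
]})

# No capture group, so findall returns the full matched strings.
pattern = '|'.join(map(re.escape, sys_list))

# Each row becomes a list of matches; join non-empty lists, use 0 otherwise.
matches = df['text'].str.findall(pattern)
df['System'] = [', '.join(m) if m else 0 for m in matches]
print(df)
```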

Answer:

A slight modification of your second example, using the assignment expression (walrus) operator := (Python 3.8+):

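# (word := ww) records each candidate as it is tested; because any() stops
# at the first True, word holds the symbol that matched.
df["System"] = [
    word
    if any((word := ww) in w for w in x.split() for ww in sys_list)
    else "N/A"
    for x in df["text"]
]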


print(df)

Prints:

                                         text System
0  need help with AAAA system requesting help   AAAA
1                 AD-12 crashed, need support  AD-12
2                            fuel system down    N/A
3                         /BBBB needs refresh   BBBB
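For comparison, a sketch of the same result without the walrus operator (an alternative, not from the original answer): next() with a default value returns the first matching entry or "N/A". Since "symbol in token" implies "symbol in text", splitting into words is not needed here. For rows containing several symbols, this version picks by list order, whereas the walrus version picks the first matching word.

```python
import pandas as pd

sys_list = ['AAAA', 'BBBB', 'AD-12', 'B31-A']
df = pd.DataFrame({'text': ['need help with AAAA system requesting help',
                            'AD-12 crashed, need support',
                            'fuel system down',
                            '/BBBB needs refresh']})

# next() takes the first sys_list entry found in the text,
# or falls back to "N/A" when the generator yields nothing.
df['System'] = [next((s for s in sys_list if s in x), 'N/A') for x in df['text']]
print(df)
```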