Home > Blockchain >  If a string or multiple strings from a list matches the value(strings) from a specific column then,
If a string or multiple strings from a list matches the value(strings) from a specific column then,

Time:01-23

I have a pandas data frame and a list. I am trying to match the list of string with the data frame and fill a new column named system_name with the matched string or strings.

import pandas as pd
import re

df = pd.DataFrame({"A":["ailerons and landing gear is jammed.",
                        "something went wrong in flaps and slats. Flaps are open",
                        "engine is lost with flap and brake but right brake is working",
                        "I dont know anything about elevator and engine. Engine is lost "]})

systems = ['aileron', 'landing gear', 'flap', 'slat', 'engine', 'brake', 'elevator']

I tried with this code below:

df['system_name'] = df['A'].str.findall('('   '|'.join(systems)   ')', flags=re.IGNORECASE).str.join(',')

and getting this outcome:

enter image description here

Here its extracting all matches. Instead of all matches, I want an element from the list to match once in every row in column A and show in the output. So for an example, for index 1, instead of flap, slat, Flap, the outcome should show flap, slat.

CodePudding user response:

If possible test only lowercase values add Series.str.lower with lambda function form remove duplicates by dict.fromkeys trick:

df['system_name'] = (df['A'].str.lower()
                            .str.findall('('   '|'.join(systems)   ')')
                            .agg(lambda x: ','.join(dict.fromkeys(x))))

print (df)
                                                   A           system_name
0               ailerons and landing gear is jammed.  aileron,landing gear
1  something went wrong in flaps and slats. Flaps...             flap,slat
2  engine is lost with flap and brake but right b...     engine,flap,brake
3  I dont know anything about elevator and engine...       elevator,engine

If need ignore cases use custom function with join:

#https://stackoverflow.com/a/48283398/2901002
def f(x):
    result=[]

    marker = set()
    
    for l in x:
        ll = l.lower()
        if ll not in marker:   # test presence
            marker.add(ll)
            result.append(l)   # preserve order
    return ','.join(result)


df['system_name'] = (df['A'].str.findall(' ('  '|'.join(systems)   ')', flags=re.IGNORECASE)
                                .agg(f))

print (df)
                                                   A           system_name
0               ailerons and landing gear is jammed.  aileron,landing gear
1  something went wrong in flaps and slats. Flaps...             flap,slat
2  engine is lost with flap and brake but right b...     engine,flap,brake
3  I dont know anything about elevator and engine...       elevator,engine
  • Related