I have a pandas data frame and a list. I am trying to match the list of string with the data frame and fill a new column named system_name with the matched string or strings.
import pandas as pd
import re
df = pd.DataFrame({"A":["ailerons and landing gear is jammed.",
"something went wrong in flaps and slats. Flaps are open",
"engine is lost with flap and brake but right brake is working",
"I dont know anything about elevator and engine. Engine is lost "]})
systems = ['aileron', 'landing gear', 'flap', 'slat', 'engine', 'brake', 'elevator']
I tried with this code below:
df['system_name'] = df['A'].str.findall('(' '|'.join(systems) ')', flags=re.IGNORECASE).str.join(',')
and getting this outcome:
Here its extracting all matches. Instead of all matches, I want an element from the list to match once in every row in column A and show in the output. So for an example, for index 1, instead of flap, slat, Flap, the outcome should show flap, slat.
CodePudding user response:
If possible test only lowercase values add Series.str.lower
with lambda function form remove duplicates by dict.fromkeys
trick:
df['system_name'] = (df['A'].str.lower()
.str.findall('(' '|'.join(systems) ')')
.agg(lambda x: ','.join(dict.fromkeys(x))))
print (df)
A system_name
0 ailerons and landing gear is jammed. aileron,landing gear
1 something went wrong in flaps and slats. Flaps... flap,slat
2 engine is lost with flap and brake but right b... engine,flap,brake
3 I dont know anything about elevator and engine... elevator,engine
If need ignore cases use custom function with join
:
#https://stackoverflow.com/a/48283398/2901002
def f(x):
result=[]
marker = set()
for l in x:
ll = l.lower()
if ll not in marker: # test presence
marker.add(ll)
result.append(l) # preserve order
return ','.join(result)
df['system_name'] = (df['A'].str.findall(' (' '|'.join(systems) ')', flags=re.IGNORECASE)
.agg(f))
print (df)
A system_name
0 ailerons and landing gear is jammed. aileron,landing gear
1 something went wrong in flaps and slats. Flaps... flap,slat
2 engine is lost with flap and brake but right b... engine,flap,brake
3 I dont know anything about elevator and engine... elevator,engine