I have a dataframe in which I want to analyze a string column word by word. E.g. I have the string:
Hose clip 8-12 mm W4 9 mm edge SST left right
and I want to apply this matcher to it to extract only a few values from the description:
def matcher(x):
    for i in attributes:
        if i.lower() in x.lower():
            return i
        else:
            return np.nan
Hence I created
attributes =['left','right']
and called it like
df['Colours'] = df['pre_descr'].apply(matcher)
I thought it would give me all occurrences, but it stops after finding the first match. So I get only
'left'
Then I thought I would split the string by ' ' and store this list into pandas column like this
a = 0
for i in df['pre_descr']:
    df.at[a, 'pre_descr_list'] = i.split(' ')
    a += 1
and then iterate over the values and keep those that are in the attributes list, but:
- This gives me error ValueError: Must have equal len keys and value when setting with an iterable
- But I see the list:
['Hose', 'clip', '8-12', 'mm', 'W4', '9', 'mm', 'edge', 'SST', 'left', 'right']
Please, how would you solve this? I think I have overcomplicated it and it should be easier, but I don't even know how to describe it. Maybe the first step, storing the values in the column as a list, is not even needed? Thanks!
CodePudding user response:
It would help to know why you want this list and what it will be used for, since that may clarify the best solution.
But let's answer your question.
The following code should solve your problem:
attributes = ['left', 'right']
# Keep every word of the description that appears in attributes
df['Colours'] = df['pre_descr'].apply(lambda x: [word for word in x.lower().split(" ") if word in attributes])
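For reference, here is a minimal self-contained sketch of the same idea that can be run as-is. The sample rows (apart from the first description) and the helper name find_attributes are made up for illustration. It collects all matches in a list instead of returning from inside the loop, which is why the original matcher stopped at the first hit.

```python
import pandas as pd

# Hypothetical sample data; only the first description comes from the question.
df = pd.DataFrame({
    "pre_descr": [
        "Hose clip 8-12 mm W4 9 mm edge SST left right",
        "Hose clip 10-16 mm W4 9 mm edge SST Left",
        "Hose clip 12-20 mm W2 12 mm",
    ]
})

attributes = ["left", "right"]
# Lower-case the attributes once so the comparison is case-insensitive.
wanted = {a.lower() for a in attributes}


def find_attributes(text):
    """Return every word of `text` that is a known attribute, in order."""
    return [word for word in text.lower().split() if word in wanted]


# apply() returns a Series of lists, so no row-by-row df.at assignment is needed.
df["Colours"] = df["pre_descr"].apply(find_attributes)
print(df["Colours"].tolist())
# [['left', 'right'], ['left'], []]
```

Rows without any match get an empty list rather than NaN. If the full word list per row is still needed, df["pre_descr"].str.split() should produce it without an explicit loop.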