I would like to extract all Latin words from a multilingual string and store them in a separate column. Desired output: 'hhhh tcx cord\with plastic end / light mustard cm non woven grid socks'
I tried a basic regular expression, but it did not work:
import re
import pandas as pd
st={'string':[r'hhhh 15-0850tcx cord\with plastic end / light mustard -82cm шнур нужд вес 07 кг','1. 06900000027899 non woven 12 grid socks']}
s = pd.DataFrame(st)
re.findall("[^a-zA-Z]", s)
TypeError: expected string or bytes-like object
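The error occurs because re.findall works on a single string, not on a whole DataFrame. Note also that [^a-zA-Z] is a negated class: it matches every character that is *not* an ASCII Latin letter. A minimal sketch that keeps plain re but calls it once per cell via Series.apply (the helper name latin_words is hypothetical, and this pattern drops punctuation such as \ and /):

```python
import re

import pandas as pd

st = {'string': [r'hhhh 15-0850tcx cord\with plastic end / light mustard -82cm шнур нужд вес 07 кг',
                 '1. 06900000027899 non woven 12 grid socks']}
s = pd.DataFrame(st)


def latin_words(text):
    # Hypothetical helper: find runs of ASCII letters and join them with single spaces.
    return ' '.join(re.findall(r'[a-zA-Z]+', text))


# re.findall needs a str, so apply it to each cell instead of the whole DataFrame.
s['latin'] = s['string'].apply(latin_words)
print(s['latin'].tolist())
```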
Answer:
Use Series.str.findall:
import pandas as pd

df = pd.DataFrame(st)
df['new'] = df['string'].str.findall(r"[a-zA-Z]+")
print(df)
string \
0 hhhh 15-0850tcx cord\with plastic end / light ...
1 1. 06900000027899 non woven 12 grid socks
new
0 [hhhh, tcx, cord, with, plastic, end, light, m...
1 [non, woven, grid, socks]
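Series.str.findall returns a list per row. To get a space-separated string like the desired output, one option is to chain Series.str.join; joining the per-row strings then gives a single combined string, assuming that is what the question's one-line desired output means. Like the answer, this sketch matches only ASCII letters, so accented Latin letters (for example é or ü) and punctuation such as \ and / are not kept.

```python
import pandas as pd

st = {'string': [r'hhhh 15-0850tcx cord\with plastic end / light mustard -82cm шнур нужд вес 07 кг',
                 '1. 06900000027899 non woven 12 grid socks']}
df = pd.DataFrame(st)

# Extract runs of ASCII letters as lists, then join each list into one string per row.
df['new'] = df['string'].str.findall(r'[a-zA-Z]+').str.join(' ')

# Combine all rows into a single string (an interpretation of the desired output).
combined = ' '.join(df['new'])
print(df['new'].tolist())
print(combined)
```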