pandas: extract Latin words from a multi-language string to a separate column

I would like to extract all Latin words from a multilingual string and store them in a separate column. Desired output: 'hhhh tcx cord\with plastic end / light mustard cm non woven grid socks'

I tried a basic regular expression, but it did not work:

import re
import pandas as pd

st={'string':['hhhh 15-0850tcx cord\with plastic end / light mustard -82cm  шнур нужд вес 07 кг','1. 06900000027899 non woven 12 grid socks']}
s = pd.DataFrame(st)
re.findall("[^a-zA-Z]", s)

TypeError: expected string or bytes-like object

CodePudding user response：

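re.findall expects a single string, not a DataFrame, which is why it raises the TypeError. Also, [^a-zA-Z] is a negated class that matches everything except Latin letters. Use Series.str.findall, which applies the pattern to each string in the column: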

df = pd.DataFrame(st)

df['new'] = df['string'].str.findall(r"[a-zA-Z]+")
print(df)
                                              string  \
0  hhhh 15-0850tcx cord\with plastic end / light ...   
1          1. 06900000027899 non woven 12 grid socks   

                                                 new  
0  [hhhh, tcx, cord, with, plastic, end, light, m...  
1                          [non, woven, grid, socks]
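
The question asks for one string per row, whereas findall returns a list of matches per row. A minimal sketch of one way to get strings, assuming the words should simply be joined with single spaces (so the \ and / separators from the desired output are not kept):

```python
import pandas as pd

# Same sample data as in the question; the raw string keeps the backslash literal.
st = {'string': [r'hhhh 15-0850tcx cord\with plastic end / light mustard -82cm  шнур нужд вес 07 кг',
                 '1. 06900000027899 non woven 12 grid socks']}
df = pd.DataFrame(st)

# findall collects every run of ASCII letters per row;
# str.join then glues each row's list back into one string.
df['new'] = df['string'].str.findall(r'[a-zA-Z]+').str.join(' ')
print(df['new'].tolist())
```

If a single string for the whole column is needed instead, ' '.join(df['new']) should concatenate the rows.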