I have a list like this:
lis=["proc_movieclip1_0.450-16.450.wav", "proc_movieclip1_17.700-23.850.wav", "proc_movieclip1_25.800-29.750.wav"]
I converted it into a DataFrame with:
import numpy as np
import pandas as pd
dfs = pd.DataFrame(lis)
dfs.columns=['path']
dfs
so dfs looks like this:
                                path
0   proc_movieclip1_0.450-16.450.wav
1  proc_movieclip1_17.700-23.850.wav
2  proc_movieclip1_25.800-29.750.wav
I just want to extract the numeric range from each string into a new column, as follows:
range
0.450-16.450
17.700-23.850
25.800-29.750
What I've tried:
dfs.path.str.extract(r'(\d+)')
Output:
   0
0  1
1  1
2  1
I also tried:
dfn = dfs.assign(path = lambda x: x['path'].str.extract(r'(\d+)'))
I got the same output as above. Am I missing anything?
CodePudding user response:
You need to use a more complex regex here:
dfs['path'].str.extract(r'(\d+(?:\.\d+)?-\d+(?:\.\d+)?)')
output:
               0
0   0.450-16.450
1  17.700-23.850
2  25.800-29.750
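To store the result as a new column, as the question asks, one option is to pass expand=False so that str.extract returns a Series instead of a one-column DataFrame. The following is a minimal sketch, assuming the same three file names as in the question; the column name range is taken from the desired output, and first_digits and pattern are illustrative names. It also shows why the original attempt returned 1 for every row: the first run of digits in each file name is the 1 in movieclip1.

```python
import pandas as pd

lis = ["proc_movieclip1_0.450-16.450.wav",
       "proc_movieclip1_17.700-23.850.wav",
       "proc_movieclip1_25.800-29.750.wav"]
dfs = pd.DataFrame({"path": lis})

# A bare \d+ stops at the first run of digits, the "1" in "movieclip1".
first_digits = dfs["path"].str.extract(r"(\d+)", expand=False)

# Require number, dash, number; the decimal part of each number is optional.
pattern = r"(\d+(?:\.\d+)?-\d+(?:\.\d+)?)"

# expand=False returns a Series, which can be assigned directly to a column.
dfs["range"] = dfs["path"].str.extract(pattern, expand=False)
print(dfs)
```

The extracted values are still strings; converting them to numbers would need a further step.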
CodePudding user response:
If you're unfamiliar with regex, you can use the str.split() method instead:
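def Extractor(string):
    # Take the part after the last '_', drop the 4-character '.wav' suffix,
    # then split the remaining 'start-end' text on '-'.
    num1, num2 = string.split('_')[-1][:-4].split('-')
    return (float(num1), float(num2))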
Result:
>>> Extractor('proc_movieclip1_0.450-16.450.wav')
(0.45, 16.45)
Lambda one-liner:
lambda x: tuple([float(y) for y in x.split('_')[-1][:-4].split('-')])