Extract substring numbers from string pandas

Time:09-07

I have a list like this:

lis=["proc_movieclip1_0.450-16.450.wav", "proc_movieclip1_17.700-23.850.wav", "proc_movieclip1_25.800-29.750.wav"]

I converted it into a DataFrame with:

import numpy as np
import pandas as pd

dfs = pd.DataFrame(lis)
dfs.columns=['path']
dfs

so dfs looks like this:

      path
0   proc_movieclip1_0.450-16.450.wav
1   proc_movieclip1_17.700-23.850.wav
2   proc_movieclip1_25.800-29.750.wav

I just want to extract the number range from each string into a new column, as follows:

range
0.450-16.450
17.700-23.850
25.800-29.750

What I've tried:

dfs.path.str.extract(r'(\d+)')

Output:

    0
0   1
1   1
2   1

I also tried:

dfn = dfs.assign(path = lambda x: x['path'].str.extract(r'(\d+)'))

I got the same output as above. Am I missing anything?

CodePudding user response:

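Your pattern captures only the first digit(s) it finds, which is the 1 in movieclip1. To capture the whole range, match two numbers (each with an optional decimal part) separated by a hyphen:

dfs['path'].str.extract(r'(\d+(?:\.\d+)?-\d+(?:\.\d+)?)')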

output:

               0
0   0.450-16.450
1  17.700-23.850
2  25.800-29.750
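
To store the result as the new column the question asks for, the same pattern can be assigned back to the DataFrame. This is a minimal sketch that rebuilds the question's data; the column name `range` comes from the desired output in the question, and `expand=False` is used so that a single capture group yields a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample data copied from the question.
dfs = pd.DataFrame({"path": [
    "proc_movieclip1_0.450-16.450.wav",
    "proc_movieclip1_17.700-23.850.wav",
    "proc_movieclip1_25.800-29.750.wav",
]})

# Two numbers with optional decimal parts, separated by a hyphen.
pattern = r"(\d+(?:\.\d+)?-\d+(?:\.\d+)?)"

# expand=False returns a Series for a single capture group,
# which can be assigned directly as a new column.
dfs["range"] = dfs["path"].str.extract(pattern, expand=False)
print(dfs)
```

The values in `range` are still strings; they would need a further split and conversion if numeric bounds are required.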

CodePudding user response:

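If you're unfamiliar with regex, you can use the str.split() method instead. This assumes every file name ends with an underscore, the range, and a four-character extension such as .wav: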

def Extractor(string):
    num1, num2 = string.split('_')[-1][:-4].split('-')
    return (float(num1), float(num2))

Result:

>>> Extractor('proc_movieclip1_0.450-16.450.wav')
(0.45, 16.45)

Lambda one-liner:

lambda x: tuple([float(y) for y in x.split('_')[-1][:-4].split('-')])
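
The same splitting logic can also be applied to the whole column with pandas string methods instead of a Python-level function. The following is a sketch under the same assumptions as above (the range follows the last underscore and the extension is exactly four characters); the column names `start` and `end` are hypothetical and not part of the original question.

```python
import pandas as pd

# Sample data copied from the question.
dfs = pd.DataFrame({"path": [
    "proc_movieclip1_0.450-16.450.wav",
    "proc_movieclip1_17.700-23.850.wav",
    "proc_movieclip1_25.800-29.750.wav",
]})

# Keep the text after the last underscore, drop the 4-character ".wav"
# suffix, then split the remaining "start-end" string on the hyphen.
bounds = (
    dfs["path"]
    .str.rsplit("_", n=1).str[-1]
    .str[:-4]
    .str.split("-", expand=True)
    .astype(float)
)
bounds.columns = ["start", "end"]  # hypothetical column names

# Attach the numeric bounds next to the original path column.
dfs = dfs.join(bounds)
print(dfs)
```

Unlike the regex approach, this breaks if a file name has a different extension length or a hyphen elsewhere after the last underscore.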