Given a DataFrame:
import pandas as pd
df = pd.DataFrame(['/home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__131147.png',
                   '/home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__160565.png'])
I would like to extract only the integer just before the file extension.
The code below achieves this:
import os
df['fname'] = df[0].apply(lambda x: os.path.split(x)[1])
df['f'] = df['fname'].apply(lambda x: x.split('__')[1].split('.png')[0])
df['f'] = df['f'].astype(int)
However, I have the impression this can be achieved more easily using pandas' built-in str.split, such as below:
df['f']=df[0].str.split(re.compile(r"__\d.jpg"), expand=True)
However, nothing seems to be split. Which parameter is not set correctly?
CodePudding user response:
You can use Series.str.extract:
df['num'] = df['f'].str.extract(r'_(\d+)\.[^.]+$', expand=False)
Details:
_ - an underscore
(\d+) - Capturing group 1 (this is the value returned by Series.str.extract): one or more digits
\. - a . char
[^.]+ - one or more chars other than a . char
$ - end of string
Python test:
import pandas as pd
df = pd.DataFrame({'f': ['/home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__131147.png',
                         '/home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__160565.png']})
df['num'] = df['f'].str.extract(r'_(\d+)\.[^.]+$', expand=False)
print(df.to_string())
Output:
                                                                         f     num
0  /home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__131147.png  131147
1  /home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__160565.png  160565
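As for why the str.split attempt in the question splits nothing: the pattern __\d.jpg asks for exactly one digit, one arbitrary character, and then jpg, while the paths end in six digits followed by .png, so nothing matches. A minimal sketch using Python's re module on a single path illustrates this. The name fixed_pattern is a hypothetical correction introduced here for illustration, not something from the question.

```python
import re

path = "/home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__131147.png"

# Pattern from the question: "__", exactly one digit, any one character, then "jpg".
question_pattern = re.compile(r"__\d.jpg")
# The path contains no such substring, so split returns the whole string in a one-item list.
unsplit = question_pattern.split(path)

# Hypothetical correction: "__", one or more digits (captured), then ".png" at the end.
fixed_pattern = re.compile(r"__(\d+)\.png$")
# With a capturing group, re.split keeps the captured digits as their own item.
parts = fixed_pattern.split(path)

print(unsplit)
print(parts)
```

Even with a corrected pattern, splitting leaves a prefix and an empty trailing item to discard, which is why str.extract is the more direct choice: it returns only the captured digits.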
CodePudding user response:
Assuming 0 is the name of your column (as in your example), you can use str.extract:
df[0].str.extract(r'(\d+)\.[^.]+$', expand=False)
output:
0    131147
1    160565
Name: 0, dtype: object
To assign to a new column:
df['f'] = df[0].str.extract(r'(\d+)\.[^.]+$')
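Note that str.extract returns strings (hence dtype: object above), whereas the question asks for integers. A minimal sketch of the conversion, assuming every row matches the pattern so that no missing values appear before the cast:

```python
import pandas as pd

df = pd.DataFrame(['/home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__131147.png',
                   '/home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__160565.png'])

# Extract the digits right before the extension as strings.
digits = df[0].str.extract(r'(\d+)\.[^.]+$', expand=False)

# Cast to integers; this assumes every path matched (a non-matching row would yield NaN and fail here).
df['f'] = digits.astype(int)

print(df['f'].tolist())
```

If some paths might not match, pd.to_numeric(digits, errors='coerce') may be a safer choice, at the cost of a float or nullable result.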
CodePudding user response:
def extract(values):
    values = values.split('__')  # cut at '__'
    return int(values[-1].replace('.png', ''))  # take the last part and remove the '.png'

df[0].apply(extract)
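The function above hardcodes the .png extension. A possible variant, sketched here under the assumption that the number always follows the last '__' in the file name, uses os.path.splitext so that any extension is handled. The helper name extract_number is hypothetical.

```python
import os

import pandas as pd


def extract_number(path):
    """Hypothetical helper: return the integer after the last '__' in the file name."""
    # Drop the directories, then drop the extension, whatever it is.
    stem, _ext = os.path.splitext(os.path.basename(path))
    # Keep only the text after the last '__' and convert it to int.
    return int(stem.rsplit('__', 1)[-1])


df = pd.DataFrame(['/home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__131147.png',
                   '/home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__160565.png'])

numbers = df[0].apply(extract_number)
print(numbers.tolist())
```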