Python extracting string

Time:03-23

I have a DataFrame in which one of the columns, stored as strings, looks like this:

    filename
 0  Machine02-2022-01-28_00-21-45.blf.424
 1  Machine02-2022-01-28_00-21-45.blf.425
 2  Machine02-2022-01-28_00-21-45.blf.426
 3  Machine02-2022-01-28_00-21-45.blf.427
 4  Machine02-2022-01-28_00-21-45.blf.428

I want the column to look like this:

      filename
 0    2022-01-28 00-21-45 424
 1    2022-01-28 00-21-45 425
 2    2022-01-28 00-21-45 426
 3    2022-01-28 00-21-45 427
 4    2022-01-28 00-21-45 428

I tried this code:

df['filename'] = df['filename'].str.extract(r"(\d{4}-\d{1,2}-\d{1,2})_(\d{2}-\d{2}-\d{2}).*\.(\d+)", r"\1 \2 \3")

I am getting this error: unsupported operand type(s) for &: 'str' and 'int'.
Can anyone please tell me what I am doing wrong?

CodePudding user response:

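Use str.replace instead of str.extract. The second positional argument of str.extract is flags, not a replacement string, which is what triggers the error. Also add .*- at the start of the pattern to remove prefixes like Machine02-: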

df['filename'] = df['filename'].str.replace(r".*-(\d{4}-\d{1,2}-\d{1,2})_(\d{2}-\d{2}-\d{2}).*\.(\d+)", r"\1 \2 \3", regex=True)
print(df)

# Output
                  filename
0  2022-01-28 00-21-45 424
1  2022-01-28 00-21-45 425
2  2022-01-28 00-21-45 426
3  2022-01-28 00-21-45 427
4  2022-01-28 00-21-45 428
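
If you would rather keep str.extract, note that it does not substitute anything: it returns a DataFrame with one column per capture group, so the pieces have to be joined afterwards. A minimal sketch, assuming every row matches the pattern (the two-row sample frame and the names `pattern` and `parts` are only for illustration):

```python
import pandas as pd

# Small sample frame, made up for illustration
df = pd.DataFrame({"filename": [
    "Machine02-2022-01-28_00-21-45.blf.424",
    "Machine02-2022-01-28_00-21-45.blf.425",
]})

pattern = r"(\d{4}-\d{1,2}-\d{1,2})_(\d{2}-\d{2}-\d{2}).*\.(\d+)"

# One column per capture group: 0 = date, 1 = time, 2 = trailing number
parts = df["filename"].str.extract(pattern)

# Join the three pieces with spaces; rows that do not match become NaN
df["filename"] = parts[0] + " " + parts[1] + " " + parts[2]
print(df)
```

Rows that do not match the pattern end up as NaN here, which may or may not be what you want.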

CodePudding user response:

Please try this:

df['filename'] = df['filename'].str.split('-', n=1).apply(lambda x: ' '.join(x[1].split('_')).replace('.blf.', ' '))
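
The one-liner is fairly dense, so here is roughly what it does to a single value, unpacked in plain Python. This is only an illustration: the variable names are made up, and it assumes each string contains exactly one underscore.

```python
name = "Machine02-2022-01-28_00-21-45.blf.424"

# Split once on the first hyphen: ['Machine02', '2022-01-28_00-21-45.blf.424']
pieces = name.split("-", 1)

# Replace the underscore with a space by splitting and re-joining
spaced = " ".join(pieces[1].split("_"))

# Swap the '.blf.' marker for a space
result = spaced.replace(".blf.", " ")
print(result)
```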

CodePudding user response:

Use str.replace with regex=True (note that the underscore between date and time is kept):

df['filename'] = df['filename'].str.replace(r'Machine|\.blf\.', ' ', regex=True).str.strip().str.replace(r'^\d+\-', '', regex=True)

# Output
                  filename
0  2022-01-28_00-21-45 424
1  2022-01-28_00-21-45 425
2  2022-01-28_00-21-45 426
3  2022-01-28_00-21-45 427
4  2022-01-28_00-21-45 428

or

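Extract the text between the e of Machine and .blf. Note that [e02] and [.blf] are character classes that each match a single character, so the leading 02- is kept and the trailing number is dropped:

df['filename'] = df['filename'].str.extract(r'((?<=[e02])[\w|\-]+(?=[.blf]))', expand=False)

# Output
                 filename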
0  02-2022-01-28_00-21-45
1  02-2022-01-28_00-21-45
2  02-2022-01-28_00-21-45
3  02-2022-01-28_00-21-45
4  02-2022-01-28_00-21-45

CodePudding user response:

Regexes are nice, but sometimes it is easier and more readable to use a plain replace, if the arguments will never change:

df['filename'] = df['filename'].str.replace('Machine02-','',regex=False)
df['filename'] = df['filename'].str.replace('.blf.',' ',regex=False)
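
These two replacements leave the underscore between the date and the time, whereas the requested output has a space there. A possible variant that adds one more literal replacement, assuming the prefix is always exactly Machine02- and that no other underscores occur (the sample frame is made up for illustration):

```python
import pandas as pd

# Small sample frame, made up for illustration
df = pd.DataFrame({"filename": [
    "Machine02-2022-01-28_00-21-45.blf.424",
    "Machine02-2022-01-28_00-21-45.blf.425",
]})

# Three literal (non-regex) replacements chained together
df["filename"] = (
    df["filename"]
    .str.replace("Machine02-", "", regex=False)  # drop the fixed prefix
    .str.replace("_", " ", regex=False)          # date/time separator
    .str.replace(".blf.", " ", regex=False)      # extension marker
)
print(df)
```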