I have a dataframe where one of the columns, which is in string format, looks like this:
filename
0 Machine02-2022-01-28_00-21-45.blf.424
1 Machine02-2022-01-28_00-21-45.blf.425
2 Machine02-2022-01-28_00-21-45.blf.426
3 Machine02-2022-01-28_00-21-45.blf.427
4 Machine02-2022-01-28_00-21-45.blf.428
I want my column to look like this:
filename
0 2022-01-28 00-21-45 424
1 2022-01-28 00-21-45 425
2 2022-01-28 00-21-45 426
3 2022-01-28 00-21-45 427
4 2022-01-28 00-21-45 428
I tried this code:
df['filename'] = df['filename'].str.extract(r"(\d{4}-\d{1,2}-\d{1,2})_(\d{2}-\d{2}-\d{2}).*\.(\d+)", r"\1 \2 \3")
I am getting this error: unsupported operand type(s) for &: 'str' and 'int'.
Can anyone please tell me what I am doing wrong?
Answer:
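Use str.replace and add .*- at the start of the pattern to remove prefixes like Machine02-:
# regex=True is needed on pandas >= 2.0, where str.replace defaults to regex=False
df['filename'] = df['filename'].str.replace(r".*-(\d{4}-\d{1,2}-\d{1,2})_(\d{2}-\d{2}-\d{2}).*\.(\d+)", r"\1 \2 \3", regex=True)
print(df)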
# Output
filename
0 2022-01-28 00-21-45 424
1 2022-01-28 00-21-45 425
2 2022-01-28 00-21-45 426
3 2022-01-28 00-21-45 427
4 2022-01-28 00-21-45 428
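As for the error in the question: the signature is Series.str.extract(pat, flags=0, expand=True), so the second positional argument r"\1 \2 \3" is passed as the regex flags, and the regex compiler fails when it tries to combine a string flag with an integer using &. str.extract has no replacement argument; it returns one column per capture group. A minimal sketch that keeps the extract approach, assuming the sample data from the question, extracts the three groups and joins them with spaces:

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"filename": [
    f"Machine02-2022-01-28_00-21-45.blf.{n}" for n in range(424, 429)
]})

# Each capture group becomes its own column (named 0, 1 and 2)
parts = df["filename"].str.extract(
    r"(\d{4}-\d{1,2}-\d{1,2})_(\d{2}-\d{2}-\d{2}).*\.(\d+)"
)

# Join the date, time and number columns back into one string per row
df["filename"] = parts[0] + " " + parts[1] + " " + parts[2]
print(df)
```

This is handy when you also want to keep the pieces as separate columns; for a single formatted string, str.replace with backreferences is shorter.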
Answer:
Please try this:
# n=1 splits only on the first '-', separating the "Machine02" prefix from the rest
df['filename'] = df['filename'].str.split('-', n=1).apply(lambda x: ' '.join(x[1].split('_')).replace('.blf.', ' '))
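To see what the lambda does, here is the same sequence of steps applied with plain Python string methods to one sample value from the question (pandas' str.split with n=1 splits the same way as Python's str.split with maxsplit 1):

```python
s = "Machine02-2022-01-28_00-21-45.blf.424"

# Step 1: split only on the first "-" -> ['Machine02', '2022-01-28_00-21-45.blf.424']
prefix, rest = s.split("-", 1)

# Step 2: split the remainder on "_" and rejoin with a space
joined = " ".join(rest.split("_"))  # '2022-01-28 00-21-45.blf.424'

# Step 3: turn the ".blf." separator into a space
result = joined.replace(".blf.", " ")  # '2022-01-28 00-21-45 424'
print(result)
```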
Answer:
Use replace:
df['filename'] = df['filename'].str.replace(r'Machine|\.blf\.', ' ', regex=True).str.strip().str.replace(r'^\d+-', '', regex=True)
filename
0 2022-01-28_00-21-45 424
1 2022-01-28_00-21-45 425
2 2022-01-28_00-21-45 426
3 2022-01-28_00-21-45 427
4 2022-01-28_00-21-45 428
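The output above still has an underscore between the date and the time, while the question asks for a space. A minimal sketch, assuming the sample data from the question, that adds one literal replace at the end of the chain to reach the requested format:

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"filename": [
    f"Machine02-2022-01-28_00-21-45.blf.{n}" for n in range(424, 429)
]})

df["filename"] = (
    df["filename"]
    .str.replace(r"Machine|\.blf\.", " ", regex=True)  # ' 02-2022-01-28_00-21-45 424'
    .str.strip()                                       # '02-2022-01-28_00-21-45 424'
    .str.replace(r"^\d+-", "", regex=True)             # '2022-01-28_00-21-45 424'
    .str.replace("_", " ", regex=False)                # '2022-01-28 00-21-45 424'
)
print(df)
```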
Or, extract the values between e02 and .blf:
df['filename'] = df['filename'].str.extract(r'((?<=[e02])[\w|\-]+(?=[.blf]))')
filename
0 02-2022-01-28_00-21-45
1 02-2022-01-28_00-21-45
2 02-2022-01-28_00-21-45
3 02-2022-01-28_00-21-45
4 02-2022-01-28_00-21-45
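The 02- prefix survives in that output because [e02] and [.blf] are character classes: each matches a single character (e, 0 or 2; and ., b, l or f), not the literal text "e02" or ".blf". The sketch below uses Python's re module on one sample value to compare that pattern with literal lookarounds; the strict pattern is an illustrative alternative, not part of the original answer. Inside [\w-] the | is dropped, since in a class it would match a literal pipe.

```python
import re

s = "Machine02-2022-01-28_00-21-45.blf.424"

# Character-class lookbehind: satisfied right after the "e" of "Machine"
loose = re.search(r"(?<=[e02])[\w-]+(?=[.blf])", s).group()

# Literal lookarounds: require the exact text "e02-" before and ".blf" after
strict = re.search(r"(?<=e02-)[\w-]+(?=\.blf)", s).group()

print(loose)   # 02-2022-01-28_00-21-45
print(strict)  # 2022-01-28_00-21-45
```

Both results still contain the underscore, since \w also matches _.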
Answer:
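Regexes are nice, but sometimes a plain replace is easier and more readable, provided the strings never change:
df['filename'] = df['filename'].str.replace('Machine02-', '', regex=False)
df['filename'] = df['filename'].str.replace('.blf.', ' ', regex=False)
# one more literal replace turns the date/time separator '_' into a space
df['filename'] = df['filename'].str.replace('_', ' ', regex=False)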