I have a dataset where I would like to create a new column derived from one of my existing columns. The new column is created by taking the values after the first colon up until, and not including the last colon.
Data
site stat crate
JJ - site 1 ok JD1:0055:AB5.30:100
JJ - site 1 ok JD1:00AB:AB5.30:111
J1 - site 2 fail JD1:00AA:AB5.30:200
JJ - site 1 ok JD1:00A:AB5.30:555
CC - site 8 fail CC5:01AAA:BA8.40:777
Desired
site stat main_cr crate
JJ - site 1 ok 0055:AB5.30 JD1:0055:AB5.30:100
JJ - site 1 ok 00AB:AB5.30 JD1:00AB:AB5.30:111
J1 - site 2 fail 00AA:AB5.30 JD1:00AA:AB5.30:200
JJ - site 1 ok 00A:AB5.30 JD1:00A:AB5.30:555
CC - site 8 fail 01AAA:BA8.40 CC5:01AAA:BA8.40:777
Doing
df['main_cr']=df['crate'].str.extract(r':(\d):.*):')
I've read the documentation but am a little stuck on the placement of the extraction steps Any suggestion is helpful thank you.
CodePudding user response:
I think you can get the result you describe like this:
df['main_cr']=df['crate'].str.extract(r':(.*):')
Because .*
is greedy, it will match all the :
characters except the first and last ones.
Input:
site stat crate
0 a ok JD1:0055:AB5.30:100
1 b ok JD1:00AB:AB5.30:111
2 c fail JD1:00AA:AB5.30:200
3 d ok JD1:00A:AB5.30:555
4 e fail CC5:01AAA:BA8.40:777
Output:
site stat crate main_cr
0 a ok JD1:0055:AB5.30:100 0055:AB5.30
1 b ok JD1:00AB:AB5.30:111 00AB:AB5.30
2 c fail JD1:00AA:AB5.30:200 00AA:AB5.30
3 d ok JD1:00A:AB5.30:555 00A:AB5.30
4 e fail CC5:01AAA:BA8.40:777 01AAA:BA8.40
Full test code:
import pandas as pd
df = pd.DataFrame({
'site':['a','b','c','d','e'],
'stat':['ok','ok','fail','ok','fail'],
'crate':['JD1:0055:AB5.30:100','JD1:00AB:AB5.30:111','JD1:00AA:AB5.30:200','JD1:00A:AB5.30:555','CC5:01AAA:BA8.40:777']
})
df['main_cr']=df['crate'].str.extract(r':(.*):')
CodePudding user response:
You can consider using str.replace
with the regex ^[^:] :
ie deleting evertything from the begining until the first :
df['crate'].str.replace('^[^:] :|:[^:] $', '', regex = True)
0 0055:AB5.30
1 00AB:AB5.30
2 00AA:AB5.30
3 00A:AB5.30
4 01AAA:BA8.40
Name: crate, dtype: object