Creating a new column from a transformed existing column in Python

I have a dataset where I would like to create a new column derived from one of my existing columns. The column is created by extracting the values between the first and last colons.

Data

site        stat    crate   
AA - site 1 ok      AD1:00:AB5.30:100   
AA - site 1 ok      AD1:00:AB5.30:111   
A1 - site 2 fail    AD1:00:AB5.30:200   
AA - site 1 ok      AD1:00:AB5.30:555   
BB - site 8 fail    BB5:01:BA8.40:777

Desired

site        stat    main_cr     crate
AA - site 1 ok      00:AB5.30   AD1:00:AB5.30:100
AA - site 1 ok      00:AB5.30   AD1:00:AB5.30:111
A1 - site 2 fail    00:AB5.30   AD1:00:AB5.30:200
AA - site 1 ok      00:AB5.30   AD1:00:AB5.30:555
BB - site 8 fail    01:BA8.40   BB5:01:BA8.40:777

Doing

My approach is to use some form of regex or split.

df['main_cr'] = df['crate'].str.split(':').str[1:3]

The above does not work: for each row it returns a list of the pieces rather than a single string, for example

[00, AB5.30]

I would like to create a new column by extracting the values between the first and last colons of an existing column within my dataframe.

Any suggestion is helpful, thank you.

CodePudding user response:

Use Series.str.extract and extract based on a regex pattern:

df['main_cr'] = df['crate'].str.extract(r':(\d{2}:.*):')
df

Regex: the pattern matches the first ":" that is followed by two digits (\d{2}) and another ":", then any characters (.*). Because .* is greedy, the match runs up to the last ":" in the string rather than stopping at the next one. The expression between the parentheses is the part that gets extracted.
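As a rough check of the explanation above, here is a minimal sketch that rebuilds the sample data and runs the pattern. The second pattern, `^[^:]*:(.*):[^:]*$`, is an assumption added here and not part of the original answer: it drops the two-digit requirement and simply takes everything between the first and last colon. The names `specific` and `general` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "site": ["AA - site 1", "AA - site 1", "A1 - site 2", "AA - site 1", "BB - site 8"],
    "stat": ["ok", "ok", "fail", "ok", "fail"],
    "crate": [
        "AD1:00:AB5.30:100",
        "AD1:00:AB5.30:111",
        "AD1:00:AB5.30:200",
        "AD1:00:AB5.30:555",
        "BB5:01:BA8.40:777",
    ],
})

# Pattern from the answer: needs exactly two digits right after the first colon.
# expand=False returns a Series instead of a one-column DataFrame.
specific = df["crate"].str.extract(r":(\d{2}:.*):", expand=False)

# Assumed, more general pattern: skip up to the first colon, capture greedily,
# and require that no colon follows the closing one.
general = df["crate"].str.extract(r"^[^:]*:(.*):[^:]*$", expand=False)

df["main_cr"] = general
print(df)
```

For this data both patterns give the same main_cr values; the general one may be the safer choice if the second field is not always two digits.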

    site        stat    crate               main_cr
AA - site 1     ok      AD1:00:AB5.30:100   00:AB5.30
AA - site 1     ok      AD1:00:AB5.30:111   00:AB5.30
A1 - site 2     fail    AD1:00:AB5.30:200   00:AB5.30
AA - site 1     ok      AD1:00:AB5.30:555   00:AB5.30
BB - site 8     fail    BB5:01:BA8.40:777   01:BA8.40
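The split attempt from the question can probably be kept as well: it returns a list per row because slicing the split result yields list slices, so joining the pieces back with ":" should give the string. A minimal sketch, assuming every value contains at least two colons (the variable names are illustrative):

```python
import pandas as pd

crate = pd.Series(["AD1:00:AB5.30:100", "BB5:01:BA8.40:777"])

# Split on ":" -> each element becomes a list of pieces
parts = crate.str.split(":")

# Drop the first and last piece; each element is still a list
middle = parts.str[1:-1]

# Join the remaining pieces back into one string per row
main_cr = middle.str.join(":")
print(main_cr.tolist())
```

Using 1:-1 instead of 1:3 keeps everything between the first and last colon even if a value has more than four fields.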