Creating new column from transformed existing column in Python

Time:06-11

I have a dataset where I would like to create a new column derived from one of my existing columns. The new column should hold the part of the existing value that lies between the first and last colons.

Data

site         stat    crate
AA - site 1  ok      AD1:00:AB5.30:100
AA - site 1  ok      AD1:00:AB5.30:111
A1 - site 2  fail    AD1:00:AB5.30:200
AA - site 1  ok      AD1:00:AB5.30:555
BB - site 8  fail    BB5:01:BA8.40:777

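To make the example reproducible, the sample data above can be built as a pandas DataFrame. This is a minimal sketch that assumes every column holds plain strings; the column names and values are taken from the table.

```python
import pandas as pd

# Sample data from the table above; every column is stored as a string
df = pd.DataFrame({
    'site':  ['AA - site 1', 'AA - site 1', 'A1 - site 2', 'AA - site 1', 'BB - site 8'],
    'stat':  ['ok', 'ok', 'fail', 'ok', 'fail'],
    'crate': ['AD1:00:AB5.30:100', 'AD1:00:AB5.30:111', 'AD1:00:AB5.30:200',
              'AD1:00:AB5.30:555', 'BB5:01:BA8.40:777'],
})
print(df)
```
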
Desired

site         stat    main_cr     crate
AA - site 1  ok      00:AB5.30   AD1:00:AB5.30:100
AA - site 1  ok      00:AB5.30   AD1:00:AB5.30:111
A1 - site 2  fail    00:AB5.30   AD1:00:AB5.30:200
AA - site 1  ok      00:AB5.30   AD1:00:AB5.30:555
BB - site 8  fail    01:BA8.40   BB5:01:BA8.40:777

Doing

My approach is to use some form of regex or split.

df['main_cr'] = df['crate'].str.split(':').str[1:3]

The above does not work, because it returns a list of pieces rather than a single string:

['00', 'AB5.30']
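As a side note, the split attempt is close: the slice leaves a list per row, so the pieces only need to be joined back together with Series.str.join. The sketch below uses [1:-1] instead of [1:3] so that it keeps everything between the first and last colon even if a value had more than three colons; that generalisation is an assumption beyond the sample data, which always has exactly three.

```python
import pandas as pd

df = pd.DataFrame({'crate': ['AD1:00:AB5.30:100', 'BB5:01:BA8.40:777']})

# split(':') gives a list per row; .str[1:-1] drops the first and last pieces,
# and .str.join(':') glues the remaining middle pieces back into one string
df['main_cr'] = df['crate'].str.split(':').str[1:-1].str.join(':')
print(df)
```
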

I would like to create a new column by extracting the values between the first and last colons of an existing column within my dataframe.

Any suggestion is helpful. Thank you.

Answer:

Use Series.str.extract (there is no pd.extract) and extract based on a regex pattern:

# expand=False makes str.extract return a Series for the single capture group
df['main_cr'] = df['crate'].str.extract(r':(\d{2}:.*):', expand=False)
df

Regex: it matches the first ":" that is followed by two digits (\d{2}) and another ":", then any characters up to the last ":" in the string. Because .* is greedy, it runs as far as it can and backtracks only to the final colon. The part inside the parentheses is the capture group, and that is what gets extracted.

site         stat    crate               main_cr
AA - site 1  ok      AD1:00:AB5.30:100   00:AB5.30
AA - site 1  ok      AD1:00:AB5.30:111   00:AB5.30
A1 - site 2  fail    AD1:00:AB5.30:200   00:AB5.30
AA - site 1  ok      AD1:00:AB5.30:555   00:AB5.30
BB - site 8  fail    BB5:01:BA8.40:777   01:BA8.40
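Since str.extract uses Python's re engine, the pattern's behaviour can be checked on plain strings with the re module. The sketch below compares the answer's pattern with a hypothetical variant, general_pat, that does not assume two digits after the first colon, and shows what a non-greedy .*? would do instead. The string 'X:12:a:b:c' is made up purely to illustrate a value with more than three colons.

```python
import re

# Pattern from the answer: ':' + two digits + ':' + greedy '.*',
# which backtracks only as far as the LAST ':' in the string
answer_pat = re.compile(r':(\d{2}:.*):')

# Hypothetical variant: everything between the first and the last colon,
# with no assumption about what follows the first colon
general_pat = re.compile(r'^[^:]*:(.*):[^:]*$')

samples = ['AD1:00:AB5.30:100', 'BB5:01:BA8.40:777', 'X:12:a:b:c']
for s in samples:
    print(s, answer_pat.search(s).group(1), general_pat.search(s).group(1))

# A non-greedy '.*?' stops at the NEXT colon instead of the last one
print(re.search(r':(\d{2}:.*?):', 'X:12:a:b:c').group(1))
```

For the sample data both patterns give the same result; they only differ when a value does not have two digits right after its first colon.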