Home > Software design >  Creating new data frame columns derived from another
Creating new data frame columns derived from another

Time:11-19

I have a dataframe (atc_df) with a column called 'atc' made of fixed length strings with a fixed structure, and can be split in 5 levels of subcoding. Here is an example:

    prin    atc
0   Acarbosio   A10BF01
1   Aceclofenac M01AB16
2   Aciclovir   J05AB01
3   Acido acetilsalicilico  B01AC06
4   Acido alendronico   M05BA04
... ... ...
324 Voriconazolo    J02AC03
325 Zofenopril  C09AA15
326 Zofenopril   idroclorotiazide   C09BA15
327 Zolmitriptan    N02CC03
328 Zonisamide  N03AX15

I have a function that given an atc code would return a list of the 5 subcodes:

def atc_split(atc_str):
    atc1 = atc_str[0]
    atc2 = atc_str[1:3]
    atc3 = atc_str[3]
    atc4 = atc_str[4]
    atc5 = atc_str[5:7]
    return(atc1,atc2,atc3,atc4,atc5)

Two questions:

Is there a more efficient/elegant way of splitting the atc code into its five subcodes?

How can I best apply this function to the atc_df dataframe to add five new columns (atc1..atc5) to each row?

Thank you

CodePudding user response:

Use str.extract:

df_atc[[f"atc{i 1}" for i in range(5)]] = df_atc["atc"].str.extract("(\w)(\d{2})(\w)(\w)(\d{2})")

>>> df_atc
                     prin      atc atc1 atc2 atc3 atc4 atc5
0               Acarbosio  A10BF01    A   10    B    F   01
1             Aceclofenac  M01AB16    M   01    A    B   16
2               Aciclovir  J05AB01    J   05    A    B   01
3  Acido acetilsalicilico  B01AC06    B   01    A    C   06
4       Acido alendronico  M05BA04    M   05    B    A   04

If you want to have every code including the previous codes, you can slice using .str:

df_atc["atc1"] = df_atc["atc"].str[0]
df_atc["atc2"] = df_atc["atc"].str[:3]
df_atc["atc3"] = df_atc["atc"].str[:4]
df_atc["atc4"] = df_atc["atc"].str[:5]
df_atc["atc5"] = df_atc["atc"]

>>> df_atc
                     prin      atc atc1 atc2  atc3   atc4     atc5
0               Acarbosio  A10BF01    A  A10  A10B  A10BF  A10BF01
1             Aceclofenac  M01AB16    M  M01  M01A  M01AB  M01AB16
2               Aciclovir  J05AB01    J  J05  J05A  J05AB  J05AB01
3  Acido acetilsalicilico  B01AC06    B  B01  B01A  B01AC  B01AC06
4       Acido alendronico  M05BA04    M  M05  M05B  M05BA  M05BA04
  • Related