Creating a combined pandas column for mixed value types

Time:08-03

I have a pandas DataFrame in which some values start with ID. I want to give those values a 'user/' prefix, drop the m or s that follows ID, and remove any leading zeros, so that IDm00010461 becomes user/ID10461. I also want a third column that joins the two values on each row with an underscore, in column order (item_id_a first). That column should use the cleaned ID (no 'user/' prefix, no leading zeros, no IDm/IDs, just ID) together with the E value, and then add a 'user/' prefix to the result. Here is an example for reference.

Original:

      item_id_a     item_id_b
0  E00000170630   IDm00010461
1   IDm00010461  E00000170630
2  E00000353915  IDs236274573
3   IDs23627457  E00000353915

Desired:

         item_id_a         item_id_b                       combined
0     E00000170630      user/ID10461      user/E00000170630_ID10461
1     user/ID10461      E00000170630      user/ID10461_E00000170630
2     E00000353915  user/ID236274573  user/E00000353915_ID236274573
3  user/ID23627457      E00000353915   user/ID23627457_E00000353915
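To try the answers below, the sample data can be rebuilt as DataFrames. This is a minimal sketch: the values are transcribed from the two tables above exactly as posted (including the shorter IDs23627457 on row 3), and the name expected is illustrative, not part of the question.

```python
import pandas as pd

# Input transcribed from the "Original" table (values kept exactly as posted)
df = pd.DataFrame({
    "item_id_a": ["E00000170630", "IDm00010461", "E00000353915", "IDs23627457"],
    "item_id_b": ["IDm00010461", "E00000170630", "IDs236274573", "E00000353915"],
})

# Target transcribed from the "Desired" table; "expected" is an illustrative name
expected = pd.DataFrame({
    "item_id_a": ["E00000170630", "user/ID10461", "E00000353915", "user/ID23627457"],
    "item_id_b": ["user/ID10461", "E00000170630", "user/ID236274573", "E00000353915"],
    "combined": [
        "user/E00000170630_ID10461",
        "user/ID10461_E00000170630",
        "user/E00000353915_ID236274573",
        "user/ID23627457_E00000353915",
    ],
})

print(df)
```

A candidate result (hypothetically named result) can then be checked with pd.testing.assert_frame_equal(result, expected).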

Answer:

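This should work (the chain returns a new DataFrame, so assign it back with df = (...) if you want to keep the result):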

(df.replace(r'ID[a-z]?0*', 'ID', regex=True)  # IDm00010461 -> ID10461
   .assign(combined=lambda x: 'user/' + x['item_id_a'] + '_' + x['item_id_b'])
   .replace(r'^ID', 'user/ID', regex=True))   # prefix only cells that start with ID

Output:

         item_id_a         item_id_b                       combined
0     E00000170630      user/ID10461      user/E00000170630_ID10461
1     user/ID10461      E00000170630      user/ID10461_E00000170630
2     E00000353915  user/ID236274573  user/E00000353915_ID236274573
3  user/ID23627457      E00000353915   user/ID23627457_E00000353915
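The two regexes do all the work here. The sketch below applies them with Python's re.sub to single strings, which mirrors the per-cell substitution that DataFrame.replace(..., regex=True) performs; the helper names clean and prefix are hypothetical, used only for illustration.

```python
import re

# Pattern from the first replace: "ID", an optional lowercase letter (m/s),
# then any run of zeros; the whole match is replaced by plain "ID".
STRIP = r"ID[a-z]?0*"
# Pattern from the second replace: anchored, so only values that *start* with ID match.
PREFIX = r"^ID"


def clean(value: str) -> str:
    """Hypothetical helper: drop the m/s marker and leading zeros after ID."""
    return re.sub(STRIP, "ID", value)


def prefix(value: str) -> str:
    """Hypothetical helper: add user/ only when the value begins with ID."""
    return re.sub(PREFIX, "user/ID", value)


print(clean("IDm00010461"))                  # ID10461
print(clean("E00000170630"))                 # E00000170630 (no ID, unchanged)
print(prefix(clean("IDs236274573")))         # user/ID236274573
print(prefix("user/E00000170630_ID10461"))   # unchanged: does not start with ID
```

Because the combined column already starts with user/, the anchored ^ID pattern leaves it alone, which is why the second replace can safely run over every column. Note that [a-z]? accepts any single lowercase letter, so a value like IDx0042 would also become ID42; [ms]? is stricter if only the m and s markers should be dropped.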

Answer:

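df = df.replace(r'ID[a-z]?0*', 'ID', regex=True)  # first drop the m/s marker and leading zeros
df["combined"] = "user/" + df.item_id_a + "_" + df.item_id_b
# Assumes the ID values alternate: column b on even rows, column a on odd rows
df.loc[1::2, "item_id_a"] = "user/" + df.loc[1::2, "item_id_a"]
df.loc[0::2, "item_id_b"] = "user/" + df.loc[0::2, "item_id_b"]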
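The slicing approach above only works while the IDs alternate between the two columns row by row. If that pattern isn't guaranteed, a boolean mask per column avoids the assumption. This is a sketch, assuming every ID value starts with the literal text ID once cleaned; it reuses the regex from the first answer and the question's sample data.

```python
import pandas as pd

# The question's sample data
df = pd.DataFrame({
    "item_id_a": ["E00000170630", "IDm00010461", "E00000353915", "IDs23627457"],
    "item_id_b": ["IDm00010461", "E00000170630", "IDs236274573", "E00000353915"],
})
cols = ["item_id_a", "item_id_b"]

# Drop the m/s marker and leading zeros from ID values
df[cols] = df[cols].replace(r"ID[a-z]?0*", "ID", regex=True)

# Build the combined column from the cleaned values, before any prefixing
df["combined"] = "user/" + df["item_id_a"] + "_" + df["item_id_b"]

# Prefix whichever cells hold an ID, wherever they sit, instead of relying on row parity
for col in cols:
    is_id = df[col].str.startswith("ID")
    df.loc[is_id, col] = "user/" + df.loc[is_id, col]

print(df)
```

Building combined before the prefix loop keeps the inner ID free of user/, matching the desired table.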