I have a pandas DataFrame in which I want every value starting with ID to be rewritten with a 'user/ID' prefix, dropping the m/s letter after ID and any leading zeros. I also want a third column that joins the two values on each row with an underscore, in their original column order, using the cleaned ID value (no IDm/IDs, no leading zeros) and a single 'user/' prefix at the front. Here is an example for reference.
original:
      item_id_a     item_id_b
0  E00000170630   IDm00010461
1   IDm00010461  E00000170630
2  E00000353915  IDs236274573
3   IDs23627457  E00000353915
desired:
         item_id_a         item_id_b                       combined
0     E00000170630      user/ID10461      user/E00000170630_ID10461
1     user/ID10461      E00000170630      user/ID10461_E00000170630
2     E00000353915  user/ID236274573  user/E00000353915_ID236274573
3  user/ID23627457      E00000353915   user/ID23627457_E00000353915
CodePudding user response:
This should work:
(df.replace(r'ID[a-z]?0*', 'ID', regex=True)
   .assign(combined=lambda x: 'user/' + x['item_id_a'] + '_' + x['item_id_b'])
   .replace(r'^ID', 'user/ID', regex=True))
Output:
         item_id_a         item_id_b                       combined
0     E00000170630      user/ID10461      user/E00000170630_ID10461
1     user/ID10461      E00000170630      user/ID10461_E00000170630
2     E00000353915  user/ID236274573  user/E00000353915_ID236274573
3  user/ID23627457      E00000353915   user/ID23627457_E00000353915
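For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same three-step chain. The DataFrame is rebuilt from the sample rows in the question, and the name `result` is just for illustration. One small change that is my own assumption, not part of the answer above: the first pattern is anchored with `^` so it only touches values that start with ID.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "item_id_a": ["E00000170630", "IDm00010461", "E00000353915", "IDs23627457"],
    "item_id_b": ["IDm00010461", "E00000170630", "IDs236274573", "E00000353915"],
})

result = (
    # Step 1: drop the optional lowercase letter and leading zeros after "ID".
    df.replace(r"^ID[a-z]?0*", "ID", regex=True)
      # Step 2: build the combined column from the cleaned values.
      .assign(combined=lambda x: "user/" + x["item_id_a"] + "_" + x["item_id_b"])
      # Step 3: prefix cells that still start with "ID"; combined already
      # starts with "user/", so it is left alone.
      .replace(r"^ID", "user/ID", regex=True)
)

print(result)
```

The order matters here: the combined column has to be built before the final replace, otherwise the ID half would pick up a second 'user/' prefix.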
CodePudding user response:
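# Normalize the IDs first (drop the m/s letter and leading zeros), as in the desired output.
df = df.replace(r'^ID[a-z]?0*', 'ID', regex=True)
df["combined"] = "user/" + df.item_id_a + "_" + df.item_id_b
# Assumes the ID values alternate: column b on even rows, column a on odd rows.
df.loc[1::2, "item_id_a"] = "user/" + df.loc[1::2, "item_id_a"]
df.loc[0::2, "item_id_b"] = "user/" + df.loc[0::2, "item_id_b"]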
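If the ID values do not strictly alternate between the two columns, slicing by row position will prefix the wrong cells. A possible variant, sketched below, picks the cells with a boolean mask instead. The sample frame is taken from the question; the names `cols` and `is_id` are made up for this example, and it assumes both columns hold only strings.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "item_id_a": ["E00000170630", "IDm00010461", "E00000353915", "IDs23627457"],
    "item_id_b": ["IDm00010461", "E00000170630", "IDs236274573", "E00000353915"],
})

cols = ["item_id_a", "item_id_b"]

# Drop the optional lowercase letter and leading zeros after "ID".
df[cols] = df[cols].replace(r"^ID[a-z]?0*", "ID", regex=True)

# Build the combined column before any "user/" prefix is added to the cells.
df["combined"] = "user/" + df["item_id_a"] + "_" + df["item_id_b"]

# Prefix only the cells that start with "ID", wherever they are.
for col in cols:
    is_id = df[col].str.startswith("ID")
    df.loc[is_id, col] = "user/" + df.loc[is_id, col]

print(df)
```

This gives the same result on the sample data, and it keeps working if two ID rows happen to sit next to each other.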