I have a dataframe (t) of codes for patients surgeries. On any hospital admission they can have 5 surgeries or combination of surgeries, - the index on the left column is the indiviudal patient. I want to add the text description for all 5 surgeries to indivisual new columns.
| OPERTN_01 | OPERTN_02 | OPERTN_03 | OPERTN_04 | OPERTN_05 |
| ------ | --------- | --------- | --------- | --------- | --------- |
| 85 | B041 | Y766 | Z943 | NaN | NaN |
| 144 | B041 | Y766 | Y539 | NaN | NaN |
| 260 | B041 | Y766 | NaN | NaN | NaN |
| 276 | B041 | Y766 | NaN | NaN | NaN |
| 345 | B041 | Y766 | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... |
| 557445 | B041 | Y461 | L714 | Z954 | Z942 |
| 557525 | B041 | Y766 | NaN | NaN | NaN |
| 557533 | B041 | Y766 | E158 | Y766 | Y261 |
| 557765 | B041 | Y766 | NaN | NaN | NaN |
| 557832 | B041 | Y766 | U051 | Y973 | Y981 |
I want to merge the code from all 5 columns ( also handing the null values) with the text description from this dataframe (opcs_short)
| | opcs_4.9str | Description |
| 0 | A011 | A01.1: Hemispherectomy |
| 1 | A012 | A01.2: Total lobectomy of brain |
| 2 | A013 | A01.3: Partial lobectomy of brain |
| 3 | A018 | A01.8: Other specified major excision of tissu... |
| 4 | A019 | A01.9: Unspecified major excision of tissue of... |
| ... | ... | ... |
| 9673 | O439 | O43.9: Part of heart NEC |
| 9674 | O451 | O45.1: Bifurcation of aorta |
| 9675 | O452 | O45.2: Juxtarenal abdominal aorta |
| 9676 | O458 | O45.8: Specified other aorta NEC |
| 9677 | O459 | O45.9: Other aorta NEC |
I tried to do this using this code,
t2 = t.merge(opcs_short, how = 'left', left_on =['OPERTN_01','OPERTN_02', 'OPERTN_03', 'OPERTN_04', 'OPERTN_05'],
right_on =['opcs_4.9str','opcs_4.9str','opcs_4.9str','opcs_4.9str','opcs_4.9str'])
This produces
| | OPERTN_01 | OPERTN_02 | OPERTN_03 | OPERTN_04 | OPERTN_05 | opcs_4.9str | Description |
| 0 | B041 | Y766 | Z943 | NaN | NaN | NaN | NaN |
| 1 | B041 | Y766 | Y539 | NaN | NaN | NaN | NaN |
| 2 | B041 | Y766 | NaN | NaN | NaN | NaN | NaN |
| 3 | B041 | Y766 | NaN | NaN | NaN | NaN | NaN |
| 4 | B041 | Y766 | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 6410 | B041 | Y461 | L714 | Z954 | Z942 | NaN | NaN |
| 6411 | B041 | Y766 | NaN | NaN | NaN | NaN | NaN |
| 6412 | B041 | Y766 | E158 | Y766 | Y261 | NaN | NaN |
| 6413 | B041 | Y766 | NaN | NaN | NaN | NaN | NaN |
| 6414 | B041 | Y766 | U051 | Y973 | Y981 | NaN | NaN |
So nothing has merged. I am not sure why but I know I haven't handled the null values. Some patients have only one simple surgery and the rest of the columns are empty so I don't want to drop them. TBH I am not sure using merge is the right approach here but don't have enough knowledge to know if a eg dictionary technique would be a better way. The code description dataframe though has 19,000 records.
CodePudding user response:
I think a dictionary-based replace method is what you're after. Does the following achieve your desired result?
code_map = {i[1]: i[2] for i in opcs_short.to_records()}
for col in t.columns:
t[col " Description"] = t[col].replace(code_map)