Home > Software design >  How to merge multiple columns in Pandas with a single series?
How to merge multiple columns in Pandas with a single series?

Time:11-28

I have a dataframe (t) of codes for patients surgeries. On any hospital admission they can have 5 surgeries or combination of surgeries, - the index on the left column is the indiviudal patient. I want to add the text description for all 5 surgeries to indivisual new columns.


        | OPERTN_01 | OPERTN_02 | OPERTN_03 | OPERTN_04 | OPERTN_05 |
| ------ | --------- | --------- | --------- | --------- | --------- |
| 85     | B041      | Y766      | Z943      | NaN       | NaN       |
| 144    | B041      | Y766      | Y539      | NaN       | NaN       |
| 260    | B041      | Y766      | NaN       | NaN       | NaN       |
| 276    | B041      | Y766      | NaN       | NaN       | NaN       |
| 345    | B041      | Y766      | NaN       | NaN       | NaN       |
| ...    | ...       | ...       | ...       | ...       | ...       |
| 557445 | B041      | Y461      | L714      | Z954      | Z942      |
| 557525 | B041      | Y766      | NaN       | NaN       | NaN       |
| 557533 | B041      | Y766      | E158      | Y766      | Y261      |
| 557765 | B041      | Y766      | NaN       | NaN       | NaN       |
| 557832 | B041      | Y766      | U051      | Y973      | Y981      |


I want to merge the code from all 5 columns ( also handing the null values) with the text description from this dataframe (opcs_short)

|      | opcs_4.9str | Description                                       |
| 0    | A011        | A01.1: Hemispherectomy                            |
| 1    | A012        | A01.2: Total lobectomy of brain                   |
| 2    | A013        | A01.3: Partial lobectomy of brain                 |
| 3    | A018        | A01.8: Other specified major excision of tissu... |
| 4    | A019        | A01.9: Unspecified major excision of tissue of... |
| ...  | ...         | ...                                               |
| 9673 | O439        | O43.9: Part of heart NEC                          |
| 9674 | O451        | O45.1: Bifurcation of aorta                       |
| 9675 | O452        | O45.2: Juxtarenal abdominal aorta                 |
| 9676 | O458        | O45.8: Specified other aorta NEC                  |
| 9677 | O459        | O45.9: Other aorta NEC                            |

I tried to do this using this code,

t2 = t.merge(opcs_short, how = 'left',  left_on =['OPERTN_01','OPERTN_02', 'OPERTN_03', 'OPERTN_04', 'OPERTN_05'],  
              right_on =['opcs_4.9str','opcs_4.9str','opcs_4.9str','opcs_4.9str','opcs_4.9str'])

This produces


|      | OPERTN_01 | OPERTN_02 | OPERTN_03 | OPERTN_04 | OPERTN_05 | opcs_4.9str | Description |
| 0    | B041      | Y766      | Z943      | NaN       | NaN       | NaN         | NaN         |
| 1    | B041      | Y766      | Y539      | NaN       | NaN       | NaN         | NaN         |
| 2    | B041      | Y766      | NaN       | NaN       | NaN       | NaN         | NaN         |
| 3    | B041      | Y766      | NaN       | NaN       | NaN       | NaN         | NaN         |
| 4    | B041      | Y766      | NaN       | NaN       | NaN       | NaN         | NaN         |
| ...  | ...       | ...       | ...       | ...       | ...       | ...         | ...         |
| 6410 | B041      | Y461      | L714      | Z954      | Z942      | NaN         | NaN         |
| 6411 | B041      | Y766      | NaN       | NaN       | NaN       | NaN         | NaN         |
| 6412 | B041      | Y766      | E158      | Y766      | Y261      | NaN         | NaN         |
| 6413 | B041      | Y766      | NaN       | NaN       | NaN       | NaN         | NaN         |
| 6414 | B041      | Y766      | U051      | Y973      | Y981      | NaN         | NaN         |

So nothing has merged. I am not sure why but I know I haven't handled the null values. Some patients have only one simple surgery and the rest of the columns are empty so I don't want to drop them. TBH I am not sure using merge is the right approach here but don't have enough knowledge to know if a eg dictionary technique would be a better way. The code description dataframe though has 19,000 records.

CodePudding user response:

I think a dictionary-based replace method is what you're after. Does the following achieve your desired result?

code_map = {i[1]: i[2] for i in opcs_short.to_records()}

for col in t.columns:
    t[col   " Description"] = t[col].replace(code_map)
  • Related