Home > other >  ›How to join tables in Python without overwriting existing column data
›How to join tables in Python without overwriting existing column data

Time:03-11

I need to join multiple tables but I can't get the join in Python to behave as expected. I need to left join table 2 to table 1, without overwriting the existing data in the "geometry" column of table 1. What I'm trying to achieve is sort of like a VLOOKUP in Excel. I want to pull matching values from my other tables (~10) into table 1 without overwriting what is already there. Is there a better way? Below is what I tried:

TABLE 1
| ID | BLOCKCODE | GEOMETRY |
| -- | --------- | -------- |
| 1  | 123       |    ABC   |
| 2  | 456       |    DEF   |
| 3  | 789       |          |

TABLE 2
| ID | GEOID | GEOMETRY |
| -- | ----- | -------- |
| 1  | 123   |          |
| 2  | 456   |          |
| 3  | 789   |  GHI     |

TABLE 3 (What I want)
| ID | BLOCKCODE | GEOID | GEOMETRY |
| -- | --------- |-----  | -------- |
| 1  |  123      | 123   |    ABC   |
| 2  |  456      | 456   |    DEF   |
| 3  |           | 789   |    GHI   |

What I'm getting
| ID | GEOID | GEOMETRY_X | GEOMETRY_Y |      
| -- | ----- | --------   | ---------  |
| 1  | 123   |    ABC     |            |
| 2  | 456   |    DEF     |            |
| 3  | 789   |            | GHI        |

join = pd.merge(table1, table2, how="left", left_on="BLOCKCODE", right_on="GEOID"

When I try this: join = pd.merge(table1, table2, how="left", left_on=["BLOCKCODE", "GEOMETRY"], right_on=["GEOID", "GEOMETRY"]

I get this:

TABLE 1
| ID | BLOCKCODE | GEOMETRY |
| -- | --------- | -------- |
| 1  | 123       |    ABC   |
| 2  | 456       |    DEF   |
| 3  | 789       |          |

CodePudding user response:

You could try:

# rename the Blockcode column in table1 to have the same column ID as table2.
# This is necessary for the next step to work.
table1 = table1.rename(columns={"Blockcode": "GeoID",})

# Overwrites all NaN values in table1 with the value from table2.
table1.update(table2)
  • Related