Comparing 2 DataFrames gives: Can only compare identically-labeled DataFrame objects

Time:12-02

I'm having some difficulties using pandas.

I have 2 DataFrames (named bru and bru2), both coming from almost the same file. The only difference between the 2 files is that, for testing, I added an extra row and changed a cell value from "4" to "50000".

What I'd now like to do is look for changed cells and new rows.

But first, I check whether both DataFrames are the same, so that I don't have to look for changes when both files contain exactly the same data.

When I try to compare them (bru == bru2), I get an error: Can only compare identically-labeled DataFrame objects.
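A minimal sketch of why this happens, using small hypothetical DataFrames in place of the real files (the values and the two-column subset are made up): == compares element by element and requires both the row index and the columns to match exactly, so a single extra row is enough to make it raise. DataFrame.equals, by contrast, returns False instead of raising, which suits an "are they identical?" check.

```python
import pandas as pd

# Hypothetical stand-ins: same columns, but bru2 has one extra row.
bru = pd.DataFrame({"address_id": ["a1", "a2"], "house_number": ["4", "7"]})
bru2 = pd.DataFrame({"address_id": ["a1", "a2", "a3"],
                     "house_number": ["50000", "7", "9"]})

# The column labels match...
print(bru.columns.equals(bru2.columns))  # True
# ...but the row labels do not: a RangeIndex of length 2 vs length 3.
print(bru.index.equals(bru2.index))      # False

# Element-wise == needs identical index AND columns, so it raises here.
try:
    bru == bru2
except ValueError as err:
    print("== failed:", err)

# DataFrame.equals is a whole-frame check that returns a bool instead of raising.
same = bru.equals(bru2)
print(same)  # False
```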

I'm importing the files like this. I also drop some columns that I don't need, put the columns of both files into exactly the same order, and rename some for preference:

import pandas as pd

bru = pd.read_csv("file1.csv", dtype={"street_id": "string",  "address_id": "string"})
bru = bru.fillna('')
bru = bru.drop(columns=["EPSG:31370_x", "EPSG:31370_y", "EPSG:4326_lat", "EPSG:4326_lon", "postname_fr", "postname_nl", "streetname_de"])
bru = bru.rename(columns={"postcode": "pkancode"})
bru = bru.reindex(columns=["address_id", "box_number", "house_number", "municipality_id", "municipality_name_de", "municipality_name_fr", "municipality_name_nl", "pkancode", "street_id", "streetname_nl", "streetname_fr", "region_code", "status"])
    

bru2 = pd.read_csv("file2.csv", dtype={"street_id": "string",  "address_id": "string"})
bru2 = bru2.fillna('')
bru2 = bru2.drop(columns=["EPSG:31370_x", "EPSG:31370_y", "EPSG:4326_lat", "EPSG:4326_lon", "postname_fr", "postname_nl", "streetname_de"])
bru2 = bru2.rename(columns={"postcode": "pkancode"})
bru2 = bru2.reindex(columns=["address_id", "box_number", "house_number", "municipality_id", "municipality_name_de", "municipality_name_fr", "municipality_name_nl", "pkancode", "street_id", "streetname_nl", "streetname_fr", "region_code", "status"])
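Because both files go through the same steps, one option is to wrap them in a single function so the column handling cannot drift apart. The sketch below uses a hypothetical helper name, load_addresses, and tiny in-memory CSVs with made-up values instead of file1.csv and file2.csv. Identical cleaning yields identical columns, but it cannot give the frames the same rows: bru2 still has one row more, so its default RangeIndex is longer.

```python
from io import StringIO

import pandas as pd

# Columns to drop and the final column order, copied from the question.
DROP = ["EPSG:31370_x", "EPSG:31370_y", "EPSG:4326_lat", "EPSG:4326_lon",
        "postname_fr", "postname_nl", "streetname_de"]
KEEP = ["address_id", "box_number", "house_number", "municipality_id",
        "municipality_name_de", "municipality_name_fr", "municipality_name_nl",
        "pkancode", "street_id", "streetname_nl", "streetname_fr",
        "region_code", "status"]


def load_addresses(source):
    """Hypothetical helper: read one export and apply identical cleaning steps."""
    df = pd.read_csv(source, dtype={"street_id": "string", "address_id": "string"})
    df = df.fillna("")
    df = df.drop(columns=DROP)
    df = df.rename(columns={"postcode": "pkancode"})
    return df.reindex(columns=KEEP)


# Hypothetical in-memory stand-ins for the raw exports (original column names).
raw_cols = ["postcode" if c == "pkancode" else c for c in KEEP] + DROP
header = ",".join(raw_cols)


def fake_row(address_id):
    # One made-up data row: the given id, then "x" in every other column.
    return ",".join([address_id] + ["x"] * (len(raw_cols) - 1))


bru = load_addresses(StringIO("\n".join([header, fake_row("a1"), fake_row("a2")])))
bru2 = load_addresses(StringIO("\n".join([header, fake_row("a1"), fake_row("a2"),
                                          fake_row("a3")])))

print(bru.columns.equals(bru2.columns))  # True: same cleaning, same columns
print(bru.shape, bru2.shape)             # (2, 13) (3, 13)
```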

What am I doing wrong?

I've tried other solutions from Stack Overflow that, for some reason, failed for me:

Error: Can only compare identically-labeled DataFrame objects

Pandas "Can only compare identically-labeled DataFrame objects" error

Answer:

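You can use reindex_like to give bru2 the same index and columns as bru, then compare the DataFrames with DataFrame.compare (available since pandas 1.1). Note that reindex_like drops any rows of bru2 whose labels are not in bru, so this shows changed cells but not new rows.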

bru2.reindex_like(bru).compare(bru)
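A small sketch of this step on hypothetical data (an edited cell in row 0 and an extra row in bru2). In the result, the "self" sub-column holds bru2's values and "other" holds bru's; rows and columns with no differences are left out.

```python
import pandas as pd

# Hypothetical data: row 0 was edited (4 -> 50000) and bru2 gained a row.
bru = pd.DataFrame({"address_id": ["a1", "a2", "a3"],
                    "house_number": ["4", "7", "9"]})
bru2 = pd.DataFrame({"address_id": ["a1", "a2", "a3", "a4"],
                     "house_number": ["50000", "7", "9", "12"]})

# Conform bru2 to bru's labels (its extra row 3 is dropped),
# then keep only the cells that differ.
diff = bru2.reindex_like(bru).compare(bru)
print(diff)
```

The extra row a4 does not show up in diff, because reindex_like discarded it; finding new rows needs a separate step.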

You can then use pd.Index.difference to find the rows (or columns) of bru2 that are not in bru.

bru2.index.difference(bru.index)  # and likewise with bru2.columns.difference(bru.columns)
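With the default RangeIndex, row labels are just positions, so a row inserted anywhere but at the end shifts every later label and the comparison lines up unrelated rows. A more robust variant, assuming address_id uniquely identifies an address (an assumption, not something stated in the question), is to use it as the index before taking differences. The data below is hypothetical, with the new row deliberately inserted in the middle.

```python
import pandas as pd

# Hypothetical data: new row a4 sits in the middle, and a1 was edited.
bru = pd.DataFrame({"address_id": ["a1", "a2", "a3"],
                    "house_number": ["4", "7", "9"]})
bru2 = pd.DataFrame({"address_id": ["a1", "a4", "a2", "a3"],
                     "house_number": ["50000", "12", "7", "9"]})

# Assumption: address_id is a unique key, so use it as the row index.
old = bru.set_index("address_id")
new = bru2.set_index("address_id")

# Rows whose key exists only in the new file.
new_rows = new.loc[new.index.difference(old.index)]

# Changed cells in rows present in both files; selecting the same
# `common` labels from both frames makes them identically labeled.
common = old.index.intersection(new.index)
changed = new.loc[common].compare(old.loc[common])

# Columns that exist only in the new file (none here).
new_cols = new.columns.difference(old.columns)

print(new_rows)
print(changed)
```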