I'm having some difficulty using pandas.
I have 2 dataframes (named bru and bru2), both coming from almost the same file. The only difference between the 2 files is that I have added an extra row and changed a cell value from "4" to "50000" for testing.
What I'd now like to do is look for changed cells and new rows.
But first of all, I'm checking if both dataframes are the same so that I don't have to look for changes when both files have the exact same data.
When I try to compare them (bru == bru2), I get an error: Can only compare identically-labeled DataFrame objects.
I'm importing the files like this. I also drop some columns that I don't need, reorder the columns of both files into the exact same order, and rename some for preference:
bru = pd.read_csv("file1.csv", dtype={"street_id": "string", "address_id": "string"})
bru = bru.fillna('')
bru = bru.drop(columns=["EPSG:31370_x", "EPSG:31370_y", "EPSG:4326_lat", "EPSG:4326_lon", "postname_fr", "postname_nl", "streetname_de"])
bru = bru.rename(columns={"postcode": "pkancode"})
bru = bru.reindex(columns=["address_id", "box_number", "house_number", "municipality_id", "municipality_name_de", "municipality_name_fr", "municipality_name_nl", "pkancode", "street_id", "streetname_nl", "streetname_fr", "region_code", "status"])
bru2 = pd.read_csv("file2.csv", dtype={"street_id": "string", "address_id": "string"})
bru2 = bru2.fillna('')
bru2 = bru2.drop(columns=["EPSG:31370_x", "EPSG:31370_y", "EPSG:4326_lat", "EPSG:4326_lon", "postname_fr", "postname_nl", "streetname_de"])
bru2 = bru2.rename(columns={"postcode": "pkancode"})
bru2 = bru2.reindex(columns=["address_id", "box_number", "house_number", "municipality_id", "municipality_name_de", "municipality_name_fr", "municipality_name_nl", "pkancode", "street_id", "streetname_nl", "streetname_fr", "region_code", "status"])
What am I doing wrong?
I've tried other solutions from Stack Overflow that for some reason failed for me:
Error: Can only compare identically-labeled DataFrame objects
Pandas "Can only compare identically-labeled DataFrame objects" error
CodePudding user response:
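You can use reindex_like to make bru2 have the same row and column labels as bru, then compare the dataframes. Note that this drops any row of bru2 whose label is not in bru, so the extra row will not show up in the comparison.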
bru2.reindex_like(bru).compare(bru)
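As a rough illustration of the line above, here is a minimal sketch on two tiny made-up frames. The data is hypothetical and only mimics the situation in the question (one changed cell, one extra row); the real files are not reproduced here.

```python
import pandas as pd

# Hypothetical stand-ins for bru and bru2: one changed cell, one extra row.
bru = pd.DataFrame({"address_id": ["1", "2", "3"], "house_number": ["4", "7", "9"]})
bru2 = pd.DataFrame(
    {"address_id": ["1", "2", "3", "4"], "house_number": ["50000", "7", "9", "12"]}
)

# bru == bru2 raises ValueError here because the row labels differ (3 rows vs 4).
try:
    bru == bru2
    raised = False
except ValueError:
    raised = True

# Align bru2 to bru's labels (this drops the extra row), then list the changed cells.
changes = bru2.reindex_like(bru).compare(bru)
print(changes)
```

In the result, the "self" column holds the value from bru2 and the "other" column holds the value from bru, because compare is called on the reindexed bru2.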
And you can use pd.Index.difference to find the row or column labels that are in bru but not in bru2. Swap the operands to get the labels that only bru2 has, such as an added row.
bru.index.difference(bru2.index)  # and likewise with bru.columns and bru2.columns
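A possible way to put this together for the "new rows" part of the question is sketched below. It assumes address_id uniquely identifies a row, which the question does not confirm, and it uses small hypothetical frames rather than the real files. It also uses DataFrame.equals for the initial "are they the same?" check, since that returns False instead of raising when the shapes differ.

```python
import pandas as pd

# Hypothetical data; address_id is assumed to be a unique key for each row.
bru = pd.DataFrame({"address_id": ["1", "2", "3"], "house_number": ["4", "7", "9"]})
bru2 = pd.DataFrame(
    {"address_id": ["1", "2", "3", "4"], "house_number": ["50000", "7", "9", "12"]}
)

# Cheap check first: equals() returns False rather than raising on a shape mismatch.
same = bru.equals(bru2)

# Index both frames by the key so labels mean "which address", not "which position".
old = bru.set_index("address_id")
new = bru2.set_index("address_id")

# Labels present in new but not in old are the added rows.
added_ids = new.index.difference(old.index)
added_rows = new.loc[added_ids]
print(same, list(added_ids))
```

With the default integer index, the same difference call would only report row positions, so inserting a row in the middle of the file would make every later row look different. Indexing by a key avoids that, provided the key really is unique.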