Compare two dataframes with varying index lengths and multiple occurrences

Time:07-02

import pandas as pd

df = pd.DataFrame({"_id": [1, 2, 3, 4], "names_e": ["emil", "emma", "enton", "emma"]})
df2 = pd.DataFrame({"id": [1, 3, 4], "name": ["emma", "emma", "emma"]})
#df2 = df2.set_index("id", drop=False)
#df = df.set_index("_id", drop=False)
df[(df['_id']==df2["id"]) & (df['names_e'] == df2["name"])] #-> Can only compare identically-labeled Series objects
#df[[x for x in (df2["name"] == df["names_e"].values)]] #->'Lengths must match to compare'
#df[[x for x in (df2["name"] == df["names_e"])]] # ->Can only compare identically-labeled Series objects

I'm trying to intersect two dataframes on the name column and the unique identifier id. The expected result should contain only id 4 with name 'emma', but I keep running into the errors shown in the comments above.
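Both errors come from how pandas compares Series: `==` between two Series aligns on index labels and refuses to compare when the labels differ (df has labels 0..3, df2 has 0..2), and comparing against a raw array requires equal lengths. A minimal sketch, reusing the question's frames: it reproduces the first error, then builds the mask the question was after by testing each (id, name) pair for membership in df2 via `MultiIndex.isin`, which never aligns by position.

```python
import pandas as pd

df = pd.DataFrame({"_id": [1, 2, 3, 4], "names_e": ["emil", "emma", "enton", "emma"]})
df2 = pd.DataFrame({"id": [1, 3, 4], "name": ["emma", "emma", "emma"]})

# Element-wise == aligns on index labels; 0..3 vs 0..2 raises ValueError.
try:
    df["_id"] == df2["id"]
except ValueError as exc:
    print("ValueError:", exc)

# Treat each (id, name) row as a tuple and test set membership instead.
pairs_in_df2 = list(zip(df2["id"], df2["name"]))
mask = pd.MultiIndex.from_frame(df[["_id", "names_e"]]).isin(pairs_in_df2)

# mask is a NumPy boolean array with one entry per row of df.
print(df[mask])
```

This keeps df's own column names and index, and each row of df appears at most once in the result even if df2 repeats a pair.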

Answer:

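Rename df's columns to match df2, then do an inner merge (the default for `merge`), which joins on all shared columns, here id and name: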

df2.merge(df.rename(columns={'_id': 'id', 'names_e': 'name'}))

   id  name
0   4  emma
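If you would rather not rename df, a sketch of the same inner join with the key columns named explicitly on each side via `left_on`/`right_on`. Because the key names differ, both sets of key columns survive the join, so the last step keeps only df2's columns.

```python
import pandas as pd

df = pd.DataFrame({"_id": [1, 2, 3, 4], "names_e": ["emil", "emma", "enton", "emma"]})
df2 = pd.DataFrame({"id": [1, 3, 4], "name": ["emma", "emma", "emma"]})

# Inner join: keep only rows whose (id, name) pair occurs in both frames.
matched = df2.merge(
    df,
    how="inner",
    left_on=["id", "name"],
    right_on=["_id", "names_e"],
)

# Result has id, name, _id, names_e; drop the duplicated keys from df.
result = matched[["id", "name"]]
print(result)
```

With either merge variant, a pair that occurs several times in both frames produces one output row per matching combination; call `drop_duplicates()` on the result if you only want each pair once.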