How does one check if all rows in a dataframe match another dataframe?

Time:11-23

Say you have two dataframes with the same columns.

Dataframe A has 10 rows and dataframe B has 100 rows, and all 10 rows of A also appear somewhere in B, though not necessarily at the same row positions.

How do we determine that those 10 rows of df A are fully contained in df B?

For example, say df A is (using only one row):

A | B | C
1 | 2 | 3

and df B is:

A | B | C
2 | 5 | 5
3 | 2 | 7
1 | 2 | 3
5 | 1 | 5

How do we check that df A is contained in df B? Assume the rows are always unique, in the sense that each A, B combination appears only once.
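To reproduce the scenario at full size, here is a minimal sketch that builds a 100-row df B with unique rows and takes 10 of its rows, shuffled and renumbered, as df A. The integer columns, the sizes, and the random seed are illustrative assumptions; the final line applies the merge-based check from the first answer below.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# 100 distinct integers in 0..999; their three digits become A, B, C,
# so every row of df_b is unique
codes = rng.choice(1000, size=100, replace=False)
df_b = pd.DataFrame({'A': codes // 100, 'B': (codes // 10) % 10, 'C': codes % 10})

# Take 10 rows of df_b in random order and renumber them 0..9,
# so their row labels no longer match their positions in df_b
df_a = df_b.sample(n=10, random_state=0).reset_index(drop=True)

# merge() ignores the index and joins on the shared columns A, B, C
print(len(df_a.merge(df_b)) == len(df_a))  # True
```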

Answer:

Is one DataFrame a subset of another?

You can solve this with a merge followed by a comparison.

The inner join of the two dataframes has the same shape as the smaller dataframe if the larger one contains every row of the smaller one.

import pandas as pd

# df1 - smaller dataframe, df2 - larger dataframe
df1 = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
df2 = pd.DataFrame({'A': [2, 3, 1, 5], 'B': [5, 2, 2, 1], 'C': [5, 7, 3, 5]})

# merge() defaults to an inner join on all shared columns
df1.merge(df2).shape == df1.shape
True
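If you also want to know which rows of the smaller dataframe are missing, a left join with `indicator=True` reports a match status per row. A minimal sketch, using an extra row (4, 4, 4) that is assumed purely for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 4], 'B': [2, 4], 'C': [3, 4]})
df2 = pd.DataFrame({'A': [2, 3, 1, 5], 'B': [5, 2, 2, 1], 'C': [5, 7, 3, 5]})

# Left join keeps every row of df1; indicator=True adds a '_merge' column
# that is 'both' when the row was found in df2 and 'left_only' when it was not
merged = df1.merge(df2, how='left', indicator=True)

missing = merged.loc[merged['_merge'] == 'left_only', ['A', 'B', 'C']]
print(missing)                              # lists the row (4, 4, 4)
print((merged['_merge'] == 'both').all())   # False
```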

If either dataframe may contain duplicate rows, drop duplicates before comparing shapes:

df1.merge(df2).drop_duplicates().shape == df1.drop_duplicates().shape
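The same check can be wrapped in a small helper. This is a sketch: `is_row_subset` is a hypothetical name, and it assumes both frames share the same column names. Deduplicating both sides before the merge guarantees that each row of the smaller frame matches at most once.

```python
import pandas as pd


def is_row_subset(small: pd.DataFrame, large: pd.DataFrame) -> bool:
    """Hypothetical helper: True if every distinct row of `small` appears in `large`.

    Row order and index labels are ignored; only column values are compared.
    """
    small_u = small.drop_duplicates()
    large_u = large.drop_duplicates()
    # Inner join on all shared columns; with duplicates removed on both sides,
    # each row of small_u matches at most one row of large_u
    matched = small_u.merge(large_u, how='inner')
    return len(matched) == len(small_u)


df1 = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
df2 = pd.DataFrame({'A': [2, 3, 1, 5], 'B': [5, 2, 2, 1], 'C': [5, 7, 3, 5]})
df3 = pd.DataFrame({'A': [1, 1], 'B': [2, 5], 'C': [3, 7]})  # (1, 5, 7) is not in df2

print(is_row_subset(df1, df2))  # True
print(is_row_subset(df3, df2))  # False
```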


Answer:

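Convert df2 into a dictionary of columns and pass it to isin. The final .all() collapses the per-row results into a single boolean, so this also works when df1 has more than one row:

df1.isin({key: value.array for key, value in df2.items()}).all(axis=1).all()
True

Note that isin with a dictionary checks each column independently: a row passes if each of its values appears anywhere in the corresponding column of df2, even if that exact combination never occurs in a single row of df2. It can therefore report false positives, so prefer the merge or MultiIndex approach when whole rows must match.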
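To see the column-wise behaviour of isin concretely, here is a small check with a probe row (1, 5, 7), assumed for illustration. It is not a row of df2, yet 1, 5 and 7 each occur in the matching column of df2:

```python
import pandas as pd

df2 = pd.DataFrame({'A': [2, 3, 1, 5], 'B': [5, 2, 2, 1], 'C': [5, 7, 3, 5]})
# (1, 5, 7) is NOT a row of df2, but each value appears in its own column
probe = pd.DataFrame({'A': [1], 'B': [5], 'C': [7]})

# Column-by-column membership test (the dict form of isin)
column_wise = probe.isin({key: value.array for key, value in df2.items()}).all(axis=1).all()
# Whole-row membership test via an inner join
row_wise = len(probe.merge(df2)) == len(probe)

print(column_wise)  # True  (false positive)
print(row_wise)     # False
```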

Another option is to convert both dataframes to MultiIndexes and use isin or intersection. I suspect this may be more computationally expensive than the first option:

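A = pd.MultiIndex.from_frame(df1)
B = pd.MultiIndex.from_frame(df2)

# via isin: one boolean per row of df1, all of which must be True
A.isin(B).all()
True

# via intersection: every row of df1 must survive the intersection
len(A.intersection(B)) == len(A)
True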
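A swapped-in alternative, not taken from either answer: compare plain Python sets of row tuples. This minimal sketch assumes both frames have the same column names and no missing values (NaN does not compare equal to itself, so rows containing NaN may fail to match). Reordering df2's columns to df1's order matters because tuples are compared by position.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
df2 = pd.DataFrame({'A': [2, 3, 1, 5], 'B': [5, 2, 2, 1], 'C': [5, 7, 3, 5]})

# index=False drops the row labels; name=None yields plain tuples
rows1 = set(df1.itertuples(index=False, name=None))
# Select df2's columns in df1's order so tuple positions line up
rows2 = set(df2[df1.columns].itertuples(index=False, name=None))

print(rows1 <= rows2)  # True: every row of df1 is in df2
print(rows1 - rows2)   # set(): no missing rows
```

Building each set is linear in the number of rows, and the subset test is one hash lookup per row of df1.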
