Home > Enterprise >  How to find one-to-one relation in between two columns with duplicates?
How to find one-to-one relation in between two columns with duplicates?

Time:12-05

I have a dataset like the following:

import pandas as pd

dict1 = {
    "idx": [1, 2, 9, 1, 1, 6, 1, 3, 2],
    "value": ["a1", "b1", "c1", "t1", "t1", "f1", "r1", "l1", "b1"]
}

df = pd.DataFrame(dict1)

Here, a1, t1, r1 always occur with 1. To verify this, I used the following code,

def one_to_one(df, col1, col2):
    first = df.groupby(col1)[col2].count().max()
    second = df.groupby(col2)[col1].count().max()
    return first   second == 2

one_to_one(df, 'idx', 'value')

But it returned None even though a1, t1, r1 always occur with 1 and others do the same. How to verify that, a1 for example, only occurs with idx = 1?

CodePudding user response:

You can just drop_duplicates based idx value combination. If each value has only one idx, then the result should contain the same number of records as the number of unique values in value column:

len(df[['idx', 'value']].drop_duplicates()) == df['value'].nunique()
# True
  • Related