Percentage of values matching between two columns of a CSV in Python

I have a DataFrame like this:

import pandas as pd
df = pd.DataFrame({"true_key": ["Astral", "Blob", "Blob", "Cat", "Astral"], "true_key2": ["Japan", "Astral", "Blob", "quics", "Cat"]})

How do I calculate the percentage of values present in true_key that are present in true_key2 and vice versa?

So, as we can see, 100% of the true_key values are present in true_key2 (Astral, Blob and Cat all appear there), and 60% of the true_key2 values are present in true_key (3 of its 5 values).

Is there a method to do this in Python?

Thanks in advance.

CodePudding user response：

One way is to take the set intersection of the two columns and divide its size by the number of unique values in each column:

mutual_len = len(set(df['true_key']).intersection(set(df['true_key2'])))
mutual_len / df['true_key'].nunique(), mutual_len / df['true_key2'].nunique()

(1.0, 0.6)
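
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same set-based idea wrapped in a helper. The function name unique_overlap and the choice to drop NaN values are assumptions made for illustration, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "true_key": ["Astral", "Blob", "Blob", "Cat", "Astral"],
    "true_key2": ["Japan", "Astral", "Blob", "quics", "Cat"],
})


def unique_overlap(a, b):
    """Return (share of a's unique values found in b, share of b's found in a)."""
    # Assumption: missing values are ignored rather than treated as a match.
    unique_a, unique_b = set(a.dropna()), set(b.dropna())
    common = len(unique_a & unique_b)
    # Note: an empty column would raise ZeroDivisionError here.
    return common / len(unique_a), common / len(unique_b)


result = unique_overlap(df["true_key"], df["true_key2"])
print(result)  # (1.0, 0.6)
```

Because this works on unique values, repeated entries such as the second "Blob" in true_key do not affect the result.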

CodePudding user response：

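Use set. This version divides the number of shared unique values by the number of rows and reports the complement as the second figure, so it does not reproduce the 100% / 60% from the question: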

v = len(set(df['true_key']).intersection(df['true_key2']))
l = len(df)
true_key, true_key2 = v/l, 1-v/l

>>> true_key
0.6

>>> true_key2
0.4