I have a dataframe like this:
import pandas as pd
df = pd.DataFrame({"true_key": ["Astral", "Blob", "Blob", "Cat", "Astral"], "true_key2": ["Japan", "Astral", "Blob", "quics", "Cat"]})
How do I calculate the percentage of values present in true_key that are present in true_key2 and vice versa?
In this example, 100% of the true_key values are present in true_key2, and 60% of the true_key2 values are present in true_key.
Is there a way to calculate this in Python?
Thanks in advance.
CodePudding user response:
One way is to take the set intersection of the two columns and divide its length by the number of unique values in each column:
mutual_len = len(set(df['true_key']).intersection(set(df['true_key2'])))
mutual_len / df['true_key'].nunique(), mutual_len / df['true_key2'].nunique()
(1.0, 0.6)
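Note that dividing by nunique() gives the share of distinct values, which matches the row-based percentages here only because true_key2 has no repeated values. The sketch below uses a hypothetical dataframe (called dup, not taken from the question) in which true_key2 contains duplicates, to show how the two measures can differ. The row-based share relies on Series.isin.

```python
import pandas as pd

# Hypothetical data (not from the question): true_key2 now contains duplicates.
dup = pd.DataFrame({
    "true_key": ["Astral", "Blob", "Blob", "Cat", "Astral"],
    "true_key2": ["Japan", "Astral", "Astral", "Astral", "Cat"],
})

# Values that occur in both columns: {"Astral", "Cat"}
common = set(dup["true_key"]) & set(dup["true_key2"])

# Share of distinct true_key2 values that are shared (the nunique() measure):
# 2 of the 3 distinct values.
distinct_share = len(common) / dup["true_key2"].nunique()

# Share of true_key2 rows whose value is shared (each row counted once):
# 4 of the 5 rows.
row_share = dup["true_key2"].isin(common).mean()

print(distinct_share, row_share)
```

Which of the two is appropriate depends on whether "percentage of values" is meant per row or per distinct value.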
CodePudding user response:
Use set:
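v = set(df['true_key']).intersection(df['true_key2'])  # values common to both columns
true_key = float(df['true_key'].isin(v).mean())    # share of true_key rows found in true_key2
true_key2 = float(df['true_key2'].isin(v).mean())  # share of true_key2 rows found in true_key
>>> true_key
1.0
>>> true_key2
0.6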