If I have a pandas dataframe, dfA, that looks like:
index id some_number some_other_number
1 42 1.2 32
2 54 1.3 33
3 66 3.4 64
4 77 4.7 12
and another, dfB, with this:
index id some_number some_other_number
1 42 1.2 32
2 99 1.3 33
3 11 3.4 64
4 77 4.7 12
What is the fastest way to add a column to dfA that flags whether each value in its id column is present in dfB's id column, so that we get:
index id some_number some_other_number id_is_in_dfB
1 42 1.2 32 True
2 54 1.3 33 False
3 66 3.4 64 False
4 77 4.7 12 True
At the moment I do:
dfA["id_is_in_dfB"]=dfA["id"].isin(dfB["id"])
I was wondering if there are alternative quicker approaches?
CodePudding user response:
It might be faster to do this operation outside of pandas, something like this:
prepared_set = set(dfB["id"])
dfA["id_is_in_dfB"] = [value in prepared_set for value in dfA["id"].values]
I can't confirm that it will be faster in your exact case, because that may depend on the sizes of the tables.
Using numba might be a further improvement.
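Since the result depends on table size, it is probably worth timing both versions on data shaped like yours. Below is a minimal sketch of such a comparison. The row counts, the id range, and the helper names with_isin and with_set are made up for illustration, so the printed timings say nothing about your real tables.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical sizes and id range; replace with data shaped like your own.
rng = np.random.default_rng(0)
dfA = pd.DataFrame({"id": rng.integers(0, 1_000_000, size=100_000)})
dfB = pd.DataFrame({"id": rng.integers(0, 1_000_000, size=100_000)})


def with_isin():
    # The approach from the question.
    return dfA["id"].isin(dfB["id"]).to_numpy()


def with_set():
    # The approach from this answer: plain Python set lookups.
    prepared_set = set(dfB["id"].tolist())
    return np.array([value in prepared_set for value in dfA["id"].tolist()])


# Both versions must agree before the timings mean anything.
assert (with_isin() == with_set()).all()

print("isin:", timeit.timeit(with_isin, number=5))
print("set: ", timeit.timeit(with_set, number=5))
```

Whichever version wins on your machine, rerun the comparison when the table sizes change noticeably.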
CodePudding user response:
If both dataframes share the same index and you only need to compare the ids row by row, try this approach:
dfA['id_is_in_dfB'] = dfA['id'] == dfB['id']
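Keep in mind that == compares the two columns position by position, which is not the same as asking whether an id occurs anywhere in dfB. It happens to give the expected output for the sample data in the question, but not in general. Here is a small sketch of the difference; the reordered dfB is a hypothetical variant, not the data from the question.

```python
import pandas as pd

dfA = pd.DataFrame({"id": [42, 54, 66, 77]}, index=[1, 2, 3, 4])
# Hypothetical variant of dfB: same ids as in the question, but reordered.
dfB = pd.DataFrame({"id": [77, 99, 11, 42]}, index=[1, 2, 3, 4])

# Row-by-row equality: True only where the same row holds the same id.
row_wise = dfA["id"] == dfB["id"]

# Membership test: True where the id appears anywhere in dfB["id"].
membership = dfA["id"].isin(dfB["id"])

print(row_wise.tolist())    # [False, False, False, False]
print(membership.tolist())  # [True, False, False, True]
```

Also note that == between two Series requires them to have identical index labels, while isin works for columns of different lengths.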