Fastest way to add column to dataframe containing boolean

If I have a pandas dataframe, dfA, that looks like:

index id some_number some_other_number
1    42       1.2        32
2    54       1.3        33
3    66       3.4        64
4    77       4.7        12

and another, dfB, with this:

index id some_number some_other_number
1     42     1.2       32
2     99     1.3       33
3     11     3.4       64
4     77     4.7       12

What is the fastest way to add a column to dfA that flags whether each id is present in the id column of dfB, so that we get:

index id some_number some_other_number  id_is_in_dfB
1    42       1.2        32               True
2    54       1.3        33               False
3    66       3.4        64               False
4    77       4.7        12               True

At the moment I do:

dfA["id_is_in_dfB"]=dfA["id"].isin(dfB["id"])

I was wondering whether there are any quicker alternatives?
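
For reference, the setup in the question can be reproduced as a small runnable snippet. The values are copied from the tables above; the index starting at 1 is an assumption based on how the tables are printed.

```python
import pandas as pd

# Index labels 1..4 are assumed from the printed tables.
idx = [1, 2, 3, 4]

dfA = pd.DataFrame(
    {
        "id": [42, 54, 66, 77],
        "some_number": [1.2, 1.3, 3.4, 4.7],
        "some_other_number": [32, 33, 64, 12],
    },
    index=idx,
)
dfB = pd.DataFrame(
    {
        "id": [42, 99, 11, 77],
        "some_number": [1.2, 1.3, 3.4, 4.7],
        "some_other_number": [32, 33, 64, 12],
    },
    index=idx,
)

# Membership test: True where the id in dfA occurs anywhere in dfB["id"].
dfA["id_is_in_dfB"] = dfA["id"].isin(dfB["id"])
print(dfA)
```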

CodePudding user response：

It might be faster to do this operation outside of pandas, something like this:

prepared_set = set(dfB["id"])
dfA["id_is_in_dfB"] = [value in prepared_set for value in dfA["id"].values]

I can't confirm that this will be faster in your exact case, because the result may depend on the sizes of the two tables.
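
One way to check is to time both variants on data shaped like yours. The following is only a sketch: the table sizes, the id range, and the helper names with_isin and with_set are made up for illustration, so substitute your own frames before drawing conclusions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical sizes and id range; replace with your real dfA and dfB.
rng = np.random.default_rng(0)
n_a, n_b = 100_000, 10_000
dfA = pd.DataFrame({"id": rng.integers(0, 1_000_000, size=n_a)})
dfB = pd.DataFrame({"id": rng.integers(0, 1_000_000, size=n_b)})


def with_isin():
    # The approach from the question.
    return dfA["id"].isin(dfB["id"]).to_numpy()


def with_set():
    # The pure-Python set lookup from this answer.
    prepared_set = set(dfB["id"])
    return np.array([value in prepared_set for value in dfA["id"].values])


# Both variants should produce the same boolean mask.
assert (with_isin() == with_set()).all()

for fn in (with_isin, with_set):
    seconds = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```

The printed timings will vary with machine, pandas version, and dtype, so treat them as a comparison for your own data rather than a general result.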

A further improvement might be to use numba.

CodePudding user response：

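Try this approach, but note that it compares the two id columns row by row instead of testing membership. It only gives the same result as isin when both dataframes have identical indexes and a matching id always sits in the same row of dfB as in dfA, which happens to be true for the sample data:

dfA['id_is_in_dfB'] = dfA['id'] == dfB['id']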
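
To see where the == comparison above and isin diverge, here is a small sketch in which dfB holds the same ids as in the question but in a different row order. The reordered frame dfB_shuffled is a hypothetical input made up for illustration.

```python
import pandas as pd

dfA = pd.DataFrame({"id": [42, 54, 66, 77]}, index=[1, 2, 3, 4])
# Same ids as dfB in the question, but reordered (hypothetical).
dfB_shuffled = pd.DataFrame({"id": [77, 99, 11, 42]}, index=[1, 2, 3, 4])

# Membership: is each id of dfA present anywhere in dfB_shuffled["id"]?
membership = dfA["id"].isin(dfB_shuffled["id"])

# Element-wise equality: is the id in row i of dfA equal to the id in row i of dfB_shuffled?
rowwise = dfA["id"] == dfB_shuffled["id"]

print(membership.tolist())  # [True, False, False, True]
print(rowwise.tolist())     # [False, False, False, False]
```

With Series whose index labels differ, == raises a ValueError instead of returning a mask, so isin is the safer choice whenever the two frames are not aligned row for row.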