Faster solution for checking if value is already in another dataframe with numpy

Time:01-30

This is my code right now:

for i, row in gdf_pot.iterrows():
    if row.ID not in historized.ID.to_list():
        gdf_pot.at[i, "FLAG_NEW"] = 1
    else:
        gdf_pot.at[i, "FLAG_NEW"] = 0

It's very slow because the dataframe is very big.

I saw some solutions with np.where, but I couldn't make them work.

Maybe you have some ideas?

Thanks.

CodePudding user response:

One way is to build a boolean mask with pandas.Series.isin and convert it to integers with pandas.Series.astype.

The core data structure in GeoPandas is the geopandas.GeoDataFrame, a subclass of pandas.DataFrame, so pandas methods such as isin work on its columns unchanged.

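# ~ inverts the mask so that IDs missing from historized get FLAG_NEW = 1, as in the loop in the question
gdf_pot["FLAG_NEW"] = (~gdf_pot["ID"].isin(historized["ID"])).astype(int)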

NB: True behaves as 1 and False as 0 in Python, so astype(int) converts a boolean Series directly to those two numbers.
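Since the question mentions np.where, here is a minimal sketch of that variant. The two small DataFrames are made-up stand-ins for gdf_pot and historized (plain pandas rather than GeoPandas, which should not matter for this column operation), and the names known and flag_alt are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for gdf_pot and historized.
gdf_pot = pd.DataFrame({"ID": [1, 2, 3, 4]})
historized = pd.DataFrame({"ID": [2, 4, 5]})

# Boolean mask: True where the ID already exists in historized.
known = gdf_pot["ID"].isin(historized["ID"])

# The loop in the question sets FLAG_NEW = 1 when the ID is NOT in historized,
# so known IDs map to 0 and unknown IDs map to 1.
gdf_pot["FLAG_NEW"] = np.where(known, 0, 1)

# Equivalent result without numpy: invert the mask, then cast to int.
flag_alt = (~known).astype(int)

print(gdf_pot["FLAG_NEW"].tolist())  # [1, 0, 1, 0]
```

Both forms do the membership test once for the whole column, instead of rebuilding historized.ID.to_list() and scanning it on every row as the loop does.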
