Merging GeoDataFrames - TypeError: float() argument must be a string or a number, not 'Point'

I have a dataframe in which one column holds a Series of shapely Points, and a second dataframe in which one column holds a Series of Polygons.

df.head()
     hash number                               street unit  \
2024459  283e04eca5c4932a     SN  AVENIDA DOUTOR SEVERIANO DE ALMEIDA  NaN   
2024460  1a92a1c3cba7941a    485  AVENIDA DOUTOR SEVERIANO DE ALMEIDA  NaN   
2024461  837341c45de519a3    475  AVENIDA DOUTOR SEVERIANO DE ALMEIDA  NaN   

            city  district region   postcode  id                     geometry  
2024459  Jaguari       NaN     RS  97760-000 NaN  POINT (-54.69445 -29.49421)  
2024460  Jaguari       NaN     RS  97760-000 NaN  POINT (-54.69445 -29.49421)  
2024461  Jaguari       NaN     RS  97760-000 NaN  POINT (-54.69445 -29.49421)

poly_df.head()
                                          centroids                                           geometry
0   POINT (-29.31067315122428 -54.64176359828149)  POLYGON ((-54.64069 -29.31161, -54.64069 -29.3...
1   POINT (-29.31067315122428 -54.63961783106958)  POLYGON ((-54.63854 -29.31161, -54.63854 -29.3...
2  POINT (-29.31067315122428 -54.637472063857665)  POLYGON ((-54.63640 -29.31161, -54.63640 -29.3...

I'm checking whether each Point lies inside a Polygon and, if it does, copying that Polygon's centroid (a Point object) into a cell of the first dataframe. However, I'm getting the following error:

Traceback (most recent call last):
   
  File "/tmp/ipykernel_4771/1967309101.py", line 1, in <module>
    df.loc[idx, 'centroids'] = poly_mun.loc[ix, 'centroids']

  File ".local/lib/python3.8/site-packages/pandas/core/indexing.py", line 692, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)

  File ".local/lib/python3.8/site-packages/pandas/core/indexing.py", line 1599, in _setitem_with_indexer
    self.obj[key] = infer_fill_value(value)

  File ".local/lib/python3.8/site-packages/pandas/core/dtypes/missing.py", line 516, in infer_fill_value
    val = np.array(val, copy=False)

TypeError: float() argument must be a string or a number, not 'Point'

I'm using the following line of code:

df.loc[idx, 'centroids'] = poly_df.loc[ix, 'centroids']

I have already tried .at as well.

Thanks

Answer:

You can't create a new column in pandas by assigning a shapely geometry to a single cell with .loc:

In [1]: import pandas as pd, shapely.geometry

In [2]: df = pd.DataFrame({'mycol': [1, 2, 3]})

In [3]: df.loc[0, "centroid"] = shapely.geometry.Point([0, 0])
/Users/mikedelgado/opt/miniconda3/envs/rhodium-env/lib/python3.10/site-packages/pandas/core/indexing.py:1642: ShapelyDeprecationWarning: The array interface is deprecated and will no longer work in Shapely 2.0. Convert the '.coords' to a numpy array instead.
  self.obj[key] = infer_fill_value(value)
/Users/mikedelgado/opt/miniconda3/envs/rhodium-env/lib/python3.10/site-packages/pandas/core/dtypes/missing.py:550: FutureWarning: The input object of type 'Point' is an array-like implementing one of the corresponding protocols (`__array__`, `__array_interface__` or `__array_struct__`); but not a sequence (or 0-D). In the future, this object will be coerced as if it was first converted using `np.array(obj)`. To retain the old behaviour, you have to either modify the type 'Point', or assign to an empty array created with `np.empty(correct_shape, dtype=object)`.
  val = np.array(val, copy=False)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 df.loc[0, "centroid"] = shapely.geometry.Point([0, 0])

File ~/opt/miniconda3/envs/rhodium-env/lib/python3.10/site-packages/pandas/core/indexing.py:716, in _LocationIndexer.__setitem__(self, key, value)
    713 self._has_valid_setitem_indexer(key)
    715 iloc = self if self.name == "iloc" else self.obj.iloc
--> 716 iloc._setitem_with_indexer(indexer, value, self.name)

File ~/opt/miniconda3/envs/rhodium-env/lib/python3.10/site-packages/pandas/core/indexing.py:1642, in _iLocIndexer._setitem_with_indexer(self, indexer, value, name)
   1639     self.obj[key] = empty_value
   1641 else:
-> 1642     self.obj[key] = infer_fill_value(value)
   1644 new_indexer = convert_from_missing_indexer_tuple(
   1645     indexer, self.obj.axes
   1646 )
   1647 self._setitem_with_indexer(new_indexer, value, name)

File ~/opt/miniconda3/envs/rhodium-env/lib/python3.10/site-packages/pandas/core/dtypes/missing.py:550, in infer_fill_value(val)
    548 if not is_list_like(val):
    549     val = [val]
--> 550 val = np.array(val, copy=False)
    551 if needs_i8_conversion(val.dtype):
    552     return np.array("NaT", dtype=val.dtype)

TypeError: float() argument must be a string or a real number, not 'Point'

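Essentially, pandas doesn't know how to interpret a Point object. Because the column doesn't exist yet, pandas has to work out a fill value for it, and as the traceback shows, it fails inside infer_fill_value while trying to convert the Point to a NumPy array. This might get fixed in the future, but you're best off creating the column explicitly with object dtype first: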

In [27]: df['centroid'] = None

In [28]: df['centroid'] = df['centroid'].astype(object)

In [29]: df
Out[29]:
   mycol centroid
0      1     None
1      2     None
2      3     None

In [30]: df.loc[0, "centroid"] = shapely.geometry.Point([0, 0])
/Users/mikedelgado/opt/miniconda3/envs/rhodium-env/lib/python3.10/site-packages/pandas/core/internals/managers.py:304: ShapelyDeprecationWarning: The array interface is deprecated and will no longer work in Shapely 2.0. Convert the '.coords' to a numpy array instead.
  applied = getattr(b, f)(**kwargs)

In [31]: df
Out[31]:
   mycol     centroid
0      1  POINT (0 0)
1      2         None
2      3         None
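
Applied to the kind of loop the question describes, the same fix might look like the sketch below. The two toy squares and the two points are invented for illustration, and plain pandas DataFrames stand in for the real data; only the column names (geometry, centroids) and the assignment line are taken from the question.

```python
import pandas as pd
from shapely.geometry import Point, Polygon

# Toy stand-ins for the question's data (made up for illustration).
squares = [
    Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
    Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
]
poly_df = pd.DataFrame(
    {"centroids": [sq.centroid for sq in squares], "geometry": squares}
)
df = pd.DataFrame({"geometry": [Point(0.25, 0.5), Point(1.75, 0.5)]})

# Create the target column first, with object dtype, so pandas does not
# have to infer a fill value from a Point when the first cell is set.
df["centroids"] = pd.Series([None] * len(df), index=df.index, dtype=object)

# Naive pairwise check: copy the centroid of the first polygon that
# contains each point.
for idx, point in df["geometry"].items():
    for ix, poly in poly_df["geometry"].items():
        if poly.contains(point):
            df.loc[idx, "centroids"] = poly_df.loc[ix, "centroids"]
            break

print(df)
```

The nested loop checks every point against every polygon, so it is only reasonable for small inputs.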

That said, joining two GeoDataFrames with polygons and points based on whether the points are in the polygons certainly sounds like a job for geopandas.sjoin:

import geopandas as gpd
union = gpd.sjoin(polygon_df, points_df, predicate='contains')  # the argument was named 'op' before geopandas 0.10
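
As a rough sketch of what such a join might look like end to end, here is a toy version. The squares, the cell_id column and the third, unmatched point are all made up for illustration, and the predicate= keyword assumes geopandas 0.10 or newer. This variant puts the points on the left with how="left", so every point is kept and simply gets missing values where no polygon matches.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical data: two unit-square cells and three address points.
polygon_df = gpd.GeoDataFrame(
    {"cell_id": [0, 1]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
)
points_df = gpd.GeoDataFrame(
    {"number": ["SN", "485", "475"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5.0, 5.0)],
)

# Keep every point and attach the attributes of the polygon it falls within.
# The third point lies outside both squares, so its cell_id is missing.
joined = gpd.sjoin(points_df, polygon_df, how="left", predicate="within")

print(joined[["number", "cell_id"]])
```

The result keeps the point geometry as its geometry column and carries the matching polygon's attribute columns alongside it, so no cell-by-cell assignment is needed.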