Count points in grid cells each time

Time:10-15

I have a CSV file like this:

        latitude  longitude    acq_date  confidence
0        -8.1135   112.9281  2001-01-01          99
1        -6.4586   143.2235  2001-01-03          86
2         6.6564   125.0055  2001-01-03          85
3         6.6545   124.9990  2001-01-03          84
4         9.7481   107.9814  2001-01-03          96
...          ...        ...         ...         ...
456844   -4.0529   143.4047  2020-12-28          89
456845   -8.1128   112.9365  2020-12-30         100
456846   -2.5768   121.3746  2020-12-31         100
456847   -2.5754   121.3848  2020-12-31          84
456848   -1.4573   127.4369  2020-12-31          90

Each row (one confidence value) is one data point.

I'm trying to create a grid and then count the number of points that fall within each grid cell for each date (acq_date), using Python.

I tried using geopandas to count the points in each grid cell, but I don't know how to also count them by time. Here is the code I used:

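import numpy as np
import geopandas
from shapely.geometry import box

# df is the DataFrame read from the CSV shown above
# convert df into a geopandas geodataframe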
gdf = geopandas.GeoDataFrame(df,
                             geometry=geopandas.points_from_xy(
                                 df.longitude, df.latitude),
                             crs='epsg:4326')
# gdf.head()
# drop lon lat
gdf = gdf.drop(columns=['longitude', 'latitude'])

# total area for the grid
xmin=93
ymin=-11
xmax=141
ymax=8
# how many cells across and down
n_cells = 104
cell_size = (xmax-xmin)/n_cells
# projection of the grid
crs = 'epsg:4326'

# create the cells in a loop
grid_cells = []
for x0 in np.arange(xmin, xmax + cell_size, cell_size):
    for y0 in np.arange(ymin, ymax + cell_size, cell_size):
        # bounds
        x1 = x0 - cell_size
        y1 = y0 + cell_size
        grid_cells.append(box(x0, y0, x1, y1))
cell = geopandas.GeoDataFrame(grid_cells, columns=['geometry'], crs=crs)

merged = geopandas.sjoin(gdf, cell, how='left', predicate='within')
# make a simple count variable that we can sum
merged['n_fires'] = 1
# Compute stats per grid cell -- aggregate fires to grid cells with dissolve
dissolve = merged.dissolve(by="index_right", aggfunc="count")
# put this into cell
cell.loc[dissolve.index, 'n_fires'] = dissolve.n_fires.values

Any suggestions on how to count the points in each grid cell for each date?

CodePudding user response:

Binning lat/lon into a regular Cartesian grid is a very straightforward operation that you don't need geopandas for. Just bin the data yourself and then group on the bins:

x_cell_edges = np.arange(xmin, xmax + cell_size, cell_size)
y_cell_edges = np.arange(ymin, ymax + cell_size, cell_size)

# label grid cells based on centroid (or whatever you want)
x_cell_labels = (x_cell_edges[:-1] + x_cell_edges[1:]) / 2
y_cell_labels = (y_cell_edges[:-1] + y_cell_edges[1:]) / 2

# use pd.cut to bin all the actual lat/lons into cells
df['x'] = pd.cut(df['longitude'], bins=x_cell_edges, labels=x_cell_labels)
df['y'] = pd.cut(df['latitude'], bins=y_cell_edges, labels=y_cell_labels)

# group on the bins and compute your summary stat
summary = df.groupby(["y", "x", "acq_date"])[["confidence"]].count()
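
As a minimal end-to-end sketch of the snippet above, here is the same pd.cut binning run on a few made-up rows laid out like the question's CSV. The sample values, the 1-degree cell_size, and the n_fires column name are assumptions chosen for illustration. observed=True is also added here so that only cells that actually contain points come back; on pandas versions where the default is observed=False, grouping on categorical bins can otherwise expand the result to every cell combination.

```python
import numpy as np
import pandas as pd

# Illustrative sample only: a handful of rows in the same layout as the question's CSV.
df = pd.DataFrame({
    "latitude":   [-8.1135, -8.1128, 6.6564, 6.6545, -2.5768],
    "longitude":  [112.9281, 112.9365, 125.0055, 124.9990, 121.3746],
    "acq_date":   ["2001-01-01", "2001-01-01", "2001-01-03", "2001-01-03", "2020-12-31"],
    "confidence": [99, 100, 85, 84, 100],
})

# Grid extent from the question; 1-degree cells are an assumption for readability.
xmin, ymin, xmax, ymax = 93, -11, 141, 8
cell_size = 1.0

x_cell_edges = np.arange(xmin, xmax + cell_size, cell_size)
y_cell_edges = np.arange(ymin, ymax + cell_size, cell_size)

# Label each cell by its centre coordinate.
x_cell_labels = (x_cell_edges[:-1] + x_cell_edges[1:]) / 2
y_cell_labels = (y_cell_edges[:-1] + y_cell_edges[1:]) / 2

# Assign every point to a cell (pd.cut bins are right-closed by default).
df["x"] = pd.cut(df["longitude"], bins=x_cell_edges, labels=x_cell_labels)
df["y"] = pd.cut(df["latitude"], bins=y_cell_edges, labels=y_cell_labels)

# One row per (cell, date) that contains at least one point.
counts = (
    df.groupby(["y", "x", "acq_date"], observed=True)
      .size()
      .rename("n_fires")
      .reset_index()
)
print(counts)
```

Points that fall outside the edges get NaN for x or y and are dropped by the groupby, so it may be worth checking df["x"].isna().sum() on the real data.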

This would be different if you had a super irregular grid, though you should in almost(?) every situation still be able to determine mathematically whether a point falls into a grid cell (perhaps after a projection or some other transformation). Constructing polygons for each cell and then using shape.contains is a ton of extra computing work.
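
For a regular grid, "determining mathematically" which cell a point falls into can be as simple as a floor division. The following is a sketch under assumptions, not part of the original answer: the sample rows, the 1-degree cell_size, and the row, col and n_fires names are made up for illustration. Note that floor-based bins are left-closed, whereas pd.cut bins are right-closed by default, so points lying exactly on a cell edge can land in a different cell than with pd.cut.

```python
import numpy as np
import pandas as pd

# Illustrative sample only: a handful of rows in the same layout as the question's CSV.
df = pd.DataFrame({
    "latitude":   [-8.1135, -8.1128, 6.6564, 6.6545, -2.5768],
    "longitude":  [112.9281, 112.9365, 125.0055, 124.9990, 121.3746],
    "acq_date":   ["2001-01-01", "2001-01-01", "2001-01-03", "2001-01-03", "2020-12-31"],
    "confidence": [99, 100, 85, 84, 100],
})

# Grid extent from the question; 1-degree cells are an assumption for readability.
xmin, ymin, xmax, ymax = 93, -11, 141, 8
cell_size = 1.0
n_cols = int(round((xmax - xmin) / cell_size))
n_rows = int(round((ymax - ymin) / cell_size))

# Integer cell indices: column counts cells east of xmin, row counts cells north of ymin.
df["col"] = np.floor((df["longitude"] - xmin) / cell_size).astype(int)
df["row"] = np.floor((df["latitude"] - ymin) / cell_size).astype(int)

# Keep only points that fall inside the grid.
in_grid = df["col"].between(0, n_cols - 1) & df["row"].between(0, n_rows - 1)

# One row per (cell, date) that contains at least one point.
cell_counts = (
    df[in_grid]
      .groupby(["row", "col", "acq_date"])
      .size()
      .rename("n_fires")
      .reset_index()
)
print(cell_counts)
```

The cell's lower-left corner can be recovered as xmin + col * cell_size and ymin + row * cell_size if coordinates are needed for plotting.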
