Count (or sum) the number of gridpoints from high-resolution 2-D data that are closest to the...

I have two datasets. The first has high spatial resolution and contains only the values 0 and 1; the second has coarse spatial resolution (its values are not important in my case).

I would like to count the number of gridpoints from the high-resolution data which are closest to the gridpoints of the coarse-resolution data, where the values of the high-resolution data are 1.

In other words, count the high-resolution gridpoints with a value of 1 that fall within each pixel of the coarse-resolution data.

Example of the coarse-resolution data:

import numpy as np
import xarray as xr

lon = [ 176.25,  176.75, 177.25,  177.75,  178.25,  178.75,  179.25,  179.75]
lat = [-87.25, -87.75, -88.25, -88.75, -89.25, -89.75]
temperature = np.random.rand(6, 8)
coarse_res = xr.DataArray(temperature, coords={'lat': lat,'lon': lon}, dims=["lat", "lon"])

Example of the high-resolution data:

lon = [176.125,176.375,176.625,176.875,177.125,177.375,177.625,177.875,178.125,178.375,178.625,178.875,179.125,179.375,179.625,179.875]
lat = [-87.125, -87.375, -87.625, -87.875, -88.125, -88.375, -88.625, -88.875, -89.125, -89.375, -89.625, -89.875]
ds_2 = np.random.randint(0, 2, size=(12, 16))
high_res = xr.DataArray(ds_2, coords={'lat': lat,'lon': lon}, dims=["lat", "lon"])

In the end, I would like to calculate, for each coarse-resolution gridpoint, the fraction of the surrounding high_res gridpoints/pixels that have the value 1. For example, if the first gridpoint of the coarse_res data is surrounded by 4 high-res gridpoints whose values are 0, 1, 1, 1, the fraction should be 0.75.

CodePudding user response:

You can do this with xr.DataArray.groupby_bins, summing within each coarse bin to count the 1s:

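# Edges of the coarse 0.5-degree cells; the upper limits are nudged so the last edge is included
low_lon_edges = np.arange(176., 180.001, 0.5)
low_lat_edges = np.arange(-90, -86.9, 0.5)

# Cell centres are the midpoints of consecutive edges
low_lon_centers = (low_lon_edges[:-1] + low_lon_edges[1:]) / 2
low_lat_centers = (low_lat_edges[:-1] + low_lat_edges[1:]) / 2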

aggregated = (
    high_res
    .groupby_bins('lon', bins=low_lon_edges, labels=low_lon_centers)
    .sum(dim="lon")
    .groupby_bins('lat', bins=low_lat_edges, labels=low_lat_centers)
    .sum(dim="lat")
)
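Since the question asks for a fraction rather than a raw count, one way to get it from the same binning is to divide the count of 1s by the number of high-res cells in each bin. The following is only a sketch under some assumptions: the data is rebuilt with a seeded generator so it can be rerun, the bin edges cover the full 176 to 180 longitude range of the example, and bin_sum is a hypothetical helper name, not an xarray function.

```python
import numpy as np
import xarray as xr

# Rebuild the high-resolution example from the question (seeded for repeatability).
rng = np.random.default_rng(0)
hi_lon = np.arange(176.125, 180.0, 0.25)    # 16 cell centres
hi_lat = np.arange(-87.125, -90.0, -0.25)   # 12 cell centres, descending
high_res = xr.DataArray(
    rng.integers(0, 2, size=(12, 16)),
    coords={"lat": hi_lat, "lon": hi_lon},
    dims=["lat", "lon"],
)

# Edges and centres of the coarse 0.5-degree cells.
lon_edges = np.arange(176.0, 180.001, 0.5)
lat_edges = np.arange(-90.0, -86.9, 0.5)
lon_centers = (lon_edges[:-1] + lon_edges[1:]) / 2
lat_centers = (lat_edges[:-1] + lat_edges[1:]) / 2


def bin_sum(da):
    """Hypothetical helper: sum `da` over the coarse cells, first along lon, then lat."""
    return (
        da.groupby_bins("lon", bins=lon_edges, labels=lon_centers).sum(dim="lon")
        .groupby_bins("lat", bins=lat_edges, labels=lat_centers).sum(dim="lat")
    )


ones_count = bin_sum(high_res)                # number of 1s per coarse cell
cell_count = bin_sum(xr.ones_like(high_res))  # number of high-res cells per coarse cell
fraction = (ones_count / cell_count).transpose("lat_bins", "lon_bins")
print(fraction)
```

Dividing by an explicit cell count, rather than chaining two means, should also stay correct if some bins happen to contain a different number of high-res cells.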

Additionally, if the cells nest perfectly (it looks like you're dealing with 1/4- and 1/2-degree data which are both centered on the half cell, so this should work fine), you can just use xr.DataArray.coarsen:

aggregated = high_res.coarsen(lat=2, lon=2, boundary="exact").sum()
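For the fraction asked about in the question, swapping sum for mean on the coarsened object should be enough, because every 2x2 block holds only 0s and 1s. Below is a minimal sketch, assuming the same grids as in the question (rebuilt here with a seeded generator); it also checks that the coarsened coordinates land on the coarse grid centres.

```python
import numpy as np
import xarray as xr

# Rebuild the high-resolution example from the question (seeded for repeatability).
rng = np.random.default_rng(0)
high_res = xr.DataArray(
    rng.integers(0, 2, size=(12, 16)),
    coords={
        "lat": np.arange(-87.125, -90.0, -0.25),
        "lon": np.arange(176.125, 180.0, 0.25),
    },
    dims=["lat", "lon"],
)

# Each coarse cell covers a 2x2 block of high-res cells.
blocks = high_res.coarsen(lat=2, lon=2, boundary="exact")
counts = blocks.sum()     # number of 1s per coarse cell
fraction = blocks.mean()  # share of 1s per coarse cell, e.g. 0, 1, 1, 1 gives 0.75

# By default the coarsened coordinates are the means of the grouped fine coordinates,
# so they should coincide with the coarse grid centres from the question.
coarse_lat = np.arange(-87.25, -90.0, -0.5)
coarse_lon = np.arange(176.25, 180.0, 0.5)
print(np.allclose(fraction["lat"], coarse_lat), np.allclose(fraction["lon"], coarse_lon))
```

Because the coordinates match, fraction can be compared with or assigned alongside coarse_res directly; boundary="exact" raises an error if the high-res dimensions are not exact multiples of the window size.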