So I want to merge 2 datasets, 1 is a single band raster dataset that came from rioxarray.open_rasterio(), the other a lookup table, with an index dim 'mukey'. The coords along 'mukey' correspond to 'mukey' index values in the lookup table. The desired result is a dataset with identical x and y coords to the Raster dataset, with variables 'n' and 'K' whose values are populated by merging on the 'mukey'. If you are familiar with ArcGIS, this is the analogous operation.
xr.merge() and assign() don't quite seem to perform this operation, and cheating by converting into pandas or numpy hits memory issues on my 32GB machine. Does xarray provide any support for this simple use case? Thanks,
data = (np.abs(np.random.randn(12000000))).astype(np.int32).reshape(1,4000,3000)
raster = xr.DataArray(data,dims=['band','y','x'],coords=[[1],np.arange(4000),np.arange(3000)])
raster = raster.to_dataset(name='mukey')
raster
lookup = pd.DataFrame({'mukey':list(range(10)),'n':np.random.randn(10),'K':np.random.randn(10)*2}).set_index('mukey').to_xarray()
lookup
CodePudding user response:
You're looking for the advanced indexing with DataArrays feature of xarray.
You can provide a DataArray
as a keyword argument to DataArray.sel
or Dataset.sel
- this will reshape the indexed array along the dimensions of the indexing array, based on the values of the indexing array. I think this is exactly what you're looking for in a "lookup table".
In your case:
lookup.sel(mukey=raster.mukey)