I have a gridded xarray dataset that looks like this:
print(ds)
<xarray.Dataset>
Dimensions: (month: 12, isobaricInhPa: 37, latitude: 721, longitude: 1440)
Coordinates:
* isobaricInhPa (isobaricInhPa) float64 1e 03 975.0 950.0 ... 3.0 2.0 1.0
* latitude (latitude) float64 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
* longitude (longitude) float64 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
* month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
h (month, isobaricInhPa, latitude, longitude) float32 dask.array<chunksize=(1, 19, 405, 720), meta=np.ndarray>
speed (month, isobaricInhPa, latitude, longitude) float32 dask.array<chunksize=(1, 19, 405, 720), meta=np.ndarray>
direction (month, isobaricInhPa, latitude, longitude) float32 dask.array<chunksize=(1, 19, 405, 720), meta=np.ndarray>
From this dataset, I extracted data for a series of points using:
ds_volc = ds.sel(latitude=volc['Latitude'].values,longitude=volc['Longitude'].values, method='nearest', drop=True)
where volc
is a Pandas DataFrame and volc['Latitude'].values
and volc['Longitude'].values
are vectors of lat/lon pairs for my 1431 points of interests.
print(ds_volc)
<xarray.Dataset>
Dimensions: (month: 12, isobaricInhPa: 37, latitude: 1431, longitude: 1431)
Coordinates:
* isobaricInhPa (isobaricInhPa) float64 1e 03 975.0 950.0 ... 3.0 2.0 1.0
* latitude (latitude) float64 50.25 45.75 42.25 ... -56.0 -64.25 -62.0
* longitude (longitude) float64 6.75 3.0 2.5 0.0 ... 0.0 0.0 0.0 0.0
* month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
h (month, isobaricInhPa, latitude, longitude) float32 dask.array<chunksize=(1, 19, 190, 1431), meta=np.ndarray>
speed (month, isobaricInhPa, latitude, longitude) float32 dask.array<chunksize=(1, 19, 190, 1431), meta=np.ndarray>
direction (month, isobaricInhPa, latitude, longitude) float32 dask.array<chunksize=(1, 19, 190, 1431), meta=np.ndarray>
Rather than using pairs of lat/lon coordinates, I need to add the ID of my points (called Volcano Number
) as a new coordinate:
ds_volc = ds_volc.assign_coords({'Volcano Number':volc['Volcano Number'].values})
which results in:
<xarray.Dataset>
Dimensions: (month: 12, isobaricInhPa: 37, latitude: 1431, longitude: 1431, Volcano Number: 1431)
Coordinates:
* isobaricInhPa (isobaricInhPa) float64 1e 03 975.0 950.0 ... 3.0 2.0 1.0
* latitude (latitude) float64 50.25 45.75 42.25 ... -56.0 -64.25 -62.0
* longitude (longitude) float64 6.75 3.0 2.5 0.0 ... 0.0 0.0 0.0 0.0
* month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
* Volcano Number (Volcano Number) int64 210010 210020 ... 390829 390847
Data variables:
h (month, isobaricInhPa, latitude, longitude) float32 dask.array<chunksize=(1, 19, 190, 1431), meta=np.ndarray>
speed (month, isobaricInhPa, latitude, longitude) float32 dask.array<chunksize=(1, 19, 190, 1431), meta=np.ndarray>
direction (month, isobaricInhPa, latitude, longitude) float32 dask.array<chunksize=(1, 19, 190, 1431), meta=np.ndarray>
The question
Since I added the Volcano Number
coordinate, the latitude
and longitude
coordinates (and dimensions) become obsolete and I need to reorganise the dimensions of the variables. I thought I could simply use ds_volc.drop_dims(['latitude', 'longitude'])
, but that drops the associated variables. I have also tried ds_volc.sel({'Volcano Number': 210010}, drop=True)
, but that returns a xarray.Dataset with 1431 longitude and 1431 latitude points.
Therefore, is there a way to somehow reshape the variables so their dimensions become (month: 12, isobaricInhPa: 37, Volcano Number: 1431)
instead of (month, isobaricInhPa, latitude, longitude)
, so as to return a xarray.Dataset of dimension (month: 12, isobaricInhPa: 37)
when I query with ds_volc.sel({'Volcano Number': 210010}, drop=True)
?
CodePudding user response:
When you select a subset of a Dataset using numpy arrays or lists of lat/lons, you subset the data in multiple dimensions independently. In other words, you're selecting all combinations of each of the lats with all of the lons - the result is still a world map, but with vertical and horizontal bands of data for each of the volcano coordinates. This is why you have 1,431 entries along both your latitude and longitude dimensions.
Instead, use a DataArray indexer to make use of xarray's advanced indexing:
# set the column you would like as the DataFrame's index
volc = volc.set_index('Volcano Number')
# extract the data at each point, forming a new 1-D dimension
# "Volcano Number" in place of "Latitude" and "Longitude"
ds_volc = ds.sel(
latitude=volc['Latitude'].to_xarray(),
longitude=volc['Longitude'].to_xarray(),
method='nearest',
)
This will reshape the Dataset to conform to the shape of the volc
DataFrame's index. The result will be indexed by the 1,431 volcanoes, rather than 1,431 lats and 1,431 lons.