How to reshape xarray dataset by collapsing coordinate

Time:05-18

I have a dataset that, when opened with xarray, has three coordinates: x, y, and band. The band coordinate holds temperature and dewpoint, each at 4 different time intervals, so there are 8 bands in total. Is there a way to reshape this into x, y, band, time, so that the band coordinate has length 2 and the time coordinate has length 4?

I thought I could add a new coordinate named time and then add the bands in, but

ds = ds.assign_coords(time=[1,2,3,4])

returns ValueError: cannot add coordinates with new dimensions to a DataArray.

Answer:

You can re-assign the "band" coordinate to a MultiIndex:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: import xarray as xr

In [4]: da = xr.DataArray(np.random.random((4, 4, 8)), dims=['x', 'y', 'band'])

In [5]: da.coords['band'] = pd.MultiIndex.from_arrays(
   ...:     [
   ...:         [1, 1, 1, 1, 2, 2, 2, 2],
   ...:         pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01'] * 2),
   ...:     ],
   ...:     names=['band_stacked', 'time'],
   ...: )

In [6]: da
Out[6]:
<xarray.DataArray (x: 4, y: 4, band: 8)>
array([[[2.55228052e-01, 6.71680777e-01, 8.76158643e-01, 5.23808010e-01,
         8.56941412e-01, 2.75757101e-01, 7.88877551e-02, 1.54739786e-02],
        [3.70350510e-01, 1.90604842e-02, 2.17871931e-01, 9.40704074e-01,
         4.28769745e-02, 9.24407375e-01, 2.81715762e-01, 9.12889594e-01],
        [7.36529770e-02, 1.53507827e-01, 2.83341417e-01, 3.00687140e-01,
         7.41822972e-01, 6.82413237e-01, 7.92126231e-01, 4.84821281e-01],
        [5.24897891e-01, 4.69537663e-01, 2.47668326e-01, 7.56147251e-02,
         6.27767921e-01, 2.70630355e-01, 5.44669493e-01, 3.53063860e-01]],
...
       [[1.56513994e-02, 8.49568142e-01, 3.67268562e-01, 7.28406400e-01,
         2.82383223e-01, 5.00901504e-01, 9.99643260e-01, 1.16446139e-01],
        [9.98980637e-01, 2.45060112e-02, 8.12423749e-01, 4.49895624e-01,
         6.64880037e-01, 8.73506549e-01, 1.79186788e-01, 1.94347924e-01],
        [6.32000394e-01, 7.60414128e-01, 4.90153658e-01, 3.40693056e-01,
         5.19820559e-01, 4.49398587e-01, 1.90339730e-01, 6.38101614e-02],
        [7.64102189e-01, 6.79961676e-01, 7.63165470e-01, 6.23766131e-02,
         5.62677420e-01, 3.85784911e-01, 4.43436365e-01, 2.44385584e-01]]])
Coordinates:
  * band          (band) MultiIndex
  - band_stacked  (band) int64 1 1 1 1 2 2 2 2
  - time          (band) datetime64[ns] 2020-01-01 2021-01-01 ... 2023-01-01
Dimensions without coordinates: x, y

Then you can expand the dimensionality by unstacking "band" and renaming the band_stacked level back to band:

In [7]: da.unstack('band').rename({'band_stacked': 'band'})
Out[7]:
<xarray.DataArray (x: 4, y: 4, band: 2, time: 4)>
array([[[[2.55228052e-01, 6.71680777e-01, 8.76158643e-01,
          5.23808010e-01],
         [8.56941412e-01, 2.75757101e-01, 7.88877551e-02,
          1.54739786e-02]],
...
        [[7.64102189e-01, 6.79961676e-01, 7.63165470e-01,
          6.23766131e-02],
         [5.62677420e-01, 3.85784911e-01, 4.43436365e-01,
          2.44385584e-01]]]])
Coordinates:
  * band     (band) int64 1 2
  * time     (time) datetime64[ns] 2020-01-01 2021-01-01 2022-01-01 2023-01-01
Dimensions without coordinates: x, y
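As a self-contained variant of the steps above, here is a minimal sketch applied to the layout described in the question. It assumes the first four bands are temperature at time steps 1 to 4 and the last four are dewpoint at the same steps; the coordinate name `field` and the integer time values are illustrative choices, not part of the original data. Instead of assigning a pandas MultiIndex directly, it labels each band with two ordinary coordinates and lets `set_index` build the MultiIndex, which should give the same result.

```python
import numpy as np
import xarray as xr

# Assumed layout: bands 0-3 are temperature at time steps 1-4,
# bands 4-7 are dewpoint at the same four steps.
data = np.arange(4 * 4 * 8, dtype=float).reshape(4, 4, 8)
da = xr.DataArray(data, dims=["x", "y", "band"])

# Label every band with the quantity it holds and its time step.
# "field" is a hypothetical name chosen for this sketch.
labelled = da.assign_coords(
    field=("band", ["temperature"] * 4 + ["dewpoint"] * 4),
    time=("band", [1, 2, 3, 4] * 2),
)

# Combine the two labels into a MultiIndex on "band", unstack it into
# two separate dimensions, then rename "field" back to "band".
unstacked = (
    labelled.set_index(band=["field", "time"])
    .unstack("band")
    .rename(field="band")
)

print(unstacked.dims)
```

After unstacking, the order of the labels along band may differ from the original band order, so selecting by label, for example `unstacked.sel(band="dewpoint", time=2)`, is safer than indexing by position.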

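Another, more manual option is to reshape in numpy and create a new DataArray. This relies on the bands being stored band-major (all four times for band 1, then all four times for band 2), which is the same layout as the MultiIndex above. The manual reshape is much faster for a larger array: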

In [8]: reshaped = xr.DataArray(
   ...:     da.data.reshape((4, 4, 2, 4)),
   ...:     dims=['x', 'y', 'band', 'time'],
   ...:     coords={
   ...:         'time': pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01']),
   ...:         'band': [1, 2],
   ...:     },
   ...: )
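The target shape passed to reshape depends on how the 8 bands are ordered in the file, which the question does not state. The sketch below covers both possibilities under stated assumptions: a band-major order (as in the reshape above) and an interleaved, time-major order. The integer band and time labels are placeholders.

```python
import numpy as np
import xarray as xr

nx, ny, n_band, n_time = 4, 4, 2, 4
data = np.random.default_rng(0).random((nx, ny, n_band * n_time))
labels = {"band": [1, 2], "time": [1, 2, 3, 4]}

# Case 1 (assumed): band-major order, i.e. b1t1, b1t2, b1t3, b1t4, b2t1, ...
# The last axis splits directly into (band, time).
band_major = xr.DataArray(
    data.reshape(nx, ny, n_band, n_time),
    dims=["x", "y", "band", "time"],
    coords=labels,
)

# Case 2 (assumed): time-major order, i.e. b1t1, b2t1, b1t2, b2t2, ...
# Here the last axis splits into (time, band), so reshape to that
# layout first and transpose afterwards.
time_major = xr.DataArray(
    data.reshape(nx, ny, n_time, n_band),
    dims=["x", "y", "time", "band"],
    coords=labels,
).transpose("x", "y", "band", "time")
```

Checking one slice against the original array, for example comparing `band_major.sel(band=2, time=3)` with `data[:, :, 6]`, is a quick way to confirm which layout matches the file.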

Note that if your data is chunked (and assuming you'd like to keep it that way), your options are more limited; see the dask docs on reshaping dask arrays.
