How to add a constant to negative values in array

Time:12-14

Given the xarray DataArray below, I would like to add 10 to all negative values (i.e., -5 becomes 5, -4 becomes 6 ... -1 becomes 9) while all other values remain unchanged.

a = xr.DataArray(np.arange(25).reshape(5, 5)-5, dims=("x", "y"))

I tried:

  • a[a<0] = 10 + a[a<0], but it fails with the error "2-dimensional boolean indexing is not supported".
  • Several attempts with a.where, but it seems that the other argument can only replace the mapped values with a constant rather than with indexed values.

I also considered using numpy as suggested here, but my actual dataset is ~80 GB and loaded with dask, and using numpy crashes my Jupyter console.

Is there any way to achieve this using only xarray?

Update

I updated the code using @SpaceBugger's suggestion and this. However, my initial example used a DataArray, whereas my true problem uses a Dataset:

a = xr.DataArray(np.arange(25).reshape(5, 5)-5, dims=("x", "y"))
a = a.to_dataset(name='variable')

Now, if I do this:

a1 = a['variable']
a2 = 10 + a1.copy()
a['variable'] = dask.array.where(a['variable'] < 0, a2, a1)

I get this error:

MissingDimensionsError: cannot set variable 'variable' with 2-dimensional data without explicit dimension names. Pass a tuple of (dims, data) instead.

Can anyone suggest a proper syntax?

CodePudding user response:

My best guess is based on my meagre understanding of these libraries, and especially the xarray.Dataset.update section of the xarray doc. This says the signature of xarray.Dataset.update parameters should be mapping {var name: (tuple of dimension names, array-like)}.

This means that datasets expect you to give them the names of the dimensions the data is attached to. This makes some sense, as you should be able to see the dimension names used in your dataset when printing the object (print(a)). Printing the dataset should give you the dimension names carried over by the call to a.to_dataset. Let us say they are named coord_x and coord_y. You should be able to set your data variable with

a['variable'] = (('coord_x', 'coord_y'), dask.array.where(a['variable'] < 0, a2, a1))

Which should be equivalent to

a.update({'variable': (('coord_x', 'coord_y'), dask.array.where(a['variable'] < 0, a2, a1))})

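Or maybe the following, which is easier to read. A Python if/else cannot be applied element-wise to an array, so the condition has to go through where. Note also that assign returns a new dataset rather than modifying a in place.

a = a.assign(variable=lambda ds: ds['variable'].where(ds['variable'] >= 0, ds['variable'] + 10))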
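A minimal sketch of the assign route on the toy data from the question, assuming the element-wise choice is made with DataArray.where rather than a Python if/else (which raises an ambiguous-truth-value error on arrays). The names ds and shifted are illustrative only and do not come from the original post.

```python
import numpy as np
import xarray as xr

# Toy dataset from the question: values -5..19 on dims "x" and "y".
ds = xr.DataArray(
    np.arange(25).reshape(5, 5) - 5, dims=("x", "y")
).to_dataset(name="variable")

# assign() calls the lambda with the dataset and returns a NEW dataset;
# where() keeps values >= 0 and takes value + 10 everywhere else.
shifted = ds.assign(
    variable=lambda d: d["variable"].where(d["variable"] >= 0, d["variable"] + 10)
)

print(shifted["variable"].values[0])  # first row, originally -5..-1
print(ds["variable"].values[0])       # the original dataset is left untouched
```

Because assign does not modify its input, the result has to be bound to a name (or back to the original name) to take effect.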

So, to summarize, you should be able to do the following:

>>> import dask.array
>>> import numpy as np
>>> import xarray as xr

>>> a = xr.DataArray(np.arange(25).reshape(5, 5)-5, dims=("x", "y")) # 2D data with dims x and y
>>> a = a.to_dataset(name='variable') # Creates a dataset whose variable keeps the dims x and y

>>> print(a) # Check the dimension names; the exact layout varies with the xarray version
<xarray.Dataset>
Dimensions:   (x: 5, y: 5)
Dimensions without coordinates: x, y
Data variables:
    variable  (x, y) int64 -5 -4 -3 -2 -1 0 1 2 ...

>>> a1 = a['variable']
>>> a2 = 10 + a1.copy()
>>> a['variable'] = (('x', 'y'), dask.array.where(a['variable'] < 0, a2, a1)) # (dims, data) tuple
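To avoid typing the dimension names by hand, they can be read from the existing variable. The sketch below illustrates the (dims, data) tuple form under one loud assumption: np.where stands in for dask.array.where so the example runs without dask installed. The names ds, var and shifted are illustrative.

```python
import numpy as np
import xarray as xr

# Toy dataset from the question: values -5..19 on dims "x" and "y".
ds = xr.DataArray(
    np.arange(25).reshape(5, 5) - 5, dims=("x", "y")
).to_dataset(name="variable")

var = ds["variable"]

# np.where (a stand-in for dask.array.where here) returns a bare array
# that carries no dimension names.
shifted = np.where(var.data < 0, var.data + 10, var.data)

# Pass (dims, data) and reuse the names already stored on the variable.
ds["variable"] = (var.dims, shifted)

print(ds["variable"].dims)
print(ds["variable"].values[0])
```

Reusing var.dims keeps the assignment correct even if the dimensions are later renamed.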

CodePudding user response:

xarray’s where method is the way to go here - you can provide any other argument which can be broadcast against the condition argument and the original array:

a['variable'] = a['variable'].where(
    a['variable'] >= 0,
    (a['variable'] + 10),
)

This will work fine with dask and will handle your coordinates seamlessly.
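As a quick check on the toy data from the question, the sketch below applies the where method and compares it with the function form xr.where, which takes the condition first. The names ds, var, method_result and function_result are illustrative and not part of the original answer.

```python
import numpy as np
import xarray as xr

# Toy dataset from the question: values -5..19 on dims "x" and "y".
ds = xr.DataArray(
    np.arange(25).reshape(5, 5) - 5, dims=("x", "y")
).to_dataset(name="variable")

var = ds["variable"]

# Method form: keep values where the condition is True, else use `other`.
method_result = var.where(var >= 0, var + 10)

# Function form: pick `var + 10` where the condition is True, else `var`.
function_result = xr.where(var < 0, var + 10, var)

# Both results are DataArrays with dims attached, so plain assignment works
# and no (dims, data) tuple is needed.
ds["variable"] = method_result

print(ds["variable"].values)
```

Note the inverted condition between the two forms: the method keeps values where the condition holds, while the function selects its second argument where the condition holds. On a dask-backed dataset the same expressions should stay lazy until the result is computed or written out.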
