How to combine characters of value in xarray dataset

Time:07-22

I am extremely new to coding, so please correct me on anything I'm misunderstanding.

I need to combine the individual characters in each row into a single string. The data originally came from a NetCDF file that I opened in xarray. The dtype was originally |S1, but I converted it to strings with .astype(str) and then transposed the array, which gave this result:

ncases_OFCL: 2743, nchars_basin: 2

array([['a', 'l'],
       ['a', 'l'],
       ['a', 'l'],
       ...,
       ['e', 'p'],
       ['e', 'p'],
       ['e', 'p']], dtype='<U1')

Coordinates: (0)
Attributes:
    long_name: OFCL basin
    units:
Now that I've gotten to this point, is there a way to join the 'a' and 'l' and so on into just 'al'? Thank you for any help you can give!!

CodePudding user response:

Let me know if this works for you.

import numpy as np

data = np.array([['a', 'l'],
                 ['a', 'l'],
                 ['a', 'l'],
                 ['e', 'p'],
                 ['e', 'p'],
                 ['e', 'p']])

# Concatenate the two characters in each row into one string
data2 = []
for n in range(len(data)):
    data2.append(data[n, 0] + data[n, 1])
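
As a possible alternative to the explicit loop, here is a minimal sketch of two shorter ways to join the columns. The three-row `data` array is a made-up stand-in for the real data, and `joined_list` and `joined_array` are illustrative names only.

```python
import numpy as np

# Small stand-in for the real (2743, 2) array of single characters
data = np.array([['a', 'l'],
                 ['a', 'l'],
                 ['e', 'p']])

# str.join on each row; works for any number of character columns
joined_list = ["".join(row) for row in data]

# Element-wise concatenation of the two columns; returns a new, wider array
joined_array = np.char.add(data[:, 0], data[:, 1])

print(joined_list)
print(joined_array)
```

The list comprehension gives a plain Python list, while `np.char.add` gives a NumPy array whose fixed width is large enough for the combined strings.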

CodePudding user response:

The problem is that the data is stored as a fixed-width Unicode array (dtype '<U1'), and you can't resize fixed-width character arrays in place.
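
To illustrate the fixed-width limitation, here is a small sketch using plain NumPy. The `codes` and `flexible` arrays are made up for this example and are not taken from the question's data.

```python
import numpy as np

# Each element of a '<U1' array has room for exactly one character
codes = np.array(['a', 'e'], dtype='<U1')

# Assigning a longer string does not raise; it is cut to fit the width
codes[0] = 'al'
print(codes[0])

# An object array stores ordinary Python str values, which can be any length
flexible = codes.astype(object)
flexible[0] = flexible[0] + 'l'
print(flexible[0])
```

This is why the approach further down converts to object dtype before combining the characters.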

Your data probably looks something like this:

In [16]: da = xr.DataArray(
    ...:     np.array([['a', 'l']] * 10 + [['e', 'p']] * 10).astype('U'),
    ...:     dims=['ncases_OFCL', 'nchars_basin'],
    ...: )

In [17]: da
Out[17]:
<xarray.DataArray (ncases_OFCL: 20, nchars_basin: 2)>
array([['a', 'l'],
       ['a', 'l'],
       ['a', 'l'],
       ['a', 'l'],
       ['a', 'l'],
       ['a', 'l'],
       ['a', 'l'],
       ['a', 'l'],
       ['a', 'l'],
       ['a', 'l'],
       ['e', 'p'],
       ['e', 'p'],
       ['e', 'p'],
       ['e', 'p'],
       ['e', 'p'],
       ['e', 'p'],
       ['e', 'p'],
       ['e', 'p'],
       ['e', 'p'],
       ['e', 'p']], dtype='<U1')
Dimensions without coordinates: ncases_OFCL, nchars_basin

You can sum the characters along a dimension if you first convert the data to object type, because adding Python str objects concatenates them:

In [18]: da.astype('O').sum(dim='nchars_basin')
Out[18]:
<xarray.DataArray (ncases_OFCL: 20)>
array(['al', 'al', 'al', 'al', 'al', 'al', 'al', 'al', 'al', 'al', 'ep',
       'ep', 'ep', 'ep', 'ep', 'ep', 'ep', 'ep', 'ep', 'ep'], dtype=object)
Dimensions without coordinates: ncases_OFCL

If you'd like, you could convert back to fixed-width '<U2' type:

In [19]: da.astype('O').sum(dim='nchars_basin').astype('U')
Out[19]:
<xarray.DataArray (ncases_OFCL: 20)>
array(['al', 'al', 'al', 'al', 'al', 'al', 'al', 'al', 'al', 'al', 'ep',
       'ep', 'ep', 'ep', 'ep', 'ep', 'ep', 'ep', 'ep', 'ep'], dtype='<U2')
Dimensions without coordinates: ncases_OFCL
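
If you would rather start from the original |S1 bytes, another option may be to reinterpret each row as one longer byte string and decode it afterwards. This is only a sketch under assumptions: `raw` is a hypothetical stand-in for the underlying NumPy array (for example what `.values` returns), already arranged as (ncases, nchars), and the characters are assumed to be ASCII.

```python
import numpy as np

# Hypothetical stand-in for the raw NetCDF variable: one byte per element
raw = np.array([[b'a', b'l'],
                [b'a', b'l'],
                [b'e', b'p']], dtype='S1')

nchars = raw.shape[1]

# Make the rows contiguous in memory, then view each row of 1-byte
# strings as a single nchars-byte string; the result has shape (ncases, 1)
packed = np.ascontiguousarray(raw).view(f'S{nchars}')[:, 0]

# Decode the byte strings to text (assumes ASCII content)
basin = np.char.decode(packed, 'ascii')
print(basin)
```

The `np.ascontiguousarray` call matters if the array was transposed first, since the view only works when each row's bytes sit next to each other in memory.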