I am extremely new to coding, so please correct me on anything I'm misunderstanding.
I need to combine the characters in each row into a single string. The data originally came from a netCDF file that I opened with xarray. The dtype was originally |S1, but I converted it to str with .astype(str) and then transposed the array, and this was the result:
ncases_OFCL: 2743, nchars_basin: 2
array([['a', 'l'],
['a', 'l'],
['a', 'l'],
...,
['e', 'p'],
['e', 'p'],
['e', 'p']], dtype='<U1')
Coordinates: (0)
Attributes:
long_name: OFCL basin
units:
Now that I've gotten to this point, is there a way to join the 'a' and 'l' and so on into just 'al'? Thank you for any help you can give!!
CodePudding user response:
Let me know if this works for you.
import numpy as np

data = np.array([['a', 'l'],
                 ['a', 'l'],
                 ['a', 'l'],
                 ['e', 'p'],
                 ['e', 'p'],
                 ['e', 'p']])
data2 = []
for n in range(len(data)):
    # Concatenate the two characters in row n, e.g. 'a' + 'l' -> 'al'.
    data2.append(data[n, 0] + data[n, 1])
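If the rows ever have more than two characters, the same idea can be written without hard-coding the column indices. This is only a sketch on a small made-up array (the names `joined_rows` and `joined_cols` are mine, not from the question): `''.join(row)` glues together every character in a row, and `np.char.add` concatenates two columns element-wise without an explicit loop.

```python
import numpy as np

# Small stand-in for the real data (assumed shape: rows x characters).
data = np.array([['a', 'l'],
                 ['a', 'l'],
                 ['e', 'p']])

# Join every character in each row, however many columns there are.
joined_rows = np.array([''.join(row) for row in data])

# Element-wise concatenation of exactly two columns, no Python loop.
joined_cols = np.char.add(data[:, 0], data[:, 1])

print(joined_rows)
print(joined_cols)
```

Both results are ordinary NumPy string arrays, so they can be put back into a DataArray if needed.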
CodePudding user response:
The problem is that the data is stored as a fixed-width Unicode array ('<U1', one character per element), and NumPy can't grow the width of a fixed-width character array in place.
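To see what "fixed-width" means in practice, here is a small sketch on a made-up two-element array (`codes` and `wide` are illustrative names only). Writing a longer string into a '<U1' array does not raise an error; the value is cut down to fit. A wider dtype has to be created as a new array.

```python
import numpy as np

codes = np.array(['a', 'e'])   # dtype '<U1': room for one character each
codes[0] = 'al'                # does not fit, so it is silently truncated
print(codes[0])                # prints: a

# Converting to a wider dtype makes a new array with room for two characters.
wide = codes.astype('U2')
wide[0] = 'al'
print(wide[0])                 # prints: al
```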
Your data probably looks something like this:
In [16]: da = xr.DataArray(
    ...:     np.array([['a', 'l']] * 10 + [['e', 'p']] * 10).astype('U'),
    ...:     dims=['ncases_OFCL', 'nchars_basin'],
    ...: )
In [17]: da
Out[17]:
<xarray.DataArray (ncases_OFCL: 20, nchars_basin: 2)>
array([['a', 'l'],
['a', 'l'],
['a', 'l'],
['a', 'l'],
['a', 'l'],
['a', 'l'],
['a', 'l'],
['a', 'l'],
['a', 'l'],
['a', 'l'],
['e', 'p'],
['e', 'p'],
['e', 'p'],
['e', 'p'],
['e', 'p'],
['e', 'p'],
['e', 'p'],
['e', 'p'],
['e', 'p'],
['e', 'p']], dtype='<U1')
Dimensions without coordinates: ncases_OFCL, nchars_basin
You can sum the characters along a dimension if you first convert the data to object type:
In [18]: da.astype('O').sum(dim='nchars_basin')
Out[18]:
<xarray.DataArray (ncases_OFCL: 20)>
array(['al', 'al', 'al', 'al', 'al', 'al', 'al', 'al', 'al', 'al', 'ep',
'ep', 'ep', 'ep', 'ep', 'ep', 'ep', 'ep', 'ep', 'ep'], dtype=object)
Dimensions without coordinates: ncases_OFCL
If you'd like, you can convert back to a fixed-width type ('<U2'):
In [19]: da.astype('O').sum(dim='nchars_basin').astype('U')
Out[19]:
<xarray.DataArray (ncases_OFCL: 20)>
array(['al', 'al', 'al', 'al', 'al', 'al', 'al', 'al', 'al', 'al', 'ep',
'ep', 'ep', 'ep', 'ep', 'ep', 'ep', 'ep', 'ep', 'ep'], dtype='<U2')
Dimensions without coordinates: ncases_OFCL
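For anyone who wants to try the object-dtype trick without xarray, here is a sketch of the same two steps on a plain NumPy array. The small `chars` array is made up for illustration. With object dtype each element is an ordinary Python str, so summing along an axis uses `+`, which concatenates strings.

```python
import numpy as np

# Stand-in for the '<U1' character array (rows x characters).
chars = np.array([['a', 'l'],
                  ['a', 'l'],
                  ['e', 'p']])

# Step 1: object dtype holds Python str objects, so sum() concatenates them.
joined = chars.astype(object).sum(axis=1)

# Step 2 (optional): convert back to a fixed-width Unicode dtype.
fixed = joined.astype('U')

print(joined)
print(fixed.dtype)
```

The fixed-width result uses less memory than the object array, which may matter for large files, but either form works for comparisons like `fixed == 'al'`.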