I am trying to remove all characters except the last 4 from every value in a numpy array. I'd normally use [-4:], but if I use that on the array I only get the last 4 values of the array.
andatum = andatum[-4:]
print(andatum)
'15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999']
runfile('O:/GIS/GEP/Risikomanagement/Flussvermessung/ALD/Analyses/ReadFilesToRawData.py', wdir='O:/GIS/GEP/Risikomanagement/Flussvermessung/ALD/Analyses')
['15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999']
What I am trying to do is obtain the same array, but with only the last 4 characters (the year) of each entry. Any idea how I could do that?
Thank you,
Davide
I would like to remove all the characters except the last 4 (the year), but using [-4:] I get the last 4 entries of my numpy array.
CodePudding user response:
Looks like you have a 1d array of strings:
In [28]: arr = np.array(['15.11.1999']*6)
In [29]: arr
Out[29]:
array(['15.11.1999', '15.11.1999', '15.11.1999', '15.11.1999',
'15.11.1999', '15.11.1999'], dtype='<U10')
numpy is better for numbers than strings. This array is little better than a list of strings. But for convenience, numpy has a set of functions that apply string methods to the elements of an array.
In [30]: np.char.split(arr, sep='.')
Out[30]:
array([list(['15', '11', '1999']), list(['15', '11', '1999']),
list(['15', '11', '1999']), list(['15', '11', '1999']),
list(['15', '11', '1999']), list(['15', '11', '1999'])],
dtype=object)
We can convert this to a 2d array of strings with stack (or vstack):
In [31]: np.stack(_)
Out[31]:
array([['15', '11', '1999'],
['15', '11', '1999'],
['15', '11', '1999'],
['15', '11', '1999'],
['15', '11', '1999'],
['15', '11', '1999']], dtype='<U4')
And select a column:
In [32]: np.stack(_)[:,2]
Out[32]: array(['1999', '1999', '1999', '1999', '1999', '1999'], dtype='<U4')
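The same split-and-stack steps can also be run as a plain script rather than in an IPython session. This is only a minimal sketch, assuming every entry follows the same day.month.year layout; the sample data and the names dates, parts, table and years are made up for illustration.

```python
import numpy as np

# Hypothetical sample data in the same 'DD.MM.YYYY' layout as the question.
dates = np.array(['15.11.1999', '01.02.2003', '30.06.2010'])

# Split each string on '.'; the result is an object array of lists.
parts = np.char.split(dates, sep='.')

# Stack the lists into a 2d string array.
# This only works if every entry splits into the same number of fields.
table = np.stack(parts)

# The third column holds the years.
years = table[:, 2]
print(years)
```

If some entries had a different number of fields, np.stack would fail, so this route relies on the data being uniform.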
np.char does not have a function to index the strings. For that we have to stick with a list comprehension:
In [33]: [i[-4:] for i in arr]
Out[33]: ['1999', '1999', '1999', '1999', '1999', '1999']
That kind of iteration is faster with lists.
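If an array is wanted back instead of a list, the comprehension can be wrapped in np.array. The sketch below also shows np.char.rpartition as one possible alternative that stays inside np.char; it assumes the year is always whatever follows the last '.', and the sample data and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical sample data in the same layout as the question.
dates = np.array(['15.11.1999', '01.02.2003', '30.06.2010'])

# Slice each string in plain Python, then rebuild an array from the result.
years = np.array([s[-4:] for s in dates])

# Alternative: split each string once at its last '.'.
# The result has an extra axis of length 3: (before, separator, after).
years_rp = np.char.rpartition(dates, '.')[:, 2]

print(years)
print(years_rp)
```

The two results agree here only because every year has exactly four characters and directly follows the last dot.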
CodePudding user response:
andatum[i] will reference items in the array. To reference individual characters of these items, you need to use multiple brackets, like this: andatum[i][x]
To get an array of only the last 4 characters, you need to go over each item of the array like this:
for i in range(len(andatum)):
    andatum[i] = andatum[i][-4:]
Or, to keep things tidier and also faster, this one-liner should also do the job (note that it returns a list rather than a numpy array):
andatum = [x[-4:] for x in andatum]
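One detail worth checking when slicing each entry is the direction of the slice: x[-4:] keeps the last four characters, while x[:-4] drops them. The sketch below illustrates the difference and, as an optional extra step not asked for in the question, converts the years to integers. The sample array and the names years and year_numbers are assumptions for illustration.

```python
import numpy as np

# Hypothetical stand-in for the array from the question.
andatum = np.array(['15.11.1999', '15.11.1999', '15.11.1999'])

# x[-4:] keeps the last four characters of each entry.
years = np.array([x[-4:] for x in andatum])

# x[:-4] would instead drop the last four characters.
without_year = [x[:-4] for x in andatum]

# Optional: turn the year strings into integers for numeric work.
year_numbers = years.astype(int)

print(years)
print(without_year)
print(year_numbers)
```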