I have three arrays of the same length containing integers: years, months and days. I want to create a (NumPy) array of the same length, containing formatted strings like '(-)yyyy-mm-dd' using the format '%i-%2.2i-%2.2i'.
For the scalar case, I would do something like
year=2000; month=1; day=1
datestr = '%i-%2.2i-%2.2i' % (year, month, day)
which would yield '2000-01-01'.
How can I create the vector version of this, e.g.:
import numpy as np
years = np.array([-1000, 0, 1000, 2000])
months = np.array([1, 2, 3, 5])
days = np.array([1, 11, 21, 31])
datestr_array = np.somefunction(years, months, days, format='%i-%2.2i-%2.2i', ???)
Note that the date range I am interested in lies between the years -2000 and 3000 (CE), so neither Python's datetime nor Pandas' DatetimeIndex offers a solution.
CodePudding user response:
Explanation
Let's create a function that converts any (year, month, day) triple, with no bounds on the year, to a yyyy-mm-dd string. We can use string formatting: we define a template string and format the relevant values into it. We also need to pad each field with leading zeros to 'fill it out', e.g. 2001-05-20.
To run this function, the corresponding years, months and days must be grouped together. This can be done with zip, which pairs up the elements of the three arrays position by position into (year, month, day) tuples. We then convert the result to a numpy array, so that each row holds one date.
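As a rough illustration of the grouping step, here is a small sketch using the sample arrays from the question. The np.column_stack line is an alternative I am adding for comparison; it is not part of the approach described here.

```python
import numpy as np

years = np.array([-1000, 0, 1000, 2000])
months = np.array([1, 2, 3, 5])
days = np.array([1, 11, 21, 31])

# zip pairs the i-th element of each array into one (y, m, d) tuple;
# converting the list of tuples gives a 2-D array with one date per row
rows = np.array(list(zip(years, months, days)))

# Alternative (an assumption, not used in the answer): stack the arrays as columns
rows_alt = np.column_stack((years, months, days))

print(rows.shape)  # (4, 3)
```

Both variants should give the same 4x3 integer array, with the year, month and day of each date in columns 0, 1 and 2.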
Now that the data is in this tupled form, let's pass it through our function. We can build the new array with numpy.apply_along_axis(func1d, axis, arr), which calls the function on every 1-D slice along the given axis. Because each (year, month, day) tuple lies along the second axis, the axis parameter must be set to 1.
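To make the axis argument concrete, here is a minimal sketch on a toy array (the array and the use of np.sum are only for illustration). With axis=1 the function receives one row at a time; with axis=0 it receives one column at a time.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# axis=1: the function is called once per row
row_sums = np.apply_along_axis(np.sum, 1, arr)

# axis=0: the function is called once per column
col_sums = np.apply_along_axis(np.sum, 0, arr)

print(row_sums)  # [ 6 15]
print(col_sums)  # [5 7 9]
```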
Code
import numpy

def FormatDate(data):
    # data is one row holding (year, month, day)
    # Note that this formatting can later be updated to account for some weirdness
    return "{0:04}-{1:02}-{2:02}".format(data[0], data[1], data[2])

# years, months and days are the arrays from the question
# Convert the data into tuples where y, m, d are aligned in rows
converted = numpy.array(list(zip(years, months, days)))
# Now, let's apply that function to each row to turn the tuples into date strings
datestr_array = numpy.apply_along_axis(FormatDate, 1, converted)
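One caveat, as far as I know: apply_along_axis sizes its output from the first result, so if the first date string is shorter than a later one (for example a positive year followed by a negative one), the later strings may be truncated. A possible workaround is to format in plain Python and build the array afterwards. The sketch below does that with the '%i-%2.2i-%2.2i' format from the question; format_dates is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def format_dates(years, months, days, fmt='%i-%2.2i-%2.2i'):
    # Hypothetical helper: format each (y, m, d) triple with %-formatting,
    # then let NumPy choose a string width that fits the longest result
    return np.array([fmt % (y, m, d) for y, m, d in zip(years, months, days)])

years = np.array([-1000, 0, 1000, 2000])
months = np.array([1, 2, 3, 5])
days = np.array([1, 11, 21, 31])

datestr_array = format_dates(years, months, days)
print(datestr_array)
```

With the sample arrays this gives '-1000-01-01', '0-02-11', '1000-03-21' and '2000-05-31'. Year 0 is not zero-padded because '%i' specifies no width; use something like '%4.4i' for the year if padding is wanted.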