I am trying to write a script that creates an Orange data table with a single column containing a custom timestamp.
Use case: I need a complete timestamp so I can merge some other CSV files later on. I'm working in the Orange GUI, by the way, not in a Python shell or an IDE (in case that makes any difference).
Here's what I have come up with so far:
from Orange.data import Domain, Table, TimeVariable
import numpy as np
domain = Domain([TimeVariable("Timestamp")])
# Timestamps from 2022-03-08 up to (but not including) 2022-03-15, in minute steps
arr = np.arange("2022-03-08", "2022-03-15", dtype="datetime64[m]")
# Obviously necessary to achieve a correct format for the matrix
arr = arr.reshape(-1,1)
out_data = Table.from_numpy(domain, arr)
However the results do not match:
>>> print(arr)
[['2022-03-08T00:00']
['2022-03-08T00:01']
['2022-03-08T00:02']
...
['2022-03-14T23:57']
['2022-03-14T23:58']
['2022-03-14T23:59']]
>>> print(out_data)
[[27444960.0],
[27444961.0],
[27444962.0],
...
[27455037.0],
[27455038.0],
[27455039.0]]
Obviously I'm missing something when handing over the data from numpy but I'm having a real hard time trying to understand the documentation.
I've also found this post which seems to tackle a similar issue, but I haven't figured out how to apply the solution on my problem.
I would be really glad if anyone could help me out here. Please try to use simple terms and concepts cause I'm a true noob.
CodePudding user response:
Thank you for the question, and apologies for the weak documentation of TimeVariable.
You need to change two things in your code to make it work.
First, you must specify whether the TimeVariable holds time and/or date data:
TimeVariable("Timestamp", have_date=True) stores only date information -- it is analogous to datetime.date
TimeVariable("Timestamp", have_time=True) stores only time information (without date) -- it is analogous to datetime.time
TimeVariable("Timestamp", have_time=True, have_date=True) stores date and time -- it is analogous to datetime.datetime
You didn't set either option in your example, so both were False by default. In your case, you must set both to True, since your attribute will hold date-time values.
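To make the analogy concrete, here is a small sketch that uses only Python's standard datetime module (Orange is not involved). The variable names are purely illustrative and are not part of Orange's API:

```python
from datetime import datetime

# A full date-time value, comparable to have_date=True and have_time=True.
stamp = datetime(2022, 3, 8, 0, 1)

# The date part only, comparable to have_date=True.
only_date = stamp.date()

# The time part only, comparable to have_time=True.
only_time = stamp.time()

print(only_date.isoformat())  # 2022-03-08
print(only_time.isoformat())  # 00:01:00
print(stamp.isoformat())      # 2022-03-08T00:01:00
```

Your timestamps need both parts, which is why both flags are required.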
The other issue is that Orange's Table stores date-time values as UNIX epoch time (seconds since 1970-01-01), so Table.from_numpy also expects values in this format. The values in your current arr array are in minutes instead, so in the code below I simply cast the dtype to seconds.
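As a rough check of the units, the sketch below takes the first number printed in the question (27444960) and interprets it with the standard library only. It assumes UTC and does not call Orange; the variable names are made up for illustration:

```python
from datetime import datetime, timezone

# First value from print(out_data) in the question: minutes since 1970-01-01.
minutes_since_epoch = 27444960

# Converting minutes to seconds gives the UNIX epoch value.
seconds_since_epoch = minutes_since_epoch * 60

# Interpreted as seconds after conversion: the intended timestamp.
as_seconds = datetime.fromtimestamp(seconds_since_epoch, tz=timezone.utc)

# The raw minute count read as if it were seconds: far too early.
misread = datetime.fromtimestamp(minutes_since_epoch, tz=timezone.utc)

print(as_seconds.isoformat())  # 2022-03-08T00:00:00+00:00
print(misread.isoformat())     # a date in November 1970
```

So the numbers in your output were the right instants counted in minutes, while a seconds-based reader would place them in 1970.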
Here is the working code:
from Orange.data import Domain, Table, TimeVariable
import numpy as np
# Important: set whether TimeVariable contains time and/or date
domain = Domain([TimeVariable("Timestamp", have_time=True, have_date=True)])
# Timestamps from 2022-03-08 up to (but not including) 2022-03-15, in minute steps
arr = np.arange("2022-03-08", "2022-03-15", dtype="datetime64[m]").astype("datetime64[s]")
# necessary to achieve a correct format for the matrix
arr = arr.reshape(-1,1)
out_data = Table.from_numpy(domain, arr)