I have a dataframe like the following:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp1": [
            pd.Timestamp("2021-01-01"),
            pd.Timestamp("2021-03-01"),
        ],
        "timestamp2": [
            pd.Timestamp("2022-01-01"),
            pd.Timestamp("2022-03-01"),
        ],
    }
)
I want to convert this to a list of numpy arrays so I get something like the following:
array([[Timestamp('2021-01-01 00:00:00'),
Timestamp('2022-01-01 00:00:00')],
[Timestamp('2021-03-01 00:00:00'),
Timestamp('2022-03-01 00:00:00')]], dtype=object)
I have tried df.to_numpy(), but this doesn't seem to work, as each item is a numpy.datetime64 object.
CodePudding user response:
Use a list comprehension to convert each column to a list, then build a NumPy array from those lists:
print(np.array([list(df[x]) for x in df.columns]))
[[Timestamp('2021-01-01 00:00:00') Timestamp('2021-03-01 00:00:00')]
[Timestamp('2022-01-01 00:00:00') Timestamp('2022-03-01 00:00:00')]]
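Note that this result has one DataFrame column per array row. If the row-by-row layout from the question is needed, one possible alternative is to cast the frame to object dtype before calling to_numpy(). This is not part of the answer above, so treat it as a minimal sketch; it assumes that casting to object boxes each value as a pd.Timestamp, and the name rows is only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp1": [pd.Timestamp("2021-01-01"), pd.Timestamp("2021-03-01")],
        "timestamp2": [pd.Timestamp("2022-01-01"), pd.Timestamp("2022-03-01")],
    }
)

# Casting to object first boxes each datetime64 value as a pd.Timestamp,
# so to_numpy() returns an object array with one DataFrame row per array row.
rows = df.astype(object).to_numpy()

print(rows.dtype)        # dtype of the resulting array
print(type(rows[0, 0]))  # type of a single element
print(rows[1])           # second DataFrame row
```

Transposing the array built by the list comprehension should give the same layout.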
CodePudding user response:
In [176]: df
Out[176]:
timestamp1 timestamp2
0 2021-01-01 2022-01-01
1 2021-03-01 2022-03-01
I don't know much about pd.Timestamp, but it looks like the values are actually stored the way you got them from to_numpy(), as numpy.datetime64[ns]:
In [179]: df.dtypes
Out[179]:
timestamp1 datetime64[ns]
timestamp2 datetime64[ns]
dtype: object
An individual column, a Series, has a tolist() method:
In [190]: df['timestamp1'].tolist()
Out[190]: [Timestamp('2021-01-01 00:00:00'), Timestamp('2021-03-01 00:00:00')]
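To make the difference concrete, here is a small sketch comparing the two conversions on a single column. The variable names boxed and raw are just illustrative; the point is that tolist() hands back Timestamp objects, while to_numpy() on the same Series keeps the underlying datetime64 values:

```python
import pandas as pd

s = pd.Series(
    [pd.Timestamp("2021-01-01"), pd.Timestamp("2021-03-01")],
    name="timestamp1",
)

boxed = s.tolist()   # Python list whose items are pd.Timestamp objects
raw = s.to_numpy()   # NumPy array that keeps the datetime64 storage

print(type(boxed[0]))
print(raw.dtype)
```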
That's why @jezrael's answer works:
In [191]: arr = np.array([list(df[x]) for x in df.columns])
In [192]: arr
Out[192]:
array([[Timestamp('2021-01-01 00:00:00'),
Timestamp('2021-03-01 00:00:00')],
[Timestamp('2022-01-01 00:00:00'),
Timestamp('2022-03-01 00:00:00')]], dtype=object)
Once you have an array, you can easily transpose it:
In [193]: arr.T
Out[193]:
array([[Timestamp('2021-01-01 00:00:00'),
Timestamp('2022-01-01 00:00:00')],
[Timestamp('2021-03-01 00:00:00'),
Timestamp('2022-03-01 00:00:00')]], dtype=object)
An individual Timestamp object can be converted/displayed in various ways:
In [196]: x=arr[0,0]
In [197]: type(x)
Out[197]: pandas._libs.tslibs.timestamps.Timestamp
In [198]: x.to_datetime64()
Out[198]: numpy.datetime64('2021-01-01T00:00:00.000000000')
In [199]: x.to_numpy()
Out[199]: numpy.datetime64('2021-01-01T00:00:00.000000000')
In [200]: x.to_pydatetime()
Out[200]: datetime.datetime(2021, 1, 1, 0, 0)
In [201]: print(x)
2021-01-01 00:00:00
In [202]: repr(x)
Out[202]: "Timestamp('2021-01-01 00:00:00')"
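These conversions can also be reversed. As a rough sketch (the variable names are made up for illustration), passing either the numpy.datetime64 or the datetime.datetime value back to pd.Timestamp should recover an equal Timestamp:

```python
import datetime

import numpy as np
import pandas as pd

x = pd.Timestamp("2021-01-01")

as_np = x.to_datetime64()   # numpy.datetime64 scalar
as_py = x.to_pydatetime()   # standard-library datetime.datetime

# The Timestamp constructor accepts both representations.
back_from_np = pd.Timestamp(as_np)
back_from_py = pd.Timestamp(as_py)

print(back_from_np == x, back_from_py == x)
```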