How to convert a dataframe of Timestamps to a numpy list of Timestamps

Time:04-26

I have a dataframe like the following:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp1": [
            pd.Timestamp("2021-01-01"),
            pd.Timestamp("2021-03-01"),
        ],
        "timestamp2": [
            pd.Timestamp("2022-01-01"),
            pd.Timestamp("2022-03-01"),
        ],
    }
)

I want to convert this to a NumPy array of Timestamp objects, one row per DataFrame row, so I get something like the following:

array([[Timestamp('2021-01-01 00:00:00'),
        Timestamp('2022-01-01 00:00:00')],
       [Timestamp('2021-03-01 00:00:00'),
        Timestamp('2022-03-01 00:00:00')]], dtype=object)

I have tried df.to_numpy(), but that doesn't work because each item comes back as a numpy.datetime64 object rather than a Timestamp.
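To see this behaviour concretely, here is a minimal sketch that rebuilds the same DataFrame and inspects what to_numpy() returns. The exact datetime64 unit can differ between pandas versions, so the comments only describe the dtype kind, not a specific unit.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp1": [pd.Timestamp("2021-01-01"), pd.Timestamp("2021-03-01")],
        "timestamp2": [pd.Timestamp("2022-01-01"), pd.Timestamp("2022-03-01")],
    }
)

# Both columns share one datetime64 dtype, so to_numpy() returns a plain
# datetime64 ndarray instead of an object array of pd.Timestamp values.
raw = df.to_numpy()
print(raw.dtype)        # a datetime64[...] dtype; the unit depends on the pandas version
print(type(raw[0, 0]))  # <class 'numpy.datetime64'>
```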

Answer:

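Use a list comprehension to convert each column to a list of Timestamps, then build a NumPy array from those lists. Note that each row of the result holds one column of the DataFrame:

print(np.array([list(df[x]) for x in df.columns]))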
[[Timestamp('2021-01-01 00:00:00') Timestamp('2021-03-01 00:00:00')]
 [Timestamp('2022-01-01 00:00:00') Timestamp('2022-03-01 00:00:00')]]
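A self-contained version of this approach, as a sketch. The explicit dtype=object is an addition not present in the answer above; it only makes the object dtype explicit instead of relying on NumPy's inference.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp1": [pd.Timestamp("2021-01-01"), pd.Timestamp("2021-03-01")],
        "timestamp2": [pd.Timestamp("2022-01-01"), pd.Timestamp("2022-03-01")],
    }
)

# Iterating a datetime64 Series yields pd.Timestamp objects,
# so list(df[x]) is a plain Python list of Timestamps for column x.
cols = [list(df[x]) for x in df.columns]

# Row i of arr holds column i of df (columns become rows).
arr = np.array(cols, dtype=object)
print(arr)
```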

Answer:

In [176]: df
Out[176]: 
  timestamp1 timestamp2
0 2021-01-01 2022-01-01
1 2021-03-01 2022-03-01

I don't know much about pd.Timestamp, but it looks like the values are actually stored the way to_numpy() returned them, as numpy.datetime64[ns]:

In [179]: df.dtypes
Out[179]: 
timestamp1    datetime64[ns]
timestamp2    datetime64[ns]
dtype: object
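This storage/display split can be checked directly. A minimal sketch showing that the column dtype is datetime64 while scalar access through iloc or at hands back a pd.Timestamp (pandas "boxes" the stored value when it is read out as a scalar):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp1": [pd.Timestamp("2021-01-01"), pd.Timestamp("2021-03-01")],
        "timestamp2": [pd.Timestamp("2022-01-01"), pd.Timestamp("2022-03-01")],
    }
)

print(df.dtypes)  # both columns report a datetime64 dtype

# Reading a single value boxes it into a pd.Timestamp.
first = df.iloc[0, 0]           # positional scalar access
same = df.at[0, "timestamp1"]   # label-based scalar access
print(type(first), type(same))
```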

An individual column (a Series) has a tolist() method that returns Timestamp objects:

In [190]: df['timestamp1'].tolist()
Out[190]: [Timestamp('2021-01-01 00:00:00'), Timestamp('2021-03-01 00:00:00')]
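This is the key difference from converting through NumPy first. A sketch contrasting the two, assuming nanosecond resolution (the astype("datetime64[ns]") call forces it, since newer pandas versions may infer a coarser unit). NumPy's own tolist() cannot represent nanoseconds with datetime.datetime, so it returns plain integers instead.

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.to_datetime(["2021-01-01", "2021-03-01"]))

# Series.tolist() boxes each value as a pd.Timestamp.
boxed = s.tolist()

# ndarray.tolist() on a datetime64[ns] array yields Python ints
# (nanoseconds since the Unix epoch), because datetime.datetime
# only has microsecond precision.
raw = s.to_numpy().astype("datetime64[ns]").tolist()

print(boxed)
print(raw)  # [1609459200000000000, 1614556800000000000]
```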

That's why @jezrael's answer works: iterating over a column with list() also yields Timestamp objects.

In [191]: arr = np.array([list(df[x]) for x in df.columns])
In [192]: arr
Out[192]: 
array([[Timestamp('2021-01-01 00:00:00'),
        Timestamp('2021-03-01 00:00:00')],
       [Timestamp('2022-01-01 00:00:00'),
        Timestamp('2022-03-01 00:00:00')]], dtype=object)

Once you have an array, you can easily transpose it:

In [193]: arr.T
Out[193]: 
array([[Timestamp('2021-01-01 00:00:00'),
        Timestamp('2022-01-01 00:00:00')],
       [Timestamp('2021-03-01 00:00:00'),
        Timestamp('2022-03-01 00:00:00')]], dtype=object)
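If the goal is the row layout from the question, another option (not from the answers here, offered only as a sketch) is to cast the columns to object before calling to_numpy(). pandas then holds each value as a pd.Timestamp, so no transpose is needed. The check below compares it with the transposed list-comprehension result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp1": [pd.Timestamp("2021-01-01"), pd.Timestamp("2021-03-01")],
        "timestamp2": [pd.Timestamp("2022-01-01"), pd.Timestamp("2022-03-01")],
    }
)

# List-comprehension approach, transposed so each row matches a DataFrame row.
arr_t = np.array([list(df[x]) for x in df.columns], dtype=object).T

# Alternative: cast every column to object first; to_numpy() then returns
# an object array of pd.Timestamp values, already in row order.
direct = df.astype(object).to_numpy()
print(direct)
```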

An individual Timestamp object can be converted/displayed in various ways:

In [196]: x=arr[0,0]
In [197]: type(x)
Out[197]: pandas._libs.tslibs.timestamps.Timestamp
In [198]: x.to_datetime64()
Out[198]: numpy.datetime64('2021-01-01T00:00:00.000000000')
In [199]: x.to_numpy()
Out[199]: numpy.datetime64('2021-01-01T00:00:00.000000000')
In [200]: x.to_pydatetime()
Out[200]: datetime.datetime(2021, 1, 1, 0, 0)
In [201]: print(x)
2021-01-01 00:00:00
In [202]: repr(x)
Out[202]: "Timestamp('2021-01-01 00:00:00')"
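The same per-element conversions can be applied to every cell of an object array. A minimal sketch using plain nested list comprehensions; the names as_py and as_dt64 are illustrative only.

```python
import datetime

import numpy as np
import pandas as pd

arr = np.array(
    [
        [pd.Timestamp("2021-01-01"), pd.Timestamp("2022-01-01")],
        [pd.Timestamp("2021-03-01"), pd.Timestamp("2022-03-01")],
    ],
    dtype=object,
)

# Timestamp -> datetime.datetime for every cell, kept in an object array.
as_py = np.array([[ts.to_pydatetime() for ts in row] for row in arr], dtype=object)

# Timestamp -> numpy.datetime64 for every cell; NumPy infers a datetime64 dtype.
as_dt64 = np.array([[ts.to_datetime64() for ts in row] for row in arr])

print(as_py)
print(as_dt64.dtype)
```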