Pandas: How to create date32 dtype datetime objects?

Time:04-08

I am working with parquet and I need to use date32[day] objects for my dates but I am unclear how to use pandas to generate this exact datatype, rather than a timestamp.

Consider this example:

from datetime import datetime, date
import pyarrow.parquet as pq
import pandas as pd

df1 = pd.DataFrame({'date': [date.today()]})
df1.to_parquet('testdates.parquet')
pq.read_table("testdates.parquet")  # date32[day]
# pandas version

df2 = pd.DataFrame({'date': [pd.to_datetime('2022-04-07')]})
df2.to_parquet('testdates2.parquet')
pq.read_table("testdates2.parquet")  # timestamp[us]

CodePudding user response:

Following the pandas integration section of the pyarrow documentation:

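import pandas as pd
import pyarrow as pa
from datetime import date

# Note: the Series holds a single element that is itself a list of dates,
# so pyarrow infers a list type whose items are date32[day]
df2 = pd.Series({'date':[date(2022,4,7)]})
df2_dat32 = pa.array(df2)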

print("dataframe:", df2)
print("value of dataframe:", df2_dat32[0])
print("datatype:", df2_dat32.type)

Output:

dataframe: date    [2022-04-07]
dtype: object
value of dataframe: [datetime.date(2022, 4, 7)]
datatype: list<item: date32[day]>

Edit: If you have an entire column of datetimes, first convert the datetime values to date objects and then use the same method as above. See the example below:

import pandas as pd
import pyarrow as pa

#create pandas DataFrame with one column with five
#datetime values through a dictionary
datetime_df = pd.DataFrame({'DateTime': ['2021-01-15 20:02:11',
                                '1989-05-24 20:34:11',
                                '2020-01-18 14:43:24',
                                '2021-01-15 20:02:10',
                                '1999-04-04 20:34:11']})

datetime_df['Date'] = pd.to_datetime(datetime_df['DateTime']).dt.date

date_series = datetime_df['Date']  # already a Series of datetime.date objects
print(date_series)

Output:

0    2021-01-15
1    1989-05-24
2    2020-01-18
3    2021-01-15
4    1999-04-04
Name: Date, dtype: object

Then use pyarrow for conversion:

df2_dat32 = pa.array(date_series)

print("datatype of values in the dataframe with dates:", type(date_series[0]))
print("value of dataframe after converting using pyarrow:", df2_dat32[0])
print("datatype after converting using pyarrow :", df2_dat32.type)

Output:

datatype of values in the dataframe with dates: <class 'datetime.date'>
value of dataframe after converting using pyarrow: 2021-01-15
datatype after converting using pyarrow : date32[day]