I would like to manually create a DataFrame in Python with daily dates in a 'date' column, as in the code below. However, the code does not produce the dates in the correct format: the dates are lost and replaced by plain numbers (the desired output is shown below). Could you please advise how I can correct the code so that the 'date' column comes out in the desired format? Thanks in advance!
------------------------------------------------------
desired format for date column
2021-03-22 3
2021-04-07 3
2021-04-18 3
2021-05-12 0
------------------------------------------------------
df1 = pd.DataFrame({"date": [2021-3-22, 2021-4-7, 2021-4-18, 2021-5-12],
"x": [3, 3, 3, 0 ]})
df1
date x
0 1996 3
1 2010 3
2 1999 3
3 2004 0
CodePudding user response:
You want to create the column as datetimes. Pass the dates as strings to pd.to_datetime, as in the code below (see the pandas.to_datetime documentation for more information):
import pandas as pd
df1 = pd.DataFrame({"date": pd.to_datetime(["2021-3-22", "2021-4-7", "2021-4-18", "2021-5-12"]),
"x": [3, 3, 3, 0 ]})
CodePudding user response:
Python interprets the unquoted sequence 2021-3-22 as a series of arithmetic operations: 2021 minus 3 minus 22, which evaluates to 1996.
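To see this directly, here is a minimal sketch that evaluates the four unquoted values from the question on their own, outside of pandas. The variable name values is only for illustration.

```python
# Unquoted, each entry is ordinary integer subtraction, not a date.
values = [2021-3-22, 2021-4-7, 2021-4-18, 2021-5-12]

# These are the same numbers that appeared in the question's 'date' column.
print(values)  # [1996, 2010, 1999, 2004]
```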
If you want each item to be stored as a string that resembles a date, you need to write it as a string literal (str) by enclosing it in quotes, as shown below.
import pandas as pd
df1 = pd.DataFrame({"date": ['2021-3-22', '2021-4-7', '2021-4-18', '2021-5-12'],
"x": [3, 3, 3, 0 ]})
The results for the date column, shown here, indicate that the column contains elements of the object datatype, which encompasses str in pandas. Notice that the strings were stored exactly as written (2021-3-22 instead of 2021-03-22).
0 2021-3-22
1 2021-4-7
2 2021-4-18
3 2021-5-12
Name: date, dtype: object
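One practical consequence of keeping the dates as strings is that they compare as text rather than as dates. The following is a small sketch of that effect; the names dates_as_text, text_order and date_order are my own and only for illustration.

```python
import pandas as pd

dates_as_text = pd.Series(["2021-3-22", "2021-4-7", "2021-4-18", "2021-5-12"])

# Strings are compared character by character, so "2021-4-18" sorts
# before "2021-4-7" (because "1" < "7").
text_order = dates_as_text.sort_values().tolist()
print(text_order)  # ['2021-3-22', '2021-4-18', '2021-4-7', '2021-5-12']

# Converted to datetimes, the same values sort chronologically.
date_order = (
    pd.to_datetime(dates_as_text)
    .sort_values()
    .dt.strftime("%Y-%m-%d")
    .tolist()
)
print(date_order)  # ['2021-03-22', '2021-04-07', '2021-04-18', '2021-05-12']
```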
If, however, you actually want them stored as datetime objects so that you can do datetime manipulations on them (e.g. determine the number of days between two dates, or filter by a specific month or year), then you need to convert the values to datetime objects.
This technique will do that:
df1['date'] = pd.to_datetime(df1['date'])
The results of this conversion are pandas datetime objects, which support nanosecond precision (I differentiate these from Python datetime objects, which are limited to microsecond precision).
0 2021-03-22
1 2021-04-07
2 2021-04-18
3 2021-05-12
Name: date, dtype: datetime64[ns]
Notice the displayed results are now formatted just as you would expect of datetimes (2021-03-22 instead of 2021-3-22).
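Once the column holds datetimes, the manipulations mentioned earlier become straightforward. Here is a rough sketch, assuming the same four rows as in the question; the column name days_since_prev and the variable april are hypothetical names chosen for this example.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": ["2021-3-22", "2021-4-7", "2021-4-18", "2021-5-12"],
    "x": [3, 3, 3, 0],
})
df1["date"] = pd.to_datetime(df1["date"])

# Number of days elapsed since the previous row (NaN for the first row).
df1["days_since_prev"] = df1["date"].diff().dt.days

# Keep only the rows whose date falls in April.
april = df1[df1["date"].dt.month == 4]

print(df1)
print(april)
```

The .dt accessor exposes the date components (year, month, day and so on) of a datetime column, which is what makes the month filter a one-liner.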