I have a DataFrame read from Excel with one of the columns of type DateTime.
sales_data=pandas.read_excel(r'Sample Sales Data.xlsx')
I was able to extract substrings from other columns using str.extract/lambda functions. But I was unable to process the column "Order Date"
The command sales_data['Order Date']
gives the below output
As recommended in But when I tried the datetime operation
sales_data['Order Date'].apply(lambda x:x.strftime("m"))
I got the error: AttributeError: 'int' object has no attribute 'strftime' I got a similar error for the command
sales_data['Order Date'].apply(lambda x:x.dt.month)
Please suggest a method to extract month from the datetime object into another column without iterating through the DataFrame. I am not able to use datetime or int functions with this column since it is behaving as both a datetime and int column.
CodePudding user response:
Use dt
accessor:
sales_data=pandas.read_excel(r'Sample Sales Data.xlsx', parse_dates=['Order Date'])
sales_data['Order Date'].dt.month
Suppose this dataframe:
>>> sales_data
Order Date
0 2016-01-01 00:00:00
1 2016-03-28 22:00:00
>>> sales_data['month'] = sales_data['Order Date'].dt.month
>>> sales_data
Order Date month
0 2016-01-01 00:00:00 1
1 2016-03-28 22:00:00 3
CodePudding user response:
Try using pd.to_datetime
to ensure your columns dtype is datetime
. Then use dt.month
to extract the month. You can also extract day and year by using dt.day
, dt.year
respectively.
import pandas as pd
sales_data = pd.read_excel(r'Sample Sales Data.xlsx')
sales_data['Order_Month'] = pd.to_datetime(sales_data['Order Date']).dt.month
CodePudding user response:
I found the issue.
The sales_data['Order Date']
column had a mix of both date and int values due to some input data inaccuracy.
I found this since
sales_data['DateType']=sales_data['Order Date'].apply(lambda x:type(x))
sales_data['DateType'].unique()
returned array([<class 'datetime.datetime'>, <class 'int'>], dtype=object)
I cleaned this DataFrame by filtering out the values without the datetype as datetime.
type1=type(sales_data['DateType'][0])
new_df=sales_data[sales_data['DataType']==type1]
Now the new dataframe supports the date and string operations.
pd.to_datetime(new_df['Order Date']).dt.month
This can be assigned to other columns.