Home > OS >  Extract month from datetime column in pandas DataFrame
Extract month from datetime column in pandas DataFrame

Time:09-30

I have a DataFrame read from Excel with one of the columns of type DateTime.

sales_data=pandas.read_excel(r'Sample Sales Data.xlsx')

I was able to extract substrings from other columns using str.extract/lambda functions. But I was unable to process the column "Order Date"

The command sales_data['Order Date'] gives the below output

enter image description here

As recommended in enter image description here But when I tried the datetime operation

sales_data['Order Date'].apply(lambda x:x.strftime("m"))

I got the error: AttributeError: 'int' object has no attribute 'strftime' I got a similar error for the command

sales_data['Order Date'].apply(lambda x:x.dt.month)

Please suggest a method to extract month from the datetime object into another column without iterating through the DataFrame. I am not able to use datetime or int functions with this column since it is behaving as both a datetime and int column.

CodePudding user response:

Use dt accessor:

sales_data=pandas.read_excel(r'Sample Sales Data.xlsx', parse_dates=['Order Date'])
sales_data['Order Date'].dt.month

Suppose this dataframe:

>>> sales_data
            Order Date
0  2016-01-01 00:00:00
1  2016-03-28 22:00:00

>>> sales_data['month'] = sales_data['Order Date'].dt.month
>>> sales_data
           Order Date  month
0 2016-01-01 00:00:00      1
1 2016-03-28 22:00:00      3

CodePudding user response:

Try using pd.to_datetime to ensure your columns dtype is datetime. Then use dt.month to extract the month. You can also extract day and year by using dt.day, dt.year respectively.

import pandas as pd

sales_data = pd.read_excel(r'Sample Sales Data.xlsx')
sales_data['Order_Month'] = pd.to_datetime(sales_data['Order Date']).dt.month

CodePudding user response:

I found the issue. The sales_data['Order Date'] column had a mix of both date and int values due to some input data inaccuracy. I found this since

sales_data['DateType']=sales_data['Order Date'].apply(lambda x:type(x))
sales_data['DateType'].unique() 

returned array([<class 'datetime.datetime'>, <class 'int'>], dtype=object) I cleaned this DataFrame by filtering out the values without the datetype as datetime.

type1=type(sales_data['DateType'][0])
new_df=sales_data[sales_data['DataType']==type1]

Now the new dataframe supports the date and string operations.

pd.to_datetime(new_df['Order Date']).dt.month

This can be assigned to other columns.

  • Related