I am working on a problem where I have to take user input which is an integer indicating the number of months I have to look back at. For example if I want to look at the data 3 months back I must take input from user as 3. Based on this integer user input I have to filter my dataset. For example today's date is 8/30/2022 so 3 months back will be 5/30/2022. Now I want to filter my dataframe to include only those rows for this date which is 3 months back i.e. 5/30/2022
I tried using datetime and relativetime libraries but nothing seems to work for me.
Below is an example of my dataframe:
so I want output to be rows corresponding to 1 and 2
CodePudding user response:
Please try to show your attempt in future, this feels like I'm doing your homework, but hopefully, this gives you an idea.
import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3], 'text1': ['Ram', 'John', 'Rich'], 'text2': ['patient', 'patient', 'child'], 'date': ['5/30/2021 10:22:00', '5/30/2022 11:45:08', '5/28/2022 10:45:13']})
user_input = int(input("Enter the number of months to look back: "))
#convert the date column to datetime
df['date'] = pd.to_datetime(df['date'])
#set the date column as index
df = df.set_index('date')
#sort the index
df = df.sort_index()
#filter the dataframe to get the rows which are within the last x months
df = df[df.index >= pd.Timestamp.today() - pd.DateOffset(months=user_input)]
#convert the index back to a column
df.reset_index(inplace=True)
#print the filtered dataframe
print(df)
Input:
Enter the number of months to look back: 12
Output:
date id text1 text2
0 2022-05-28 10:45:13 3 Rich child
1 2022-05-30 11:45:08 2 John patient
CodePudding user response:
You can use the DateOffset
function in Pandas. See the documentation for more details.
An example of this is below which assumes you have a data frame with a date column:
num_months = int(input('Please enter the number of months to look back: '))
df['date'] = pd.to_datetime(df['date'])
past_date = pd.to_datetime('today') - pd.DateOffset(months=num_months)
df = df[df['date'] >= past_date]
print(df)
The above will filter the data frame on the date
column leaving only those rows where the date is on or after the calculated date i.e. today's date minus the specified number of months.