Home > Net >  Python - Remove lines prior to current month and year
Python - Remove lines prior to current month and year

Time:08-09

I have a dataframe that contain arrival dates for vessels and I'd want to make python recognize the current year and month that we are at the moment and remove all entries that are prior to the current month and year.

I have a column with the date itself in the format '%d/%b/%Y' and columns for month and year separatly if needed.

For instance, if today is 01/01/2022. I'd like to remove everything that is from dec/2021 and prior.

CodePudding user response:

from datetime import datetime
import pandas as pd

df = ...

# assuming your date column is named 'date'
t = datetime.utcnow()
df = df[pd.to_datetime(df.date) >= datetime(t.year, t.month, t.day)]

CodePudding user response:

Let us consider this example dataframe:

import pandas as pd
import datetime

df = pd.DataFrame()

data = [['nao victoria', '21/Feb/2012'], ['argo', '6/Jun/2022'], ['kon tiki', '23/Aug/2022']]

df = pd.DataFrame(data, columns=['Vessel', 'Date'])

You can convert your dates to datetimes, by using pandas' to_datetime method; for instance, you may save the output into a new Series (column):

df['Datetime']=pd.to_datetime(df['Date'], format='%d/%b/%Y')

You end up with the following dataframe:

     Vessel         Date        Datetime
0   nao victoria  21/Feb/2012 2012-02-21
1           argo   6/Jun/2022 2022-06-06
2       kon tiki  23/Aug/2022 2022-08-23

You can then reject rows containing datetime values that are smaller than today's date, defined using datetime's now method:

df = df[df.Datetime > datetime.datetime.now()]

This returns:

     Vessel         Date   Datetime
2  kon tiki  23/Aug/2022 2022-08-23

CodePudding user response:

Using pandas periods and boolean indexing:

# set up example
df = pd.DataFrame({'date': ['01/01/2022', '08/02/2022', '09/03/2022'], 'other_col': list('ABC')})

# find dates equal or greater to this month
keep = (pd.to_datetime(df['date'], dayfirst=False)
          .dt.to_period('M')
          .ge(pd.Timestamp('today').to_period('M'))
        )

# filter
out = df[keep]

Output:

         date other_col
1  08/02/2022         B
2  09/03/2022         C
  • Related