How to return values only within a specific date range?

Time:02-11

I have a program that scrapes an API and extracts the required values from its fields. Each JSON object has a field called published_date. I want to publish only the values from the last 2 months, counting back from the current date.

try:
    price = str(price).replace(',', '')
    price = Decimal(price)

    if date < end:
        if not math.isnan(price):
            report_item = PriceItem(
                source=SOURCE,
                source_url=crawled_url,
                original_index_id=original_index_id,
                index_specification=index_specification,
                published_date=date,
                price=price.quantize(Decimal('1.00'))
            )

            yield report_item
except DecimalException as ex:
    self.logger.error(f"Non decimal price of {price} "
                      f"found in {original_index_id}", exc_info=ex)

The published date is extracted:

for report_date in REPORT_DATE_TYPES:
    if report_date in result:
        date = result[report_date].split(' ')[0]
        date = datetime.strptime(date, '%m/%d/%Y')

MAX_REPORT_MONTHS = 3
current_date = datetime.now()
current_date_str = current_date.strftime('%m/%d/%Y')
start = datetime.strptime(current_date_str, '%m/%d/%Y')

last_date = current_date - relativedelta(months=MAX_REPORT_MONTHS)
last_date_str = last_date.strftime('%m/%d/%Y')
end = datetime.strptime(last_date_str, '%m/%d/%Y')

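In the code above, start is the current date and end is the date MAX_REPORT_MONTHS months earlier; each is converted to a string and parsed back so that the time of day is dropped.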
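
The string round trip above only serves to zero out the time of day. A minimal sketch of an equivalent computation, assuming the same dateutil relativedelta that the question already uses; cutoff_for is a hypothetical helper name, and the reference date is fixed here only to make the output reproducible:

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta

MAX_REPORT_MONTHS = 3

def cutoff_for(now, months=MAX_REPORT_MONTHS):
    """Return midnight of the day `months` calendar months before `now`."""
    # Zero out the time of day instead of formatting and re-parsing a string.
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - relativedelta(months=months)

# Fixed reference date (an assumption, for reproducibility);
# in the scraper this would be datetime.now().
now = datetime(2022, 2, 11, 15, 30)
end = cutoff_for(now)
print(end)  # 2021-11-11 00:00:00
```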

Extract of the API response: [image not available]

CodePudding user response:

Once you have gathered the data into a DataFrame, you can convert the column containing the dates to datetime and then use comparison operators to keep only the desired rows.

For example, assuming this is your data:

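import pandas as pd
from datetime import datetime
from dateutil.relativedelta import relativedelta

data = {
    'date': ['02/02/2022 10:23:23', '09/23/2021 10:23:23', '02/01/2021 10:23:23', '12/15/2021 10:23:23'],
    'random': [324, 231, 213, 123],
}
df = pd.DataFrame(data)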

# convert date column to datetime
df['date'] = pd.to_datetime(df['date'], format="%m/%d/%Y %H:%M:%S")

# select "threshold" date, two months before current one
current_date = datetime.now()
last_date = current_date - relativedelta(months=2)

# select data published after last_date
df[df['date'] > last_date]

Using today's date as the reference (February 2022 at the time of writing), this gives the following result.

Before:

                   date  random
0   02/02/2022 10:23:23     324
1   09/23/2021 10:23:23     231
2   02/01/2021 10:23:23     213
3   12/15/2021 10:23:23     123

After:

                   date  random
0   2022-02-02 10:23:23     324
3   2021-12-15 10:23:23     123
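
If the items are yielded one by one from a scraper rather than collected into a DataFrame, the same comparison can be applied to each record. A minimal sketch, assuming each record carries a date string in the %m/%d/%Y format shown in the question; within_range, the sample records, and the fixed cutoff are illustrative and not part of the original code:

```python
from datetime import datetime

def within_range(records, cutoff, date_key="published_date"):
    """Yield only the records whose date is on or after `cutoff`."""
    for record in records:
        # Keep the date part only, as the question does with split(' ')[0].
        date_str = record[date_key].split(" ")[0]
        published = datetime.strptime(date_str, "%m/%d/%Y")
        if published >= cutoff:
            yield record

# Hypothetical sample data mirroring the dates used in the answer.
records = [
    {"published_date": "02/02/2022 10:23:23", "price": "324"},
    {"published_date": "09/23/2021 10:23:23", "price": "231"},
    {"published_date": "02/01/2021 10:23:23", "price": "213"},
    {"published_date": "12/15/2021 10:23:23", "price": "123"},
]

# Fixed cutoff so the output is reproducible: two months before 11 Feb 2022.
cutoff = datetime(2021, 12, 11)
recent = list(within_range(records, cutoff))
print([r["published_date"] for r in recent])
# ['02/02/2022 10:23:23', '12/15/2021 10:23:23']
```

Note the direction of the comparison: published >= cutoff keeps the recent records, whereas the question's date < end keeps everything older than the cutoff, which is the opposite of the intended range.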