I am trying to extract some values from a Covid database. The following code works as I want, but I have a question about it (see after the code):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def main():
    pd.set_option('display.max_rows', None)
    df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
    df = df[df["Country/Region"] == "Italy"]
    df = df.drop(columns=["Province/State", "Lat", "Long", "Country/Region"])
    df = df.columns.to_frame().T.append(df, ignore_index=True)
    df.columns = range(len(df.columns))
    df = df.T
    df = df.rename(columns={0: 'date', 1: 'nuovi_casi'})
    df['nuovi_casi'] = df['nuovi_casi'].diff(periods=1).fillna(1)
    df = df[(df['date'] > '11/26/21') & (df['date'] <= '12/8/21')]
    print(df)
    dati_giornalieri = list(df.nuovi_casi)
    sommatoriaitalia = (sum(dati_giornalieri) / 1390000000) * 100
    print(sommatoriaitalia)
    print(dati_giornalieri)
Now I want to extend the code so that it asks the user for the start date and the end date:
def main():
    start_date = str(input("Enter starting date in format mm/dd/yy"))
    end_date = str(input("Enter ending date in format mm/dd/yy"))
    pd.set_option('display.max_rows', None)
    df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
    df = df[df["Country/Region"] == "Italy"]
    df = df.drop(columns=["Province/State", "Lat", "Long", "Country/Region"])
    df = df.columns.to_frame().T.append(df, ignore_index=True)
    df.columns = range(len(df.columns))
    df = df.T
    df = df.rename(columns={0: 'date', 1: 'nuovi_casi'})
    df['nuovi_casi'] = df['nuovi_casi'].diff(periods=1).fillna(1)
    df = df[(df['date'] > start_date) & (df['date'] <= end_date)]
But the line df = df[(df['date'] > start_date) & (df['date'] <= end_date)] raises an error, because it cannot compare a date with a string. I also tried parsing the input with datetime:
start_date = datetime.strptime(input('Enter Start date in the format m/d/y'), '%m/%d/%y')
but the result is still wrong: for some reason it seems to pick up only one day per month, or something similar, so the output is not what I want.
How can I solve this and select only the days in between? Thanks.
CodePudding user response:
Convert the values to datetime before comparing:
start_date = pd.to_datetime(start_date, format="%m/%d/%y")
end_date = pd.to_datetime(end_date, format="%m/%d/%y")
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%y")
df = df[df["date"].between(start_date, end_date, inclusive="right")]
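To see why the conversion matters, here is a minimal, self-contained sketch on made-up sample data (the filter_by_date helper and the sample values are hypothetical, not part of the question's code). It assumes the dates are mm/dd/yy strings, as in the question. Compared as plain strings, dates are ordered character by character, so "2/5/21" sorts after "11/26/21"; after pd.to_datetime they are ordered chronologically. The explicit mask below selects start < date <= end, which should match between(..., inclusive="right").

```python
import pandas as pd


def filter_by_date(df, start_date, end_date, fmt="%m/%d/%y"):
    """Return rows with start_date < date <= end_date.

    Hypothetical helper: both bounds and the 'date' column are
    assumed to be strings in the given format.
    """
    start = pd.to_datetime(start_date, format=fmt)
    end = pd.to_datetime(end_date, format=fmt)
    out = df.copy()  # leave the caller's frame untouched
    out["date"] = pd.to_datetime(out["date"], format=fmt)
    mask = (out["date"] > start) & (out["date"] <= end)
    return out[mask]


# Made-up values, shaped like the frame built in the question
sample = pd.DataFrame({
    "date": ["11/26/21", "11/27/21", "12/1/21", "12/8/21", "12/9/21", "2/5/21"],
    "nuovi_casi": [10, 20, 30, 40, 50, 60],
})

# As strings the comparison is lexical: "2" sorts after "1"
string_compare = "2/5/21" > "11/26/21"
print(string_compare)

result = filter_by_date(sample, "11/26/21", "12/8/21")
print(result)
```

In the question's code, start_date and end_date would come from input() and could be passed to such a helper unchanged, since the parsing happens inside it.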