Home > Blockchain >  How to extract csv data based on user input (date range)
How to extract csv data based on user input (date range)

Time:05-17

I'm a Python beginner, we have to read in .csv data and then extract date ranged data (user input). Expected output example below. How do I loop through the reader and extract the rows within the date range (from user input)?

I think I have to convert the inputed dates and the column of dates in the .csv file to date objects using datetime.strptime but am unsure how to go about this for the dates in the .csv file. Then I have to display the number of new infections for the period, the total number of infections at the end date, the percentage of the population that is infected and the name of the region. The unknown region can be ignored and excluded from output. The .csv file contains data for approximately 3 months.

I'm thinking that I can append to an empty list the rows within the user's inputed dates and then write to a csv file? I should only be using base python, please no Pandas solution.

My current code:

import csv
from datetime import datetime
#Ask user to input the name of the file they wish to read
file_name = input("Enter the name of the CSV file:\n")
regional_data = open(file_name)
data_reader = csv.reader(regional_data)
cumulativeregional_data = list(data_reader)
#Print 1st and last date for the user before asking for a date range as input
print(f"The first record is for the {cumulativeregional_data[1][0]}\nThe last record is for the {cumulativeregional_data[-1][0]}")
start_date = input("Enter the start date:\n")
startdate_object = datetime.strptime(start_date, "%d/%m/%Y")
end_date = input("Enter the end date:\n")
enddate_object = datetime.strptime(end_date, "%d/%m/%Y")

The CSV we are reading data in from (sample in script):

date,region,region_id,total_infections, adjusted_total_infections, total_deaths, total_recoveries, current_infections, population, day_no, daily_infections, daily_deaths
01/01/2001, Unknown, U,0,0,0,0,0,0,1,0,0
01/01/2001, East,E,5000,0,20,3800,1180,150000,1,100,7
01/01/2001, North,N,3550,0,25,3150,375,180000,1,80,0
01/01/2001, Central,C,4250,0,38,3200,264,175000,1,120,0
01/01/2001, South,S,5525,0,10,5120,395,185000,1,110,0
01/01/2001, West,W,4150,0,45,3850,255,155000,1,80,0
02/02/2001, Unknown, U,0,0,0,0,0,0,2,0,0
02/02/2001, East,E,5300,0,27,3950,1323,150000,2,300,0
02/02/2001, North,N,3750,0,25,3350,375,180000,2,200,5
02/02/2001, Central,C,4350,0,38,3310,1002,175000,2,100,7
02/02/2001, South,S,5550,0,10,5220,320,185000,2,25,1
02/02/2001, West,W,4500,0,45,4000,455,155000,2,350,0
03/01/2001, Unknown, U,0,0,0,0,0,0,3,0,0
03/01/2001, East,E,5450,0,27,4000,1423,150000,3,150,10
03/01/2001, North,N,3825,0,30,3330,465,180000,3,75,3
03/01/2001, Central,C,4475,0,45,3435,995,175000,3,125,10
03/01/2001, South,S,5705,0,11,5300,394,185000,3,155,0
03/01/2001, West,W,4700,0,45,4200,455,155000,3,200,10
04/01/2001, Unknown, U,0,0,0,0,0,0,4,0,0
04/01/2001, East,E,5520,0,37,4200,1283,150000,4,70,0
04/01/2001, North,N,3910,0,33,3510,367,180000,4,85,0
04/01/2001, Central,C,4710,0,55,3550,1105,175000,4,235,0
04/01/2001, South,S,5710,0,11,5500,199,185000,4,5,0
04/01/2001, West,W,4750,0,55,4350,345,155000,4,50,0

My expected output:

Expected output

CodePudding user response:

you could do what you want with something like this:

import csv
from datetime import datetime

#Ask user to input the name of the file they wish to read
file_name = input("Enter the name of the CSV file:\n")
with open(file_name) as csvfile: # recommended when dealing with files to properly open and close files (context manager)
    data_in = list(csv.reader(csvfile))

# tell user appropriate date range (first & last date)
print(f"The first record is for the {data_in[1][0]}\nThe last record is for the {data_in[-1][0]}")
# Ask for a date range as input
start_date = datetime.strptime(input("Enter the start date:\n"), "%d/%m/%Y")
end_date = datetime.strptime(input("Enter the end date:\n"), "%d/%m/%Y")

# filter dates

# returns True if date is in between start and end
def filter_func(date):
    dt_date = datetime.strptime(date, "%d/%m/%Y")
    return (start_date <= dt_date) and (dt_date <= end_date)

# filter list to include dates, removing the headers from data_in
filtered_list = [item for item in data_in[1:] if filter_func(item[0])]

# print out data
total_period_infections = 0
print("New infections\tTotal infections\tPopulation\tPercentage\tRegion") # table headers
for item in filtered_list:
    if item[1] == ' Unknown': # filter out the unknown region 
        continue
    
    total_period_infections  = int(item[7]) # to use for the last print statement
    print(f"{item[7]}\t{item[3]}\t{item[8]}\t{round(int(item[3]) / int(item[8]), 3)}\t{item[1]}")

print(f"Total new infections for the period: {total_period_infections}")

The program filters out the 'unknown' regions. However, for the new infections I was unsure how to calculate those numbers from the data supplied. The table will require formatting for the exact version that you want, however the data has been filtered to include those dates that the user has entered and then to print out the data accordingly.

  • Related