Tying to "filter" years from a text file. [Can't use pandas or numpys]
I only want to keep years 2015-2020. So if year isn't 2015-2020 remove the record alltogether. Origional data looks like:
series_id year period value footnote_codes
LASST480000000000003 1976 M01 5.8
LASST480000000000003 1976 M02 5.8
Previosly, I split the date into list and then I tried to create an "if". . The code doesn't show any errors but is still not filtering the years. Here are a few codes I have tried:
unwanted_years={'1976','1977','1978','1979','1980','1981','1982','1983','1984','1985','1986','1987','1988','1989','1990'}
#Filtering years(First attempt):
if row[1] in unwanted_years:
rows.strip()
#Filtering years(2 attempt)
if row[1] in unwanted_years:
row.remove()
#Filtering years(3 attempt)
if row[1] in unwanted_years:
row.remove('/t')
Any help is appreciated! Full code looks like this:
#Creating string for formated columns
new_colum = ''
unwanted_years={'1976','1977','1978','1979','1980','1981','1982','1983','1984','1985','1986','1987','1988','1989','1990'}
#Calling file
output_file = open('Desktop/HW3/HW3_1/' state_names[counter] '.txt','w')
#for loop to manipulate file
for counter, url in enumerate(urls):
print('url:', url)
html = urllib.request.urlopen(url).read().decode('utf-8') # opening url
rows = html.splitlines() # split the data in rows. DON'T NEED `1` because I don't need `\n'
#Starting counter for rows
#First row/header
if counter == 0:
#Creating title for new column
new_colum = rows[0] '\t' "Month_Year" '\n'
#Removing unwanted column from header
new_colum=new_colum.replace('footnote_codes',' ')
#Next rows:
for row in rows[1:]: # get directly string instead of index
#Filtering years:
if row[1] == '1976':
row.strip()
#Converting rows into a list
row = row.split('\t')
#Creating new value using column 1 and 2 (month_year)
new_value ='\t' row[2] '_' row[1]
#Appending to list
row.append(new_value)
#Removing unwanted column from all the data
row.remove(row[4])
#Converting back to a string
row = '\t'.join(row)
#Adding that mont_year column to all the data
new_colum = row '\n'
#Saving file with changes
output_file.write(new_colum) # once finished with the for loops then the it will download and close.
output_file.close()
CodePudding user response:
It's simpler to check if the year is wanted.
wanted_years = ("2015","2016","2017","2018","2019","2020")
# just filter rows by list comprehension
filtered_rows = [r for r in rows if all(x not in r for x in wanted_years)]
CodePudding user response:
I assume you can use the csv
module since it comes with Python. If your actual delimiter is tab use delimiter='\t'
instead, but tabs don't work well on SO so I'll use comma delimiter as an example:
input.csv
series_id,year,period,value,footnote_codes
LASST480000000000003,1976,M01,5.8
LASST480000000000003,1978,M02,5.8
LASST480000000000003,1980,M01,5.8
LASST480000000000003,1982,M01,5.8
LASST480000000000003,1984,M02,5.8
LASST480000000000003,2014,M01,5.8
LASST480000000000003,2015,M01,5.8
LASST480000000000003,2016,M01,5.8
LASST480000000000003,2017,M01,5.8
LASST480000000000003,2018,M01,5.8
LASST480000000000003,2019,M01,5.8
LASST480000000000003,2020,M01,5.8
LASST480000000000003,2021,M01,5.8
LASST480000000000003,2022,M01,5.8
demo.py
import csv
with (open('input.csv', 'r', newline='') as fin,
open('output.csv', 'w', newline='') as fout):
reader = csv.reader(fin, delimiter=',')
writer = csv.writer(fout, delimiter=',')
writer.writerow(next(reader)) # move headers to output file
for row in reader:
if 2015 <= int(row[1]) <= 2020:
writer.writerow(row)
output.csv
series_id,year,period,value,footnote_codes
LASST480000000000003,2015,M01,5.8
LASST480000000000003,2016,M01,5.8
LASST480000000000003,2017,M01,5.8
LASST480000000000003,2018,M01,5.8
LASST480000000000003,2019,M01,5.8
LASST480000000000003,2020,M01,5.8
If you want to write different columns you can use DictReader/DictWriter. Below will only write the first two columns to the output:
import csv
with (open('input.csv', 'r', newline='') as fin,
open('output.csv', 'w', newline='') as fout):
reader = csv.DictReader(fin, delimiter=',')
writer = csv.DictWriter(fout, delimiter=',',
fieldnames=reader.fieldnames[:2],
extrasaction='ignore')
writer.writeheader()
for row in reader:
if 2015 <= int(row['year']) <= 2020:
writer.writerow(row)
output.csv
series_id,year
LASST480000000000003,2015
LASST480000000000003,2016
LASST480000000000003,2017
LASST480000000000003,2018
LASST480000000000003,2019
LASST480000000000003,2020