I need to add a column to a CSV file that was created through a web imported data. The new column has to be a concatanation of two rows something like 06_2018.
New_Format_Data = ''
Output_File = open('Desktop/HW3/' state_names[counter] '.txt','w')
for counter in range(0 , len(urls)):#Will go tru all the states.
print (urls[counter])
html = urllib.request.urlopen(urls[counter]).read().decode('utf-8')#opening url
rows = html.splitlines(1)#Split the data in rows. The number 1 is very important
if counter ==0:
New_Format_Data = "Test" rows[0] #Header
for row in range(1, len(rows)): #First row...
New_Format_Data = 'Test' '\t' rows[row]#Adding that state column.
Output_File.write(New_Format_Data)#Once finished with the for loops then the it will download and close.
Output_File.close()
CodePudding user response:
I have no idea what you have in rows - and which columns you want to concatenate so I use column 1 and 2 as example.
You will have to split row (string) into list of values, next replace/add value in this list, next join all values back to single string, and then you can write to new file.
It will need to remove \n
from the end of string because it needs to add new value in the same row - so 1
in splitlines()
will be useless.
Something like this.
I get directly string from list instead of using index and range(len(..))
for row in rows[1:]: # get directly string instead of index
# convert to list
row = row.split(',')
# create new value using column 1 and 2
new_value = row[1] '_' row[2]
# append to list
row.append(new_value)
# convert back to string
row = ','.join(row)
# add new row and `\n` at the end
New_Format_Data = 'Test' '\t' row '\n'
Full code could look like this
# PEP8: at least two spaces before `#` and one space after `#
new_format_data = '' # PEP8: `lower_case_names` for variables
output_file = open('Desktop/HW3/' state_names[counter] '.txt','w')
for counter, url in enumerate(urls):
print('url:', url)
html = urllib.request.urlopen(url).read().decode('utf-8') # opening url
rows = html.splitlines() # split the data in rows. DON'T NEED `1` because I don't need `\n'
if counter == 0:
new_format_data = "Test" rows[0] ',new_columns' '\n' # header with new column
for row in rows[1:]: # get directly string instead of index
# convert to list
row = row.split(',')
# create new value using column 1 and 2
new_value = row[1] '_' row[2]
# append to list
row.append(new_value)
# convert back to string
row = ','.join(row)
new_format_data = 'Test' '\t' row '\n' # adding that state column.
# --- after loop ---
output_file.write(new_format_data) # once finished with the for loops then the it will download and close.
output_file.close()
But this may have problem if some column will have ,
in value because split
will treat it as separator. So it would be better to use standard module csv
which would resolve all problems.
Something like
import csv
output_file = open('Desktop/HW3/' state_names[counter] '.txt','w')
# create csv writer
output_csv = csv.writer(output_file)
for counter, url in enumerate(urls):
print('url:', url)
html = urllib.request.urlopen(url).read().decode('utf-8') # opening url
# read all rows from csv
rows = list(csv.reader(html.splitlines()))
if counter == 0:
headers = rows[0]
headers[0] = "Test" headers[0]
headers.append('new_colum')
# write headers
output_csv.writerow(headers)
for row in rows[1:]: # get directly string instead of index
# create new value using column 1 and 2
new_value = row[1] '_' row[2]
# append to row
row.append(new_value)
# write row
output_csv.writerow(row)
# --- after loop ---
output_file.close()
PEP 8 -- Style Guide for Python Code
CodePudding user response:
It ended up working like this:
new_format_data = '' # PEP8: `lower_case_names` for variables
output_file = open('Desktop/HW3_2/' state_names[counter] '.txt','w')
for counter, url in enumerate(urls):
print('url:', url)
html = urllib.request.urlopen(url).read().decode('utf-8') # opening url
rows = html.splitlines() # split the data in rows. DON'T NEED `1` because I don't need `\n'
if counter == 0:
# new_format_data = "Month_Year" '\t' rows[0] '\n' # header with new column
new_format_data = rows[0] "Month_Year" '\n' # header with new column
for row in rows[1:]: # get directly string instead of index
# convert to list
row = row.split('\t')
# create new value using column 1 and 2
new_value = row[2] '_' row[1]
# append to list
row.append(new_value)
# convert back to string
row = '\t'.join(row)
new_format_data = row '\n' # adding that state column.
output_file.write(new_format_data) # once finished with the for loops then the it will download and close.
output_file.close()
I want to revise, it actually was txt
format not CSV
.
Now, i'm trying to remove a column and filter the information. So, one of the columns is "year". The origional data starts from 1976
through 2022
. I only need information from 2015
till 2020
.
Tried a few things but i broke the rest of the code :(