Home > Back-end >  Modify CSV file [Can't use pandas or numpys]
Modify CSV file [Can't use pandas or numpys]

Time:03-21

I need to add a column to a CSV file that was created through a web imported data. The new column has to be a concatanation of two rows something like 06_2018.

New_Format_Data = ''

Output_File = open('Desktop/HW3/'   state_names[counter]    '.txt','w')

for counter in range(0 , len(urls)):#Will go tru all the states.
    print (urls[counter])
    
    html = urllib.request.urlopen(urls[counter]).read().decode('utf-8')#opening url
    
    rows = html.splitlines(1)#Split the data in rows. The number 1 is very important
    
    if counter ==0:
        New_Format_Data = "Test"   rows[0] #Header
        
    for row in range(1, len(rows)): #First row...
            
        New_Format_Data  = 'Test'   '\t'   rows[row]#Adding that state column.
    
Output_File.write(New_Format_Data)#Once finished with the for loops then the it will download and close.
Output_File.close()

CodePudding user response:

I have no idea what you have in rows - and which columns you want to concatenate so I use column 1 and 2 as example.

You will have to split row (string) into list of values, next replace/add value in this list, next join all values back to single string, and then you can write to new file.

It will need to remove \n from the end of string because it needs to add new value in the same row - so 1 in splitlines() will be useless.


Something like this.

I get directly string from list instead of using index and range(len(..))

for row in rows[1:]:  # get directly string instead of index
    
    # convert to list
    row = row.split(',')
    
    # create new value using column 1 and 2
    new_value = row[1]   '_'   row[2]
    
    # append to list
    row.append(new_value)
    
    # convert back to string
    row = ','.join(row)
    
    # add new row and `\n` at the end
    New_Format_Data  = 'Test'   '\t'   row   '\n'

Full code could look like this

# PEP8: at least two spaces before `#` and one space after `#

new_format_data = ''  # PEP8: `lower_case_names` for variables

output_file = open('Desktop/HW3/'   state_names[counter]    '.txt','w')

for counter, url in enumerate(urls):
    print('url:', url)
    
    html = urllib.request.urlopen(url).read().decode('utf-8')  # opening url
    
    rows = html.splitlines()  # split the data in rows. DON'T NEED `1` because I don't need `\n'
    
    if counter == 0:
        new_format_data = "Test"   rows[0]   ',new_columns'   '\n'  # header with new column
        
    for row in rows[1:]:  # get directly string instead of index
        
        # convert to list
        row = row.split(',')
        
        # create new value using column 1 and 2
        new_value = row[1]   '_'   row[2]
        
        # append to list
        row.append(new_value)
        
        # convert back to string
        row = ','.join(row)
                    
        new_format_data  = 'Test'   '\t'   row   '\n'  # adding that state column.
    
# --- after loop ---

output_file.write(new_format_data)  # once finished with the for loops then the it will download and close.
output_file.close()

But this may have problem if some column will have , in value because split will treat it as separator. So it would be better to use standard module csv which would resolve all problems.

Something like

import csv

output_file = open('Desktop/HW3/'   state_names[counter]    '.txt','w')

# create csv writer 
output_csv = csv.writer(output_file)

for counter, url in enumerate(urls):
    print('url:', url)
    
    html = urllib.request.urlopen(url).read().decode('utf-8')  # opening url
    
    # read all rows from csv 
    rows = list(csv.reader(html.splitlines()))
                
    if counter == 0:
        
        headers = rows[0]
        
        headers[0] = "Test"   headers[0]
        headers.append('new_colum')
        
        # write headers
        output_csv.writerow(headers)
        
    for row in rows[1:]:  # get directly string instead of index
        # create new value using column 1 and 2
        new_value = row[1]   '_'   row[2]
        
        # append to row
        row.append(new_value)
        
        # write row
        output_csv.writerow(row)
    
# --- after loop ---

output_file.close()

PEP 8 -- Style Guide for Python Code

CodePudding user response:

It ended up working like this:

new_format_data = ''  # PEP8: `lower_case_names` for variables

output_file = open('Desktop/HW3_2/'   state_names[counter]    '.txt','w')

for counter, url in enumerate(urls):
    print('url:', url)
    
    html = urllib.request.urlopen(url).read().decode('utf-8')  # opening url
    
    rows = html.splitlines()  # split the data in rows. DON'T NEED `1` because I don't need `\n'
    
    if counter == 0:
       # new_format_data = "Month_Year"   '\t'   rows[0]   '\n'  # header with new column
        
        new_format_data = rows[0]   "Month_Year" '\n'  # header with new column
    for row in rows[1:]:  # get directly string instead of index
        
        # convert to list
        row = row.split('\t')
        
        # create new value using column 1 and 2
        new_value = row[2]   '_'   row[1]
        
        # append to list
        row.append(new_value)
        
        # convert back to string
        row = '\t'.join(row)
                    
        new_format_data  = row   '\n'  # adding that state column.
   
output_file.write(new_format_data)  # once finished with the for loops then the it will download and close.
output_file.close()

I want to revise, it actually was txt format not CSV. Now, i'm trying to remove a column and filter the information. So, one of the columns is "year". The origional data starts from 1976 through 2022. I only need information from 2015 till 2020. Tried a few things but i broke the rest of the code :(

  • Related