Read a txt file, convert nth column, save to separate file

Time:04-16

Given a csv file named 'myFile.csv' with a format like this:

1,20,2
2,12,3
3,43,2
4,12,3

I'd like to read in that file, multiply the number in the second column by 3, then save the updated data to a new file.

To give an idea of the workflow, I have tried this but get errors:

f = open('myFile.csv', 'r ') 
     lines=f.readlines()
     for x in lines:
       x.split(',')[1]=x.split(',')[1] * 3

f.close

#I'm assuming the second column in "lines" has now been updated (x3)

f2=open('newFile.csv', 'a')
f2.write(lines)
f2.close

CodePudding user response:

Using the csv module

import csv

lines = []
# 'r' opens the file read-only; the with block closes it automatically
with open('myFile.csv', 'r', newline='') as f1:
    reader = csv.reader(f1, delimiter=',')
    for row in reader:
        lines.append(row)

for x in lines:
    x[1] = str(int(x[1]) * 3)

lines = [",".join(line) for line in lines]

# 'w' creates newFile.csv (or overwrites it), so re-running does not append duplicate rows
with open('newFile.csv', 'w') as f2:
    f2.write("\n".join(lines))

First, every line of the csv file is parsed into a list of strings, and these lists are collected in the list lines.

Then, the second element of each line is converted to an int, multiplied by 3, and converted back to a str, so it can be written to newFile later.

Each line is converted from a list back to a comma-separated string.

Lastly, we join the list lines into a single string, separated with newlines, and write it to newFile.csv.
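
The same steps can also be sketched with csv.writer handling the output, so the manual join is not needed. This is only a sketch under a few assumptions: the function name triple_second_column and its parameters are made up for illustration, and every row is assumed to have at least two columns with an integer in the second.

```python
import csv


def triple_second_column(src_path, dst_path, factor=3):
    """Copy src_path to dst_path, multiplying the second column by factor.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # newline='' lets the csv module control line endings itself
    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # values are read as strings, so convert before multiplying
            row[1] = str(int(row[1]) * factor)
            writer.writerow(row)
```

Calling triple_second_column('myFile.csv', 'newFile.csv') would process the file row by row without holding all of it in memory.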

CodePudding user response:

Disclaimer

While this code achieves your desired effect, it overwrites the input file in place, so I recommend writing to a separate file instead. Alternatively, make a backup copy of your input file first, then let the code write to the original. This is recommended because otherwise small coding errors can change your input file in a way that makes your data unrecoverable. The assumption in my code is that you have already made a copy of your input file.

End of Disclaimer
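
One possible way to make that copy from Python is shutil.copy2 from the standard library. The helper name backup_file and the '.bak' suffix below are assumptions chosen for illustration.

```python
import shutil


def backup_file(path, suffix='.bak'):
    """Copy path to path + suffix and return the backup's path.

    Hypothetical helper for illustration only.
    """
    backup_path = path + suffix
    # copy2 copies the file contents and, where possible, its metadata
    shutil.copy2(path, backup_path)
    return backup_path
```

For example, backup_file('blankpaper.txt') would create blankpaper.txt.bak next to the input before anything is overwritten.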

The csv module is very well-suited for your problem. First we open the file for reading and writing (specified with the 'r+' option). We use csv.reader(), which returns a reader object that allows us to iterate over the lines of the csv file. Applying list() to this iterable (i.e. the reader object) returns a list of lists where each inner list represents a line from the file. For example, one of these inner lists in your case will be ['1', '20', '2']. Next, we modify the second element of each inner list. Finally, we write the result back out to the same file. Before doing so, we move the file position back to the beginning of the file with f.seek(0), so that writing starts from the beginning. writerows() then writes each line in lines to the file.

import csv

with open('blankpaper.txt', 'r+', newline='') as f:
    # returns reader object that allows iterating over lines of file
    reader = csv.reader(f)
    # list of lists such that each inner list represents a line from the file
    lines = list(reader)
    
    # modify second column values 
    for line in lines:
        line[1] = int(line[1]) * 3

    # ensure position at start of file before writing
    f.seek(0) 

    # returns writer object with useful methods for writing to files
    writer = csv.writer(f)
    # writes each line to the file
    writer.writerows(lines)
    # drop any leftover bytes in case the new contents are shorter than the old
    f.truncate()

Output

1,60,2
2,36,3
3,129,2
4,36,3
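
One caveat with rewriting a file in place: if the new contents are shorter than the old ones, the tail of the old data stays in the file unless it is truncated. A minimal sketch of the in-place pattern with that guard is below; the function name scale_second_column_in_place and its factor parameter are hypothetical, and integer values in the second column are assumed.

```python
import csv


def scale_second_column_in_place(path, factor):
    """Multiply the second column of a CSV file by factor, rewriting the same file.

    Hypothetical helper for illustration only.
    """
    with open(path, 'r+', newline='') as f:
        rows = list(csv.reader(f))
        for row in rows:
            row[1] = int(row[1]) * factor
        f.seek(0)                      # back to the start before writing
        csv.writer(f).writerows(rows)
        f.truncate()                   # cut off any old bytes past the new end
```

With a factor of 3 the output only grows, so the truncate call changes nothing; it matters for cases such as a factor of 0, where 1000 shrinks to 0.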