How to replace "00" with "N/A" in Python, skipping the first row and first column

I'm working with GWAS data that has 2 million columns and 522 rows. I need to replace "00" with "N/A" throughout the data. Since the file is huge, I'm using the open_reader method. Can anyone please help?

Note: I need to skip the first row and the first column.

sample data:

ID,kgp11270025,kgp570033,rs707,kgp7500
1,CT,GT,CA,00
200,00,TG,00,GT
300,AA,00,CG,AA
400,GG,CC,AA,TA

Desired Output:

ID,kgp11270025,kgp570033,rs707,kgp7500
1,CT,GT,CA,N/A
200,N/A,TG,N/A,GT
300,AA,N/A,CG,AA
400,GG,CC,AA,TA

The code I have written:

import re

input_file = "test.csv"
output_file = "testresult.csv"

# print("Processing data from", input_file)
with open(input_file) as f:
    lineno = 0
    for line in f:
        lineno = lineno + 1
        if (lineno == 1):
            #need to skip first line
            # print("Skipping line 1 which is a header")
            print(line.rstrip())
        else:
            # print("Processing line {}".format(lineno))
            line = re.sub(r',00', ',N/A', line.rstrip())
            print(line)
    # print("Processed {} lines".format(lineno))

I have tried this, but it is not working. Please help!

CodePudding user response：

"When I use print(line), it's showing fine"

Then just use the file keyword argument of print, as follows:

import re

input_file = "test.csv"
output_file = "testresult.csv"

# print("Processing data from", input_file)
with open(input_file) as f, open(output_file, "w") as g:
    lineno = 0
    for line in f:
        lineno = lineno + 1
        if (lineno == 1):
            #need to skip first line
            # print("Skipping line 1 which is a header")
            print(line.rstrip(),file=g)
        else:
            # print("Processing line {}".format(lineno))
            line = re.sub(r',00', ',N/A', line.rstrip())
            print(line,file=g)
    # print("Processed {} lines".format(lineno))

Note that when opening the input file, the file name alone is sufficient because the default mode is read-text, but specifying write mode ("w") is required for the output file.
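
The regular expression above matches ",00" anywhere in a line, so it would also change a longer field that merely starts with "00". As a possible alternative, here is a minimal sketch that compares whole fields using the standard-library csv module and leaves the first row and first column untouched. The names replace_missing, MISSING and REPLACEMENT are made up for this illustration and do not come from the answer; the sample text is a shortened copy of the data in the question.

```python
import csv
import io

MISSING = "00"        # genotype code treated as missing (taken from the question)
REPLACEMENT = "N/A"   # text written in its place


def replace_missing(src, dst):
    """Copy CSV rows from src to dst, replacing "00" outside row 1 and column 1."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row_number, row in enumerate(reader):
        if row_number > 0:  # row 0 is the header and is copied unchanged
            # row[:1] keeps the ID column as-is; only the later fields are compared
            row = row[:1] + [REPLACEMENT if v == MISSING else v for v in row[1:]]
        writer.writerow(row)


# Small in-memory demonstration using part of the sample data
sample = "ID,kgp11270025,kgp570033,rs707,kgp7500\n1,CT,GT,CA,00\n200,00,TG,00,GT\n"
out = io.StringIO()
replace_missing(io.StringIO(sample), out)
result = out.getvalue()
print(result, end="")
```

With real files, the same function could be called as replace_missing(f, g) inside "with open(input_file, newline="") as f, open(output_file, "w", newline="") as g:". It reads one row at a time, so memory use depends on the width of a row rather than on the number of rows.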

CodePudding user response：

You could use pandas to do this easily:

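import pandas as pd

# dtype=str keeps codes such as "00" as text instead of parsing them as numbers
df = pd.read_csv('test.csv', dtype = str)
# Replaces cells that are exactly "00" (this includes the ID column); the header is untouched
df = df.replace('00', 'N/A')
df.to_csv('test-result.csv', index = False)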

For very large CSV files, you could do this:

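chunk_size = 100  # rows per chunk; placeholder value, tune it to the available memory
header = True
for chunk in pd.read_csv('test.csv', chunksize = chunk_size, dtype = str):
    chunk = chunk.replace('00', 'N/A')
    # mode='a' appends, so 'test-result.csv' should not exist before the first chunk
    chunk.to_csv('test-result.csv', index = False, header = header, mode = 'a')
    header = False  # write the header only once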