Home > Software engineering >  Remove excess delimiter from CSV file
Remove excess delimiter from CSV file

Time:12-15

I'm using Python 3 to clean up a CSV file that sometimes has four entries per line. For some reason the datalogger didn't insert a new line, and this happened periodically, not sure why.

So I'm trying to remove bad characters from the CSV and have had success, but the lines with four entries instead of two I wanted to find the delimiter and replace it with a new line.

Sounds simple enough, but I don't possess the code-fu, and I'm wondering if anyone could help. :) thanks

import csv 


with open('outty1.csv', 'w', newline='') as outcsv:
    writer = csv.writer(outcsv)
    writer.writerow(["Date", "Temperature", "Humidity"])

text = open("temperature.csv", "r")
text = ''.join([i for i in text]) \
    .replace("ÿÿ", ",")


for i in text:
    if i.count(',')>1:
        text.replace(",", "/n")

x = open("outty1.csv","a")
x.writelines(text)
x.close()

A sample of the temperature log before parsing it :) .

1629881977,24.27
1629882037,24.28ÿÿ1629882097,24.29
1629882157,24.31ÿÿ1629882217,23.52
1629882277,23.38ÿÿ1629882337,23.72
1629882397,23.87ÿÿ1629882457,23.92
1629882517,23.98ÿÿ1629882577,24.02
1629882637,24.08ÿÿ1629882697,24.12
1629882757,24.15
1629882817,24.19
1629882877,24.24
1629882937,24.31
1629882997,24.36
1629883057,24.40
1629883117,24.44
1629883177,24.38
1629883237,24.50
1629883298,24.60
1629883358,24.72
1629883418,24.88
1629883478,25.05
1629883538,25.23
1629883598,25.42
1629883658,25.63ÿÿ1629883718,25.85
1629883778,26.08ÿÿ1629883838,26.31
1629883898,26.53ÿÿ1629883958,26.74
1629884018,26.96ÿÿ1629884078,27.12
1629884138,27.26ÿÿ1629884198,27.38
1629884258,27.48ÿÿ1629884318,27.56
1629884378,27.63ÿÿ1629884438,27.69
1629884498,27.73.

This is my progress once I run the program

Date,Temperature,Humidity
1629881977,24.27
1629882037,24.28,1629882097,24.29
1629882157,24.31,1629882217,23.52
1629882277,23.38,1629882337,23.72
1629882397,23.87,1629882457,23.92
1629882517,23.98,1629882577,24.02
1629882637,24.08,1629882697,24.12
1629882757,24.15
1629882817,24.19
1629882877,24.24
1629882937,24.31
1629882997,24.36
1629883057,24.40
1629883117,24.44
1629883177,24.38
1629883237,24.50
1629883298,24.60
1629883358,24.72
1629883418,24.88
1629883478,25.05
1629883538,25.23
1629883598,25.42
1629883658,25.63,1629883718,25.85
1629883778,26.08,1629883838,26.31
1629883898,26.53,1629883958,26.74
1629884018,26.96,1629884078,27.12
1629884138,27.26,1629884198,27.38
1629884258,27.48,1629884318,27.56
1629884378,27.63,1629884438,27.69
1629884498,27.73

and the fixed sample output, I saw the answer once I pasted the input and compared the out put LOL :)

Date,Temperature,Humidity
1629881977,24.27
1629882037,24.28
1629882097,24.29
1629882157,24.31
1629882217,23.52
1629882277,23.38
1629882337,23.72
1629882397,23.87
1629882457,23.92
1629882517,23.98
1629882577,24.02
1629882637,24.08
1629882697,24.12
1629882757,24.15
1629882817,24.19
1629882877,24.24
1629882937,24.31
1629882997,24.36
1629883057,24.40
1629883117,24.44
1629883177,24.38
1629883237,24.50
1629883298,24.60
1629883358,24.72
1629883418,24.88
1629883478,25.05
1629883538,25.23
1629883598,25.42
1629883658,25.63
1629883718,25.85
1629883778,26.08
1629883838,26.31
1629883898,26.53
1629883958,26.74
1629884018,26.96
1629884078,27.12
1629884138,27.26
1629884198,27.38
1629884258,27.48
1629884318,27.56
1629884378,27.63
1629884438,27.69
1629884498,27.73
1629884558,27.75

the old code

text = ''.join([i for i in text]) \
    .replace("ÿÿ", ",")

the new code

text = ''.join([i for i in text]) \
    .replace("ÿÿ", "\n")

CodePudding user response:

If you're looking for another option to consider, here's something to try:

data = """1629881977,24.27
1629882037,24.28ÿÿ1629882097,24.29
1629882757,24.15
"""

# spliting on the garbage chars has a side-effect of removing them
a = data.split('ÿÿ')

# then simply join() to reassemble the original data
b = '\n'.join(a)

Or, as a one-liner:

fixed_data = '\n'.join(data.split('ÿÿ'))
  • Related