I'm using Python 3 to clean up a CSV
file that sometimes has four entries per line. For some reason the datalogger
didn't insert a new line, and this happened periodically, not sure why.
So I'm trying to remove bad characters from the CSV
and have had success, but the lines with four entries instead of two I wanted to find the delimiter and replace it with a new line.
Sounds simple enough, but I don't possess the code-fu, and I'm wondering if anyone could help. :) thanks
import csv
with open('outty1.csv', 'w', newline='') as outcsv:
writer = csv.writer(outcsv)
writer.writerow(["Date", "Temperature", "Humidity"])
text = open("temperature.csv", "r")
text = ''.join([i for i in text]) \
.replace("ÿÿ", ",")
for i in text:
if i.count(',')>1:
text.replace(",", "/n")
x = open("outty1.csv","a")
x.writelines(text)
x.close()
A sample of the temperature log before parsing it :) .
1629881977,24.27
1629882037,24.28ÿÿ1629882097,24.29
1629882157,24.31ÿÿ1629882217,23.52
1629882277,23.38ÿÿ1629882337,23.72
1629882397,23.87ÿÿ1629882457,23.92
1629882517,23.98ÿÿ1629882577,24.02
1629882637,24.08ÿÿ1629882697,24.12
1629882757,24.15
1629882817,24.19
1629882877,24.24
1629882937,24.31
1629882997,24.36
1629883057,24.40
1629883117,24.44
1629883177,24.38
1629883237,24.50
1629883298,24.60
1629883358,24.72
1629883418,24.88
1629883478,25.05
1629883538,25.23
1629883598,25.42
1629883658,25.63ÿÿ1629883718,25.85
1629883778,26.08ÿÿ1629883838,26.31
1629883898,26.53ÿÿ1629883958,26.74
1629884018,26.96ÿÿ1629884078,27.12
1629884138,27.26ÿÿ1629884198,27.38
1629884258,27.48ÿÿ1629884318,27.56
1629884378,27.63ÿÿ1629884438,27.69
1629884498,27.73.
This is my progress once I run the program
Date,Temperature,Humidity
1629881977,24.27
1629882037,24.28,1629882097,24.29
1629882157,24.31,1629882217,23.52
1629882277,23.38,1629882337,23.72
1629882397,23.87,1629882457,23.92
1629882517,23.98,1629882577,24.02
1629882637,24.08,1629882697,24.12
1629882757,24.15
1629882817,24.19
1629882877,24.24
1629882937,24.31
1629882997,24.36
1629883057,24.40
1629883117,24.44
1629883177,24.38
1629883237,24.50
1629883298,24.60
1629883358,24.72
1629883418,24.88
1629883478,25.05
1629883538,25.23
1629883598,25.42
1629883658,25.63,1629883718,25.85
1629883778,26.08,1629883838,26.31
1629883898,26.53,1629883958,26.74
1629884018,26.96,1629884078,27.12
1629884138,27.26,1629884198,27.38
1629884258,27.48,1629884318,27.56
1629884378,27.63,1629884438,27.69
1629884498,27.73
and the fixed sample output, I saw the answer once I pasted the input and compared the out put LOL :)
Date,Temperature,Humidity
1629881977,24.27
1629882037,24.28
1629882097,24.29
1629882157,24.31
1629882217,23.52
1629882277,23.38
1629882337,23.72
1629882397,23.87
1629882457,23.92
1629882517,23.98
1629882577,24.02
1629882637,24.08
1629882697,24.12
1629882757,24.15
1629882817,24.19
1629882877,24.24
1629882937,24.31
1629882997,24.36
1629883057,24.40
1629883117,24.44
1629883177,24.38
1629883237,24.50
1629883298,24.60
1629883358,24.72
1629883418,24.88
1629883478,25.05
1629883538,25.23
1629883598,25.42
1629883658,25.63
1629883718,25.85
1629883778,26.08
1629883838,26.31
1629883898,26.53
1629883958,26.74
1629884018,26.96
1629884078,27.12
1629884138,27.26
1629884198,27.38
1629884258,27.48
1629884318,27.56
1629884378,27.63
1629884438,27.69
1629884498,27.73
1629884558,27.75
the old code
text = ''.join([i for i in text]) \
.replace("ÿÿ", ",")
the new code
text = ''.join([i for i in text]) \
.replace("ÿÿ", "\n")
CodePudding user response:
If you're looking for another option to consider, here's something to try:
data = """1629881977,24.27
1629882037,24.28ÿÿ1629882097,24.29
1629882757,24.15
"""
# spliting on the garbage chars has a side-effect of removing them
a = data.split('ÿÿ')
# then simply join() to reassemble the original data
b = '\n'.join(a)
Or, as a one-liner:
fixed_data = '\n'.join(data.split('ÿÿ'))