Home > Enterprise >  Reconstructing, filling the gaps in .csv using for line in, and .append in loop
Reconstructing, filling the gaps in .csv using for line in, and .append in loop

Time:04-22

Pls help im tired. cant see why how to make it work.

Problem to solve: a .csv file has to have suposedly 1sec data like:

time,open,high,low,close,Extremum,Fib 1,Fib 2,Fib 3,l100
2022-04-03 02:00:00,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:04,3.294,3.295,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:05,3.293,3.293,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:07,3.293,3.293,3.293,3.293,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:08,3.293,3.293,3.293,3.293,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:09,3.292,3.292,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634

but it hasn't. Some seconds are not present, so im making it with last seen data basicly reads a line:

sep = ','    
data = line.split(sep)

and data[1] to data[9] stays the same, only data[0] changes 1 second, to fill the gap:

time,open,high,low,close,Extremum,Fib 1,Fib 2,Fib 3,l100
2022-04-03 02:00:00,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:01,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:02,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:03,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:04,3.294,3.295,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:05,3.293,3.293,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:06,3.293,3.293,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:07,3.293,3.293,3.293,3.293,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:08,3.293,3.293,3.293,3.293,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:09,3.292,3.292,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634

i made the logic for this, just the .append makes me tricks.. the output is for every line in source csv file, it produces me same amount of rows in output csv file, but all records are with the same, last line of a source file, what the f, :

time,open,high,low,close,Extremum,Fib 1,Fib 2,Fib 3,l100
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634

whyyyyyy, heres the code:

import glob
import datetime
import time
import pandas as pd

# make files sync on sec's

filenames = [i for i in glob.glob("*unique_sorted.csv")]

for filename in filenames:
    coin_name = filename[0:18] 
    print(filename)
    with open (filename, "r") as f:                     
        x = -1
        memory = {}
        memory["data"] = {}
        memory["data"]["time"] = {}
        
        new_file = []
        tolist = {}
        tolist["total"] = {}
        tolist["memory"] = {}
        sep = ','
        start = 1
        for line in f:
            data = line.split(sep)
            if data[0] != "time":
                x = x   1
                if start == 0 and data[0][0:19] != sec_1_more:
                    memory_time = datetime.datetime.strptime(tolist["memory"]['time'], "%Y-%m-%d %H:%M:%S") # data[0] from previous line
                    read_line_time = datetime.datetime.strptime(data[0][0:19], "%Y-%m-%d %H:%M:%S") # current_line
                    diff = read_line_time - sec_1_more
                    diff_sec = diff.total_seconds()
                    sec = int(diff_sec)
                    i = 1
                    while i < sec:
                        time_for_same_data = memory_time   datetime.timedelta(seconds=1) # 2:00:00   1 second
                        time_for_same_data_str = str(time_for_same_data)
                        #2022-04-03 02:00:04,1.4073,1.4073,1.4071,1.4072,1.375,1.4137077251573131,1.4242302135495926,1.4304935994973778,1.437633859477853
                        tolist["total"]['time'] = time_for_same_data_str
                        tolist["total"]['open'] = data[1]
                        tolist["total"]['high'] = data[2]
                        tolist["total"]['low'] = data[3]
                        tolist["total"]['close'] = data[4]
                        tolist["total"]['Extremum'] = data[5]
                        tolist["total"]['Fib 1'] = data[6]
                        tolist["total"]['Fib 2'] = data[7]
                        tolist["total"]['Fib 3'] = data[8]
                        tolist["total"]['l100'] = data[9].strip()
                        new_file.append(tolist["total"])
                        memory_time = memory_time   datetime.timedelta(seconds=1)
                        i = i   1
                    tolist["total"]['time'] = data[0]
                    tolist["total"]['open'] = data[1]
                    tolist["total"]['high'] = data[2]
                    tolist["total"]['low'] = data[3]
                    tolist["total"]['close'] = data[4]
                    tolist["total"]['Extremum'] = data[5]
                    tolist["total"]['Fib 1'] = data[6]
                    tolist["total"]['Fib 2'] = data[7]
                    tolist["total"]['Fib 3'] = data[8]
                    tolist["total"]['l100'] = data[9].strip()
                    new_file.append(tolist["total"])
                    
                elif start == 0 and data[0][0:19] == sec_1_more:
                    tolist["total"]['time'] = data[0]
                    tolist["total"]['open'] = data[1]
                    tolist["total"]['high'] = data[2]
                    tolist["total"]['low'] = data[3]
                    tolist["total"]['close'] = data[4]
                    tolist["total"]['Extremum'] = data[5]
                    tolist["total"]['Fib 1'] = data[6]
                    tolist["total"]['Fib 2'] = data[7]
                    tolist["total"]['Fib 3'] = data[8]
                    tolist["total"]['l100'] = data[9].strip()
                    new_file.append(tolist["total"])
                    

                memory["data"]["data"] = str(line)
                memory["data"]["time"] = str(data[0][0:19]) 
                
                tolist["memory"]['time'] = data[0]
                tolist["memory"]['open'] = data[1]
                tolist["memory"]['high'] = data[2]
                tolist["memory"]['low'] = data[3]
                tolist["memory"]['close'] = data[4]
                tolist["memory"]['Extremum'] = data[5]
                tolist["memory"]['Fib 1'] = data[6]
                tolist["memory"]['Fib 2'] = data[7]
                tolist["memory"]['Fib 3'] = data[8]
                tolist["memory"]['l100'] = data[9].strip()
                
                #t = "2022-04-03 02:00:04"
                t = datetime.datetime.strptime(memory["data"]["time"], "%Y-%m-%d %H:%M:%S")
                #sec_1_more = (t   datetime.timedelta(seconds=1)).strftime("%Y-%m-%d %H:%M:%S")
                #or
                sec_1_more = t   datetime.timedelta(seconds=1)
                
                if start == 1:
                    new_file.append(tolist["memory"])
                
                start = 0
                
                if x == 10:
                    #print(new_file)
                    #quit()
                    break # for test to see only 10 first
    f.close() # needed      
    csvData = pd.DataFrame(new_file)
    csvData.to_csv(coin_name "_unique_sorted_synced.csv", mode="w", index=False)                            
    quit() # coz just one file processing for testing

CodePudding user response:

Looks like you can just read in the data, and use asfreq:


# instead of read_clipboard, you'd read it with pd.read_csv
df = pd.read_clipboard(sep=',', parse_dates = ['time']) 

df.set_index('time').asfreq(freq='1S').ffill()

                      open   high    low  close  Extremum     Fib 1     Fib 2     Fib 3     l100
time
2022-04-03 02:00:00  3.294  3.294  3.294  3.294     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:01  3.294  3.294  3.294  3.294     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:02  3.294  3.294  3.294  3.294     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:03  3.294  3.294  3.294  3.294     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:04  3.294  3.295  3.292  3.292     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:05  3.293  3.293  3.292  3.292     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:06  3.293  3.293  3.292  3.292     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:07  3.293  3.293  3.293  3.293     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:08  3.293  3.293  3.293  3.293     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:09  3.292  3.292  3.292  3.292     3.277  3.332898  3.348094  3.357139  3.36745
  • Related