How to correctly append rows from a df to another based on timestamps on Python? Pandas related-CodePudding

I currently have 253 csv files with the following filenames:

1. ALICEUSDT-1h-2021-06-01.csv
2. ALICEUSDT-1h-2021-06-02.csv
3. ALICEUSDT-1h-2021-06-03.csv
4. ALICEUSDT-1h-2021-06-06.csv
5. ALICEUSDT-1h-2021-06-09.csv
6. ALICEUSDT-1h-2021-06-11.csv
7. ALICEUSDT-1h-2021-06-12.csv
.
.
.
253. ALICEUSDT-1h-2022-02-13.csv

located at this path: C:\Users\StoreX\AliceUSDT-Mark_Prices_Klines_1h_Timeframe

Each of those files contains the hourly price action data for a particular asset, having in total 24 rows (without column names nor indexes)

Any of those dataframes look like this:

    1623369600000   5.53548967  5.8         5.4473946   5.4719309   0   1623373199999   0   3600    0   0   0
    1623373200000   5.47278238  5.53195431  5.37178569  5.41425374  0   1623376799999   0   3600    0   0   0
    1623376800000   5.41425372  5.439       5.30612058  5.43896918  0   1623380399999   0   3597    0   0   0
    1623380400000   5.439       5.573       5.439       5.51534579  0   1623383999999   0   3600    0   0   0
    1623384000000   5.51534611  5.75008683  5.46536508  5.699       0   1623387599999   0   3600    0   0   0
    1623387600000   5.699       5.7319193   5.59313511  5.63882188  0   1623391199999   0   3582    0   0   0
    1623391200000   5.63892904  5.65847764  5.50099143  5.52530541  0   1623394799999   0   3600    0   0   0
    1623394800000   5.52201075  5.69104913  5.49124857  5.658       0   1623398399999   0   3600    0   0   0
    1623398400000   5.6554082   6.336       5.56700781  6.2605508   0   1623401999999   0   3600    0   0   0
    1623402000000   6.26056029  6.57769402  6.26056029  6.392       0   1623405599999   0   3600    0   0   0
    1623405600000   6.3935684   6.68259312  6.30951189  6.582       0   1623409199999   0   3600    0   0   0
    1623409200000   6.58        6.633       6.41970848  6.55150417  0   1623412799999   0   3600    0   0   0
    1623412800000   6.55051699  6.56212633  6.34272711  6.44329682  0   1623416399999   0   3600    0   0   0
    1623416400000   6.43806578  6.63709121  6.40735932  6.49129756  0   1623419999999   0   3600    0   0   0
    1623420000000   6.49097939  6.498       6.19692729  6.25924944  0   1623423599999   0   3600    0   0   0
    1623423600000   6.25924944  6.2822904   6.05089683  6.13788357  0   1623427199999   0   3600    0   0   0
    1623427200000   6.13764793  6.228       6.005       6.015432    0   1623430799999   0   3600    0   0   0
    1623430800000   6.015       6.198       5.97772068  6.19544081  0   1623434399999   0   3600    0   0   0
    1623434400000   6.198       6.349       6.13209062  6.2308081   0   1623437999999   0   3600    0   0   0
    1623438000000   6.23083904  6.42623752  6.201       6.27776139  0   1623441599999   0   3600    0   0   0
    1623441600000   6.27776137  6.31281154  6.02227423  6.036       0   1623445199999   0   3600    0   0   0
    1623445200000   6.033       6.185       5.90345962  6.119       0   1623448799999   0   3600    0   0   0
    1623448800000   6.119       6.298       6.047       6.20634564  0   1623452399999   0   3600    0   0   0
    1623452400000   6.20606916  6.20606916  5.9779765   6.05824449  0   1623455999999   0   3600    0   0   0

The Unix Timestamp is used to set the exact date at which the price was opened (2nd column), and also to set the exact date at which it was closed (5th column), the columns with 0 and ≈3600 values do not matter.

The 2nd column corresponds to the Open price

The 3rd column corresponds to the High price

The 4th column corresponds to the Low price

The 5th column corresponds to the Close price

I wanted to export a big df with all of the rows from these files, and to do so, I wrote the following code:

import pandas as pd
import os

def check_path(infile):
    return os.path.exists(infile)   

first_entry = input('Tell me the path where your csv files are located at: ')

while True:
    
    if check_path(first_entry) == False:
        print('\n')
        print('This PATH is invalid!')
        first_entry = input('Tell me the RIGHT PATH in which your csv files are located: ')
        
    elif check_path(first_entry) == True:
        print('\n')
        final_output = first_entry
        break

big_df = pd.DataFrame() 
for name in os.listdir(first_entry):
    if name.endswith(".csv"):
        print(name)
        current_df_path = first_entry "\\" name
        df = pd.read_csv(current_df_path, header=None)
        big_df.append(df)
        print(big_df)
print("")        
print("the final df is the following")
print("")
print(big_df)

However, it didn't work at all after run, the output was the following:

.

.

.

ALICEUSDT-1h-2022-02-12.csv Empty DataFrame Columns: [] Index: [] ALICEUSDT-1h-2022-02-13.csv Empty DataFrame Columns: [] Index: []

the final df is the following

Empty DataFrame Columns: [] Index: []

So, I would like to know what did I do wrong? Why isn't the big_df actually full of information?

And finally, how could the code above improved to do what's demanded?

CodePudding user response：

Append returns the dataframe so you would need to write:

big_df = big_df.append(df)

you could add ignore_index=True to reset the index

Also the default separator for read_csv is a comma so indicate the one used in your file like sep='\s ' if spaces, '\t' if tab, ...

You may want to start with just loading one file in a dataframe to verify that the filepath is correct and everything work like expected.