I currently have 253 csv files with the following filenames:
1. ALICEUSDT-1h-2021-06-01.csv
2. ALICEUSDT-1h-2021-06-02.csv
3. ALICEUSDT-1h-2021-06-03.csv
4. ALICEUSDT-1h-2021-06-06.csv
5. ALICEUSDT-1h-2021-06-09.csv
6. ALICEUSDT-1h-2021-06-11.csv
7. ALICEUSDT-1h-2021-06-12.csv
.
.
.
253. ALICEUSDT-1h-2022-02-13.csv
located at this path: C:\Users\StoreX\AliceUSDT-Mark_Prices_Klines_1h_Timeframe
Each of those files contains the hourly price action data for a particular asset, 24 rows in total (with no column names or index).
Any one of those files looks like this:
1623369600000 5.53548967 5.8 5.4473946 5.4719309 0 1623373199999 0 3600 0 0 0
1623373200000 5.47278238 5.53195431 5.37178569 5.41425374 0 1623376799999 0 3600 0 0 0
1623376800000 5.41425372 5.439 5.30612058 5.43896918 0 1623380399999 0 3597 0 0 0
1623380400000 5.439 5.573 5.439 5.51534579 0 1623383999999 0 3600 0 0 0
1623384000000 5.51534611 5.75008683 5.46536508 5.699 0 1623387599999 0 3600 0 0 0
1623387600000 5.699 5.7319193 5.59313511 5.63882188 0 1623391199999 0 3582 0 0 0
1623391200000 5.63892904 5.65847764 5.50099143 5.52530541 0 1623394799999 0 3600 0 0 0
1623394800000 5.52201075 5.69104913 5.49124857 5.658 0 1623398399999 0 3600 0 0 0
1623398400000 5.6554082 6.336 5.56700781 6.2605508 0 1623401999999 0 3600 0 0 0
1623402000000 6.26056029 6.57769402 6.26056029 6.392 0 1623405599999 0 3600 0 0 0
1623405600000 6.3935684 6.68259312 6.30951189 6.582 0 1623409199999 0 3600 0 0 0
1623409200000 6.58 6.633 6.41970848 6.55150417 0 1623412799999 0 3600 0 0 0
1623412800000 6.55051699 6.56212633 6.34272711 6.44329682 0 1623416399999 0 3600 0 0 0
1623416400000 6.43806578 6.63709121 6.40735932 6.49129756 0 1623419999999 0 3600 0 0 0
1623420000000 6.49097939 6.498 6.19692729 6.25924944 0 1623423599999 0 3600 0 0 0
1623423600000 6.25924944 6.2822904 6.05089683 6.13788357 0 1623427199999 0 3600 0 0 0
1623427200000 6.13764793 6.228 6.005 6.015432 0 1623430799999 0 3600 0 0 0
1623430800000 6.015 6.198 5.97772068 6.19544081 0 1623434399999 0 3600 0 0 0
1623434400000 6.198 6.349 6.13209062 6.2308081 0 1623437999999 0 3600 0 0 0
1623438000000 6.23083904 6.42623752 6.201 6.27776139 0 1623441599999 0 3600 0 0 0
1623441600000 6.27776137 6.31281154 6.02227423 6.036 0 1623445199999 0 3600 0 0 0
1623445200000 6.033 6.185 5.90345962 6.119 0 1623448799999 0 3600 0 0 0
1623448800000 6.119 6.298 6.047 6.20634564 0 1623452399999 0 3600 0 0 0
1623452400000 6.20606916 6.20606916 5.9779765 6.05824449 0 1623455999999 0 3600 0 0 0
The Unix timestamps mark the exact time at which each candle opened (i.e. when the Open price in the 2nd column was set) and the exact time at which it closed (i.e. when the Close price in the 5th column was set); the columns filled with 0 and ≈3600 values do not matter.
The 2nd column corresponds to the Open price
The 3rd column corresponds to the High price
The 4th column corresponds to the Low price
The 5th column corresponds to the Close price
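For reference, here is a minimal sketch of how one of these files can be read into a dataframe with named columns (this assumes the values are whitespace-separated exactly as they appear above; the column names other than the OHLC and timestamp ones are just placeholders I made up for the columns that don't matter):

import pandas as pd

one_file = r"C:\Users\StoreX\AliceUSDT-Mark_Prices_Klines_1h_Timeframe\ALICEUSDT-1h-2021-06-01.csv"

# name the 12 columns; only open/high/low/close and the two timestamps are relevant
cols = ["open_time", "open", "high", "low", "close", "ignore_1",
        "close_time", "ignore_2", "ignore_3", "ignore_4", "ignore_5", "ignore_6"]

# sep=r"\s+" assumes whitespace-separated values as shown above
df = pd.read_csv(one_file, sep=r"\s+", header=None, names=cols)

# both timestamps are Unix epochs in milliseconds
df["open_time"] = pd.to_datetime(df["open_time"], unit="ms")
df["close_time"] = pd.to_datetime(df["close_time"], unit="ms")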
I wanted to export a big df with all of the rows from these files, and to do so, I wrote the following code:
import pandas as pd
import os

def check_path(infile):
    return os.path.exists(infile)

first_entry = input('Tell me the path where your csv files are located at: ')

while True:
    if check_path(first_entry) == False:
        print('\n')
        print('This PATH is invalid!')
        first_entry = input('Tell me the RIGHT PATH in which your csv files are located: ')
    elif check_path(first_entry) == True:
        print('\n')
        final_output = first_entry
        break

big_df = pd.DataFrame()

for name in os.listdir(first_entry):
    if name.endswith(".csv"):
        print(name)
        current_df_path = first_entry + "\\" + name
        df = pd.read_csv(current_df_path, header=None)
        big_df.append(df)
        print(big_df)

print("")
print("the final df is the following")
print("")
print(big_df)
However, it didn't work at all when I ran it; the output was the following:
.
.
.
ALICEUSDT-1h-2022-02-12.csv
Empty DataFrame
Columns: []
Index: []
ALICEUSDT-1h-2022-02-13.csv
Empty DataFrame
Columns: []
Index: []
the final df is the following
Empty DataFrame
Columns: []
Index: []
So, I would like to know: what did I do wrong? Why isn't big_df
actually full of information?
And finally, how could the code above be improved to do what I need?
CodePudding user response:
append returns a new dataframe rather than modifying big_df in place, so you would need to write:
big_df = big_df.append(df)
You could add ignore_index=True to reset the index.
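For example, a minimal sketch of the corrected loop, keeping your variable names (it assumes the imports and the first_entry variable from your script above):

big_df = pd.DataFrame()
for name in os.listdir(first_entry):
    if name.endswith(".csv"):
        current_df_path = first_entry + "\\" + name
        df = pd.read_csv(current_df_path, header=None)
        # reassign the result: append returns a new dataframe, it does not modify big_df in place
        big_df = big_df.append(df, ignore_index=True)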
Also, the default separator for read_csv is a comma, so indicate the one actually used in your files, e.g. sep='\s+' for whitespace, sep='\t' for tabs, and so on.
You may want to start by loading just one file into a dataframe to verify that the file path is correct and everything works as expected.
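It is also worth noting that DataFrame.append is deprecated in newer pandas versions (and removed in pandas 2.0), so a more idiomatic sketch is to collect the per-file dataframes in a list and concatenate them once at the end. This keeps your first_entry variable (the folder path) and assumes whitespace-separated values as discussed above; adjust sep to whatever your files actually use:

import os
import pandas as pd

frames = []
for name in os.listdir(first_entry):
    if name.endswith(".csv"):
        # os.path.join builds the full path without hard-coding the "\\" separator
        path = os.path.join(first_entry, name)
        # sep=r"\s+" assumes whitespace-separated values; use sep="," if they are comma-separated
        frames.append(pd.read_csv(path, sep=r"\s+", header=None))

# a single concat at the end is much faster than appending inside the loop
big_df = pd.concat(frames, ignore_index=True)
print(big_df)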