I have list of CSV files named by date like :
2020-01-01, 2020-01-02... 2020-01-31
I need to import them into an SQLite database in batch. My script:
file_name = '2020-01-{}.csv'
test = pd.concat([pd.read_csv(file_name.format(i)) for i in range(1, 9)])
test.to_sql('test', conn, if_exists='append', index=False)
But I have to manually add first number of day:
file_name = '2020-01-0{}.csv', file_name = '2020-01-1{}.csv', file_name = '2020-01-2{}.csv'...
How can I import all files into database, just giving years and month, making it reusable (so that if I add a new CSV file, code only imports new file)?
CodePudding user response:
You can do this process by following the steps
- Make a list of all the files
- Run a loop and add them into db
So the working solution should be like
import pandas as pd
def read_csv_file(file_name) -> pd.DataFrame:
"""Read csv file and return a dataframe"""
df = pd.read_csv(file_name)
return df
def get_all_csv_files() -> list:
"""Get all csv files in the directory"""
import glob
files = glob.glob("*.csv")
return files # Output : ['2020-01-01.csv', ... '2020-01-31.csv']
for file in get_all_csv_files():
"""Loop through all CSV files and do something"""
df = read_csv_file(file)
print(df)
# Do something with the dataframe here like adding into the database
# To move them into another directory/folder Create a folder "Processed" in the same dir
import shutil
shutil.move(file, "Processed")