Home > Net >  Loading csv files to sql database by date with pandas
Loading csv files to sql database by date with pandas

Time:11-07

I have list of csv files named by dates, for example:

2020-01-01, 2020-01-02... 2020-01-31

and i need to load them in sqlite database in batch. But i cant figure how.

I have a script:

file_name = '2020-01-{}.csv'
test = pd.concat([pd.read_csv(file_name.format(i)) for i in range(1, 9)])
test.to_sql('test', conn, if_exists='append', index=False)

But to make it work, i have manually add first number of day, for example:

file_name = '2020-01-0{}.csv', file_name = '2020-01-1{}.csv', file_name = '2020-01-2{}.csv'...

So, my question is, how can i add all of the files to database, just giving years and month ? And maybe it's possible to make it reusable for future data as well. If i add new csv, code takes only new csv and add it to db. Iam working with pandas and sqlite.

CodePudding user response:

You can do this process by following the steps

  1. Make a list of all the files
  2. Run a loop and add them into db

So the working solution should be like

import pandas as pd


def read_csv_file(file_name) -> pd.DataFrame:
    """Read csv file and return a dataframe"""
    df = pd.read_csv(file_name)
    return df


def get_all_csv_files() -> list:
    """Get all csv files in the directory"""
    import glob

    files = glob.glob("*.csv")

    return files  # Output : ['2020-01-01.csv', ... '2020-01-31.csv']


for file in get_all_csv_files():
    """Loop through all CSV files and do something"""
    df = read_csv_file(file)
    print(df)
    # Do something with the dataframe here like adding into the database

    # To move them into another directory/folder Create a folder "Processed" in the same dir
    import shutil

    shutil.move(file, "Processed")
  • Related