I have thousands of csv files under uncategorised parent folder, all the files have one date column containing same date for all the rows. I want to check the date value of each files and move/copy to month wise folder using python.
I have tried key = df.iloc[0]['Date'] but not able to use key.endswith or key.contains
CodePudding user response:
Here I am looping through the files, reading the first row of the date column. I have created new folders in the directory previously, with the names being each month of the year. Once I have read the date, I convert it to words (e.g. April, May). I then look through folders in the directory and is the date name and folder name match, I move the file to the folder.
import os
import pandas as pd
import datetime
files = os.listdir()
for file in files:
if ".csv" in file:
df = pd.read_csv(file)
dates = df['date']
date = dates[0]
date = datetime.datetime.strptime(date, "%d/%m/%y")
date = date.strftime("%B")
for folder in files:
if date.lower() == folder.lower():
os.rename(file, folder "\\" file)
CodePudding user response:
If all CSV files have the same structure, for example:
date, col_1, col_2, ..., col_n
20210923, 123, 456, ..., 999
You can try:
from os import mkdir
from os.path import join, exists, isdir
from glob import glob
from shutil import move
from datetime import datetime
import csv
FOLDER_CSV = 'data/csv'
FOLDER_MONTHS = 'data/csv_by_month'
DATE_FORMAT = '%Y%m%d'
if not exists(FOLDER_MONTHS):
mkdir(FOLDER_MONTHS)
for file in glob(join(FOLDER_CSV, '*.csv')):
with open(file, newline='') as ftx:
reader = csv.reader(ftx)
row = reader.__next__()
row = reader.__next__()
# Replaces the row[0] Index for index in the date column in the CSV
month = datetime.strptime(row[0], DATE_FORMAT).strftime('%B').lower()
if not exists(join(FOLDER_MONTHS, month)):
mkdir(join(FOLDER_MONTHS, month))
try:
if isdir(join(FOLDER_MONTHS, month)):
move(file, join(FOLDER_MONTHS, month))
except OSError:
print(f'{file} file is already in the {month} directory.')