I have 30 folders (from 2021-06-01 to 2021-06-30), and each of them contains 15 Excel files. Currently I am running this code separately, 30 times:
file1 = glob.glob('C:/Users/Dell/Downloads/2021-06-01/*')
to make a file file1, and then running a data-processing operation for each single folder (going into its 15 Excel files). That way I get file1 through file30, and then I concat them to make one single file.
Is there any way to automate this process? I don't want to run this operation 30 times separately, and I can't figure out how to write a loop that extracts the files from the different paths.
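For example, I imagine generating the 30 folder paths in one loop instead of editing the date by hand each time; roughly like this (just a sketch of what I'm after, using the Downloads path from above):

```python
import glob
import pandas as pd

# Build the 30 date strings 2021-06-01 ... 2021-06-30 in one go.
dates = pd.date_range("2021-06-01", "2021-06-30").strftime("%Y-%m-%d")
folders = [f"C:/Users/Dell/Downloads/{d}" for d in dates]

all_files = []
for folder in folders:
    # The same glob as before, now driven by the loop variable.
    all_files.append(glob.glob(f"{folder}/*"))
```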
Also, the data are zipped inside the folders (from 2021-06-01 to 2021-06-30), so it is tedious to go through them one by one, unzip them, and then run the operation one at a time. How could I achieve both purposes in an easier way? I found unzipping solutions by searching, but I can't use them while also covering the other purpose I mentioned (going through the different folders and extracting the files all at once, making file1 through file30 in one pass).
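From searching I gather the standard-library zipfile module can do the extraction; what I haven't managed is combining it with the folder loop. Something like this is what I have in mind (a sketch, assuming the zips sit somewhere under the dated folders):

```python
import glob
import os
import zipfile

# Find every .zip under the dated folders and extract each one next to
# itself, so the .xlsx files land in the same folder as the archive.
for zip_path in glob.glob("C:/Users/Dell/Downloads/2021-06-*/**/*.zip", recursive=True):
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(os.path.dirname(zip_path))
```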
My directories look like:

- download
  - month
    - 2021-01-01
      - AA
        - file.zip
        - a list of .xlsx files
      - BB
      - CC
    - 2021-01-02
      - AA
        - file.zip
        - a list of .xlsx files
      - BB
      - CC
    - ...
    - 2021-01-30
Now, I don't want to simply concat these .xlsx files. I want to run a certain operation on each Excel file one by one and then concat the results; however, I'm not able to do so.
CodePudding user response:
Here's a script in Python that should work for you:
import os
import shutil
import pandas as pd

def read_csv_or_excel(path):
    """Read a single .csv (tab-separated) or .xlsx file into a DataFrame."""
    if path.endswith(".csv"):
        return pd.read_csv(path, sep="\t")
    if path.endswith(".xlsx"):
        return pd.read_excel(path)
    return None

for root, dirs, files in os.walk("./questions/69878352/"):
    # Only process the dated folders (2021-06-01, 2021-06-02, ...).
    if root.split("/")[-1].startswith("20"):
        print(root)
        seen = set()  # stems already handled, so an unpacked .xlsx isn't read twice
        dfs = []
        for f in files:
            stem = f.rsplit(".", 1)[0]
            if stem in seen:
                continue
            if f.endswith((".csv", ".xlsx")):
                dfs.append(read_csv_or_excel(f"{root}/{f}"))
            elif f.endswith(".zip"):
                shutil.unpack_archive(f"{root}/{f}", f"{root}/")
                # ← this assumes any zipped file contains an .xlsx of the same name
                dfs.append(read_csv_or_excel(f"{root}/{stem}.xlsx"))
            else:
                continue
            seen.add(stem)
        if dfs:
            # Run your per-file operation on each df here, then write one
            # output workbook per dated folder.
            pd.concat(dfs).to_excel(f"{root}.xlsx")
Lmk if it doesn't work! My test data isn't the best; I'd have to spend some more time making better test data to be 100% sure it would work. So if you have any issues, it's probably just a little tweak that's necessary to fix it.
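Since the script writes one output workbook per dated folder (named after the folder, e.g. 2021-06-01.xlsx), a final pass can stitch those into the single file you asked about. A sketch, assuming the per-day outputs end up under a month directory as in your tree (adjust the path and pattern to match your layout):

```python
import glob
import pandas as pd

# Collect the per-date output workbooks written next to each dated folder
# and concatenate them into one combined workbook.
parts = sorted(glob.glob("download/month/20*.xlsx"))
if parts:
    combined = pd.concat((pd.read_excel(p) for p in parts), ignore_index=True)
    combined.to_excel("download/month/combined.xlsx", index=False)
```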