Extracting multiple Excel files from different paths in python at once while they are zipped in vari-CodePudding

I have 30 folders (from 2021-06-01 to 2021-06-30) and in each of them I have 15 excel files. Currently I am using this code separately 30 times

file1= glob.glob('C:/Users/Dell/Downloads/2021-06-01/*')

to make a file file1 that run data processing operation for each single folder( going into 15 excel files).So that I can have file 1 to file 30 and then I concat them to make one single file. Is there any way to automate this process as I dont want to run this operation 30 times separately? I am not figuring out how to make a loop for this purpose of extracting file from different paths. Also I have the data but they are zipped inside the folders (from 2021-06-01 to 2021-06-30). So, it is tedious to go there one by one and unzip them and then run the operation one by one. How could I achieve both purpose in a easier way? I saw unzipping operation solutions by search by I cant do them while I also have to obtain the another purpose I mentioned about ( going through different folders and extract them one by one at once, making file 1 to file 30 at once) my directories look like:

- download                                                                                                                           
       -month                                                                                                                                
                 -2021-01-01 
                     -AA                                                                                                         
                          -file.zip                                                                                                                                       
                              -a list of .xlsx file 
                     -BB
                     -CC                                                                                  
                 -2021-01-02 
                     -AA                                                                                                          
                          -file.zip                                                                                                                                     
                               -a list of .xlsx file                                                                                            
                     -BB
                     -CC
 ........................................................................................................................................................................... 
                 -2021-01-30

Now I dont want to concat these xlsx file. I want to run a certain operation on these excel file one by one and then concat them .However not being able to do so.

CodePudding user response：

Here's a script in Python that should work for you:

import os
import shutil
import time

import pandas as pd


def read_csv_or_excel(f):
    if f.endswith(".csv"):
        df = pd.read_csv(f"{root}/{f}", sep="\t")
    if f.endswith(".xlsx"):
        df = pd.read_excel(f"{root}/{f}")
    return df


for root, dirs, files in os.walk("./questions/69878352/"):
    #     print(root, dirs, files)
    if root.split("/")[-1].startswith("20"):
        print(root)
        appended = []
        dfs = []
        for f in files:
            if f not in appended:
                print(f)
                if f.endswith(".csv") or f.endswith(".xlsx"):
                    dfs.append(read_csv_or_excel(f))
                elif f.endswith(".zip"):
                    shutil.unpack_archive(f"{root}/{f}", f"{root}/")
                    time.sleep(0.5)
                    f = f"{f[:-4]}.xlsx" # ← this assumes any zipped files will be Excel files...
                    dfs.append(read_csv_or_excel(f))
                else:
                    continue
                appended.append(f[:-4])
        pd.concat(dfs).to_excel(f"{root}.xlsx"

Lmk if it doesn't work! My test data isn't the best I'd have to spend some more time making better test data to be 100% it would work, so if you have any issues, it's probably just a little tweak that's necessary to amend it