Home > Blockchain >  Extracting multiple Excel files from different paths in python at once while they are zipped in vari
Extracting multiple Excel files from different paths in python at once while they are zipped in vari

Time:11-09

I have 30 folders (from 2021-06-01 to 2021-06-30) and in each of them I have 15 excel files. Currently I am using this code separately 30 times

file1= glob.glob('C:/Users/Dell/Downloads/2021-06-01/*')

to make a file file1 that run data processing operation for each single folder( going into 15 excel files).So that I can have file 1 to file 30 and then I concat them to make one single file. Is there any way to automate this process as I dont want to run this operation 30 times separately? I am not figuring out how to make a loop for this purpose of extracting file from different paths. Also I have the data but they are zipped inside the folders (from 2021-06-01 to 2021-06-30). So, it is tedious to go there one by one and unzip them and then run the operation one by one. How could I achieve both purpose in a easier way? I saw unzipping operation solutions by search by I cant do them while I also have to obtain the another purpose I mentioned about ( going through different folders and extract them one by one at once, making file 1 to file 30 at once) my directories look like:

- download                                                                                                                           
       -month                                                                                                                                
                 -2021-01-01 
                     -AA                                                                                                         
                          -file.zip                                                                                                                                       
                              -a list of .xlsx file 
                     -BB
                     -CC                                                                                  
                 -2021-01-02 
                     -AA                                                                                                          
                          -file.zip                                                                                                                                     
                               -a list of .xlsx file                                                                                            
                     -BB
                     -CC
 ........................................................................................................................................................................... 
                 -2021-01-30        

Now I dont want to concat these xlsx file. I want to run a certain operation on these excel file one by one and then concat them .However not being able to do so.

CodePudding user response:

Here's a script in Python that should work for you:

import os
import shutil
import time

import pandas as pd


def read_csv_or_excel(f):
    if f.endswith(".csv"):
        df = pd.read_csv(f"{root}/{f}", sep="\t")
    if f.endswith(".xlsx"):
        df = pd.read_excel(f"{root}/{f}")
    return df


for root, dirs, files in os.walk("./questions/69878352/"):
    #     print(root, dirs, files)
    if root.split("/")[-1].startswith("20"):
        print(root)
        appended = []
        dfs = []
        for f in files:
            if f not in appended:
                print(f)
                if f.endswith(".csv") or f.endswith(".xlsx"):
                    dfs.append(read_csv_or_excel(f))
                elif f.endswith(".zip"):
                    shutil.unpack_archive(f"{root}/{f}", f"{root}/")
                    time.sleep(0.5)
                    f = f"{f[:-4]}.xlsx" # ← this assumes any zipped files will be Excel files...
                    dfs.append(read_csv_or_excel(f))
                else:
                    continue
                appended.append(f[:-4])
        pd.concat(dfs).to_excel(f"{root}.xlsx"

Lmk if it doesn't work! My test data isn't the best I'd have to spend some more time making better test data to be 100% it would work, so if you have any issues, it's probably just a little tweak that's necessary to amend it

  • Related