Dataframe stores duplicate header

Time:06-27

I have a function that shows folders by size, but when I try to store the results in a DataFrame, every result ends up on its own line with index=0. The code of the function is:

def show_folders_by_size(r):
    size = 0
    calcul_size=[]
    path=[]
    for ele in os.scandir(r):
        size = os.path.getsize(ele)
    size = size // 1048576
    calcul_size.append(str(size) + " Mb")
    path.append(r)
    e = {'chemin du dossier' : path, 'taille par Mb' : size}
    df_dev_size = pd.DataFrame(e)
    print(path,calcul_size)

In the main:

months = time.time()-(3*2628000)
for root, dirs, files in os.walk(root):
    for dir in dirs:
        if ("back" in dir):
            path_back=os.path.join(root,dir)
            remove_files_by_modification_time(path_back)
        elif ("dev" in dir):
            path_dev=os.path.join(root,dir)
            show_folders_by_size(path_dev)

This is the result I get:

['C:\\Users\\mbarrech\\Desktop\\dossier_test\\dev'] ['360 Mb']
['C:\\Users\\mbarrech\\Desktop\\dossier_test\\back_up\\dev'] ['360 Mb']
['C:\\Users\\mbarrech\\Desktop\\dossier_test\\dev\\back\\dev'] ['0 Mb']

Can anyone help me?

CodePudding user response:

I don't think you need to create a DataFrame on every iteration. Each call builds a new one-row DataFrame, which is why every result shows up with index 0. Store the results in a list and turn them into a single DataFrame afterwards, like so:

results = []
def show_folders_by_size(r):
    # ... your function logic
    results.append(dict(path=r, size=size))

And once your loop has finished calling the function:

resultsDf = pd.DataFrame(results)
print(resultsDf)
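
For reference, here is a fuller sketch of the same idea. It rests on a few assumptions that are not in the question: the helper names folder_size_mb, collect_dev_folders and build_size_table are made up for illustration, and the size is taken as the sum of the files directly inside each folder, which may not be what the original function intended.

```python
import os


def folder_size_mb(folder):
    """Sum the sizes of the files directly inside `folder`, in whole MiB.

    Assumption: only top-level files are counted, sub-folders are ignored.
    """
    total_bytes = 0
    for entry in os.scandir(folder):
        if entry.is_file():
            total_bytes += entry.stat().st_size
    return total_bytes // 1048576


def collect_dev_folders(top):
    """Return one dict per folder under `top` whose name contains 'dev'."""
    rows = []
    for current, dirnames, _filenames in os.walk(top):
        for name in dirnames:
            if "dev" in name:
                folder = os.path.join(current, name)
                rows.append({
                    "chemin du dossier": folder,
                    "taille par Mb": folder_size_mb(folder),
                })
    return rows


def build_size_table(top):
    """Build a single DataFrame from all collected rows."""
    import pandas as pd  # imported here so the helpers above work without pandas

    # A list of dicts becomes one row per dict, indexed 0..n-1.
    return pd.DataFrame(collect_dev_folders(top))
```

Calling print(build_size_table(some_root)) once, after the walk, should then give one table with one row per "dev" folder instead of several one-row tables.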

Cheers
