Need help in appending dataframe

Time:08-10

I'm trying to append three files together: log1 contains 5441 rows, log2 contains 1003 rows, and log3 contains 2137 rows. The program runs without errors, but it appends only one log instead of all three.

The combined result should have 8581 rows, but I only get 5441.

This is what I did:

paths = []
thisdir = '/content/drive/Shareddrives/SNC - All/6 - Colab/HiVisionEvent'
filecount=0

for root, dirs, files in os.walk(thisdir):
    for file in files:
        if file.endswith(".csv"):
             s = os.path.join(root, file)
             paths.append(s)
             filecount += 1

    print("Total files : ", filecount)

    all_data = pd.DataFrame()

    for files in paths:
        df = pd.read_csv(files,header=None, sep=';')
        all_data = all_data.append(df,ignore_index=True)

    #add column headers  
    df.columns = ['Log No.','Safety Info','Status','DateTime','Delete','Loc','Property','Property Status']

CodePudding user response:

Do not use DataFrame.append, as fsimonjetz mentioned in the comments: it was deprecated in pandas 1.4 and removed in pandas 2.0. Use pd.concat instead.

If you simply want to stack each df below the previous one, all_data = pd.concat([all_data, df], axis=0, ignore_index=True) should give you what you are looking for.

Reference: https://pandas.pydata.org/docs/reference/api/pandas.concat.html
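
As a quick sanity check of the row arithmetic, here is a minimal sketch that stacks three synthetic frames with pd.concat. The frames are made-up stand-ins sized like the logs in the question (they are not the real CSV contents), and the names row_counts and frames are only for illustration.

```python
import pandas as pd

# Synthetic stand-ins for log1, log2 and log3; only the row counts
# come from the question, the cell values are placeholders.
row_counts = [5441, 1003, 2137]
frames = [pd.DataFrame({0: range(n), 1: ["x"] * n}) for n in row_counts]

# Stack the frames vertically and renumber the index from 0.
all_data = pd.concat(frames, axis=0, ignore_index=True)

print(len(all_data))  # 8581
```

Passing the whole list to a single pd.concat call also avoids re-copying the accumulated frame on every loop iteration.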

CodePudding user response:

You can do this more simply with pandas.concat:

import os
import pandas as pd

paths = []
thisdir = '/content/drive/Shareddrives/SNC - All/6 - Colab/HiVisionEvent'

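# Walk the directory tree and collect the path of every CSV file
for root, dirs, files in os.walk(thisdir):
    for file in files:
        if file.endswith(".csv"):
            s = os.path.join(root, file)
            paths.append(s)

# Everything below runs once, after the walk has finished
print("Total files : ", len(paths))

# Read each file, then concatenate all of them in a single call
dfs = [pd.read_csv(file, header=None, sep=';') for file in paths]

df = pd.concat(dfs, ignore_index=True)

# add column headers
df.columns = ['Log No.','Safety Info','Status','DateTime','Delete','Loc','Property','Property Status']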

Notice that I eliminated filecount (it should just be the length of paths, right?). I also renamed the loop variable files to file in the loop over paths.

That should work.
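
If it helps, the same idea can be wrapped in a small function. This is only a sketch: load_logs and COLUMNS are hypothetical names, pathlib's rglob stands in for the os.walk loop, and the column names are passed to read_csv through its names parameter instead of being assigned afterwards. The throwaway files at the bottom exist only to make the example runnable.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Column names copied from the question.
COLUMNS = ['Log No.', 'Safety Info', 'Status', 'DateTime',
           'Delete', 'Loc', 'Property', 'Property Status']


def load_logs(directory):
    """Hypothetical helper: read every .csv under `directory` into one DataFrame."""
    paths = sorted(Path(directory).rglob("*.csv"))
    if not paths:
        # pd.concat raises ValueError on an empty list, so return an empty frame.
        return pd.DataFrame(columns=COLUMNS)
    frames = [pd.read_csv(p, header=None, sep=';', names=COLUMNS) for p in paths]
    return pd.concat(frames, ignore_index=True)


# Usage with two small placeholder files (3 rows and 2 rows).
with tempfile.TemporaryDirectory() as tmp:
    for name, n_rows in [("log1.csv", 3), ("log2.csv", 2)]:
        rows = [";".join(["a"] * len(COLUMNS)) for _ in range(n_rows)]
        (Path(tmp) / name).write_text("\n".join(rows) + "\n")
    combined = load_logs(tmp)

print(combined.shape)  # (5, 8)
```

Comparing the final row count against the sum of the per-file counts is an easy way to confirm that no file was skipped.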
