csv_readout folder in wrong series

Time:09-28

I want to read data from several files in one folder. The files are named "1.csv", "2.csv", "3.csv" ... "96.csv". But instead of reading them in numerical order, my code reads "1.csv", "10.csv", "11.csv" ... "2.csv", "21.csv". Does anyone know how to fix this?

Thanks!

def csv_readout_folder(path):
    os.chdir(path)
    files = glob.glob(path + '/' + '*.csv')
    all_data = pd.DataFrame()

    for f in files:
        data = csv_readout(path, f)

        all_data = pd.concat([all_data, data])

    return all_data

CodePudding user response:

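In your code, for f in files: reads the files in the order they appear in the list, and glob.glob does not return them in numerical order (here "10.csv" comes before "2.csv" because the names are compared as text). You could sort the list with a custom key, but it may be easier to build a new list like this: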

# Build the file names in numerical order: 1.csv, 2.csv, ..., 96.csv
file_lst = [f'{k}.csv' for k in range(1, 97)]

def csv_readout_folder(path):
    os.chdir(path)
    all_data = pd.DataFrame()

    # Iterate over the ordered names instead of the glob result
    for f in file_lst:
        data = csv_readout(path, f)

        all_data = pd.concat([all_data, data])

    return all_data
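
If the number of files is not known in advance, the sort-function route mentioned above can be sketched as follows. This is only a minimal sketch: numeric_key and list_csv_in_numeric_order are hypothetical helper names, not part of the original code, and it assumes every file name contains its number (names without digits are placed last).

```python
import re
from pathlib import Path


def numeric_key(file_path):
    """Sort key: the first integer in the file name, e.g. '10.csv' -> 10."""
    match = re.search(r'\d+', Path(file_path).stem)
    # Assumption of this sketch: names without digits sort last
    return int(match.group()) if match else float('inf')


def list_csv_in_numeric_order(folder):
    """Return the *.csv paths in `folder`, ordered 1, 2, ..., 10, 11, ..."""
    return sorted(Path(folder).glob('*.csv'), key=numeric_key)


names = ['1.csv', '10.csv', '11.csv', '2.csv', '21.csv']
print(sorted(names, key=numeric_key))
# ['1.csv', '2.csv', '10.csv', '11.csv', '21.csv']
```

The returned paths could then be looped over in place of the glob result; how they are passed to csv_readout depends on what that function expects.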

CodePudding user response:

You can do something like

files = [f'{path}/{i}.csv' for i in range(1, 97)]

instead of

files = glob.glob(path + '/' + '*.csv')

UPD:

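def csv_readout_folder(path):
    os.chdir(path)
    # Count only the CSV files so other files in the folder don't inflate the range
    n_files = len([el for el in os.scandir(path) if el.is_file() and el.name.endswith('.csv')])
    # Assumes the files are numbered 1.csv ... n_files.csv without gaps
    files = [f'{path}/{i}.csv' for i in range(1, n_files + 1)]
    all_data = pd.DataFrame()

    for f in files:
        data = csv_readout(path, f)

        all_data = pd.concat([all_data, data])

    return all_data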