Looping through CSV files, performing a function, and concatenating DataFrame objects-CodePudding

I am attempting to loop through multiple CSVs, find the mean of multiple variable columns (6 to be specific) within the CSV, which in turn will output a single row of results (6,1) in dimensions, and append that to a dataframe object, for all .csv files in the folder.

I am quite new to programming, and the first code section below does not give the desired output.

My attempted code is as such:


from pathlib import Path
import os

appended_data = []

folder= r"C:\Users\A\Desktop\Analysis\Field result\Height"

for file in Path(folder).glob('*.csv'):
    
    #Print filename to see which file is being processed
    #will use filenames later to add a column to appended_data to use as date-time
    print(os.path.basename(file))
    
    #Read csv
    df = pd.read_csv(file)
    
    #Averaging all variables of interest; mean, min, max, median etc.
    averages = df[df.columns[3:9]].mean()
             
    #Transpose to store as row
    Averages_Data = averages_pd.T
    
    # store DataFrame in list
    appended_data.append(Averages_Data)
    
#see pd.concat documentation for more info
appended_data = pd.concat(appended_data)

print(appended_data)

I am able to do it manually for the first two .csvs with the code below:

height = pd.read_csv(r"C:\Users\A\Desktop\Analysis\Field result\Height\height1.csv")
height.head()
height.info()
averages = height[height.columns[3:9]].mean()
print(averages)
Averages_Data = averages_pd.T
height2 = pd.read_csv(r"C:\Users\Al\Desktop\Analysis\Field result\Height\height2.csv")
averages2 = height2[height2.columns[3:9]].mean()
print(averages2)
Averages2 = averages2_pd.T
Averages2
Averages_Data
x = pd.concat([Averages_Data, Averages2])

and this gives the results I am looking for, but I can't seem to be able to generalise it, doing it manually will take too long. How would I do this?

CodePudding user response：

Would this work:

from pathlib import Path

folder = r"C:\Users\A\Desktop\Analysis\Field result\Height"
dfs = []
for file in Path(folder).glob("*.csv"):
    print(file.name)
    dfs.append(pd.DataFrame(pd.read_csv(file).iloc[:, 3:9].mean()).T)
df = pd.concat(dfs, ignore_index=True)

When I run it, with accordingly adjusted folder, for the following 5 sample files

from pathlib import Path
from random import randint

folder = "Test"
base = Path(folder)
base.mkdir(exist_ok=True)
cols_str = ",".join(f"col{i}" for i in range(10))
for n in range(1, 6):
    (base / f"file{n}.csv").write_text(
        cols_str   "\n"
          ",".join(str(randint(1, 100)) for _ in range(10))   "\n"
          ",".join(str(randint(1, 100)) for _ in range(10))
    )

which essentially look like

col0,col1,col2,col3,col4,col5,col6,col7,col8,col9
55,60,77,32,89,49,71,7,60,52
10,40,66,96,45,11,53,36,40,28

I get the following df:

   col3  col4  col5  col6  col7  col8
0  54.0  61.5  99.5  61.5  34.5  34.5
1  26.5  72.0  65.0  93.0  56.0  73.5
2  53.5  79.0  55.0  68.0  32.5  46.0
3  66.5  44.0  38.5  65.5  16.0  36.5
4  21.0  73.0  54.5  48.5  73.0  45.5