I am attempting to loop through multiple CSVs, find the mean of multiple variable columns (6 to be specific) within the CSV, which in turn will output a single row of results (6,1) in dimensions, and append that to a dataframe object, for all .csv files in the folder.
I am quite new to programming, and the first code section below does not give the desired output.
My attempted code is as such:
from pathlib import Path
import os
appended_data = []
folder= r"C:\Users\A\Desktop\Analysis\Field result\Height"
for file in Path(folder).glob('*.csv'):
#Print filename to see which file is being processed
#will use filenames later to add a column to appended_data to use as date-time
print(os.path.basename(file))
#Read csv
df = pd.read_csv(file)
#Averaging all variables of interest; mean, min, max, median etc.
averages = df[df.columns[3:9]].mean()
#Transpose to store as row
Averages_Data = averages_pd.T
# store DataFrame in list
appended_data.append(Averages_Data)
#see pd.concat documentation for more info
appended_data = pd.concat(appended_data)
print(appended_data)
I am able to do it manually for the first two .csvs with the code below:
height = pd.read_csv(r"C:\Users\A\Desktop\Analysis\Field result\Height\height1.csv")
height.head()
height.info()
averages = height[height.columns[3:9]].mean()
print(averages)
Averages_Data = averages_pd.T
height2 = pd.read_csv(r"C:\Users\Al\Desktop\Analysis\Field result\Height\height2.csv")
averages2 = height2[height2.columns[3:9]].mean()
print(averages2)
Averages2 = averages2_pd.T
Averages2
Averages_Data
x = pd.concat([Averages_Data, Averages2])
and this gives the results I am looking for, but I can't seem to be able to generalise it, doing it manually will take too long. How would I do this?
CodePudding user response:
Would this work:
from pathlib import Path
folder = r"C:\Users\A\Desktop\Analysis\Field result\Height"
dfs = []
for file in Path(folder).glob("*.csv"):
print(file.name)
dfs.append(pd.DataFrame(pd.read_csv(file).iloc[:, 3:9].mean()).T)
df = pd.concat(dfs, ignore_index=True)
When I run it, with accordingly adjusted folder
, for the following 5 sample files
from pathlib import Path
from random import randint
folder = "Test"
base = Path(folder)
base.mkdir(exist_ok=True)
cols_str = ",".join(f"col{i}" for i in range(10))
for n in range(1, 6):
(base / f"file{n}.csv").write_text(
cols_str "\n"
",".join(str(randint(1, 100)) for _ in range(10)) "\n"
",".join(str(randint(1, 100)) for _ in range(10))
)
which essentially look like
col0,col1,col2,col3,col4,col5,col6,col7,col8,col9
55,60,77,32,89,49,71,7,60,52
10,40,66,96,45,11,53,36,40,28
I get the following df
:
col3 col4 col5 col6 col7 col8
0 54.0 61.5 99.5 61.5 34.5 34.5
1 26.5 72.0 65.0 93.0 56.0 73.5
2 53.5 79.0 55.0 68.0 32.5 46.0
3 66.5 44.0 38.5 65.5 16.0 36.5
4 21.0 73.0 54.5 48.5 73.0 45.5