I want to import a column's worth of data from multiple sheets in a single Excel file and create a single large dataframe with all of the columns. Additionally, I want the name of each new column to be a string that is also taken from the Excel file.
I've tried a few different things, each with a different issue, but here is a start that works:
import pandas as pd
file = r'C:\Users\pazam\OneDrive\Desktop\neuromastCount\sf\Final_Raw.xlsx' #SF
path = r'C:\Users\pazam\OneDrive\Desktop\neuromastCount\sf'
results_raw = pd.DataFrame()
for i in range(19): #19 sheets
    df = pd.read_excel(file, usecols='N', skiprows=range(0, 37), nrows=36000, engine='openpyxl', header=None, sheet_name=i)
    trt = pd.read_excel(file, usecols='G', nrows=1, engine='openpyxl', header=None, sheet_name=i)
    # then something that adds df to results_raw as a new column with the string in trt as column header
raw_csv = path + "/results_raw.csv"
results_raw.to_csv(raw_csv)
thanks!
CodePudding user response:
This code will read all the sheets in the file into a dictionary of dataframes.
It will then create single-column dataframes, each consisting of the values from column N, with the column name coming from the first cell in column G.
Those dataframes will then be concatenated together using pd.concat.
import pandas as pd
file = 'Final_Raw.xlsx' #SF
df = pd.read_excel(file, sheet_name=None, header=None)
data = pd.concat([pd.DataFrame({v.iloc[0, 6]: v.iloc[:, 13]}) for k, v in df.items()], axis=1)
print(data)
      Col1    Col2    Col3
0    Data1  Data25  Data36
1    Data2  Data26  Data37
2    Data3  Data27  Data38
3    Data4  Data28  Data39
4    Data5  Data29  Data40
5    Data6  Data30  Data41
6    Data7  Data31  Data42
7    Data8  Data32  Data43
8    Data9  Data33  Data44
9   Data10  Data34  Data45
10  Data11  Data35  Data46
11  Data12     NaN  Data47
12  Data13     NaN     NaN
13  Data14     NaN     NaN
14  Data15     NaN     NaN
15  Data16     NaN     NaN
16  Data17     NaN     NaN
17  Data18     NaN     NaN
18  Data19     NaN     NaN
19  Data20     NaN     NaN
20  Data21     NaN     NaN
21  Data22     NaN     NaN
22  Data23     NaN     NaN
23  Data24     NaN     NaN
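The question also skips the first 37 rows and reads at most 36000 values from column N. Here is a minimal sketch of how the same dictionary-based approach could respect those limits. It assumes every sheet has already been loaded with header=None. The helper names combine_sheets and fake_sheet and their parameters are hypothetical, and small in-memory frames stand in for the real workbook:

```python
import pandas as pd


def combine_sheets(sheets, name_cell=(0, 6), data_col=13, skip=37, nrows=36000):
    """Build one wide DataFrame from a dict of header-less sheet DataFrames.

    Hypothetical helper: `sheets` is what
    pd.read_excel(file, sheet_name=None, header=None) returns.
    Defaults mirror the question: G1 holds the column name, column N holds
    the data, 37 rows are skipped and at most 36000 rows are kept.
    """
    columns = []
    for sheet in sheets.values():
        name = sheet.iloc[name_cell]                      # label stored in the sheet
        values = sheet.iloc[skip:skip + nrows, data_col]  # data rows only
        # Reset the index so every column starts at row 0 and lines up.
        columns.append(values.reset_index(drop=True).rename(name))
    return pd.concat(columns, axis=1)


def fake_sheet(label, data, skip):
    """Create a made-up sheet with `label` in G1 and `data` in column N."""
    frame = pd.DataFrame(index=range(skip + len(data)), columns=range(14), dtype=object)
    frame.iloc[0, 6] = label
    frame.iloc[skip:, 13] = data
    return frame


# Illustrative data only; real sheets would come from pd.read_excel.
demo = {
    "Sheet1": fake_sheet("control", [1, 2, 3], skip=2),
    "Sheet2": fake_sheet("treated", [4, 5], skip=2),
}
wide = combine_sheets(demo, skip=2, nrows=10)
print(wide)
```

Shorter sheets are padded with NaN, as in the output above. With the real file, the result could then be written out with wide.to_csv(raw_csv).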
CodePudding user response:
Use read_excel with sheet_name=None to read all sheets:
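import pandas as pd
dfs = pd.read_excel(file, sheet_name=None)  # dict mapping sheet name to DataFrame
df = pd.concat(dfs)  # stacks the sheets row-wise, keyed by sheet name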
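Note that passing a dictionary to pd.concat stacks the sheets vertically and uses the sheet names as the outer level of the index. The sketch below shows this, and also shows axis=1 for placing the sheets side by side, which is closer to what the question asks for. The frames are made-up in-memory stand-ins for the sheets; their names and values are illustrative assumptions:

```python
import pandas as pd

# Stand-ins for the dict that pd.read_excel(file, sheet_name=None) returns.
dfs = {
    "Sheet1": pd.DataFrame({"value": [1, 2]}),
    "Sheet2": pd.DataFrame({"value": [3, 4, 5]}),
}

# Default axis=0: rows are stacked and the dict keys become the outer index level.
stacked = pd.concat(dfs)
print(stacked)

# axis=1: sheets sit side by side and the dict keys become the outer column level.
side_by_side = pd.concat(dfs, axis=1)
print(side_by_side)
```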