import from excel file and add to existing dataframe

Time:11-11

I want to import one column of data from each of several sheets in a single Excel file and combine them into one large dataframe, with one column per sheet. Additionally, I want the name of each new column to be a string that is also taken from the Excel file.

I've tried a few different things, each with a different issue, but here is a start that works:

import pandas as pd


file = r'C:\Users\pazam\OneDrive\Desktop\neuromastCount\sf\Final_Raw.xlsx' #SF

path = r'C:\Users\pazam\OneDrive\Desktop\neuromastCount\sf' 



results_raw = pd.DataFrame()


for i in range(19): #19 sheets
    df = pd.read_excel(file, usecols='N',skiprows = range(0,37),nrows=36000,engine='openpyxl',header=None, sheet_name=i)
    trt = pd.read_excel(file, usecols='G',nrows=1,engine='openpyxl',header=None, sheet_name=i)

# then something that adds df to results_raw as a new column with the string in trt as column header



raw_csv = path + "/results_raw.csv"
results_raw.to_csv(raw_csv)

thanks!

CodePudding user response:

This code reads every sheet in the file into a dictionary of dataframes, keyed by sheet name.

It then builds a single-column dataframe for each sheet, holding the values from column N (index 13), with the column name taken from the first cell in column G (index 6).

Finally, those dataframes are concatenated side by side using pd.concat with axis=1. Sheets with fewer rows are padded with NaN.

import pandas as pd

file = 'Final_Raw.xlsx' #SF

df = pd.read_excel(file, sheet_name=None, header=None)

data = pd.concat([pd.DataFrame({v.iloc[0, 6]: v.iloc[:, 13]}) for k, v in df.items()], axis=1)

print(data)
      Col1    Col2    Col3
0    Data1  Data25  Data36
1    Data2  Data26  Data37
2    Data3  Data27  Data38
3    Data4  Data28  Data39
4    Data5  Data29  Data40
5    Data6  Data30  Data41
6    Data7  Data31  Data42
7    Data8  Data32  Data43
8    Data9  Data33  Data44
9   Data10  Data34  Data45
10  Data11  Data35  Data46
11  Data12     NaN  Data47
12  Data13     NaN     NaN
13  Data14     NaN     NaN
14  Data15     NaN     NaN
15  Data16     NaN     NaN
16  Data17     NaN     NaN
17  Data18     NaN     NaN
18  Data19     NaN     NaN
19  Data20     NaN     NaN
20  Data21     NaN     NaN
21  Data22     NaN     NaN
22  Data23     NaN     NaN
23  Data24     NaN     NaN
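
The question also limits each sheet to a window of rows (skipping the first 37 and reading at most 36000), which the one-liner above does not do. Below is a minimal sketch of how the same dictionary-of-sheets approach might apply that window. It assumes the column name sits in the first row of column G and the data in column N, as in the question. The helper name `combine_sheets` and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd


def combine_sheets(sheets, name_col=6, data_col=13, skip=37, nrows=36000):
    """Build one DataFrame with one column per sheet.

    sheets: dict of sheet name -> DataFrame, e.g. the result of
            pd.read_excel(file, sheet_name=None, header=None).
    """
    columns = {}
    for sheet in sheets.values():
        # Column header: first row of column G (0-based index 6).
        name = sheet.iloc[0, name_col]
        # Data: column N (0-based index 13), limited to the requested rows.
        values = sheet.iloc[skip:skip + nrows, data_col]
        # Renumber from 0 so the sheets line up row by row.
        columns[name] = values.reset_index(drop=True)
    # The dict keys become the column names; shorter columns are padded with NaN.
    return pd.concat(columns, axis=1)
```

It would be called as `combine_sheets(pd.read_excel(file, sheet_name=None, header=None))`. Because the columns are collected in a dict, two sheets that share the same name in column G would collapse into one column, so this sketch only fits files where those names are unique.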


CodePudding user response:

Use read_excel with sheet_name=None to read all sheets:

dfs = pd.read_excel(file, sheet_name=None)

df = pd.concat(dfs)
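
Note that pd.concat on a dictionary stacks the sheets row-wise by default, using the sheet names as the outer index level, which is not quite the one-column-per-sheet layout the question asks for. Here is a small sketch of the difference, using made-up in-memory frames in place of the Excel sheets (the sheet names and values are illustrative assumptions):

```python
import pandas as pd

# Stand-ins for what pd.read_excel(file, sheet_name=None) would return.
dfs = {
    "Sheet1": pd.DataFrame({"x": [1, 2]}),
    "Sheet2": pd.DataFrame({"x": [3, 4, 5]}),
}

# Default (axis=0): rows are stacked; the index is (sheet name, row number).
stacked = pd.concat(dfs)

# axis=1: sheets sit side by side; the columns are (sheet name, column name).
side_by_side = pd.concat(dfs, axis=1)

print(stacked)
print(side_by_side)
```

With axis=1 the shorter sheet is padded with NaN, as in the first answer.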