I have a .xlsx file in which there are 3 different table available, seprated by three key words "Setteled", "Refund", "Charged" in want to read all the table in separate dataframes, sharing file data and desired output.
File Data:-
Setteled
IN.Type STRA STRB STRC
CRBD 2487 XR XL0054
DFRS 3754 MY XL0684
CRBD 7356 DF XL8911
DFRS 4487 DF XL58999
DFRS 7785 MY XL76568
CRBD 8235 GL XL0635
DFRS 2468 PQ XL4569
DFRS 9735 GR XL7589
CRBD 6486 TY XL5566
DFRS 1023 PQ XL27952
Refund
IN.Type STRD STRE
DFRS 5898 RT
DFRS 5684 YU
CRBD 2564 RT
DFRS 1564 OP
DFRS 2548 YU
CRBD 4478 GL
CRBD 4515 OP
DFRS 5695 YU
DFRS 8665 RT
CRBD 1487 LK
Charged
IN.Type STRF STRG
CRBD 1289 GH
CRBD 8546 JK
CRBD 6599 LP
DFRS 7899 JK
DFRS 1456 GH
CRBD 6988 JK
DFRS 1468 LP
DFRS 4697 GH
DFRS 7941 LP
DFRS 1636 JK
Now after reading the filr, I want above three tables in different dataframe as below.
df = "Row available below Setteled"
df:-
IN.Type STRA STRB STRC
CRBD 2487 XR XL0054
DFRS 3754 MY XL0684
CRBD 7356 DF XL8911
DFRS 4487 DF XL58999
DFRS 7785 MY XL76568
CRBD 8235 GL XL0635
DFRS 2468 PQ XL4569
DFRS 9735 GR XL7589
CRBD 6486 TY XL5566
DFRS 1023 PQ XL27952
df2 = "Row available below Refund"
df2:-
IN.Type STRD STRE
DFRS 5898 RT
DFRS 5684 YU
CRBD 2564 RT
DFRS 1564 OP
DFRS 2548 YU
CRBD 4478 GL
CRBD 4515 OP
DFRS 5695 YU
DFRS 8665 RT
CRBD 1487 LK
df3 = "Rows available below Charged"
df3:-
IN.Type STRF STRG
CRBD 1289 GH
CRBD 8546 JK
CRBD 6599 LP
DFRS 7899 JK
DFRS 1456 GH
CRBD 6988 JK
DFRS 1468 LP
DFRS 4697 GH
DFRS 7941 LP
DFRS 1636 JK
CodePudding user response:
Are your "tables" actual Excel tables? If so, you could use the approach explained here.
E.g.:
import pandas as pd
from openpyxl import load_workbook
filename = "tables.xlsx"
#read file
wb = load_workbook(filename)
#access specific sheet
ws = wb["Sheet1"]
mapping = {}
for entry, data_boundary in ws.tables.items():
#parse the data within the ref boundary
data = ws[data_boundary]
#extract the data
#the inner list comprehension gets the values for each cell in the table
content = [[cell.value for cell in ent]
for ent in data
]
header = content[0]
#the contents ... excluding the header
rest = content[1:]
#create dataframe with the column names
#and pair table name with dataframe
df = pd.DataFrame(rest, columns = header)
mapping[entry] = df
This will get you a dictionary with all the tables in a specific sheet.
CodePudding user response:
I am not sure if this is the best approach, but you can use
pd.read_excel(file, skiprows=1, skipfooter=#)
So for the first dataframe you need to skip one line at the start and #number of lines below the last line of data you have
You can also read it all as dataframe and then slice it using df.loc