I've read excel file using pd.read_excel
and i also have specified from which sheets i want them using Regex.
def gettingSheetName(_directory,_regex):
'''reads the file using regex as a pattern for Sheet name'''
xl = pd.ExcelFile(_directory)
regex = re.compile(_regex)
sheets = [n for n in xl.sheet_names if regex.match(n)]
if sheets:
return pd.read_excel(xl, sheet_name=sheets,index_col=False)
Therefor in return i get a Dictionary with 3 Dimensions.
{'SheetName': Col1 Col2
0 27 someText
1 35 someText
.. ... ...
171 4444 someText
172 6666 someText
i've tried pd.DataFrame.from_dict(Dic)
to convert this to Pandas Data frame. and i got this error
If using all scalar values, you must pass an index
i also tried this code that i found pd.DataFrame(list(Sach.items()), columns=['Col1', 'Col2'])
but the resualt looks like this
Col1 col2
0 Sheetname Col1 Col2 0....
Sheet name goes to value and after that Columns and finally the values that i'm assuming are starting with the first index which was 0 I've search everywhere and all i could find was for 2D Dic which doesn't consider the Sheet name. although i used this for loop instead of other popular methods which does the job, my Question is is there any way that i can use a Pandas function to achieve this? or possibly can i Not take the Sheetname from the beginning?
for key in Dic:
Dic=Dic[key]
EDIT 1:
so after using return pd.concat(pd.read_excel(xl, sheet_name=sheets,index_col=False))
and then pd.DataFrame.from_dict(Sach)
i got this Data Frame which is almost what i wanted but i only need the Cols and Values
Col1 Col2
Sheetname 0 Val1 Val2
1
CodePudding user response:
Use concat
with DataFrame.reset_index
:
return pd.concat(pd.read_excel(xl,sheet_name=sheets,index_col=False)).reset_index(drop=True)
d = {'SheetName': pd.DataFrame({'Col1':[27,25], 'Col2':list('ab')}),
'SheetName1': pd.DataFrame({'Col1':[270,205], 'Col2':list('th')})}
print (d)
{'SheetName': Col1 Col2
0 27 a
1 25 b, 'SheetName1': Col1 Col2
0 270 t
1 205 h}
By default get MultiIndex
:
df = pd.concat(d)
print (df)
Col1 Col2
SheetName 0 27 a
1 25 b
SheetName1 0 270 t
1 205 h
For convert it to column use:
df = pd.concat(d).droplevel(1).rename_axis('sheets').reset_index()
print (df)
sheets Col1 Col2
0 SheetName 27 a
1 SheetName 25 b
2 SheetName1 270 t
3 SheetName1 205 h
For remove MultiIndex
use:
df = pd.concat(d).reset_index(drop=True)
print (df)
Col1 Col2
0 27 a
1 25 b
2 270 t
3 205 h