I have a spreadsheet with 100 rows for example, and want to export each row to an individual JSON file named <the row number>.json.
However I get this error and am not sure what I could be doing wrong.
Code is as follows:
import pandas as pd
from pathlib import Path
from pandas import read_excel
root_location = Path("/fullpath/")
df = pd.read_excel("/fullpath/toexcelfile.xlsx")
rowNumber = 0
for index, row in df.iterrows():
with open(root_location / (str([rowNumber]) ".json"), "w") as f:
f.write(str(row[rowNumber]))
rowNumber =1
Thank you so much for your help.
EDIT:
I have tried
import pandas as pd
from pathlib import Path
from pandas import read_excel
df = pd.read_excel("/fullpath/excel.xlsx", index=[0, 1], columns=['A'])
for index, row in df.iterrows():
with open((str([index]) ".json"), "w") as f:
f.write(row.to_string(row))
Which is now giving me the following error:
TypeError: read_excel() got an unexpected keyword argument 'index'
CodePudding user response:
You are trying to access an index out of the bounds of the array row
in the f.write(str(row[rowNumber]))
line, where you're trying to access the index rowNumber
of row
.
You have already the row in your for
and with row[rowNumber]
you are not accessing the rowNumber
-th row, but the rowNumber
-th element of the current row and this, if the data are not properly formatted, could lead to an index out of bounds as we saw.
In summary, on each iteration of the for
you were accessing the element row[rowNumber]
but each time incrementing both row
and rowNumber
by 1; by this we can say that the program would only consider the values of the diagonal of the dataframe
(i.e. [0,0],[1,1],... ,[n,n]).
Since you have to export each single row, you can do something similar to the simplified example below, using only the row
given from iterrows()
:
import pandas as pd
df = pd.DataFrame((['Value0', 'Value1'],['Value2', 'Value3']), index=[0, 1], columns=['A', 'B'])
# | A | B |
# 0 | Value0 | Value1 |
# 1 | Value2 | Value3 |
for column, row in df.iterrows():
with open((str([column]) ".txt"), "w") as f:
f.write(str(row))
Output:
[0].txt
A Value0
B Value1
Name: 0, dtype: object
[1].txt
A Value2
B Value3
Name: 1, dtype: object
Edit for the updated question:
You've got a TypeError
because read_excel()
doesn't accept the same arguments as the constructor DataFrame()
. To handle this, you'll need to read the excel file and formatting the dataframe
in base of your needs. Since I don't know the exact formatting of that file, I can't provide an exact solution. The best I can do is to refer you to the documentation which explain each and every argument of the matter.
Besides, just a hint, I suggest you to print a dummy dataframe
as an example of the actual one, in order to better understand how to correctly approach it