Home > Enterprise >  IndexError: index 1 is out of bounds for axis 0 with size 1 when exporting individual excel row to J
IndexError: index 1 is out of bounds for axis 0 with size 1 when exporting individual excel row to J

Time:10-11

I have a spreadsheet with 100 rows for example, and want to export each row to an individual JSON file named <the row number>.json.
However I get this error and am not sure what I could be doing wrong.
Code is as follows:

import pandas as pd
from pathlib import Path
from pandas import read_excel


root_location = Path("/fullpath/")
df = pd.read_excel("/fullpath/toexcelfile.xlsx") 
rowNumber = 0

for index, row in df.iterrows():
    with open(root_location / (str([rowNumber])   ".json"), "w") as f:
        f.write(str(row[rowNumber]))
        rowNumber  =1

Thank you so much for your help.

EDIT:

I have tried

import pandas as pd
from pathlib import Path
from pandas import read_excel

df = pd.read_excel("/fullpath/excel.xlsx", index=[0, 1], columns=['A']) 

for index, row in df.iterrows():
    with open((str([index])   ".json"), "w") as f:
        f.write(row.to_string(row))
        

Which is now giving me the following error:


TypeError: read_excel() got an unexpected keyword argument 'index'

CodePudding user response:

You are trying to access an index out of the bounds of the array row in the f.write(str(row[rowNumber])) line, where you're trying to access the index rowNumber of row.

You have already the row in your for and with row[rowNumber] you are not accessing the rowNumber-th row, but the rowNumber-th element of the current row and this, if the data are not properly formatted, could lead to an index out of bounds as we saw.

In summary, on each iteration of the for you were accessing the element row[rowNumber] but each time incrementing both row and rowNumber by 1; by this we can say that the program would only consider the values of the diagonal of the dataframe (i.e. [0,0],[1,1],... ,[n,n]).

Since you have to export each single row, you can do something similar to the simplified example below, using only the row given from iterrows():

import pandas as pd

df = pd.DataFrame((['Value0', 'Value1'],['Value2', 'Value3']), index=[0, 1], columns=['A', 'B'])
#   |    A   |    B   |
# 0 | Value0 | Value1 |
# 1 | Value2 | Value3 |

for column, row in df.iterrows():
    with open((str([column])   ".txt"), "w") as f:
        f.write(str(row))

Output:

[0].txt

A    Value0
B    Value1
Name: 0, dtype: object

[1].txt

A    Value2
B    Value3
Name: 1, dtype: object

Edit for the updated question:

You've got a TypeError because read_excel() doesn't accept the same arguments as the constructor DataFrame(). To handle this, you'll need to read the excel file and formatting the dataframe in base of your needs. Since I don't know the exact formatting of that file, I can't provide an exact solution. The best I can do is to refer you to the documentation which explain each and every argument of the matter.

Besides, just a hint, I suggest you to print a dummy dataframe as an example of the actual one, in order to better understand how to correctly approach it

  • Related