Python DataFrame chunk extract issue

Time:09-17

I want to split a DataFrame into chunks (e.g. 100 rows split into 20 chunks of 5 rows each), and for each chunk I need to run 5 update queries (against 5 different tables) using that chunk's data.

How can I achieve this? I am new to this and learning as I work, so could you suggest an approach?

for item in np.array_split(df1, 10):
 print(item) ##I was able to divide into chunks
 for i,j in item.iterrows():
   print(item.iloc[i]['ColumnName'])

My idea is to add the update query line after this print statement.

But this code gives an exception.

Traceback (most recent call last):
  File "/Users/gd/Documents/myproj/test.py", line 63, in <module>
    func()
  File "/Users/gd/Documents/myproj/test.py", line 45, in dedupe_pe
    print(item.iloc[i]['ColumnName'])
  File "/Users/gd/Documents/myproj/lib/python3.9/site-packages/pandas/core/indexing.py", line 931, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/Users/gd/Documents/myproj/lib/python3.9/site-packages/pandas/core/indexing.py", line 1566, in _getitem_axis
    self._validate_integer(key, axis)
  File "/Users/gd/Documents/myproj/lib/python3.9/site-packages/pandas/core/indexing.py", line 1500, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds

CodePudding user response:

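item.iterrows() yields pairs of (index label, row). The label i is not a position inside the chunk: each chunk returned by np.array_split keeps the index labels of the original frame, so from the second chunk onward item.iloc[i] asks for a position past the end of the chunk and raises the IndexError. The row itself is already available as j, so you could try as follows: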

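for item in np.array_split(df1, 10):
    item = item.copy()  # work on a copy so adding a column does not touch df1
    print(item)  # one chunk of df1
    # Build one UPDATE statement per row by concatenating string columns
    item["sql"] = "UPDATE " + item["table_name"] + " SET column1 = '" + item["ColumnName_DATA"] + "' WHERE condition"
    for i, j in item.iterrows():
        # j is the row (a Series); read its values directly instead of item.iloc[i]
        print(j['ColumnName'])
        print(j['sql'])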
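
If the goal is to actually run the updates rather than print them, a parameterized query per chunk is usually safer than building SQL strings by concatenation. The following is only a sketch under assumptions that are not in the question: an in-memory SQLite database stands in for the real one, the five table names (t1 to t5), the id key column, and the sample data are all made up, and chunks are taken with positional slices of 5 rows so that 100 rows give the 20 chunks described.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data: 100 rows, an id key and the value to write.
df1 = pd.DataFrame({"id": range(100), "ColumnName": [f"v{i}" for i in range(100)]})

# Stand-in database (assumption): five made-up tables sharing the same key.
conn = sqlite3.connect(":memory:")
tables = ["t1", "t2", "t3", "t4", "t5"]
for t in tables:
    conn.execute(f"CREATE TABLE {t} (id INTEGER PRIMARY KEY, column1 TEXT)")
    conn.executemany(f"INSERT INTO {t} (id) VALUES (?)", [(i,) for i in range(100)])

chunk_size = 5
chunk_count = 0
for start in range(0, len(df1), chunk_size):
    # Positional slice: rows start .. start + chunk_size - 1 of df1.
    chunk = df1.iloc[start:start + chunk_size]
    # tolist() converts NumPy scalars to plain Python values for the DB driver.
    params = list(zip(chunk["ColumnName"].tolist(), chunk["id"].tolist()))
    for t in tables:
        # One parameterized UPDATE per table, executed for every row in the chunk.
        conn.executemany(f"UPDATE {t} SET column1 = ? WHERE id = ?", params)
    conn.commit()  # commit once per chunk
    chunk_count += 1

# Labels versus positions: the second chunk holds labels 5..9 at positions 0..4.
second = df1.iloc[5:10]
first_label = second.index[0]
value_by_label = second.loc[first_label, "ColumnName"]  # .loc takes the label
value_by_position = second.iloc[0]["ColumnName"]        # .iloc takes the position
```

With a different database driver the placeholder style may differ (for example %s instead of ?), so check the driver's documentation before reusing the query strings.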