I want to split a DataFrame into chunks (e.g. 100 rows divided into 20 chunks of 5 rows each) and, for each chunk, run 5 update queries (against 5 different tables) using that chunk's data.
How can I do this? I am new to this and learning as I work, so could you suggest an approach?
for item in np.array_split(df1, 10):
    print(item)  # I was able to divide into chunks
    for i, j in item.iterrows():
        print(item.iloc[i]['ColumnName'])
My idea is to add the update query line after this print statement.
But this code raises an exception:
Traceback (most recent call last):
  File "/Users/gd/Documents/myproj/test.py", line 63, in <module>
    func()
  File "/Users/gd/Documents/myproj/test.py", line 45, in dedupe_pe
    print(item.iloc[i]['ColumnName'])
  File "/Users/gd/Documents/myproj/lib/python3.9/site-packages/pandas/core/indexing.py", line 931, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/Users/gd/Documents/myproj/lib/python3.9/site-packages/pandas/core/indexing.py", line 1566, in _getitem_axis
    self._validate_integer(key, axis)
  File "/Users/gd/Documents/myproj/lib/python3.9/site-packages/pandas/core/indexing.py", line 1500, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
CodePudding user response:
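item.iterrows() yields each row's index label together with the row itself. Assuming df1 has the default integer index, those labels stop matching positions inside the chunk after the first chunk (the second chunk starts at a label well above 0 but contains only a few rows), so item.iloc[i] is out of bounds. Use the row object directly instead, as follows: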
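for item in np.array_split(df1, 10):
    item = item.copy()  # work on a copy so adding a column does not touch df1
    print(item)  # one chunk
    # Build one UPDATE statement per row; both source columns must hold strings
    item["sql"] = "UPDATE " + item["table_name"] + " SET column1 = '" + item["ColumnName_DATA"] + "' WHERE condition"
    for i, j in item.iterrows():  # i is the index label, j is the row (a Series)
        print(j['ColumnName'])
        print(j['sql'])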
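
If the goal is to actually run the updates rather than print them, something along these lines might work. This is only a sketch under assumptions: it uses an in-memory SQLite database from the standard library as a stand-in for your real database, and the table names (table_a to table_e), the id key column and the sample data are all made up for illustration. It also swaps the string concatenation above for parameterized queries, which avoids quoting problems in the values.

```python
import sqlite3

import pandas as pd

# Hypothetical data: 100 rows, each with a key and a new value to write.
df1 = pd.DataFrame({"id": range(100), "ColumnName": [f"value_{n}" for n in range(100)]})

# Hypothetical target tables; the question mentions five of them.
TABLES = ["table_a", "table_b", "table_c", "table_d", "table_e"]

# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
for table in TABLES:
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, column1 TEXT)")
    conn.executemany(f"INSERT INTO {table} (id) VALUES (?)", [(n,) for n in range(100)])

chunk_size = 5
for start in range(0, len(df1), chunk_size):
    # Positional slice: rows start .. start + chunk_size - 1, whatever the index labels are.
    chunk = df1.iloc[start:start + chunk_size]
    # (new value, key) pairs; int() converts the NumPy integer to a plain Python int.
    params = [(row.ColumnName, int(row.id)) for row in chunk.itertuples(index=False)]
    for table in TABLES:
        # Table names cannot be bound as parameters, so they come from the fixed list above.
        conn.executemany(f"UPDATE {table} SET column1 = ? WHERE id = ?", params)
    conn.commit()  # one commit per chunk

updated = conn.execute("SELECT COUNT(*) FROM table_a WHERE column1 IS NOT NULL").fetchone()[0]
print(updated)
```

Slicing with iloc in steps of chunk_size gives chunks of exactly 5 rows (the last one may be shorter), and committing once per chunk means a failure part-way through only loses the chunk in progress. With another database driver the placeholder style may differ (for example %s instead of ?), so check its documentation.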