Home > Enterprise >  How to access particular elements of a dataframe in Pandas. Gives error
How to access particular elements of a dataframe in Pandas. Gives error

Time:05-06

I have a dataframe df_params. It contains parameters for the stored procedure.

   PurchaseOrderID   OrderDate SupplierReference     DF_Name
0                1  2013-01-01          B2084020  dataframe1
1                2  2013-01-01            293092  dataframe2
2                3  2013-01-01          08803922  dataframe3
3                4  2013-01-01         BC0280982  dataframe4
4                5  2013-01-01         ML0300202  dataframe5

I simply want to access the elements of the dataframe in a loop:

for i in range(len(df_params)):
    print(df_params[i][0])

But it gives me an error without really explanation:

Traceback (most recent call last):
  File "C:my\path\site-packages\pandas\core\indexes\base.py", line 2897, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Test3.py", line 35, in <module>
    print(df_params[i][0])
  File "C:\Users\my\path\Python37\lib\site-packages\pandas\core\frame.py", line 2995, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\my\path\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0
PS Microsoft.PowerShell.Core\FileSystem::\\my\path>

The goal is to supply value to the stored procedure:

for i in range(len(df_params)):
    query = "EXEC Purchasing.GetPurchaseOrder "   df_params[i][0]   ","   str(df_params[i][1])   ","   df_params[i][2])
    df = pd.read_sql(query, conn)

desired outcome from print(query):

EXEC Purchasing.GetPurchaseOrder 1, '2013-01-01', 'B2084020'
EXEC Purchasing.GetPurchaseOrder 2, '2013-01-01', '293092'
EXEC Purchasing.GetPurchaseOrder 3, '2013-01-01', '08803922'
EXEC Purchasing.GetPurchaseOrder 4, '2013-01-01', 'BC0280982'
EXEC Purchasing.GetPurchaseOrder 5, '2013-01-01', 'ML0300202'

CodePudding user response:

On data frame there are better ways to access values, you can use lambda. with lambda will have an access to any row.

df.apply(lambda row : print(row['DF_Name']))

now the variable 'row' is each row on the dataframe, and you can access to each properties on the row.

CodePudding user response:

pandas.DataFrames don't behave exactly like numpy.ndarrays. There are basically three options:

option 1: iterrows-method: You can iterate over rows of a pandas.dataframe by

for idx, row in df_params.iterrows():
    print(row['PurchaseOrderID'])

This is a particularly readable way, so personally I prefer this

option 2: indexing: if you want to index pandas.dataframe just like an numpy.ndarray object, go with the method .iat[]

for i in range(len(df_params)):
    print(df_params.iat[i, 0])

This actually indexes all elements and ignores the index of the dataframe! So assuming that you have a different index (in the extreme some strings or a table with a pandas.DataTimeIndex) this still works... just as if you would have done a df_params.to_numpy()[i, 0]. Note: There exists a similar function that uses the column name: .at[]

There is a second way to index a pandas.DataFrame object and it is just a little safer with regard to columns:.loc[] It takes an index and column name(s)

for idx in df_params.index:
    print(df_params.iloc[idx, 'PurchaseOrderID'])

option 3: slicing a pandas.Series object: Every column in a pandas.DataFrame is a pandas.Series object, which you can index similar (you actually index the series as described above) to a numpy.ndarray:

col = df_params['PurchaseOrderID']
for idx in col.index:
    print(col[idx])

So what went wrong in your case? The double indexing is almost the same as the last example but it calls calls .loc[] under the hood and thus expects a column name and not a number (that would have been the method .iloc[]. And it is expecting to see the column first and then the row. So if you really want, you could go like this:

for i in range(len(df_params)):
    print(df_params.iloc[0][i])

but this only works because your pandas.DataFrame has continuous numeric indices starting from 0! So please don't do this and use the actual indices of your table (actually use one of the options above and not the last one ;) )

  • Related