I am using a pandas DataFrame, and for some reason accessing one entry after another in a for loop gives me an error.
Here is my (simplified) code snippet:
import numpy as np
import pandas as pd

df_original = pd.read_csv(csv_dataframe_filename, sep='\t', header=[0, 1], encoding_errors="replace")
df_original.columns = ['A', 'B',
                       'Count_Number', 'D',
                       'E', 'F',
                       'use_first', 'H', 'I']
df_use = df_original
# Remove every row whose 'use_first' entry is 'no'
df_use = df_use.drop(df_use[((df_use['use_first']=='no'))].index)
df_use.columns = ['A', 'B',
                  'Count_Number', 'D',
                  'E', 'F',
                  'use_first', 'H', 'I']
c_mag = np.zeros((len(df_use), 1))
x = 0
for i in range(len(df_use)):
    print(df_use['Count_Number'][x]) #THIS IS THE LINE THAT IS THE ISSUE
    x += 1
print(c_mag)
print(df_use['Count_Number'][x])
The problematic line is marked by a comment. If I enter a specific number instead of the variable x, it works (both outside and inside the loop, although inside the loop it then prints the same value every time, which is not what I want). It also works with df_original instead of df_use, but for my purpose I really need df_use. The print in the very last line also works, even with the variable x, which holds a certain value at that point. I only added the second column assignment for df_use later on, and the issue occurs both with and without it. All other parts of the code work, so both DataFrames can be printed correctly, etc. Using x instead of i as a variable is also a result of experimenting while looking for a solution; using i gave the same result.
The column contains floats, if that matters.
For the code as shown, I get the following error message ("folder of file" is just a placeholder for the actual file path):
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 2131, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 2140, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "[folder of file]", line 74, in <module>
print(df_use['Count_Number'][x])
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py", line 942, in __getitem__
return self._get_value(key)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py", line 1051, in _get_value
loc = self.index.get_loc(label)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 0
Process finished with exit code 1
I searched for answers and tried out different things, such as checking the spelling, but I cannot find a solution and do not understand what I am doing wrong. Does anyone have an idea how to solve this issue?
Thank you in advance for any helpful comment!
UPDATE: Found a solution after all: using .iloc[x] instead of just [x] solves the issue. I am still curious why that happens, though. For other variables it worked even without .iloc, so why not in this case? I feel like an answer would help me better understand how things work in Python, so thanks for any hints even though I already got the code working.
What I already tried: Everything described above (a fixed number instead of x, df_original instead of df_use, with and without the second column assignment, i instead of x). I also played around with different ways of running the loop, but that did not help either.
What I am expecting: The entries of the DataFrame column can be accessed and used (in this simplified case, printed) in the for loop one after another. A different way of printing does not help me (of course I can just print the whole column, which works), because my actual purpose is to do further calculations with each value. print() is only used here to simplify the issue while looking for a solution.
CodePudding user response:
You are manually maintaining a counter x in the for loop, but counting is something the for loop already does for you through i. Drop x and use i directly. Note that a plain [i] is still a lookup by row label, so positional access needs .iloc.
Try:
...
c_mag = np.zeros((len(df_use), 1))
for i in range(len(df_use)):
    print(df_use['Count_Number'].iloc[i])  # positional access; i is supplied by range()
print(c_mag)
...
CodePudding user response:
This answer focuses on the UPDATE section of your question.
The first thing to understand is the difference between plain indexing of a DataFrame column and using iloc. iloc uses positional indexing (just like in lists, where elements have positions 0, 1, ..., len(list)-1), whereas plain indexing, in your case [x], matches what you entered against the index labels (for a column such as df_use['Count_Number'], these are the row labels) rather than checking the position.
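The traceback tells us that there is no row with the label 0, which is why it produces a KeyError. drop() keeps the original row labels, so if the first row was removed because its use_first value is 'no', label 0 no longer exists in df_use. That is also why the same code works with df_original. iloc, in contrast, uses positional indexing, so it returns the very first value of the column Count_Number (for x=0).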
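To make this concrete, here is a minimal sketch that should reproduce the behaviour. The small DataFrame is a made-up stand-in for df_original (only the column names Count_Number and use_first are taken from the question), so treat the values as assumptions.

```python
import pandas as pd

# Hypothetical stand-in for df_original; the values are invented for illustration.
df_original = pd.DataFrame({
    "Count_Number": [1.5, 2.5, 3.5, 4.5],
    "use_first": ["no", "yes", "no", "yes"],
})

# Same filtering step as in the question: the rows labelled 0 and 2 are dropped.
df_use = df_original.drop(df_original[df_original["use_first"] == "no"].index)

# drop() keeps the original index labels, so label 0 no longer exists.
remaining_labels = list(df_use.index)

try:
    df_use["Count_Number"][0]          # label-based lookup on an integer index
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Position-based lookup: the first remaining row, whatever its label is.
first_by_position = df_use["Count_Number"].iloc[0]

print(remaining_labels)      # [1, 3]
print(label_lookup_failed)   # True
print(first_by_position)     # 2.5
```

The label lookup fails only because the index is made of integers with gaps; printing df_use.index in your own script should show whether label 0 is really missing.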
In your case, if you want to use the for loop to print the values of the column in sequence, using iloc is recommended.
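Since you mention that the real goal is a calculation per value, here is a rough sketch of how the loop could look. The data and the "times two" calculation are placeholders I made up; only df_use, Count_Number and c_mag follow the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the index has gaps, as it would after dropping rows.
df_use = pd.DataFrame(
    {"Count_Number": [2.5, 4.5, 6.5]},
    index=[1, 3, 4],
)

c_mag = np.zeros((len(df_use), 1))

# Option 1: positional access with iloc inside a range loop.
for i in range(len(df_use)):
    c_mag[i, 0] = df_use["Count_Number"].iloc[i] * 2   # placeholder calculation

# Option 2: iterate over the values directly, with no index lookup at all.
doubled = [value * 2 for value in df_use["Count_Number"]]

print(c_mag.ravel().tolist())   # [5.0, 9.0, 13.0]
print(doubled)                  # [5.0, 9.0, 13.0]
```

Option 2 avoids the label/position question entirely, which may be the simpler choice when the position itself is not needed.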
As for the last line of your code, it is a label lookup as well: it works because the value x holds at that point happens to exist as a row label in df_use. It returns the row with that label, which is not necessarily the last value of the column Count_Number.
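If you would rather keep writing [x], one option (a sketch on made-up data, not necessarily what fits your real script) is to renumber the index after dropping rows, so that labels and positions coincide again. reset_index(drop=True) is the pandas call for that; df_renumbered is just a name chosen for this example.

```python
import pandas as pd

# Hypothetical data with gaps in the index labels, as left behind by drop().
df_use = pd.DataFrame({"Count_Number": [2.5, 4.5, 6.5]}, index=[1, 3, 4])

# drop=True discards the old labels instead of keeping them as a new column.
df_renumbered = df_use.reset_index(drop=True)

# Plain [x] is still a label lookup, but the labels are now 0, 1, 2.
values_by_label = [
    float(df_renumbered["Count_Number"][x]) for x in range(len(df_renumbered))
]

print(list(df_renumbered.index))   # [0, 1, 2]
print(values_by_label)             # [2.5, 4.5, 6.5]
```

Keep in mind that the old labels are lost this way, so this only makes sense if you do not need them to refer back to df_original.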
I could not fully understand the rest of your issue, so if anything is still unclear, please ask again in a short and specific manner.