I'm using a for loop to slice a dataframe and then extract information from each slice. I store that information in a dict so I can append it to a list for later use. My problem is that the information is not usable: it exists as a pandas Series rather than as the actual scalar value of the cell I'm trying to extract. Below is an example of the process I'm trying to execute (with numpy imported as np and pandas as pd):
df = pd.DataFrame({'c1': np.arange(0, 15), 'c2': np.arange(0, 15), 'c3': ['A']*5 + ['B']*5 + ['C']*5})
iterable = ['A', 'B', 'C']
dict_list = []
for i in iterable:
    out_dict = dict()
    data = df[df.c3 == i]
    out = data.c1[-1:].iloc[0]  # last c1 value of this slice
    out_dict['out'] = out
    dict_list.append(out_dict)
out_df = pd.DataFrame.from_records(dict_list)
Bizarrely, the code above works, but when I switch to my real data I get IndexError: single positional indexer is out-of-bounds at line 7 (out = data.c1[-1:].iloc[0]), which I believe means that there is no index. In both my data and the example above, the type of data.c1[-1:] is pandas.core.series.Series, and both have length 1.
Even stranger, if I run out = data.c1[-1:] inside the for loop and then run out.iloc[0] outside the for loop, I don't get an error.
Does anyone know why iloc would fail in this case? Is there a way to force out to be indexable?
CodePudding user response:
This happens when you index a row/column with a number that is larger than the dimensions of your dataframe.
dataframe1.fillna("nan") # or whatever you want as a fill value
dataframe2.fillna("nan")
For example, df.iloc[:, 10] would refer to the eleventh column, which raises this error if the dataframe has ten or fewer columns.
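The same thing applies to rows. One possible cause in the question, assuming (this is a guess, the real data isn't shown) that some value in iterable has no matching rows: the slice is then empty, data.c1[-1:] is an empty Series, and .iloc[0] on it raises exactly this IndexError. It would also explain why out.iloc[0] works after the loop, since out then holds whatever the last assignment left in it. A minimal sketch with a hypothetical extra label 'D' and a guard that skips empty slices:

```python
import numpy as np
import pandas as pd

# Same toy frame as in the question; 'D' is a hypothetical label
# that never occurs in column c3, so its slice is empty.
df = pd.DataFrame({
    'c1': np.arange(0, 15),
    'c2': np.arange(0, 15),
    'c3': ['A'] * 5 + ['B'] * 5 + ['C'] * 5,
})
iterable = ['A', 'D', 'B', 'C']

dict_list = []
skipped = []
for i in iterable:
    data = df[df.c3 == i]
    if data.empty:
        # data.c1[-1:] would be an empty Series here, and .iloc[0] on it
        # raises "IndexError: single positional indexer is out-of-bounds".
        skipped.append(i)
        continue
    # .iloc[-1] returns the last value of the slice as a scalar
    dict_list.append({'key': i, 'out': data.c1.iloc[-1]})

out_df = pd.DataFrame.from_records(dict_list)
print(out_df)
print('no rows for:', skipped)
```

Printing skipped (or the length of data on each iteration) should show quickly whether this is what happens with the real data.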
CodePudding user response:
Okay, I don't have an answer to the original question, but replacing .iloc[0] with .squeeze() solved my issue, like so: out = data.c1[-1:].squeeze()
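One caveat, in case it helps: squeeze() only turns a Series into a scalar when it has exactly one element. On an empty Series it returns the empty Series unchanged, so the error goes away but an empty Series may end up stored in the dict instead of a value. A small sketch of the difference; last_or_none is a hypothetical helper, not a pandas function:

```python
import pandas as pd

one = pd.Series([14], index=[14])
empty = pd.Series([], dtype='int64')

print(one[-1:].squeeze())            # one element: squeeze gives a scalar
print(type(empty[-1:].squeeze()))    # no elements: still a Series


def last_or_none(series):
    """Hypothetical helper: last value of a Series, or None if it is empty."""
    return series.iloc[-1] if len(series) else None


print(last_or_none(one))
print(last_or_none(empty))
```

An explicit check like this makes the empty case visible instead of hiding it.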