I have a pandas dataframe without column/row names, just indexes [10 rows x 10001 columns]. I am trying to loop through my data to find and print the values (along with their indexes) that are below a certain value (-1).
0 1 2 ... 9998 9999 10000
0 -0.007941 -0.001512 -0.001382 ... -0.014795 -0.012467 -0.013895
1 0.006133 0.008272 0.008863 ... 0.006959 0.005816 0.010471
2 0.034539 0.039303 0.025629 ... 0.004146 0.007729 0.016468
3 0.016329 0.032751 0.020361 ... -0.001196 0.000477 -0.003695
4 0.027603 0.047889 0.028451 ... -0.001866 0.003521 -0.011133
5 0.030001 0.040376 0.022477 ... -0.024666 -0.023214 -0.020742
6 0.043001 0.054916 0.028356 ... -0.029666 -0.035219 -0.053880
7 0.000211 0.003178 -0.000271 ... -0.016128 -0.035698 -0.032700
8 0.054058 0.044326 0.023248 ... -0.029225 -0.033486 -0.032040
I tried with iterrows() but can't really figure out the exact code. Hope someone can help.
CodePudding user response:
xind, yind = np.where(df < -1)
will give you the x (row) and y (column) indices of the cells where the value is less than -1.
You can then loop over xind and yind (in parallel) and use df.iloc to look up each found value. The indices returned by np.where are positional, which is why df.iloc is used rather than df.loc.
Example:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([[-1, 2, 3, 4, 5], [-2, -3, 4, 5, -1], [0, 0, 2, -3, 2]])
>>> df
   0  1  2  3  4
0 -1  2  3  4  5
1 -2 -3  4  5 -1
2  0  0  2 -3  2
>>> xind, yind = np.where(df < 0)
>>> xind
array([0, 1, 1, 1, 2])
>>> yind
array([0, 0, 1, 4, 3])
>>> for x, y in zip(xind, yind):
...     print(x, y, df.iloc[x, y])
...
0 0 -1
1 0 -2
1 1 -3
1 4 -1
2 3 -3
Or, depending on your needs and if you like to keep things short:
>>> xind, yind, df.values[xind, yind]
(array([0, 1, 1, 1, 2]), array([0, 0, 1, 4, 3]), array([-1, -2, -3, -1, -3]))
which of course then allows for looping like
>>> for x, y, value in zip(xind, yind, df.values[xind, yind]):
...     print(x, y, value)
...
0 0 -1
1 0 -2
1 1 -3
1 4 -1
2 3 -3
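If the dataframe later gets non-default row or column labels, the positional indices from np.where can be mapped back to labels. The following is only a sketch of that idea: the helper name find_below is made up for illustration, and the threshold of -1 is taken from the question.

```python
import numpy as np
import pandas as pd


def find_below(df, threshold):
    """Return (row_label, column_label, value) for every cell below threshold.

    Hypothetical helper, not part of pandas or NumPy.
    """
    values = df.to_numpy()
    # np.where yields positional (row, column) indices of the matching cells
    rows, cols = np.where(values < threshold)
    # Translate positions into the dataframe's own labels
    return [
        (df.index[r], df.columns[c], values[r, c])
        for r, c in zip(rows, cols)
    ]


df = pd.DataFrame([[-1, 2, 3, 4, 5], [-2, -3, 4, 5, -1], [0, 0, 2, -3, 2]])
hits = find_below(df, -1)
for row, col, value in hits:
    print(row, col, value)
```

With the strict comparison, cells that are exactly -1 are not reported; use <= in the helper if they should be.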
CodePudding user response:
Another approach: with df being your dataframe, you could .stack it into a series with a 2-level index (row, column), then select the relevant entries and print them:
value = -1
s = df.stack()
for (i, j), val in s[s < value].items():
    print(i, j, val)
Result for sample dataframe
   0  1  2  3  4
0  0  5 -4  5 -4
1  1  3  2  2 -5
2 -5  5 -1 -5  2
3  5  5 -3  0 -2
4  4  2 -5  5  4
would be
0 2 -4
0 4 -4
1 4 -5
2 0 -5
2 3 -5
3 2 -3
3 4 -2
4 2 -5
Or if you want to collect the result in a dictionary:
res = s[s.lt(value)].to_dict()
{(0, 2): -4,
(0, 4): -4,
(1, 4): -5,
(2, 0): -5,
(2, 3): -5,
(3, 2): -3,
(3, 4): -2,
(4, 2): -5}
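If a table is handier than a dictionary, the filtered series can probably be turned back into a dataframe with one row per hit. This is a sketch under the same sample data as above; the column names "row", "column" and "value" are arbitrary choices for illustration.

```python
import pandas as pd

# Sample dataframe from above
df = pd.DataFrame([
    [0, 5, -4, 5, -4],
    [1, 3, 2, 2, -5],
    [-5, 5, -1, -5, 2],
    [5, 5, -3, 0, -2],
    [4, 2, -5, 5, 4],
])

value = -1
s = df.stack()        # series indexed by (row, column)
below = s[s < value]  # keep only the entries below the threshold

# Name the two index levels, then move them into ordinary columns
table = below.rename_axis(["row", "column"]).reset_index(name="value")
print(table)
```

The resulting table can then be sorted, grouped by row, or written out with to_csv like any other dataframe.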