Iterate through pandas dataframe with 10 rows x 10001 columns

Time:09-13

I have a pandas dataframe without column/row names, just integer indexes [10 rows x 10001 columns]. I am trying to loop through my data to find the values that are below a certain threshold (-1) and print them together with their row and column indexes.

      0         1         2      ...     9998      9999      10000
0 -0.007941 -0.001512 -0.001382  ... -0.014795 -0.012467 -0.013895
1  0.006133  0.008272  0.008863  ...  0.006959  0.005816  0.010471
2  0.034539  0.039303  0.025629  ...  0.004146  0.007729  0.016468
3  0.016329  0.032751  0.020361  ... -0.001196  0.000477 -0.003695
4  0.027603  0.047889  0.028451  ... -0.001866  0.003521 -0.011133
5  0.030001  0.040376  0.022477  ... -0.024666 -0.023214 -0.020742
6  0.043001  0.054916  0.028356  ... -0.029666 -0.035219 -0.053880
7  0.000211  0.003178 -0.000271  ... -0.016128 -0.035698 -0.032700
8  0.054058  0.044326  0.023248  ... -0.029225 -0.033486 -0.032040

I tried iterrows() but couldn't work out the exact code. I hope someone can help.

CodePudding user response:

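xind, yind = np.where(df < -1) will give you the x (row) and y (column) indices of the cells where the value is less than -1. These are positional indices, which coincide with the labels here because the dataframe uses the default integer index.

You can then loop over xind and yind in parallel and use df.iloc to look up each value that was found.

Example (with a threshold of 0 instead of -1):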

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([[-1, 2, 3, 4, 5], [-2, -3, 4, 5, -1], [0, 0, 2, -3, 2]])
>>> df
   0  1  2  3  4
0 -1  2  3  4  5
1 -2 -3  4  5 -1
2  0  0  2 -3  2
>>> xind, yind = np.where(df < 0)
>>> xind
array([0, 1, 1, 1, 2])
>>> yind
array([0, 0, 1, 4, 3])
>>> for x, y in zip(xind, yind):
...     print(x, y, df.iloc[x, y])
...
0 0 -1
1 0 -2
1 1 -3
1 4 -1
2 3 -3

Or, depending on your needs and if you like to keep things short:

>>> xind, yind, df.values[xind, yind]
(array([0, 1, 1, 1, 2]), array([0, 0, 1, 4, 3]), array([-1, -2, -3, -1, -3]))

which then allows for looping like this:

>>> for x, y, value in zip(xind, yind, df.values[xind, yind]):
...     print(x, y, value)
...
0 0 -1
1 0 -2
1 1 -3
1 4 -1
2 3 -3
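
If the dataframe has its own row or column labels, the positions returned by np.where can be translated back into labels. The following is a minimal sketch under that assumption; the helper name cells_below and the labels "a".."c" and "v".."z" are made up for illustration and are not part of the question:

```python
import numpy as np
import pandas as pd


def cells_below(df, threshold):
    """Return (row_label, column_label, value) for every cell below threshold."""
    values = df.to_numpy()
    # np.where on a 2-D boolean array returns positional row and column indices.
    rows, cols = np.where(values < threshold)
    # Translate the positions into the frame's own labels.
    return list(zip(df.index[rows], df.columns[cols], values[rows, cols]))


# Same numbers as the example above, but with hypothetical labels.
df = pd.DataFrame(
    [[-1, 2, 3, 4, 5], [-2, -3, 4, 5, -1], [0, 0, 2, -3, 2]],
    index=["a", "b", "c"],
    columns=["v", "w", "x", "y", "z"],
)
hits = cells_below(df, 0)
for row, col, value in hits:
    print(row, col, value)
```

With the default integer index, as in the question, the labels and positions are the same, so this step is only needed for relabelled frames.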

CodePudding user response:

Another approach: if df is your dataframe, you could use .stack() to turn it into a Series with a two-level index (row, column), then select the relevant entries and print them:

value = -1
s = df.stack()
for (i, j), val in s[s < value].items():
    print(i, j, val)

Result for sample dataframe

   0  1  2  3  4
0  0  5 -4  5 -4
1  1  3  2  2 -5
2 -5  5 -1 -5  2
3  5  5 -3  0 -2
4  4  2 -5  5  4

would be

0 2 -4
0 4 -4
1 4 -5
2 0 -5
2 3 -5
3 2 -3
3 4 -2
4 2 -5

Or if you want to collect the result in a dictionary:

res = s[s.lt(value)].to_dict()
{(0, 2): -4,
 (0, 4): -4,
 (1, 4): -5,
 (2, 0): -5,
 (2, 3): -5,
 (3, 2): -3,
 (3, 4): -2,
 (4, 2): -5}
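
For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the sample dataframe shown above, collects the matches with .stack(), and cross-checks them against np.where. The name res_np is only used for this comparison and is not part of the original answer:

```python
import numpy as np
import pandas as pd

# Sample dataframe from the answer above.
df = pd.DataFrame(
    [
        [0, 5, -4, 5, -4],
        [1, 3, 2, 2, -5],
        [-5, 5, -1, -5, 2],
        [5, 5, -3, 0, -2],
        [4, 2, -5, 5, 4],
    ]
)
value = -1

# Stack into a Series keyed by (row, column), then keep entries below the threshold.
s = df.stack()
res = s[s < value].to_dict()

# The same selection via np.where on the underlying array, for comparison.
rows, cols = np.where(df.to_numpy() < value)
res_np = {(int(r), int(c)): int(df.iat[r, c]) for r, c in zip(rows, cols)}

print(res == res_np)
```

Note that the cell at (2, 2) holds exactly -1 and is therefore not included, since the comparison is strict.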