Home > Software design >  How to iterate over a pandas dataframe using a list
How to iterate over a pandas dataframe using a list

Time:04-29

I have the following pandas dataframe

import pandas as pd
foo = pd.DataFrame({'source': ['tomato', 'carrots', 'cheese'],
                              'tomato': [0.7, 0.4, 0.8],
                              'carrots': [0.15,0.3,0.1],
                               'cheese': [0.15,0.3,0.1]})

foo

source  tomato  carrots cheese
0   tomato  0.7 0.15    0.15
1   carrots 0.4 0.30    0.30
2   cheese  0.8 0.10    0.10

and the following list:

sequence = ['carrots','carrots','tomato','carrots']

The sequence describes the sequential steps: carrots -> carrots -> tomato -> carrots, which means going from carrots to carrots, then going from carrots to tomato and lastly going from tomato to carrots.

I would like, to iterate over foo, according to the above sequence, and calculate foo['carrots']['carrots']*foo['carrots']['tomato']*foo['tomato']['carrots']

where foo['carrots']['carrots'] is the element of foo that corresponds to source carrots and column carrots, so 0.3

How could I achieve that efficiently ?

CodePudding user response:

Maybe you could use zip:

In [1]: import pandas as pd
In [2]: foo = pd.DataFrame({
   ...:     'source': ['tomato', 'carrots', 'cheese'],
   ...:     'tomato': [0.7, 0.4, 0.8],
   ...:     'carrots': [0.15, 0.3, 0.1],
   ...:     'cheese': [0.15, 0.3, 0.1]
   ...: })
In [3]: foo
Out[3]: 
    source  tomato  carrots  cheese
0   tomato     0.7     0.15    0.15
1  carrots     0.4     0.30    0.30
2   cheese     0.8     0.10    0.10
In [4]: sequence = ['carrots', 'carrots', 'tomato', 'carrots']
   ...: product = 1
   ...: for row, col in zip(sequence[:-1], sequence[1:]):
   ...:     val = foo[foo.source == row].iloc[0][col]
   ...:     product *= val
   ...:     print(f'foo[{row}][{col}] = {val}')
   ...: 
   ...: 
foo[carrots][carrots] = 0.3
foo[carrots][tomato] = 0.4
foo[tomato][carrots] = 0.15
In [5]: product
Out[5]: 0.018

CodePudding user response:

I create MultiIndex Series, then pairs from sequence, Series.reindex and get product for multiple all values:

s = foo.set_index('source').stack()

print (s)
source          
tomato   tomato     0.70
         carrots    0.15
         cheese     0.15
carrots  tomato     0.40
         carrots    0.30
         cheese     0.30
cheese   tomato     0.80
         carrots    0.10
         cheese     0.10
dtype: float64

sequence = ['carrots','carrots','tomato','carrots']

print (list(zip(sequence[:-1], sequence[1:])))
[('carrots', 'carrots'), ('carrots', 'tomato'), ('tomato', 'carrots')]

print (s.reindex(list(zip(sequence[:-1], sequence[1:]))))
source          
carrots  carrots    0.30
         tomato     0.40
tomato   carrots    0.15
dtype: float64


out = s.reindex(list(zip(sequence[:-1], sequence[1:]))).prod()
print (out)
0.018
  • Related