I have the following pandas
dataframe
import pandas as pd
foo = pd.DataFrame({'source': ['tomato', 'carrots', 'cheese'],
'tomato': [0.7, 0.4, 0.8],
'carrots': [0.15,0.3,0.1],
'cheese': [0.15,0.3,0.1]})
foo
source tomato carrots cheese
0 tomato 0.7 0.15 0.15
1 carrots 0.4 0.30 0.30
2 cheese 0.8 0.10 0.10
and the following list:
sequence = ['carrots','carrots','tomato','carrots']
The sequence
describes the sequential steps: carrots -> carrots -> tomato -> carrots
, which means going from carrots to carrots, then going from carrots to tomato and lastly going from tomato to carrots.
I would like, to iterate over foo
, according to the above sequence, and calculate
foo['carrots']['carrots']*foo['carrots']['tomato']*foo['tomato']['carrots']
where foo['carrots']['carrots']
is the element of foo
that corresponds to source
carrots
and column carrots
, so 0.3
How could I achieve that efficiently ?
CodePudding user response:
Maybe you could use zip
:
In [1]: import pandas as pd
In [2]: foo = pd.DataFrame({
...: 'source': ['tomato', 'carrots', 'cheese'],
...: 'tomato': [0.7, 0.4, 0.8],
...: 'carrots': [0.15, 0.3, 0.1],
...: 'cheese': [0.15, 0.3, 0.1]
...: })
In [3]: foo
Out[3]:
source tomato carrots cheese
0 tomato 0.7 0.15 0.15
1 carrots 0.4 0.30 0.30
2 cheese 0.8 0.10 0.10
In [4]: sequence = ['carrots', 'carrots', 'tomato', 'carrots']
...: product = 1
...: for row, col in zip(sequence[:-1], sequence[1:]):
...: val = foo[foo.source == row].iloc[0][col]
...: product *= val
...: print(f'foo[{row}][{col}] = {val}')
...:
...:
foo[carrots][carrots] = 0.3
foo[carrots][tomato] = 0.4
foo[tomato][carrots] = 0.15
In [5]: product
Out[5]: 0.018
CodePudding user response:
I create MultiIndex Series
, then pairs from sequence, Series.reindex
and get product for multiple all values:
s = foo.set_index('source').stack()
print (s)
source
tomato tomato 0.70
carrots 0.15
cheese 0.15
carrots tomato 0.40
carrots 0.30
cheese 0.30
cheese tomato 0.80
carrots 0.10
cheese 0.10
dtype: float64
sequence = ['carrots','carrots','tomato','carrots']
print (list(zip(sequence[:-1], sequence[1:])))
[('carrots', 'carrots'), ('carrots', 'tomato'), ('tomato', 'carrots')]
print (s.reindex(list(zip(sequence[:-1], sequence[1:]))))
source
carrots carrots 0.30
tomato 0.40
tomato carrots 0.15
dtype: float64
out = s.reindex(list(zip(sequence[:-1], sequence[1:]))).prod()
print (out)
0.018