I have the following dataframe
value | A | B |
---|---|---|
1.0 | 7.0 | 8.0 |
2.0 | 9.0 | 8.8 |
3.0 | 9.5 | 9.1 |
4.0 | 10.0 | 9.4 |
5.0 | 13.0 | 9.7 |
6.0 | 15.0 | 9.9 |
7.0 | 16.0 | 10.6 |
9.0 | 17.0 | 17.0 |
What I'm attempting to do:
example:
I'm thinking some sort of if/else statetement:
-if
A < B
return 1.0 ==> when A=7.0 < B=8.0
- if A=B
return 9.0 ==> when A=17 == B=17
-else:
if A > B
look at the two smaller values from B column that match the target value A and return the value of that value at index 1.
Let's say the value A=9.0 so in this example I'm going to look at 8.0 and 8.8, but I'm going to return the next value B=9.1 which is 3.0.
Couple more examples in case it's unclear:
if A=9, check B=8.0 and B=8.8 and return 3.0
if A=9.5, check B=9.1 and B=9.4 and return 5.0
if A=10.0, check B=9.7 and B=9.9 and return 7.0
if A=16, check B=9.9 and B=10.6 and return 9.0
I tried using numpy for this, and indexing it... np.where looked promising but I keep getting stuck in the second part. Can anyone help? All the values are sorted in the ascending order.
Excel formula, I'm not sure if this is helpful at all but I'll add it here anyway:
IF(A<B,value, INDEX(value, match(A,B,1) 1))
Working code:
import pandas as pd
A=7.0
# A=9.0
df = pd.DataFrame({
"value": [1., 2., 3., 4., 5., 6],
'A': [7., 9., 9.5, 10., 13., 15.],
'B': [8., 8.8, 9.1, 9.4, 8.4, 8.5]
})
for i in range(df.shape[0]):
if A < df['B'][i]:
#this should print 1.0 if A=7.0
print(df["value"][i])
else:
for j in range(i, df.shape[0]):
if A < df['B'][j]:
# a little lost here but if A=9.0, it should print 3.0
print(df["value"][j])
CodePudding user response:
Your code almost work for me - the biggest problem is that in code DataFrame has different values then table at the top - so it gives different results and it doesn't match to expected results.
And it needs
df['A'][i]
instead ofA
break
in secondfor
-loop to exist loop when it finds first matching element<=
in firstif
to run for<
and==
import pandas as pd
df = pd.DataFrame({
"value": [1., 2., 3., 4., 5., 6, 7., 9.],
"A": [7., 9., 9.5, 10., 13., 15., 16., 17.],
"B": [8., 8.8, 9.1, 9.4, 9.7, 9.9, 10.6, 17.0]
})
for i in range(df.shape[0]):
if df['A'][i] <= df['B'][i]: # both `<` and `==`
#print(df["value"][i])
print(f'{df["A"][i]:5.1f} <= {df["B"][i]:5.1f} ==> {df["value"][i]:5.1f}')
else:
for j in range(i 1, df.shape[0]):
if df['A'][i] < df['B'][j]:
#print(df["value"][j])
print(f'{df["A"][i]:5.1f} > {df["B"][i]:5.1f} (<{df["B"][j]:5.1f}) ==> {df["value"][j]:5.1f}')
break
Result:
7.0 <= 8.0 ==> 1.0
9.0 > 8.8 (< 9.1) ==> 3.0
9.5 > 9.1 (< 9.7) ==> 5.0
10.0 > 9.4 (< 10.6) ==> 7.0
13.0 > 9.7 (< 17.0) ==> 9.0
15.0 > 9.9 (< 17.0) ==> 9.0
16.0 > 10.6 (< 17.0) ==> 9.0
17.0 <= 17.0 ==> 9.0
To make it more readable you could use df.iterrows()
import pandas as pd
df = pd.DataFrame({
"value": [1., 2., 3., 4., 5., 6, 7., 9.],
"A": [7., 9., 9.5, 10., 13., 15., 16., 17.],
"B": [8., 8.8, 9.1, 9.4, 9.7, 9.9, 10.6, 17.0]
})
for index, row in df.iterrows():
if row['A'] <= row['B']: # both `<` and `==`
#print(row["value"])
print(f'{row["A"]:5.1f} <= {row["B"]:5.1f} ==> {row["value"]:5.1f}')
else:
for other_index, other_row in df[index 1:].iterrows():
if row['A'] < other_row['B']:
#print(df["value"][j])
print(f'{row["A"]:5.1f} > {row["B"]:5.1f} (<{other_row["B"]:5.1f}) ==> {other_row["value"]:5.1f}')
break