Home > Software design >  Return value based on column comparison
Return value based on column comparison

Time:07-22

I have the following dataframe

value A B
1.0 7.0 8.0
2.0 9.0 8.8
3.0 9.5 9.1
4.0 10.0 9.4
5.0 13.0 9.7
6.0 15.0 9.9
7.0 16.0 10.6
9.0 17.0 17.0

What I'm attempting to do:

example:

I'm thinking some sort of if/else statetement:

    -if 
        A < B
    return 1.0 ==> when A=7.0 < B=8.0
    - if A=B
    return 9.0 ==> when A=17 == B=17
    -else: 
        if A > B
         look at the two smaller values from B column that match the target value A and return the value of that value at index 1.
         Let's say the value A=9.0 so in this example I'm going to look at 8.0 and 8.8, but I'm going to return the next value B=9.1 which is 3.0. 

Couple more examples in case it's unclear:

if A=9, check B=8.0 and B=8.8 and return 3.0

if A=9.5, check B=9.1 and B=9.4 and return 5.0

if A=10.0, check B=9.7 and B=9.9 and return 7.0

if A=16, check B=9.9 and B=10.6 and return 9.0

I tried using numpy for this, and indexing it... np.where looked promising but I keep getting stuck in the second part. Can anyone help? All the values are sorted in the ascending order.

Excel formula, I'm not sure if this is helpful at all but I'll add it here anyway:

IF(A<B,value, INDEX(value, match(A,B,1) 1))

Working code:

import pandas as pd
A=7.0
# A=9.0
df = pd.DataFrame({
               "value": [1., 2., 3., 4., 5., 6], 
               'A': [7., 9., 9.5, 10., 13., 15.],
               'B': [8., 8.8, 9.1, 9.4, 8.4, 8.5]
             })
for i in range(df.shape[0]):
    if A < df['B'][i]:
        #this should print 1.0 if A=7.0
        print(df["value"][i])
    else:
        for j in range(i, df.shape[0]):
            if A < df['B'][j]:
                # a little lost here but if A=9.0, it should print 3.0
                print(df["value"][j])

CodePudding user response:

Your code almost work for me - the biggest problem is that in code DataFrame has different values then table at the top - so it gives different results and it doesn't match to expected results.

And it needs

  • df['A'][i] instead of A
  • break in second for-loop to exist loop when it finds first matching element
  • <= in first if to run for < and ==
import pandas as pd

df = pd.DataFrame({
    "value": [1.,  2.,  3.,  4.,  5.,   6,   7.,   9.], 
    "A":     [7.,  9., 9.5, 10., 13., 15.,  16.,  17.],
    "B":     [8., 8.8, 9.1, 9.4, 9.7, 9.9, 10.6, 17.0]
})

for i in range(df.shape[0]):
    if df['A'][i] <= df['B'][i]:  # both `<` and `==`
        #print(df["value"][i])
        print(f'{df["A"][i]:5.1f} <= {df["B"][i]:5.1f}          ==> {df["value"][i]:5.1f}')
    else:
        for j in range(i 1, df.shape[0]):
            if df['A'][i] < df['B'][j]:
                #print(df["value"][j])
                print(f'{df["A"][i]:5.1f}  > {df["B"][i]:5.1f} (<{df["B"][j]:5.1f}) ==> {df["value"][j]:5.1f}')
                break

Result:

  7.0 <=   8.0          ==>   1.0
  9.0  >   8.8 (<  9.1) ==>   3.0
  9.5  >   9.1 (<  9.7) ==>   5.0
 10.0  >   9.4 (< 10.6) ==>   7.0
 13.0  >   9.7 (< 17.0) ==>   9.0
 15.0  >   9.9 (< 17.0) ==>   9.0
 16.0  >  10.6 (< 17.0) ==>   9.0
 17.0 <=  17.0          ==>   9.0

To make it more readable you could use df.iterrows()

import pandas as pd

df = pd.DataFrame({
    "value": [1.,  2.,  3.,  4.,  5.,   6,   7.,   9.], 
    "A":     [7.,  9., 9.5, 10., 13., 15.,  16.,  17.],
    "B":     [8., 8.8, 9.1, 9.4, 9.7, 9.9, 10.6, 17.0]
})

for index, row in df.iterrows():
    if row['A'] <= row['B']:  # both `<` and `==`
        #print(row["value"])
        print(f'{row["A"]:5.1f} <= {row["B"]:5.1f}          ==> {row["value"]:5.1f}')
    else:
        for other_index, other_row in df[index 1:].iterrows():
            if row['A'] < other_row['B']:
                #print(df["value"][j])
                print(f'{row["A"]:5.1f}  > {row["B"]:5.1f} (<{other_row["B"]:5.1f}) ==> {other_row["value"]:5.1f}')
                break
  • Related