index match in Python

I have the following dataframe:

    value     A      B
      1.0   7.0    8.0
      2.0   9.0    8.8
      3.0   9.5    9.1
      4.0  10.0    9.4
      5.0  13.0    9.7
      6.0  15.0    9.9
      7.0  16.0   10.6
      8.0  17.0   17.0

What I'm attempting to do (I'm thinking of some sort of if/else statement):

    if A < B:
        return "value" from the same row    # A=7.0 < B=8.0, so return 1.0
    elif A == B:
        return "value" from the same row    # A=17, B=17, so return 8.0
    else:                                   # A > B
        find the two B values closest to A from below (the largest B values smaller than A)
        and return "value" from the row after them (B+1).
        Say A=9.0: it checks B=8.0 and B=8.8 and returns the value for B=9.1, which is 3.0.
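The rule above can be sketched as a plain row-by-row function. This is one reading of the pseudocode, assuming column B is sorted ascending: for A > B, "skip the B values below A and take the next row" then amounts to "take the first row whose B is >= A". The helper name `match_value` is hypothetical, and returning `None` when no B reaches A is an assumption the question doesn't cover.

```python
import pandas as pd

# The dataframe from the question.
df = pd.DataFrame({
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "A": [7.0, 9.0, 9.5, 10.0, 13.0, 15.0, 16.0, 17.0],
    "B": [8.0, 8.8, 9.1, 9.4, 9.7, 9.9, 10.6, 17.0],
})


def match_value(df, i):
    """Hypothetical helper: apply the if/else rule to row i.

    Assumes column B is sorted in ascending order.
    """
    a = df.at[i, "A"]
    b = df.at[i, "B"]
    if a <= b:
        # A < B or A == B: take "value" from the same row.
        return df.at[i, "value"]
    # A > B: skip every B that is still smaller than A; the next row
    # (the first B >= A) is the one whose "value" we want.
    candidates = df.loc[df["B"] >= a, "value"]
    return candidates.iloc[0] if not candidates.empty else None


results = [match_value(df, i) for i in df.index]
print(results)
```

On the question's data this gives 3.0, 5.0, 7.0 and 8.0 for the rows with A = 9, 9.5, 10 and 16, which matches the examples in the question.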

A couple more examples, in case it's unclear:

if A=9, check B=8.0 and B=8.8 and return 3.0

if A=9.5, check B=9.1 and B=9.4 and return 5.0

if A=10.0, check B=9.7 and B=9.9 and return 7.0

if A=16, check B=9.9 and B=10.6 and return 8.0

I tried using NumPy for this, with indexing... np.where looked promising, but I keep getting stuck on the second part. Can anyone help? It's safe to assume that the values are sorted in ascending order.
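Since NumPy came up, here is a vectorized sketch under the same reading of the rule (B sorted ascending; for A > B take the first row whose B is >= A). `np.searchsorted` finds that row for every A at once, and `np.where` keeps the same-row value where A <= B. The `result` column name is made up, and the NaN for "no B is large enough" is an assumption.

```python
import numpy as np
import pandas as pd

# The dataframe from the question.
df = pd.DataFrame({
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "A": [7.0, 9.0, 9.5, 10.0, 13.0, 15.0, 16.0, 17.0],
    "B": [8.0, 8.8, 9.1, 9.4, 9.7, 9.9, 10.6, 17.0],
})

a = df["A"].to_numpy()
b = df["B"].to_numpy()          # searchsorted requires b to be sorted ascending
values = df["value"].to_numpy()
n = len(b)

# Position of the first B >= A for every A (side="left": an equal B counts).
pos = np.searchsorted(b, a, side="left")

# A <= B keeps its own row; A > B uses the searched position.
idx = np.where(a <= b, np.arange(n), pos)

# pos == n means every B is smaller than A: clip for indexing, then mask with NaN.
df["result"] = np.where(idx < n, values[np.minimum(idx, n - 1)], np.nan)
print(df)
```

The clipping step avoids an IndexError when an A is larger than every B; those rows simply get NaN.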

Answer:

I'm sorry to say it, but your question is not clear at all: it contains some strange mistakes and doesn't explain clearly what you want.

So, here is what this program does:

If A < B, it prints "value" from the same row;
Otherwise, it looks through the dataframe from the current row down to the end and prints "value" from every row where B is greater than A.

If this is not what you wanted, please give a clearer explanation :) Delighted to help.

import pandas as pd
df = pd.DataFrame({
               "value": [1., 2., 3., 4., 5., 6], 
               'A': [7., 9., 9.5, 10., 13., 15.],
               'B': [8., 8.8, 9.1, 9.4, 8.4, 8.5]
             })
for i in range(df.shape[0]):
    if df['A'][i] < df['B'][i]: 
        print(df["value"][i])
    else:
        for j in range(i, df.shape[0]):
            if df['A'][i] < df['B'][j]:
                print(df["value"][j])

Edited: the output is 1.0, 3.0, 4.0. I've just seen that you fixed the signs in your question, sorry :)

The second code (looks for the B closest to A in the whole column):

import pandas as pd
df = pd.DataFrame({
               "value": [1., 2., 3., 4., 5., 6], 
               'A': [7., 9., 9.5, 10., 13., 15.],
               'B': [8., 8.8, 9.1, 9.4, 8.4, 8.5]
             })
for i in range(df.shape[0]):
    if df['A'][i] < df['B'][i]: print(df["value"][i])
    else:
        # Distance from this row's A to every B; print "value" from the row with the smallest one.
        b_column_residuals = abs(df['B'] - df['A'][i])
        print(df["value"][b_column_residuals.idxmin()])
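The loop above rescans column B once per row. A vectorized sketch of the same "closest B" idea uses NumPy broadcasting to build the whole table of distances |B[j] - A[i]| at once; like `idxmin`, `argmin` picks the first position on ties. It builds an n × n array, so it is only meant for small frames.

```python
import numpy as np
import pandas as pd

# The example data used in this answer.
df = pd.DataFrame({
    "value": [1., 2., 3., 4., 5., 6.],
    "A": [7., 9., 9.5, 10., 13., 15.],
    "B": [8., 8.8, 9.1, 9.4, 8.4, 8.5],
})

a = df["A"].to_numpy()
b = df["B"].to_numpy()
values = df["value"].to_numpy()

# distances[i, j] = |B[j] - A[i]|: one row per A, one column per B.
distances = np.abs(b[np.newaxis, :] - a[:, np.newaxis])
closest = distances.argmin(axis=1)  # position of the closest B for each A

# Same rule as the loop: A < B keeps its own value, otherwise the closest B's value.
result = np.where(a < b, values, values[closest])
print(result)
```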

The third code (picks whichever is closer to A: the current B or the B in the next row; on the last row it uses the row above instead):

import pandas as pd
df = pd.DataFrame({
               "value": [1., 2., 3., 4., 5., 6], 
               'A': [7., 9., 9.5, 10., 13., 15.],
               'B': [8., 8.8, 9.1, 9.4, 8.4, 8.5]
             })
for i in range(df.shape[0]):
    if df['A'][i] < df['B'][i]: print(df["value"][i])
    else:
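        try:
            # Compare the current B with the next one (row i + 1).
            if abs(df['B'][i] - df['A'][i]) < abs(df['B'][i + 1] - df['A'][i]): print(df["value"][i])
            else: print(df["value"][i + 1])
        except KeyError:
            # Last row: there is no row i + 1, so compare with the previous row instead.
            if abs(df['B'][i] - df['A'][i]) < abs(df['B'][i - 1] - df['A'][i]): print(df["value"][i])
            else: print(df["value"][i - 1])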

The fourth code (compares the current B with the B above and the B below it):

for i in range(df.shape[0]):
    if df['A'][i] < df['B'][i]: print(df["value"][i])
    else:
        if i == 0: # First row: no row above, compare with the row below.
            if abs(df['B'][i] - df['A'][i]) < abs(df['B'][i + 1] - df['A'][i]): print(df["value"][i])
            else: print(df["value"][i + 1])
        elif i == df.shape[0] - 1: # Last row: no row below, compare with the row above.
            if abs(df['B'][i] - df['A'][i]) < abs(df['B'][i - 1] - df['A'][i]): print(df["value"][i])
            else: print(df["value"][i - 1])
        else: # Middle rows: compare with the rows above and below.
            AB_i = abs(df['B'][i] - df['A'][i])
            AB_iabove = abs(df['B'][i - 1] - df['A'][i])
            AB_ibelow = abs(df['B'][i + 1] - df['A'][i])
            if AB_i == min(AB_i, AB_iabove, AB_ibelow): # if the current B is the closest.
                print(df["value"][i])
            elif AB_iabove == min(AB_i, AB_iabove, AB_ibelow): # if the B above the current one is the closest.
                print(df["value"][i - 1])
            else: # if the B below the current one is the closest.
                print(df["value"][i + 1])

It CAN'T return only 1.0 and 3.0: if it did, that would be illogical given your current description of the program.

The fifth code (checks all the rows forward and prints "value" for the first B > A; if there is no B > A, it prints nothing):

for i in range(df.shape[0]):
    if df['A'][i] < df['B'][i]: print(df["value"][i])
    else:
        for j in range(i, df.shape[0]): # search from the current row down to the last one.
            if df['A'][i] < df['B'][j]: print(df["value"][j]); break
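The nested loop above can also be sketched without Python loops using NumPy boolean masks: an n × n mask marks, for each row i, the rows j >= i whose B is greater than A[i], and `argmax` returns the first True in each row. Unlike the loop, which prints nothing when there is no match, this sketch assumes NaN for those rows so that every row gets a result.

```python
import numpy as np
import pandas as pd

# The example data used in this answer.
df = pd.DataFrame({
    "value": [1., 2., 3., 4., 5., 6.],
    "A": [7., 9., 9.5, 10., 13., 15.],
    "B": [8., 8.8, 9.1, 9.4, 8.4, 8.5],
})

a = df["A"].to_numpy()
b = df["B"].to_numpy()
values = df["value"].to_numpy()
rows = np.arange(len(df))

# later_and_bigger[i, j] is True when row j is at or after row i and B[j] > A[i].
later_and_bigger = (b[np.newaxis, :] > a[:, np.newaxis]) & (rows[np.newaxis, :] >= rows[:, np.newaxis])
has_match = later_and_bigger.any(axis=1)
first_match = later_and_bigger.argmax(axis=1)  # first True per row (0 when there is none)

# A < B keeps its own value; otherwise the first later B > A, or NaN if none exists.
result = np.where(a < b, values, np.where(has_match, values[first_match], np.nan))
print(result)
```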

The sixth code (compares the current B with the B above it and prints "value" from the row below the closer B) (sorry for the unformatted code the first time):

for i in range(df.shape[0]):
    if df['A'][i] < df['B'][i]: print(df["value"][i])
    else:
        try: # Handles the first row (no row i - 1) and the last row (no row i + 1).
            if df['B'][i] - df['A'][i] <= df['B'][i - 1] - df['A'][i]: print(df["value"][i])
            else: print(df["value"][i + 1])
        except KeyError: print(df["value"][i])

I asked you about the first row because this code may be used not only for the given data: the data can be different, but the program must work anyway. Here the program compares the B on A's row with the B above it and prints "value" from the row below the closer B. If that is impossible (there is no row above, which only happens in the first row when A isn't < B, or no row below, on the last row), it prints "value" from the same row. Gosh, I hope that's what you need.

The seventh code (does the same as the previous one, but only for an A value entered by the user):

try:
    A = float(input())
    i = df[df['A'] == A].index.values.astype(int)[0]  # row of the entered A (IndexError if absent)
    if df['A'][i] < df['B'][i]: print(df["value"][i])
    else:
        try: # Handles the first row (no row i - 1) and the last row (no row i + 1).
            if df['B'][i] - df['A'][i] <= df['B'][i - 1] - df['A'][i]: print(df["value"][i])
            else: print(df["value"][i + 1])
        except KeyError: print(df["value"][i])
except (ValueError, IndexError): print("No such value for A in dataframe.")

The eighth code (replaces every value in column A with the entered A and prints "value" for the closest B in column B):

A = float(input())
df['A'] = A  # one assignment; df['A'][i] = A is chained assignment and may not update df
residuals = abs(df['A'] - df['B'])
i = residuals.idxmin()
print(df["value"][i])
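If the frame should stay untouched, the same lookup can be sketched as a small function that takes A as a parameter instead of reading it with `input()` and overwriting column A. The name `value_for_closest_b` is hypothetical; on ties, `idxmin` returns the first matching row, as in the code above.

```python
import pandas as pd

# The example data used in this answer.
df = pd.DataFrame({
    "value": [1., 2., 3., 4., 5., 6.],
    "A": [7., 9., 9.5, 10., 13., 15.],
    "B": [8., 8.8, 9.1, 9.4, 8.4, 8.5],
})


def value_for_closest_b(df, a):
    """Hypothetical helper: "value" from the row whose B is closest to a.

    df is not modified; on ties, idxmin returns the first such row.
    """
    residuals = (df["B"] - a).abs()
    return df.at[residuals.idxmin(), "value"]


print(value_for_closest_b(df, 9.0))
print(value_for_closest_b(df, 12.0))
```

Passing A as an argument also makes the lookup easy to reuse in a loop or with `Series.map`.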