index match in Python

Time:07-22

I have the following dataframe

value A B
1.0 7.0 8.0
2.0 9.0 8.8
3.0 9.5 9.1
4.0 10.0 9.4
5.0 13.0 9.7
6.0 15.0 9.9
7.0 16.0 10.6
8.0 17.0 17.0

What I'm attempting to do:

example:

I'm thinking of some sort of if/else statement:

    - if A < B:
        return value from the same row ==> returns 1.0, since A=7.0 < B=8.0
    - if A = B:
        return value from the same row ==> if A=17, B=17, return 8.0
    - else (A > B):
        look at the two values in column B that are smaller than A and closest to it, and return value from the next row (B+1).
        Say A=9.0: in this example it checks B=8.0 and B=8.8 and returns the value for B=9.1, which is 3.0.

Couple more examples in case it's unclear:

if A=9, check B=8.0 and B=8.8 and return 3.0

if A=9.5, check B=9.1 and B=9.4 and return 5.0

if A=10.0, check B=9.7 and B=9.9 and return 7.0

if A=16, check B=9.9 and B=10.6 and return 8.0

I tried using numpy and indexing for this. np.where looked promising, but I keep getting stuck on the second part. Can anyone help? It's safe to assume that the values are sorted in ascending order.

CodePudding user response:

I'm sorry to say it, but your question is not clear at all. You made strange mistakes and did not explain clearly what you wanted.

So, what this program does:

  1. If A < B, prints "value" from the same row;
  2. Otherwise, scans the dataframe from the current row to the end and prints "value" for each row where B is greater than A (there is no break, so more than one row can match).

If this is not what you wanted, please give a clearer explanation. Happy to help.

import pandas as pd
df = pd.DataFrame({
               "value": [1., 2., 3., 4., 5., 6], 
               'A': [7., 9., 9.5, 10., 13., 15.],
               'B': [8., 8.8, 9.1, 9.4, 8.4, 8.5]
             })
for i in range(df.shape[0]):
    if df['A'][i] < df['B'][i]: 
        print(df["value"][i])
    else:
        for j in range(i, df.shape[0]):
            if df['A'][i] < df['B'][j]:
                print(df["value"][j])

Edit: the output is 1.0, 3.0, 4.0. I've just seen that you already noticed the mistake with your signs, sorry.

The second code (looks for the closest number from the whole dataframe):

import pandas as pd
df = pd.DataFrame({
               "value": [1., 2., 3., 4., 5., 6], 
               'A': [7., 9., 9.5, 10., 13., 15.],
               'B': [8., 8.8, 9.1, 9.4, 8.4, 8.5]
             })
for i in range(df.shape[0]):
    if df['A'][i] < df['B'][i]:
        print(df["value"][i])
    else:
        # Absolute distance from this row's A to every B in the column.
        b_column_residuals = abs(df['B'] - df['A'][i])
        # idxmin() gives the index label of the closest B.
        print(df["value"][b_column_residuals.idxmin()])
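
The same rule can also be written without an explicit Python loop. The sketch below is only one possible vectorized form, assuming the same six-row sample dataframe; the helper name nearest_b_values is made up for illustration and is not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "value": [1., 2., 3., 4., 5., 6.],
    "A": [7., 9., 9.5, 10., 13., 15.],
    "B": [8., 8.8, 9.1, 9.4, 8.4, 8.5],
})

def nearest_b_values(frame):
    """Hypothetical helper: same rule as the loop, computed with NumPy broadcasting."""
    a = frame["A"].to_numpy()
    b = frame["B"].to_numpy()
    values = frame["value"].to_numpy()
    # distances[i, j] = |B[j] - A[i]|; argmin picks the first closest B for each row.
    closest = np.abs(b[None, :] - a[:, None]).argmin(axis=1)
    # Keep the row's own value when A < B, otherwise take the closest-B row.
    return np.where(a < b, values, values[closest])

result = nearest_b_values(df)
print(result)
```

The distance matrix has one row per A and one column per B, so memory grows with the square of the number of rows; for a large dataframe the loop or a sorted lookup may be the better choice.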

The third code (looks for the closest number between the current B and the one below it, or the one above it when on the last row):

import pandas as pd
df = pd.DataFrame({
               "value": [1., 2., 3., 4., 5., 6], 
               'A': [7., 9., 9.5, 10., 13., 15.],
               'B': [8., 8.8, 9.1, 9.4, 8.4, 8.5]
             })
for i in range(df.shape[0]):
    if df['A'][i] < df['B'][i]:
        print(df["value"][i])
    else:
        try:
            # Compare the current B with the B in the next row.
            if abs(df['B'][i] - df['A'][i]) < abs(df['B'][i + 1] - df['A'][i]):
                print(df["value"][i])
            else:
                print(df["value"][i + 1])
        except KeyError:
            # Last row: there is no next row, so compare with the previous one.
            if abs(df['B'][i] - df['A'][i]) < abs(df['B'][i - 1] - df['A'][i]):
                print(df["value"][i])
            else:
                print(df["value"][i - 1])

The fourth code (compares the current B, the B above and below):

for i in range(df.shape[0]):
    if df['A'][i] < df['B'][i]: print(df["value"][i])
    else:
        if i == 0: # When the first row.
            if abs(df['B'][i] - df['A'][i]) < abs(df['B'][i + 1] - df['A'][i]): print(df["value"][i])
            else: print(df["value"][i + 1])
        elif i == df.shape[0] - 1: # When the last row.
            if abs(df['B'][i] - df['A'][i]) < abs(df['B'][i - 1] - df['A'][i]): print(df["value"][i])
            else: print(df["value"][i - 1])
        else: # When in the middle.
            AB_i = abs(df['B'][i] - df['A'][i])
            AB_iabove = abs(df['B'][i - 1] - df['A'][i]) # distance to B in the row above.
            AB_ibelow = abs(df['B'][i + 1] - df['A'][i]) # distance to B in the row below.
            if AB_i == min(AB_i, AB_iabove, AB_ibelow): # if the current B is the closest.
                print(df["value"][i])
            elif AB_iabove == min(AB_i, AB_iabove, AB_ibelow): # if B above the current is the closest.
                print(df["value"][i - 1])
            else: # if B below the current is the closest.
                print(df["value"][i + 1])

It can't return only 1.0 and 3.0: that would be inconsistent with your current description of the program.

The fifth code (checks all the values forward and prints "value" for the first B > A. If there is no B > A, prints nothing.)

for i in range(df.shape[0]):
    if df['A'][i] < df['B'][i]:
        print(df["value"][i])
    else:
        # Start from i so that it won't raise an error.
        for j in range(i, df.shape[0]):
            if df['A'][i] < df['B'][j]:
                print(df["value"][j])
                break  # stop at the first B greater than A
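
When column B is sorted in ascending order, as the question says it is, the forward scan can be replaced by a binary search. The sketch below uses the question's eight-row table rather than the six-row sample above, and assumes that "first row where B is not smaller than A" is the intended rule; that reading reproduces the examples given in the question, including A=17 returning 8.0, but it is an interpretation. The name first_value_at_or_above is made up for illustration.

```python
import numpy as np
import pandas as pd

# The table from the question; B is sorted ascending, which searchsorted requires.
df_sorted = pd.DataFrame({
    "value": [1., 2., 3., 4., 5., 6., 7., 8.],
    "A": [7., 9., 9.5, 10., 13., 15., 16., 17.],
    "B": [8., 8.8, 9.1, 9.4, 9.7, 9.9, 10.6, 17.],
})

def first_value_at_or_above(frame):
    """Hypothetical helper: for each A, 'value' at the first row where B >= A."""
    b = frame["B"].to_numpy()
    values = frame["value"].to_numpy()
    # side="left" returns the first position whose B is >= A.
    pos = np.searchsorted(b, frame["A"].to_numpy(), side="left")
    # A position equal to len(b) means no B is large enough; report NaN there.
    found = pos < len(b)
    out = np.full(len(frame), np.nan)
    out[found] = values[pos[found]]
    return out

lookup = first_value_at_or_above(df_sorted)
print(lookup)
```

Unlike the loop, this searches the whole B column rather than only the rows from i onward, which gives the same answer only because B is sorted.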

The sixth code (checks the current B and the B above, and prints "value" from the row below the closer B) (sorry for the unformatted code the first time):

for i in range(df.shape[0]):
    if df['A'][i] < df['B'][i]:
        print(df["value"][i])
    else:
        try:  # it is not needed with current data, it is made for other data.
            if df['B'][i] - df['A'][i] <= df['B'][i - 1] - df['A'][i]:
                print(df["value"][i])
            else:
                print(df["value"][i + 1])
        except KeyError:  # no row above (first row) or no row below (last row).
            print(df["value"][i])

I asked you about the first row because this code may be used with data other than the given sample: the data can be different, but the program must still work. In this case the program checks the B in the same row as A and the B above it, and returns "value" from the row below the closer B. If the current B cannot be compared with the one above (this can only happen in the first row, when A is not less than B), it returns "value" from the same row (i.e. the first). Gosh, I hope that's what you need.

The seventh code (does the same as the previous one, but only for an A entered by the user):

try:
    A = float(input())
    # Index of the first row whose A equals the input (IndexError if there is none).
    i = df[df['A'] == A].index.values.astype(int)[0]
    if df['A'][i] < df['B'][i]:
        print(df["value"][i])
    else:
        try:  # it is not needed with current data, it is made for other data.
            if df['B'][i] - df['A'][i] <= df['B'][i - 1] - df['A'][i]:
                print(df["value"][i])
            else:
                print(df["value"][i + 1])
        except KeyError:
            print(df["value"][i])
except (ValueError, IndexError):
    print("No such value for A in dataframe.")

The eighth code (replaces all the values in the A column with the entered A and looks up "value" for the closest B in the B column):

A = float(input())
df['A'] = A  # overwrite the whole A column with the entered value.
residuals = abs(df['A'] - df['B'])
i = residuals.idxmin()  # index label of the closest B.
print(df["value"][i])
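
If overwriting column A is not desirable, the same lookup can be done without touching the dataframe. This is a minimal sketch, assuming the six-row sample dataframe from the earlier snippets; value_for_closest_b is a hypothetical name, and the A values passed to it are just examples.

```python
import pandas as pd

df = pd.DataFrame({
    "value": [1., 2., 3., 4., 5., 6.],
    "A": [7., 9., 9.5, 10., 13., 15.],
    "B": [8., 8.8, 9.1, 9.4, 8.4, 8.5],
})

def value_for_closest_b(frame, a):
    """Hypothetical helper: 'value' from the row whose B is closest to a."""
    residuals = (frame["B"] - a).abs()
    # idxmin() returns the index label of the smallest distance (the first one on ties).
    return frame.loc[residuals.idxmin(), "value"]

print(value_for_closest_b(df, 9.0))
print(value_for_closest_b(df, 15.0))
```

Taking A as a function argument instead of reading it with input() also makes the lookup easy to call for many values in a row.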