I have the following dataframe:
value | A | B |
---|---|---|
1.0 | 7.0 | 8.0 |
2.0 | 9.0 | 8.8 |
3.0 | 9.5 | 9.1 |
4.0 | 10.0 | 9.4 |
5.0 | 13.0 | 9.7 |
6.0 | 15.0 | 9.9 |
7.0 | 16.0 | 10.6 |
8.0 | 17.0 | 17.0 |
What I'm attempting to do is easiest to show by example. I'm thinking of some sort of if/else statement:
- if A < B:
  return value from the same row, e.g. 1.0, since A=7.0 < B=8.0
- if A = B:
  return value from the same row, e.g. if A=17 and B=17, return 8.0
- else, if A > B:
  look at the two smaller values in column B closest to A and return value from the row after them (B + 1).
Let's say A=9.0: in this example it checks B=8.0 and B=8.8 and returns the value for B=9.1, which is 3.0.
A couple more examples in case it's unclear:
if A=9, check B=8.0 and B=8.8 and return 3.0
if A=9.5, check B=9.1 and B=9.4 and return 5.0
if A=10.0, check B=9.7 and B=9.9 and return 7.0
if A=16, check B=9.9 and B=10.6 and return 8.0
I tried using numpy for this and indexing into it. np.where looked promising, but I keep getting stuck on the second part. Can anyone help? It's safe to assume that the values are sorted in ascending order.
CodePudding user response:
I'm sorry to say it, but your question is not clear at all. You made strange mistakes and did not explain clearly what you wanted.
So, what this program does:
- If A < B, return "value" from the same line;
- If A > B, looks through the whole dataframe from the current B up to the end and prints "value" from every line where B is more than A.
If this is not what you wanted, please give a clearer explanation. Delighted to help you.
import pandas as pd
df = pd.DataFrame({
    "value": [1., 2., 3., 4., 5., 6],
    'A': [7., 9., 9.5, 10., 13., 15.],
    'B': [8., 8.8, 9.1, 9.4, 8.4, 8.5]
})
for i in range(df.shape[0]):
    if df['A'][i] < df['B'][i]:
        print(df["value"][i])
    else:
        # No break here, so every later row with B > A is printed.
        for j in range(i, df.shape[0]):
            if df['A'][i] < df['B'][j]:
                print(df["value"][j])
Edited: output is 1.0, 3.0, 4.0. I've just seen that you noticed your signs, sorry)
The second code (looks for the closest number from the whole dataframe):
import pandas as pd
df = pd.DataFrame({
    "value": [1., 2., 3., 4., 5., 6],
    'A': [7., 9., 9.5, 10., 13., 15.],
    'B': [8., 8.8, 9.1, 9.4, 8.4, 8.5]
})
for i in range(df.shape[0]):
    if df['A'][i] < df['B'][i]:
        print(df["value"][i])
    else:
        b_column = df['B']
        b_column_residuals = abs(b_column - df['A'][i])
        print(df["value"][b_column_residuals.idxmin()])
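The same "closest B in the whole column" idea can also be sketched without the explicit loop, using NumPy broadcasting. This is only a sketch under the same data as above; the variable names `residuals`, `nearest`, `row` and `result` are made up for illustration, and it collects the values in a list instead of printing them one by one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "value": [1., 2., 3., 4., 5., 6],
    "A": [7., 9., 9.5, 10., 13., 15.],
    "B": [8., 8.8, 9.1, 9.4, 8.4, 8.5],
})

a = df["A"].to_numpy()
b = df["B"].to_numpy()
values = df["value"].to_numpy()

# residuals[i, j] = |B[j] - A[i]| for every pair of rows (i, j).
residuals = np.abs(b[np.newaxis, :] - a[:, np.newaxis])

# Position of the closest B for each A (first one wins on ties, like idxmin).
nearest = residuals.argmin(axis=1)

# Rows with A < B keep their own "value"; the others take the closest B's row.
row = np.where(a < b, np.arange(len(df)), nearest)
result = values[row]
print(result.tolist())
```

The residual matrix has one entry per pair of rows, so this trades memory for speed and is only reasonable for small or medium dataframes.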
The third code (looks for the closest number from the current B and one below (or above when the last B)):
import pandas as pd
df = pd.DataFrame({
    "value": [1., 2., 3., 4., 5., 6],
    'A': [7., 9., 9.5, 10., 13., 15.],
    'B': [8., 8.8, 9.1, 9.4, 8.4, 8.5]
})
for i in range(df.shape[0]):
    if df['A'][i] < df['B'][i]:
        print(df["value"][i])
    else:
        try:
            # Compare the current B with the one in row i + 1.
            if abs(df['B'][i] - df['A'][i]) < abs(df['B'][i + 1] - df['A'][i]):
                print(df["value"][i])
            else:
                print(df["value"][i + 1])
        except KeyError:
            # Last row: there is no row i + 1, so compare with row i - 1.
            if abs(df['B'][i] - df['A'][i]) < abs(df['B'][i - 1] - df['A'][i]):
                print(df["value"][i])
            else:
                print(df["value"][i - 1])
The fourth code (compares the current B, the B above and below):
for i in range(df.shape[0]):
    if df['A'][i] < df['B'][i]:
        print(df["value"][i])
    else:
        if i == 0:  # When the first row.
            if abs(df['B'][i] - df['A'][i]) < abs(df['B'][i + 1] - df['A'][i]):
                print(df["value"][i])
            else:
                print(df["value"][i + 1])
        elif i == df.shape[0] - 1:  # When the last row.
            if abs(df['B'][i] - df['A'][i]) < abs(df['B'][i - 1] - df['A'][i]):
                print(df["value"][i])
            else:
                print(df["value"][i - 1])
        else:  # When in the middle.
            AB_i = abs(df['B'][i] - df['A'][i])
            AB_iabove = abs(df['B'][i + 1] - df['A'][i])  # B in row i + 1
            AB_ibelow = abs(df['B'][i - 1] - df['A'][i])  # B in row i - 1
            closest = min(AB_i, AB_iabove, AB_ibelow)
            if AB_i == closest:  # if the current B is the closest.
                print(df["value"][i])
            elif AB_iabove == closest:  # if B in row i + 1 is the closest.
                print(df["value"][i + 1])
            else:  # if B in row i - 1 is the closest.
                print(df["value"][i - 1])
It CAN'T return only 1.0 and 3.0: that would be illogical given your current description of the program.
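The three branches of the fourth code (first row, last row, middle) can be folded into one by building the list of neighbouring rows that actually exist. The following is a rough sketch of that idea, not a replacement for the code above; the function name `nearest_neighbour_values` is hypothetical, and on ties it prefers the current row, then row i + 1, then row i - 1.

```python
import pandas as pd

df = pd.DataFrame({
    "value": [1., 2., 3., 4., 5., 6],
    "A": [7., 9., 9.5, 10., 13., 15.],
    "B": [8., 8.8, 9.1, 9.4, 8.4, 8.5],
})


def nearest_neighbour_values(df):
    """For each row, return "value" from the row whose B is closest to A,
    looking only at the current row and its direct neighbours."""
    a = df["A"].tolist()
    b = df["B"].tolist()
    values = df["value"].tolist()
    n = len(df)
    out = []
    for i in range(n):
        if a[i] < b[i]:
            out.append(values[i])
            continue
        # Keep only the neighbours that exist (handles first and last row).
        candidates = [j for j in (i, i + 1, i - 1) if 0 <= j < n]
        # min() returns the first candidate with the smallest distance.
        best = min(candidates, key=lambda j: abs(b[j] - a[i]))
        out.append(values[best])
    return out


print(nearest_neighbour_values(df))
```

Returning a list instead of printing makes it easier to compare the result with the expected values, or to assign it to a new column.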
The fifth code (checks all the values forward and prints "value" for the first B > A. If there is no B > A, prints nothing.)
for i in range(df.shape[0]):
    if df['A'][i] < df['B'][i]:
        print(df["value"][i])
    else:
        for j in range(i, df.shape[0]):  # from i so that it won't raise an error.
            if df['A'][i] < df['B'][j]:
                print(df["value"][j])
                break  # stop at the first B > A
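If column B is sorted in ascending order, as the question says it is, the forward search for the first B > A can also be sketched with `np.searchsorted` instead of nested loops. The sketch below uses the question's table rather than the shortened data above (whose B column is not sorted). The function name `first_larger_b` is made up, rows with A equal to B return their own "value" as in the question's A=17 example, and rows with no larger B get NaN; these choices are assumptions, not something the question confirms.

```python
import numpy as np
import pandas as pd

# The table from the question; B is sorted in ascending order.
df = pd.DataFrame({
    "value": [1., 2., 3., 4., 5., 6., 7., 8.],
    "A": [7., 9., 9.5, 10., 13., 15., 16., 17.],
    "B": [8., 8.8, 9.1, 9.4, 9.7, 9.9, 10.6, 17.],
})


def first_larger_b(df):
    """Return, per row, "value" from the same row when A <= B,
    otherwise "value" from the first row whose B is greater than A."""
    a = df["A"].to_numpy()
    b = df["B"].to_numpy()  # assumed sorted ascending
    values = df["value"].to_numpy()
    n = len(df)

    # Position of the first B strictly greater than each A.
    pos = np.searchsorted(b, a, side="right")
    # Rows with A <= B keep their own position.
    idx = np.where(a <= b, np.arange(n), pos)

    out = np.full(n, np.nan)
    found = idx < n  # idx == n means no B is greater than A
    out[found] = values[idx[found]]
    return out


print(first_larger_b(df).tolist())
```

On this table the result agrees with the examples in the question: A=9 gives 3.0, A=9.5 gives 5.0, A=10.0 gives 7.0 and A=16 gives 8.0.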
The sixth code (checks B and B above, prints "value" from the row below the closer B) (sorry for the unformatted code the first time):
for i in range(df.shape[0]):
    if df['A'][i] < df['B'][i]:
        print(df["value"][i])
    else:
        try:  # it is not needed with current data, it is made for other data.
            if df['B'][i] - df['A'][i] <= df['B'][i - 1] - df['A'][i]:
                print(df["value"][i])
            else:
                print(df["value"][i + 1])
        except KeyError:  # no row i - 1 or i + 1
            print(df["value"][i])
I asked you about the first line because this code may be used not only for the given data: the data can be different, but the program must work anyway. In this case the program checks the B in the same row as A and the B above it, and returns "value" from the row below the closer B. If it is impossible to compare the current B with the one above (this can only happen in the first line, if A isn't < B), it returns "value" from the same line (i.e. from the first). Gosh, I hope that's what you need.
The seventh code (does the same as the previous one, but only for an A entered by the user):
try:
    A = float(input())
    # Index of the first row whose A equals the entered value.
    i = df[df['A'] == A].index.values.astype(int)[0]
    if df['A'][i] < df['B'][i]:
        print(df["value"][i])
    else:
        try:  # it is not needed with current data, it is made for other data.
            if df['B'][i] - df['A'][i] <= df['B'][i - 1] - df['A'][i]:
                print(df["value"][i])
            else:
                print(df["value"][i + 1])
        except KeyError:  # no row i - 1 or i + 1
            print(df["value"][i])
except (IndexError, ValueError):  # no matching row, or input is not a number
    print("No such value for A in dataframe.")
The eighth code (replaces all the values in the A column with the entered A and searches for "value" for the closest B in the B column):
A = float(input())
# Assign the whole column at once; df['A'][i] = A is chained assignment.
df['A'] = A
residuals = abs(df['A'] - df['B'])
i = residuals.idxmin()
print(df["value"][i])
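If overwriting the A column is not desired, the same lookup can be sketched as a small function that leaves the dataframe untouched. The name `value_for_closest_b` is hypothetical, and the data is the shortened dataframe used in the earlier snippets.

```python
import pandas as pd

df = pd.DataFrame({
    "value": [1., 2., 3., 4., 5., 6],
    "A": [7., 9., 9.5, 10., 13., 15.],
    "B": [8., 8.8, 9.1, 9.4, 8.4, 8.5],
})


def value_for_closest_b(df, a):
    """Return "value" from the row whose B is closest to the number a."""
    # idxmin returns the index label of the smallest distance
    # (the first one if several rows are equally close).
    idx = (df["B"] - a).abs().idxmin()
    return df.loc[idx, "value"]


print(value_for_closest_b(df, 9.0))
```

Passing the number as an argument instead of reading it with `input()` also makes the function easy to call repeatedly for several values of A.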