I have a dataframe like this:
import pandas as pd
df = pd.DataFrame({'x1':[20,25],'y1':[5,8],'x2':[22,27],'y2':[10,2]})
df
x1 y1 x2 y2
0 20 5 22 10
1 25 8 27 2
X and Y pair together. I need to compare y1 and y2 and get the max in every row. And find the corresponding x. Hence the max of row [0] is y2 (=10), and the corresponding x is x2 (=22). The second row will be y1 (=8) and x1(=25). Expected result, new columns x and y:
x1 y1 x2 y2 x y
0 20 5 22 10 22 10
1 25 8 27 2 25 8
This is a simple dataframe I made to elaborate on the question. X and Y pairs, in my case, can be 30 pairs.
CodePudding user response:
# get a hold on "y*" columns
y_cols = df.filter(like="y")
# get the maximal y-values' suffixes, and then add from front "x" to them
max_x_vals = y_cols.idxmax(axis=1).str.extract(r"(\d )$", expand=False).radd("x")
# get the locations of those x* values
max_x_ids = df.columns.get_indexer(max_x_vals)
# now we have the indexes of x*'s in the columns; NumPy's indexing
# helps to get a cross section
df["max_xs"] = df.to_numpy()[np.arange(len(df)), max_x_ids]
# for y*'s, it's directly the maximum per row
df["max_ys"] = y_cols.max(axis=1)
to get
>>> df
x1 y1 x2 y2 max_xs max_ys
0 20 5 22 10 22 10
1 25 8 27 2 25 8
CodePudding user response:
You can do it with the help of .apply function.
import pandas as pd
import numpy as np
df = pd.DataFrame({'x1':[20,25],'y1':[5,8],'x2':[22,27],'y2':[10,2]})
y_cols = [col for col in df.columns if col[0] == 'y']
x_cols = [col for col in df.columns if col[0] == 'x']
def find_corresponding_x(row):
max_y_index = np.argmax(row[y_cols])
return row[f'{x_cols[max_y_index]}']
df['corresponding_x'] = df.apply(find_corresponding_x, axis = 1)
CodePudding user response:
this is one solution:
a = df[df['y1'] < df['y2']].drop(columns=['y1','x1']).rename(columns={'y2':'y', 'x2':'x'})
b = df[df['y1'] >= df['y2']].drop(columns=['y2','x2']).rename(columns={'y1':'y', 'x1':'x'})
result = pd.concat([a,b])
if you need to keep order then maybe add another column with original index and sort by it after concatenation
CodePudding user response:
you can use the function below. remember to import pandas and numpy like I did in this code. import your data set and use Max_number function.
import pandas as pd
import numpy as np
df = pd.DataFrame({'x1':[20,25],'y1':[5,8],'x2':[22,27],'y2':[10,2]})
def Max_number (df):
columns = list(df.columns)
rows = df.shape[0]
max_value = []
column_name = []
for i in range(rows):
row_array = list(np.array(df[i:i 1])[0])
maximum = max(row_array)
max_value.append(maximum)
index=row_array.index(maximum)
column_name.append(columns[index])
return pd.DataFrame({"column":column_name,"max_value":max_value})
returns this:
row index | column | max_value |
---|---|---|
0 | x2 | 22 |
1 | x2 | 27 |
CodePudding user response:
if x1 column comes first and then y1, then x2, y2 and so on, you can just try:
a = df.columns.get_indexer(y_cols.idxmax(axis=1))
df[['y', 'x']] = df.to_numpy()[np.arange(len(df)), [a, a - 1]].T
CodePudding user response:
I hope it works for your solution,
import pandas as pd
df = pd.DataFrame({'x1':[20,25],'y1':[5,8],'x2':[22,27],'y2':[10,2]})
df['x_max'] = df[['x1', 'x2']].max(axis=1)
df['y_max'] = df[['y1', 'y2']].max(axis=1)
df