Pandas dataframe, in a row, to find the max in selected column, and find value of another column bas-CodePudding

I have a dataframe like this:

import pandas as pd
df = pd.DataFrame({'x1':[20,25],'y1':[5,8],'x2':[22,27],'y2':[10,2]})

df

       x1  y1  x2  y2
    0  20   5  22  10
    1  25   8  27   2

X and Y pair together. I need to compare y1 and y2 and get the max in every row. And find the corresponding x. Hence the max of row [0] is y2 (=10), and the corresponding x is x2 (=22). The second row will be y1 (=8) and x1(=25). Expected result, new columns x and y:

   x1  y1  x2  y2   x   y
0  20   5  22  10  22  10
1  25   8  27   2  25   8

This is a simple dataframe I made to elaborate on the question. X and Y pairs, in my case, can be 30 pairs.

CodePudding user response：

# get a hold on "y*" columns
y_cols = df.filter(like="y")

# get the maximal y-values' suffixes, and then add from front "x" to them
max_x_vals = y_cols.idxmax(axis=1).str.extract(r"(\d )$", expand=False).radd("x")
# get the locations of those x* values
max_x_ids = df.columns.get_indexer(max_x_vals)

# now we have the indexes of x*'s in the columns; NumPy's indexing
# helps to get a cross section
df["max_xs"] = df.to_numpy()[np.arange(len(df)), max_x_ids]

# for y*'s, it's directly the maximum per row
df["max_ys"] = y_cols.max(axis=1)

to get

>>> df

   x1  y1  x2  y2  max_xs  max_ys
0  20   5  22  10      22      10
1  25   8  27   2      25       8

CodePudding user response：

You can do it with the help of .apply function.

import pandas as pd
import numpy as np

df = pd.DataFrame({'x1':[20,25],'y1':[5,8],'x2':[22,27],'y2':[10,2]})
y_cols = [col for col in df.columns if col[0] == 'y'] 
x_cols = [col for col in df.columns if col[0] == 'x'] 

def find_corresponding_x(row):
    max_y_index = np.argmax(row[y_cols])
    return row[f'{x_cols[max_y_index]}']

df['corresponding_x'] = df.apply(find_corresponding_x, axis = 1)

CodePudding user response：

this is one solution:

a = df[df['y1'] < df['y2']].drop(columns=['y1','x1']).rename(columns={'y2':'y', 'x2':'x'})
b = df[df['y1'] >= df['y2']].drop(columns=['y2','x2']).rename(columns={'y1':'y', 'x1':'x'})

result = pd.concat([a,b])

if you need to keep order then maybe add another column with original index and sort by it after concatenation

CodePudding user response：

you can use the function below. remember to import pandas and numpy like I did in this code. import your data set and use Max_number function.

import pandas as pd
import numpy as np
df = pd.DataFrame({'x1':[20,25],'y1':[5,8],'x2':[22,27],'y2':[10,2]})

def Max_number (df):
    columns = list(df.columns)
    rows = df.shape[0]
    max_value = []
    column_name = []

    for i in range(rows):
        row_array = list(np.array(df[i:i 1])[0])
        maximum = max(row_array)
        max_value.append(maximum)
        index=row_array.index(maximum)
        column_name.append(columns[index])
    
    return pd.DataFrame({"column":column_name,"max_value":max_value})

returns this:

row index	column	max_value
0	x2	22
1	x2	27

CodePudding user response：

if x1 column comes first and then y1, then x2, y2 and so on, you can just try:

a = df.columns.get_indexer(y_cols.idxmax(axis=1))
df[['y', 'x']] = df.to_numpy()[np.arange(len(df)), [a, a - 1]].T

CodePudding user response：

I hope it works for your solution,

import pandas as pd
df = pd.DataFrame({'x1':[20,25],'y1':[5,8],'x2':[22,27],'y2':[10,2]})
df['x_max'] = df[['x1', 'x2']].max(axis=1)
df['y_max'] = df[['y1', 'y2']].max(axis=1)
df