Vectorized operation on three columns

Time:11-20

First, let us create a random DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": np.random.randint(0, 70, size=5),
        "B": np.random.randint(-10, 35, size=5),
        "C": np.random.randint(10, 50, size=5),
    }
)

Then I use the max and min methods to create two additional columns:

df['max'] = df[['A', 'B', 'C']].max(axis=1)
df['min'] = df[['A', 'B', 'C']].min(axis=1)

Output:

    A   B   C  max  min
0  17  26  31   31   17
1  45  31  17   45   17
2  36  24  31   36   24
3  16  17  24   24   16
4  16  12  23   23   12

What would be the most efficient and elegant way to put the remaining value into a 'mid' column, so that the output looks like this:

    A   B   C  max  min  mid
0  17  26  31   31   17   26
1  45  31  17   45   17   31
2  36  24  31   36   24   31
3  16  17  24   24   16   17
4  16  12  23   23   12   16

I am looking for a vectorized solution. I was able to achieve this using conditions:

# Non-strict comparisons (>=, <=) so that rows with tied values still match a condition
# instead of falling through to the default of 0.
conditions = [(((df['A'] >= df['B']) & (df['A'] <= df['C'])) | ((df['A'] >= df['C']) & (df['A'] <= df['B']))),
              (((df['B'] >= df['A']) & (df['B'] <= df['C'])) | ((df['B'] >= df['C']) & (df['B'] <= df['A']))),
              (((df['C'] >= df['A']) & (df['C'] <= df['B'])) | ((df['C'] >= df['B']) & (df['C'] <= df['A'])))]

choices = [df['A'], df['B'], df['C']]

df['mid'] = np.select(conditions, choices, default=0)

However, I think there is a more elegant solution for this.

CodePudding user response:

How about using the median? For three values, the row-wise median is the middle one:

df[["A","B","C"]].median(axis=1)
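As a quick check, here is a minimal sketch that applies the median to a fixed copy of the sample rows from the question, so the result can be compared with the expected 'mid' column. The names cols, mid_median and mid_sorted are only illustrative. Note that median returns floats, so the sketch casts back to int; the np.sort variant is an alternative of my own (not part of the original suggestion) that keeps the integer dtype.

```python
import numpy as np
import pandas as pd

# Fixed copy of the question's sample rows, so the result is reproducible.
df = pd.DataFrame(
    {
        "A": [17, 45, 36, 16, 16],
        "B": [26, 31, 24, 17, 12],
        "C": [31, 17, 31, 24, 23],
    }
)

cols = ["A", "B", "C"]  # illustrative name for the value columns

# The row-wise median of three values is the middle one.
# median() returns float64, so cast back to int for integer input.
mid_median = df[cols].median(axis=1).astype(int)

# Alternative: sort each row and take the middle column (index 1).
# This keeps the original integer dtype without a cast.
mid_sorted = np.sort(df[cols].to_numpy(), axis=1)[:, 1]

df["mid"] = mid_median
print(df)
```

Both variants also handle rows with tied values, since neither relies on strict comparisons.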

By the way, instead of running the aggregations one by one, you can do everything in one go as follows:

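# String names are used because passing the builtins min and max is deprecated in recent pandas versions
df.join(df.agg(['min', 'max', 'median'], axis=1))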

OUTPUT

    A   B   C   min   max  median
0   2  22  38   2.0  38.0    22.0
1  29  15  40  15.0  40.0    29.0
2  48  -5  17  -5.0  48.0    17.0
3  17  18  43  17.0  43.0    18.0
4  60 -10  39 -10.0  60.0    39.0

The advantage is that when you want to aggregate the entire row, as in your case, you don't need to specify the names of the columns to aggregate. If you instead add the aggregate columns one at a time, you need to make sure that a newly added column is not included in the next aggregation, so you have to specify the columns to aggregate explicitly.
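The pitfall described above can be sketched as follows, assuming a small two-row frame made up for illustration. The names step, mid_wrong, mid_right and combined are hypothetical and only serve the example.

```python
import pandas as pd

df = pd.DataFrame({"A": [17, 45], "B": [26, 31], "C": [31, 17]})
cols = ["A", "B", "C"]  # the original value columns

# Adding aggregates one at a time: without a column selection, the second
# aggregation also sees the freshly added "max" column.
step = df.copy()
step["max"] = step.max(axis=1)
step["mid_wrong"] = step.median(axis=1)        # median of A, B, C and max
step["mid_right"] = step[cols].median(axis=1)  # median of A, B, C only

# Aggregating in one go works on the original columns only,
# so no explicit column selection is needed.
combined = df.join(df.agg(["min", "max", "median"], axis=1))

print(step)
print(combined)
```

In the first row, mid_wrong comes out as 28.5 because the extra "max" value shifts the median, while mid_right and the one-go median both give 26.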
