Apply a function that returns 2 columns to all columns of a pandas DataFrame

Time:09-17

I have a pandas frame with several measurements for a number of samples. E.g. measuring temperature, length, and width of multiple widgets.

I also have a frame with the acceptable limits for each type of measurement. I need to apply both the low and high limits to the measurements and return a frame that contains, for each measurement, the measurement minus the low limit and the high limit minus the measurement. That is, for each column in the measurement table I should get back two columns, and the output has the same number of rows as the input.

Here is a toy example:

import pandas as pd

df = pd.DataFrame({'a':[1,2,3], 'b':[30,40,50]})
limits = pd.DataFrame({'col':['a','b'], 'low':[1,10], 'high':[10,100]})

What I came up with almost does the job, except that it appends the low-limit and high-limit results along axis=0 instead of axis=1:

import pandas as pd

def apply_limits(col):
    temp = limits[limits.col == col.name]
    return pd.concat([col - temp.low.values[0], col - temp.high.values[0]])

df = pd.DataFrame({'a':[1,2,3], 'b':[30,40,50]})
limits = pd.DataFrame({'col':['a','b'], 'low':[1,10], 'high':[10,100]})

df.apply(apply_limits)

returns

    a   b
0   0   20
1   1   30
2   2   40
0   9   70
1   8   60
2   7   50

I get what I need if I do

temp = df.apply(apply_limits)
pd.concat([temp.iloc[:3], temp.iloc[3:]], axis = 1)

    a   b   a   b
0   0   20  9   70
1   1   30  8   60
2   2   40  7   50

I wonder if there is a more elegant way of doing it.

CodePudding user response:

Not sure it's more elegant, but here is an attempt without apply and without temporary dataframes:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [30, 40, 50]})
limits = pd.DataFrame({'col': ['a', 'b'], 'low': [1, 10], 'high': [10, 100]})

# Convert the limits to a dict of {column: [low, high]} to make looping easier
min_max_dict = limits.set_index("col").transpose().to_dict(orient="list")

print(
    pd.concat(
        [
            pd.DataFrame(
                {
                    f"{key}_min": df[key] - min(min_max_dict[key]),
                    f"{key}_max": max(min_max_dict[key]) - df[key],
                }
            ).rename({f"{key}_min": key, f"{key}_max": key}, axis="columns")
            # Remove the rename if you prefer distinct a_min and a_max
            # column names instead of repeating the original name.
            for key in min_max_dict
        ],
        axis="columns",
    )
)

It prints the following dataframe:

   a  a   b   b
0  0  9  20  70
1  1  8  30  60
2  2  7  40  50
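
The same numbers can also be obtained without an explicit loop, by letting pandas align the limits with the column names. This is only a sketch: the names `lim`, `above_low`, `below_high` and the `'low'`/`'high'` column-level labels are made up for illustration, and the columns come out grouped by limit rather than by measurement.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [30, 40, 50]})
limits = pd.DataFrame({'col': ['a', 'b'], 'low': [1, 10], 'high': [10, 100]})

# Index the limits by column name so they can be aligned with df's columns.
lim = limits.set_index('col')

# DataFrame/Series arithmetic matches the Series index against the column labels.
above_low = df.sub(lim['low'], axis='columns')      # measurement - low limit
below_high = df.rsub(lim['high'], axis='columns')   # high limit - measurement

# keys= adds an outer column level, so the two halves stay distinguishable.
result = pd.concat([above_low, below_high], axis=1, keys=['low', 'high'])
print(result)
```

A single pair of columns can then be selected with, for example, `result[('high', 'b')]`.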

CodePudding user response:

Here is another solution to your problem:

Get the index and build a cumulative sum that increments every time the index starts at 0 again, then use a list comprehension over the groups of a groupby and concat them along axis=1.

out = df.apply(apply_limits)  # the stacked result from the question
out = out.reset_index()
m = out.pop('index').eq(0).cumsum().sub(1)
print(m)
res = pd.concat([grp.reset_index(drop=True) for idx,grp in out.groupby(m)],axis=1)
print(res)

Output:

# m
0    0
1    0
2    0
3    1
4    1
5    1
Name: index, dtype: int32

# res
   a   b  a   b
0  0  20 -9 -70
1  1  30 -8 -60
2  2  40 -7 -50

(In the 2nd half of the df all values are negative, not positive as in your output. This is what I get when I run your code: apply_limits computes col - high, whereas your printed output corresponds to high - col.)
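
Swapping the operands in the second term of `apply_limits` (high - col instead of col - high) is what makes the second half positive. A sketch, assuming that is the calculation the question intended; everything else is the question's function followed by the reshaping step above:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [30, 40, 50]})
limits = pd.DataFrame({'col': ['a', 'b'], 'low': [1, 10], 'high': [10, 100]})

def apply_limits(col):
    temp = limits[limits.col == col.name]
    # Assumption: the second term should be high - col, not col - high.
    return pd.concat([col - temp.low.values[0], temp.high.values[0] - col])

# Stacked result: the index runs 0, 1, 2, 0, 1, 2.
out = df.apply(apply_limits).reset_index()

# Start a new group each time the original index returns to 0.
m = out.pop('index').eq(0).cumsum().sub(1)

# Put the groups side by side.
res = pd.concat([grp.reset_index(drop=True) for _, grp in out.groupby(m)], axis=1)
print(res)
```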
