I have a pandas frame with several measurements for a number of samples, e.g. the temperature, length, and width of multiple widgets.
I also have a frame with the acceptable limits for each type of measurement. I need to apply both the low and high limits to the measurements and return a frame that has the low limit subtracted from the measurement and the measurement subtracted from the high limit. That is, for each column in the measurement table I should get back two columns, and the output should have the same number of rows as the input.
Here is a toy example:
import pandas as pd
df = pd.DataFrame({'a':[1,2,3], 'b':[30,40,50]})
limits = pd.DataFrame({'col':['a','b'], 'low':[1,10], 'high':[10,100]})
What I was able to come up with almost does the job, except that it appends the low- and high-limit columns along axis=0 instead of axis=1:
import pandas as pd
def apply_limits(col):
    temp = limits[limits.col == col.name]
    return pd.concat([col - temp.low.values[0], col - temp.high.values[0]])
df = pd.DataFrame({'a':[1,2,3], 'b':[30,40,50]})
limits = pd.DataFrame({'col':['a','b'], 'low':[1,10], 'high':[10,100]})
df.apply(apply_limits)
returns
   a   b
0  0  20
1  1  30
2  2  40
0  9  70
1  8  60
2  7  50
I get what I need if I do
temp = df.apply(apply_limits)
pd.concat([temp.iloc[:3], temp.iloc[3:]], axis=1)
   a   b  a   b
0  0  20  9  70
1  1  30  8  60
2  2  40  7  50
I wonder if there is a more elegant way of doing it.
CodePudding user response:
Not sure it's more elegant, but here is an attempt without apply and without an intermediate dataframe:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [30, 40, 50]})
limits = pd.DataFrame({'col': ['a', 'b'], 'low': [1, 10], 'high': [10, 100]})
# Turn the limits into a dict {column: [low, high]} to make looping easier
min_max_dict = limits.set_index("col").transpose().to_dict(orient="list")
print(
    pd.concat(
        [
            pd.DataFrame(
                {
                    f"{key}_min": df[key] - min(min_max_dict[key]),
                    f"{key}_max": max(min_max_dict[key]) - df[key],
                }
            ).rename({f"{key}_min": key, f"{key}_max": key}, axis="columns")
            # You can drop the rename if you don't mind having a_min and
            # a_max as column names.
            for key in min_max_dict
        ],
        axis="columns",
    )
)
It gives (only the dataframe print):
   a  a   b   b
0  0  9  20  70
1  1  8  30  60
2  2  7  40  50
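A variation on the same idea, as a minimal sketch: index the limits by column name and let pandas align them with the columns of df, so no explicit loop is needed. The names lim, above_low, below_high and result, and the 'low'/'high' outer column labels, are my own choices for illustration and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [30, 40, 50]})
limits = pd.DataFrame({'col': ['a', 'b'], 'low': [1, 10], 'high': [10, 100]})

# Index the limits by column name so they can be aligned with df's columns.
lim = limits.set_index('col')

# DataFrame-minus-Series aligns the Series index with the DataFrame columns,
# so every column is compared against its own limit.
above_low = df.sub(lim['low'], axis='columns')     # measurement - low
below_high = df.rsub(lim['high'], axis='columns')  # high - measurement

# keys= adds an outer column level so the two halves stay distinguishable.
result = pd.concat([above_low, below_high], axis=1, keys=['low', 'high'])
print(result)
```

The outer level means the columns are addressed as result[('low', 'a')] or result['high'] rather than by duplicated names, which may or may not suit what you do with the frame afterwards.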
CodePudding user response:
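Here is another solution to your problem. It starts from out, which is assumed to be the result of df.apply(apply_limits) from your question.
Reset the index and build a cumulative sum that increases every time the index starts at 0 again, then use a list comprehension over the groups of a groupby and concat them along axis=1.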
out = out.reset_index()
m = out.pop('index').eq(0).cumsum().sub(1)
print(m)
res = pd.concat([grp.reset_index(drop=True) for idx,grp in out.groupby(m)],axis=1)
print(res)
Output:
# m
0    0
1    0
2    0
3    1
4    1
5    1
Name: index, dtype: int32
# res
   a   b  a   b
0  0  20 -9 -70
1  1  30 -8 -60
2  2  40 -7 -50
(The values in the second half are negative here, not positive as in your output, because the code in your question computes col - temp.high.values[0]. Swapping the operands to temp.high.values[0] - col gives the positive values you show.)
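For reference, a small sketch of how apply_limits could be adjusted so that the sign matches the description in the question and no reshaping is needed afterwards. This is one possible rewrite, not the asker's original code: the axis=1 concat, the 'low'/'high' labels and the outer loop over df.columns are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [30, 40, 50]})
limits = pd.DataFrame({'col': ['a', 'b'], 'low': [1, 10], 'high': [10, 100]})

def apply_limits(col):
    # Same lookup as in the question: the limits row for this column.
    temp = limits[limits.col == col.name]
    low, high = temp.low.values[0], temp.high.values[0]
    # high - col (rather than col - high) is the margin left below the high limit.
    # Concatenating along axis=1 yields one two-column frame per measurement.
    return pd.concat([col - low, high - col], axis=1, keys=['low', 'high'])

# Build one small frame per column and place them side by side.
cols = list(df.columns)
result = pd.concat([apply_limits(df[c]) for c in cols], axis=1, keys=cols)
print(result)
```

Here the columns come out grouped per measurement, i.e. ('a', 'low'), ('a', 'high'), ('b', 'low'), ('b', 'high'), and the row count stays the same as in df.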