I am working with a data set whose values have different decimal places. The data and code are shown below:
import numpy as np
import pandas as pd

data = {
    'value': [9.1, 10.5, 11.8,
              20.1, 21.2, 22.8,
              9.5, 10.3, 11.9,
              ]
}
df = pd.DataFrame(data, columns=['value'])
Which gives the following dataframe:
value
0 9.1
1 10.5
2 11.8
3 20.1
4 21.2
5 22.8
6 9.5
7 10.3
8 11.9
Now I want to add a new column named adjusted. I want to calculate this column with the numpy.isclose function, using a tolerance of 2 (plus or minus 1). In the end I expect the result shown in the next table:
value adjusted
0 9.1 10
1 10.5 10
2 11.8 10
3 20.1 21
4 21.2 21
5 22.8 21
6 9.5 10
7 10.3 10
8 11.9 10
I tried the line below, but it only returns True and False, and it only checks a single value (10) rather than all of them.
np.isclose(df['value'], 10, atol=2)
Can anybody help me solve this problem and apply the tolerance for both 10 and 21 in one line?
CodePudding user response:
For only two distinct values, one possible solution is to use np.where:
df['adjusted'] = np.where((df['value'] >= 8) & (df['value'] <= 12), 10, 21)
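If there are more than two reference values, the same idea could be written with np.select instead of nesting np.where calls. This is only a sketch: the names refs and tol are mine, and it assumes that rows matching no reference should become NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value': [9.1, 10.5, 11.8, 20.1, 21.2, 22.8, 9.5, 10.3, 11.9]})

# Reference values and tolerance taken from the question
refs = [10, 21]
tol = 2

# One boolean condition per reference: |value - ref| <= tol
conditions = [(df['value'] - ref).abs() <= tol for ref in refs]

# np.select uses the first condition that matches in each row;
# rows that match no reference fall back to NaN
df['adjusted'] = np.select(conditions, refs, default=np.nan)
print(df)
```

Adding another reference then only means extending refs. If two references are closer together than twice the tolerance, the one listed first wins.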
CodePudding user response:
The exact logic and how this would generalize are not fully clear. Below are two options.
Assuming you want to test your values against a list of defined references, you can use the underlying numpy array and broadcasting:
vals = np.array([10, 21])
a = df['value'].to_numpy()
m = np.isclose(a[:, None], vals, atol=2)
df['adjusted'] = np.where(m.any(1), vals[m.argmax(1)], np.nan)
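To make the broadcasting step easier to follow, here is a self-contained sketch of the same approach. The extra value 15.0 is not in the question; I added it only to show what happens when a value is not close to any reference.

```python
import numpy as np
import pandas as pd

# Question data plus one extra value (15.0, added here for illustration)
# that is not within 2 of either reference
df = pd.DataFrame({'value': [9.1, 10.5, 11.8, 20.1, 21.2, 22.8,
                             9.5, 10.3, 11.9, 15.0]})

vals = np.array([10, 21])
a = df['value'].to_numpy()

# a[:, None] has shape (10, 1); comparing with vals (shape (2,)) broadcasts
# to a (10, 2) boolean mask: one row per value, one column per reference
m = np.isclose(a[:, None], vals, atol=2)

# argmax gives the column of the first True in each row; it also returns 0
# for rows with no True at all, which is why m.any(axis=1) guards the lookup
df['adjusted'] = np.where(m.any(axis=1), vals[m.argmax(axis=1)], np.nan)
print(df)
```

Note that np.isclose also applies its default rtol=1e-05, so the effective tolerance is atol + rtol * abs(reference), which is very slightly more than 2 here.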
Assuming you want to group successive values, you can get the diff and start a new group when the difference is above threshold. Then round and get the median per group with groupby.transform:
group = df['value'].diff().abs().gt(2).cumsum()
df['adjusted'] = df['value'].round().groupby(group).transform('median')
Output:
value adjusted
0 9.1 10.0
1 10.5 10.0
2 11.8 10.0
3 20.1 21.0
4 21.2 21.0
5 22.8 21.0
6 9.5 10.0
7 10.3 10.0
8 11.9 10.0
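For the grouping option, it may help to look at the intermediate group labels. The sketch below repeats the two lines above on the question's data; the variable name step is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'value': [9.1, 10.5, 11.8, 20.1, 21.2, 22.8, 9.5, 10.3, 11.9]})

# Absolute difference to the previous row (NaN for the first row)
step = df['value'].diff().abs()

# True wherever the jump is larger than 2; cumsum turns each jump
# into a new group id (NaN compares as False, so the first row is group 0)
group = step.gt(2).cumsum()
print(group.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2]

# Round each value, then broadcast the per-group median back to every row
df['adjusted'] = df['value'].round().groupby(group).transform('median')
print(df)
```

With this option the values 10 and 21 are not given as inputs. They come out of the data as the median of the rounded values in each run, so different data could produce different adjusted values.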