Consider the following table. The first column, Data1, contains data values that are clustered in groups: there are values around 100 and 200. I am wondering how I can apply a function that deals with each data grouping separately, perhaps by applying an if statement that excludes data points with values too far apart to be considered a neighboring data point.
Data1 Value1
99 1
100 2
101 3
102 4
199 5
200 6
201 7
... ...
For example, if I want to generate a third column called "Result1" that adds every Data1 cluster's corresponding Value1 together. The result would look something like this, where 1 2 3 4=10 and 5 6 7=18:
Data1 Value1 Result1
99 1 10
100 2 10
101 3 10
102 4 10
199 5 18
200 6 18
201 7 18
... ... ...
CodePudding user response:
Try merge_asof
:
data = [100,200]
labels = pd.merge_asof(df, pd.DataFrame({'label':data}),
left_on='Data1', right_on='label',
direction='nearest')['label']
df['Result1'] = df.groupby(labels)['Value1'].transform('sum')
Output:
Data1 Value1 Result1
0 99 1 10
1 100 2 10
2 101 3 10
3 102 4 10
4 199 5 18
5 200 6 18
6 201 7 18
CodePudding user response:
In your case, a simple mask aught to do.
mask = df[“Data1”]<150
df.loc[mask,”Result1”] = df.loc[mask,”Value1”].sum()
df.loc[~mask,”Result1”] = ”df.loc[~mask,”Value1”].sum()