Check if an element is a neighbor and then create a new column

I have a dataframe with the five columns "ID", "Name", "pos_x", "pos_y" and "volume", something like this:

ID | Name | pos_x | pos_y | volume
1  |  A   |  1    |  1.5  |  10
2  |  A   |  3.5  |  3    |   6
3  |  A   |  4    |  4    |   8
4  |  A   |  4.5  |  4.5  |   9
5  |  A   |  5    |  6    |  10
1  |  B   |  1.2  |  1.2  |   4
3  |  B   |  4.3  |  4.4  |   8
4  |  B   |  4.5  |  4.2  |   7
2  |  C   |  3    |  3.3  |   9
3  |  C   |  4.2  |  4.1  |  10

I would now like to add a new column ("volume_avg") to the dataframe that holds, for each ID, the average volume of all IDs that lie within a radius of 2.5 mm of that ID.

My idea was to go through each ID for each name (the combination of name and ID is the primary key) and look at its x and y position. With an if statement I would then check whether another ID is a neighbor or not.

So the conditions would be:

if pos_x (current ID) < pos_x (all other IDs) - 2.5 OR > pos_x (all other IDs) + 2.5

else if pos_y (current ID) < pos_y (all other IDs) - 2.5 OR > pos_y (all other IDs) + 2.5

then calculate sum(volume)/count(ID's)

otherwise nothing

Unfortunately I don't know how to write these considerations in Python code. I would be glad if you could help me. Thanks a lot.

David

CodePudding user response：

I'm interpreting the conditions in the if and the else portion to be rectangular limits on the (x,y) coordinates. If this is incorrect please let me know and I can adjust the logic in the if/else.
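To make the two possible readings concrete, here is a minimal sketch that compares a rectangular limit (both coordinate offsets below 2.5) with a true radius (Euclidean distance below 2.5). The helper names in_box and in_radius and the sample point are my own, purely for illustration; they are not part of the question.

```python
import math

LIMIT = 2.5  # same units as pos_x / pos_y


def in_box(x1, y1, x2, y2, limit=LIMIT):
    """Rectangular reading: both coordinate offsets are below the limit."""
    return abs(x1 - x2) < limit and abs(y1 - y2) < limit


def in_radius(x1, y1, x2, y2, limit=LIMIT):
    """Circular reading: the straight-line distance is below the limit."""
    return math.hypot(x1 - x2, y1 - y2) < limit


# A point near the corner of the box is inside the rectangle
# but outside the circle (its distance is about 2.83).
corner = (0.0, 0.0, 2.0, 2.0)
box_result = in_box(*corner)
radius_result = in_radius(*corner)
```

The two tests only disagree near the corners of the box, so which one is right depends on whether "radius" in the question is meant literally.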

import pandas as pd

# create the structure of example dataframe
df = pd.DataFrame([[1, 'A', 1, 1.5, 10],
                   [2, 'A', 3.5, 3, 6],
                   [3, 'A', 4, 4, 8],
                   [4, 'A', 4.5, 4.5, 9],
                   [5 ,'A' , 5 , 6 , 10],
                   [1 ,'B' , 1.2 , 1.2 , 4],
                   [3 ,'B' , 4.3 , 4.4 , 8],
                   [4 ,'B' , 4.5 , 4.2 , 7],
                   [2 ,'C' , 3 , 3.3 , 9],
                   [3 ,'C' , 4.2 , 4.1 , 10],
                   ],
                  columns=['ID', 'Name', 'pos_x', 'pos_y', 'volume'])

# create volume average with rows that meet criterion
volume_avgs = []
for g, grouper in df.groupby('Name'):
    for base_row in grouper.iterrows():
        v_sum = 0
        v_count = 0
        for row in grouper.iterrows():
            if row[1].ID == base_row[1].ID: # skip the item itself
                continue
            elif abs(row[1].pos_x - base_row[1].pos_x) < 2.5 and abs(row[1].pos_y - base_row[1].pos_y) < 2.5:  # inside the box: both offsets within the limit
                v_sum += row[1].volume  # accumulate the neighbor's volume
                v_count += 1            # count the neighbor
        volume_avgs.append([base_row[1].ID, g, v_sum / v_count if v_count > 0 else 0, v_count])

df_vol_avgs = pd.DataFrame(volume_avgs, columns=['ID', 'Name', 'volume_avg', 'volume_count'])

merged_df = df.merge(df_vol_avgs, on=['ID', 'Name'])
merged_df

Be aware that this probably won't scale well if the actual data are large, particularly for groups with many rows, because the double for loop compares every row of a group with every other row. Also, if you want None or NaN when no nearby points are found, you can change the if/else from

v_sum / v_count if v_count > 0 else 0

to

v_sum / v_count if v_count > 0 else None

for the None case, and similarly for NaN, say using numpy.nan.
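
If the double loop becomes a bottleneck, one possible alternative is to compute all pairwise offsets inside a group at once with NumPy broadcasting. The following is only a sketch under my own assumptions: the function name neighbor_volume_avg and its limit and circular parameters are made up for illustration, it assumes the dataframe index is unique and that each row should ignore only itself, and it returns NaN when a row has no neighbors. Memory still grows with the square of the group size, so very large groups may need a spatial index instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 'A', 1, 1.5, 10], [2, 'A', 3.5, 3, 6], [3, 'A', 4, 4, 8],
     [4, 'A', 4.5, 4.5, 9], [5, 'A', 5, 6, 10], [1, 'B', 1.2, 1.2, 4],
     [3, 'B', 4.3, 4.4, 8], [4, 'B', 4.5, 4.2, 7], [2, 'C', 3, 3.3, 9],
     [3, 'C', 4.2, 4.1, 10]],
    columns=['ID', 'Name', 'pos_x', 'pos_y', 'volume'])


def neighbor_volume_avg(group, limit=2.5, circular=False):
    """Mean volume of the other rows in `group` that lie within `limit`."""
    x = group['pos_x'].to_numpy(dtype=float)
    y = group['pos_y'].to_numpy(dtype=float)
    vol = group['volume'].to_numpy(dtype=float)

    # Pairwise absolute offsets via broadcasting, shape (n, n).
    dx = np.abs(x[:, None] - x[None, :])
    dy = np.abs(y[:, None] - y[None, :])

    if circular:
        near = np.hypot(dx, dy) < limit      # true radius
    else:
        near = (dx < limit) & (dy < limit)   # rectangular limit

    np.fill_diagonal(near, False)            # a row is not its own neighbor

    counts = near.sum(axis=1)
    sums = (near * vol).sum(axis=1)          # near[i, j] * vol[j], summed over j
    avg = np.divide(sums, counts,
                    out=np.full(len(vol), np.nan), where=counts > 0)
    return pd.DataFrame({'volume_avg': avg, 'volume_count': counts},
                        index=group.index)


# Apply per Name and align the results back to the original rows by index.
parts = [neighbor_volume_avg(g) for _, g in df.groupby('Name')]
result = df.join(pd.concat(parts))
```

Passing circular=True switches the test from the rectangular limit to a Euclidean radius, which may be closer to what the question describes.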