Find new value occur and nearest value from another column-CodePudding

I found this is complicated, I have a Dataframe and want the first row when a value change in column A and want the row that contains nearest value in column B to new value from A in the previous group.

For example:

A   | B    
----------
803 |803.4 <- first row
803 |803.5
803 |803.6
803 |803.9 <- nearest to next col A value 805
803 |803.7
803 |803.8
----------
805 |804.4 <- first row
805 |804.5
805 |804.6
805 |804.9
805 |804.3
805 |804.2 <- nearest to next col A value 804
----------
804 |804.2 <- first row
804 |804.1
804 |803.9 <- nearest to next  col A value 803
----------
803 |803.4 <- first row
...

CodePudding user response：

I would say that u shuld give the list in a int varaible and then search for the bigest number

i = 1 
while i < 7:
if yourint<yournewint:
   yourint=yournewint
if i == 6:
   break
i  = 1

CodePudding user response：

One solution is to first identify the separete groups using shift and cumsum, then get the first value of each group as a series. This series can then be used to find the first value of the next group (i.e., the value to compare with) for each group. To do this, we use add(1) and map.

With the first A value of the next group obtained, the difference between all B values and this can be computed. We identify all rows with the minimum difference with idxmin and return only these ones with loc.

Full code:

df['grp'] = (df['A'] != df['A'].shift(1)).cumsum()
grp_first = df.groupby('grp')['A'].first()

df['next_grp_val'] = df['grp'].add(1).map(grp_first).fillna(0)
df['diff'] = df['B'].sub(df['next_grp_val']).abs()

df.loc[df.groupby('grp')['diff'].idxmin(), ['A', 'B']]

Result:

      A      B
3   803  803.9
11  805  804.2
14  804  803.9
15  803  803.4

CodePudding user response：

Create a list of values in column A but remove all neighboring duplicates: values = [803, 805, 804, 803]. This can be done using some panda functions.
Loop through each row in dataframe.
For each row, calculate the difference between next value in values and current value in Column B.
Finally, find the row which gives the minimum difference.

data :

A,B
803,803.4
803,803.5
803,803.6
803,803.9
803,803.7
803,803.8
805,804.4
805,804.5
805,804.6
805,804.9
805,804.3
805,804.2
804,804.2
804,804.1
804,803.9
803,803.4

Code :

import pandas as pd

df = pd.read_csv('table')
minimum_difference = 1000
best_row = df.iloc[0]
first_row = df.iloc[0]

# obtain a list of values in column A with no neighbouring duplicates
ind = df['A'] == df['A'].shift()
values = df.loc[~ind].iloc[:, 0].tolist()  # values in column A
# print(values)  # [803, 805, 804, 803]
ptr = 0  # points to current value in values

for index, row in df.iterrows():
    if row['A'] != values[ptr]:
        # print first row
        print("First row :\n", first_row, "\n")
        # print previous row
        print("Best row :\n", best_row, "\n")
        print("\n")

        # reset everything
        ptr  = 1
        first_row = row
        minimum_difference = 1000

    # Skip last value in column A
    if ptr == (len(values)-1):
        break
    next_unique_value = values[ptr 1]
    if(abs(next_unique_value-row['B']) < minimum_difference):
        best_row = row
        minimum_difference = abs(next_unique_value-row['B'])

Output :

First row :
 A    803.0
B    803.4
Name: 0, dtype: float64 

Best row :
 A    803.0
B    803.9
Name: 3, dtype: float64 



First row :
 A    805.0
B    804.4
Name: 6, dtype: float64 

Best row :
 A    805.0
B    804.2
Name: 11, dtype: float64 



First row :
 A    804.0
B    804.2
Name: 12, dtype: float64 

Best row :
 A    804.0
B    803.9
Name: 14, dtype: float64

CodePudding user response：

Use:

#consecutive groups
g = df.A.ne(df.A.shift()).cumsum()
#aggregate lists
s = df.groupby(['A',g], sort=False)['B'].agg(list)

#Series with next lists
s1 = s.shift(-1,fill_value=[[]])

#get nearest values of previous group
vals = [a[(np.abs(np.array(a) - b[0])).argmin()] 
        if len(b) > 0 else None 
        for a, b in zip(s, s1)]
print (vals)
[803.9, 804.2, 803.9, None]

#convert to DataFrame
df = (s.to_frame().assign(first = lambda x: x.pop('B').str[0], nearest = vals)
       .droplevel(-1).reset_index())

print (df)
     A  first  nearest
0  803  803.4    803.9
1  805  804.4    804.2
2  804  804.2    803.9
3  803  803.4      NaN

Another idea:

#consecutive groups
g = df.A.ne(df.A.shift()).cumsum()
#first value of next group
next_g = df['B'].shift(-1)

#difference of next value of previous group
diff = df['B'].sub(next_g.groupby(g, sort=False).transform('last')).abs()

#get row by minimal difference and add first value of group
df = (df.loc[diff.fillna(0).groupby(g).idxmin()]
        .reset_index(drop=True)
        .assign(first=lambda x: df.B.groupby(g).first().to_numpy())
        .rename(columns={'B':'nearest'}))
print (df)
     A  nearest  first
0  803    803.9  803.4
1  805    804.2  804.4
2  804    803.9  804.2
3  803    803.4  803.4