I found this is complicated, I have a Dataframe
and want
the first row when a value change in column A
and want the row that contains nearest value in column B
to new value from A
in the previous group.
For example:
A | B
----------
803 |803.4 <- first row
803 |803.5
803 |803.6
803 |803.9 <- nearest to next col A value 805
803 |803.7
803 |803.8
----------
805 |804.4 <- first row
805 |804.5
805 |804.6
805 |804.9
805 |804.3
805 |804.2 <- nearest to next col A value 804
----------
804 |804.2 <- first row
804 |804.1
804 |803.9 <- nearest to next col A value 803
----------
803 |803.4 <- first row
...
CodePudding user response:
I would say that u shuld give the list in a int varaible and then search for the bigest number
i = 1
while i < 7:
if yourint<yournewint:
yourint=yournewint
if i == 6:
break
i = 1
CodePudding user response:
One solution is to first identify the separete groups using shift
and cumsum
, then get the first value of each group as a series. This series can then be used to find the first value of the next group (i.e., the value to compare with) for each group. To do this, we use add(1)
and map
.
With the first A
value of the next group obtained, the difference between all B
values and this can be computed. We identify all rows with the minimum difference with idxmin
and return only these ones with loc
.
Full code:
df['grp'] = (df['A'] != df['A'].shift(1)).cumsum()
grp_first = df.groupby('grp')['A'].first()
df['next_grp_val'] = df['grp'].add(1).map(grp_first).fillna(0)
df['diff'] = df['B'].sub(df['next_grp_val']).abs()
df.loc[df.groupby('grp')['diff'].idxmin(), ['A', 'B']]
Result:
A B
3 803 803.9
11 805 804.2
14 804 803.9
15 803 803.4
CodePudding user response:
Create a list of values in column A but remove all neighboring duplicates:
values = [803, 805, 804, 803]
. This can be done using some panda functions.Loop through each row in dataframe.
For each row, calculate the difference between next value in
values
and current value in Column B.Finally, find the row which gives the minimum difference.
data :
A,B
803,803.4
803,803.5
803,803.6
803,803.9
803,803.7
803,803.8
805,804.4
805,804.5
805,804.6
805,804.9
805,804.3
805,804.2
804,804.2
804,804.1
804,803.9
803,803.4
Code :
import pandas as pd
df = pd.read_csv('table')
minimum_difference = 1000
best_row = df.iloc[0]
first_row = df.iloc[0]
# obtain a list of values in column A with no neighbouring duplicates
ind = df['A'] == df['A'].shift()
values = df.loc[~ind].iloc[:, 0].tolist() # values in column A
# print(values) # [803, 805, 804, 803]
ptr = 0 # points to current value in values
for index, row in df.iterrows():
if row['A'] != values[ptr]:
# print first row
print("First row :\n", first_row, "\n")
# print previous row
print("Best row :\n", best_row, "\n")
print("\n")
# reset everything
ptr = 1
first_row = row
minimum_difference = 1000
# Skip last value in column A
if ptr == (len(values)-1):
break
next_unique_value = values[ptr 1]
if(abs(next_unique_value-row['B']) < minimum_difference):
best_row = row
minimum_difference = abs(next_unique_value-row['B'])
Output :
First row :
A 803.0
B 803.4
Name: 0, dtype: float64
Best row :
A 803.0
B 803.9
Name: 3, dtype: float64
First row :
A 805.0
B 804.4
Name: 6, dtype: float64
Best row :
A 805.0
B 804.2
Name: 11, dtype: float64
First row :
A 804.0
B 804.2
Name: 12, dtype: float64
Best row :
A 804.0
B 803.9
Name: 14, dtype: float64
CodePudding user response:
Use:
#consecutive groups
g = df.A.ne(df.A.shift()).cumsum()
#aggregate lists
s = df.groupby(['A',g], sort=False)['B'].agg(list)
#Series with next lists
s1 = s.shift(-1,fill_value=[[]])
#get nearest values of previous group
vals = [a[(np.abs(np.array(a) - b[0])).argmin()]
if len(b) > 0 else None
for a, b in zip(s, s1)]
print (vals)
[803.9, 804.2, 803.9, None]
#convert to DataFrame
df = (s.to_frame().assign(first = lambda x: x.pop('B').str[0], nearest = vals)
.droplevel(-1).reset_index())
print (df)
A first nearest
0 803 803.4 803.9
1 805 804.4 804.2
2 804 804.2 803.9
3 803 803.4 NaN
Another idea:
#consecutive groups
g = df.A.ne(df.A.shift()).cumsum()
#first value of next group
next_g = df['B'].shift(-1)
#difference of next value of previous group
diff = df['B'].sub(next_g.groupby(g, sort=False).transform('last')).abs()
#get row by minimal difference and add first value of group
df = (df.loc[diff.fillna(0).groupby(g).idxmin()]
.reset_index(drop=True)
.assign(first=lambda x: df.B.groupby(g).first().to_numpy())
.rename(columns={'B':'nearest'}))
print (df)
A nearest first
0 803 803.9 803.4
1 805 804.2 804.4
2 804 803.9 804.2
3 803 803.4 803.4