I have looked everywhere for this answer, which must exist. I am trying to find the smallest positive integer in each row of a DataFrame.
Imagine a DataFrame:
```
import pandas as pd

df = pd.DataFrame({'lat': [-120, -90, -100, -100],
                   'long': [20, 21, 19, 18],
                   'dist1': [2, 6, 8, 1],
                   'dist2': [1, 3, 10, 5]})
```
The following gives me the row-wise minimum, but it includes negatives, i.e. the values from the df['lat'] column:
df.min(axis = 1)
Obviously, I could drop the lat column, or convert to string or something, but I will need it later. The lat column is the only column with negative values. I am trying to return a new column such as
df['min_dist'] = [1,3,8,1]
I hope this all makes sense. Thanks in advance for any help.
CodePudding user response:
In general, you can use DataFrame.where to replace non-positive values with NaN so that they are excluded from the min calculation:
df['min_dist'] = df.where(df > 0).min(1)
df
   lat  long  dist1  dist2  min_dist
0 -120    20      2      1       1.0
1  -90    21      6      3       3.0
2 -100    19      8     10       8.0
3 -100    18      1      5       1.0
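Note that min_dist comes back as float (1.0 rather than 1), because masking with NaN upcasts the integer columns. A minimal self-contained sketch of the same approach, cast back to integers, might look like the following. It assumes every row has at least one positive value; otherwise the cast would fail on NaN.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'lat': [-120, -90, -100, -100],
    'long': [20, 21, 19, 18],
    'dist1': [2, 6, 8, 1],
    'dist2': [1, 3, 10, 5],
})

# Replace non-positive values with NaN, then take the row-wise minimum
# (min skips NaN by default)
min_dist = df.where(df > 0).min(axis=1)

# Masking upcasts to float; cast back to int.
# Assumption: every row has at least one positive value.
df['min_dist'] = min_dist.astype(int)
```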
CodePudding user response:
Select just the dist columns (here, the last two) and take the row-wise minimum:
df.assign(min_dist = df.iloc[:, -2:].min(1))
Out[205]:
   lat  long  dist1  dist2  min_dist
0 -120    20      2      1         1
1  -90    21      6      3         3
2 -100    19      8     10         8
3 -100    18      1      5         1
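Selecting by position breaks if the column order changes. As a hedged alternative, the dist columns could be picked by name instead. The sketch below assumes the relevant columns all start with "dist", which holds for the sample data but is not stated in the question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'lat': [-120, -90, -100, -100],
    'long': [20, 21, 19, 18],
    'dist1': [2, 6, 8, 1],
    'dist2': [1, 3, 10, 5],
})

# Pick columns whose names start with "dist", regardless of their position
# (assumption: this naming convention identifies the distance columns)
dist_cols = df.filter(regex=r'^dist')

# Row-wise minimum over just those columns
df = df.assign(min_dist=dist_cols.min(axis=1))
```

Anchoring the pattern with `^` keeps it from matching the new min_dist column if the code is run a second time.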
CodePudding user response:
Since lat is the only negative value in each row, you can sort each row and then take the second element, as below:
>>> import numpy as np
>>> df['min_dist'] = df.apply(lambda row : np.sort(row)[1], axis=1)
>>> df
   lat  long  dist1  dist2  min_dist
0 -120    20      2      1         1
1  -90    21      6      3         3
2 -100    19      8     10         8
3 -100    18      1      5         1
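The sort trick depends on each row containing exactly one negative value, and apply runs a Python function per row. If either is a concern, one possible vectorized sketch in NumPy replaces non-positive entries with infinity before taking the minimum. It assumes all columns are numeric and that every row has at least one positive value.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'lat': [-120, -90, -100, -100],
    'long': [20, 21, 19, 18],
    'dist1': [2, 6, 8, 1],
    'dist2': [1, 3, 10, 5],
})

values = df.to_numpy()

# Replace non-positive entries with +inf so they can never be the minimum
masked = np.where(values > 0, values, np.inf)

# Row-wise minimum; cast back to int.
# Assumption: every row has at least one positive value, so no inf remains.
df['min_dist'] = masked.min(axis=1).astype(int)
```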
CodePudding user response:
Just use:
df['min_dist'] = df[df > 0].min(1)