Considering the following dataframe df
:
df = pd.DataFrame(
{
"col1": [0,1,2,3,4,5,6,7,8,9,10],
"col2": ["A","B","C","D","E","F","G","H","I","J","K"],
"col3": [1e-0,1e-1,1e-2,1e-3,1e-4,1e-5,1e-6,1e-7,1e-8,1e-9,1e-10],
"col4": [0,4,2,5,6,7,6,3,6,2,1]
}
)
I would like to select rows when the col4 value of the current row is greater than the col4 values of the previous and next rows and to store them in an empty frame.
I wrote the following code that works :
df1=pd.DataFrame()
for i in range(1,len(df)-1,1):
if ( (df.iloc[i]['col4'] > df.iloc[i 1]['col4']) and (df.iloc[i]['col4'] > df.iloc[i-1]['col4']) ):
df1=pd.concat([df1,df.iloc[i:i 1]])
I got the expected dataframe df1
col1 col2 col3 col4
1 1 B 1.000000e-01 4
5 5 F 1.000000e-05 7
8 8 I 1.000000e-08 6
But this code is very ugly, not readable, ... Is there a best solution ?
CodePudding user response:
Use boolean indexing
with compare next and previous values by Series.shift
and Series.gt
for greater values, for chain bitwise AND
use &
:
df = df[df['col4'].gt(df['col4'].shift()) & df['col4'].gt(df['col4'].shift(-1))]
print (df)
col1 col2 col3 col4
1 1 B 1.000000e-01 4
5 5 F 1.000000e-05 7
8 8 I 1.000000e-08 6
EDIT: Solution for always include first and last rows:
mask = df['col4'].gt(df['col4'].shift()) & df['col4'].gt(df['col4'].shift(-1))
mask.iloc[[0, -1]] = True
df = df[mask]
print (df)
col1 col2 col3 col4
0 0 A 1.000000e 00 0
1 1 B 1.000000e-01 4
5 5 F 1.000000e-05 7
8 8 I 1.000000e-08 6
10 10 K 1.000000e-10 1