Get specific rows which match condition pandas

Time:05-06

I have the following dataframe

[Image: dataframe]

My current code is below. The goal is to show only the rows where ImageFileName is services.exe and PPIDName is not wininit.exe. Right now the result shows all the other rows, which do not match this condition.

services = dfprocs[(dfprocs.ImageFileName.str.lower() == "services.exe") & (dfprocs.PPIDName.str.lower() == "wininit.exe") == False]
if len(services) == 0:
    print("Services.exe was spawned using known parent")
else:
    print("[!]Suspicious services.exe process found")
    print(services)

CodePudding user response:

Use:

services = dfprocs[(dfprocs.ImageFileName.str.lower() == "services.exe") & (dfprocs.PPIDName.str.lower() != "wininit.exe")]

CodePudding user response:

This looks like an operator precedence issue. In ordinary Python code, == binds more tightly than the boolean operator and, but it binds less tightly than the bitwise operator & (which pandas uses for element-wise AND). As a result, A & B == False is evaluated as (A & B) == False, which selects every row except the services.exe rows whose parent is wininit.exe.

You can fix it either by putting parentheses around (dfprocs.PPIDName.str.lower() == "wininit.exe") == False or by changing that comparison to (dfprocs.PPIDName.str.lower() != "wininit.exe").
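To make the precedence rule concrete, here is a minimal sketch on a hypothetical three-row frame (the data is invented for illustration and is not the asker's dataframe). It compares the unparenthesized expression with the two fixes, plus the ~ operator, which is a common way to negate a boolean mask:

```python
import pandas as pd

# Hypothetical three-row frame, used only to illustrate the precedence rule.
df = pd.DataFrame({
    "ImageFileName": ["services.exe", "services.exe", "csrss.exe"],
    "PPIDName": ["wininit.exe", "explorer.exe", "wininit.exe"],
})

is_services = df.ImageFileName.str.lower() == "services.exe"
is_wininit = df.PPIDName.str.lower() == "wininit.exe"

# & binds tighter than ==, so this parses as (is_services & is_wininit) == False
unparenthesized = is_services & is_wininit == False

# Explicit grouping gives the intended mask
parenthesized = is_services & (is_wininit == False)

# ~ is element-wise NOT and avoids comparing against False at all
negated = is_services & ~is_wininit

print(unparenthesized.tolist())  # [False, True, True]
print(parenthesized.tolist())    # [False, True, False]
print(negated.tolist())          # [False, True, False]
```

Only the second row (services.exe started by explorer.exe) should be selected. The unparenthesized version also selects the unrelated csrss.exe row.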

import pandas as pd
import numpy as np

dfprocs = pd.DataFrame({
    "TreeDepth":[0,1,1,0,0,1,1],
    "PID":[4,88,404,556,632,768,900],
    "PPID":[0,4,4,548,548,632,632],
    "ImageFileName":["System","Registry","smss.exe","csrss.exe","wininit.exe","services.exe","services.exe"],
    "Offset(V)":['0xac818d45d080','0xac818d45d080','0xac818d45d080','0xac818d45d080','0xac818d45d080','0xac818d45d080','0xac818d45d080'],
    "Threads":[158,4,2,10,1,7,7],
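    "Handles":[np.nan]*7,  # np.nan works on every NumPy version; the np.NaN alias was removed in NumPy 2.0
    "SessionID":[np.nan]*3 + [0.0]*4,  # the two lists must be joined with +
    "Wow64":[False]*7,
    "CreateTime":[0]*7,
    "ExitTime":[np.nan]*7,
    "PPIDName":[np.nan, "System", "System", np.nan, np.nan, "wininit.exe", "wininit.exe"]})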
print(dfprocs)
services = dfprocs[(dfprocs.ImageFileName.str.lower() == "services.exe") & ((dfprocs.PPIDName.str.lower() == "wininit.exe") == False)]
if len(services) == 0:
    print("Services.exe was spawned using known parent")
else:
    print("[!]Suspicious services.exe process found")
    print(services)

Output:

   TreeDepth  PID  PPID ImageFileName       Offset(V)  Threads  Handles  SessionID  Wow64  CreateTime  ExitTime     PPIDName
0          0    4     0        System  0xac818d45d080      158      NaN        NaN  False           0       NaN          NaN
1          1   88     4      Registry  0xac818d45d080        4      NaN        NaN  False           0       NaN       System
2          1  404     4      smss.exe  0xac818d45d080        2      NaN        NaN  False           0       NaN       System
3          0  556   548     csrss.exe  0xac818d45d080       10      NaN        0.0  False           0       NaN          NaN
4          0  632   548   wininit.exe  0xac818d45d080        1      NaN        0.0  False           0       NaN          NaN
5          1  768   632  services.exe  0xac818d45d080        7      NaN        0.0  False           0       NaN  wininit.exe
6          1  900   632  services.exe  0xac818d45d080        7      NaN        0.0  False           0       NaN  wininit.exe
Services.exe was spawned using known parent
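
The sample data above contains only services.exe rows whose parent is wininit.exe, so the filter correctly returns nothing. To see it flag something, here is a minimal sketch with a hypothetical extra row (PID 1234, parent explorer.exe) and a hypothetical helper named find_suspicious_services. Neither comes from the original question.

```python
import numpy as np
import pandas as pd


def find_suspicious_services(procs: pd.DataFrame) -> pd.DataFrame:
    """Return services.exe rows whose parent is not wininit.exe (case-insensitive)."""
    is_services = procs["ImageFileName"].str.lower() == "services.exe"
    # A missing parent name (NaN) compares as False here, so it is treated as "not wininit.exe"
    is_wininit_child = procs["PPIDName"].str.lower() == "wininit.exe"
    return procs[is_services & ~is_wininit_child]


# Hypothetical process list: the last row is the invented suspicious entry.
procs = pd.DataFrame({
    "PID": [632, 768, 900, 1234],
    "PPID": [548, 632, 632, 4321],
    "ImageFileName": ["wininit.exe", "services.exe", "Services.exe", "services.exe"],
    "PPIDName": [np.nan, "wininit.exe", "WININIT.EXE", "explorer.exe"],
})

suspicious = find_suspicious_services(procs)
if suspicious.empty:
    print("Services.exe was spawned using known parent")
else:
    print("[!]Suspicious services.exe process found")
    print(suspicious["PID"].tolist())  # [1234]
```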

CodePudding user response:

Based on the requirement described, this should work:

services = dfprocs.loc[(dfprocs.ImageFileName == 'services.exe') & (dfprocs.PPIDName != 'wininit.exe')]

Let me know, thanks.
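
One difference from the earlier answers: this version compares the raw strings without .str.lower(), so it is case-sensitive. A minimal sketch on a hypothetical two-row frame (invented for illustration) shows how that can change the result:

```python
import pandas as pd

# Hypothetical data: the second row uses different capitalization.
df = pd.DataFrame({
    "ImageFileName": ["services.exe", "Services.exe"],
    "PPIDName": ["explorer.exe", "explorer.exe"],
})

# Case-sensitive comparison, as in the answer above
exact = df.loc[(df.ImageFileName == "services.exe") & (df.PPIDName != "wininit.exe")]

# Case-insensitive comparison, as in the question's code
folded = df.loc[(df.ImageFileName.str.lower() == "services.exe")
                & (df.PPIDName.str.lower() != "wininit.exe")]

print(len(exact))   # 1
print(len(folded))  # 2
```

If the process names in the data may vary in case, keeping .str.lower() is probably the safer choice.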
