Accessing multiple columns when corresponding to certain rows

Time:09-02

Right now, I am running this line of code:

PIDdf = clean_CSV['PID'].value_counts(normalize=True).nlargest(5).mul(100).round(1).astype(str) + '%'

on this dataset:

Proto  Local Address  Foreign Address  State      PID   Process_name
TCP    [0.0.0.0:7]    0.0.0.0:0        LISTENING  4112  tcpsvcs.exe
TCP    0.0.0.0:111    0.0.0.0:0        LISTENING  4     Can not obtain ownership information

and the code returns just this

PID
6356 11.1%
32744 10.4%
9196 3.3%
2652 3.3%
27468 3.3%

But I would like to see this:

PID Process_name
6356 11.1% sdfasdfa
32744 10.4% adsfasdf
9196 3.3% asdfasd
2652 3.3% asdfsad
27468 3.3% asdfsdaf

Is there a better way of doing this than finding the most frequent PIDs, looking up their process names separately, and appending them?

CodePudding user response:

IIUC, here is one way (df is your clean_CSV). Counting the two columns together keeps Process_name next to each PID:

PIDdf = df[['PID','Process_name']].value_counts(normalize=True).nlargest(5).mul(100).round(1).astype(str) + '%'

Another way, grouping on both columns and dividing the group sizes by the total number of rows:

PIDdf = df.groupby(['PID','Process_name'])['PID'].count().divide(df.shape[0]).nlargest(5).mul(100).round(1).astype(str) + '%'

output:

>>>
PID   Process_name                        
4     Can not obtain ownership information    50.0%
4112  tcpsvcs.exe                             50.0%
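
Both versions above return a Series with a (PID, Process_name) MultiIndex. If you would rather have a flat table shaped like the one in the question, here is a rough sketch of one option. It assumes each PID maps to a single Process_name (the first name seen for a PID is used). The helper name top_pid_shares, the column name Percent, and the sample rows are made up for illustration.

```python
import pandas as pd

# Small stand-in for the netstat-style data in the question (values are made up).
df = pd.DataFrame({
    "PID": [4112, 4, 4, 4112, 4112, 900],
    "Process_name": ["tcpsvcs.exe", "System", "System",
                     "tcpsvcs.exe", "tcpsvcs.exe", "svchost.exe"],
})

def top_pid_shares(frame, n=5):
    """Hypothetical helper: percentage of rows per PID, plus its process name."""
    # Share of rows per PID, as in the original one-liner (still numeric here).
    share = frame["PID"].value_counts(normalize=True).nlargest(n).mul(100).round(1)
    # Lookup table PID -> Process_name, keeping the first name seen per PID.
    names = frame.drop_duplicates("PID").set_index("PID")["Process_name"]
    # Format the percentages and attach the names in the same PID order.
    out = (share.astype(str) + "%").to_frame("Percent")
    out["Process_name"] = names.reindex(out.index).to_numpy()
    out.index.name = "PID"
    return out.reset_index()

result = top_pid_shares(df)
print(result)
```

Note that this counts rows per PID only, like the original one-liner, whereas the two answers above count (PID, Process_name) pairs. The numbers only differ if one PID shows up under several process names.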