Right now, I am running this line of code:
PIDdf = clean_CSV['PID'].value_counts(normalize=True).nlargest(5).mul(100).round(1).astype(str) + '%'
on this dataset
Proto | Local Address | Foreign Address | State | PID | Process_name |
---|---|---|---|---|---|
TCP | [0.0.0.0:7] | 0.0.0.0:0 | LISTENING | 4112 | tcpsvcs.exe |
TCP | 0.0.0.0:111 | 0.0.0.0:0 | LISTENING | 4 | Can not obtain ownership information |
and the code returns just this
PID | |
---|---|
6356 | 11.1% |
32744 | 10.4% |
9196 | 3.3% |
2652 | 3.3% |
27468 | 3.3% |
But I would like to see this:
PID | | Process_name |
---|---|---|
6356 | 11.1% | sdfasdfa |
32744 | 10.4% | adsfasdf |
9196 | 3.3% | asdfasd |
2652 | 3.3% | asdfsad |
27468 | 3.3% | asdfsdaf |
Is there a better way of doing this rather than just finding the largest columns of the same process_names and appending them?
CodePudding user response:
IIUC, here is one way:
PIDdf = df[['PID','Process_name']].value_counts(normalize=True).nlargest(5).mul(100).round(1).astype(str) + '%'
Another way:
PIDdf = df.groupby(['PID','Process_name'])['PID'].count().divide(df.shape[0]).nlargest(5).mul(100).round(1).astype(str) + '%'
Output (for the two sample rows in the question):
PID Process_name
4 Can not obtain ownership information 50.0%
4112 tcpsvcs.exe 50.0%
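If you want the result as a regular table with PID, process name, and percentage as columns, something like the sketch below might help. The sample data, the `top_pid_shares` helper, and the `share` column name are made up for illustration and are not from the question. It also assumes each PID maps to a single Process_name, so counting the pair gives the same percentages as counting PID alone.

```python
import pandas as pd

# Hypothetical sample data standing in for clean_CSV / df
df = pd.DataFrame({
    "PID": [4112, 4, 4, 4, 6356],
    "Process_name": ["tcpsvcs.exe", "System", "System", "System", "chrome.exe"],
})

def top_pid_shares(frame, n=5):
    """Return the n most frequent (PID, Process_name) pairs with their share of rows."""
    # Count each (PID, Process_name) pair as a fraction of all rows
    shares = frame[["PID", "Process_name"]].value_counts(normalize=True).nlargest(n)
    # Turn fractions into percentage strings such as "60.0%"
    pct = shares.mul(100).round(1).astype(str) + "%"
    # Move the index levels back into columns; "share" is an illustrative name
    return pct.rename("share").reset_index()

result = top_pid_shares(df)
print(result)
```

`DataFrame.value_counts` needs pandas 1.1 or newer; on older versions the `groupby` variant above should work the same way.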