I am parsing a lot of netstat data. Right now I handle the process-name rows by removing each one and referencing it manually: when I see a row whose columns are NaN, I pull that row out completely. However, I am unable to append that row back onto the rest of the dataframe because the sizes do not match.
I was wondering whether it would be possible to take each row with empty columns and move it into the preceding row, as the value of a new column.
For example, this is what my dataframe looks like right now:
Proto | LocalAddress | ForeignAdress | State | PID |
---|---|---|---|---|
TCP | [0.0.0.0:7] | 0.0.0.0:0 | LISTENING | 4112 |
[tcpsvcs.exe] | | | | |
TCP | 0.0.0.0:111 | 0.0.0.0:0 | LISTENING | 4 |
Can not obtain ownership information | | | | |
I would like it to turn into this:
Proto | LocalAddress | ForeignAdress | State | PID | Process_name |
---|---|---|---|---|---|
TCP | [0.0.0.0:7] | 0.0.0.0:0 | LISTENING | 4112 | tcpsvcs.exe |
TCP | 0.0.0.0:111 | 0.0.0.0:0 | LISTENING | 4 | Can not obtain ownership information |
Basically, I want to create a new column for the process names and move each name up onto the preceding row.
CodePudding user response:
Try this:
You said the process name is always in the next row, so we just need a Series of Proto that contains only the values from the rows where the other columns are NaN. We then shift it up by one row with shift(-1) and create a new column from it.
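# Columns that are empty (NaN) on the process-name rows
cols = ['LocalAddress', 'ForeignAdress', 'State', 'PID']
# Take Proto from those rows, realign it to the full index, and shift it up one row
df['process_name'] = df.loc[df[cols].isna().all(axis=1), 'Proto'].reindex_like(df).shift(-1)
# Drop the process-name rows, which are no longer needed
df = df.dropna(subset=cols)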
Output:
Proto LocalAddress ForeignAdress State PID process_name
0 TCP [0.0.0.0:7] 0.0.0.0:0 LISTENING 4112.0 [tcpsvcs.exe]
2 TCP 0.0.0.0:111 0.0.0.0:0 LISTENING 4.0 Can not obtain ownership information
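A self-contained version of the shift approach may help when trying it on the sample data. This is only a sketch: the dataframe is rebuilt by hand from the table in the question, and the names is_process_row and result, the bracket stripping, and the integer cast for PID are illustrative additions, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question (an assumption about the real input)
df = pd.DataFrame({
    "Proto": ["TCP", "[tcpsvcs.exe]", "TCP", "Can not obtain ownership information"],
    "LocalAddress": ["[0.0.0.0:7]", np.nan, "0.0.0.0:111", np.nan],
    "ForeignAdress": ["0.0.0.0:0", np.nan, "0.0.0.0:0", np.nan],
    "State": ["LISTENING", np.nan, "LISTENING", np.nan],
    "PID": [4112, np.nan, 4, np.nan],
})

cols = ["LocalAddress", "ForeignAdress", "State", "PID"]

# A process-name row is one where every column other than Proto is NaN
is_process_row = df[cols].isna().all(axis=1)

# Keep Proto only on process-name rows, then shift up so each name
# lands on the connection row directly above it
names = df["Proto"].where(is_process_row).shift(-1)

# Strip the surrounding square brackets, e.g. "[tcpsvcs.exe]" -> "tcpsvcs.exe"
df["Process_name"] = names.str.strip("[]")

# Drop the process-name rows and restore integer PIDs
result = df[~is_process_row].reset_index(drop=True)
result["PID"] = result["PID"].astype(int)

print(result)
```

Because the name is taken from the row directly below, a connection row that has no process-name row after it simply gets NaN in Process_name.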
CodePudding user response:
Something like this should work:
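# Rows with no LocalAddress are the process-name rows
mask = df['LocalAddress'].isna()
# Their Proto values are the process names, in order
missing_vals = df.loc[mask, 'Proto'].values
# Keep only the connection rows and attach the names by position
df = df[~mask].copy()
df['Process_name'] = missing_vals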
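The mask approach assigns the names by position, so it relies on every connection row being followed by exactly one process-name row. A minimal sketch with that check made explicit, assuming the sample data from the question; the names process_names and connections and the length check are illustrative additions:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question (an assumption about the real input)
df = pd.DataFrame({
    "Proto": ["TCP", "[tcpsvcs.exe]", "TCP", "Can not obtain ownership information"],
    "LocalAddress": ["[0.0.0.0:7]", np.nan, "0.0.0.0:111", np.nan],
    "ForeignAdress": ["0.0.0.0:0", np.nan, "0.0.0.0:0", np.nan],
    "State": ["LISTENING", np.nan, "LISTENING", np.nan],
    "PID": [4112, np.nan, 4, np.nan],
})

# Process-name rows have no LocalAddress
mask = df["LocalAddress"].isna()
process_names = df.loc[mask, "Proto"].to_numpy()
connections = df[~mask].copy()

# Positional assignment only lines up if the two groups have the same length
if len(process_names) != len(connections):
    raise ValueError("each connection row needs exactly one process-name row")

connections["Process_name"] = process_names
print(connections)
```

If some connections can appear without a following process-name row, the shift-based approach is the safer choice, since it pairs rows by adjacency and not by count.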