I am parsing netstat data and my data looks like this
Proto Local Address Foreign Adress State PID Process name |
---|
TCP [0.0.0.0:7] 0.0.0.0:0 LISTENING 4112 tcpsvcs.exe |
TCP 0.0.0.0:111 0.0.0.0:0 LISTENING 4 Can not obtain ownership information |
When I try to parse this using this Python command
pd.read_csv("C:\\Users\\xxxxx\\Documents\\test.csv", on_bad_lines='skip', header=1, delim_whitespace= True)
my data ends up looking like this (Notice how Process_name isn't properly filled out and everything is being shifted to the left due to the space between Local address and foreign address):
Proto | Local | Address | Foreign | Address | State | PID | Process_name |
---|---|---|---|---|---|---|---|
TCP | [0.0.0.0:7] | 0.0.0.0:0 | LISTENING | 4112 | tcpsvcs.exe | ||
TCP | 0.0.0.0:111 | 0.0.0.0:0 | LISTENING | 4 | Can not obtain ownership information |
When it should end up looking like this:
Proto | Local Address | Foreign Address | State | PID | Process_name |
---|---|---|---|---|---|
TCP | [0.0.0.0:7] | 0.0.0.0:0 | LISTENING | 4112 | tcpsvcs.exe |
TCP | 0.0.0.0:111 | 0.0.0.0:0 | LISTENING | 4 | Can not obtain ownership information |
And that's solely due to the spacing of the headers. Right now I am just adding my own headers to my CSV file but the only issue with that is my file is extremely large and appending a header row at the top of a CSV is very heavy. I was wondering if there are any ways to parse the header that I do have and ignore the singular space and only delimit the spaces that are larger than the singular space that I have.
CodePudding user response:
Use sep='\s\s '
on read_csv
to separate columns that are separated by more than one space:
Example Data
sim_csv = io.StringIO(
''' Proto Local Address Foreign Address State PID Process_name
TCP 0.0.0.0:80 0.0.0.0:0 LISTENING 4 tcpsvcs.exe
TCP 0.0.0.0:135 0.0.0.0:0 LISTENING 1552 Can not obtain ownership information''')
Execute
df = pd.read_csv(sim_csv, sep='\s\s ')
print(df)
Result
Proto Local Address Foreign Address State PID Process_name
0 TCP 0.0.0.0:80 0.0.0.0:0 LISTENING 4 tcpsvcs.exe
1 TCP 0.0.0.0:135 0.0.0.0:0 LISTENING 1552 Can not obtain ownership information
CodePudding user response:
I cannot test it with the raw file but try this:
pd.read_csv("C:\\Users\\xxxxx\\Documents\\test.csv", on_bad_lines='skip', header=1, sep="\t")