How to parse and access columns based on headers in a file?

I believe this is a 3 step process but please bear with me. I'm currently reading Shell output which is being saved to a file and the output looks like this:

Current Output:

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 123.345.789:1234        0.0.0.0:*               LISTEN      23044/test          
tcp        0      0 0.0.0.0:5915            0.0.0.0:*               LISTEN      99800/./serv    
tcp        0      0 0.0.0.0:1501            0.0.0.0:*               LISTEN      -

I'm trying to access each column's information based on the header value. This is something I was able to do in PowerShell, but I'm not sure how to achieve it in Python.

Expected Output:

Proto,Recv-Q,Send-Q,Local Address,Foreign Address,State,PID/Program name
tcp,0,0,123.345.789:1234,0.0.0.0:*,LISTEN,23044/test          
tcp,0,0,0.0.0.0:5915,0.0.0.0:*,LISTEN,99800/./serv    
tcp,0,0,0.0.0.0:1501,0.0.0.0:*,LISTEN,-

proto = data["Proto"]
for p in proto:
    print(p)

Output: tcp tcp tcp

What I've tried:

Where do I begin.. I've tried Splitting, Replacing and Translate. Also, I did try Regex but couldn't quite figure it out :/

Proto,Recv-Q,Send-Q,Local,Address,,,,,,,,,,,Foreign Address,,,,,,,,,State,,,,,, PID/Program,name    
tcp,,,,,,,,0,,,,,,0 123.345.789:1234,,,,,,,,0.0.0.0:*,,,,,,,,,,,,,,,LISTEN,,,,,,23021/java,,,,,,,,  
tcp,,,,,,,,0,,,,,,0 0.0.0.0:5915,,,,,,,,,,,,0.0.0.0:*,,,,,,,,,,,,,,,LISTEN,,,,,,99859/./statserv    
tcp,,,,,,,,0,,,,,,0 0.0.0.0:1501,,,,,,,,,,,,0.0.0.0:*,,,,,,,,,,,,,,,LISTEN,,,,,,-

Since some of the headers contain a space, it's difficult to map the columns to them correctly.

Looking for the best way to approach this.

Thank you.

CodePudding user response：

You are post-processing the output of the netstat command. On Linux, netstat itself is essentially reformatting the information in /proc/net/tcp, which you can also read. As with the netstat output, you may need to make your own header line (two of the header names contain spaces), but the data lines are all space separated. A simple ln.split() should do it.

If you still want to use netstat, as I said, just throw away the header line and use split. You know what the columns are.

# output: the data lines of the netstat output, header line already dropped
for ln in output:
    fields = ln.split()
    print(','.join(fields))
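To get the data["Proto"] style of access from the question, the split fields can be collected into a dict keyed by header name. The following is only a sketch under a few assumptions: the header names are typed in by hand, the sample lines are copied from the question, and parse_netstat is a hypothetical helper name, not something netstat or Python provides.

```python
# Header names are written out by hand because two of them contain spaces.
HEADERS = ["Proto", "Recv-Q", "Send-Q", "Local Address",
           "Foreign Address", "State", "PID/Program name"]

# Sample lines copied from the question; a real script would read the saved file.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 123.345.789:1234        0.0.0.0:*               LISTEN      23044/test
tcp        0      0 0.0.0.0:5915            0.0.0.0:*               LISTEN      99800/./serv
tcp        0      0 0.0.0.0:1501            0.0.0.0:*               LISTEN      -
"""


def parse_netstat(lines):
    """Return a dict mapping each header name to the list of values in that column."""
    data = {name: [] for name in HEADERS}
    rows = iter(lines)
    next(rows, None)  # throw away the header line
    for ln in rows:
        # Limiting the number of splits keeps any spaces in the last column together.
        fields = ln.split(None, len(HEADERS) - 1)
        if len(fields) != len(HEADERS):
            continue  # skip blank or malformed lines
        for name, value in zip(HEADERS, fields):
            data[name].append(value.strip())
    return data


data = parse_netstat(SAMPLE.splitlines())
for p in data["Proto"]:
    print(p)
```

This relies on every data line having a value in each column, as in the sample. A line with an empty State column would have too few fields and is skipped here.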

CodePudding user response：

Skip the first row, indicate that there is no header, assign header names and then split on one or more spaces.

import pandas as pd

df = pd.read_csv('netstat.txt', skiprows=1, header=None, sep=r'\s+',
                 names=['Proto','Recv-Q','Send-Q','Local Address','Foreign Address','State','PID/Program name'])
print(df)

  Proto  Recv-Q  Send-Q     Local Address Foreign Address   State PID/Program name
0   tcp       0       0  123.345.789:1234       0.0.0.0:*  LISTEN       23044/test
1   tcp       0       0      0.0.0.0:5915       0.0.0.0:*  LISTEN     99800/./serv
2   tcp       0       0      0.0.0.0:1501       0.0.0.0:*  LISTEN                -

df.to_csv('output.csv', index=False)
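Once the data is in CSV form with a proper header row, the columns can also be read back by header name with the standard library alone. This is a rough sketch rather than part of the answer above: the CSV text is the "Expected Output" from the question, held in a string so the example is self-contained, and a real script would open output.csv instead.

```python
import csv
import io

# CSV text in the shape shown under "Expected Output" in the question.
CSV_TEXT = """\
Proto,Recv-Q,Send-Q,Local Address,Foreign Address,State,PID/Program name
tcp,0,0,123.345.789:1234,0.0.0.0:*,LISTEN,23044/test
tcp,0,0,0.0.0.0:5915,0.0.0.0:*,LISTEN,99800/./serv
tcp,0,0,0.0.0.0:1501,0.0.0.0:*,LISTEN,-
"""

# DictReader uses the first row as keys, so each row is addressable by header.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

protos = [row["Proto"] for row in rows]
programs = [row["PID/Program name"] for row in rows]
print(protos)
print(programs)
```

With pandas, the equivalent access on the DataFrame above would be df["Proto"].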

CodePudding user response：

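Split on runs of two or more whitespace characters using a regex. This keeps a two-word header such as "Foreign Address" in one piece, but it only separates columns that have at least two spaces between them. In the sample, Proto, Recv-Q, Send-Q and Local Address are one space apart in the header line, and the Send-Q value and the local address are one space apart in the data lines, so those end up in the same field.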

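import re

# testset: an iterable of lines from the saved netstat output
for ln in testset:
    splitted = re.split(r'\s{2,}', ln.strip())  # strip() avoids an empty trailing field
    print(splitted)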
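Another option, since the output is laid out in fixed-width columns, is to take each column's start position from the header line and slice every data line at those positions. This is a sketch under assumptions, not a drop-in solution: the header names are listed by hand, parse_fixed_width is a hypothetical helper name, and it assumes no value is wider than its column (an overlong address would push the following columns to the right).

```python
HEADERS = ["Proto", "Recv-Q", "Send-Q", "Local Address",
           "Foreign Address", "State", "PID/Program name"]

# Sample lines copied from the question, with the original column spacing.
SAMPLE = [
    "Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name",
    "tcp        0      0 123.345.789:1234        0.0.0.0:*               LISTEN      23044/test",
    "tcp        0      0 0.0.0.0:5915            0.0.0.0:*               LISTEN      99800/./serv",
    "tcp        0      0 0.0.0.0:1501            0.0.0.0:*               LISTEN      -",
]


def parse_fixed_width(lines):
    """Slice each data line at the offsets where the header names begin."""
    header, *rows = lines
    # Each column starts where its name starts in the header line.
    starts = [header.index(name) for name in HEADERS]
    ends = starts[1:] + [None]  # the last column runs to the end of the line
    data = {name: [] for name in HEADERS}
    for ln in rows:
        for name, start, end in zip(HEADERS, starts, ends):
            data[name].append(ln[start:end].strip())
    return data


data = parse_fixed_width(SAMPLE)
print(data["Local Address"])
print(data["State"])
```

Unlike a whitespace split, this does not depend on how many spaces separate two columns, and an empty cell simply comes back as an empty string.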