Home > Software engineering >  Convert string to nested dictionary with nested list and custom key-value pairs
Convert string to nested dictionary with nested list and custom key-value pairs

Time:12-09

I'm pulling data from a network switch and it comes out as a string like this.

Gi1/0/1   COMPUTER1  Full   1000    Auto Down   off   A  (1),5-7777
Gi1/0/2   COMPUTER2  Full   1000    Auto Down   On    T  (1),5-7777
Gi1/0/3   COMPUTER3  Full   1000    Auto Up     Off   A  (1),5-7777
Gi1/0/4   COMPUTER4  Full   1000    Auto Down   Off   A  (1),5-7777
Gi1/0/5   COMPUTER5  Full   1000    Auto Down   Off   A  1
Gi1/0/6   COMPUTER6  Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/7   COMPUTER7  N/A    Unknown Auto Down   Off   A  1
Gi1/0/8   COMPUTER8  Full   1000    Auto Up     Off   A  1
Gi1/0/9   COMPUTER9  Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/10  COMPUTER10 Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/11  COMPUTER11 Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/12  COMPUTER12 Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/13  COMPUTER13 Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/14  COMPUTER14 Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/15  Server1    N/A    Unknown Auto Down   Off   A  55
Gi1/0/16  Server2    N/A    Unknown Auto Down   Off
Gi1/0/17  Server3    N/A    Unknown Auto Down   Off
Gi1/0/18  Server4    N/A    Unknown Auto Down   Off
Gi1/0/19  Server5    Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/20  Server6    Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/21  Server7    Full   1000    Auto Up     On    T  (1),5-7777
Gi1/0/22  Server8    Full   1000    Auto Up     On    A  3311
Gi1/0/23  COMPUTER15 Full   1000    Auto Up     Off   A  25
Gi1/0/24  COMPUTER16 Full   1000    Auto Up     On    A  99
Gi1/0/25  COMPUTER17 Full   1000    Auto Up     On    A  99
Gi1/0/26  Server9    Full   10      Auto Up     On    A  99
Gi1/0/27  COMPUTER18 Full   10      Auto Up     On    A  99
Gi1/0/28             N/A    Unknown Auto Down   Off   A  1
Gi1/0/29             N/A    Unknown Auto Down   Off   A  1
Gi1/0/30             N/A    Unknown Auto Down   Off   A  1
Gi1/0/31             N/A    Unknown Auto Down   Off   A  1
Gi1/0/32             N/A    Unknown Auto Down   Off   A  1
Gi1/0/33             N/A    Unknown Auto Down   Off   A  1
Gi1/0/34             N/A    Unknown Auto Down   Off   A  1
Gi1/0/35             N/A    Unknown Auto Down   Off   A  1
Gi1/0/36             N/A    Unknown Auto Down   Off   A  1
Gi1/0/37             N/A    Unknown Auto Down   Off   A  1
Gi1/0/38             N/A    Unknown Auto Down   Off   A  1
Gi1/0/39             N/A    Unknown Auto Down   Off   A  1
Gi1/0/40             N/A    Unknown Auto Down   Off   A  1
Gi1/0/41             N/A    Unknown Auto Down   Off   A  1
Gi1/0/42             N/A    Unknown Auto Down   Off   A  1
Gi1/0/43             N/A    Unknown Auto Down   Off   A  1
Gi1/0/44             N/A    Unknown Auto Down   Off   A  1
Gi1/0/45             N/A    Unknown Auto Down   Off   A  1
Gi1/0/46             N/A    Unknown Auto Down   Off   A  1
Gi1/0/47             N/A    Unknown Auto D-Down Off   A  1
Gi1/0/48             N/A    Unknown Auto Down   Off   A  1

I'm using TextFSM with Netmiko but I want to know how to format the data without using TextFSM.

I want to convert the data to where I can parse through it like this:

print(port[14]['Description'])

And I would get COMPUTER14

I'm thinking the structure needs to be like this:

{port: {14: {
               'Interface': 'Gi1/0/14',
               'Description': 'COMPUTER14',
               'Duplex': 'Full',
               'Speed': '1000',
               'Neg': 'Auto',
               'Linkstate': 'Up',
               'Flowctrl': 'On',
               'M': 'T'
               'VLAN': ['(1)', '5-7777']
               },
          15: {
               'Interface': 'Gi1/0/15',
               'Description': 'SERVER1',
               'Duplex': 'N/A',
               'Speed': 'Unknown',
               'Neg': 'Auto',
               'Linkstate': 'Down',
               'Flowctrl': 'off',
               'M': 'A',
               'VLAN': [55]
               }
         }
}
# VLAN would be a list and anything that doesn't have data would return 'None'

But not sure how to go about this using Python. The most I was able to do was convert to a list using splitlines().

Edit: Before I tried doing:

data_list = output.splitlines()

 for data in data_list:
      print(data.split(' '))

But the list from there came out like this:

['Gi1/0/1', '', '', 'COMPUTER1', 'Full', '', '', '1000', '', '', '', 'Auto', 'Down', '', '', '', '', 'off', '', '', '', 'A', '', '(1),5-7777']

From here I see I need to turn the list into a dictionary, but I don't know how to account for the spaces and also the spaces where there are no data, I would still want to show that there is none.

That data comes out like this:

['Gi1/0/28', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'N/A', '', '', '', 'Unknown', 'Auto', 'Down', '', '', 'Off', '', '', 'A', '', '1']

Going forward I tired:

keys = ["Interface", "Description", "Duplex", "Speed", "Neg", "Linkstate", "Flowctrl", "M", "VLAN"]

for data in data_list:
    z = zip(keys, data.split(' '))
    dictionary = dict(z)
    print(dictionary)

Although this makes a dictionary, the keys don't match the values correctly because of the spaces.

{'Interface': 'Gi1/0/1', 'Description': '', 'Duplex': '', 'Speed': 'COMPUTER1', 'Neg': 'Full', 'Linkstate': '', 'Flowctrl': '', 'M': '1000', 'VLAN': ''}
{'Interface': 'Gi1/0/28', 'Description': '', 'Duplex': '', 'Speed': '', 'Neg': '', 'Linkstate': '', 'Flowctrl': '', 'M': '', 'VLAN': ''}

How do I account for the spaces, or am I going the wrong direction here?

CodePudding user response:

Going by the conversation on the chat, text parsing seems to be the only way forward. I copied the entire text and saved it on to a file because I assume you have the output of the command stored in a file. For the sake of length of answer though, I tried with running only a few ports instead of all 48. Also Note that this works only if every column has atleast one row with data. This fails if there is a column for which no port has any data

Instead of using readline, I used read() so that I could split it at \n. This essentially removed the \n at the end of each entry in lines when using readlines()

with open('port_data.txt', 'r') as file:
    contents = file.read()
lines = contents.split('\n')
noOfLine = len(lines)

We also maintain a list of column descriptions which I believe will not change in the future for the command in question.

columns = ['Interfaces', 'Description', 'Duplex', 'Speed', 'Neg', 'LinkState', 'Flow Control', 'M', 'VLAN']
colSize = len(columns)

What I do is keep a list for each column content so that it is easy to iterate over to get all the data.

# This is a list of lists. 
# finalList[0] is a list of all ports
# finalList[1] is a list of all descriptions, etc.
# finalList will have 9 lists because we have 9 columns here.
finalList = [] 

# This is the output dictionary
output = {}

To construct the dictionary, we split each line in the lines[] at a ' '. This would ensure that the first index of the list generated by the split() has the data you need. Once we obtain that data, we need to find out where the next column starts.

We know that the next column starts at some spaces after the longest entry in that column. For example, if the name in row1 is apple and name in row2 is photosynthesis, then we know that there is more space after the first word in the row1 to accommodate for the long word in the same column in row2. To find this size, we track the maximum length for the column in a variable. And we do this for every column.

for i in range(len(columns)): # Perform the loop for every column
    resultList = [] # This is used to keep the list of every entry in the current column
    maxSize = 0 # This is the max length of an entry in the current column. We compute this at run time.
    for line in lines:
        line = line.rstrip() # Remove any trailing spaces in the right end.
        curString = line.split(' ')[0] # Split and get the first word
        maxSize = max(maxSize, len(curString)) # Update the maxSize
        resultList.append(curString) # Add this word to the current column list.
    finalList.append(resultList) # Add the current column to the list of columns we already created.

Now that we have the length of the biggest word in the current column, we can be sure that for every row, the next column appears at some point after maxSize to accommodate the spacing. For a line with a non-empty next column, the next word could start with in a few spaces. However, for a line with an empty entry for the next column, there would be more number of spaces.

To accommodate for this, we find the row with the least number of spaces after current maxSize and trim all the strings in the lines list to start from that location. Inside the same main for loop

newLineList = [] # This is going to be the list of lines after we have removed the first entry
minLength = 10000 # Arbitrarily large number
if i != colSize-1: # We don't have to do this for the last column because we won't be processing it anymore.
    for line in lines: # Each line
        minLength = min(minLength, len(line[maxSize:])-len(line[maxSize:].lstrip())) # len('    Apple') - len('Apple') = 4 meaning there are 4 spaces from the current position of line[maxSize]
    # Now that we have the minimum length, we can say for sure that the next column starts at maxSize minLength for every row.
    for line in lines:
        newLineList.append(line[maxSize minLength:]) # temporary holder for recomputed lines.
    lines = [] # Set it to empty
    for line in newLineList:
        lines.append(line) # Append all the new lines to the original lines set for next iteration.

Now we should have a list like this:

>>> for x in finalList:
>>> ... print(x)
['Gi1/0/1', 'Gi1/0/2', 'Gi1/0/3', 'Gi1/0/4', 'Gi1/0/47', 'Gi1/0/48']
['COMPUTER1', 'COMPUTER2', 'COMPUTER3', 'COMPUTER4', '', '']
['Full', 'Full', 'Full', 'Full', 'N/A', 'N/A']
['1000', '1000', '1000', '1000', 'Unknown', 'Unknown']
['Auto', 'Auto', 'Auto', 'Auto', 'Auto', 'Auto']
['Down', 'Down', 'Up', 'Down', 'D-Down', 'Down']
['off', 'On', 'Off', 'Off', 'Off', 'Off']
['A', 'T', 'A', 'A', 'A', 'A']
['(1),5-7777', '(1),5-7777', '(1),5-7777', '(1),5-7777', '1', '1']

Now that we have each column in a list by itself, constructing the dictionary isn't that difficult.

for i in range(noOfLine): # For each row
    result = {} # This is a new entry in the final JSON of ports.
    for j in range(colSize): # For each column
        if(columns[j] == 'VLAN'): # For VLAN, we need to perform a split to get list output
            result[columns[j]] = finalList[j][i].split(',') # If the VLAN is a single entry, we would just get that entry in a list.
        else:
            result[columns[j]] = finalList[j][i].strip() # Removes any excessive spaces.
    name = finalList[0][i] # Get the port name like Gi1/0/1 or Gi1/0/2

    output[name.split('/')[-1]] = result # Compute the actual port number. Split at '/' gives ['Gi1','0','1'] from which we take a last entry as the port number.

Full code put together:

import pprint

with open('Data_to_Csv.txt', 'r') as file:
    contents = file.read()

lines = contents.split('\n')
noOfLine = len(lines)
columns = ['Interfaces', 'Description', 'Duplex', 'Speed', 'Neg', 'LinkState', 'Flow Control', 'M', 'VLAN']
colSize = len(columns)
finalList = []
output = {}

for i in range(len(columns)):
    resultList = []
    newLineList = []
    maxSize = 0
    for line in lines:
        line = line.rstrip()
        curString = line.split(' ')[0]
        maxSize = max(maxSize, len(curString))
        resultList.append(curString)
    finalList.append(resultList)
    minLength = 10000
    if i != colSize-1:
        for line in lines:
            minLength = min(minLength, len(line[maxSize:])-len(line[maxSize:].lstrip()))
        for line in lines:
            newLineList.append(line[maxSize minLength:])
        lines = []
        for line in newLineList:
            lines.append(line)

for x in finalList:
    print(x)

for i in range(noOfLine):
    result = {}
    for j in range(colSize):
        # print("J = {}".format(j))
        if(columns[j] == 'VLAN'):
            result[columns[j]] = finalList[j][i].split(',')
        else:
            result[columns[j]] = finalList[j][i].strip()
        # print("Columns = {}".format(columns[j]))
        # print("result = {}".format(result[columns[j]]))
    name = finalList[0][i]

    output[name.split('/')[-1]] = result
pprint.pprint(output)

Output of the above code:

{'1': {'Description': 'COMPUTER1',
       'Duplex': 'Full',
       'Flow Control': 'off',
       'Interfaces': 'Gi1/0/1',
       'LinkState': 'Down',
       'M': 'A',
       'Neg': 'Auto',
       'Speed': '1000',
       'VLAN': ['(1)', '5-7777']},
 '2': {'Description': 'COMPUTER2',
       'Duplex': 'Full',
       'Flow Control': 'On',
       'Interfaces': 'Gi1/0/2',
       'LinkState': 'Down',
       'M': 'T',
       'Neg': 'Auto',
       'Speed': '1000',
       'VLAN': ['(1)', '5-7777']},
 '3': {'Description': 'COMPUTER3',
       'Duplex': 'Full',
       'Flow Control': 'Off',
       'Interfaces': 'Gi1/0/3',
       'LinkState': 'Up',
       'M': 'A',
       'Neg': 'Auto',
       'Speed': '1000',
       'VLAN': ['(1)', '5-7777']},
 '4': {'Description': 'COMPUTER4',
       'Duplex': 'Full',
       'Flow Control': 'Off',
       'Interfaces': 'Gi1/0/4',
       'LinkState': 'Down',
       'M': 'A',
       'Neg': 'Auto',
       'Speed': '1000',
       'VLAN': ['(1)', '5-7777']},
 '47': {'Description': '',
        'Duplex': 'N/A',
        'Flow Control': 'Off',
        'Interfaces': 'Gi1/0/47',
        'LinkState': 'D-Down',
        'M': 'A',
        'Neg': 'Auto',
        'Speed': 'Unknown',
        'VLAN': ['1']},
 '48': {'Description': '',
        'Duplex': 'N/A',
        'Flow Control': 'Off',
        'Interfaces': 'Gi1/0/48',
        'LinkState': 'Down',
        'M': 'A',
        'Neg': 'Auto',
        'Speed': 'Unknown',
        'VLAN': ['1']}}
  • Related