Read a file and match lines above or below from the matching pattern

Time:09-24

I'm reading an input JSON file and capturing the array values into a dictionary by matching each tar.gz entry and taking the line above it (essentially the YAML file) as its value.

{"Windows": [
        "/home/windows/work/input.yaml",
        "/home/windows/work/windows.tar.gz"
    ],
    "Mac": [
        "/home/macos/required/utilities/input.yaml",
        "/home/macos/required/utilities.tar.gz"
        
    ],
    "Unix": [
         "/home/unix/functional/plugins/input.yaml",
         "/home/unix/functional/plugins/Plugin.tar.gz"
     ]
      goes on..
}

Output of the dictionary:

{'/home/windows/work/windows.tar.gz': '/home/windows/work/input.yaml',
 
 '/home/macos/required/utilities/utilities.tar.gz' : '/home/macos/required/input.yaml' 
 ......

}

The problem is that the entries in the JSON can change: (A) the tar.gz entry can come first in the list of values, or (B) the order can be mixed from one key to the next. Regardless of the order of the entries, how can I get the output dictionary in the format shown above?

        { "Windows": [
            "/home/windows/work/windows.tar.gz",
            "/home/windows/work/input.yaml"
        ],
        "Mac": [
            "/home/macos/required/utilities/utilities.tar.gz",
            "/home/macos/required/input.yaml"
            
        ],
        "Unix": [
             "/home/unix/functional/plugins/Plugin.tar.gz",
             "/home/unix/functional/plugins/input.yaml"
         ]
          goes on.. }

Mix-and-match scenario:

{ "Windows": [
    "/home/windows/work/windows.tar.gz",
    "/home/windows/work/input.yaml"
],
"Mac": [
    "/home/macos/required/utilities/input.yaml",
    "/home/macos/required/utilities.tar.gz"
    
],
"Unix": [
     "/home/unix/functional/plugins/Plugin.tar.gz",
     "/home/unix/functional/plugins/input.yaml"
 ] }

My code snippet.

import re

def read_input():
    files_to_be_processed = {}
    with open('input.json', 'r') as f:
        lines = f.read().splitlines()
    # Strip quotes, spaces and commas so each line is a bare path
    lines = [line.replace('"', '').replace(' ', '').replace(',', '') for line in lines]
    for i, line in enumerate(lines):
        match = re.match(r".*\.tar\.gz$", line)
        if match:
            # Pair the archive with the line directly above it
            j = i - 1 if i > 1 else 0
            for k in range(j, i):
                files_to_be_processed[match.string] = lines[k]
    print(files_to_be_processed)

CodePudding user response:

One way is to transform each list in your input json_dict into a dict that has one key for "yaml" and one for "gz":

# Build a separate inner dict per key; dict.fromkeys(json_dict, dict())
# would make every key share the same dict object.
json_dict_1 = {key: {} for key in json_dict}
for key in json_dict:
    list_val = json_dict[key]
    for entry in list_val:
        entry_key = 'yaml' if entry.endswith('.yaml') else 'gz'
        json_dict_1[key][entry_key] = entry

print(json_dict_1)
# With the first input from the question:
#{'Windows': {'yaml': '/home/windows/work/input.yaml',
#  'gz': '/home/windows/work/windows.tar.gz'},
# 'Mac': {'yaml': '/home/macos/required/utilities/input.yaml',
#  'gz': '/home/macos/required/utilities.tar.gz'},
# 'Unix': {'yaml': '/home/unix/functional/plugins/input.yaml',
#  'gz': '/home/unix/functional/plugins/Plugin.tar.gz'}}
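
If the goal is still the flat mapping from the question (archive path as key, YAML path as value), the nested dictionary can be collapsed in one more step. This is only a minimal sketch, assuming each platform entry ends up with both a 'yaml' and a 'gz' key. The function name flatten_pairs and the sample data are illustrative and not part of the original answer.

```python
def flatten_pairs(nested):
    """Turn {platform: {'yaml': ..., 'gz': ...}} into {gz_path: yaml_path}."""
    return {
        pair['gz']: pair['yaml']
        for pair in nested.values()
        # Skip platforms where one of the two files is missing.
        if 'gz' in pair and 'yaml' in pair
    }


# Sample data shaped like the nested dictionary built above (illustrative).
nested = {
    'Windows': {'gz': '/home/windows/work/windows.tar.gz',
                'yaml': '/home/windows/work/input.yaml'},
    'Unix': {'yaml': '/home/unix/functional/plugins/input.yaml',
             'gz': '/home/unix/functional/plugins/Plugin.tar.gz'},
}

print(flatten_pairs(nested))
```

Because the inner dictionaries are looked up by key, the order in which the two paths appeared in the JSON no longer matters.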

CodePudding user response:

One approach is the following:

1- Use Python's json module to parse the file, which makes the whole process much easier.

2- After loading the data, check each object (i.e. Windows/Mac/Unix) for both the tar.gz and the yaml entry.

3- Assign the pair to a new dictionary.

Here is some quick code:

import json

def read_input():
    files_to_be_processed = {}
    with open('input.json','r') as f:
        jsonObject = json.load(f)
        for value in jsonObject.items():
            tarGz = ""
            Yaml = ""
            for line in value[1]: #value[0] contains the key (e.g. Windows)
                if line.endswith('.tar.gz'):
                    tarGz = line
                elif line.endswith('.yaml'):
                    Yaml = line
            files_to_be_processed[tarGz] = Yaml
        print(files_to_be_processed)

read_input()

This code can be shortened and optimised with list comprehensions and similar techniques, but it should be a good place to start.
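
As one possible shortened form, the pairing can be moved into a helper that takes the already-parsed JSON and picks each file by suffix with a generator expression and next(), so the order inside each list does not matter. This is a sketch, not a definitive version: it assumes each list holds at most one .tar.gz and one .yaml path, and the name pair_files and the inline sample are hypothetical.

```python
import json


def pair_files(data):
    """Map each .tar.gz path to the .yaml path listed under the same key."""
    result = {}
    for paths in data.values():
        # next() returns the first match, or None when no entry has the suffix.
        archive = next((p for p in paths if p.endswith('.tar.gz')), None)
        config = next((p for p in paths if p.endswith('.yaml')), None)
        if archive and config:
            result[archive] = config
    return result


# Mixed ordering, as in the question's "mix and match" case.
sample = json.loads("""
{
  "Windows": ["/home/windows/work/windows.tar.gz", "/home/windows/work/input.yaml"],
  "Mac": ["/home/macos/required/utilities/input.yaml", "/home/macos/required/utilities.tar.gz"]
}
""")

print(pair_files(sample))
```

To run it against the real file, pass the result of json.load(f) for input.json instead of the inline sample. Keys that lack either file are skipped rather than stored under an empty string.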
