Reading multiple JSON files and combining them into one file per date in Python


I receive JSON extracts throughout the day, generated for different dates.

As a preprocessing step, I would like to combine all JSONs with the same date and merge them into a common file named after that date.

Multiple JSON files:

tmp/emp1.json
tmp/emp2.json
tmp/emp3.json
tmp/emp4.json
tmp/emp5.json
tmp/emp6.json

The format of each JSON file is roughly as below, with a StartTime given for each row:

    {
      "SQLS": [
        {
          "ID": "0001",
          "SQLText": "INSERT INTO tech as Select * from employee where emp department = 'tech'",
          "Properties": {
            "Type": "Static",
            "Source": "GP",
            "db": "emp_db",
            "StartTime": "2022-11-16 20:24:45.979057",
            "QueryID": "q101"
          }
        },
        {
          "ID": "0002",
          "SQLText": "INSERT INTO sales as Select * from employee where emp department = 'sales'",
          "Properties": {
            "Type": "Static",
            "Source": "sybase",
            "db": "emp_db",
            "StartTime": "2022-11-17 20:24:45.979057",
            "QueryID": "q102"
          }
        },
        {
          "ID": "0003",
          "SQLText": "INSERT INTO tech as Select * from employee where emp department = 'tech'",
          "Properties": {
            "Type": "Static",
            "Source": "postgres",
            "db": "emp_db",
            "StartTime": "2022-11-16 20:24:45.979057",
            "QueryID": "q103"
          }
        },
        {
          "ID": "0004",
          "SQLText": "INSERT INTO tech as Select * from employee where emp department = 'tech'",
          "Properties": {
            "Type": "Static",
            "Source": "GP",
            "db": "emp_db",
            "StartTime": "2022-11-17 20:24:45.979057",
            "QueryID": "q104"
          }
        }
      ]
    }

  

I want to read each of these files and put the records into the respective directory for their date.

The destination directories would be like below:

 tmp/20221115/
 tmp/20221116/
 tmp/20221117/

Let me know if more clarity is required.

Appreciate your suggestions on this.

CodePudding user response:

Here is a solution. I have written one function; you just need to pass it the input directory path and the output directory path, and it handles the rest.

  1. First, I convert each JSON file into a pandas DataFrame.

  2. Then I group the rows by the date part of StartTime.

  3. Finally, I club the rows with the same date into one dictionary and save each group to the output path.

     import os
     import json
     import glob
     import pandas as pd

     input_dirpath = 'filepath'
     output_dirpath = 'Dir_path'

     def merge_jsons(input_dirpath, output_dirpath):
         output_dict = {}
         all_files = glob.glob(rf'{input_dirpath}/*.json')
         for filepath in all_files:
             with open(filepath) as f:
                 data = json.load(f)

             df = pd.DataFrame(data['SQLS'])
             df['StartTime'] = pd.to_datetime(
                 df['Properties'].apply(lambda x: x['StartTime'])
             ).dt.date.astype(str)
             grps = df.groupby('StartTime')

             for start_time, data_obj in grps:
                 date = start_time.replace('-', '')
                 records = data_obj.drop('StartTime', axis=1).to_dict('records')
                 if date not in output_dict:
                     output_dict[date] = records
                 else:
                     output_dict[date].extend(records)

         for filename, data in output_dict.items():
             out_path = os.path.join(output_dirpath, filename + '.json')
             with open(out_path, 'w') as writer:
                 writer.write(json.dumps({'SQLS': data}))
             print('filename:', out_path, 'saved..')
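The grouping step (step 2 above) can be sketched in isolation like this; the records below are made-up samples shaped like the question's SQLS entries:

```python
import pandas as pd

# Hypothetical sample records mimicking the "SQLS" entries from the question
records = [
    {"ID": "0001", "Properties": {"StartTime": "2022-11-16 20:24:45.979057"}},
    {"ID": "0002", "Properties": {"StartTime": "2022-11-17 20:24:45.979057"}},
    {"ID": "0003", "Properties": {"StartTime": "2022-11-16 20:24:45.979057"}},
]

df = pd.DataFrame(records)
# Reduce each full timestamp to a YYYYMMDD bucket key
df["date"] = (
    pd.to_datetime(df["Properties"].apply(lambda p: p["StartTime"]))
    .dt.date.astype(str)
    .str.replace("-", "", regex=False)
)
result = {date: group["ID"].tolist() for date, group in df.groupby("date")}
print(result)  # {'20221116': ['0001', '0003'], '20221117': ['0002']}
```

Each key of the resulting dictionary is ready to be used as an output filename such as tmp/20221116.json.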

CodePudding user response:

import json

filenames = ['emp1', 'emp2']
data_list = []
new_filenames = []

# Open files and concatenate objects
for filename in filenames:
    with open(f'./data/tmp/{filename}.json') as f:
        data = json.load(f)
    data_list.append(data['SQLS'])

# Find unique different dates contained in objects
# Process them into new filenames
for data in data_list:
    for obj in data:
        date = obj['Properties']['StartTime'].split(' ')[0]
        new_filename = date.replace('-', '')
        if new_filename not in new_filenames:
            new_filenames.append(new_filename)

# Find objects with the right date/filename
def find_objects(data_list, filename):
    new_data = { 'SQLS': [] }

    for data in data_list:
        for obj in data:
            date = obj['Properties']['StartTime'].split(' ')[0]
            new_filename = date.replace('-', '')
            if new_filename == filename:
                new_data['SQLS'].append(obj)
    return new_data

# Write new files
for filename in new_filenames:
    with open(f'./data/tmp/new_files/{filename}.json', 'w') as outfile:
        data = find_objects(data_list, filename)
        json_data = json.dumps(data, indent=4)
        outfile.write(json_data)
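If you prefer a single pass instead of re-scanning data_list for every date, the same bucketing can be done with only the standard library; the sample data below is illustrative:

```python
import json
from collections import defaultdict

# Hypothetical in-memory stand-in for one loaded emp*.json file
sample = {"SQLS": [
    {"ID": "0001", "Properties": {"StartTime": "2022-11-16 20:24:45.979057"}},
    {"ID": "0002", "Properties": {"StartTime": "2022-11-17 20:24:45.979057"}},
    {"ID": "0003", "Properties": {"StartTime": "2022-11-16 20:24:45.979057"}},
]}

buckets = defaultdict(list)
for obj in sample["SQLS"]:
    # "2022-11-16 20:24:45.979057" -> "20221116"
    key = obj["Properties"]["StartTime"].split(" ")[0].replace("-", "")
    buckets[key].append(obj)

# Each bucket serializes to one per-date payload, e.g. tmp/20221116.json
for key in sorted(buckets):
    print(key, json.dumps({"SQLS": buckets[key]}))
```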