Generate a dictionary with two-level keys from list

Time:10-13

I'm struggling to get the desired data structure. (Note: a pandas implementation is preferred.)

Currently I have the following lists of dictionaries:

list1 =[
{'ip': '11.22.33.44', 'timestamp': 1665480231699, 'message': '{"body": "Idle time larger than time period. retry:0"}', 'ingestionTime': 1665480263198},
{'ip': '11.22.33.42', 'timestamp': 1665480231698, 'message': '{"body": "Idle time larger than time period. retry:5"}', 'ingestionTime': 1665480263198}, 
{'ip': '11.22.33.44', 'timestamp': 1665480231698, 'message': '{"body": "Idle time larger than time period. retry:0"}', 'ingestionTime': 1665480263198}
]
whitelist_metadata = [
  {
    'LogLevel': 'WARNING',
    'SpecificVersion': 'None',
    'TimeInterval(Min)': 1,
    'MetricMsg': 'DDR: XXXX count got lost',
    'AllowedOccurrenceInTimeInterval': 0   # this message is always allowed
  },
  {
    'LogLevel': 'WARNING',
    'SpecificVersion': 'None',
    'TimeInterval(Min)': 1,
    'MetricMsg': 'Idle time larger than XXX time. retry: \\d ',
    'AllowedOccurrenceInTimeInterval': 5   # this message is allowed only if it occurs no more than 5 times within 1 minute
  }
]

And my desired output is

{
  '11.22.33.42': {
    1665480231698: ['{"body": "Idle time larger than time period. retry:5"}']
  },
  '11.22.33.44': {
    1665480231698: ['{"body": "Idle time larger than time period. retry:0"}'],
    1665480231699: ['{"body": "Idle time larger than time period. retry:0"}']
  }
}

How do I achieve that?


Attempts: I tried to use pandas pivot to convert the data structure, but failed. This is what I tried:

import pandas as pd

df = pd.DataFrame(list1)
# keyword arguments are required by recent pandas versions
s = df.pivot(index=['ip', 'timestamp'], columns='message')
ss = s.assign(r=s.to_dict('records'))['r'].unstack(0).to_dict()

Here I already have an issue with how the data looks (the message part: I need it to be the value under the timestamp, not another key that appears as a tuple).

>> print(s)
                            ingestionTime                                                                                                  
message                     {"body": "Idle time larger than time period. retry:0"} {"body": "Idle time larger than time period. retry:5"}
ip timestamp                                                                                                                    
11.22.33.42   1665480231698           NaN                                            1.665480e+12                                          
11.22.33.44   1665480231698  1.665480e+12                                                     NaN                                          
              1665480231699  1.665480e+12                                                     NaN                                          
>> print(ss)
{
  '11.22.33.42': {
    1665480231698: {
      (
      'ingestionTime',
      '{"body": "Idle time larger than time period. retry:0"}'
      ): nan,
      (
      'ingestionTime',
      '{"body": "Idle time larger than time period. retry:5"}'
      ): 1665480263198.0
    },
    1665480231699: nan
  },
  '11.22.33.44': {
    1665480231698: {
      (
      'ingestionTime',
      '{"body": "Idle time larger than time period. retry:0"}'
      ): 1665480263198.0,
      (
      'ingestionTime',
      '{"body": "Idle time larger than time period. retry:5"}'
      ): nan
    },
    1665480231699: {
      (
      'ingestionTime',
      '{"body": "Idle time larger than time period. retry:0"}'
      ): 1665480263198.0,
      (
      'ingestionTime',
      '{"body": "Idle time larger than time period. retry:5"}'
      ): nan
    }
  }
}

Answer:

As the desired output is

{
  '11.22.33.42': {
    1665480231698: ['{"body": "Idle time larger than time period. retry:5"}']
  },
  '11.22.33.44': {
    1665480231698: ['{"body": "Idle time larger than time period. retry:0"}'],
    1665480231699: ['{"body": "Idle time larger than time period. retry:0"}']
  }
}

Given the data shared in the question, the second list (whitelist_metadata) isn't actually needed to build this output; list1 alone is enough.

The following function does the job (the comments explain each step):

def todict(list1):
    dict1 = {}  # outer dictionary, keyed by ip

    for item in list1:  # iterate over the records
        ip = item['ip']
        timestamp = item['timestamp']

        if ip not in dict1:  # first time this ip is seen
            dict1[ip] = {}  # add the ip as a key mapping to an empty inner dictionary

        if timestamp not in dict1[ip]:  # first time this timestamp is seen for this ip
            dict1[ip][timestamp] = []  # add the timestamp as a key mapping to an empty list

        dict1[ip][timestamp].append(item['message'])  # append the message to the list

    return dict1
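
The same grouping can also be written more compactly with collections.defaultdict from the standard library. This is only a sketch of an equivalent alternative; the name todict_compact and the shortened sample records are made up here for illustration and are not part of the original answer.

```python
from collections import defaultdict


def todict_compact(records):
    # Outer level keyed by ip, inner level keyed by timestamp, values are lists of messages.
    grouped = defaultdict(lambda: defaultdict(list))
    for item in records:
        grouped[item["ip"]][item["timestamp"]].append(item["message"])
    # Convert the nested defaultdicts back to plain dicts before returning.
    return {ip: dict(by_timestamp) for ip, by_timestamp in grouped.items()}


# Shortened sample records (hypothetical stand-ins for list1).
sample = [
    {"ip": "11.22.33.44", "timestamp": 1665480231699, "message": "retry:0"},
    {"ip": "11.22.33.42", "timestamp": 1665480231698, "message": "retry:5"},
    {"ip": "11.22.33.44", "timestamp": 1665480231698, "message": "retry:0"},
]
compact_result = todict_compact(sample)
```

Missing keys are created on first access, so the two explicit "not in" checks are no longer needed.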

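Calling it on list1 then gives the following (the result is stored in a variable named result, to avoid shadowing the built-in dict):

result = todict(list1)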

[Out]:

{'11.22.33.42': {1665480231698: ['{"body": "Idle time larger than time period. '
                                 'retry:5"}']},
 '11.22.33.44': {1665480231698: ['{"body": "Idle time larger than time period. '
                                 'retry:0"}'],
                 1665480231699: ['{"body": "Idle time larger than time period. '
                                 'retry:0"}']}}
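
Since the question states a preference for pandas, here is a minimal sketch of one possible pandas route: group by ip and timestamp, collect the messages of each group into a list, then unfold the two-level index into nested dictionaries. The variable names grouped and nested are illustrative only, and this is an alternative to the plain-Python function, not a replacement for it.

```python
import pandas as pd

list1 = [
    {'ip': '11.22.33.44', 'timestamp': 1665480231699, 'message': '{"body": "Idle time larger than time period. retry:0"}', 'ingestionTime': 1665480263198},
    {'ip': '11.22.33.42', 'timestamp': 1665480231698, 'message': '{"body": "Idle time larger than time period. retry:5"}', 'ingestionTime': 1665480263198},
    {'ip': '11.22.33.44', 'timestamp': 1665480231698, 'message': '{"body": "Idle time larger than time period. retry:0"}', 'ingestionTime': 1665480263198},
]

df = pd.DataFrame(list1)

# Collect every message that shares the same (ip, timestamp) pair into a list.
grouped = df.groupby(['ip', 'timestamp'])['message'].agg(list)

# grouped is a Series indexed by (ip, timestamp); unfold it into nested dicts.
nested = {}
for (ip, timestamp), messages in grouped.items():
    # int() turns the timestamp into a plain Python int if pandas returns a NumPy integer.
    nested.setdefault(ip, {})[int(timestamp)] = messages
```

Unlike pivot, groupby does not turn the message values into column labels, which avoids the tuple keys and NaN entries seen in the question's attempt.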