I'm struggling to get the desired data structure. (Note: a pandas implementation is preferred.)
Currently I have the following lists of dictionaries:
list1 =[
{'ip': '11.22.33.44', 'timestamp': 1665480231699, 'message': '{"body": "Idle time larger than time period. retry:0"}', 'ingestionTime': 1665480263198},
{'ip': '11.22.33.42', 'timestamp': 1665480231698, 'message': '{"body": "Idle time larger than time period. retry:5"}', 'ingestionTime': 1665480263198},
{'ip': '11.22.33.44', 'timestamp': 1665480231698, 'message': '{"body": "Idle time larger than time period. retry:0"}', 'ingestionTime': 1665480263198}
]
whitelist_metadata = [
{
'LogLevel': 'WARNING',
'SpecificVersion': 'None',
'TimeInterval(Min)': 1,
'MetricMsg': 'DDR: XXXX count got lost',
'AllowedOccurrenceInTimeInterval': 0  # 0 means this message is always allowed
},
{
'LogLevel': 'WARNING',
'SpecificVersion': 'None',
'TimeInterval(Min)': 1,
'MetricMsg': 'Idle time larger than XXX time. retry: \\d ',
'AllowedOccurrenceInTimeInterval': 5  # this message is allowed only if it occurs no more than 5 times within 1 minute
}
]
And my desired output is
{
'11.22.33.42': {
1665480231698: ['{"body": "Idle time larger than time period. retry:5"}']
},
'11.22.33.44': {
1665480231698: ['{"body": "Idle time larger than time period. retry:0"}'],
1665480231699: ['{"body": "Idle time larger than time period. retry:0"}']
}
}
How do I achieve that?
Attempts: I tried to use pandas pivot to convert the data structure, but it failed. This is what I tried:
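import pandas as pd

df = pd.DataFrame(list1)
# index/columns passed as keywords; newer pandas versions no longer accept them positionally
s = df.pivot(index=['ip', 'timestamp'], columns='message')
ss = s.assign(r=s.to_dict('records'))['r'].unstack(0).to_dict()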
Here I already have an issue with how the data looks: the message should be the value stored under the timestamp, not another key that appears as part of a tuple.
>> print(s)
                          ingestionTime
message                   {"body": "Idle time larger than time period. retry:0"}  {"body": "Idle time larger than time period. retry:5"}
ip          timestamp
11.22.33.42 1665480231698                                                    NaN                                            1.665480e+12
11.22.33.44 1665480231698                                           1.665480e+12                                                     NaN
            1665480231699                                           1.665480e+12                                                     NaN
>> print(ss)
{
'11.22.33.42': {
1665480231698: {
(
'ingestionTime',
'{"body": "Idle time larger than time period. retry:0"}'
): nan,
(
'ingestionTime',
'{"body": "Idle time larger than time period. retry:5"}'
): 1665480263198.0
},
1665480231699: nan
},
'11.22.33.44': {
1665480231698: {
(
'ingestionTime',
'{"body": "Idle time larger than time period. retry:0"}'
): 1665480263198.0,
(
'ingestionTime',
'{"body": "Idle time larger than time period. retry:5"}'
): nan
},
1665480231699: {
(
'ingestionTime',
'{"body": "Idle time larger than time period. retry:0"}'
): 1665480263198.0,
(
'ingestionTime',
'{"body": "Idle time larger than time period. retry:5"}'
): nan
}
}
}
CodePudding user response:
As the desired output is
{
'11.22.33.42': {
1665480231698: ['{"body": "Idle time larger than time period. retry:5"}']
},
'11.22.33.44': {
1665480231698: ['{"body": "Idle time larger than time period. retry:0"}'],
1665480231699: ['{"body": "Idle time larger than time period. retry:0"}']
}
}
Given the data the OP shared in the question, the second list (whitelist_metadata) isn't actually needed to build this output; list1 alone is enough.
The following function does the job (the comments explain each step):
def todict(list1):
    dict1 = {}  # create an empty dictionary
    for item in list1:  # iterate over the list
        if item['ip'] not in dict1:  # if the ip is not in the dictionary yet
            dict1[item['ip']] = {}  # add the ip as a key with an empty dict as its value
        if item['timestamp'] not in dict1[item['ip']]:  # if the timestamp is not under this ip yet
            dict1[item['ip']][item['timestamp']] = []  # add the timestamp as a key with an empty list as its value
        dict1[item['ip']][item['timestamp']].append(item['message'])  # append the message to the list
    return dict1
Then one gets the following (the variable is named result rather than dict, to avoid shadowing the built-in):
result = todict(list1)
[Out]:
{'11.22.33.42': {1665480231698: ['{"body": "Idle time larger than time period. '
'retry:5"}']},
'11.22.33.44': {1665480231698: ['{"body": "Idle time larger than time period. '
'retry:0"}'],
1665480231699: ['{"body": "Idle time larger than time period. '
'retry:0"}']}}
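Since the question asks for a pandas-based approach, the same grouping can probably also be done with groupby instead of pivot. The following is only a minimal sketch, assuming list1 is exactly as given in the question; the names df, grouped and result are illustrative and not part of the original code.

```python
import pandas as pd

list1 = [
    {'ip': '11.22.33.44', 'timestamp': 1665480231699,
     'message': '{"body": "Idle time larger than time period. retry:0"}',
     'ingestionTime': 1665480263198},
    {'ip': '11.22.33.42', 'timestamp': 1665480231698,
     'message': '{"body": "Idle time larger than time period. retry:5"}',
     'ingestionTime': 1665480263198},
    {'ip': '11.22.33.44', 'timestamp': 1665480231698,
     'message': '{"body": "Idle time larger than time period. retry:0"}',
     'ingestionTime': 1665480263198},
]

df = pd.DataFrame(list1)

# Collect all messages that share the same (ip, timestamp) pair into one list.
# The result is a Series indexed by a two-level MultiIndex (ip, timestamp).
grouped = df.groupby(['ip', 'timestamp'])['message'].agg(list)

# Turn the MultiIndex Series into the nested dict {ip: {timestamp: [messages]}}.
result = {}
for (ip, ts), messages in grouped.items():
    # int(ts) keeps the keys as plain Python ints rather than NumPy integers.
    result.setdefault(ip, {})[int(ts)] = messages

print(result)
```

Compared with pivot, groupby keeps message as a value rather than turning it into a column label, which is what caused the tuple keys in the attempt above. For a list this small, the plain-Python function is likely simpler; the pandas version mainly pays off if the data is already in a DataFrame.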