I have the following list of dictionaries:
"entities": [
    {
        "length": 6,
        "offset": 0,
        "type": "bold"
    },
    {
        "length": 6,
        "offset": 0,
        "type": "italic"
    },
    {
        "length": 4,
        "offset": 7,
        "type": "italic"
    }
],
I would like to know how to use this input to derive the following list of dictionaries:
"entities": [
    {
        "length": 6,
        "offset": 0,
        "type": "bold_italic"
    },
    {
        "length": 4,
        "offset": 7,
        "type": "italic"
    }
],
CodePudding user response:
Group the entries by their length and offset in a dictionary, collecting the types seen for each pair in a list. Then build the result list from that dictionary, creating one new dictionary per unique length/offset pair and joining its types with underscores:
from collections import defaultdict

data = [
    {
        "length": 6,
        "offset": 0,
        "type": "bold"
    },
    {
        "length": 6,
        "offset": 0,
        "type": "italic"
    },
    {
        "length": 4,
        "offset": 7,
        "type": "italic"
    }
]

# Map each (length, offset) pair to the list of types seen for it.
entry_types = defaultdict(list)
for item in data:
    key = item['length'], item['offset']
    entry_types[key].append(item['type'])

# Rebuild one dictionary per pair, joining its types with underscores.
result = []
for (length, offset), types in entry_types.items():
    result.append(dict(length=length, offset=offset, type='_'.join(types)))

print(result)
This outputs:
[{'length': 6, 'offset': 0, 'type': 'bold_italic'}, {'length': 4, 'offset': 7, 'type': 'italic'}]
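If you need this in more than one place, the same grouping can be wrapped in a function. This is only a minimal sketch: the name merge_entities is hypothetical, and it assumes every entry has length, offset, and type keys.

```python
from collections import defaultdict


def merge_entities(entities):
    """Merge entries sharing (length, offset), joining their types with '_'."""
    types_by_span = defaultdict(list)
    for item in entities:
        # Entries with the same length and offset land in the same bucket.
        types_by_span[(item["length"], item["offset"])].append(item["type"])
    # Dictionaries keep insertion order, so the first-seen span comes first.
    return [
        {"length": length, "offset": offset, "type": "_".join(types)}
        for (length, offset), types in types_by_span.items()
    ]


sample = [
    {"length": 6, "offset": 0, "type": "bold"},
    {"length": 6, "offset": 0, "type": "italic"},
    {"length": 4, "offset": 7, "type": "italic"},
]
merged_sample = merge_entities(sample)
print(merged_sample)
```

The function returns a new list and leaves its input untouched, which may or may not be what you want if other code holds a reference to the original list.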
CodePudding user response:
I'm interpreting your question as, "how do I combine the types of entities with the same length and offset, separating unique types by underscores?" Given that, the following will do:
In your question, entities looks like a key in a parent dictionary, but for simplicity I'm treating it as a stand-alone variable.
In [1]: entities = [
   ...:     {
   ...:         "length": 6,
   ...:         "offset": 0,
   ...:         "type": "bold"
   ...:     },
   ...:     {
   ...:         "length": 6,
   ...:         "offset": 0,
   ...:         "type": "italic"
   ...:     },
   ...:     {
   ...:         "length": 4,
   ...:         "offset": 7,
   ...:         "type": "italic"
   ...:     }
   ...: ]
In [2]: from collections import defaultdict
In [3]: merged = defaultdict(str)
A defaultdict is like a dict, except that when a key doesn't exist a default value is used instead. In this case, I've specified that values are strings, so the default will be "".
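To make that default behaviour concrete, here is a small stand-alone sketch. The variable name demo and the key (6, 0) are only illustrative and are not part of the session above.

```python
from collections import defaultdict

demo = defaultdict(str)
key = (6, 0)

# Reading a missing key creates it with the default value, the empty string.
first_read = demo[key]
print(repr(first_read))

# Because the default is a string, += works without initialising the key first.
demo[key] += "bold"
demo[key] += "_" + "italic"
print(dict(demo))
```

One thing to keep in mind is that merely reading a missing key inserts it, so a lookup used only as a test still adds an entry to the dictionary.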
We combine entities by creating a tuple of (length, offset) and using it as a key into the temporary structure merged:
In [4]: for e in entities:
   ...:     key = (e["length"], e["offset"])
   ...:     if e["type"] not in merged[key]:
   ...:         if merged[key]:  # if the value at merged[key] is not the empty string
   ...:             merged[key] += "_"
   ...:         merged[key] += e["type"]
   ...:
In [5]: merged
Out[5]: defaultdict(str, {(6, 0): 'bold_italic', (4, 7): 'italic'})
Now, clear entities and reconstruct it from merged:
In [6]: entities.clear()
In [7]: for (length, offset), type in merged.items():
   ...:     entities.append({"length": length, "offset": offset, "type": type})
   ...:
In [8]: entities
Out[8]:
[{'length': 6, 'offset': 0, 'type': 'bold_italic'},
{'length': 4, 'offset': 7, 'type': 'italic'}]
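If entities really is a key in a parent dictionary, the same idea can be applied in place. The sketch below makes two assumptions: the parent dictionary is called message (a hypothetical name, since the question doesn't show it), and type names should be compared as whole words. For that second point it collects types in a list rather than a string, because a string membership test such as "ital" in "italic" is true and would silently skip a type whose name is contained in another.

```python
from collections import defaultdict

# Hypothetical parent dictionary; only the "entities" key comes from the question.
message = {
    "entities": [
        {"length": 6, "offset": 0, "type": "bold"},
        {"length": 6, "offset": 0, "type": "italic"},
        {"length": 4, "offset": 7, "type": "italic"},
    ]
}

types_by_span = defaultdict(list)
for e in message["entities"]:
    key = (e["length"], e["offset"])
    # List membership compares whole type names, not substrings.
    if e["type"] not in types_by_span[key]:
        types_by_span[key].append(e["type"])

# Replace the original list with the merged entries.
message["entities"] = [
    {"length": length, "offset": offset, "type": "_".join(types)}
    for (length, offset), types in types_by_span.items()
]
print(message["entities"])
```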