I have a JSON file containing a list of dictionaries, like this:
{"turns": [
  {
    "speaker": "A",
    "says": "Hello.",
    "other": "aaaa"
  },
  {
    "speaker": "B",
    "says": "Hi.",
    "other": "bbbb"
  },
  {
    "speaker": "B",
    "says": "I'm busy now.",
    "other": "ccccc"
  },
  {
    "speaker": "A",
    "says": "See you later?",
    "other": "dddd"
  },
  {
    "speaker": "B",
    "says": "Sure.",
    "other": "eeee"
  },
  {
    "speaker": "B",
    "says": "Bye",
    "other": "ffff"
  },
  {
    "speaker": "A",
    "says": "Bye bye.",
    "other": "gggg"
  }
]}
I want to combine the "says" values, along with any other keys I might have (such as "other"), whenever consecutive entries have the same "speaker", like so:
{"turns": [
  {
    "speaker": "A",
    "says": "Hello.",
    "other": "aaaa"
  },
  {
    "speaker": "B",
    "says": "Hi. I'm busy now.",
    "other": "bbbb ccccc"
  },
  {
    "speaker": "A",
    "says": "See you later?",
    "other": "dddd"
  },
  {
    "speaker": "B",
    "says": "Sure. Bye",
    "other": "eeee ffff"
  },
  {
    "speaker": "A",
    "says": "Bye bye.",
    "other": "gggg"
  }
]}
I am still new to Python and to working with JSON files, so I am honestly unsure where to even begin. I assume I could use .join() somehow, but I don't know how to check for the same key-value pairing appearing consecutively. Can anyone help?
CodePudding user response:
Assuming you have your JSON data loaded into data, you can use the itertools.groupby function to do this:
import json
from itertools import groupby

turns = data['turns']
grouped_turns = groupby(turns, key=lambda e: e['speaker'])  # groups consecutive items based on the 'speaker' value
joined_turns = []
for k, g in grouped_turns:
    turn_group = list(g)  # get all the values in the group
    joined_says = ' '.join(t['says'] for t in turn_group)  # join the 'says' values with a space
    joined_other = ' '.join(t['other'] for t in turn_group)  # same for the 'other' values
    joined_turns.append({  # add the joined item
        'speaker': k,
        'says': joined_says,
        'other': joined_other
    })
print(json.dumps(joined_turns, indent=2))
Result:
[
  {
    "speaker": "A",
    "says": "Hello.",
    "other": "aaaa"
  },
  {
    "speaker": "B",
    "says": "Hi. I'm busy now.",
    "other": "bbbb ccccc"
  },
  {
    "speaker": "A",
    "says": "See you later?",
    "other": "dddd"
  },
  {
    "speaker": "B",
    "says": "Sure. Bye",
    "other": "eeee ffff"
  },
  {
    "speaker": "A",
    "says": "Bye bye.",
    "other": "gggg"
  }
]
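If the data still lives in a file, the same idea can be wrapped in a small helper that reads the file, merges the turns, and writes the result back out. This is only a sketch: merge_turns, merge_file, and the file names in the example call are made-up names for illustration, and it assumes every turn has exactly the "speaker", "says", and "other" keys with string values.

```python
import json
from itertools import groupby
from pathlib import Path


def merge_turns(turns):
    """Merge consecutive turns that share the same 'speaker' (hypothetical helper)."""
    merged = []
    for speaker, group in groupby(turns, key=lambda t: t["speaker"]):
        group = list(group)  # materialize the group so it can be iterated twice
        merged.append({
            "speaker": speaker,
            "says": " ".join(t["says"] for t in group),
            "other": " ".join(t["other"] for t in group),
        })
    return merged


def merge_file(src, dst):
    """Read src, merge its 'turns' list, write the result to dst, and return it."""
    data = json.loads(Path(src).read_text(encoding="utf-8"))
    data["turns"] = merge_turns(data["turns"])
    Path(dst).write_text(json.dumps(data, indent=2), encoding="utf-8")
    return data


# Example call (hypothetical file names):
# merge_file("conversation.json", "conversation_merged.json")
```

Writing to a separate output file keeps the original data untouched, which is handy while you are still checking that the merge does what you expect.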
CodePudding user response:
Try itertools.groupby:
dct = {
    "turns": [
        {"speaker": "A", "says": "Hello.", "other": "aaaa"},
        {"speaker": "B", "says": "Hi.", "other": "bbbb"},
        {"speaker": "B", "says": "I'm busy now.", "other": "ccccc"},
        {"speaker": "A", "says": "See you later?", "other": "dddd"},
        {"speaker": "B", "says": "Sure.", "other": "eeee"},
        {"speaker": "B", "says": "Bye", "other": "ffff"},
        {"speaker": "A", "says": "Bye bye.", "other": "gggg"},
    ]
}

from itertools import groupby

out = []
# groupby only groups *consecutive* items with the same key
for s, g in groupby(dct["turns"], lambda d: d["speaker"]):
    g = list(g)
    out.append(
        {
            "speaker": s,
            "says": " ".join(d["says"] for d in g),
            "other": " ".join(d["other"] for d in g),
        }
    )

dct["turns"] = out
print(dct)
Prints (reformatted here for readability):
{
    "turns": [
        {"speaker": "A", "says": "Hello.", "other": "aaaa"},
        {"speaker": "B", "says": "Hi. I'm busy now.", "other": "bbbb ccccc"},
        {"speaker": "A", "says": "See you later?", "other": "dddd"},
        {"speaker": "B", "says": "Sure. Bye", "other": "eeee ffff"},
        {"speaker": "A", "says": "Bye bye.", "other": "gggg"},
    ]
}
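Since the question mentions other keys you "might have" and asks how to check for the same speaker appearing consecutively, here is a rough alternative without itertools. It compares each turn with the last merged one and joins every key other than the speaker. The function name merge_consecutive and its key and sep parameters are invented for this sketch, and it assumes all values being joined are strings.

```python
def merge_consecutive(turns, key="speaker", sep=" "):
    """Merge consecutive dicts with the same value for `key` (hypothetical helper).

    Every other field is joined with `sep`; all values are assumed to be strings.
    """
    merged = []
    for turn in turns:
        if merged and merged[-1][key] == turn[key]:
            # Same speaker as the previous merged turn: extend its fields.
            last = merged[-1]
            for field, value in turn.items():
                if field == key:
                    continue
                if field in last:
                    last[field] = last[field] + sep + value
                else:
                    last[field] = value  # field first seen mid-run
        else:
            # New speaker: start a new entry, copying so the input is not mutated.
            merged.append(dict(turn))
    return merged


turns = [
    {"speaker": "A", "says": "Hello.", "other": "aaaa"},
    {"speaker": "B", "says": "Hi.", "other": "bbbb"},
    {"speaker": "B", "says": "I'm busy now.", "other": "ccccc"},
    {"speaker": "A", "says": "See you later?", "other": "dddd"},
]
result = merge_consecutive(turns)
print(result)
```

The check merged[-1][key] == turn[key] is the "same consecutive speaker" test: it only looks at the most recent merged entry, so a speaker who comes back later in the conversation starts a new entry rather than being merged into an earlier one.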