Combine dictionary string values with same key

I have a json file with a series of dictionaries within a list, like this:

  {"turns": [{
        "speaker": "A",
        "says": "Hello.",
        "other": "aaaa"},
      {
        "speaker": "B",
        "says": "Hi.",
        "other": "bbbb"},
      {
        "speaker": "B",
        "says": "I'm busy now.",
        "other": "ccccc"},
      {
        "speaker": "A",
        "says": "See you later?",
        "other": "dddd"},
      {
        "speaker": "B",
        "says": "Sure.",
        "other": "eeee"},
      {
        "speaker": "B",
        "says": "Bye",
        "other": "ffff"},
      {
        "speaker": "A",
        "says": "Bye bye.",
        "other": "gggg"}]}

I want to combine the values of "says" and "other" (and any other keys I might have) whenever consecutive entries have the same "speaker", like so:

  {"turns": [{
        "speaker": "A",
        "says": "Hello.",
        "other": "aaaa"},
      {
        "speaker": "B",
        "says": "Hi. I'm busy now.",
        "other": "bbbb ccccc"},
      {
        "speaker": "A",
        "says": "See you later?",
        "other": "dddd"},
      {
        "speaker": "B",
        "says": "Sure. Bye",
        "other": "eeee ffff"},
      {
        "speaker": "A",
        "says": "Bye bye.",
        "other": "gggg"}]}

I am still new to Python and to dealing with JSON files, so I honestly am unsure where to even begin. I assume I could use .join() somehow, but I don't know how to check for the same key-value pairing appearing consecutively. Can anyone help?

CodePudding user response:

Assuming you have your JSON data loaded into data, you can use the itertools.groupby function to do this:

import json
from itertools import groupby

# data is assumed to hold the parsed JSON, e.g. from json.load() on your file
turns = data['turns']

# groupby groups consecutive items that share the same 'speaker' value
grouped_turns = groupby(turns, key=lambda e: e['speaker'])

joined_turns = []
for k, g in grouped_turns:
    turn_group = list(g)  # get all the values in the group
    joined_says = ' '.join(t['says'] for t in turn_group)  # join
    joined_other = ' '.join(t['other'] for t in turn_group)
    joined_turns.append({  # add the joined item
        'speaker': k,
        'says': joined_says,
        'other': joined_other
    })

print(json.dumps(joined_turns, indent=2))

Result:

[
  {
    "speaker": "A",
    "says": "Hello.",
    "other": "aaaa"
  },
  {
    "speaker": "B",
    "says": "Hi. I'm busy now.",
    "other": "bbbb ccccc"
  },
  {
    "speaker": "A",
    "says": "See you later?",
    "other": "dddd"
  },
  {
    "speaker": "B",
    "says": "Sure. Bye",
    "other": "eeee ffff"
  },
  {
    "speaker": "A",
    "says": "Bye bye.",
    "other": "gggg"
  }
]
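
If groupby feels opaque, the same merge can probably be sketched as a plain loop that compares each turn with the last one kept. This is only an illustration: the name merge_turns is made up here, and it assumes every turn has "speaker", "says" and "other" keys with string values.

```python
def merge_turns(turns):
    """Merge consecutive turns by the same speaker (hypothetical helper)."""
    merged = []
    for turn in turns:
        if merged and merged[-1]["speaker"] == turn["speaker"]:
            # Same speaker as the previously kept turn: extend its strings
            merged[-1]["says"] += " " + turn["says"]
            merged[-1]["other"] += " " + turn["other"]
        else:
            # Different speaker: start a new entry (copy, so the input is untouched)
            merged.append(dict(turn))
    return merged


turns = [
    {"speaker": "A", "says": "Hello.", "other": "aaaa"},
    {"speaker": "B", "says": "Hi.", "other": "bbbb"},
    {"speaker": "B", "says": "I'm busy now.", "other": "ccccc"},
]
result = merge_turns(turns)
print(result)
```

The loop only ever looks at the last merged entry, which is what makes it "consecutive" merging: a later turn by speaker A is not folded into an earlier A entry once B has spoken in between.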

CodePudding user response:

Try itertools.groupby:

dct = {
    "turns": [
        {"speaker": "A", "says": "Hello.", "other": "aaaa"},
        {"speaker": "B", "says": "Hi.", "other": "bbbb"},
        {"speaker": "B", "says": "I'm busy now.", "other": "ccccc"},
        {"speaker": "A", "says": "See you later?", "other": "dddd"},
        {"speaker": "B", "says": "Sure.", "other": "eeee"},
        {"speaker": "B", "says": "Bye", "other": "ffff"},
        {"speaker": "A", "says": "Bye bye.", "other": "gggg"},
    ]
}

from itertools import groupby

out = []
for s, g in groupby(dct["turns"], lambda d: d["speaker"]):
    g = list(g)
    out.append(
        {
            "speaker": s,
            "says": " ".join(d["says"] for d in g),
            "other": " ".join(d["other"] for d in g),
        }
    )

dct["turns"] = out

print(dct)

Prints:

{
    "turns": [
        {"speaker": "A", "says": "Hello.", "other": "aaaa"},
        {"speaker": "B", "says": "Hi. I'm busy now.", "other": "bbbb ccccc"},
        {"speaker": "A", "says": "See you later?", "other": "dddd"},
        {"speaker": "B", "says": "Sure. Bye", "other": "eeee ffff"},
        {"speaker": "A", "says": "Bye bye.", "other": "gggg"},
    ]
}
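
Since the question mentions other keys that might be present, here is a rough sketch of the same groupby approach without hard-coding "says" and "other". The function name merge_consecutive and its key and sep parameters are assumptions made for this example, and it assumes all non-speaker values are strings.

```python
from itertools import groupby


def merge_consecutive(turns, key="speaker", sep=" "):
    """Join all non-key string values of consecutive items sharing `key`."""
    out = []
    for value, group in groupby(turns, key=lambda d: d[key]):
        group = list(group)
        merged = {key: value}
        # Collect field names in first-seen order across the group
        fields = []
        for d in group:
            for name in d:
                if name != key and name not in fields:
                    fields.append(name)
        for name in fields:
            # Skip items that lack this field instead of raising KeyError
            merged[name] = sep.join(d[name] for d in group if name in d)
        out.append(merged)
    return out


sample = [
    {"speaker": "A", "says": "Hello.", "other": "aaaa"},
    {"speaker": "B", "says": "Hi.", "other": "bbbb", "mood": "tired"},
    {"speaker": "B", "says": "I'm busy now.", "other": "ccccc"},
]
merged_sample = merge_consecutive(sample)
print(merged_sample)
```

With the data from the question this should give the same result as above; the only difference is that extra keys such as the made-up "mood" are carried along rather than dropped.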