Python Dictionary comprehension with condition

Time:12-12

Suppose that I have a dict named data like below:

data = {
  '001': {
    'data': {
      'fruit': 'apple',
      'vegetable': 'spinach'
    },
    'text': 'lorem ipsum',
    'status': 10
  },
  '002': {
    # ...
  }
}

I want to flatten (?) the data key, converting the dict to this:

{
  '001': {
    'fruit': 'apple',
    'vegetable': 'spinach',
    'text': 'lorem ipsum',
    'status': 10
  },
  '002': {
    # ...
  }
}

I am trying to achieve this using dict comprehensions. The implementation below uses for loops:

mydict = {}
for id, values in data.items():
    mydict[id] = {}
    for label, value in values.items():
        if label == 'data':
            for x, y in value.items():
                mydict[id][x] = y
        else:
            mydict[id][label] = value
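For reference, here is the same loop made runnable on a one-entry sample. The sample data and the string key '001' are assumptions (a bare 001 literal is a syntax error in Python 3), and id is renamed to id_ only to avoid shadowing the built-in.

```python
# Hypothetical one-entry sample in the shape described above.
data = {
    '001': {
        'data': {'fruit': 'apple', 'vegetable': 'spinach'},
        'text': 'lorem ipsum',
        'status': 10,
    },
}

mydict = {}
for id_, values in data.items():
    mydict[id_] = {}
    for label, value in values.items():
        if label == 'data':
            # Copy every pair of the nested 'data' dict up one level.
            for x, y in value.items():
                mydict[id_][x] = y
        else:
            # Copy the other top-level entries unchanged.
            mydict[id_][label] = value

print(mydict)
# {'001': {'fruit': 'apple', 'vegetable': 'spinach', 'text': 'lorem ipsum', 'status': 10}}
```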

I tried the comprehension below, but it gives a syntax error:

mydict = {
    id: {x: y} for x, y in value.items() if label == 'data' else {label: value}
    for id, values in data.items() for label, value in values.items()}

Is there a way to achieve this using comprehensions only?

CodePudding user response:

With dict unpacking (**):

mydict = {i:{**v['data'], **{k:u for k, u in v.items() if k != "data"}} for i, v in data.items()}
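A quick check of this one-liner on a hypothetical two-entry sample. One change is an assumption added here, not part of the answer: v.get('data', {}) replaces v['data'], so an entry without a 'data' key passes through unchanged instead of raising KeyError. Keep v['data'] if every entry is guaranteed to have one.

```python
data = {
    '001': {'data': {'fruit': 'apple', 'vegetable': 'spinach'},
            'text': 'lorem ipsum', 'status': 10},
    '002': {'text': 'no nested data here', 'status': 20},  # hypothetical: no 'data' key
}

# Unpack the nested dict first, then the remaining top-level pairs.
mydict = {
    i: {**v.get('data', {}), **{k: u for k, u in v.items() if k != 'data'}}
    for i, v in data.items()
}
print(mydict)
```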

CodePudding user response:

The if clause in a comprehension (dict, list, set, generator) applies to the iteration itself; it cannot be used for the production. For that, you need a conditional expression (X if cond else Y) in the production.
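A small sketch of the difference, using a hypothetical inner dict: a trailing if filters which items are produced, while X if cond else Y in the production chooses what each item becomes (and only the latter may carry an else).

```python
values = {'data': {'fruit': 'apple'}, 'text': 'lorem ipsum', 'status': 10}

# Filter clause: decides WHETHER an item is produced at all.
no_data = {k: v for k, v in values.items() if k != 'data'}

# Conditional expression in the production: every item is produced,
# the condition only decides WHAT value it gets.
tagged = {k: ('nested' if isinstance(v, dict) else 'scalar') for k, v in values.items()}

print(no_data)  # {'text': 'lorem ipsum', 'status': 10}
print(tagged)   # {'data': 'nested', 'text': 'scalar', 'status': 'scalar'}
```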

Generally speaking, comprehensions are really a reorganisation of a specific kind of (possibly nested) iteration, made of:

  • a bunch of iterations and conditions, possibly nested
  • a single append/set

So

for a in b:
    if c:
        for d in e:
            for f in g:
                if h:
                    thing.append(i)

can be comprehension-ified: just move the production (i) to the head and put the other clauses, in the same order, in a flat sequence:

thing = [
    i
    for a in b
    if c
    for d in e
    for f in g
    if h
]
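To make the pattern concrete, here it is with placeholder values filled in. The lists and the two conditions (odd a for c, f > 10 for h) are hypothetical; the point is only that both forms yield the same items in the same order.

```python
b = [1, 2, 3]
e = ['x', 'y']
g = [10, 20]

# Nested loops with a single append at the innermost level.
thing_loop = []
for a in b:
    if a % 2 == 1:              # plays the role of c
        for d in e:
            for f in g:
                if f > 10:      # plays the role of h
                    thing_loop.append((a, d, f))

# Same clauses, same order, production moved to the front.
thing_comp = [
    (a, d, f)
    for a in b
    if a % 2 == 1
    for d in e
    for f in g
    if f > 10
]

assert thing_loop == thing_comp
print(thing_comp)  # [(1, 'x', 20), (1, 'y', 20), (3, 'x', 20), (3, 'y', 20)]
```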

Now, your comprehension makes no sense: it starts by iterating over value before value has been defined, there is no else in a comprehension filter, and even if we add parentheses, {x: y} for x, y in value.items() becomes a generator of one-item dicts, not the merged dict you want. Comprehensions also do not "merge" items, so with:

mydict = {
    id: {label: value}
    for id, values in data.items() for label, value in values.items()
}

you'll get only the last {label: value} for each id, because each new entry for the same id key overwrites the previous one.
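A minimal demonstration of that overwrite, on a hypothetical one-entry sample: the inner loop assigns mydict['001'] three times, and only the last assignment survives.

```python
data = {
    '001': {'data': {'fruit': 'apple'}, 'text': 'lorem ipsum', 'status': 10},
}

mydict = {
    id_: {label: value}
    for id_, values in data.items() for label, value in values.items()
}
# 'data', then 'text', then 'status' are each stored under the same key '001'.
print(mydict)  # {'001': {'status': 10}}
```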

Here, if you consider only the production loop, it's this:

for id, values in data.items():
    mydict[id] = {}

This means that this is the skeleton of your dict comprehension:

mydict = {
    id: {}
    for id, values in data.items()
}

The rest of the iteration fills in the value, so it needs to be a separate comprehension inside the production:

mydict = {
    id: {
        label: value ???
        for label, value in values.items()
    }
    for id, values in data.items()
}

At which point you hit the problem that this doesn't quite work: you can't "conditionally iterate" in a comprehension; it's all or nothing.

Except you can: the right-hand side of in is an ordinary expression, so you can compute any iterable you like there. That means you can unfold or refold:

mydict = {
    id: {
        x: y
        for label, value in values.items()
        for x, y in (value.items() if label == 'data' else [(label, value)])
    }
    for id, values in data.items()
}

This is a touch more expensive in the non-data case as you need to re-wrap the key and value in a tuple and list, but that's unlikely to be a huge deal.
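If you want to measure that cost yourself, here is a rough timing sketch. The 100-entry sample is hypothetical, and no particular numbers are claimed since they vary by machine and Python version; the assert only confirms that the unfold-or-refold comprehension and the question's original loop build the same result.

```python
import timeit

# Hypothetical sample: 100 ids shaped like the question's data.
data = {
    str(n).zfill(3): {'data': {'fruit': 'apple', 'vegetable': 'spinach'},
                      'text': 'lorem ipsum', 'status': n}
    for n in range(1, 101)
}

def unfold_or_refold(data):
    # Non-'data' pairs are wrapped in a one-element list so that both
    # branches can be iterated by the same inner for clause.
    return {
        id_: {
            x: y
            for label, value in values.items()
            for x, y in (value.items() if label == 'data' else [(label, value)])
        }
        for id_, values in data.items()
    }

def nested_loops(data):
    # The question's original for-loop version, for comparison.
    mydict = {}
    for id_, values in data.items():
        mydict[id_] = {}
        for label, value in values.items():
            if label == 'data':
                for x, y in value.items():
                    mydict[id_][x] = y
            else:
                mydict[id_][label] = value
    return mydict

assert unfold_or_refold(data) == nested_loops(data)

print('comprehension:', timeit.timeit(lambda: unfold_or_refold(data), number=1_000))
print('loops:        ', timeit.timeit(lambda: nested_loops(data), number=1_000))
```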

Another alternative, instead of a conditional comprehension, is to use splatting (** unpacking) to merge two dicts, one of which you create via a comprehension:

mydict = {
    id: {
        **values['data'],
        **{label: value for label, value in values.items() if label != 'data'}
    }
    for id, values in data.items()
}

The same merging idea, via dict.update, can also simplify the original loop:

mydict = {}
for id, values in data.items():
    mydict[id] = {}
    for label, value in values.items():
        if label == 'data':
            mydict[id].update(value)
        else:
            mydict[id][label] = value
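One behavioural difference between the splatting comprehension and the update loop is worth knowing if a key inside 'data' can also appear at the top level. The colliding 'status' key below is a hypothetical input: with splatting, the later ** unpacking wins, so top-level keys take precedence; with the loop, whichever entry is visited last wins, so the result depends on key order.

```python
# Hypothetical input: 'status' exists both at the top level and inside 'data',
# and the top-level 'status' comes BEFORE 'data'.
data = {
    '001': {'status': 10,
            'data': {'fruit': 'apple', 'status': 'from data'},
            'text': 'lorem ipsum'},
}

# Splatting: the top-level pairs are unpacked second, so they override 'data'.
splatted = {
    id_: {**values['data'],
          **{label: value for label, value in values.items() if label != 'data'}}
    for id_, values in data.items()
}

# update() loop: 'status' is set to 10, then overwritten by update(value).
looped = {}
for id_, values in data.items():
    looped[id_] = {}
    for label, value in values.items():
        if label == 'data':
            looped[id_].update(value)
        else:
            looped[id_][label] = value

print(splatted['001']['status'])  # 10
print(looped['001']['status'])    # from data
```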

CodePudding user response:

Let me simplify:

sample_data = {
    "001": {
        "data": {
            "fruit": 'apple',
            "vegetable": 'spinach'
        },
        "text": 'lorem ipsum',
        "status": 10
    },
    "002": {
        "data": {
            "fruit": 'apple',
            "vegetable": 'spinach'
        },
        "text": 'lorem ipsum',
        "status": 10
    }
}
for key, row in sample_data.items():
    if 'data' in row:
        info = row.pop('data')
        sample_data[key] = {**row, **info}

print(sample_data)
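Note that this approach modifies sample_data in place, including the inner dicts (pop removes 'data' from them). A sketch of how to keep the input intact, assuming that matters to you: run the same steps on a copy.deepcopy of the data. The function name flatten_data_in_place is hypothetical; a shallow dict(...) copy is not enough, because the inner dicts are still shared.

```python
import copy

def flatten_data_in_place(sample_data):
    # Same steps as the answer above: pop 'data' and merge it into the row.
    for key, row in sample_data.items():
        if 'data' in row:
            info = row.pop('data')
            sample_data[key] = {**row, **info}
    return sample_data

original = {
    '001': {'data': {'fruit': 'apple', 'vegetable': 'spinach'},
            'text': 'lorem ipsum', 'status': 10},
}

# Deep copy: the nested dicts are copied too, so original is untouched.
flattened = flatten_data_in_place(copy.deepcopy(original))
print('data' in original['001'])  # True

# Shallow copy: only the outer dict is copied; pop() hits the shared inner dict.
shallow_source = copy.deepcopy(original)
flatten_data_in_place(dict(shallow_source))
print('data' in shallow_source['001'])  # False
```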