Check for list of list and list with dict

Time:05-09

I have converted some XML files to JSON files with xmltodict. The JSON files contain a lot of "P" values. A "P" value can be:

  • a list of strings
  • a list containing dicts
  • a list containing a list of dicts
  • or a list containing null / None and a string.

If the list contains only strings, or only strings and null / None, I want to convert it to a single string with join. If the list contains a dict or list(s), it should be skipped without processing.

How can I do that?

def recursive_iter(obj):
    if isinstance(obj, dict):
        for item in obj.values():
            if "P" in obj and isinstance(obj["P"], list) and not isinstance(obj["P"], dict):
                #need to add a check for not dict and list in list
                obj["P"] = " ".join([str(e) for e in obj["P"]])
            else:
                yield from recursive_iter(item)
    elif any(isinstance(obj, t) for t in (list, tuple)):
        for item in obj:
            yield from recursive_iter(item)
    else:
        yield obj

These I want converted to a string:

{"SHORT_DESCR": {"P": ["Bla Bla"]}}
{"SHORT_DESCR": {"P": [null,"Bla bla"]}}

These should be skipped without processing:

{"CPV_CODE":{"CODE":79540000}}
[{"CPV_CODE":{"CODE":79530000}},{"CPV_CODE":{"CODE":79540000}}]

CodePudding user response:

Since you want to find the lists that contain only strings:

if ("P" in obj) and isinstance(obj["P"], list):
    if all([isinstance(z, str) for z in obj["P"]]):
        ...  # keep list with strings

Is this what you want?

CodePudding user response:

Let's start with the two conditions:

  1. the list contains only strings or the list contains only strings and null / none
  2. the list contains dict or list(s)

The first subcondition of the first condition is covered by the second subcondition, so #1 can be simplified to:

  1. the list contains only strings or None

Now let's rephrase them in something resembling a first order logic:

  1. All the list items are None or strings.
  2. Some list item is a dict or a list.

The way that condition #2 is written, it could use an "All" quantifier, but in the context of the operation (whether or not to join the list items), a "some" is appropriate, and more closely aligns with the negation of condition 1 ("Some list item is not None or a string"). Also, it allows for an illustration of another implementation (shown below).

These two conditions are mutually exclusive, though not necessarily exhaustive. To simplify matters, let's assume that, in practice, these are the only two possibilities. Leaving aside the quantifiers ("All", "Some"), these are easily translatable into generator expressions:

  1. (item is None or isinstance(item, str) for item in items)
  2. (isinstance(item, (dict, list)) for item in items)

Note that isinstance accepts a tuple of types (which basically functions as a union type) for the second argument, allowing multiple types to be checked in one call.
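As a quick illustration of the tuple form, here is a small sketch. The sample values are made up for this example and are not taken from the question's data.

```python
# Hypothetical sample values, one for each kind of item a "P" list might hold.
samples = ["text", None, {"k": "v"}, ["nested"], 42]

# A tuple as the second argument checks against several types in one call.
flags = [isinstance(item, (dict, list)) for item in samples]
print(flags)  # [False, False, True, True, False]

# The tuple form gives the same result as chaining single-type checks with `or`.
chained = [isinstance(item, dict) or isinstance(item, list) for item in samples]
assert flags == chained
```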

The "All" and "Some" quantifiers are expressed as the all and any functions, which take iterables (such as what is produced by generator expressions):

  1. all(item is None or isinstance(item, str) for item in items)
  2. any(isinstance(item, (dict, list)) for item in items)
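To see where the two conditions agree and where they don't, the sketch below evaluates both on a few inputs. The helper names are hypothetical, and the `[1, "a"]` and `[]` inputs are added here purely for illustration.

```python
def all_strings_or_none(items):
    # Condition 1: every item is None or a string.
    return all(item is None or isinstance(item, str) for item in items)


def has_dict_or_list(items):
    # Condition 2: at least one item is a dict or a list.
    return any(isinstance(item, (dict, list)) for item in items)


cases = (["Bla Bla"], [None, "Bla bla"], [{"CODE": 79540000}], [1, "a"], [])
for items in cases:
    print(items, all_strings_or_none(items), has_dict_or_list(items))
```

The list `[1, "a"]` satisfies neither condition, which is the "not exhaustive" case. An empty list satisfies condition 1 vacuously, because all returns True for an empty iterable.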

Abstracting these expressions into functions gives two options for the implementation:

# 1: join only if every item is None or a string
def shouldJoin(items):
    return all(item is None or isinstance(item, str) for item in items)

# 2: join unless some item is a dict or a list
def shouldJoin(items):
    return not any(isinstance(item, (dict, list)) for item in items)
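One way to plug this into the traversal from the question is sketched below. It is not the asker's generator: the hypothetical join_p_values modifies the structure in place instead of yielding leaf values, and it assumes None entries should be dropped from the joined string rather than rendered as "None". The {"FT": "x"} item is invented for the example.

```python
def shouldJoin(items):
    # Option 1: join only if every item is None or a string.
    return all(item is None or isinstance(item, str) for item in items)


def join_p_values(obj):
    """Walk nested dicts/lists in place, joining qualifying "P" lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "P" and isinstance(value, list) and shouldJoin(value):
                # Assumption: None entries are skipped, not converted to "None".
                obj[key] = " ".join(item for item in value if item is not None)
            else:
                join_p_values(value)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            join_p_values(item)
    return obj


docs = [
    {"SHORT_DESCR": {"P": ["Bla Bla"]}},
    {"SHORT_DESCR": {"P": [None, "Bla bla"]}},
    {"SHORT_DESCR": {"P": [{"FT": "x"}, "Bla"]}},  # hypothetical mixed list
    [{"CPV_CODE": {"CODE": 79530000}}, {"CPV_CODE": {"CODE": 79540000}}],
]
join_p_values(docs)
print(docs[0])  # {'SHORT_DESCR': {'P': 'Bla Bla'}}
print(docs[1])  # {'SHORT_DESCR': {'P': 'Bla bla'}}
```

Reassigning the value of an existing key while iterating over obj.items() is safe here because the dict's size never changes.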

If you want a more general version of condition #2, you can use container abstract base classes:

import collections.abc as abc

def shouldJoin(items):
    return not any(isinstance(item, (abc.Mapping, abc.MutableSequence)) for item in items)
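To illustrate what the ABC version catches that the dict/list version would miss, here is a small sketch using OrderedDict and deque from the standard library. These inputs are chosen for this example only; xmltodict output may never contain a deque.

```python
import collections
import collections.abc as abc


def shouldJoin(items):
    return not any(isinstance(item, (abc.Mapping, abc.MutableSequence)) for item in items)


print(shouldJoin(["a", None]))                          # True
print(shouldJoin(["a", collections.OrderedDict(k=1)]))  # False: OrderedDict is a Mapping
print(shouldJoin(["a", collections.deque(["b"])]))      # False: deque is a MutableSequence
print(shouldJoin(["a", ("b",)]))                        # True: tuple is not a MutableSequence
```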

Both str and list share many abstract base classes; MutableSequence is the one that is unique to list, so that is what's used in the sample. To see exactly which ABCs each concrete type descends from, you can play around with the following:

import collections.abc as abc
ABCMeta = type(abc.Sequence)
abcs = {name: val for (name, val) in abc.__dict__.items() if isinstance(val, ABCMeta)}

def abcsOf(t):
    return {name for (name, kls) in abcs.items() if issubclass(t, kls)}

# examine ABCs
abcsOf(str)
abcsOf(list)
# which ABCs does list descend from, that str doesn't?
abcsOf(list) - abcsOf(str)
# result: {'MutableSequence'}
abcsOf(tuple) - abcsOf(str)
# result: set() (the empty set)

Note that it's not possible to distinguish strs from tuples using just ABCs.
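If tuples should also count as nested containers, one possible workaround is to rule out strings by concrete type before testing against the ABCs. This is a sketch only. The helper isNestedContainer is a hypothetical name, and excluding bytes alongside str is an assumption.

```python
import collections.abc as abc


def isNestedContainer(item):
    # str and bytes are Sequences too, so exclude them by concrete type first.
    if isinstance(item, (str, bytes)):
        return False
    return isinstance(item, (abc.Mapping, abc.Sequence))


def shouldJoin(items):
    return not any(isNestedContainer(item) for item in items)


print(shouldJoin(["a", None]))    # True
print(shouldJoin(["a", ("b",)]))  # False: the tuple is now detected
print(shouldJoin(["a", ["b"]]))   # False
```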
