I have converted some XML files to JSON files with xmltodict.
The JSON files have a lot of "P" values.
These "P" values can contain:
- a list of strings
- a list of dicts
- a list containing a list of dicts
- a list with null / None and a string
If the list contains only strings, or only strings and null / None, I want to convert it to a single string with join. If the list contains a dict or list(s), it should be skipped without processing.
How can I do that?
def recursive_iter(obj):
    if isinstance(obj, dict):
        for item in obj.values():
            if "P" in obj and isinstance(obj["P"], list) and not isinstance(obj["P"], dict):
                # need to add a check for not dict and list in list
                obj["P"] = " ".join([str(e) for e in obj["P"]])
            else:
                yield from recursive_iter(item)
    elif any(isinstance(obj, t) for t in (list, tuple)):
        for item in obj:
            yield from recursive_iter(item)
    else:
        yield obj
These I want converted to a string:
{"SHORT_DESCR": {"P": ["Bla Bla"]}}
{"SHORT_DESCR": {"P": [null,"Bla bla"]}}
These should be skipped without processing:
{"CPV_CODE":{"CODE":79540000}}
[{"CPV_CODE":{"CODE":79530000}},{"CPV_CODE":{"CODE":79540000}}]
CodePudding user response:
Since you want to find the lists that contain only strings:
if ("P" in obj) and isinstance(obj["P"], list):
    if all(isinstance(z, str) for z in obj["P"]):
        ...  # keep list with strings
Is this what you want?
CodePudding user response:
Let's start with the two conditions:
- the list contains only strings or the list contains only strings and null / none
- the list contains dict or list(s)
The first subcondition of the first condition is covered by the second subcondition, so #1 can be simplified to:
- the list contains only strings or None
Now let's rephrase them in something resembling first-order logic:
- All the list items are None or strings.
- Some list item is a dict or a list.
The way that condition #2 is written, it could use an "All" quantifier, but in the context of the operation (whether or not to join the list items), a "Some" is appropriate, and it more closely aligns with the negation of condition 1 ("Some list item is not None or a string"). Also, it allows for an illustration of another implementation (shown below).
These two conditions are mutually exclusive, though not necessarily exhaustive. To simplify matters, let's assume that, in practice, these are the only two possibilities. Leaving aside the quantifiers ("All", "Some"), these are easily translatable into generator expressions:
(item is None or isinstance(item, str) for item in items)
(isinstance(item, (dict, list)) for item in items)
Note that isinstance accepts a tuple of types (which basically functions as a union type) for the second argument, allowing multiple types to be checked in one call.
The "All" and "Some" quantifiers are expressed as the all and any functions, which take iterables (such as what is produced by generator expressions):
all(item is None or isinstance(item, str) for item in items)
any(isinstance(item, (dict, list)) for item in items)
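To see that the two conditions are mutually exclusive but not exhaustive, both can be evaluated on a few sample lists. This is only a sketch: the helper names all_none_or_str and some_dict_or_list and the sample values (in particular the list containing an int) are made up for illustration.

```python
def all_none_or_str(items):
    """Condition 1: every item is None or a string."""
    return all(item is None or isinstance(item, str) for item in items)


def some_dict_or_list(items):
    """Condition 2: at least one item is a dict or a list."""
    return any(isinstance(item, (dict, list)) for item in items)


samples = [
    ["Bla Bla"],           # only strings
    [None, "Bla bla"],     # strings and None
    [{"CODE": 79540000}],  # contains a dict
    [42, "Bla"],           # hypothetical case: neither condition holds
]

# One (condition 1, condition 2) pair per sample list.
results = [(all_none_or_str(s), some_dict_or_list(s)) for s in samples]
for sample, (cond1, cond2) in zip(samples, results):
    print(sample, cond1, cond2)
```

No sample makes both conditions true, while the last one makes both false, which is why the choice between the two formulations only matters for data that contains something other than None, strings, dicts, or lists.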
Abstracting these expressions into functions gives two options for the implementation:
# 1
def shouldJoin(items):
    return all([item is None or isinstance(item, str) for item in items])

# 2
def shouldJoin(items):
    return not any([isinstance(item, (dict, list)) for item in items])
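One way such a predicate might be plugged into a recursive walk like the one in the question is sketched below. The names should_join and join_p_values are hypothetical, the walk modifies the data in place instead of yielding values, and dropping None entries before joining is an assumption (the question's code would render them as the string "None" instead).

```python
import json


def should_join(items):
    """True if every item is None or a string."""
    return all(item is None or isinstance(item, str) for item in items)


def join_p_values(obj):
    """Walk nested dicts/lists in place, joining qualifying "P" lists."""
    if isinstance(obj, dict):
        # Copy the items so that reassigning obj[key] cannot disturb iteration.
        for key, value in list(obj.items()):
            if key == "P" and isinstance(value, list) and should_join(value):
                # Assumption: None entries are dropped rather than stringified.
                obj[key] = " ".join(s for s in value if s is not None)
            else:
                join_p_values(value)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            join_p_values(item)
    return obj


# Made-up input combining a joinable "P" list and one that must be skipped.
doc = json.loads(
    '{"SHORT_DESCR": {"P": [null, "Bla bla"]}, "CPV": {"P": [{"CODE": 79540000}]}}'
)
join_p_values(doc)
print(doc)
```

A "P" list that fails the check is not joined, but the walk still descends into it, so nested "P" values further down would be handled as well.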
If you want a more general version of condition #2, you can use container abstract base classes:
import collections.abc as abc
def shouldJoin(items):
    return not any(isinstance(item, (abc.Mapping, abc.MutableSequence)) for item in items)
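To illustrate what "more general" buys in practice, the sketch below compares a concrete-type check against an ABC-based one on a few container types from the standard library. The function names and sample values are hypothetical; ChainMap and deque are used only because they are a mapping and a mutable sequence that do not inherit from dict or list.

```python
import collections
import collections.abc as abc


def should_join_concrete(items):
    """Skip only when an item is literally a dict or a list (or a subclass)."""
    return not any(isinstance(item, (dict, list)) for item in items)


def should_join_abc(items):
    """Skip when an item is any mapping or any mutable sequence."""
    return not any(
        isinstance(item, (abc.Mapping, abc.MutableSequence)) for item in items
    )


samples = {
    "strings": ["a", "b"],
    "chain_map": [collections.ChainMap({"x": 1})],
    "deque": [collections.deque(["a"])],
    "tuple": [("a", "b")],
}

concrete = {name: should_join_concrete(items) for name, items in samples.items()}
abc_based = {name: should_join_abc(items) for name, items in samples.items()}
print(concrete)
print(abc_based)
```

The concrete check lets the ChainMap and the deque through as joinable, while the ABC-based check rejects them. Neither version rejects the tuple, since a tuple is neither a mapping nor a mutable sequence.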
Both str and list share many abstract base classes; MutableSequence is the one that is unique to list, so that is what's used in the sample. To see exactly which ABCs each concrete type descends from, you can play around with the following:
import collections.abc as abc
ABCMeta = type(abc.Sequence)
abcs = {name: val for (name, val) in abc.__dict__.items() if isinstance(val, ABCMeta)}
def abcsOf(t):
    return {name for (name, kls) in abcs.items() if issubclass(t, kls)}
# examine ABCs
abcsOf(str)
abcsOf(list)
# which ABCs does list descend from, that str doesn't?
abcsOf(list) - abcsOf(str)
# result: {'MutableSequence'}
abcsOf(tuple) - abcsOf(str)
# result: set() (the empty set)
Note that it's not possible to distinguish strs from tuples using just ABCs.
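If tuples do need to be told apart from strings, one common workaround is to combine an ABC check with an explicit exclusion of the string types. The following is a minimal sketch under that assumption; the helper name is_non_string_sequence is hypothetical.

```python
import collections.abc as abc


def is_non_string_sequence(item):
    """True for sequences such as list or tuple, False for str and bytes."""
    return isinstance(item, abc.Sequence) and not isinstance(item, (str, bytes))


print(is_non_string_sequence(("a", "b")))  # tuple
print(is_non_string_sequence(["a", "b"]))  # list
print(is_non_string_sequence("ab"))        # str
print(is_non_string_sequence(None))        # not a sequence at all
```

This mixes an abstract check with concrete types, so it gives up some of the generality of a pure ABC approach in exchange for separating strings from other sequences.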