Pandas boolean condition from nested list of dictionaries

 [{'id': 123,
  'type': 'salary', #Parent node
  'tx': 'house',
  'sector': 'EU',
  'transition': [{'id': 'hash', #Child node
    'id': 123,
    'type': 'salary',
    'tx': 'house' }]},
 {'userid': 123,
  'type': 'salary', #Parent node
  'tx': 'office',
  'transition': [{'id': 'hash', # Child node
    'id': 123,
    'type': 'salary',
    'tx': 'office'}]}]

I have a pandas column ('info') that stores a nested list of dictionaries, like the example above.

What I'm trying to do is build a boolean condition that checks whether this list has the following attributes:

More than one parent node has type == 'salary'
The 'tx' field differs between at least two parent nodes

So far I've tried flattening the list and filtering it, but that doesn't solve it for the first and second nodes:

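# Bracket access: attribute access (.info) can resolve to the pandas info() method
a = df.iloc[0]['info']
# Flatten the values of every parent dictionary into a single list
values = [item for node in a for item in node.values()]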

CodePudding user response：

def check_conditions(row):
    a = row['info']
    types = [node['type'] for node in a if 'type' in node]
    salary_count = sum(1 for t in types if t == 'salary')
    tx_values = [node['tx'] for node in a if 'tx' in node]
    unique_tx_values = set(tx_values)
    return salary_count > 1 and len(unique_tx_values) > 1

# Apply the custom function to the DataFrame and create a new column
df['conditions_met'] = df.apply(check_conditions, axis=1)
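
To try this end to end, here is a minimal sketch. It assumes the 'info' column holds real Python lists of dicts (not JSON strings). The first row is modeled on the question's sample with the child nodes trimmed, and the second row (`same_tx`) is a made-up counter-example added only for illustration.

```python
import pandas as pd

# Modeled on the question's sample data (child nodes trimmed for brevity).
sample = [
    {"id": 123, "type": "salary", "tx": "house", "sector": "EU",
     "transition": [{"id": 123, "type": "salary", "tx": "house"}]},
    {"userid": 123, "type": "salary", "tx": "office",
     "transition": [{"id": 123, "type": "salary", "tx": "office"}]},
]
# Hypothetical counter-example: two salary nodes, but the same 'tx' in both.
same_tx = [
    {"id": 1, "type": "salary", "tx": "house"},
    {"id": 2, "type": "salary", "tx": "house"},
]

df = pd.DataFrame({"info": [sample, same_tx]})


def check_conditions(row):
    nodes = row["info"]
    # Count the parent nodes whose type is 'salary'.
    salary_count = sum(1 for node in nodes if node.get("type") == "salary")
    # Collect the distinct 'tx' values found on the parent nodes.
    unique_tx = {node["tx"] for node in nodes if "tx" in node}
    return salary_count > 1 and len(unique_tx) > 1


df["conditions_met"] = df.apply(check_conditions, axis=1)
print(df["conditions_met"].tolist())  # prints [True, False]
```

Only the top-level (parent) dictionaries are inspected; the nested 'transition' lists are ignored, which matches the conditions as stated in the question.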

CodePudding user response：

If you want a one-line solution, you can use:

df['check'] = df['info'].apply(lambda x: sum([1 if i['type']=='salary' else 0 for i in x]) > 1 and [i['tx'] for i in x].count([i['tx'] for i in x][0]) != len([i['tx'] for i in x]))

or (expanded):

def check(x):
    # Number of parent nodes whose type is 'salary'
    total_salary = sum([1 if i['type']=='salary' else 0 for i in x])
    tx_list = [i['tx'] for i in x]
    # True when at least one 'tx' differs from the first one
    tx_check = tx_list.count(tx_list[0]) != len(tx_list)
    return total_salary > 1 and tx_check

df['check'] = df['info'].apply(check)
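
Note that indexing the first 'tx' value raises an IndexError when a row's list is empty, and `i['type']` raises a KeyError when a node lacks that key. A hedged variant is sketched below, assuming such rows should simply evaluate to False. The name `check_safe` and the sample rows are hypothetical.

```python
import pandas as pd


def check_safe(nodes):
    # Anything that is not a list (e.g. None or NaN for a missing value) fails.
    if not isinstance(nodes, list):
        return False
    # .get() avoids a KeyError when a node has no 'type' key.
    salary_count = sum(1 for node in nodes if node.get("type") == "salary")
    # A set collapses duplicates, so more than one element means 'tx' differs.
    distinct_tx = {node["tx"] for node in nodes if "tx" in node}
    return salary_count > 1 and len(distinct_tx) > 1


df = pd.DataFrame({"info": [
    [{"type": "salary", "tx": "house"}, {"type": "salary", "tx": "office"}],
    [],    # empty list: no nodes at all
    None,  # missing value
]})
df["check"] = df["info"].apply(check_safe)
print(df["check"].tolist())  # prints [True, False, False]
```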