[{'id': 123,
  'type': 'salary',  # Parent node
  'tx': 'house',
  'sector': 'EU',
  'transition': [{'id': 'hash',  # Child node
                  'id': 123,
                  'type': 'salary',
                  'tx': 'house'}]},
 {'userid': 123,
  'type': 'salary',  # Parent node
  'tx': 'office',
  'transition': [{'id': 'hash',  # Child node
                  'id': 123,
                  'type': 'salary',
                  'tx': 'office'}]}]
I have a pandas column ('info') in which each cell stores a nested list of dictionaries like the example above.
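To make the example reproducible, here is a minimal sketch that builds such a DataFrame. It assumes each cell of 'info' holds one list of parent dicts shaped like the sample; the child 'transition' entries are trimmed, and the second row is a hypothetical single-parent counter-example.

```python
import pandas as pd

# Trimmed version of the sample above: two parent nodes, each with one child in 'transition'
sample = [
    {'id': 123, 'type': 'salary', 'tx': 'house', 'sector': 'EU',
     'transition': [{'id': 123, 'type': 'salary', 'tx': 'house'}]},
    {'userid': 123, 'type': 'salary', 'tx': 'office',
     'transition': [{'id': 123, 'type': 'salary', 'tx': 'office'}]},
]

# Hypothetical second row with only one parent node, so neither condition can hold
df = pd.DataFrame({'info': [sample, sample[:1]]})
print(df['info'].apply(len).tolist())  # [2, 1] parent nodes per row
```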
I'm trying to build a boolean condition that checks whether this list has the following attributes:
- More than one parent node has type == 'salary'
- The 'tx' field is not the same in all parent nodes
So far I've tried flattening the list and filtering it, but that doesn't solve it for the first and second nodes:
a = df.iloc[0].info
values = [item for node in a for item in node.values()]
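Flattening the `values()` of every dict puts all fields (including the nested 'transition' lists) into one list, so you can no longer tell which value came from which key or node. A sketch of an alternative, assuming the conditions only concern the top-level (parent) dicts: collect each field per parent node instead. The trimmed data below is illustrative.

```python
# Parent nodes as in the sample (children trimmed for brevity)
a = [
    {'type': 'salary', 'tx': 'house', 'transition': [{'type': 'salary', 'tx': 'house'}]},
    {'type': 'salary', 'tx': 'office', 'transition': [{'type': 'salary', 'tx': 'office'}]},
]

# Flattened values lose the key names: strings and nested lists end up side by side
values = [item for node in a for item in node.values()]

# Iterating the parent dicts directly keeps each field separate; child nodes are never visited
types = [node.get('type') for node in a]
txs = [node.get('tx') for node in a]
print(types, txs)  # ['salary', 'salary'] ['house', 'office']
```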
Answer:
def check_conditions(row):
    a = row['info']
    # Only the parent nodes are iterated; child nodes live under 'transition'
    types = [node['type'] for node in a if 'type' in node]
    salary_count = sum(1 for t in types if t == 'salary')
    tx_values = [node['tx'] for node in a if 'tx' in node]
    unique_tx_values = set(tx_values)
    return salary_count > 1 and len(unique_tx_values) > 1

# Apply the custom function to each row of the DataFrame and store the result in a new column
df['conditions_met'] = df.apply(check_conditions, axis=1)
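A quick self-contained check of this approach on a trimmed copy of the sample data might look like the following (the second, single-parent row is hypothetical):

```python
import pandas as pd

def check_conditions(row):
    a = row['info']
    # Count parent nodes whose type is 'salary'
    salary_count = sum(1 for node in a if node.get('type') == 'salary')
    # Distinct 'tx' values across parent nodes
    unique_tx_values = {node['tx'] for node in a if 'tx' in node}
    return salary_count > 1 and len(unique_tx_values) > 1

sample = [
    {'type': 'salary', 'tx': 'house'},
    {'type': 'salary', 'tx': 'office'},
]
df = pd.DataFrame({'info': [sample, sample[:1]]})
df['conditions_met'] = df.apply(check_conditions, axis=1)
print(df['conditions_met'].tolist())  # [True, False]
```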
Answer:
If you want a one-line solution, you can use:
df['check'] = df['info'].apply(lambda x: sum(i['type'] == 'salary' for i in x) > 1 and len({i['tx'] for i in x}) > 1)
or (expanded):
def check(x):
    # Number of parent nodes whose type is 'salary'
    total_salary = sum(1 if i['type'] == 'salary' else 0 for i in x)
    tx_list = [i['tx'] for i in x]
    # True when not every parent node shares the same 'tx' (also safe for an empty list)
    tx_check = len(set(tx_list)) > 1
    return total_salary > 1 and tx_check

df['check'] = df['info'].apply(check)
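If the per-row Python function becomes a bottleneck, one possible alternative (not part of either answer above) is to explode the lists into one row per parent node and aggregate with `groupby`. This is a sketch assuming every parent dict has 'type' and 'tx' keys; rows holding an empty list are treated as not meeting the conditions, and the sample rows are illustrative.

```python
import pandas as pd

sample = [
    {'type': 'salary', 'tx': 'house'},
    {'type': 'salary', 'tx': 'office'},
]
df = pd.DataFrame({'info': [sample, sample[:1], []]})

# One row per parent node; the original row label is kept as the index.
# Empty lists explode to NaN, which dropna removes.
parents = df['info'].explode().dropna()
flat = pd.DataFrame(parents.tolist(), index=parents.index)

# Per original row: how many parents are 'salary', and how many distinct 'tx' values
stats = flat.groupby(level=0).agg(
    salary_count=('type', lambda s: s.eq('salary').sum()),
    tx_nunique=('tx', 'nunique'),
)
check = (stats['salary_count'] > 1) & (stats['tx_nunique'] > 1)

# Rows that had no parent nodes are absent from stats, so fill them with False
df['check'] = check.reindex(df.index, fill_value=False)
print(df['check'].tolist())  # [True, False, False]
```

The trade-off is readability: the `apply` versions above are easier to follow, while this version keeps the counting inside pandas.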