For the given sample input list, I want to dedupe the dicts based on the values of the keys code, tc, signal, and in_force all matching.
sample input:
signals = [
None,
None,
{'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 1, 'target': 0},
{'code': 'lr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 2, 'target': 1},
{'code': 'sr', 'tc': 1, 'signal': '2U-2D', 'in_force': True, 'trigger': 3, 'target': 2},
None,
{'code': 'sr', 'tc': 0, 'signal': '1-2U-2D', 'in_force': True, 'trigger': 4, 'target': 3},
{'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': False, 'trigger': 5, 'target': 4},
{'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 6, 'target': 5},
None,
{'code': 'lr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 7, 'target': 6},
{'code': 'sr', 'tc': 1, 'signal': '2U-2D', 'in_force': True, 'trigger': 8, 'target': 7},
{'code': 'sr', 'tc': 0, 'signal': '1-2U-2D', 'in_force': True, 'trigger': 9, 'target': 8},
{'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': False, 'trigger': 0, 'target': 9},
]
expected/desired output:
[
{'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 1, 'target': 0},
{'code': 'lr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 2, 'target': 1},
{'code': 'sr', 'tc': 1, 'signal': '2U-2D', 'in_force': True, 'trigger': 3, 'target': 2},
{'code': 'sr', 'tc': 0, 'signal': '1-2U-2D', 'in_force': True, 'trigger': 4, 'target': 3},
{'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': False, 'trigger': 5, 'target': 4},
]
The order of the list does not need to be preserved, and whether it returns the 1st or nth matching dict in the list does not matter.
I could make a very verbose version of this reference code that creates each list of matching key/values, but I feel like there's got to be a better way.
new_list = []
for position, signal in enumerate(signals):
    if type(signal) == dict:
        if {
            key: value
            for key, value in signal.items()
            if signal["code"] == "sr"
            and signal["tc"] == 0
            and signal["signal"] == "2U-2D"
            and signal["in_force"] == True
        }:
            new_list.append(signal)
CodePudding user response:
I'd suggest something like this, with only Python's standard library:
result = []
seen = set()
for s in signals:
    if not isinstance(s, dict):
        continue  # skip the None placeholders
    signature = (s['code'], s['tc'], s['signal'], s['in_force'])
    if signature in seen:
        continue  # a dict with the same four values was already kept
    seen.add(signature)
    result.append(s)
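The same idea can be wrapped in a small helper so the key names are not hard-coded. This is only a sketch: the name dedupe_by_keys and its parameters are made up for illustration, and it assumes every dict contains all the listed keys and that their values are hashable.

```python
def dedupe_by_keys(items, keys):
    """Return the first dict seen for each combination of values under `keys`.

    Hypothetical helper: non-dict entries (such as None) are skipped.
    """
    seen = set()
    result = []
    for item in items:
        if not isinstance(item, dict):
            continue
        # A tuple in fixed key order is hashable and keeps each value's position.
        signature = tuple(item[k] for k in keys)
        if signature not in seen:
            seen.add(signature)
            result.append(item)
    return result


# Shortened sample in the same shape as the question's list.
sample = [
    None,
    {'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 1, 'target': 0},
    {'code': 'lr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 2, 'target': 1},
    {'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 6, 'target': 5},
]
unique = dedupe_by_keys(sample, ('code', 'tc', 'signal', 'in_force'))
print([d['trigger'] for d in unique])  # [1, 2]
```

Passing the keys as an argument means the same function can be reused if the set of columns that defines a duplicate changes.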
CodePudding user response:
I don't know whether that is wanted, but pandas could come in quite handy here. Also, if you have other tasks to do with the data, a DataFrame is a convenient way to handle them.
import pandas as pd
# filter None to only have a list of dicts, then create a df with it
df = pd.DataFrame(filter(None,signals))
out = df.drop_duplicates(subset=['code', 'tc', 'signal', 'in_force'], keep='first')
out.to_dict('records')
Output:
[{'code': 'sr',
'tc': 0,
'signal': '2U-2D',
'in_force': True,
'trigger': 1,
'target': 0},
{'code': 'lr',
'tc': 0,
'signal': '2U-2D',
'in_force': True,
'trigger': 2,
'target': 1},
{'code': 'sr',
'tc': 1,
'signal': '2U-2D',
'in_force': True,
'trigger': 3,
'target': 2},
{'code': 'sr',
'tc': 0,
'signal': '1-2U-2D',
'in_force': True,
'trigger': 4,
'target': 3},
{'code': 'sr',
'tc': 0,
'signal': '2U-2D',
'in_force': False,
'trigger': 5,
'target': 4}]
CodePudding user response:
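import pandas as pd
new_list = pd.Series([s for s in signals if isinstance(s, dict)])
keys = ['code', 'tc', 'signal', 'in_force']
# Use a tuple as the signature: it is hashable and keeps the values in key order.
idx = new_list.apply(lambda x: tuple(x[k] for k in keys)).duplicated()
# duplicated() is True for repeats, so negate the mask to keep the first of each signature.
new_list = new_list[~idx].tolist()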
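One thing to watch when building the comparison key: a set of the values is not a reliable signature. Sets are unhashable, they forget which key a value came from, and because True == 1 and False == 0 in Python they can merge distinct values. A tuple built in a fixed key order avoids those problems. A small standard-library sketch (the dicts a and b are made up for illustration):

```python
keys = ['code', 'tc', 'signal', 'in_force']

# Two dicts that differ in both tc and in_force (illustrative values only).
a = {'code': 'sr', 'tc': 1, 'signal': '2U-2D', 'in_force': False}
b = {'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': True}

# As sets, both collapse to the same members, because 1 == True and 0 == False.
same_as_sets = {a[k] for k in keys} == {b[k] for k in keys}

# As tuples, the position of each value is kept, so the two stay distinct.
same_as_tuples = tuple(a[k] for k in keys) == tuple(b[k] for k in keys)

# A plain set cannot be hashed, so it cannot be stored in another set or used as a dict key.
try:
    hash({a[k] for k in keys})
    set_is_hashable = True
except TypeError:
    set_is_hashable = False

print(same_as_sets, same_as_tuples, set_is_hashable)  # True False False
```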
CodePudding user response:
You could use a pandas DataFrame and drop the duplicates with df.duplicated():
import pandas as pd
signals = [
None,
None,
{'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 1, 'target': 0},
{'code': 'lr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 2, 'target': 1},
{'code': 'sr', 'tc': 1, 'signal': '2U-2D', 'in_force': True, 'trigger': 3, 'target': 2},
None,
{'code': 'sr', 'tc': 0, 'signal': '1-2U-2D', 'in_force': True, 'trigger': 4, 'target': 3},
{'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': False, 'trigger': 5, 'target': 4},
{'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 6, 'target': 5},
None,
{'code': 'lr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 7, 'target': 6},
{'code': 'sr', 'tc': 1, 'signal': '2U-2D', 'in_force': True, 'trigger': 8, 'target': 7},
{'code': 'sr', 'tc': 0, 'signal': '1-2U-2D', 'in_force': True, 'trigger': 9, 'target': 8},
{'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': False, 'trigger': 0, 'target': 9},
]
signals = [x for x in signals if x is not None]
df = pd.DataFrame(signals)
# duplicated() is True for every repeat of an earlier row, so negate it to keep the first of each combination
df1 = df[~df.duplicated(['code', 'tc', 'signal', 'in_force'])]
print(df1)
  code  tc   signal  in_force  trigger  target
0   sr   0    2U-2D      True        1       0
1   lr   0    2U-2D      True        2       1
2   sr   1    2U-2D      True        3       2
3   sr   0  1-2U-2D      True        4       3
4   sr   0    2U-2D     False        5       4
And if you need the output to be a list of dictionaries, you could do
df1.to_dict('records')
[{'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 1, 'target': 0},
{'code': 'lr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 2, 'target': 1},
{'code': 'sr', 'tc': 1, 'signal': '2U-2D', 'in_force': True, 'trigger': 3, 'target': 2},
{'code': 'sr', 'tc': 0, 'signal': '1-2U-2D', 'in_force': True, 'trigger': 4, 'target': 3},
{'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': False, 'trigger': 5, 'target': 4}]
CodePudding user response:
I found a solution that fits into 1 line of code and does not use any external libraries.
To begin with, let's filter out all None values:
signals = filter(lambda x: x is not None, signals)
or
signals = [signal for signal in signals if signal is not None]
Now let's create a dict whose keys are the string (repr) representations of the code, tc, signal, and in_force values of our input dicts (this works as long as the values are simple types) and whose values are the complete dicts (consisting of all keys). Since a dict cannot contain duplicate keys, all the duplicates will be gone:
filter_dict = {repr([signal[key] for key in ('code', 'tc', 'signal', 'in_force')]): signal for signal in signals}
Here's what I've got at this point:
{
"['sr', 0, '2U-2D', True]": {'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 6, 'target': 5},
"['lr', 0, '2U-2D', True]": {'code': 'lr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 7, 'target': 6},
"['sr', 1, '2U-2D', True]": {'code': 'sr', 'tc': 1, 'signal': '2U-2D', 'in_force': True, 'trigger': 8, 'target': 7},
"['sr', 0, '1-2U-2D', True]": {'code': 'sr', 'tc': 0, 'signal': '1-2U-2D', 'in_force': True, 'trigger': 9, 'target': 8},
"['sr', 0, '2U-2D', False]": {'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': False, 'trigger': 0, 'target': 9}
}
Now let's just take the values of that dict, and it's all done:
result = list(filter_dict.values())
All these steps may be joined into 1 line of code:
result = list({repr([signal[key] for key in ('code', 'tc', 'signal', 'in_force')]): signal for signal in signals if signal is not None}.values())
Final result:
[
{'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 6, 'target': 5},
{'code': 'lr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 7, 'target': 6},
{'code': 'sr', 'tc': 1, 'signal': '2U-2D', 'in_force': True, 'trigger': 8, 'target': 7},
{'code': 'sr', 'tc': 0, 'signal': '1-2U-2D', 'in_force': True, 'trigger': 9, 'target': 8},
{'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': False, 'trigger': 0, 'target': 9}
]
Maybe my solution is not the fastest (because I'm using strings), and it may not work with every class that could appear in the original dicts (because some classes may not be converted into strings correctly by the repr function). But at least it's very simple.
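If the string conversion is a concern, a variation on the same dict trick is to use a tuple of the values as the key instead of its repr, which works for any hashable values. Using setdefault also keeps the first dict for each combination rather than the last. A sketch with a shortened, made-up sample list (the names KEYS and first_by_signature are just for illustration):

```python
KEYS = ('code', 'tc', 'signal', 'in_force')

# Shortened sample in the same shape as the question's list.
signals = [
    None,
    {'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 1, 'target': 0},
    {'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': False, 'trigger': 5, 'target': 4},
    None,
    {'code': 'sr', 'tc': 0, 'signal': '2U-2D', 'in_force': True, 'trigger': 6, 'target': 5},
]

first_by_signature = {}
for signal in signals:
    if signal is not None:
        # setdefault stores the dict only if this signature has not been seen yet.
        first_by_signature.setdefault(tuple(signal[k] for k in KEYS), signal)

result = list(first_by_signature.values())
print([d['trigger'] for d in result])  # [1, 5]
```

One difference from the repr version: tuples compare 1 and True as equal, whereas their repr strings differ, so this assumes each key always holds values of a single type.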
CodePudding user response:
Use filter to skip the None entries and keep tuples of "seen" values in a set for efficient checking.
import operator
seen = set()
clean = []
# Function to get the values for the keys that we are interested in.
getter = operator.itemgetter('code', 'tc', 'signal', 'in_force')
for signal in filter(None, signals):
    if (vals := getter(signal)) in seen:
        # We have already got a dict with these values - skip.
        continue
    seen.add(vals)
    clean.append(signal)
# `expected` is the expected/desired output list from the question.
assert len(clean) == len(expected)
assert all(item in expected for item in clean)