How can I filter out duplicate ids in a loop from list

Time:10-16

I have a list like the one below. Each element pairs a withdrawal with a list of deposits. I want to filter the deposits under each withdrawal so that a deposit that already appears under one withdrawal does not appear under any other. The example has 2 withdrawal clusters, but the number may vary. I tried various lambda functions keyed on the deposit id, but I could not get the desired output. How can I do this?

exampleList = [
    {
      "withdrawal": {
        "amount": 250,
        "id": 70916631583,
        "date": "31-05-22 - 16:14:08",
        "paytype": "withdrawal"
      },
      "deposit": [
        {
          "id": 71018974368,
          "amount": 120,
          "date": "01-06-22 - 14:27:26",
          "paytype": "deposit"
        },
        {
          "id": 71018971332,
          "amount": 100,
          "date": "01-06-22 - 14:27:23",
          "paytype": "deposit"
        }
      ]
    },
    {
      "withdrawal": {
        "amount": 220,
        "id": 71019072820,
        "date": "01-06-22 - 14:28:40",
        "paytype": "withdrawal"
      },
      "deposit": [
        {
          "id": 71033338591,
          "amount": 100,
          "date": "01-06-22 - 17:03:19",
          "paytype": "deposit"
        },
        {
          "id": 71033144597,
          "amount": 250,
          "date": "01-06-22 - 17:01:20",
          "paytype": "deposit"
        },
        {
          "id": 71018974368,
          "amount": 120,
          "date": "01-06-22 - 14:27:26",
          "paytype": "deposit"
        },
        {
          "id": 71018971332,
          "amount": 100,
          "date": "01-06-22 - 14:27:23",
          "paytype": "deposit"
        }
      ]
    }
  ]

Example Output:

exampleOutputList = [
    {
      "withdrawal": {
        "amount": 250,
        "id": 70916631583,
        "date": "31-05-22 - 16:14:08",
        "paytype": "withdrawal"
      },
      "deposit": [
        {
          "id": 71018974368,
          "amount": 120,
          "date": "01-06-22 - 14:27:26",
          "paytype": "deposit"
        },
        {
          "id": 71018971332,
          "amount": 100,
          "date": "01-06-22 - 14:27:23",
          "paytype": "deposit"
        }
      ]
    },
    {
      "withdrawal": {
        "amount": 220,
        "id": 71019072820,
        "date": "01-06-22 - 14:28:40",
        "paytype": "withdrawal"
      },
      "deposit": [
        {
          "id": 71033338591,
          "amount": 100,
          "date": "01-06-22 - 17:03:19",
          "paytype": "deposit"
        },
        {
          "id": 71033144597,
          "amount": 250,
          "date": "01-06-22 - 17:01:20",
          "paytype": "deposit"
        }
        
      ]
    }
  ]

The deposits with ids 71018974368 and 71018971332 do not appear in the second cluster of the sample output because they already appeared in the previous withdrawal cluster. This is exactly what I want. There can be more than 2 withdrawal clusters, so hard-coding element indexes will not solve my problem.

I tried something like this. I expected it to collect the ids in an initially empty list and filter against them inside the loop, but the output did not change.

listLen = len(exampleList)
testList = []
if(listLen > 0):
    while listLen > 0:
        listLen -= 1
        deposits = exampleList[listLen]['deposit']
        withDrawal = exampleList[listLen]['withdrawal']
        idList = [x['id'] for x in deposits]
        filterFromList = list(filter(lambda x:x['id'] not in testList, deposits))
        testList.append({"withdrawal" : withDrawal,"deposit" : filterFromList})
        
    print(testList)

Output

[{'withdrawal': {'amount': 220, 'id': 71019072820, 'date': '01-06-22 - 14:28:40', 'paytype': 'withdrawal'}, 'deposit': [{'id': 71033338591, 'amount': 100, 'date': '01-06-22 - 17:03:19', 'paytype': 'deposit'}, {'id': 71033144597, 'amount': 250, 'date': '01-06-22 - 17:01:20', 'paytype': 'deposit'}, {'id': 71018974368, 'amount': 120, 'date': '01-06-22 - 14:27:26', 'paytype': 'deposit'}, {'id': 71018971332, 'amount': 100, 'date': '01-06-22 - 14:27:23', 'paytype': 'deposit'}]}, {'withdrawal': {'amount': 250, 'id': 70916631583, 'date': '31-05-22 - 16:14:08', 'paytype': 'withdrawal'}, 'deposit': [{'id': 71018974368, 'amount': 120, 'date': '01-06-22 - 14:27:26', 'paytype': 'deposit'}, {'id': 71018971332, 'amount': 100, 'date': '01-06-22 - 14:27:23', 'paytype': 'deposit'}]}]

As the output shows, the deposit ids and elements are still repeated.

CodePudding user response:

List = [
    {
      "withdrawal": {
        "amount": 250,
        "id": 70916631583,
        "date": "31-05-22 - 16:14:08",
        "paytype": "withdrawal"
      },
      "deposit": [
        {
          "id": 71018974368,
          "amount": 120,
          "date": "01-06-22 - 14:27:26",
          "paytype": "deposit"
        },
        {
          "id": 71018971332,
          "amount": 100,
          "date": "01-06-22 - 14:27:23",
          "paytype": "deposit"
        }
      ]
    },
    {
      "withdrawal": {
        "amount": 220,
        "id": 71019072820,
        "date": "01-06-22 - 14:28:40",
        "paytype": "withdrawal"
      },
      "deposit": [
        {
          "id": 71033338591,
          "amount": 100,
          "date": "01-06-22 - 17:03:19",
          "paytype": "deposit"
        },
        {
          "id": 71033144597,
          "amount": 250,
          "date": "01-06-22 - 17:01:20",
          "paytype": "deposit"
        },
        {
          "id": 71018974368,
          "amount": 120,
          "date": "01-06-22 - 14:27:26",
          "paytype": "deposit"
        },
        {
          "id": 71018971332,
          "amount": 100,
          "date": "01-06-22 - 14:27:23",
          "paytype": "deposit"
        }
      ]
    }
  ]

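def func(d):
    # Recursively walk nested lists and dicts, removing the two hard-coded ids.
    # Note: this drops every occurrence, including those in the first cluster.
    if isinstance(d, list):
        # Iterate backwards so pop() does not shift the items still to be visited.
        for i in reversed(range(len(d))):
            v = d[i]
            if isinstance(v, dict) and v.get('id') in (71018974368, 71018971332):
                d.pop(i)
            else:
                func(v)
    elif isinstance(d, dict):
        for v in d.values():
            func(v)

func(List)
print(List)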

CodePudding user response:

You could keep a set of already-seen ids as you traverse the data. For each cluster, build a side list of the deposits whose ids have not been seen yet, and replace the "deposit" list with it before advancing to the next cluster. This is a lot easier than trying to track indexes of the nested collections.

seen = set()

for cluster in exampleList:
    filtered = []
    for deposit in cluster["deposit"]:
        if deposit["id"] not in seen:
            seen.add(deposit["id"])
            filtered.append(deposit)
    cluster["deposit"][:] = filtered
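
For comparison, the attempt in the question checks x['id'] against testList, which holds the cluster dicts rather than ids, so the filter never excludes anything. If you would rather not modify exampleList in place, the same seen-set idea can be wrapped in a function that returns a new list. This is only a sketch: the function name drop_repeated_deposits and the tiny sample data are made up for illustration, and the deposit dicts themselves are shared with the input rather than copied.

```python
def drop_repeated_deposits(clusters):
    """Return new clusters where each deposit id is kept only in the
    first cluster that contains it. The input list is not modified."""
    seen = set()  # deposit ids already assigned to an earlier cluster
    result = []
    for cluster in clusters:
        kept = []
        for deposit in cluster["deposit"]:
            if deposit["id"] not in seen:
                seen.add(deposit["id"])
                kept.append(deposit)
        # Build a new cluster dict; the inner dicts are reused, not copied.
        result.append({"withdrawal": cluster["withdrawal"], "deposit": kept})
    return result


# Hypothetical, shortened data with the same shape as exampleList.
sample = [
    {"withdrawal": {"id": 1}, "deposit": [{"id": 10}, {"id": 11}]},
    {"withdrawal": {"id": 2}, "deposit": [{"id": 12}, {"id": 10}, {"id": 11}]},
]

cleaned = drop_repeated_deposits(sample)
print([[d["id"] for d in c["deposit"]] for c in cleaned])  # [[10, 11], [12]]
```

Because the clusters are processed in list order, the earliest cluster keeps a shared deposit. If a different cluster should win, sort or reverse the list before calling the function.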