so recently I was learning python data structure, more specifically, the "doubly linked list". I came across a problem where I need to write code to remove duplicates in a doubly-linked list and I tried to use set() since it doesn't allow duplicates. I compared the length of the sets before adding the new node from the linked list and after adding that node to see if there were any changes with the length, if no, deleting that node since the length didn't change implying that the node is a duplicate. Unexpectedly, this method just doesn't work for some reason.
Here's my remove_duplicate function:
def remove_duplicates(self):
cur = self.head
seen = set()
while cur:
temp = cur
prev_len = len(seen)
seen.add(cur)
after_len = len(seen)
if prev_len == after_len:
self.delete_node(cur)
cur = temp.next
else:
cur = cur.next
Here's the delete_node function:
def delete_node(self, node):
cur = self.head
while cur:
if cur == node and cur == self.head:
# Case 1:
if not cur.next:
cur = None
self.head = None
return
# Case 2:
else:
nxt = cur.next
cur.next = None
nxt.prev = None
cur = None
self.head = nxt
return
elif cur == node:
# Case 3:
if cur.next:
nxt = cur.next
prev = cur.prev
prev.next = nxt
nxt.prev = prev
cur.next = None
cur.prev = None
cur = None
return
# Case 4:
else:
prev = cur.prev
prev.next = None
cur.prev = None
cur = None
return
cur = cur.next
And the node class and linked list class are:
class Node:
def __init__(self, data):
self.data = data
self.next = None
self.prev = None
class DoublyLinkedList:
def __init__(self):
self.head = None
def append(self, data):
if self.head is None:
new_node = Node(data)
self.head = new_node
else:
new_node = Node(data)
cur = self.head
while cur.next:
cur = cur.next
cur.next = new_node
new_node.prev = cur
def prepend(self, data):
if self.head is None:
new_node = Node(data)
self.head = new_node
else:
new_node = Node(data)
self.head.prev = new_node
new_node.next = self.head
self.head = new_node
def print_list(self):
cur = self.head
while cur:
print(cur.data)
cur = cur.next
I've been stuck on this issue for quite a while, so any advice and corrections on my remove_duplicate function will be very appreciated!
CodePudding user response:
There are two issues with your code. The first issue is that you're adding the Node
instance, not the value within the node, to the seen
set. Unless your linked list is circular, you're not going to run into the exact same node again, so you'll never see any duplicates this way.
Try adding the data
attribute to the set instead of the node itself:
prev_len = len(seen)
seen.add(cur.data) # add the data attribute, not the node itself to the set
after_len = len(seen)
As was pointed out in the comments on your question, checking the length of the set is sort of a round-about way of membership testing. A simpler way would be to directly check cur.data in seen
.
The other issue is more subtle, and has to do with how you delete your nodes and how you move to the next entry in the list, after finding and deleting your first duplicate.
Your code does:
temp = cur
and then later:
self.delete_node(cur)
cur = temp.next
The problem is that self.delete_node
modifies the node being deleted by setting its prev
and next
links to None
. Since temp
is just a reference to the node cur
(not a copy), it sees the modified value, and so you can never delete more than one node from the list.
What would work better is saving the next node before the deletion, and then going directly to it:
temp = cur.next
self.delete_node(cur)
cur = temp
But really, you could avoid the issue by making delete_node
not bother messing with the node being removed, since it's no longer linked to by any of the other nodes. The delete_node
method could be radically simplified in a number of other ways into something like this:
def delete_node(self, node): # no need to search for the node, we've already got it!
if node.next:
node.next.prev = node.prev
if node.prev:
node.prev.next = node.next
else:
self.head = node.next
With this deletion logic, you'd no longer need special case code for getting the next node in remove_duplicates
, you would just unconditionally move on to cur.next
.
def remove_duplicates(self):
cur = self.head
seen = set()
while cur:
if cur.data not in seen:
seen.add(cur.data)
else:
self.delete_node(cur)
cur = cur.next # this now works even if we deleted cur