I want to check whether a list is a valid sequence of chunks, where each chunk begins with some value and ends with the next occurrence of the same value. For example, this is a valid sequence of three chunks:
lst = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4, 5, 2]
\___________/ \_____/ \_______________________/
And this is one is invalid:
lst = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4]
\___________/ \_____/ \_____ ... missing the 2 to end the chunk
I have a solution but it's bad. Do you see something better?
def is_valid(lst):
while lst:
start = lst.pop(0)
if start not in lst:
return False
while lst[0] != start:
lst.pop(0)
lst.remove(start)
return True
# Tests, should print: True, False, True, False, True
print(is_valid([2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4, 5, 2]))
print(is_valid([2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4]))
print(is_valid(['I', 'N', 'O', 'A', 'I', 'L', 'L', 'T', 'R', 'X', 'I', 'I', 'N', 'X', 'F', 'T']))
print(is_valid(['T', 'I', 'N', 'I', 'X', 'R', 'O', 'F', 'T', 'I', 'N', 'I', 'X', 'L', 'L', 'A']))
print(is_valid([]))
CodePudding user response:
How about this, creating an iter
from the list and searching forward on that iter until the next
matching element is found. Note that this might fail is None
can be an element of the list; then you should rather define and compare against a sentinel obj = object()
.
def is_valid(lst):
it = iter(lst)
for x in it:
if next((y for y in it if y == x), None) is None:
return False
return True
Since we don't actually need the value returned by next
, we can also just use any
instead, at the same time solving the problem of the default
element. Like next
, any
will consume the iterator just as far as the matching element, if any:
def is_valid(lst):
it = iter(lst)
for x in it:
if not any(y == x for y in it):
return False
return True
This can be further shortened using all
instead of the outer for
loop:
def is_valid(lst):
it = iter(lst)
return all(any(y == x for y in it) for x in it)
And this can finally be reduced to the equally cryptic and intriguing:
def is_valid(lst):
it = iter(lst)
return all(x in it for x in it)
Each way, each element is visited exactly once, the original list is not changed, little to no extra space, and IMHO it's even somewhat easy to read and understand.
This never was about speed, but anyway: Here are some benchmarks of the different solutions (and some more variations), running the test cases from the question as well as two random lists of 1,000 integers, one valid and one invalid, 10,000 times, on Python 3.8.10:
# with long lists # only short test lists
1.52 is_valid_index 0.22 is_valid_index
3.28 is_valid_next 0.30 is_valid_next
2.78 is_valid_for_for_else 0.13 is_valid_for_for_else
5.26 is_valid_for_any 0.32 is_valid_for_any
5.29 is_valid_all_any 0.38 is_valid_all_any
3.42 is_valid_all_any_if 0.36 is_valid_all_any_if
2.02 is_valid_all_in 0.18 is_valid_all_in
1.97 is_valid_all_in_if 0.17 is_valid_all_in_if
1.87 is_valid_for_in 0.11 is_valid_for_in
Of course, all are O(n). With the long 1000-element-lists, the solution using index
is fastest, but the one with x in it
is not too bad, either. The any
solutions lag somewhat behind, but are about as fast (or slow) as next
when using a generator with condition, but still slower than when using plain for
loops.
With only the short test-lists, it's a bit different: Here, the solutions using one iterator and for-for-else
and for-in
are fastest by quite some margin.
CodePudding user response:
Here's my take on the problem. I've optimized for readability, not speed (while keeping it in O(n) of course):
def is_valid(sequence):
iterator = iter(sequence)
for element in iterator:
for other in iterator:
if element == other:
break
else:
return False
return True
Each iteration of the outer loop corresponds to a chunk. When we're out of elements here we ended the sequence at a chunk border, and we can return True
. Otherwise, we loop through the iterator until we find a matching element. If we run out of elements (a for-loop that "naturally" terminates, without break
, goes into its else
) we return False
.
And here's another one using itertools
. I would not prefer it over the solution above, mostly because of the arcane use of next
with a sentinel:
from itertools import dropwhile
def is_valid(iterable):
iterator = iter(iterable)
sentinel = object()
for element in iterator:
if next(dropwhile(lambda x: x != element, iterator), sentinel) is sentinel:
return False
return True
CodePudding user response:
Mutating the list with pop(0)
is costly and not needed.
You could use index
... this may be particularly fast when the chunks are large:
def is_valid(lst):
i = 0
n = len(list)
while i < n:
try:
i = lst.index(lst[i], i 1) 1
except:
return False
return True
CodePudding user response:
It seems like you want to make sure the last "chunk" is closed at the end of the list. This ought to do that:
def is_valid(lst):
search = None
paired = True
for item in lst:
if paired:
search = item
paired = False
elif search == item:
paired = True
return paired
This is O(n)
, checks each element only one time, so you won't pay a cost for your start not in lst
check that is expensive for long input lists.
CodePudding user response:
Below is an alternate, recursive solution to the problem. Basically just checks if the next target is in the list, and skips to that index to check again. I am not an expert here, but wanted to try and contribute a different way to solve the question.
def is_valid(
input_list: list,
target_index: int = 0):
# If we have only one element remaining, or if an empty list is passed initially, there cannot be a pair.
if len(input_list) <= 1:
return False
target = input_list[target_index]
search_range = input_list[target_index 1 :]
# print(f"target index: {target_index}")
# print(f"target: {target}")
# print(f"search range: {search_range}")
# print("-------------------------------")
if target in search_range:
found_target_sublist_index = search_range.index(target)
# Plus 2: two indexes start at 0 -> off by two
next_target_index = target_index found_target_sublist_index 2
if next_target_index == len(input_list):
return True
return is_valid(input_list, next_target_index)
else:
return False
test_one = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4, 5, 2]
test_two = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4]
test_three = ['I', 'N', 'O', 'A', 'I', 'L', 'L', 'T', 'R', 'X', 'I', 'I', 'N', 'X', 'F', 'T']
test_four = ['T', 'I', 'N', 'I', 'X', 'R', 'O', 'F', 'T', 'I', 'N', 'I', 'X', 'L', 'L', 'A']
print(is_valid(test_one))
print(is_valid(test_two))
print(is_valid(test_three))
print(is_valid(test_four))
CodePudding user response:
Here is a recursive version:
def is_valid(lst):
def loop(i, match): # i = index of the current item
# match = item we're looking for to close the chunk
if i >= len(lst): # end of the list reached
return match is None
if match == lst[i]: # close the chunk
return loop(i 1, None)
if match is None: # no chunk opened => open one
return loop(i 1, lst[i])
return loop(i 1, match)
return loop(0, None) # we start at item 0 and for now there is no chunk opened
# Tests, should print: True, False, True, False, True
print(is_valid([2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4, 5, 2]))
print(is_valid([2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4]))
print(is_valid(['I', 'N', 'O', 'A', 'I', 'L', 'L', 'T', 'R', 'X', 'I', 'I', 'N', 'X', 'F', 'T']))
print(is_valid(['T', 'I', 'N', 'I', 'X', 'R', 'O', 'F', 'T', 'I', 'N', 'I', 'X', 'L', 'L', 'A']))
print(is_valid([]))
CodePudding user response:
The question does not fully explain if we need a greedy solution or not.
Consider an example - [1, 2, 1, 1]
if we consider a greedy approach, the solution will find the first sequence as [1, 2, 1] and will be left with [1]. And hence, will return False.
But without a greedy approach, the solution will consider [1, 2, 1, 1] as a full sequence and will return True.
I ran the solution provided by you and it returns False, so I am assuming we need a greedy approach.
So, here is one possible solution:
def is_valid(lst):
to_find = None
for value in lst:
if to_find is None:
to_find = value
continue
if to_find is value:
to_find = None
return to_find is None
# Tests, should print: True, False, True, False, True
print(is_valid([2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4, 5, 2]))
print(is_valid([2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4]))
print(is_valid(['I', 'N', 'O', 'A', 'I', 'L', 'L', 'T', 'R', 'X', 'I', 'I', 'N', 'X', 'F', 'T']))
print(is_valid(['T', 'I', 'N', 'I', 'X', 'R', 'O', 'F', 'T', 'I', 'N', 'I', 'X', 'L', 'L', 'A']))
print(is_valid([]))
CodePudding user response:
A short attempt at creating a solution for this:
def isValid(input):
if len(input) == 0:
return True
firstChar = input.pop(0)
if firstChar not in input:
return False
input = input[input.index(firstChar) 1:]
isValid(input)
While I do not think this is the fastest method, I think it is an interesting enough method to include here. Additionally, this can be optimised a bit further by removing the lines:
if firstChar not in input:
return False
And put the code in a try/except block, like so:
def isValid(input):
if len(input) == 0:
return True
firstChar = input.pop(0)
try:
input = input[input.index(firstChar) 1:]
isValid(input)
except:
return False
as this code would give a ValueError
if the index doesnt exist
I haven't tested the exact speed difference at the moment, but I am certain it is not the fastest method, but it should be relatively decent speed-wise.