I am trying to solve this problem:
Consider a simple number compression algorithm that works as follows:
111155522500 -> [('1', 4), ('5', 3), ('2', 2), ('5', 1), ('O', 2)]
The algorithm goes from left to right through each digit and returns a list of two-element tuples. Each tuple consists of a digit and the number of repetitions of a given digit until the next, different digit in the number is encountered.
Implement a function called compress () that compresses number as described above.
Examples:
[IN]: compress(lll)
[OUT]: [('l', 3)]
[IN]: conpress(1000000)
[OUT]: [('1', 1), ('O', 6)]
[IN]: conpress(10005000)
[OUT]: [('1', 1), ('O', 3), ('5', 1), ('O', 3)]
My code:
#para funcionar, é e necessário acrescentar um caracter que não existe ao final da string!
def compress(elemento):
elemento = str(elemento)
elemento = elemento "§"# the code only works if I add some char not in the string
saida = []
#cont = 0
contagem = []
for index,ele in enumerate(elemento):
ele = str(ele)
#print(f"contagem: {contagem}")
if len(contagem) == 0 or ele in contagem:
contagem.append(ele)
if ele not in contagem :
saida.append((elemento[index-1],len(contagem)) )
contagem = []
contagem.append(ele)
return saida
compress('1000500011')
OUT:
[('1', 1), ('0', 3), ('5', 1), ('0', 3), ('1', 2)]
The problem: The code only works if I add some char not in the string as "§" Could someone help to solve this problem using the itertools built-in module and the groupby and without this modules?
CodePudding user response:
The earlier post is a perfect example of groupby
. But if you're open to more_itertools, here is another version: run_length
. It basically doing something similar with groupby - compresses an iterable with run-length encoding. It yields groups of repeated items with the count of how many times they were repeated:
It even provides corresponding decode to decompress the compressed one. Please check it out. So there is no need to reinvent the wheels...
from more_itertools import run_length
num = 10005000
compressed = list(run_length.encode(str(num)))
Output:
[('1', 1), ('0', 3), ('5', 1), ('0', 3)]
So running it in IDE like this:
>>> num = 10005000
>>> compressed = list(run_length.encode(str(num)))
>>> compressed
[('1', 1), ('0', 3), ('5', 1), ('0', 3)]
>>> run_length.decode(compressed)
<itertools.chain object at 0x039931A8>
>>> list(run_length.decode(compressed))
['1', '0', '0', '0', '5', '0', '0', '0']
>>> num = [int(x) for x in run_length.decode(compressed)]
>>> num
[1, 0, 0, 0, 5, 0, 0, 0]
CodePudding user response:
Pretty straight forward with itertools
.
def compress(number):
return [
(c, sum(1 for _ in g))
for c, g in itertools.groupby(str(number))
]
demo:
>>> compress(10005000)
[('1', 1), ('0', 3), ('5', 1), ('0', 3)]
CodePudding user response:
Corrections to your code with '> Edit---' comments to highlight changes.
Code
def compress(elemento):
elemento = str(elemento) # > Edit-----not necessary, since already a string
# > Edit-----Remove adding last character
#elemento = elemento "§"# the code only works if I add some char not in the string
saida = []
#cont = 0
contagem = []
for index,ele in enumerate(elemento):
#ele = str(ele) # > Edit-----not necessary, since already a string
#print(f"contagem: {contagem}")
#f len(contagem) == 0 or ele in contagem: # > Edit-----Inefficient to check entire list
if len(contagem) == 0 or ele == contagem[-1]: # More efficient just to compare last element in list
contagem.append(ele)
#if ele not in contagem : # > Edit-----Inefficient to check entire list
if ele != contagem[-1]: # More efficient just to compare last element in list
saida.append((elemento[index-1],len(contagem)) )
contagem = []
contagem.append(ele)
# > Edit-----Not having next if block caused your error
# Add last contagem if it exists
if contagem:
saida.append((elemento[index-1], len(contagem)) )
return saida
compress('1000500011')
Output
[('1', 1), ('0', 3), ('5', 1), ('0', 3), ('1', 2)]
Code Refactoring
Code can be simplified to the following:
def compress(elements):
stack = []
for ele in elements:
if not stack or stack[-1][0] != ele:
stack.append((ele, 1))
else:
stack[-1] = (stack[-1][0], stack[-1][1] 1)
return stack