Saving string matches to a list of lists

Time:10-05

I have a string which looks like this:

Name: Thompson shipping co.
17VXCS947
Name: Orange juice no pulp, Price: 7, Weight: 2, Aisle:9, Shelf_life: 30,
67 Name: Orange juice pulp, Price: 7, Weight:2, Aisle:9, Shelf_life:30,
Photo is available,
Photo is available,
56GHIO098
Name: Cranberry Juice, Price: 3, Weight: 1, Aisle:9, Shelf_life:45,
Name: Lemonade, Price:1, Weight:1, Aisle:9, Shelf_life:10,

There are no newline characters; everything is one big string.

My end goal is to save them to an Excel sheet. I am trying to save these either to a list of lists, which would look like

[['Name: Thompson shipping co.'], ['Name: Orange juice no pulp', 'Price: 7', 'Weight: 2', 'Aisle:9', 'Shelf_life: 30'], ['Name: Orange juice pulp', 'Price: 7', 'Weight:2', 'Aisle:9', 'Shelf_life:30'], ['.....']]

or a dictionary.

My current solution is to use regex to find Name, Price, Weight, Aisle, Shelf_life with

re.findall(r'(?<=,)[^,]*Name:[^,]*(?=,)', text), re.findall(r'(?<=,)[^,]*Price:[^,]*(?=,)', text), re.findall(r'(?<=,)[^,]*Weight:[^,]*(?=,)', text)....

How do I save them to a list of lists or a dict? Thinking out loud, I could count the matches and start a new list after every fifth one, but the first Name occurrence is a corner case because it has no other fields.

Is there a neater way to do this?

Answer:

Would you please try the following:

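import re

# The sample data; newlines are replaced with spaces so that it is one big
# string, as described in the question.
text = '''
Name: Thompson shipping co.
17VXCS947
Name: Orange juice no pulp, Price: 7, Weight: 2, Aisle:9, Shelf_life: 30,
67 Name: Orange juice pulp, Price: 7, Weight:2, Aisle:9, Shelf_life:30,
Photo is available,
Photo is available,
56GHIO098
Name: Cranberry Juice, Price: 3, Weight: 1, Aisle:9, Shelf_life:45,
Name: Lemonade, Price:1, Weight:1, Aisle:9, Shelf_life:10,
'''.replace('\n', ' ')

# The second findall() cuts the string into chunks that start at "Name:";
# the first findall() pulls the "key: value" pairs out of each chunk.
print([re.findall(r'\b\w+:\s*[^,]+', x) for x in re.findall(r'\bName:\s*.+?(?=\s*\bName|$)', text)])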

Output:

[['Name: Thompson shipping co. 17VXCS947'], ['Name: Orange juice no pulp', 'Price: 7', 'Weight: 2', 'Aisle:9', 'Shelf_life: 30'], ['Name: Orange juice pulp', 'Price: 7', 'Weight:2', 'Aisle:9', 'Shelf_life:30'], ['Name: Cranberry Juice', 'Price: 3', 'Weight: 1', 'Aisle:9', 'Shelf_life:45'], ['Name: Lemonade', 'Price:1', 'Weight:1', 'Aisle:9', 'Shelf_life:10']]
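  • The second findall() function (evaluated first) creates a list of substrings, each starting with Name: and running up to the next Name or the end of the string.
  • The first findall() function then creates a list of name: value pairs from each of the substrings created above. Segments without a colon, such as "Photo is available", are skipped.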
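
Since the end goal is a spreadsheet, each group of pairs can be turned into a dict by splitting on the first colon. The following is only a minimal sketch: `groups`, `to_record` and `fields` are hypothetical names, `groups` is a shortened copy of the output above, and the standard csv module is used because Excel can open CSV files (writing .xlsx directly would need a third-party library).

```python
import csv
import io

# Hypothetical input: a shortened copy of the nested list printed above.
groups = [
    ['Name: Thompson shipping co. 17VXCS947'],
    ['Name: Orange juice no pulp', 'Price: 7', 'Weight: 2', 'Aisle:9', 'Shelf_life: 30'],
    ['Name: Lemonade', 'Price:1', 'Weight:1', 'Aisle:9', 'Shelf_life:10'],
]

def to_record(pairs):
    """Turn ['Key: value', ...] into {'Key': 'value', ...}."""
    record = {}
    for pair in pairs:
        key, _, value = pair.partition(':')  # split on the first colon only
        record[key.strip()] = value.strip()
    return record

records = [to_record(g) for g in groups]

# Column order is an assumption based on the field names in the question.
fields = ['Name', 'Price', 'Weight', 'Aisle', 'Shelf_life']

# An in-memory buffer keeps the sketch self-contained; to write a real file,
# use open('items.csv', 'w', newline='') instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fields, restval='')
writer.writeheader()
writer.writerows(records)  # missing fields (first row) are left empty
csv_text = buffer.getvalue()
```

Rows that only have a Name, such as the first one, end up with empty cells for the remaining columns, which sidesteps the corner case mentioned in the question.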

As seen, the string 17VXCS947 is appended to the first list element. If you want to remove it, additional logic is needed to exclude it.
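
One possible way to exclude it is sketched below. It rests on an assumption the question does not confirm: that the stray token is always a run of uppercase letters and digits at the very end of the value and contains at least one digit, like 17VXCS947. The names `TRAILING_CODE` and `strip_trailing_code` are hypothetical.

```python
import re

# Assumption: the unwanted token is a trailing run of uppercase letters and
# digits that contains at least one digit (e.g. 17VXCS947).
TRAILING_CODE = re.compile(r'\s+(?=[A-Z0-9]*\d)[A-Z0-9]+$')

def strip_trailing_code(value):
    """Remove a trailing code-like token, if present."""
    return TRAILING_CODE.sub('', value)

cleaned = strip_trailing_code('Name: Thompson shipping co. 17VXCS947')
untouched = strip_trailing_code('Name: Cranberry Juice')
```

A product name that itself ends in such a token would also be trimmed, so the pattern should be tightened if the real data contains names like that.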
