I am trying to return all phrases that match a pattern using Python's re module. Here's an example of the code:
import re
import requests
mlocations = requests.get("https://m.happyfresh.id/supplier/tip-top-hfc?tracking_source=backlink_storehome").text
data = re.findall(r'(?={"id":)(.*?)(?=","address1":)', mlocations)
Here's a snippet of mlocations:
~50%","seo_details":null,"store_categories_name":["Daily Basic Needs","Supermarket"]}},{"id":11649,"name":"HappyFresh Supermarket Depok","address1":"Jl. Gas Alam Raya No.90, RW.5, Curug, Kec. Cimanggis, Kota Depok, Jawa Barat","city":null,"zipcode":"16454","phone":"","lat":-6.38367246279211,"lon":106.876803265144,"slug":"happyfresh-supermarket-depok","photo":null,"state_name":"Depok","supplier":{"id":3468,"name":"HappyFresh Supermarket - ID","slug":"tip-top-hfc","supplier_type":"warehouse","instant_delivery":true,"delivery_time":"","delivery_price":"","brand_store_image":null,"photo":"https://cdn.happyfresh.com/spree/suppliers/photos/83817f71b5105cea577fdc3b7269004e97e27dcc-medium.jpg?1636468094","square_background":"#ffffff","square_photo":"https://cdn.happyfresh.com/spree/suppliers/square_photos/01c5f038b42d965441eb4ef55793ebb3e16d4213-medium.png?1636468095","store_photo":null,"background_square_photo":{"mini_url":"https://cdn.happyfresh.com/spree/suppliers/background_square_photos/ee855daf2c26df788660693c7c7c4f590fa6b24d-mini.png?1625202943","small_url":"https://cdn.happyfresh.com/spree/suppliers/background_square_photos/068a23faf0395d66a99d3322c52d4b27803ed9c3-small.png?1625202943","medium_url":"https://cdn.happyfresh.com/spree/suppliers/background_square_photos/8e33766f8f2957f2f7df1c6190c4edb700438e6f-medium.png?1625202943","large_url":"https://cdn.happyfresh.com/spree/suppliers/background_square_photos/0b790dd195f28fb2788cc4567e72b1c7e260505e-large.png?1625202943"},"display_promotion_label":"Diskon ~50%","seo_details":null,"store_categories_name":["Daily Basic Needs","Supermarket"]}},{"id":6501,"name":"HappyFresh Supermarket Cilandak","address1":"Pergudangan Perum Peruri, Gudang 7, Jl Lebak Bulus I, Cilandak, Jakarta Selatan ","city":"Jakarta","zipcode":"12430","phone":"","lat":-6.29639070332991,"lon":106.79448776706,"slug":"happyfresh-supermarket-cilandak","photo":null,"state_name":"Jakarta Selatan","supplier":{"id":3468,"name":"HappyFresh Supermarket - 
ID","slug":"tip-top-
It's supposed to return 2 items:
11649,"name":"HappyFresh Supermarket Depok"
6501,"name":"HappyFresh Supermarket Cilandak"
However, it returns every phrase that falls between an id and an address1. How do I return just the items that are between {"id": and "address1"?
CodePudding user response:
Basically, don't parse JSON with regex; use the json module:
import re
import json
import requests
mlocations = requests.get(
    "https://m.happyfresh.id/supplier/tip-top-hfc?tracking_source=backlink_storehome"
).text
data = re.search(r"window\.__PRELOADED_STATE__ = (.*})", mlocations).group(1)
data = json.loads(data)  # parse the embedded page state as JSON
# now you can access the data like a normal Python dict/list
for store in data["supplierReducer"]["supplierLanding"]["stores"]["data"]:
    print(store["id"], store["name"])
Prints:
10457 HappyFresh Supermarket Senayan
11649 HappyFresh Supermarket Depok
6501 HappyFresh Supermarket Cilandak
11184 HappyFresh Supermarket Bintaro
10010 HappyFresh Supermarket Sunter
10456 HappyFresh Supermarket Puri
11323 HappyFresh Supermarket Bekasi
10455 HappyFresh Supermarket BSD
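As for why the original pattern over-matches: each store object contains a nested "supplier" object that also starts with {"id":. After the first store is matched, the next {"id": the regex finds is that nested supplier, and the lazy .*? then runs on until the next ","address1":, swallowing everything in between. If a regex is still wanted, a tighter pattern can require the id, name and address1 keys to sit right next to each other. The following is only a sketch on a shortened, made-up sample with the same shape as the page data; the store names and addresses in it are placeholders, and the pattern assumes names contain no escaped quotes.

```python
import re

# Made-up sample that mimics the shape of the page data (not real output).
sample = (
    '{"id":10457,"name":"Store A","address1":"Addr A",'
    '"supplier":{"id":3468,"name":"Supplier","slug":"tip-top-hfc"}},'
    '{"id":11649,"name":"Store B","address1":"Addr B",'
    '"supplier":{"id":3468,"name":"Supplier","slug":"tip-top-hfc"}}'
)

# Pattern from the question: the second match starts at the nested supplier
# object and runs through to the next store's address1 key.
original = re.findall(r'(?={"id":)(.*?)(?=","address1":)', sample)

# Tighter pattern: digits for the id, a name without quotes inside it,
# and the address1 key immediately afterwards. Supplier objects are skipped
# because their name is followed by "slug", not "address1".
tight = re.findall(r'\{"id":(\d+,"name":"[^"]*"),"address1":', sample)

for item in tight:
    print(item)
```

This stays fragile if the key order on the page changes, which is why parsing with the json module remains the safer route.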