Why is this Swift web scraper not working?

Time:05-18

I am having trouble scraping an image link from HTML with code I found on YouTube (https://www.youtube.com/watch?v=0jTyKu9DGm8&list=PLYjXqILgs9uPwYlmSrIkNj2O3dwPCcoBK&index=2). The code works perfectly fine in a Playground, but something is wrong with how I integrated it into my Xcode project. (More like: I'm not sure how to implement it in my project :) )

When I ran this code in a Playground, it printed exactly the link I needed.

import Foundation

let url = URL(string: "https://guide.michelin.com/th/en/bangkok-region/bangkok/restaurant/somtum-khun-kan")

let task = URLSession.shared.dataTask(with: url!) { (data, resp, error) in
    guard let data = data else {
        print("data was nil")
        return
    }
    guard let htmlString = String(data: data, encoding: String.Encoding.utf8) else {
        print("can not cast data into string")
        return
    }

    let leftSideOfTheString = """
    image":"
    """

    let rightSideOfTheString = """
    ","@type
    """

    guard let leftRange = htmlString.range(of: leftSideOfTheString) else {
        print("can not find left range of string")
        return
    }

    guard let rightRange = htmlString.range(of: rightSideOfTheString) else {
        print("can not find right range of string")
        return
    }

    let rangeOfValue = leftRange.upperBound..<rightRange.lowerBound

    print(htmlString[rangeOfValue])
}
task.resume()

I then put the exact same code into a struct, with the URL as a property and the request in a method, like so:

struct ImageLink {

    let url = URL(string: "https://guide.michelin.com/th/en/bangkok-region/bangkok/restaurant/somtum-khun-kan")

    func getImageLink() {
    
        let task = URLSession.shared.dataTask(with: url!) { (data, resp, error) in
            guard let data = data else {
                print("data was nil")
                return
            }
            guard let htmlString = String(data: data, encoding: String.Encoding.utf8) else {
                print("can not cast data into string")
                return
            }
        
            let leftSideOfTheString = """
                image":"
                """
        
            let rightSideOfTheString = """
                ","@type
                """
        
            guard let leftRange = htmlString.range(of: leftSideOfTheString) else {
                print("can not find left range of string")
                return
            }
        
            guard let rightRange = htmlString.range(of: rightSideOfTheString) else {
                print("can not find right range of string")
                return
            }
        
            let rangeOfValue = leftRange.upperBound..<rightRange.lowerBound
        
            print(htmlString[rangeOfValue])
        }
        task.resume()
    }
}

Finally, to check whether the code would give me the right link, I created an instance in a View and added a button that prints the result of getImageLink(), as shown below. You'll see in the commented-out code that I tried displaying the image both by hard-coding its link and by inserting the function call. The former worked as expected; the latter did not.

import SwiftUI

struct WebPictures: View {

    var imageLink = ImageLink()

    var body: some View {
        VStack {
            //AsyncImage(url: URL(string: "\(imageLink.getImageLink())"))
            //AsyncImage(url: URL(string: "https://axwwgrkdco.cloudimg.io/v7/__gmpics__/c8735576e7d24c09b45a4f5d56f739ba?width=1000"))
            Button {
                print(imageLink.getImageLink())
            } label: {
                Text("Print Html")
            }
        }
    }
}

When I click the button to print the link I get the following message:

()
2022-05-16 17:21:30.030264+0800 MichelinRestaurants[35477:925525] [boringssl] boringssl_metrics_log_metric_block_invoke(153) Failed to log metrics
https://axwwgrkdco.cloudimg.io/v7/__gmpics__/c8735576e7d24c09b45a4f5d56f739ba?width=1000

And if I click the button for a second time only this gets printed:

()
https://axwwgrkdco.cloudimg.io/v7/__gmpics__/c8735576e7d24c09b45a4f5d56f739ba?width=1000

If anybody knows how to help me out here that would be much appreciated!!

Answer:

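This fails because getImageLink() does not wait for the download. dataTask runs asynchronously, so the function returns immediately with no value (hence the printed "()"), and the link only becomes available later, inside the completion handler. One possible solution is to publish the link from an observable object: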

import SwiftUI

//Make this a class instead of a struct and conform to ObservableObject
class ImageLink: ObservableObject {

    let url = URL(string: "https://guide.michelin.com/th/en/bangkok-region/bangkok/restaurant/somtum-khun-kan")
    //Create a published var for your view to get notified when the value changes
    @Published var imageUrlString: String = ""
    func getImageLink() {
    
        let task = URLSession.shared.dataTask(with: url!) { (data, resp, error) in
            guard let data = data else {
                print("data was nil")
                return
            }
            guard let htmlString = String(data: data, encoding: String.Encoding.utf8) else {
                print("can not cast data into string")
                return
            }
        
            let leftSideOfTheString = """
                image":"
                """
        
            let rightSideOfTheString = """
                ","@type
                """
        
            guard let leftRange = htmlString.range(of: leftSideOfTheString) else {
                print("can not find left range of string")
                return
            }
        
            guard let rightRange = htmlString.range(of: rightSideOfTheString) else {
                print("can not find right range of string")
                return
            }
        
            let rangeOfValue = leftRange.upperBound..<rightRange.lowerBound
        
            print(htmlString[rangeOfValue])
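            //Assign the scraped link to the var: convert the Substring to a String
            //and update the @Published property on the main thread
            DispatchQueue.main.async {
                self.imageUrlString = String(htmlString[rangeOfValue])
            }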
        }
        task.resume()
    }
}
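A side note on the extraction step: the code above searches the whole page for the right-hand marker, so if that marker also occurs before the left-hand one, building the range would trap at runtime. Below is a minimal sketch of a more defensive version. The helper name `extractText(from:between:and:)` and the sample string are made up for illustration and are not part of the original code.

```swift
import Foundation

// Hypothetical helper: returns the text between the first `start` marker and
// the first `end` marker that follows it, or nil if either marker is missing.
func extractText(from text: String, between start: String, and end: String) -> String? {
    guard let startRange = text.range(of: start) else { return nil }
    // Search for the end marker only after the start marker, so an earlier
    // occurrence of `end` cannot produce an invalid range.
    let remainder = startRange.upperBound..<text.endIndex
    guard let endRange = text.range(of: end, range: remainder) else { return nil }
    return String(text[startRange.upperBound..<endRange.lowerBound])
}

// Made-up sample input in which the end marker also appears before the start marker.
let sample = #"{"name":"x","@type":"Thing","image":"https://example.com/a.jpg","@type":"Restaurant"}"#
let extracted = extractText(from: sample, between: #"image":""#, and: #"","@type"#)
print(extracted ?? "not found") // prints "https://example.com/a.jpg"
```

Inside getImageLink(), the two range guards could then be replaced by a single call to such a helper, assuming the page keeps the same markers.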

And the view:

struct WebPictures: View {
    //Observe changes from your imagelink class
    @StateObject var imageLink = ImageLink()

    var body: some View {
        VStack {
            AsyncImage(url: URL(string: imageLink.imageUrlString)) // assign imageurl to asyncimage
            //AsyncImage(url: URL(string: "https://axwwgrkdco.cloudimg.io/v7/__gmpics__/c8735576e7d24c09b45a4f5d56f739ba?width=1000"))
            Button {
                imageLink.getImageLink()
            } label: {
                Text("Print Html")
            }
        }
    }
}
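To see why the original button printed "()" before the link, here is a small sketch with no networking at all. `fetchLink` is a hypothetical stand-in for getImageLink(): it returns immediately and delivers a placeholder URL after a short delay, much as dataTask delivers its data. `EventLog` is also made up and only records the order of events. This illustrates the timing; it is not meant to replace the code above.

```swift
import Foundation

// Hypothetical stand-in for getImageLink(): returns immediately (Void) and
// delivers a placeholder link later, the way a dataTask completion handler does.
func fetchLink(completion: @escaping @Sendable (String) -> Void) {
    DispatchQueue.global().asyncAfter(deadline: .now() + 0.1) {
        completion("https://example.com/image.jpg") // placeholder, not the real link
    }
}

// Small lock-protected log so both threads can record events safely.
final class EventLog: @unchecked Sendable {
    private let lock = NSLock()
    private var entries: [String] = []

    func add(_ entry: String) {
        lock.lock()
        entries.append(entry)
        lock.unlock()
    }

    var all: [String] {
        lock.lock()
        defer { lock.unlock() }
        return entries
    }
}

let eventLog = EventLog()
let finished = DispatchSemaphore(value: 0)

// The call itself yields only Void; the link arrives later in the closure.
let result: Void = fetchLink { link in
    eventLog.add("completion: \(link)")
    finished.signal()
}
eventLog.add("call returned: \(result)") // Void interpolates as "()"

// Wait only so this script does not exit early; do not block the main thread in an app.
finished.wait()
print(eventLog.all)
// prints ["call returned: ()", "completion: https://example.com/image.jpg"]
```

This is the same order seen in the question's console output: "()" first, then the link. The @Published property works around it by letting the view react whenever the value finally arrives.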

Update:

To fetch the link as soon as the view appears, instead of on a button tap, call it from onAppear:

        VStack {
            AsyncImage(url: URL(string: imageLink.imageUrlString))
        }
        .onAppear {
            //Only fetch if the link has not been loaded yet
            if imageLink.imageUrlString.isEmpty {
                imageLink.getImageLink()
            }
        }