Combining multiple lists with strings in haskell

Time:03-24

For an assignment I'm trying to combine four lists of scraped data into one. All four lists are in the correct order and are shown below.

["Een gezonde samenleving? Het belang van sporten wordt onderschat","Zo vader, zo dochter","Milieuvriendelijk vervoer met waterstof","\"Ik heb zin in wat nog komen gaat\"","Oorlog in Oekraïne"]
["Teamsport","Carsten en Kirsten","Kennisclip","Master Mind","Statement van het CvB"]
["16 maart 2022","10 maart 2022","09 maart 2022","08 maart 2022","07 maart 2022"]
["Directie","Bot","CB","Moniek","Christian"]

My desired output would look like this:

[["Een gezonde samenleving? Het belang van sporten wordt onderschat", "Teamsport", "16 maart 2022", "Directie"], [...], [...], [...], [...]]

I've tried some of the solutions I found on the internet, but I don't understand some of them, and most of them either only handle two lists or give errors when I try to implement them.

For more reference, my code looks like this:

urlString :: String
urlString = "https://www.example.com"

--Main function in which we call the other functions
main :: IO()
main = do
    resultTitle <- scrapeURL urlString scrapeHANTitle
    resultSubtitle <- scrapeURL urlString scrapeHANSubtitle
    resultDate <- scrapeURL urlString scrapeHANDate
    resultAuthor <- scrapeURL urlString scrapeHANAuthor
    print resultTitle
    print resultSubtitle
    print resultDate
    print resultAuthor

scrapeHANTitle :: Scraper String [String]
scrapeHANTitle =
    chroots ("div" @: [hasClass "card-news__body"]) scrapeTitle

scrapeHANSubtitle :: Scraper String [String]
scrapeHANSubtitle =
    chroots ("div" @: [hasClass "card-news__body"]) scrapeSubTitle

scrapeHANDate :: Scraper String [String]
scrapeHANDate = 
    chroots ("div" @: [hasClass "card-article__meta__body"]) scrapeDate

scrapeHANAuthor :: Scraper String [String]
scrapeHANAuthor =
    chroots ("div" @: [hasClass "card-article__meta__body"]) scrapeAuthor

-- gets the title of news items
-- https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128&utf8=dec
-- some titles contain special characters so use this utf8 table to add conversion
scrapeTitle :: Scraper String String
scrapeTitle = do
    text $ "a" @: [hasClass "card-news__body__title"]

-- gets the subtitle of news items
scrapeSubTitle :: Scraper String String
scrapeSubTitle = do
    text $ "span" @: [hasClass "card-news__body__eyebrow"]

--gets the date on which the news item was posted
scrapeDate :: Scraper String String 
scrapeDate = do
    text $ "div" @: [hasClass "card-news__footer__body__date"]

--gets the author of the news item
scrapeAuthor :: Scraper String String 
scrapeAuthor = do
    text $ "div" @: [hasClass "card-news__footer__body__author"]

I also tried the following, but it gave me a bunch of type errors.

mergeLists :: Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [String]
mergeLists = \s1 -> \s2 -> \s3 -> \s4 -> s1 ++ s2 ++ s3 ++ s4

CodePudding user response:

The ++ operator only works on lists, not on Maybe [String]. You can instead use the Semigroup instance of Maybe through the <> operator:

mergeLists :: Maybe [String] -> Maybe [String] ->Maybe [String] -> Maybe [String] -> Maybe [String]
mergeLists s1 s2 s3 s4 = s1 <> s2 <> s3 <> s4
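
Note that <> joins the wrapped lists into one flat list and treats Nothing as empty; it does not group the values per news item. The sketch below contrasts that with a row-wise merge. The names flatMerge and mergeRows are hypothetical (they do not appear in the question), and mergeRows assumes you want Nothing as soon as any one of the four scrapes failed:

```haskell
import Data.List (zipWith4)

-- (<>) on Maybe [String] concatenates the wrapped lists and skips
-- over Nothing, so the result is a single flat list.
flatMerge :: Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [String]
flatMerge s1 s2 s3 s4 = s1 <> s2 <> s3 <> s4

-- Hypothetical helper: builds one [title, subtitle, date, author] row
-- per item. The Applicative instance of Maybe makes the whole result
-- Nothing if any input is Nothing.
mergeRows :: Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [[String]]
mergeRows s1 s2 s3 s4 =
    zipWith4 (\a b c d -> [a, b, c, d]) <$> s1 <*> s2 <*> s3 <*> s4
```

With the four results from main in the question, mergeRows resultTitle resultSubtitle resultDate resultAuthor should give a Maybe [[String]] in the shape of the desired output.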

However, since all four scrapers run on the same page, you can combine them into a single scraper:

myScraper :: Scraper String [String]
myScraper = do
    da <- scrapeHANTitle
    db <- scrapeHANSubtitle
    dc <- scrapeHANDate
    dd <- scrapeHANAuthor
    return (da ++ db ++ dc ++ dd)

and then run this with:

main :: IO()
main = do
    result <- scrapeURL urlString myScraper
    print result

or shorter:

main :: IO()
main = scrapeURL urlString myScraper >>= print

CodePudding user response:

You can combine four lists using zip4 from Data.List.

import Data.List

list1 = ["Een gezonde samenleving? Het belang van sporten wordt onderschat","Zo vader, zo dochter","Milieuvriendelijk vervoer met waterstof","\"Ik heb zin in wat nog komen gaat\"","Oorlog in Oekraïne"]
list2 = ["Teamsport","Carsten en Kirsten","Kennisclip","Master Mind","Statement van het CvB"]
list3 = ["16 maart 2022","10 maart 2022","09 maart 2022","08 maart 2022","07 maart 2022"]
list4 = ["Directie","Bot","CB","Moniek","Christian"]

result = zip4 list1 list2 list3 list4

result2 = [[x1,x2,x3,x4] | (x1,x2,x3,x4) <- zip4 list1 list2 list3 list4]
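
As an aside, the list-of-lists form can also be written without the intermediate tuples. This is a minimal sketch using shortened copies of the lists above; the names titles, subtitles, dates, authors, rowsZip and rowsTranspose are made up for illustration:

```haskell
import Data.List (transpose, zipWith4)

-- Shortened stand-ins for the four scraped lists.
titles, subtitles, dates, authors :: [String]
titles    = ["Zo vader, zo dochter", "Milieuvriendelijk vervoer met waterstof"]
subtitles = ["Carsten en Kirsten", "Kennisclip"]
dates     = ["10 maart 2022", "09 maart 2022"]
authors   = ["Bot", "CB"]

-- zipWith4 applies a function to the four elements at each position.
rowsZip :: [[String]]
rowsZip = zipWith4 (\t s d a -> [t, s, d, a]) titles subtitles dates authors

-- transpose turns a list of columns into a list of rows and works
-- for any number of input lists.
rowsTranspose :: [[String]]
rowsTranspose = transpose [titles, subtitles, dates, authors]
```

Both give the same rows when the lists have equal length. They differ when lengths are unequal: zipWith4 stops at the shortest list, whereas transpose keeps going and produces shorter rows.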

The two results differ slightly: result is a list of tuples, while result2 is a list of lists, as requested. A list of tuples is probably the better choice, because:

  • A list can hold any number of values, but they must all be of the same type (Haskell lists are homogeneous).
  • A tuple can hold values of different types, which gives more flexibility.
  • Tuples of different sizes are different types, so a list of 4-tuples cannot accidentally contain a group of three or five values.
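
Taking the fixed-size argument one step further, a record gives each of the four fields a name. This is only a sketch: the NewsItem type, its field names and toNewsItems are assumptions for illustration, not part of the question's code:

```haskell
import Data.List (zipWith4)

-- Hypothetical record type: one value per scraped news item.
data NewsItem = NewsItem
    { title    :: String
    , subtitle :: String
    , date     :: String
    , author   :: String
    } deriving (Show, Eq)

-- The NewsItem constructor is itself a four-argument function, so it
-- can be passed straight to zipWith4. Extra elements in longer lists
-- are dropped.
toNewsItems :: [String] -> [String] -> [String] -> [String] -> [NewsItem]
toNewsItems = zipWith4 NewsItem
```

Like the tuple version, this cannot hold a group of three or five values, and fields are accessed by name (for example, map title items) instead of by position.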