For an assignment I'm trying to combine 4 lists of scraped data into 1. All 4 of them are ordered correctly and are shown below.
["Een gezonde samenleving? Het belang van sporten wordt onderschat","Zo vader, zo dochter","Milieuvriendelijk vervoer met waterstof","\"Ik heb zin in wat nog komen gaat\"","Oorlog in Oekraïne"]
["Teamsport","Carsten en Kirsten","Kennisclip","Master Mind","Statement van het CvB"]
["16 maart 2022","10 maart 2022","09 maart 2022","08 maart 2022","07 maart 2022"]
["Directie","Bot","CB","Moniek","Christian"]
My desired output would look like this:
[["Een gezonde samenleving? Het belang van sporten wordt onderschat", "Teamsport", "16 maart 2022", "Directie"], [...], [...], [...], [...]]
I've tried some of the solutions I found on the internet, but I don't understand some of them, and most of them only handle 2 lists or give errors when I try to implement them.
For reference, my code looks like this:
{-# LANGUAGE OverloadedStrings #-}
import Text.HTML.Scalpel

urlString :: String
urlString = "https://www.example.com"
-- Main function in which we call the other functions
main :: IO ()
main = do
    resultTitle <- scrapeURL urlString scrapeHANTitle
    resultSubtitle <- scrapeURL urlString scrapeHANSubtitle
    resultDate <- scrapeURL urlString scrapeHANDate
    resultAuthor <- scrapeURL urlString scrapeHANAuthor
    print resultTitle
    print resultSubtitle
    print resultDate
    print resultAuthor
scrapeHANTitle :: Scraper String [String]
scrapeHANTitle =
    chroots ("div" @: [hasClass "card-news__body"]) scrapeTitle
scrapeHANSubtitle :: Scraper String [String]
scrapeHANSubtitle =
    chroots ("div" @: [hasClass "card-news__body"]) scrapeSubTitle
scrapeHANDate :: Scraper String [String]
scrapeHANDate =
    chroots ("div" @: [hasClass "card-article__meta__body"]) scrapeDate
scrapeHANAuthor :: Scraper String [String]
scrapeHANAuthor =
    chroots ("div" @: [hasClass "card-article__meta__body"]) scrapeAuthor
-- gets the title of news items
-- https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128&utf8=dec
-- some titles contain special characters so use this utf8 table to add conversion
scrapeTitle :: Scraper String String
scrapeTitle = do
    text $ "a" @: [hasClass "card-news__body__title"]
-- gets the subtitle of news items
scrapeSubTitle :: Scraper String String
scrapeSubTitle = do
    text $ "span" @: [hasClass "card-news__body__eyebrow"]
-- gets the date on which the news item was posted
scrapeDate :: Scraper String String
scrapeDate = do
    text $ "div" @: [hasClass "card-news__footer__body__date"]
-- gets the author of the news item
scrapeAuthor :: Scraper String String
scrapeAuthor = do
    text $ "div" @: [hasClass "card-news__footer__body__author"]
I also tried the following, but it gave me a bunch of type errors:
mergeLists :: Maybe [String] -> Maybe [String] ->Maybe [String] -> Maybe [String] -> Maybe [String]
mergeLists = \s1 -> \s2 -> \s3 -> \s4 ->s1 s2 s3 s4
CodePudding user response:
You can make use of the Monoid instance of Maybe [String] and its <> operator:
mergeLists :: Maybe [String] -> Maybe [String] ->Maybe [String] -> Maybe [String] -> Maybe [String]
mergeLists s1 s2 s3 s4 = s1 <> s2 <> s3 <> s4
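To see what this does in practice, here is a small sketch that needs only base. The sample values and the names sampleTitles, sampleAuthors, flatMerge and withMissing are made up for illustration. Note that <> on Maybe [String] appends the lists end to end and skips any Nothing, so the result is one flat list rather than one sublist per news item.

```haskell
-- Hypothetical sample data standing in for two scraper results.
sampleTitles :: Maybe [String]
sampleTitles = Just ["Zo vader, zo dochter", "Milieuvriendelijk vervoer met waterstof"]

sampleAuthors :: Maybe [String]
sampleAuthors = Just ["Bot", "CB"]

-- (<>) on Maybe [String] appends the inner lists end to end.
flatMerge :: Maybe [String]
flatMerge = sampleTitles <> sampleAuthors

-- A Nothing (for example a failed scrape) is skipped, not propagated.
withMissing :: Maybe [String]
withMissing = Nothing <> sampleAuthors
```

Loading this in GHCi and evaluating flatMerge shows all titles first, followed by all authors.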
However, since you are scraping the same page each time, you can combine the scrapers into a single one:
myScraper :: Scraper String [String]
myScraper = do
    da <- scrapeHANTitle
    db <- scrapeHANSubtitle
    dc <- scrapeHANDate
    dd <- scrapeHANAuthor
    -- concatenate the four lists into one flat list
    return (da ++ db ++ dc ++ dd)
and then run this with:
main :: IO ()
main = do
    result <- scrapeURL urlString myScraper
    print result
or shorter:
main :: IO()
main = scrapeURL urlString myScraper >>= print
CodePudding user response:
You can combine four lists using zip4 from Data.List.
import Data.List
list1 = ["Een gezonde samenleving? Het belang van sporten wordt onderschat","Zo vader, zo dochter","Milieuvriendelijk vervoer met waterstof","\"Ik heb zin in wat nog komen gaat\"","Oorlog in Oekraïne"]
list2 = ["Teamsport","Carsten en Kirsten","Kennisclip","Master Mind","Statement van het CvB"]
list3 = ["16 maart 2022","10 maart 2022","09 maart 2022","08 maart 2022","07 maart 2022"]
list4 = ["Directie","Bot","CB","Moniek","Christian"]
result = zip4 list1 list2 list3 list4
result2 = [[x1,x2,x3,x4] | (x1,x2,x3,x4) <- zip4 list1 list2 list3 list4]
The two results differ slightly: result is a list of tuples, while result2 is a list of lists, as requested. A list of tuples is probably the better choice, because:
- A list can hold any number of values, but they must all have the same type (Haskell lists are homogeneous)
- A tuple can hold values of different types, which gives more flexibility
- Tuples of different sizes are different types, so using 4-tuples lets the compiler reject a row that has only three values or as many as five
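Because scrapeURL returns its result wrapped in Maybe, the four lists in the question are really Maybe [String] values. Below is a minimal sketch of applying zip4 through that wrapper. The name toRows is hypothetical, and returning Nothing as soon as any single scrape fails is an assumption about the desired behaviour. Also keep in mind that zip4 stops at the shortest list, so lists of unequal length are truncated rather than reported as an error.

```haskell
import Data.List (zip4)

-- Hypothetical helper: pair up four scraper results item by item.
-- The result is Nothing if any of the four inputs is Nothing.
toRows
  :: Maybe [String]
  -> Maybe [String]
  -> Maybe [String]
  -> Maybe [String]
  -> Maybe [(String, String, String, String)]
toRows titles subtitles dates authors =
  -- Lift zip4 over Maybe with the Applicative operators.
  zip4 <$> titles <*> subtitles <*> dates <*> authors
```

In the question's main this could be called as print (toRows resultTitle resultSubtitle resultDate resultAuthor), assuming those four bindings are in scope.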