library(curl)
library(XML)
library(dplyr)

theurl <- "https://cryptoslam.io/#sales-rankings-24h"
url <- curl(theurl, "rb")
urldata <- readLines(url, warn = FALSE)
# parse every <table> on the page into a list of data frames
data <- readHTMLTable(urldata, stringsAsFactors = FALSE)
close(url)
data.2 <- data.frame(Reduce(rbind, data[1]))
# keep the top 10 rows and convert "$1,234" style strings to numbers
data.3 <- data.2 %>% dplyr::select(Collection, Sales, Change..24h.) %>%
  head(10) %>% mutate(Sales.numeric = as.numeric(gsub('[$,]', '', Sales)))
The strings in the "Collection" column are duplicated:
> data.3$Collection
[1] "Bored Ape Yacht ClubBored Ape YC"
[2] "Mutant Ape Yacht ClubMutant Ape Yacht Club"
[3] "CryptoPunksCryptoPunks"
[4] "CloneXCloneX"
[5] "MeebitsMeebits"
[6] "Bored Ape Kennel ClubBored Ape Kennel Club"
[7] "CrypToadzCrypToadz"
[8] "AzukiAzuki"
[9] "World Of WomenWorld Of Women"
[10] "CrabadaCrabada"
Is there any way to remove such duplicates?
CodePudding user response:
One way to solve this is to scrape the collection names directly from the website:
library(rvest)
library(dplyr)
name = theurl %>% read_html() %>% html_nodes('.summary-sales-table__column-product-abbreviation') %>% html_text()
# data.2 has only 250 rows, so keep just the first 250 names
name = name[1:250]
data.2$Collection = name
Rebuilding data.3 from the updated data.2 then gives:
              Collection       Sales Change..24h. Sales.numeric
1           Bored Ape YC $15,609,329        0.72%      15609329
2  Mutant Ape Yacht Club $13,337,117      438.65%      13337117
3            CryptoPunks  $6,188,758        9.88%       6188758
4                 CloneX  $5,977,297      397.96%       5977297
5                Meebits  $5,139,169       35.32%       5139169
6  Bored Ape Kennel Club  $3,052,526      392.60%       3052526
7                  Azuki  $2,736,697       63.56%       2736697
8         World Of Women  $2,671,328       36.28%       2671328
9                Crabada  $2,665,660       19.88%       2665660
10           RTFKT MNLTH  $2,182,638       48.33%       2182638
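If a second request to the site is not desirable, a string-only cleanup might be enough for most rows. The sketch below is a different technique from the scraping approach above: it uses a regex backreference to collapse a value that is the same text written twice. The helper name dedupe_exact and the sample vector are made up for illustration, and the approach assumes the two halves are character-for-character identical.

```r
# Hypothetical helper (not part of the original answer): collapse a string
# that is one chunk followed by an identical copy of itself.
dedupe_exact <- function(x) {
  # ^(.+)\1$ matches only when the whole string is some text plus the same
  # text again; anything else is returned unchanged
  sub("^(.+)\\1$", "\\1", x, perl = TRUE)
}

# Sample values copied from the output in the question
collections <- c("CryptoPunksCryptoPunks",
                 "Mutant Ape Yacht ClubMutant Ape Yacht Club",
                 "Bored Ape Yacht ClubBored Ape YC")

cleaned <- dedupe_exact(collections)
print(cleaned)
```

Entries such as "Bored Ape Yacht ClubBored Ape YC", where the second half is an abbreviation rather than an exact copy, are left untouched by this pattern, so taking the names from the page as shown above remains the more reliable route for those rows.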