The code below tries to extract data from an API. When I paginate through the results and bind the rows together, the row names are duplicated and I get the following error:
```
Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed
In addition: Warning message: non-unique values when setting 'row.names':
```
The code is:
```r
library(tibble)

df = tibble()
# Page through the search results 24 properties at a time
for (i in seq(from = 0, to = 620, by = 24)) {
  linky = paste0("https://www.rightmove.co.uk/api/_search?locationIdentifier=REGION^94405&numberOfPropertiesPerPage=24&radius=0.0&sortType=2&index=", i, "&includeSSTC=false&viewType=LIST&channel=BUY&areaSizeUnit=sqft&currencyCode=GBP&isFetching=false")
  pge <- jsonlite::fromJSON(linky)
  props <- pge$properties
  print(linky)
  Sys.sleep(runif(1, 2.34, 6.19))  # random pause between requests
  df = rbind(df, tibble(props))    # the duplicate 'row.names' error is raised while binding
  print(paste("Page:", i))
}
HA_area_ <- df
```
**Answer:**
The pages do not come back with the same column layout, so the data frames cannot simply be bound together. Below are the column names of the first two data frames:
```
[[1]]
[1] "id" "bedrooms" "bathrooms" "numberOfImages"
[5] "numberOfFloorplans" "numberOfVirtualTours" "summary" "displayAddress"
[9] "countryCode" "location" "propertyImages" "propertySubType"
[13] "listingUpdate" "premiumListing" "featuredProperty" "price"
[17] "customer" "distance" "transactionType" "productLabel"
[21] "commercial" "development" "residential" "students"
[25] "auction" "feesApply" "feesApplyText" "displaySize"
[29] "showOnMap" "propertyUrl" "contactUrl" "staticMapUrl"
[33] "channel" "firstVisibleDate" "keywords" "keywordMatchType"
[37] "saved" "hidden" "onlineViewingsAvailable" "lozengeModel"
[41] "hasBrandPlus" "propertyTypeFullDescription" "addedOrReduced" "formattedDistance"
[45] "heading" "enhancedListing" "displayStatus" "formattedBranchName"
[49] "isRecent"

[[2]]
[1] "id" "bedrooms" "bathrooms" "numberOfImages"
[5] "numberOfFloorplans" "numberOfVirtualTours" "summary" "displayAddress"
[9] "countryCode" "location" "propertyImages" "propertySubType"
[13] "listingUpdate" "premiumListing" "featuredProperty" "price"
[17] "customer" "distance" "transactionType" "productLabel"
[21] "commercial" "development" "residential" "students"
[25] "auction" "feesApply" "feesApplyText" "displaySize"
[29] "showOnMap" "propertyUrl" "contactUrl" "staticMapUrl"
[33] "channel" "firstVisibleDate" "keywords" "keywordMatchType"
[37] "saved" "hidden" "onlineViewingsAvailable" "lozengeModel"
[41] "hasBrandPlus" "displayStatus" "formattedBranchName" "addedOrReduced"
[45] "isRecent" "formattedDistance" "propertyTypeFullDescription" "enhancedListing"
[49] "heading"
```
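Both pages have the same 49 names, but from column 42 onward they are in a different order (for example, column 42 is `propertyTypeFullDescription` on the first page but `displayStatus` on the second).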
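To check this on your own pages, a quick diagnostic is to compare the column names page by page and to list which columns `jsonlite::fromJSON()` returned as nested data frames (nested JSON objects in the response, possibly `location` or `price` here, end up as data-frame columns). The sketch below uses two small hypothetical JSON pages in place of real API calls; the field names `lat`, `lon` and the values are made up for illustration.

```r
library(jsonlite)

# Two hypothetical pages standing in for `pge$properties` from the real API.
# Each record has a nested "location" object; page 2 lists its keys in a different order.
page1 <- fromJSON('[{"id": 1, "location": {"lat": 51.5, "lon": -0.1}, "bedrooms": 2},
                    {"id": 2, "location": {"lat": 51.6, "lon": -0.2}, "bedrooms": 3}]')
page2 <- fromJSON('[{"bedrooms": 1, "id": 3, "location": {"lat": 51.7, "lon": -0.3}},
                    {"bedrooms": 4, "id": 4, "location": {"lat": 51.8, "lon": -0.4}}]')
pages <- list(page1, page2)
ref <- names(pages[[1]])

# Does every page have the same set of column names?
same_set <- all(vapply(pages, function(p) setequal(names(p), ref), logical(1)))
# Are they also in the same order?
same_order <- all(vapply(pages, function(p) identical(names(p), ref), logical(1)))

# Which columns did fromJSON() turn into nested data frames?
nested <- names(page1)[vapply(page1, is.data.frame, logical(1))]

print(c(same_set = same_set, same_order = same_order))
print(nested)
```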
Instead of `rbind` we can use `lapply` and store the results in a list. We shall create a function `f1` to get the required data frame and then use `possibly` (from the purrr package) to skip any errors.
```r
library(purrr)

# Fetch one page of results, starting at offset x
f1 = function(x){
  linky = paste0("https://www.rightmove.co.uk/api/_search?locationIdentifier=REGION^94405&numberOfPropertiesPerPage=24&radius=0.0&sortType=2&index=", x, "&includeSSTC=false&viewType=LIST&channel=BUY&areaSizeUnit=sqft&currencyCode=GBP&isFetching=false")
  pge <- jsonlite::fromJSON(linky)
  props <- pge$properties
  print(linky)
  Sys.sleep(runif(1, 2.34, 6.19))  # random pause between requests
  print(paste("Page:", x))
  return(props)
}

x = seq(from = 0, to = 620, by = 24)
# One list element per page; a page that errors becomes NA instead of stopping the loop
df = lapply(x, possibly(f1, NA))
```
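Note that `df` here is a list with one element per page, not yet a single data frame. One way to combine it is jsonlite's `rbind_pages()`, which its documentation describes as a generalisation of `rbind()` for combining pages of JSON data: it matches columns by name, fills missing columns with `NA`, and supports nested data-frame columns. The `NA` entries left by `possibly()` need to be dropped first. The sketch below runs on two small hypothetical pages instead of live API calls; the data is made up.

```r
library(jsonlite)

# Hypothetical stand-ins for the list returned by lapply(x, possibly(f1, NA)):
# two pages with a nested "location" object (keys in a different order on
# page 2) and an NA left behind by a failed request.
page1 <- fromJSON('[{"id": 1, "location": {"lat": 51.5, "lon": -0.1}, "bedrooms": 2},
                    {"id": 2, "location": {"lat": 51.6, "lon": -0.2}, "bedrooms": 3}]')
page2 <- fromJSON('[{"bedrooms": 1, "id": 3, "location": {"lat": 51.7, "lon": -0.3}},
                    {"bedrooms": 4, "id": 4, "location": {"lat": 51.8, "lon": -0.4}}]')
df <- list(page1, NA, page2)

# Keep only the pages that actually came back as data frames
pages <- Filter(is.data.frame, df)

# Bind the pages: columns are matched by name and nested data frames are bound too
HA_area_ <- rbind_pages(pages)
str(HA_area_)
```

With the real API you would pass the list produced by `lapply(x, possibly(f1, NA))` instead of the mock `df`.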
**Answer:**
```r
library(data.table)

# Download every page into a list of data.tables ...
dt <- lapply(seq(from = 0, to = 620, by = 24), function(i) {
  uri <- paste0("https://www.rightmove.co.uk/api/_search?locationIdentifier=REGION^94405&numberOfPropertiesPerPage=24&radius=0.0&sortType=2&index=", i, "&includeSSTC=false&viewType=LIST&channel=BUY&areaSizeUnit=sqft&currencyCode=GBP&isFetching=false")
  as.data.table(jsonlite::fromJSON(uri)$properties)
})
# ... then bind them, matching columns by name and filling missing ones with NA
dt <- rbindlist(dt, fill = TRUE)
```
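If you would rather end up with only plain (non-nested) columns, one variation is to flatten each page with `jsonlite::flatten()` before binding, so that a nested object becomes ordinary columns such as `location.lat`, and then bind with `rbindlist(fill = TRUE)`, which matches columns by name and fills any that a page lacks with `NA`. This is a sketch on hypothetical mock pages (made-up fields and values), not a tested run against the Rightmove API.

```r
library(jsonlite)
library(data.table)

# Hypothetical stand-ins for fromJSON(uri)$properties: a nested "location"
# object, a different key order on page 2, and a "bedrooms" column missing from page 1.
page1 <- fromJSON('[{"id": 1, "location": {"lat": 51.5, "lon": -0.1}},
                    {"id": 2, "location": {"lat": 51.6, "lon": -0.2}}]')
page2 <- fromJSON('[{"bedrooms": 1, "id": 3, "location": {"lat": 51.7, "lon": -0.3}},
                    {"bedrooms": 4, "id": 4, "location": {"lat": 51.8, "lon": -0.4}}]')

# jsonlite::flatten() turns nested data-frame columns into plain ones
# (location.lat, location.lon); called with :: to avoid clashes with purrr::flatten
pages <- lapply(list(page1, page2), jsonlite::flatten)

# fill = TRUE implies use.names = TRUE: columns are matched by name, gaps become NA
dt <- rbindlist(pages, fill = TRUE)
print(dt)
```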