Webscraping Pokemon Data

Time:04-04

I am trying to find out the number of moves each Pokemon (first generation) could learn.

I found a website that contains this information: https://pokemondb.net/pokedex/game/red-blue-yellow

In the end, I would like to add a column to the earlier data frame that contains the number of moves each Pokemon can learn. For example, something that looks like this:

> head(pokemon_websites)
                      template_1      names template_2                                     full_website number_of_moves
1 https://pokemondb.net/pokedex/  Bulbasaur   /moves/1  https://pokemondb.net/pokedex/Bulbasaur/moves/1              24
2 https://pokemondb.net/pokedex/    Ivysaur   /moves/1    https://pokemondb.net/pokedex/Ivysaur/moves/1              ???
3 https://pokemondb.net/pokedex/   Venusaur   /moves/1   https://pokemondb.net/pokedex/Venusaur/moves/1              ???
4 https://pokemondb.net/pokedex/ Charmander   /moves/1 https://pokemondb.net/pokedex/Charmander/moves/1              ???
5 https://pokemondb.net/pokedex/ Charmeleon   /moves/1 https://pokemondb.net/pokedex/Charmeleon/moves/1              ???
6 https://pokemondb.net/pokedex/  Charizard   /moves/1  https://pokemondb.net/pokedex/Charizard/moves/1              ???
  • Is there a way to webscrape this data in R, count the number of moves for each of the 150 Pokemon, and then place this move count into a column?

Right now I am doing this by hand and it is taking a long time! Also, I have heard some websites do not allow for automated webscraping - if this website (https://pokemondb.net/pokedex/game/red-blue-yellow) does not allow webscraping, I can try to find another website that might allow it.

Thank you!

CodePudding user response:

You can scrape all the tables for each of the Pokemon using something like this:

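library(rvest)  # read_html(), html_nodes(), html_table(); also provides %>%

# One list element per URL; NULL if reading the page raises an error or a warning
tables = lapply(pokemon_websites$full_website, function(link) {
  tryCatch(
    read_html(link) %>% html_nodes("table") %>% html_table(),
    error = function(e) NULL, warning = function(w) NULL
  )
})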

However, note that the number of tables returned differs between Pokemon. For example, the first has 6 tables: the first three are for Red/Blue and the last three are for Yellow.

lengths(tables)

  [1] 6 6 6 6 6 6 6 6 6 2 4 7 2 4 8 6 6 6 4 4 6 6 6 6 6 8 6 6 0 4 8 4 8 6 8 4 6 6 8 4 4 6 6 8 6 6 5 5 5 5 4 4 6 6 6
 [56] 6 4 6 6 6 8 6 6 6 6 6 6 6 6 8 6 6 6 6 6 4 4 6 6 6 6 0 6 6 6 6 4 4 6 8 4 4 6 6 6 6 6 6 6 6 4 8 6 7 6 6 6 4 4 6
[111] 6 6 6 6 6 6 6 6 6 8 0 6 4 6 6 6 6 2 8 6 2 4 8 8 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
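
To go from this list of tables to one number per Pokemon, one option is to add up the row counts of all tables for each page. Below is a minimal sketch on made-up stand-in data (`toy_tables` and its contents are hypothetical, not scraped values). Note that summing over every table combines the Red/Blue and Yellow tabs, so a move listed in both would be counted twice.

```r
# Hypothetical stand-in for `tables`: one list element per Pokemon,
# each holding the data frames that html_table() would return.
toy_tables <- list(
  list(data.frame(Move = c("Tackle", "Growl")), data.frame(Move = "Cut")),
  NULL,  # what tryCatch() returns when a page could not be read
  list(data.frame(Move = c("Scratch", "Ember", "Leer")))
)

# Sum the number of rows over all tables of one Pokemon.
# vapply() over NULL or an empty list yields integer(0), whose sum is 0.
count_rows <- function(tbls) sum(vapply(tbls, nrow, integer(1)))

toy_counts <- vapply(toy_tables, count_rows, integer(1))
print(toy_counts)
```

With the real data the same call would be `vapply(tables, count_rows, integer(1))`, assuming every element of `tables` is either `NULL` or a list of data frames.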

CodePudding user response:

Since the OP wants to count only the moves in the Red/Blue tab, we can do the following. (If you need the moves from both tabs, follow @langtang's answer.)

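library(rvest)

# Read only the tables inside the active (Red/Blue) tab; NULL if the page fails to load
tables1 = lapply(pokemon_websites$full_website, function(x) {
  tryCatch(
    x %>% read_html() %>% html_nodes('.active') %>% html_nodes('.resp-scroll') %>% html_table(),
    error = function(e) NULL
  )
})

# Number of rows (moves) in each table of each Pokemon
moves = lapply(tables1, function(x) lapply(x, function(tbl) dim(tbl)[1]))

# Flatten the per-table counts and add them up per Pokemon
moves = lapply(moves, unlist, use.names = FALSE)
moves = lapply(moves, sum) %>% unlist()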
[1] 24 25 27 32 33 37 32 33 37  2  3 30  2  3 26 22 23 25 24 27 21 23 24 26 29 30 27 29  0 28 43 28 44 41 42 22 23 40 41 19 22 21 23 25 23 26 22 29 20 23 24
 [52] 27 31 34 31 34 23 25 25 36 37 25 34 35 29 31 32 23 24 26 28 31 30 31 33 22 26 37 46 23 26  0 23 26 25 28 22 24 27 29 20 20 32 25 33 36 25 27 24 27 24 29
[103] 32 35 26 26 37 19 21 27 42 44 24 36 22 24 25 27 33 34  0 21 34 35 30 23 27  2 34 34  1 19 32 32 28 30 22 28 22 30 24 43 25 25 22 30 32 36 45 60
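
To get the column the question asks for, the vector can be assigned to the data frame directly, because `moves` is in the same order as `pokemon_websites$full_website`. Here is a small sketch using a hypothetical three-row stand-in (`toy_websites` and `toy_moves` are illustration names; the three counts are the first three values of the output above):

```r
# Hypothetical three-row stand-in for `pokemon_websites`
toy_websites <- data.frame(
  names = c("Bulbasaur", "Ivysaur", "Venusaur"),
  stringsAsFactors = FALSE
)
toy_websites$full_website <- paste0(
  "https://pokemondb.net/pokedex/", toy_websites$names, "/moves/1"
)

# First three values of `moves` from the output above
toy_moves <- c(24, 25, 27)

# Guard against a length mismatch before adding the column
stopifnot(length(toy_moves) == nrow(toy_websites))
toy_websites$number_of_moves <- toy_moves

# Names whose count is 0 are worth checking by hand: a 0 may mean the page
# could not be read (tryCatch() returned NULL), not that there are no moves.
needs_check <- toy_websites$names[toy_websites$number_of_moves == 0]

print(toy_websites[, c("names", "number_of_moves")])
```

On the real data this would be `pokemon_websites$number_of_moves <- moves`. The few 0 entries in the output above are probably worth looking up manually.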