Need help scraping table data from this website

Time:10-10

I want to scrape table data from this website using R.

My code:

library(XML)
url <- "https://www.westmetall.com/en/markdaten.php?action=show_table&field=LME_Cu_cash"
doc <- htmlParse(url)
tableNodes = getNodeSet(doc,"//table")
tb = readHTMLTable(tableNodes[[1]])

But I get an error (the original post showed it in a screenshot).

CodePudding user response:
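
A likely reason the original code fails (an assumption, since the error screenshot is not available here): XML::htmlParse() reads URLs through libxml2, which does not support HTTPS, so the page is never downloaded. If you want to stay with {XML}, one option is to fetch the page text with an HTTPS-capable client first and then parse it with asText = TRUE. A minimal sketch follows; html_text is a hand-written stand-in for the downloaded page (two values copied from the output further down) so that it runs offline.

```r
library(XML)

# Stand-in for the page source (hypothetical, hand-written).
# With network access you could instead fetch the real page, for example:
#   html_text <- httr::content(httr::GET(url), as = "text")
html_text <- '<html><body><table>
<tr><th>Metal</th><th>Settlement</th></tr>
<tr><td>Copper</td><td>7,575.50</td></tr>
<tr><td>Tin</td><td>20,000.00</td></tr>
</table></body></html>'

# asText = TRUE tells htmlParse() that the argument is HTML content,
# not a file name or URL, so libxml2 never has to open a connection.
doc <- htmlParse(html_text, asText = TRUE)

# Same steps as in the question: locate the tables, convert the first one.
table_nodes <- getNodeSet(doc, "//table")
tb <- readHTMLTable(table_nodes[[1]], header = TRUE, stringsAsFactors = FALSE)
```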

You can do it using the {rvest} package:

library(rvest)

url <- "https://www.westmetall.com/en/markdaten.php?action=show_table&field=LME_Cu_cash"

tables <- read_html(url) |>
  html_table()

The tables list contains all the tables found on the page. You can inspect it:

str(tables)
#> List of 5
#>  $ : tibble [7 × 4] (S3: tbl_df/tbl/data.frame)
#>   ..$ Official LME-Prices in US Dollar: chr [1:7] "in US Dollar per ton" "Copper" "Tin" "Lead" ...
#>   ..$ 07. October 2022                : chr [1:7] "Settlement Kasse" "7,575.50" "20,000.00" "2,078.00" ...
#>   ..$                                 : chr [1:7] "3 months" "7,554.00" "19,950.00" "2,050.00" ...
#>   ..$                                 : chr [1:7] "Chart\nTable\nAverage" "" "" "" ...
#>  $ : tibble [7 × 4] (S3: tbl_df/tbl/data.frame)
#>   ..$ LME stocks      : chr [1:7] "in tons" "Copper" "Tin" "Lead" ...
#>   ..$ 07. October 2022: chr [1:7] "" "143,775" "4,690" "31,875" ...
#>   ..$ Changes         : chr [1:7] "" "3,575" "15" "0" ...
#>   ..$                 : chr [1:7] "Chart\nTable\nAverage" "" "" "" ...
#>  $ : tibble [3 × 4] (S3: tbl_df/tbl/data.frame)
#>   ..$ Exchange Rates  : chr [1:3] "EUR/USD LME-FX-rate (MTLE)" "ECB-Fixing (14:15 Uhr)" "EUR/USD-Basis DEL-Notiz"
#>   ..$ 07. October 2022: num [1:3] 0.979 0.98 0.979
#>   ..$ 06. October 2022: num [1:3] 0.987 0.986 0.987
#>   ..$                 : logi [1:3] NA NA NA
#>  $ : tibble [15 × 4] (S3: tbl_df/tbl/data.frame)
#>   ..$ German Metal Prices: chr [1:15] "in Euro per 100 kg" "lower Copper WM-Notiz" "higher Copper WM-Notiz" "lower DEL-Notiz (until February 11, 2022)" ...
#>   ..$ 07. October 2022   : chr [1:15] "" "786.54" "789.89" "-" ...
#>   ..$ 06. October 2022   : chr [1:15] "" "797.52" "800.84" "-" ...
#>   ..$                    : chr [1:15] "Chart\nTable\nAverage" "" "" "" ...
#>  $ : tibble [5 × 4] (S3: tbl_df/tbl/data.frame)
#>   ..$ Precious metals : chr [1:5] "Gold London Fixing in USD/oz." "Gold in Euro/kg" "Gold, processed in Euro/kg" "Fine Silver in Euro/kg" ...
#>   ..$ 07. October 2022: chr [1:5] "1,711.50" "55,190.00" "62,080.00" "666.90 / 733.80" ...
#>   ..$ 06. October 2022: chr [1:5] "1,716.00" "54,840.00" "61,670.00" "658.90 / 725.10" ...
#>   ..$                 : logi [1:5] NA NA NA NA NA

Then you just have to pick the table you want and format it as you wish:

tables[[2]]
#> # A tibble: 7 × 4
#>   `LME stocks` `07. October 2022` Changes  ``                     
#>   <chr>        <chr>              <chr>    <chr>                  
#> 1 in tons      ""                 ""       "Chart\nTable\nAverage"
#> 2 Copper       "143,775"          "3,575"  ""                     
#> 3 Tin          "4,690"            "15"     ""                     
#> 4 Lead         "31,875"           "0"      ""                     
#> 5 Zinc         "53,475"           "150"    ""                     
#> 6 Aluminium    "327,625"          "-1,225" ""                     
#> 7 Nickel       "52,362"           "942"    ""

Created on 2022-10-09 with reprex v2.0.2
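
As a sketch of the "format it as you wish" step (not part of the reprex above): the scraped table still has a units row, an empty link column, and numbers stored as text with thousands separators. The snippet below rebuilds the content of tables[[2]] by hand as stocks_raw so it runs without network access; the column names metal, stock_tons and change_tons and the helper parse_number() are my own choices, not something the site or {rvest} provides.

```r
# Hand-built stand-in for tables[[2]] (values copied from the output above).
# The real fourth column has an empty name; it is called "links" here only
# so that data.frame() accepts it.
stocks_raw <- data.frame(
  `LME stocks` = c("in tons", "Copper", "Tin", "Lead", "Zinc", "Aluminium", "Nickel"),
  `07. October 2022` = c("", "143,775", "4,690", "31,875", "53,475", "327,625", "52,362"),
  Changes = c("", "3,575", "15", "0", "150", "-1,225", "942"),
  links = c("Chart\nTable\nAverage", "", "", "", "", "", ""),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip thousands separators, then convert to numeric.
parse_number <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

# Drop the units row (row 1) and keep only the first three columns,
# selecting by position so the empty column name does not matter.
stocks <- stocks_raw[-1, 1:3]
names(stocks) <- c("metal", "stock_tons", "change_tons")

stocks$stock_tons <- parse_number(stocks$stock_tons)
stocks$change_tons <- parse_number(stocks$change_tons)
rownames(stocks) <- NULL
```

With the real data you would start from stocks_raw <- tables[[2]] instead of the hand-built data frame; the remaining steps select by position, so they should carry over unchanged as long as the page layout stays the same.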
