Error when I try to convert XML to a data frame in R

I am using XML files obtained from

https://eco2mix.rte-france.com/curves/getDonneesMarche?&dateDeb=31/12/2020&dateFin=24/02/2021&mode=NORM&_=1648578231712 (called WEEKS1) and https://eco2mix.rte-france.com/curves/getDonneesMarche?&dateDeb=04/12/2021&dateFin=31/12/2021&mode=NORM&_=1648650611995 (called WEEKS7). I downloaded the files and saved them in my local folder.

From these files I want to extract some information, specifically a time series, so I use the following code:

library(XML)
library(methods)
library(purrr)

list.filenames<-list.files(pattern = "\\.xml")

France2022<-lapply(list.filenames, function(file) #Reading files in my local repo
  xmlParse(file)
)

France2022<-map(France2022, xmlRoot)

Here I wanted to use lapply on my France2022 object to get my data:

lapply(6:61, function(root)
  xmlToDataFrame(France2022[[2]][[root]][[7]])) # the second list is associated with WEEKS7

but the following error appears:

Error in (function (classes, fdef, mtable)  : unable to find an inherited method for function ‘xmlToDataFrame’ for signature ‘"NULL", "missing", "missing", "missing", "missing"’

At this point I noticed that one of these files has a problem. I do not know what is happening, because both files have the same structure. I also tried to read the file directly from its https URL, but I get the same error:

F7<-read_xml("https://eco2mix.rte-france.com/curves/getDonneesMarche?&dateDeb=08/10/2021&dateFin=03/12/2021&mode=NORM&_=1648650611994")
F7<-xmlParse(F7)
lapply(6:61, function(root)
  xmlToDataFrame(F7[[root]][[7]]))

CodePudding user response：

You could do the following:

require(tidyverse)
require(xml2)

dat <- read_xml("https://eco2mix.rte-france.com/curves/getDonneesMarche?&dateDeb=31/12/2020&dateFin=24/02/2021&mode=NORM&_=1648578231712")

dat %>% 
  xml_find_first("//donneesMarche") %>% 
  as_list() %>% 
  tibble::as_tibble(.name_repair = "unique") %>% 
  map_df(map_chr, simplify)

Resulting in

# A tibble: 11 × 24
   valeur...1 valeur...2 valeur...3 valeur...4 valeur...5 valeur...6 valeur...7
   <chr>      <chr>      <chr>      <chr>      <chr>      <chr>      <chr>     
 1 43.74      38.01      36.75      33.06      31.67      34         40.44     
 2 43.36      39.78      37.27      34.09      33.29      35.86      42.83     
 3 ND         ND         ND         ND         ND         ND         ND        
 4 38.54      35         32.13      29.24      31.67      34         33.85     
 5 40.09      35.79      33.37      30.27      31.67      34         35.65     
 6 42.11      37.03      35.25      31.82      31.67      34         36.99     
 7 42.11      37.03      35.25      31.82      31.67      34         38.32     
 8 72.36      72.32      68.96      60.24      57.77      54.69      55.69     
 9 42.11      35.79      33.37      30.27      31.67      34         38.32     
10 44.6       44.1       47         41.97      31.67      34         55        
11 42.11      37.03      35.25      31.82      31.67      34         36.99     
# … with 17 more variables: valeur...8 <chr>, valeur...9 <chr>,
#   valeur...10 <chr>, valeur...11 <chr>, valeur...12 <chr>, valeur...13 <chr>,
#   valeur...14 <chr>, valeur...15 <chr>, valeur...16 <chr>, valeur...17 <chr>,
#   valeur...18 <chr>, valeur...19 <chr>, valeur...20 <chr>, valeur...21 <chr>,
#   valeur...22 <chr>, valeur...23 <chr>, valeur...24 <chr>

Note that the third element has ND as its value. That's why the columns are character rather than double.

I'll leave the tidying to you.
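As a starting point for that tidying, here is a minimal base R sketch of turning the ND marker into NA and converting the columns to numeric. The small data frame and the helper name nd_to_numeric are made up for illustration. The real tibble has 24 valeur columns, and treating ND as missing is an assumption about what it means.

```r
# Hypothetical stand-in for a few columns of the tibble shown above
prices <- data.frame(
  valeur_1 = c("43.74", "43.36", "ND"),
  valeur_2 = c("38.01", "39.78", "ND"),
  stringsAsFactors = FALSE
)

# Replace the "ND" marker with NA, then convert the column to numeric
nd_to_numeric <- function(x) {
  x[x == "ND"] <- NA_character_
  as.numeric(x)
}

# Apply the conversion to every column; column names are preserved
prices_num <- as.data.frame(lapply(prices, nd_to_numeric))
str(prices_num)
```

Replacing ND before calling as.numeric avoids the "NAs introduced by coercion" warning that a direct conversion would raise.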

CodePudding user response：

I found a solution by reading the files with read_xml and using unnest:

library(xml2)
library(tidyverse)

# Area labels, recycled over the rows of each file below
Names <- c("BE", "CH", "DE", "DL", "AT", "ES", "FR", "GB", "IT", "NL", "PT")

list.filenames <- list.files(pattern = "\\.xml")

# Read every XML file in the working directory
France2022 <- lapply(list.filenames, read_xml)

France_data <- map(France2022, ~ as_list(.) %>%
                     tibble::as_tibble() %>%         # as.tibble() is deprecated
                     unnest_longer("liste") %>%
                     unnest_wider("liste") %>%
                     unnest(cols = names(.)) %>%     # flatten the nested list columns
                     unnest(cols = names(.)) %>%
                     select(-c(1)) %>%               # drop the first column
                     drop_na() %>%
                     mutate_all(as.numeric) %>%
                     mutate(Area = rep_len(Names, length.out = n()))
) %>%
  enframe() %>% # convert list to tibble
  unnest(value)

I am not sure why the error message appeared; apparently the files contain no errors and are parsed as XML objects.
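One plausible reading of the original error, offered as a guess rather than a diagnosis: the signature starts with "NULL", so xmlToDataFrame received NULL as its first argument. That would happen if France2022[[2]][[root]][[7]] evaluated to NULL for at least one value of root, for instance because a node has fewer than seven children. The sketch below uses a made-up two-row document and a hypothetical helper, safe_child, to skip such nodes. It is not based on the structure of the actual eco2mix files.

```r
library(XML)

# Made-up document for illustration: the second <row> has one child, not two
doc  <- xmlParse("<root><row><a>1</a><b>2</b></row><row><a>3</a></row></root>",
                 asText = TRUE)
root <- xmlRoot(doc)

# Hypothetical helper: return child i of a node, or NULL if it has fewer children
safe_child <- function(node, i) {
  if (is.null(node) || xmlSize(node) < i) NULL else node[[i]]
}

# Collect the second child of every row, then drop the rows that lack one
children <- lapply(seq_len(xmlSize(root)), function(k) safe_child(root[[k]], 2))
children <- Filter(Negate(is.null), children)

length(children)
xmlValue(children[[1]])
```

With the real files, the same kind of check could wrap France2022[[2]][[root]] before xmlToDataFrame is called on the result, so that a missing node is skipped instead of raising the dispatch error.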