Home > Software engineering >  Compiling API outputs in XML format in R
Compiling API outputs in XML format in R

Time:03-25

I have searched everywhere trying to find an answer to this question and I haven't quite found what I'm looking for yet so I'm hoping asking directly will help.

I am working with the USPS Tracking API, which provides an output an XML format. The API is limited to 35 results per call (i.e. you can only provide 35 tracking numbers to get info on each time you call the API) and I need information on ~90,000 tracking numbers, so I am running my calls in a for loop. I was able to store the results of the call in a list, but then I had trouble exporting the list as-is into anything usable. However, when I tried to convert the results from the list into JSON, it dropped the attribute tag, which contained the tracking number I had used to generate the results.

Here is what a sample result looks like:

<TrackResponse>

<TrackInfo ID="XXXXXXXXXXX1">

<TrackSummary> Your item was delivered at 6:50 am on February 6 in BARTOW FL 33830.</TrackSummary>

<TrackDetail>February 6 6:49 am NOTICE LEFT BARTOW FL 33830</TrackDetail>

<TrackDetail>February 6 6:48 am ARRIVAL AT UNIT BARTOW FL 33830</TrackDetail>

<TrackDetail>February 6 3:49 am ARRIVAL AT UNIT LAKELAND FL 33805</TrackDetail>

<TrackDetail>February 5 7:28 pm ENROUTE 33699</TrackDetail>

<TrackDetail>February 5 7:18 pm ACCEPT OR PICKUP 33699</TrackDetail>

Here is the script I ran to get the output I'm currently working with:

final_tracking_info <- list()

for (i in 1:x) { # where x = the number of calls to the API the loop will need to make
  
  usps = input_tracking_info[i] # input_tracking_info = GET commands
  
  usps = read_xml(usps)
  
  final_tracking_info1[[i 1]]<-usps$TrackResponse
  
  gc()
}

final_output <- toJSON(final_tracking_info)
write(final_output,"final_tracking_info.json") # tried converting to JSON, lost the ID attribute

cat(capture.output(print(working_list),file = "Final_Tracking_Info.txt")) # exported the list to a textfile, was not an ideal format to work with

What I ultimately want tog et from this data is a table containing the tracking number, the first track detail, and the last track detail. What I'm wondering is, is there a better way to compile this in XML/JSON that will make it easier to convert to a tibble/df down the line? Is there any easy way/preferred format to select based on the fact that I know most of the columns will have the same name ("Track Detail") and the DFs will have to be different lengths (since each package will have a different number of track details) when I'm trying to compile 1,000 of results into one final output?

CodePudding user response:

Using XML::xmlToList() will store the ID attribute in .attrs:

$TrackSummary
[1] " Your item was delivered at 6:50 am on February 6 in BARTOW FL 33830."

$TrackDetail
[1] "February 6 6:49 am NOTICE LEFT BARTOW FL 33830"

$TrackDetail
[1] "February 6 6:48 am ARRIVAL AT UNIT BARTOW FL 33830"

$TrackDetail
[1] "February 6 3:49 am ARRIVAL AT UNIT LAKELAND FL 33805"

$TrackDetail
[1] "February 5 7:28 pm ENROUTE 33699"

$TrackDetail
[1] "February 5 7:18 pm ACCEPT OR PICKUP 33699"

$.attrs
            ID 
"XXXXXXXXXXX1" 

A way of using that output which assumes that the Summary and ID are always present as first and last elements, respectively, is:

xml_data <- XML::xmlToList("71563898.xml") %>%
  unlist() # flattening
  unname() # removing names

data.frame (
  ID = tail(xml_data, 1), # getting last element
  Summary = head(xml_data, 1), # getting first element
  Info = xml_data %>% head(-1) %>% tail(-1) # remove first and last elements
)
  • Related