I have an existing dataframe (df1) that looks like this:
txnID date repID
1 1001 8/5/2020 203
2 1002 6/28/2020 887
3 1003 6/28/2020 887
4 1004 4/16/2020 887
5 1005 6/14/2020 887
6 1006 5/19/2020 332
7 1019 9/23/2020 100
8 1027 12/4/2020 221
I also have an XML that looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<salesteam>
<rep rID="r100">
<firstName>Helmut</firstName>
<lastName>Schwab</lastName>
<territory>EMEA</territory>
</rep>
<rep rID="r887">
<firstName>Walison</firstName>
<lastName>da Silva</lastName>
<territory>South America</territory>
</rep>
<rep rID="r332">
<firstName>Lynette</firstName>
<lastName>McRowe</lastName>
<territory>East</territory>
</rep>
<rep rID="r203">
<firstName>Aneeta</firstName>
<lastName>Kappoorthy</lastName>
<territory>West</territory>
</rep>
<rep rID="r221">
<firstName>Veronika</firstName>
<lastName>Sixt</lastName>
<territory>EMEA</territory>
</rep>
</salesteam>
I want to create a new dataframe that assigns the repID to the appropriate lastName according to the xml and creates a new dataframe (df3) including the data from df1.
head(df3)
txnID date repID lastName
1 1001 8/5/2020 203 Kappoorthy
2 1002 6/28/2020 887 da Silva
3 1003 6/28/2020 887 da Silva
4 1004 4/16/2020 887 da Silva
5 1005 6/14/2020 887 da Silva
6 1006 5/19/2020 332 McRowe
7 1019 9/23/2020 100 Schwab
8 1027 12/4/2020 221 Sixt
CodePudding user response:
You’ll first have to wrangle your XML into a dataframe, which you can do with xml2. Then, you can just merge in the info from the XML dataframe into df1
using dplyr::left_join()
.
library(xml2)
library(tidyr)
library(dplyr)
xml_df <- tibble(
xml = as_list(read_xml(xml))[[1]],
repID = as.numeric(gsub(
"^r",
"",
sapply(xml, attr, "rID")
))
) %>%
unnest_wider(xml, transform = ~ .x[[1]])
df1 %>%
left_join(xml_df) %>%
select(!c(firstName, territory))
# A tibble: 8 × 4
txnID date repID lastName
<dbl> <chr> <dbl> <chr>
1 1001 8/5/2020 203 Kappoorthy
2 1002 6/28/2020 887 da Silva
3 1003 6/28/2020 887 da Silva
4 1004 4/16/2020 887 da Silva
5 1005 6/14/2020 887 da Silva
6 1006 5/19/2020 332 McRowe
7 1019 9/23/2020 100 Schwab
8 1027 12/4/2020 221 Sixt