Home > Enterprise >  How can I replace some columns with data from an excel file?
How can I replace some columns with data from an excel file?

Time:06-03

I have a question. Like the title said, now I need to replace some columns with new data from another single excel file. I need to replace them when certain requirements are fulfilled.

For example, I want to replace column R5 to APM (arbitrary column header names, just for example) with new data from another excel file and the following conditions should be met before replacing:

  1. replace with the same ID, which means I want each person's ID to be matched when replacing
  2. do not replace when a column in a row displays "N". For example, if any cell between R5 to APM in a row has an N, then do not replace it.

Basically, I want to make sure the data is exactly matched.

How can I complete this task in R? Or, how can I achieve it in MS Excel? Many thanks in advance for your help! I will be very appreciative of each answer.

Below is an example:

enter image description here

Below is the text version of the screenshot:

ID ID
11 2021/4/15 N 9/11/66 56.9 175 cm Standard Test 22 N/A
11 2021/7/29 Y 9/11/66 55.7 186 cm Standard Test 748 N/A 11 2021/7/29 Y TWCOVID 11 1966/11/9 F Caucasian 157 54 NO NO

In the screenshot below, I want to replace new data with ID 11 to the row that indicates Y and to replace the old data with new data beginning from R5 and ending with APM. I can say for sure that the length of each new data is exactly the same as the length of the old data.

CodePudding user response:

If I understand your question correctly, what you would like to do could be achieved with the function merge(). Do ?merge to obtain more information.

What is important is that the columns from your two dataframes are named the same way

Example below:

Here, you would read in both your dataset, and you would then make sure that the column names with the information that are matching are the same one dataset to the next (in your case, you want ID ("ID"), the date ("date"), and the decision on whether to merge or not ("test") to match.

library(readxl)
dt1 <- read_excel("path to dataset 1")
dt2 <- read_excel("path to dataset 2")
colnames(dt1)<- c("ID","date","test","DOB","weigth","height","other1","other2","other3")
colnames(dt2)<- c("ID","date","test","variable")

But for the sake of the example, I am recreating your two datasets:

dt1 <- data.frame(ID = c(11,11),
                  date = c("2021/4/15", "2021/7/29"),
                  test = c("N", "Y"),
                  DOB = c("9/11/66", "9/11/66"),
                  weight = c(56.9, 55.7),
                  heigth = c(175, 186),
                  other1 = "Standard Test",
                  other2 = c(22, 748),
                  other3 = NA)

dt2 <- data.frame(ID = 11,
                  date = "2021/7/29",
                  test = "Y",
                  variable = "TWCOVID")

Then, you can just merge dt1 and dt2, by specifying which columns to use (although the function would automatically recognize the columns names being the same), and specifying all.x = TRUE to say that all the rows from the dataframe x (the first one) must be preserved. Check the difference between all and all.x depending on what you want to achieve.

dt <- merge(x = dt1, y = dt2[dt2$test == "Y", ], by = c("ID", "date", "test"), all.x = TRUE)

Which gives you: (last column as the information you wanted.

output dataset

Then, export to excel:

library(writexl)
write_xlsx(dt)
  • Related