Home > Software engineering >  making four new columns based on 8 existing columns
making four new columns based on 8 existing columns

Time:12-17

Below you can see the reproduced sample of my data.

DATA <- structure(list(ID = c("101", "101", "101", "101", "101", "101","101", "101", "101", "101"), IDA = c("1", "1", "2", "3", "4","5", "5", "1859", "1860", "1861"), DATE = structure(c(1300928400,1277946000, 1277946000, 1278550800, 1278550800, 1453770000, 1329958800,1506474000, 1485133200, 1485133200), tzone = "UTC", class = c("POSIXct","POSIXt")), NR = c("CH-0001", "CH-0001","CH-0002", "CH-0003", "CH-0004", "CH-0005","CH-0005", "CH-1859", "CH-1860", "CH-1861"), PAT = c("101-1", "101-1", "101-2", "101-3", "101-4", "101-5","101-5", "101-1859", "101-1860", "101-1861"), INT1 = c(245005,280040, 280040, 280040, 280040, 240040, 240040, NA, NA, NA),INT2 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), INT3 = c(NA_real_,NA_real_, 280010, NA_real_, NA_real_, NA_real_, NA_real_,NA_real_, 245035, NA_real_), INT4 = c(NA_real_, NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,NA_real_, NA_real_), INTX1 = c(NA_real_, 275040, NA_real_,NA_real_, NA_real_, NA_real_, 240080, NA_real_, NA_real_,NA_real_), INTX2 = c(276790, NA_real_, 7612645, NA_real_,NA_real_, NA_real_, 5078219, NA_real_, NA_real_, NA_real_), INTX173 = c(NA_real_, NA_real_, NA_real_, 3456878,NA_real_, NA_real_, 3289778, NA_real_, NA_real_, NA_real_), INTX4 = c(NA_real_, NA_real_, 11198767, NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 7025676), KAT = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)), row.names = c(NA,-10L), class = c("tbl_df", "tbl", "data.frame"))

As you see, I have eight columns called: INT1:INT4 and INTX1:INTX4. For each row there are only a maximum of four values for these variables and the rest are NAs. I need to create four new variables called ING1:ING4 and tell R to check the 8 columns one by one per row and assign the first value it finds in that row to ING1, the second value to ING2, the third value to ING3, and the fourth value to ING4.At the end, it is possible that, for a row, all or some of the ING1:ING4 columns are filled with values. I would expect for row 1 I get the following ING columns:

ING1 == 245005, ING2 == 276790, ING3 == NA, ING4 ==NA

I think I need to write a loop for that but as I am a beginner I am lost how to do it. Could you kindly help me with it?

CodePudding user response:

Try this:

fun <- function(select, prefix = "ING", ncol = -1, data = cur_data()) {
  select <- substitute(select)
  out <- asplit(t(
    apply(subset(data, select = eval(select)), 1, sort, na.last = TRUE)
    ), 2)
  names(out) <- paste0(prefix, seq_along(out))
  if (ncol > 0) out <- out[seq_len(ncol)]
  do.call(data.frame, out)
}

And its use:

dplyr

library(dplyr)
DATA %>%
  mutate(fun(INT1:INTX4, ncol=4))
# # A tibble: 10 × 18
#    ID    IDA   DATE                NR      PAT        INT1  INT2   INT3  INT4  INTX1   INTX2 INTX173    INTX4   KAT    ING1    ING2    ING3     ING4
#    <chr> <chr> <dttm>              <chr>   <chr>     <dbl> <dbl>  <dbl> <dbl>  <dbl>   <dbl>   <dbl>    <dbl> <dbl>   <dbl>   <dbl>   <dbl>    <dbl>
#  1 101   1     2011-03-24 01:00:00 CH-0001 101-1    245005    NA     NA    NA     NA  276790      NA       NA     0  245005  276790      NA       NA
#  2 101   1     2010-07-01 01:00:00 CH-0001 101-1    280040    NA     NA    NA 275040      NA      NA       NA     0  275040  280040      NA       NA
#  3 101   2     2010-07-01 01:00:00 CH-0002 101-2    280040    NA 280010    NA     NA 7612645      NA 11198767     0  280010  280040 7612645 11198767
#  4 101   3     2010-07-08 01:00:00 CH-0003 101-3    280040    NA     NA    NA     NA      NA 3456878       NA     0  280040 3456878      NA       NA
#  5 101   4     2010-07-08 01:00:00 CH-0004 101-4    280040    NA     NA    NA     NA      NA      NA       NA     0  280040      NA      NA       NA
#  6 101   5     2016-01-26 01:00:00 CH-0005 101-5    240040    NA     NA    NA     NA      NA      NA       NA     0  240040      NA      NA       NA
#  7 101   5     2012-02-23 01:00:00 CH-0005 101-5    240040    NA     NA    NA 240080 5078219 3289778       NA     0  240040  240080 3289778  5078219
#  8 101   1859  2017-09-27 01:00:00 CH-1859 101-1859     NA    NA     NA    NA     NA      NA      NA       NA     1      NA      NA      NA       NA
#  9 101   1860  2017-01-23 01:00:00 CH-1860 101-1860     NA    NA 245035    NA     NA      NA      NA       NA     1  245035      NA      NA       NA
# 10 101   1861  2017-01-23 01:00:00 CH-1861 101-1861     NA    NA     NA    NA     NA      NA      NA  7025676     1 7025676      NA      NA       NA

base R

cbind(DATA, fun(data = DATA, INT1:INTX4, ncol=4))
#     ID  IDA                DATE      NR      PAT   INT1 INT2   INT3 INT4  INTX1   INTX2 INTX173    INTX4 KAT    ING1    ING2    ING3     ING4
# 1  101    1 2011-03-24 01:00:00 CH-0001    101-1 245005   NA     NA   NA     NA  276790      NA       NA   0  245005  276790      NA       NA
# 2  101    1 2010-07-01 01:00:00 CH-0001    101-1 280040   NA     NA   NA 275040      NA      NA       NA   0  275040  280040      NA       NA
# 3  101    2 2010-07-01 01:00:00 CH-0002    101-2 280040   NA 280010   NA     NA 7612645      NA 11198767   0  280010  280040 7612645 11198767
# 4  101    3 2010-07-08 01:00:00 CH-0003    101-3 280040   NA     NA   NA     NA      NA 3456878       NA   0  280040 3456878      NA       NA
# 5  101    4 2010-07-08 01:00:00 CH-0004    101-4 280040   NA     NA   NA     NA      NA      NA       NA   0  280040      NA      NA       NA
# 6  101    5 2016-01-26 01:00:00 CH-0005    101-5 240040   NA     NA   NA     NA      NA      NA       NA   0  240040      NA      NA       NA
# 7  101    5 2012-02-23 01:00:00 CH-0005    101-5 240040   NA     NA   NA 240080 5078219 3289778       NA   0  240040  240080 3289778  5078219
# 8  101 1859 2017-09-27 01:00:00 CH-1859 101-1859     NA   NA     NA   NA     NA      NA      NA       NA   1      NA      NA      NA       NA
# 9  101 1860 2017-01-23 01:00:00 CH-1860 101-1860     NA   NA 245035   NA     NA      NA      NA       NA   1  245035      NA      NA       NA
# 10 101 1861 2017-01-23 01:00:00 CH-1861 101-1861     NA   NA     NA   NA     NA      NA      NA  7025676   1 7025676      NA      NA       NA
  • Related