Home > Enterprise >  Changing number of observation in a dataset by IDs according to a given value
Changing number of observation in a dataset by IDs according to a given value

Time:09-23

I have this dataset in R:

enter image description here

and I want to change the data according to nb variable, it means ID = 1 will have 5 rows and ID=2 will have 12 rows as shown below:

enter image description here

is there any R function that I could use it to transform my data :) ?

Thanks in advance

CodePudding user response:

We need uncount from tidyr to expand based on the 'nb' column, by default, it removes the column as .remove = TRUE, change it to FALSE and then create the nb_long by doing a group by row_number()

library(dplyr)
library(tidyr)
df1 %>%
   uncount(nb, .remove = FALSE) %>%
   group_by(ID) %>%
   mutate(nb_long = row_number()) %>%
   ungroup

-output

# A tibble: 17 x 3
      ID    nb nb_long
   <int> <dbl>   <int>
 1     1     5       1
 2     1     5       2
 3     1     5       3
 4     1     5       4
 5     1     5       5
 6     2    12       1
 7     2    12       2
 8     2    12       3
 9     2    12       4
10     2    12       5
11     2    12       6
12     2    12       7
13     2    12       8
14     2    12       9
15     2    12      10
16     2    12      11
17     2    12      12

data

df1 <- structure(list(ID = 1:2, nb = c(5, 12)), 
class = "data.frame", row.names = c(NA, 
-2L))

CodePudding user response:

Here is another option. we just map out the values from 1 to nb and then we unnest the vector longer.

#packages
library(tidyverse)

#data
df1 <- structure(list(ID = 1:2, nb = c(5, 12)), 
class = "data.frame", row.names = c(NA, 
-2L))

#solution
df1 %>% 
  mutate(nums = map(nb, ~seq(1, .x, by = 1))) %>%
  unnest_longer(nums)
#> # A tibble: 17 x 3
#>       ID    nb  nums
#>    <int> <dbl> <dbl>
#>  1     1     5     1
#>  2     1     5     2
#>  3     1     5     3
#>  4     1     5     4
#>  5     1     5     5
#>  6     2    12     1
#>  7     2    12     2
#>  8     2    12     3
#>  9     2    12     4
#> 10     2    12     5
#> 11     2    12     6
#> 12     2    12     7
#> 13     2    12     8
#> 14     2    12     9
#> 15     2    12    10
#> 16     2    12    11
#> 17     2    12    12

CodePudding user response:

We can try the following data.table option

> setDT(df)[,.(nb_long = 1:nb),.(ID,nb)]
    ID nb nb_long
 1:  1  5       1
 2:  1  5       2
 3:  1  5       3
 4:  1  5       4
 5:  1  5       5
 6:  2 12       1
 7:  2 12       2
 8:  2 12       3
 9:  2 12       4
10:  2 12       5
11:  2 12       6
12:  2 12       7
13:  2 12       8
14:  2 12       9
15:  2 12      10
16:  2 12      11
17:  2 12      12
  • Related