Replacing values in a data.frame that have lost their order

Time:10-27

In my toy data, within each unique study, the values of the numeric variables (sample and group) must be consecutive integers starting from 1. But in some studies they are not:

For example, in study 1, we see that there are two unique sample values (1 & 3), so 3 must be replaced with 2.

For example, in study 2, we see that there is one unique group value (2), so it must be replaced with 1.

In study 3, both sample and group are fine: their unique values are 1 and 2, so no replacing is needed.

For this toy data, my desired output is shown below. But I would appreciate a functional solution that can automatically re-index any number of numeric variables in a data.frame that have lost their order, just as in my toy data.

m="
study sample group outcome
1      1     1       A
1      1     1       B
1      1     2       A
1      1     2       B 
1      3     1       A
1      3     1       B
1      3     2       A
1      3     2       B

2      1     2       A
2      1     2       B
2      2     2       A
2      2     2       B
2      3     2       A
2      3     2       B

3      1     1       A
3      1     1       B
3      1     2       A
3      1     2       B
3      2     1       A
3      2     1       B
3      2     2       A
3      2     2       B"

data <- read.table(text = m, header = TRUE)

Desired_output="
study sample group outcome
1      1     1       A
1      1     1       B
1      1     2       A
1      1     2       B 
1      2     1       A
1      2     1       B
1      2     2       A
1      2     2       B

2      1     1       A
2      1     1       B
2      2     1       A
2      2     1       B
2      3     1       A
2      3     1       B

3      1     1       A
3      1     1       B
3      1     2       A
3      1     2       B
3      2     1       A
3      2     1       B
3      2     2       A
3      2     2       B"

CodePudding user response:

You can do:

library(dplyr)

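data %>% 
  group_by(study) %>% 
  # Within each study, replace every numeric column by the rank of its
  # sorted unique values (factor codes are 1, 2, ... in sorted order).
  # The grouping column (study) is left untouched by across().
  mutate(across(where(is.numeric),
                function(x) as.numeric(as.factor(x)))) %>%
  as.data.frame()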

The resultant data frame looks like this:

   study sample group outcome
1      1      1     1       A
2      1      1     1       B
3      1      1     2       A
4      1      1     2       B
5      1      2     1       A
6      1      2     1       B
7      1      2     2       A
8      1      2     2       B
9      2      1     1       A
10     2      1     1       B
11     2      2     1       A
12     2      2     1       B
13     2      3     1       A
14     2      3     1       B
15     3      1     1       A
16     3      1     1       B
17     3      1     2       A
18     3      1     2       B
19     3      2     1       A
20     3      2     1       B
21     3      2     2       A
22     3      2     2       B
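
The key step above is as.numeric(as.factor(x)): the factor levels are the sorted unique values, so the integer codes are their ranks 1, 2, and so on. A minimal base R sketch of the same idea on a plain vector is shown here, with match() as an equivalent that skips the factor conversion. The helper name reindex is made up for illustration and is not part of the answer; dplyr::dense_rank() should give the same result for data without missing values.

```r
# Hypothetical helper: map each value to the position of that value
# among the sorted unique values of the vector.
reindex <- function(x) match(x, sort(unique(x)))

x <- c(1, 1, 3, 3)          # sample values like those in study 1
as.numeric(as.factor(x))    # 1 1 2 2
reindex(x)                  # 1 1 2 2
reindex(c(2, 2, 2))         # 1 1 1 (group values like those in study 2)
```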

CodePudding user response:

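Here is an alternative dplyr solution (not as elegant as @Allan Cameron's). Note that it relies on the layout of the toy data: rows sorted by sample, with equally sized sample blocks within each study.

library(dplyr)
data %>% 
  group_by(study) %>% 
  mutate(x = n()/length(unique(sample)),   # number of rows per sample block
         sample =  rep(row_number(), each=x, length.out = n()),
         y = length(unique(group)),        # number of distinct group values
         group = ifelse(y==1, 1, group)) %>%   # a single group value becomes 1
  select(-x, -y) 

The result is a grouped tibble:
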
   study sample group outcome
   <int>  <int> <dbl> <chr>  
 1     1      1     1 A      
 2     1      1     1 B      
 3     1      1     2 A      
 4     1      1     2 B      
 5     1      2     1 A      
 6     1      2     1 B      
 7     1      2     2 A      
 8     1      2     2 B      
 9     2      1     1 A      
10     2      1     1 B      
11     2      2     1 A      
12     2      2     1 B      
13     2      3     1 A      
14     2      3     1 B      
15     3      1     1 A      
16     3      1     1 B      
17     3      1     2 A      
18     3      1     2 B      
19     3      2     1 A      
20     3      2     1 B      
21     3      2     2 A      
22     3      2     2 B 
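
For data that are not sorted or that have unequal block sizes, a more general version may be easier to reuse. The following is a minimal base R sketch, assuming the goal is to re-index every numeric column except the grouping column; the function name reindex_within and the small toy data frame are made up for illustration.

```r
# Hypothetical helper: re-index every numeric column (except the grouping
# column) to consecutive integers 1, 2, ... within each group.
reindex_within <- function(df, group_col) {
  is_num   <- vapply(df, is.numeric, logical(1))
  num_cols <- setdiff(names(df)[is_num], group_col)
  for (col in num_cols) {
    # ave() applies FUN to the values of `col` separately for each group
    df[[col]] <- ave(df[[col]], df[[group_col]],
                     FUN = function(x) match(x, sort(unique(x))))
  }
  df
}

# Small illustrative data frame (not the full toy data from the question)
toy <- data.frame(
  study   = c(1, 1, 1, 1, 2, 2),
  sample  = c(1, 1, 3, 3, 1, 2),
  group   = c(1, 2, 1, 2, 2, 2),
  outcome = c("A", "B", "A", "B", "A", "B")
)
res <- reindex_within(toy, "study")
res
```

Calling reindex_within(data, "study") on the data frame from the question should work the same way, since the helper only needs the name of the grouping column.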