I have data that looks like this:
X snp_id is_severe encoding_1 encoding_2 encoding_0
1 0 GL000191.1-37698 0 0 1 7
3 2 GL000191.1-37922 1 1 0 12
What I want to do is add a new row after every existing row. The new row should contain the previous row's snp_id, and its is_severe value should be 1 if the previous row's value is 0, and 0 if the previous row's value is 1. The goal is for every snp_id to appear twice, once with is_severe = 0 and once with is_severe = 1, instead of only one of the two (all snp_id values in the data are unique). Also, in the new row encoding_1 and encoding_2 should be 0, and encoding_0 should follow this rule: if is_severe in the new row is 0, encoding_0 = 8; if is_severe in the new row is 1, encoding_0 = 13.
Example of the desired output:
X snp_id is_severe encoding_1 encoding_2 encoding_0
1 0 GL000191.1-37698 0 0 1 7
2 1 GL000191.1-37698 1 0 0 13 <- new row
3 2 GL000191.1-37922 1 1 0 12
4 3 GL000191.1-37922 0 0 0 8 <- new row
I saw a similar question here: "How can I add rows to an R data frame every other row?", but I need to do more data manipulation, and unfortunately that solution doesn't solve my problem. Thank you :)
CodePudding user response:
Here are two options: 1) split and map, 2) copy and bind.
library(tidyverse)
dat <- read_table("snp_id is_severe encoding_1 encoding_2 encoding_0
GL000191.1-37698 0 0 1 7
GL000191.1-37922 1 1 0 12")
dat |>
  group_split(snp_id) |>
  map_dfr(~add_row(.x,
                   snp_id = .x$snp_id,
                   is_severe = 1 - (.x$is_severe == 1),
                   encoding_1 = 0,
                   encoding_2 = 0,
                   encoding_0 = ifelse(.x$is_severe == 1, 8, 13)))
#> # A tibble: 4 x 5
#> snp_id is_severe encoding_1 encoding_2 encoding_0
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 GL000191.1-37698 0 0 1 7
#> 2 GL000191.1-37698 1 0 0 13
#> 3 GL000191.1-37922 1 1 0 12
#> 4 GL000191.1-37922 0 0 0 8
or
library(tidyverse)
bind_rows(dat,
          dat |>
            mutate(is_severe = 1 - (is_severe == 1),
                   across(c(encoding_1, encoding_2), ~.*0),
                   encoding_0 = ifelse(is_severe == 1, 13, 8))) |>
  arrange(snp_id)
#> # A tibble: 4 x 5
#> snp_id is_severe encoding_1 encoding_2 encoding_0
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 GL000191.1-37698 0 0 1 7
#> 2 GL000191.1-37698 1 0 0 13
#> 3 GL000191.1-37922 1 1 0 12
#> 4 GL000191.1-37922 0 0 0 8
CodePudding user response:
Dummy data:
df <- data.frame(
a = letters[1:4],
is_severe = sample(c(0,1), 4, TRUE),
encoding1 = sample(c(0,1), 4, TRUE),
encoding2 = sample(c(0,1), 4, TRUE),
encoding0 = 1:4
)
You can copy the data, do your calculations on the copy, and bind it to the original data (afterwards you reorder the rows so that each new row follows its original):
df_copy <- df
df_copy$is_severe <- 1 - df_copy$is_severe
df_copy[, c("encoding1", "encoding2")] <- 0
df_copy$encoding0 <- ifelse(df_copy$is_severe == 0, 8 , 13)
rbind(df, df_copy)[rep(seq_len(nrow(df)), each = 2) + rep(c(0, nrow(df)), times = nrow(df)),]
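For reference, here is the same copy-and-bind idea as a minimal base R sketch that uses the column names from the question, followed by a check that every snp_id ends up with both is_severe values. The helper name add_complement_rows is hypothetical, the X column from the question is left out, and is_severe is assumed to contain only 0 and 1:

```r
# Hypothetical helper: returns df with a complementary row after each row.
# Assumes is_severe is 0/1 and the columns are named as in the question.
add_complement_rows <- function(df) {
  extra <- df
  extra$is_severe <- 1 - df$is_severe                      # flip 0 <-> 1
  extra[, c("encoding_1", "encoding_2")] <- 0              # zero both encodings
  extra$encoding_0 <- ifelse(extra$is_severe == 0, 8, 13)  # rule from the question
  n <- nrow(df)
  # Interleave: original row i is followed by its new row (position n + i)
  idx <- as.vector(rbind(seq_len(n), n + seq_len(n)))
  out <- rbind(df, extra)[idx, ]
  rownames(out) <- NULL
  out
}

dat <- data.frame(
  snp_id     = c("GL000191.1-37698", "GL000191.1-37922"),
  is_severe  = c(0, 1),
  encoding_1 = c(0, 1),
  encoding_2 = c(1, 0),
  encoding_0 = c(7, 12)
)

res <- add_complement_rows(dat)

# Sanity checks: each snp_id appears twice, once per is_severe value
stopifnot(all(table(res$snp_id) == 2))
stopifnot(all(tapply(res$is_severe, res$snp_id,
                     function(v) setequal(v, c(0, 1)))))
```

Building the row index with rbind() and as.vector() relies on R storing matrices column by column, so the index comes out as 1, n + 1, 2, n + 2, and so on, which is the same interleaving as the rep() arithmetic above.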