How to assign 1s and 0s to columns depending on whether each row's value matches another column in R

Time:10-13

I'm an absolute beginner at coding and R, and this is my third week using it for a project. (For biologists: I'm trying to find the sum of risk alleles for a PRS.) I need help with this part:

```
df
  x y z
1 t c a
2 a t a
3 g g t
```

After the code is applied:

```
  x y z
1 t 0 0
2 a 0 1
3 g 1 0
```

I'm trying to make it so that if the value in y or z matches the value in x in the same row, it changes to 1, and if not, to 0.
I started with:
```
for(i in 1:ncol(df)){
  df[, i]<-df[df$x == df[,i], df[ ,i]<- 1]
}
```
But I got all NA values.
In reality, I have 100 columns that I have to compare with x in the data frame. Any help is appreciated.

CodePudding user response:

A tidyverse approach

```
library(dplyr)

df <-
  tibble(
    x = c("t","a","g"),
    y = c("c","t","g"),
    z = c("a","a","t")
  )

df %>% 
  mutate(
    across(
      .cols = c(y,z),
      .fns = ~if_else(. == x,1,0) 
    )
  )
```

```
# A tibble: 3 x 3
  x         y     z
  <chr> <dbl> <dbl>
1 t         0     0
2 a         0     1
3 g         1     0
```
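Since the real data has about 100 columns, listing them one by one in `.cols` gets tedious. Here is a minimal sketch that selects every column except `x` instead. It assumes dplyr 1.0 or later and that all of the other columns should be compared against `x`; the extra column `w` and the names `df_wide` and `result` are made up for illustration.

```r
library(dplyr)

# Toy data: same x, y, z as above plus a hypothetical extra column w
df_wide <- tibble(
  x = c("t", "a", "g"),
  y = c("c", "t", "g"),
  z = c("a", "a", "t"),
  w = c("t", "a", "m")
)

# -x selects every column except x, so no column names need to be typed out.
# .x is the current column; x is the reference column in the same data frame.
result <- df_wide %>%
  mutate(across(.cols = -x, .fns = ~ as.integer(.x == x)))

result
```

`as.integer()` turns the TRUE/FALSE comparison straight into 1/0, so `if_else()` is not needed here; either form should give the same 0/1 pattern.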

CodePudding user response:

An alternative way to do this is by using ifelse() in base R.

```
df$y <- ifelse(df$y == df$x, 1, 0)
df$z <- ifelse(df$z == df$x, 1, 0)
df
#  x y z
#1 t 0 0
#2 a 0 1
#3 g 1 0
```

Edit: extending this step to all columns efficiently

For example:

```
df1
#  x y z w
#1 t c a t
#2 a t a a
#3 g g t m
```

Rather than editing each column by hand, a better approach is to write a function and apply it to all targeted columns in the data frame. Here is a simple function to do the work:

```
# Returns 1 where the column matches df1$x in the same row, 0 otherwise
edit_col <- function(any_col) ifelse(any_col == df1$x, 1, 0)
```

This function takes a single column, compares its elements with the elements of df1$x, and returns the edited column. To apply it to all targeted columns, you can use apply(). Because x is not a targeted column in your case, you need to exclude it by indexing with [,-1], since it is the first column in df1.

```
# Here number 2 indicates columns. Use number 1 for rows.

df1[, -1] <- apply(df1[,-1], 2, edit_col)
df1
#  x y z w
#1 t 0 0 1
#2 a 0 1 1
#3 g 1 0 0
```
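One thing to keep in mind is that apply() converts the data frame to a matrix first, and `df1[,-1]` collapses to a plain vector if only one target column is left. As a hedged alternative, here is a sketch using lapply(), which works column by column on the data frame itself. The names `df2` and `targets` are made up for this example, and it assumes the reference column is called `x`.

```r
# Toy data matching df1 above, rebuilt so the example is self-contained
df2 <- data.frame(
  x = c("t", "a", "g"),
  y = c("c", "t", "g"),
  z = c("a", "a", "t"),
  w = c("t", "a", "m"),
  stringsAsFactors = FALSE
)

# Every column name except the reference column x
targets <- setdiff(names(df2), "x")

# lapply() visits each target column; the comparison is vectorised over rows,
# and as.integer() converts TRUE/FALSE to 1/0
df2[targets] <- lapply(df2[targets], function(col) as.integer(col == df2$x))

df2
```

Selecting by name with `df2[targets]` also means it does not matter where `x` sits in the data frame. Missing values in a column stay NA rather than becoming 0.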

Of course, you can also define a function that edits the whole data frame, so you don't need to call apply() manually.

Here is an example of such a function:

```
edit_df <- function(any_df){
    # Returns 1 where the column matches any_df$x in the same row, 0 otherwise
    edit_col <- function(any_col) ifelse(any_col == any_df$x, 1, 0)

    # Create a vector containing all names of the targeted columns.
    target_col_names <- setdiff(colnames(any_df), "x")

    any_df[, target_col_names] <- apply(any_df[, target_col_names], 2, edit_col)
    return(any_df)
}
```

Then use the function:

```
edit_df(df1)
#  x y z w
#1 t 0 0 1
#2 a 0 1 1
#3 g 1 0 0
```
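The question mentions that the end goal is a sum of risk alleles, so a possible next step is counting the 1s in each row. This is only a sketch under the assumption that a plain, unweighted row count is what is wanted. The data frame `scored` and the column name `n_match` are made up, with `scored` holding the 0/1 result shown above.

```r
# The 0/1 result from above, rebuilt by hand so the example is self-contained
scored <- data.frame(
  x = c("t", "a", "g"),
  y = c(0, 0, 1),
  z = c(0, 1, 0),
  w = c(1, 1, 0)
)

# Sum the 0/1 columns across each row, leaving out the reference column x
score_cols <- setdiff(names(scored), "x")
scored$n_match <- rowSums(scored[score_cols])

scored
```

If some columns contain NA, `rowSums(..., na.rm = TRUE)` skips them instead of returning NA for the whole row.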