Replace values in one column by taking values from another column

After asking a question this morning, I would now like to ask about another way to do the replacement, since I am still waiting for my teacher to confirm the species names.

I have a data frame like this (the real df is the result of removing duplicated rows):

df <- data.frame(name1 = c("a" , "b", "c", "a"),
                 name2 = c("x", NA, NA, NA),
                 name3 = c(NA, "b1", "c1", NA),
                 name4 = c("x", "b1", "c1", "a"))

  name1 name2 name3 name4
1     a     x  <NA>     x
2     b  <NA>    b1    b1
3     c  <NA>    c1    c1
4     a  <NA>  <NA>     a

Can we replace a with x by checking whether the value in the name4 column matches the value in the name1 column?

I do not want to assign x directly here, since my data is expected to contain many cases like this. Any suggestions, please? (Base R is also fine for me, since I would love to learn more.)

Desired output

  name1 name2 name3 name4
1     a     x  <NA>     x
2     b  <NA>    b1    b1
3     c  <NA>    c1    c1
4     a  <NA>  <NA>     x

My explanation for the table and my expectation:

I have three columns, name1, name2 and name3 (after removing duplicated rows). The name4 column is the final column, holding the value I want from the three previous columns. The value in name2 is my first priority, then the value in name3.

In the fourth row, since name2 is NA (and so is name3), I took "a" from name1. I am wondering whether I can replace that a with x without assigning x directly, i.e. if the value in name4 (a) equals the value in name1 (a), then the a in name4 is replaced by the x found in name2 or name4.

CodePudding user response：

As I understand it, your criteria for defining name4 are:

1. Use name2 from the same row if available.
2. Otherwise, use name3 from the same row if available.
3. Otherwise, leave it missing (for now).
4. Fill missing name4 values with name4 values from previous rows that share the same name1 value.
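Since the question also asks about base R, the steps above can be sketched without any packages. This is only a minimal sketch under the same reading of the criteria; fill_down is a hypothetical helper written for this example, not a built-in function.

```r
# Example data from the question, without name4
df <- data.frame(name1 = c("a", "b", "c", "a"),
                 name2 = c("x", NA, NA, NA),
                 name3 = c(NA, "b1", "c1", NA),
                 stringsAsFactors = FALSE)

# Steps 1-3: take name2 if present, otherwise name3 (which may itself be NA)
df$name4 <- ifelse(!is.na(df$name2), df$name2, df$name3)

# Hypothetical helper: carry the last non-missing value forward
fill_down <- function(v) {
  idx <- cumsum(!is.na(v))      # how many non-NA values have been seen so far (0 = none yet)
  c(NA, v[!is.na(v)])[idx + 1]  # 0 maps to NA, k maps to the k-th non-NA value
}

# Step 4: fill within each name1 group, keeping the original row order
df$name4 <- ave(df$name4, df$name1, FUN = fill_down)
df
```

ave() applies the helper to each name1 group and puts the results back in the original row positions, so no sorting or merging is needed.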

If you want a tidyverse-based solution:

library(dplyr)
library(tidyr)

df <- data.frame(name1 = c("a" , "b", "c", "a"),
                 name2 = c("x", NA, NA, NA),
                 name3 = c(NA, "b1", "c1", NA))

result <- df %>% 
  mutate(name4 = case_when(
    !is.na(name2) ~ name2, # when name2 is not missing, use it
    !is.na(name3) ~ name3, # when name3 is not missing, use it
    TRUE ~ NA_character_   # leave a NA for now otherwise
  )) %>%
  group_by(name1) %>%
  fill(name4, .direction = "down") %>% # fill each group from the previous non-missing row
  ungroup()

Returns:

# A tibble: 4 × 4
  name1 name2 name3 name4
  <chr> <chr> <chr> <chr>
1 a     x     NA    x    
2 b     NA    b1    b1   
3 c     NA    c1    c1   
4 a     NA    NA    x

Note that fill can fill in several directions; use "downup" if you want to fill from top to bottom first and then from bottom to top.
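To illustrate the difference between the directions, here is a small sketch on a made-up variant of the data (df2 is hypothetical, not from the question) in which the row carrying "x" comes after the row that needs it:

```r
library(dplyr)
library(tidyr)

# Hypothetical variant: the "a" row with a known value now comes last
df2 <- data.frame(name1 = c("a", "b", "a"),
                  name4 = c(NA, "b1", "x"),
                  stringsAsFactors = FALSE)

# "down" only looks at earlier rows in the group, so row 1 has nothing to copy
down_only <- df2 %>%
  group_by(name1) %>%
  fill(name4, .direction = "down") %>%
  ungroup()

# "downup" fills downwards first, then fills what is left from later rows
down_up <- df2 %>%
  group_by(name1) %>%
  fill(name4, .direction = "downup") %>%
  ungroup()

down_only$name4  # the first "a" row is still NA
down_up$name4    # the first "a" row is now "x"
```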

CodePudding user response：

You can do it like this:

df[which(df$name1==df$name4), "name4"] <- "x"

This subsets the data frame to the rows where name1 == name4 and to the name4 column, then sets those values to "x".
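If hard-coding "x" is not an option, the same subsetting idea can look the replacement up instead. The following base R sketch assumes that every other row for a given name1 already holds the value you want in name4; the variable names fallback, resolved and src are made up for this example.

```r
df <- data.frame(name1 = c("a", "b", "c", "a"),
                 name2 = c("x", NA, NA, NA),
                 name3 = c(NA, "b1", "c1", NA),
                 name4 = c("x", "b1", "c1", "a"),
                 stringsAsFactors = FALSE)

# Rows where name4 fell back to the name1 value
fallback <- which(df$name1 == df$name4)

# All other rows are assumed to hold a proper name4 value
resolved <- setdiff(seq_len(nrow(df)), fallback)

# For each fallback row, find the first resolved row with the same name1
src <- resolved[match(df$name1[fallback], df$name1[resolved])]

# Replace only where such a row exists; unmatched rows keep their value
ok <- !is.na(src)
df$name4[fallback[ok]] <- df$name4[src[ok]]
df
```

Rows whose name1 never appears with a proper value are left untouched, which seems safer than overwriting them with NA.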

CodePudding user response：

You can group by name1 and, if name1 and name4 are equal, replace the name4 value with the first non-NA value available in the group.

library(dplyr)

df %>%
  group_by(name1) %>%
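  # cur_data() is the current group's data without name1; unlist() goes column by
  # column, so the first non-NA value comes from name2, then name3, then name4.
  # In dplyr >= 1.1.0, cur_data() is deprecated in favour of pick(everything()).
  mutate(name4 = ifelse(name1 == name4, na.omit(unlist(cur_data()))[1], name4)) %>%
  ungroup()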

# name1 name2 name3 name4
#  <chr> <chr> <chr> <chr>
#1 a     x     NA    x    
#2 b     NA    b1    b1   
#3 c     NA    c1    c1   
#4 a     NA    NA    x

CodePudding user response：

Base R ifelse solution:

df$name4 <- ifelse(df$name1 == df$name4, "x", df$name4)

Based on your update, using dplyr's first:

library(dplyr)

df$name4 <- ifelse(df$name1 == df$name4, first(df$name4), df$name4)

This does the following:

Checks whether name1 is equal to name4.
If name1 is equal to name4, it replaces the value of name4 with the first value of the whole name4 column.

Result:

  name1 name2 name3 name4
1     a     x  <NA>     x
2     b  <NA>    b1    b1
3     c  <NA>    c1    c1
4     a  <NA>  <NA>     x
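
One caveat: first(df$name4) is the first value of the entire column, so it gives the expected result here because "x" happens to sit in row 1. With several name1 values, a grouped variant may be closer to what is wanted. The sketch below uses made-up data (df2 is hypothetical) and assumes that the first row of each name1 group carries the proper value.

```r
library(dplyr)

# Hypothetical data: two name1 values, each with one fallback row
df2 <- data.frame(name1 = c("a", "d", "a", "d"),
                  name4 = c("x", "y", "a", "d"),
                  stringsAsFactors = FALSE)

# Ungrouped: every fallback row receives the first value of the whole column
ungrouped <- ifelse(df2$name1 == df2$name4, first(df2$name4), df2$name4)

# Grouped: first(name4) is evaluated separately within each name1 group
grouped <- df2 %>%
  group_by(name1) %>%
  mutate(name4 = ifelse(name1 == name4, first(name4), name4)) %>%
  ungroup()

ungrouped      # the "d" fallback row wrongly becomes "x"
grouped$name4  # the "d" fallback row becomes "y"
```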