After asking one question this morning, I would now like to ask about another way to do the replacement, since I am still waiting for my teacher to confirm the species name.
I have a dataframe like this (the real df is the result of removing duplicated rows):
df <- data.frame(name1 = c("a", "b", "c", "a"),
                 name2 = c("x", NA, NA, NA),
                 name3 = c(NA, "b1", "c1", NA),
                 name4 = c("x", "b1", "c1", "a"))
  name1 name2 name3 name4
1     a     x  <NA>     x
2     b  <NA>    b1    b1
3     c  <NA>    c1    c1
4     a  <NA>  <NA>     a
Can we replace a with x by checking whether the value in the name4 column matches the value in the name1 column?
I do not want to assign x directly here, since my data is supposed to have many cases like this. Any suggestions, please? (Base R is also fine for me, since I would love to learn more.)
Desired output
  name1 name2 name3 name4
1     a     x  <NA>     x
2     b  <NA>    b1    b1
3     c  <NA>    c1    c1
4     a  <NA>  <NA>     x
My explanation of the table and what I expect:
I have 3 columns, name1, name2 and name3 (after removing duplicated rows). The name4 column is the final column, which contains the value I want from the 3 previous columns. The value in the name2 column is my first priority, then the value in name3.
In my fourth row, since name2 is NA, I took "a" from the name1 column. I am wondering whether I can replace a with x without assigning x directly, i.e. if the value in name4 (a) equals the value in name1 (a), then the a in name4 is replaced by the x found in name2 or name4.
CodePudding user response:
Your criteria to define name4, as I understand them, are:
- Use name2 from the same row if available
- Otherwise use name3 from the same row if available
- Otherwise leave it missing (for now)
- Fill missing name4 values with name4 values from previous rows that share the same name1 value.
If you want a tidyverse-based solution:
library(dplyr)
library(tidyr)
df <- data.frame(name1 = c("a", "b", "c", "a"),
                 name2 = c("x", NA, NA, NA),
                 name3 = c(NA, "b1", "c1", NA))
result <- df %>%
  mutate(name4 = case_when(
    !is.na(name2) ~ name2,   # when name2 is not missing, use it
    !is.na(name3) ~ name3,   # when name3 is not missing, use it
    TRUE ~ NA_character_     # otherwise leave an NA for now
  )) %>%
  group_by(name1) %>%
  fill(name4, .direction = "down") %>%  # within each group, fill from the previous non-missing row
  ungroup()
Returns:
# A tibble: 4 × 4
  name1 name2 name3 name4
  <chr> <chr> <chr> <chr>
1 a     x     NA    x
2 b     NA    b1    b1
3 c     NA    c1    c1
4 a     NA    NA    x
Note that fill can fill in several directions; you could use "downup" if you want to fill from top to bottom first and then from bottom to top.
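Since the question also asks about base R, here is a minimal sketch of the same four steps without the tidyverse. The helper name fill_down is made up for this illustration, and stringsAsFactors = FALSE is set explicitly so the columns are character vectors on older R versions too.

```r
df <- data.frame(name1 = c("a", "b", "c", "a"),
                 name2 = c("x", NA, NA, NA),
                 name3 = c(NA, "b1", "c1", NA),
                 stringsAsFactors = FALSE)

# Steps 1-3: prefer name2, fall back to name3, otherwise stay NA
df$name4 <- ifelse(!is.na(df$name2), df$name2, df$name3)

# Step 4 helper (hypothetical name): carry the last non-missing value forward
fill_down <- function(v) {
  idx <- cumsum(!is.na(v))        # position of the most recent non-NA value
  c(NA, v[!is.na(v)])[idx + 1]    # idx 0 (nothing seen yet) maps to NA
}

# Apply the helper separately within each name1 group
df$name4 <- ave(df$name4, df$name1, FUN = fill_down)
```

ave() splits name4 by name1, applies the function to each piece, and puts the results back in the original row order, which plays the role of group_by() plus fill() here.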
CodePudding user response:
You can do it like this:
df[which(df$name1==df$name4), "name4"] <- "x"
Basically, this subsets your dataframe to the rows in which name1 == name4 and to the name4 column, then changes those values to "x".
CodePudding user response:
You can group by name1 and, if name1 and name4 are equal, replace the name4 value with the first non-NA value available in the group.
library(dplyr)
df %>%
group_by(name1) %>%
mutate(name4 = ifelse(name1 == name4, na.omit(unlist(cur_data()))[1], name4)) %>%
ungroup
#  name1 name2 name3 name4
#  <chr> <chr> <chr> <chr>
#1 a     x     NA    x
#2 b     NA    b1    b1
#3 c     NA    c1    c1
#4 a     NA    NA    x
CodePudding user response:
Base R ifelse solution:
df$name4 <- ifelse(df$name1 == df$name4, "x", df$name4)
Based on your update, using dplyr's first:
library(dplyr)
df$name4 <- ifelse(df$name1 == df$name4, first(df$name4), df$name4)
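This does the following:
- Checks whether name1 is equal to name4
- If name1 is equal to name4, it replaces the value of name4 with the first value occurring in name4. Note that this is the first value of the whole column (here "x"), not the first value within the matching name1 group, so it gives the desired result for this example but not necessarily for data with several such groups.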
Result:
  name1 name2 name3 name4
1     a     x  <NA>     x
2     b  <NA>    b1    b1
3     c  <NA>    c1    c1
4     a  <NA>  <NA>     x
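To see how taking the first value of the whole column differs from taking the first value per name1 group, here is a small base R sketch. The data frame df2 is made-up data with two groups, and df2$name4[1] stands in for first(df2$name4) so that no package is needed.

```r
# Hypothetical data: two groups ("a" and "b"), each with one row to fix
df2 <- data.frame(name1 = c("a", "b", "a", "b"),
                  name4 = c("x", "y", "a", "b"),
                  stringsAsFactors = FALSE)

# Whole-column first value: every matching row receives "x"
global <- ifelse(df2$name1 == df2$name4, df2$name4[1], df2$name4)

# First value within each name1 group, aligned with the original rows
group_first <- ave(df2$name4, df2$name1, FUN = function(v) v[1])
per_group <- ifelse(df2$name1 == df2$name4, group_first, df2$name4)

global     # "x" "y" "x" "x"
per_group  # "x" "y" "x" "y"
```

The per-group version assumes the first row of each group already holds the value you want, which is the case when the name2 or name3 value appears before the rows that fell back to name1.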