I have the following data frame:
Row Repro Number2
1 1 EWC
2 NA LWY
3 7 EWS
4 NA LWC
5 NA EWC
6 NA LWC
7 3 EWY
8 NA LW2Y
9 NA Unknown
10 NA LWC
11 1 EWC
12 NA LWY
13 NA EWY
14 NA LWY
15 NA Unknown
16 NA LWC
On this data frame, I am using the following loop:
for (i in 1:nrow(df3)) {
  if(df3$Number2[i + 1]=="Unknown" & is.na(df3$Repro[i])) {
    df3$Number2[i]="Unknown"
  } else{
    df3$Number2[i]==df3$Number2[i]
  }
}
While the loop does run, I get an error at the end and the data frame does not end up looking the way I want.
My issue is that while the code carries out its intended purpose (replacing a value in the Number2 column with "Unknown" if the value after it is also "Unknown" and the associated Repro value is NA), it only does so for "Unknown" values that are initially in the data frame. I want it to also take the newly added "Unknown" values into account and apply the loop conditions to those too.
Here is the error code:
Error in if (df3$Number2[i + 1] == "Unknown" & is.na(df3$Repro[i])) { :
  missing value where TRUE/FALSE needed
And here is the data frame after running the loop. I have added another column called "Number2.Correct" showing what I want the Number2 column to actually look like. The issue is with rows 12 and 13: these should be "Unknown", not "LWY" and "EWY", respectively.
Repro Number2 Number2.Correct
1 1 EWC EWC
2 NA LWY LWY
3 7 EWS EWS
4 NA LWC LWC
5 NA EWC EWC
6 NA LWC LWC
7 3 EWY EWY
8 NA Unknown Unknown
9 NA Unknown Unknown
10 NA LWC LWC
11 1 EWC EWC
12 NA LWY Unknown
13 NA EWY Unknown
14 NA Unknown Unknown
15 NA Unknown Unknown
16 NA LWC LWC
In the end, I have two questions:
- How do I change my code to give me the result I want?
- Why is the error code appearing and is it partly responsible for the issue?
CodePudding user response:
The code fails because the index nrow(df3) + 1 is out of range, so the for loop needs to run over 1:(nrow(df3)-1).
To iteratively update Number2, one easy way (although not elegant) is to use a while loop. The stopping condition is that the new and old Number2 are the same.
library(dplyr)

while (TRUE) {
  # Work on a copy so the old and new columns can be compared
  df3$Number2_new <- df3$Number2
  for (i in 1:(nrow(df3) - 1)) {
    if (df3$Number2_new[i + 1] == "Unknown" & is.na(df3$Repro[i])) {
      df3$Number2_new[i] <- "Unknown"
    }
  }
  # A full pass that changes nothing means we are done
  done <- all(df3$Number2 == df3$Number2_new)
  df3 <- df3 %>%
    mutate(Number2 = Number2_new) %>%
    select(-Number2_new)
  if (done) break
}
df3
Row Repro Number2
1 1 1 EWC
2 2 NA LWY
3 3 7 EWS
4 4 NA LWC
5 5 NA EWC
6 6 NA LWC
7 7 3 EWY
8 8 NA Unknown
9 9 NA Unknown
10 10 NA LWC
11 11 1 EWC
12 12 NA Unknown
13 13 NA Unknown
14 14 NA Unknown
15 15 NA Unknown
16 16 NA LWC
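To see where "missing value where TRUE/FALSE needed" comes from, here is a minimal sketch on a toy vector (x, cond and msg are made-up names, not part of the question's data). Indexing one past the end of a vector gives NA, the comparison with "Unknown" is then NA as well, and if() cannot decide on NA:

```r
# Toy vector standing in for df3$Number2 (hypothetical example data)
x <- c("EWC", "Unknown", "LWC")

# Indexing one past the end does not fail, it returns NA
past_end <- x[length(x) + 1]

# NA == "Unknown" is NA, and NA & TRUE is still NA
cond <- past_end == "Unknown" & TRUE

# if() needs TRUE or FALSE, so it stops with an error; capture the message
msg <- tryCatch(
  if (cond) "yes" else "no",
  error = function(e) conditionMessage(e)
)
print(msg)
```

This is why the loop only fails on its last iteration: every earlier row has a real "next" value to compare against.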
CodePudding user response:
for (i in rev(1:nrow(df3))) {
  if (df3$Number2[i + 1] == "Unknown" & is.na(df3$Repro[i]) & i + 1 < nrow(df3)) {
    df3$Number2[i] <- "Unknown"
  } else {
    df3$Number2[i] == df3$Number2[i]
  }
}
df3
#> Row Repro Number2
#> 1 1 1 EWC
#> 2 2 NA LWY
#> 3 3 7 EWS
#> 4 4 NA LWC
#> 5 5 NA EWC
#> 6 6 NA LWC
#> 7 7 3 EWY
#> 8 8 NA Unknown
#> 9 9 NA Unknown
#> 10 10 NA LWC
#> 11 11 1 EWC
#> 12 12 NA Unknown
#> 13 13 NA Unknown
#> 14 14 NA Unknown
#> 15 15 NA Unknown
#> 16 16 NA LWC
Created on 2023-01-09 with reprex v2.0.2
You had two issues:
- i + 1 is out of range for the final row in your data; I added another condition (i + 1 < nrow(df3)).
- The desired output you posted suggests that you want to look for "Unknown" from bottom to top, not top to bottom. You can reverse the order with rev().
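As a variation on the same bottom-to-top idea, here is a sketch that avoids the out-of-range index by simply not visiting the last row, instead of guarding inside the condition. It assumes the data frame has at least one row and that Number2 contains no NA values; result is a made-up name used so that df3 itself is left untouched. One thing to be aware of: a guard written as i + 1 < nrow(df3) is also FALSE for the second-to-last row, which does not matter for this data (row 16 is "LWC") but would if the last row were "Unknown".

```r
# Same data as in the question
df3 <- data.frame(
  Row = 1:16,
  Repro = c(1L, NA, 7L, NA, NA, NA, 3L, NA, NA, NA, 1L, NA, NA, NA, NA, NA),
  Number2 = c("EWC", "LWY", "EWS", "LWC", "EWC", "LWC", "EWY", "LW2Y",
              "Unknown", "LWC", "EWC", "LWY", "EWY", "LWY", "Unknown", "LWC")
)

result <- df3

# Walk from the second-to-last row up to the first, so i + 1 always exists
# and a freshly written "Unknown" is seen by the row above it
for (i in rev(seq_len(nrow(result) - 1))) {
  if (is.na(result$Repro[i]) && result$Number2[i + 1] == "Unknown") {
    result$Number2[i] <- "Unknown"
  }
}

result
```

Because each row is checked after the row below it has already been updated, a single pass is enough and no outer while loop is needed.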
CodePudding user response:
The i + 1 index goes out of range once i reaches nrow of the data. We may use a group-by approach with tidyverse instead:
library(dplyr)
library(tidyr)
library(data.table)
df3 %>%
  mutate(grp = replace(replace(Number2, Number2 != "Unknown", NA),
                       Number2 == "Unknown", seq_len(sum(Number2 == "Unknown")))) %>%
  fill(grp, .direction = "updown") %>%
  group_by(grp, grp2 = rleid(is.na(Repro))) %>%
  mutate(Number2 = case_when(is.na(Repro) &
                               row_number() < match("Unknown", Number2) ~ "Unknown",
                             TRUE ~ Number2)) %>%
  ungroup %>%
  select(-grp, -grp2)
-output
# A tibble: 16 × 3
Row Repro Number2
<int> <int> <chr>
1 1 1 EWC
2 2 NA LWY
3 3 7 EWS
4 4 NA LWC
5 5 NA EWC
6 6 NA LWC
7 7 3 EWY
8 8 NA Unknown
9 9 NA Unknown
10 10 NA LWC
11 11 1 EWC
12 12 NA Unknown
13 13 NA Unknown
14 14 NA Unknown
15 15 NA Unknown
16 16 NA LWC
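The grp2 key in the pipeline comes from rleid(is.na(Repro)), which gives every run of consecutive equal values its own ID, so each unbroken stretch of NA Repro values forms one group. As a rough illustration of what that key looks like, here is a base R sketch built on rle() that should give the same numbering (run_id is a made-up name, and Repro is copied from the data):

```r
# Repro column from the question's data
Repro <- c(1L, NA, 7L, NA, NA, NA, 3L, NA, NA, NA, 1L, NA, NA, NA, NA, NA)

# Run-length encode the NA pattern, then repeat each run's index
# over the length of that run
runs <- rle(is.na(Repro))
run_id <- rep(seq_along(runs$lengths), runs$lengths)

print(data.frame(Repro, is_na = is.na(Repro), run_id))
```

Rows 12 to 16 share one ID because they are a single uninterrupted block of NA values, which is what lets the "Unknown" in row 15 reach back to row 12 but not past row 11.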
data
df3 <- structure(list(Row = 1:16, Repro = c(1L, NA, 7L, NA, NA, NA,
3L, NA, NA, NA, 1L, NA, NA, NA, NA, NA), Number2 = c("EWC", "LWY",
"EWS", "LWC", "EWC", "LWC", "EWY", "LW2Y", "Unknown", "LWC",
"EWC", "LWY", "EWY", "LWY", "Unknown", "LWC")),
class = "data.frame", row.names = c(NA,
-16L))