I have a simple loop that iterates over a number of string values in a vector called measurements:
measurements <- c("A","B","C","D")
Here is a reproducible data frame:
library(dplyr)
value <- c(1, 2, 3, 4)
measurement <- c("A", "B", "C", "D")
questiondata <- data.frame(measurement, value)
questiondata <- as_tibble(questiondata)  # as.tibble() is deprecated
The loop filters rows based on the measurement column. If the loop variable has the same name as a column of my data frame, the filter does not work: it prints the entire data frame 4 times:
for (measurement in measurements) {
  print(measurement)
  print(questiondata %>% dplyr::filter(measurement == measurement))
}
If, instead, I change the variable name, from "measurement" to "m" for instance, it works:
for (m in measurements) {
  print(m)
  print(questiondata %>% dplyr::filter(measurement == m))
}
Does anyone know the reason for this behaviour?
CodePudding user response:
This issue results from the ambiguity between data-variables and env-variables in data-masked functions like filter(). When a name exists both as a column and as a variable in the calling environment, the column takes precedence.
In the following code, both occurrences of measurement refer to the measurement column of questiondata, so the condition is TRUE for every row and no rows are filtered out.
questiondata %>% filter(measurement == measurement)
# # A tibble: 4 × 2
# measurement value
# <chr> <dbl>
# 1 A 1
# 2 B 2
# 3 C 3
# 4 D 4
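You could use the .env pronoun to make it explicit where to find objects. In the output below, the env-variable measurement holds "D", as it does after the loop has finished.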
questiondata %>% filter(measurement == .env$measurement)
# # A tibble: 1 × 2
# measurement value
# <chr> <dbl>
# 1 D 4
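Applied to the loop from the question, the same pronoun lets the loop variable keep its name. What follows is a minimal sketch rather than a definitive fix. It assumes dplyr is attached, and rows_matched is a name introduced here only to record how many rows each iteration returns.

```r
library(dplyr)

questiondata <- tibble::tibble(
  measurement = c("A", "B", "C", "D"),
  value = c(1, 2, 3, 4)
)
measurements <- c("A", "B", "C", "D")

# Illustrative only: number of rows matched on each iteration.
rows_matched <- integer(0)

for (measurement in measurements) {
  # .data$measurement is the column; .env$measurement is the loop variable.
  matched <- questiondata %>%
    filter(.data$measurement == .env$measurement)
  print(matched)
  rows_matched <- c(rows_matched, nrow(matched))
}
```

Each iteration should now print a single row. Using .data on the left-hand side is optional here, since a bare column name already resolves to the column, but it makes the intent explicit.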
CodePudding user response:
I am not sure if this is what you are looking for. Here is an example.
A <- "ABC"
print(A)
for (A in c("1", "2")) print(A)
print(A)
Then we will get
[1] "ABC"
[1] "1"
[1] "2"
[1] "2"
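The for loop overwrites the value of A: a loop in R does not create its own scope, so the loop variable is assigned in the enclosing environment and keeps its last value after the loop ends.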
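If overwriting the outer variable is not wanted, one option is to run the loop inside a function, so that the loop variable lives in the function's own environment. This is a small sketch, and print_each is a hypothetical helper name that does not come from the question.

```r
A <- "ABC"

# Hypothetical helper: inside the function, A is a local loop variable,
# so the A defined above is left untouched.
print_each <- function(values) {
  for (A in values) print(A)
  invisible(NULL)
}

print_each(c("1", "2"))
print(A)
# [1] "1"
# [1] "2"
# [1] "ABC"
```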