Long-time user, first-time poster. Something strange happened while I was looking at data. I made a data.frame, added columns, manipulated them, and renamed them. When I then looked at the colnames, the renaming appeared to have worked. But when I printed the df, only one of the columns was actually renamed.
Here is a reproducible example.
var_tab <- data.frame(coef=c(1:4), p=rep(0.1, 4))
var_tab <- cbind(var_tab, c("one", "two", "three", "four"))
var_tab[4] <- "three"
> var_tab
coef p c("one", "two", "three", "four") V4
1 1 0.1 one three
2 2 0.1 two three
3 3 0.1 three three
4 4 0.1 four three
> colnames(var_tab)
[1] "coef"
[2] "p"
[3] "c(\"one\", \"two\", \"three\", \"four\")"
[4] "V4"
All as expected... until I rename the columns. colnames() reports both new names, but when I print the data frame only the third column shows its new name. The fourth column still prints as V4.
var_tab[4] <- ifelse(var_tab[4] == var_tab[3], 1, 0)
colnames(var_tab)[3:4] <- c("model", "base")
> var_tab
coef p model V4
1 1 0.1 one 0
2 2 0.1 two 0
3 3 0.1 three 1
4 4 0.1 four 0
> colnames(var_tab)
[1] "coef" "p" "model" "base"
Renaming column 4 before recalculating it makes the mismatch disappear, so the problem is easy to avoid.
colnames(var_tab)[3:4] <- c("model", "base")
var_tab[4] <- ifelse(var_tab[4] == var_tab[3], 1, 0)
> colnames(var_tab)
[1] "coef" "p" "model" "base"
> var_tab
coef p model base
1 1 0.1 one 0
2 2 0.1 two 0
3 3 0.1 three 1
4 4 0.1 four 0
Though I can avoid the problem, I still do not understand why changing the order fixes it, and I cannot find any other reference to this issue. It vaguely reminds me of R's floating-point surprises. Does anyone here know why the new colname is not shown when the data frame is printed?
Thanks in advance for your help!
Answer:
After your setup
var_tab <- data.frame(coef=c(1:4), p=rep(0.1, 4))
var_tab <- cbind(var_tab, c("one", "two", "three", "four"))
var_tab[4] <- "three"
note the difference between
str(var_tab[4] == var_tab[3])
str(var_tab[[4]] == var_tab[[3]])
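The first one does not return a plain vector. Comparing two single-bracket subsets compares two data frames, and the result is a one-column object (a logical matrix) that still carries the column name V4. ifelse() keeps the shape and attributes of its test, so you end up assigning a one-column object with its own column name to a column of a data frame, and then things get weird. The outer data frame has a name for the column ("base" after renaming), the inner object has its own name for its single column ("V4"), and print() shows the inner one.
If you assign a clean vector to a column of a data.frame then you don't have this problem. This is also why renaming first appeared to fix it: the comparison result then inherits the name "base", so the inner and outer names happen to agree.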
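To see the two names side by side, here is a small sketch that rebuilds the data frame from the question and inspects the fourth column after the problematic assignment. The helper names cmp and flag are just for illustration, and the exact str() output may differ between R versions, so the comments only describe what to look for.

```r
# Rebuild the data frame from the question.
var_tab <- data.frame(coef = 1:4, p = rep(0.1, 4))
var_tab <- cbind(var_tab, c("one", "two", "three", "four"))
var_tab[4] <- "three"

# Single-bracket comparison: the result has a dim attribute and a column name.
cmp <- var_tab[4] == var_tab[3]
dim(cmp)        # 4 rows, 1 column
colnames(cmp)   # the inner name that later shows up in print()

# ifelse() returns a value with the same shape and attributes as its test.
flag <- ifelse(cmp, 1, 0)

# Reproduce the problematic assignment and the rename.
var_tab[4] <- flag
colnames(var_tab)[3:4] <- c("model", "base")

names(var_tab)           # outer names, as reported by colnames()
colnames(var_tab[[4]])   # inner name stored inside the fourth column
```

Comparing the last two results shows where print() gets the stale V4 from.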
So you should use
var_tab[4] <- ifelse(var_tab[[4]] == var_tab[[3]], 1, 0)
colnames(var_tab)[3:4] <- c("model", "base")
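As a quick check, the sketch below applies the double-bracket version and confirms that the new column is an ordinary vector. The second half is an assumption-laden extra: it builds a hypothetical data frame called broken that already holds a one-column matrix, then flattens it with as.vector(), which is one possible repair rather than the only one.

```r
# Rebuild the data frame from the question.
var_tab <- data.frame(coef = 1:4, p = rep(0.1, 4))
var_tab <- cbind(var_tab, c("one", "two", "three", "four"))
var_tab[4] <- "three"

# Double brackets extract plain vectors, so the comparison is a plain logical vector.
var_tab[4] <- ifelse(var_tab[[4]] == var_tab[[3]], 1, 0)
colnames(var_tab)[3:4] <- c("model", "base")

# An ordinary vector column has no dim attribute.
is.null(dim(var_tab$base))

# Hypothetical repair for a data frame that already contains a matrix column:
# recreate the situation, then drop dim and dimnames with as.vector().
broken <- var_tab
broken$base <- matrix(broken$base, ncol = 1, dimnames = list(NULL, "V4"))
broken$base <- as.vector(broken$base)
```

After the repair, the inner name is gone, so print() has only the outer name to show.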