R: Dataframe colnames not being applied to dataframe

Time:08-29

Long-time user, first-time poster. Something strange happened while I was looking at data. I made a data.frame, added columns, manipulated them, and renamed them. When I then looked at the colnames, the renaming appeared to have worked. But when I looked at the df, only one of the columns was actually renamed.

Here is a reproducible example.

var_tab <- data.frame(coef=c(1:4), p=rep(0.1, 4))
var_tab <- cbind (var_tab, c("one", "two", "three", "four"))
var_tab[4] <- "three"

> var_tab
  coef   p c("one", "two", "three", "four")    V4
1    1 0.1                              one three
2    2 0.1                              two three
3    3 0.1                            three three
4    4 0.1                             four three

> colnames(var_tab)
[1] "coef"                                    
[2] "p"                                       
[3] "c(\"one\", \"two\", \"three\", \"four\")"
[4] "V4"

All as expected... until I rename the columns. colnames() reports both new names, but the printed data frame shows the new name only for the third column; the fourth still prints as V4.

var_tab[4] <- ifelse(var_tab[4] == var_tab[3], 1, 0)
colnames(var_tab)[3:4] <- c("model", "base")

> var_tab
  coef   p model V4
1    1 0.1   one  0
2    2 0.1   two  0
3    3 0.1 three  1
4    4 0.1  four  0

> colnames(var_tab)
[1] "coef"  "p"     "model" "base" 

The problem goes away if I rename column 4 before recalculating it, so it is easy to avoid.

colnames(var_tab)[3:4] <- c("model", "base")
var_tab[4] <- ifelse(var_tab[4] == var_tab[3], 1, 0)

> colnames(var_tab)
[1] "coef"  "p"     "model" "base" 

> var_tab
  coef   p model base
1    1 0.1   one    0
2    2 0.1   two    0
3    3 0.1 three    1
4    4 0.1  four    0

Though I can avoid the problem, I still cannot understand what solved the problem. And I cannot find any other reference to this issue. It vaguely reminds me of the R floating number problem. Does anyone here know what caused my colnames to not be applied to the dataframe?

Thanks in advance for your help!

CodePudding user response:

After your setup

var_tab <- data.frame(coef=c(1:4), p=rep(0.1, 4))
var_tab <- cbind (var_tab, c("one", "two", "three", "four"))
var_tab[4] <- "three"

note the difference between

str(var_tab[4] == var_tab[3])
str(var_tab[[4]] == var_tab[[3]]) 

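The first one returns a one-column logical matrix rather than a plain vector: comparing two data frames with == yields a matrix that carries the column name V4, and ifelse() keeps that shape and those dimnames. When you assign a one-column matrix to a column of a data frame, then things get weird. The outer data frame has a name for the column that contains the matrix. That inner matrix has its own name for its column, and the inner name is the one that gets printed.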

If you assign a clean vector to a column of a data.frame then you don't have this problem.
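
To see the difference concretely, here is a minimal sketch that repeats the setup from the question and inspects both comparisons. The names by_single and by_double are only illustrative and do not appear in the original post.

```r
# Same setup as in the question
var_tab <- data.frame(coef = c(1:4), p = rep(0.1, 4))
var_tab <- cbind(var_tab, c("one", "two", "three", "four"))
var_tab[4] <- "three"

# Illustrative names, not from the original post
by_single <- var_tab[4] == var_tab[3]      # operands are one-column data frames
by_double <- var_tab[[4]] == var_tab[[3]]  # operands are plain vectors

is.matrix(by_single)    # TRUE: the result has dimensions
colnames(by_single)     # "V4", taken from the left-hand operand
is.matrix(by_double)    # FALSE: a plain logical vector

# ifelse() returns a value shaped like its test, so the matrix shape survives
dim(ifelse(by_single, 1, 0))   # 4 1
dim(ifelse(by_double, 1, 0))   # NULL
```

The inner column name V4 travels with the matrix, which is why renaming the outer column afterwards does not change what is printed.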

So you should use

var_tab[4] <- ifelse(var_tab[[4]] == var_tab[[3]], 1, 0)
colnames(var_tab)[3:4] <- c("model", "base")
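
If a data frame has already picked up such a column, one possible repair is to flatten it back into a plain vector. The sketch below replays the problematic order from the question and then cleans up; it assumes that the single-bracket assignment stores the one-column matrix as-is, which is what the output in the question suggests. The loop is an illustration, not the only way to do this.

```r
var_tab <- data.frame(coef = c(1:4), p = rep(0.1, 4))
var_tab <- cbind(var_tab, c("one", "two", "three", "four"))
var_tab[4] <- "three"

# The problematic order from the question
var_tab[4] <- ifelse(var_tab[4] == var_tab[3], 1, 0)
colnames(var_tab)[3:4] <- c("model", "base")

sapply(var_tab, is.matrix)   # column 4 is expected to be flagged here
colnames(var_tab[[4]])       # inner name, expected to still be "V4"

# Flatten every one-column matrix back into a plain vector
for (j in seq_along(var_tab)) {
  if (is.matrix(var_tab[[j]]) && ncol(var_tab[[j]]) == 1) {
    var_tab[[j]] <- as.vector(var_tab[[j]])
  }
}

sapply(var_tab, is.matrix)   # all FALSE after flattening
var_tab                      # the header of column 4 now reads "base"
```

This also hints at why renaming first only appears to help: the comparison then most likely produces a matrix whose inner name is already "base", so the inner and outer names agree and the mismatch is hidden rather than removed.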