Why is the unite function not accepting my column names?

Time:04-13

I'm baffled. This code fails on my dataset but works fine with dummy data. As far as I can tell, there are no important differences in the structure of the two datasets. Why might I be getting this error about undefined columns?

> packageVersion('tidyr')
[1] ‘1.2.0’


> str(test)
'data.frame':   229 obs. of  9 variables:
 $ Response    : chr  "presence" "presence" "presence" "presence" ...
 $ Predictor   : chr  "tussock_gram" "wet_sedge" "nontussock_gram" "dry_gram_dwarf_shrub" ...
 $ Estimate    : num  1.03 2.77 2.02 13.73 -6.69 ...
 $ Std.Error   : chr  "1.6469" "1.7951" "8.5393" "14.6206" ...
 $ DF          : num  844 844 844 844 844 844 844 844 844 844 ...
 $ Crit.Value  : num  0.628 1.542 0.236 0.939 -0.761 ...
 $ P.Value     : num  0.53 0.123 0.813 0.348 0.447 ...
 $ Std.Estimate: num  0.0233 0.0536 0.0177 0.1019 -0.1441 ...
 $             : chr  "" "" "" "" ...

> dput(head(test))
structure(list(Response = c("presence", "presence", "presence", 
"presence", "presence", "presence"), Predictor = c("tussock_gram", 
"wet_sedge", "nontussock_gram", "dry_gram_dwarf_shrub", "low_shrub", 
"high_shrub"), Estimate = c(1.035, 2.7687, 2.0189, 13.7295, -6.6858, 
12.4353), Std.Error = c("1.6469", "1.7951", "8.5393", "14.6206", 
"8.7873", "3.5288"), DF = c(844, 844, 844, 844, 844, 844), Crit.Value = c(0.6285, 
1.5424, 0.2364, 0.9391, -0.7608, 3.524), P.Value = c(0.5297, 
0.123, 0.8131, 0.3477, 0.4467, 0.0004), Std.Estimate = c(0.0233, 
0.0536, 0.0177, 0.1019, -0.1441, 0.1436), c("", "", "", "", "", 
"***")), row.names = c(NA, 6L), class = "data.frame")



> test <- test %>%
  unite("Relationship", c(Response, Predictor), sep = "~") 

Error in `[.data.frame`(out, setdiff(names(out), names(from_vars))) : 
  undefined columns selected


> df <- as.data.frame(expand_grid(Response = c("a", NA), Predictor = c("b", NA)))

> str(df)
'data.frame':   4 obs. of  2 variables:
 $ Response : chr  "a" "a" NA NA
 $ Predictor: chr  "b" NA "b" NA


> df <- df %>%
  unite("Relationship", c(Response, Predictor), sep = "~")

# works fine



Answer:

The updated dput output shows a column whose name is blank (""). One fix is to remove it:

library(dplyr)
library(tidyr)
test %>% 
   select(-"") %>% 
   unite(Relationship, Response, Predictor, sep = "~")
                   Relationship Estimate Std.Error  DF Crit.Value P.Value Std.Estimate
1         presence~tussock_gram   1.0350    1.6469 844     0.6285  0.5297       0.0233
2            presence~wet_sedge   2.7687    1.7951 844     1.5424  0.1230       0.0536
3      presence~nontussock_gram   2.0189    8.5393 844     0.2364  0.8131       0.0177
4 presence~dry_gram_dwarf_shrub  13.7295   14.6206 844     0.9391  0.3477       0.1019
5            presence~low_shrub  -6.6858    8.7873 844    -0.7608  0.4467      -0.1441
6           presence~high_shrub  12.4353    3.5288 844     3.5240  0.0004       0.1436
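
If selecting by an empty string feels fragile, the same column can be dropped in base R with a logical index. This is a minimal sketch on a made-up data frame (`toy` and its values are assumptions standing in for `test`), not the original data:

```r
# Toy data frame standing in for `test`; the third column gets a blank name.
toy <- data.frame(
  Response  = c("presence", "presence"),
  Predictor = c("wet_sedge", "low_shrub"),
  stars     = c("", "***"),
  stringsAsFactors = FALSE
)
names(toy)[3] <- ""

# TRUE for every column whose name is missing or an empty string.
blank <- is.na(names(toy)) | names(toy) == ""

# Logical indexing selects by position, so the blank name is never looked up.
toy_clean <- toy[!blank]

names(toy_clean)
```

Because the index is logical rather than character, this also drops several blank-named columns at once if there are more than one.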

The error comes from this line in the unite source code:

...
 out <- out[setdiff(names(out), names(from_vars))]
...

This fails because selecting a column whose name is blank ("") from a data.frame raises the error:

> names(test)
[1] "Response"     "Predictor"    "Estimate"     "Std.Error"    "DF"           "Crit.Value"   "P.Value"      "Std.Estimate" ""       
> test[""]
Error in `[.data.frame`(test, "") : undefined columns selected
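
The failing step can be reproduced without tidyr. The sketch below mimics the quoted line on a made-up data frame; `from_names` is a hypothetical stand-in for `names(from_vars)`, and the data values are illustrative only:

```r
# Toy data frame with a blank third column name (illustrative values).
out <- data.frame(
  Response  = c("a", "b"),
  Predictor = c("x", "y"),
  stars     = c("", "***"),
  stringsAsFactors = FALSE
)
names(out)[3] <- ""

# Hypothetical stand-in for names(from_vars): the columns being united.
from_names <- c("Response", "Predictor")

# Everything that is not being united should be kept, including "".
keep <- setdiff(names(out), from_names)

# Character indexing cannot match an empty name, so this raises an error.
msg <- tryCatch(
  {
    out[keep]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```

Here `keep` ends up holding only the empty string, which is why the subsetting fails even though Response and Predictor themselves are perfectly valid names.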

If the data has unusual column names, either run make.names (from base R) on them

> make.names(names(test))
[1] "Response"     "Predictor"    "Estimate"     "Std.Error"    "DF"           "Crit.Value"   "P.Value"      "Std.Estimate" "X"    

Or use clean_names from janitor

> janitor::clean_names(test)
  response            predictor estimate std_error  df crit_value p_value std_estimate   x
1 presence         tussock_gram   1.0350    1.6469 844     0.6285  0.5297       0.0233    
2 presence            wet_sedge   2.7687    1.7951 844     1.5424  0.1230       0.0536    
3 presence      nontussock_gram   2.0189    8.5393 844     0.2364  0.8131       0.0177    
4 presence dry_gram_dwarf_shrub  13.7295   14.6206 844     0.9391  0.3477       0.1019    
5 presence            low_shrub  -6.6858    8.7873 844    -0.7608  0.4467      -0.1441    
6 presence           high_shrub  12.4353    3.5288 844     3.5240  0.0004       0.1436 ***
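
If more than one column name is blank, plain make.names would give them all the same replacement. A small base R sketch with made-up names (the vector `nms` is an assumption, not taken from the question) showing the `unique = TRUE` argument, which also makes the repaired names distinct:

```r
# Made-up names: two blanks and one name containing a space.
nms <- c("Response", "", "P Value", "")

# unique = TRUE de-duplicates the repaired names after making them valid.
fixed <- make.names(nms, unique = TRUE)

fixed
```

Every element of the result is then a non-empty, syntactically valid name, so character-based column selection can find each column.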

Thus, updating the column names lets unite run without removing the blank-named column:

names(test) <- make.names(names(test))
test %>%  
    unite(Relationship, Response, Predictor, sep = "~")
                   Relationship Estimate Std.Error  DF Crit.Value P.Value Std.Estimate   X
1         presence~tussock_gram   1.0350    1.6469 844     0.6285  0.5297       0.0233    
2            presence~wet_sedge   2.7687    1.7951 844     1.5424  0.1230       0.0536    
3      presence~nontussock_gram   2.0189    8.5393 844     0.2364  0.8131       0.0177    
4 presence~dry_gram_dwarf_shrub  13.7295   14.6206 844     0.9391  0.3477       0.1019    
5            presence~low_shrub  -6.6858    8.7873 844    -0.7608  0.4467      -0.1441    
6           presence~high_shrub  12.4353    3.5288 844     3.5240  0.0004       0.1436 ***
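
The blank column appears to hold significance stars, so a descriptive name may read better than the generated "X". A minimal base R sketch on a made-up data frame; the name "Signif" is a hypothetical choice, not something from the question:

```r
# Toy data frame standing in for `test` (illustrative values only).
toy <- data.frame(
  Response  = c("presence", "presence"),
  Predictor = c("low_shrub", "high_shrub"),
  stars     = c("", "***"),
  stringsAsFactors = FALSE
)
names(toy)[3] <- ""

# Rename only the blank-named column; "Signif" is a hypothetical label.
names(toy)[names(toy) == ""] <- "Signif"

# The column can now be selected by name like any other.
toy["Signif"]
```

This assumes exactly one blank name; with several, the same label would be assigned to all of them. After the rename, the unite call shown above should run unchanged.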