I'm having a problem reading in a textConnection() mini-file that contains factor columns. The fragment below produces two separate factor levels for 'LabAuto':
x <- read.table(tc <- textConnection(
"Project, TestingType, CodeType
'TS', 'TDDEUT', Production
'TS', 'TDDEUT', Testing
'NR', 'LabAuto', Production
'In', 'LabAuto', Testing"),
header=TRUE, colClasses=c("character", "factor", "factor"),
sep=",", na.strings=c("NULL"), quote="'")
TestingType shows this, indicating there are two levels labeled (approximately) LabAuto:
> x$TestingType
[1] TDDEUT TDDEUT LabAuto LabAuto
Levels: LabAuto LabAuto TDDEUT
Apparently this is due to the extra space in front of the first 'LabAuto' value, because if I remove one space (on the 'NR' line), I end up with just two levels for TestingType, as I want:
> x$TestingType
[1] TDDEUT TDDEUT LabAuto LabAuto
Levels: LabAuto TDDEUT
But shouldn't specifying the sep="," and quote="'" parameters have told R to only consider the text inside the single-quotes as the factor label?
The single quotes are not exclusively the problem, as the third column above has the same issue:
> x$CodeType
[1] Production Testing Production Testing
Levels: Production Testing Testing Production
It shows 4 different levels instead of 2, again apparently because there are differing numbers of spaces in front of each value. Is there a way to tell R to ignore spaces when making factor levels out of a text input file? Thanks.
CodePudding user response:
Your input file is in a very strange format. Normally you have either a delimiter or spaces separating values. You seem to have both, which is odd. But you can strip out the spaces by passing the strip.white= parameter to read.table. Use:
x <- read.table(tc <- textConnection(
"Project, TestingType, CodeType
'TS', 'TDDEUT', Production
'TS', 'TDDEUT', Testing
'NR', 'LabAuto', Production
'In', 'LabAuto', Testing"),
header=TRUE, colClasses=c("character", "factor", "factor"),
sep=",", na.strings=c("NULL"), quote="'", strip.white = TRUE)
x$TestingType
# [1] TDDEUT TDDEUT LabAuto LabAuto
# Levels: LabAuto TDDEUT
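If the data has already been read in without strip.white, the padded levels can also be cleaned up afterwards. The following is only a minimal sketch under some assumptions: the data frame is built by hand as a hypothetical stand-in for the question's result (the leading spaces are made up for illustration), and the helper variable is_fac is not from the original post. It rebuilds each factor column from its trimmed values using base R's trimws().

```r
# Hypothetical stand-in for the data frame from the question; the leading
# spaces are deliberate and mimic the padding left by the input file.
x <- data.frame(
  Project     = c("TS", "TS", "NR", "In"),
  TestingType = factor(c("TDDEUT", "TDDEUT", "  LabAuto", " LabAuto")),
  CodeType    = factor(c(" Production", "  Testing", "Production", "Testing")),
  stringsAsFactors = FALSE
)

# Before cleaning, the two differently padded 'LabAuto' values are distinct levels.
nlevels(x$TestingType)

# Find the factor columns (is_fac is an illustrative name).
is_fac <- vapply(x, is.factor, logical(1))

# Rebuild each factor from its whitespace-trimmed values so that
# levels differing only in padding collapse into one.
x[is_fac] <- lapply(x[is_fac], function(f) factor(trimws(as.character(f))))

levels(x$TestingType)
levels(x$CodeType)
```

When you control the read.table call, passing strip.white = TRUE as shown above is simpler; this post-hoc approach is mainly useful when the data frame comes from somewhere else.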