I would like to know the differences between a vector and a factor, because I sometimes get confused about which one I am working with in a dataset.
CodePudding user response:
A vector is the most basic form of data in R. It can be numeric, character, logical, or ... factor. We often call vectors variables when they are columns of a dataframe, but a vector can also be its own object or be part of a list or dataframe. We commonly use c() to create a vector, but note that even something as simple as x <- "a" or y <- 0 will create a vector, which happens to be of length 1.
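To make the "length 1" point concrete, here is a minimal sketch. The object z is a hypothetical extra vector added for comparison; x and y are the ones from the text.

```r
# Single values are already vectors of length 1
x <- "a"
y <- 0

# A longer vector built with c() (z is a made-up example)
z <- c(1.5, 2, 3)

length(x)     # 1
length(y)     # 1
length(z)     # 3

is.vector(x)  # TRUE
typeof(x)     # "character"
typeof(y)     # "double"
```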
A factor is a very specific type of vector that is an odd mix of numeric and character: at first glance it looks like a character vector, but under the hood it is stored as integer codes. The character aspect is the set of labels (called levels) attached to those codes. Thus, it is a categorical variable with a limited number of categories. If you have any knowledge of survey research, you will know Likert scales, which run from 1 - Strongly disagree to, say, 4 - Strongly agree. Such a scale would commonly be stored as a factor variable in R.
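The Likert case could be sketched roughly like this. The responses are made-up data, and the two middle labels are assumptions, since the text only names the endpoints of the scale.

```r
# Hypothetical survey responses on a 4-point scale (illustrative data only)
responses <- c(1, 4, 2, 4, 3)

# The middle labels are assumed; only the endpoints come from the text
likert_labels <- c("Strongly disagree", "Disagree", "Agree", "Strongly agree")

# ordered = TRUE records that the categories have a natural order
likert <- factor(responses, levels = 1:4, labels = likert_labels, ordered = TRUE)

likert              # prints the labels, not the numbers
as.integer(likert)  # the underlying codes: 1 4 2 4 3
levels(likert)      # the four labels, in scale order
table(likert)       # counts per category
```

Using an ordered factor is a design choice here: it lets comparisons such as likert >= "Agree" work, which a plain factor would not allow.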
For example, see the following:
vec <- c("Male", "Female", "Male", "Female", "Male")
factor(vec)
vec_fac <- factor(vec)
str(vec)
chr [1:5] "Male" "Female" "Male" "Female" "Male"
str(vec_fac)
Factor w/ 2 levels "Female","Male": 2 1 2 1 2
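The 2 1 2 1 2 in the str() output are the integer codes, and "Female","Male" are the levels they point to. A short sketch for pulling the two pieces apart, reusing vec and vec_fac from above:

```r
vec <- c("Male", "Female", "Male", "Female", "Male")
vec_fac <- factor(vec)

levels(vec_fac)        # "Female" "Male" (sorted alphabetically by default)
as.integer(vec_fac)    # 2 1 2 1 2, positions in levels(vec_fac)
as.character(vec_fac)  # back to the original strings

typeof(vec_fac)        # "integer": how the data is stored
class(vec_fac)         # "factor": how R treats it
```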
CodePudding user response:
There is a difference between the type (class) of the data and the structure (dimension) of the data. The structure can be a vector, matrix, data.frame, etc. The type, however, can be character, numeric, factor, etc.
So you can have a vector that is a factor, a vector of numbers, a vector of strings, etc.
A factor is a categorical variable that has distinct levels. Factor and vector are not mutually exclusive.
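A small sketch of that distinction, with made-up objects (nums, strs, fac, df, m are all hypothetical names). One caveat worth knowing: is.vector() returns FALSE for a factor because a factor carries attributes, even though it is stored as an atomic vector, so is.atomic() is used here instead.

```r
# Same structure (a one-dimensional vector), different types
nums <- c(1.5, 2, 3)
strs <- c("a", "b", "c")
fac  <- factor(c("low", "high", "low"))

class(nums)     # "numeric"
class(strs)     # "character"
class(fac)      # "factor"
is.atomic(fac)  # TRUE: a factor is still an atomic vector underneath
length(fac)     # 3

# Other structures: a data.frame can hold a factor as one of its columns
df <- data.frame(id = 1:3, group = fac)
class(df)        # "data.frame"
class(df$group)  # "factor"

# A matrix has two dimensions
m <- matrix(1:6, nrow = 2)
dim(m)           # 2 3
```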