I have this example data frame:
df <- structure(list(PC1 = c(-0.0277818345657933, -0.0342426301759117,
-0.0328199061848987, 0.0557338197779853, 0.042369402931087),
PC2 = c(-0.0149291182738773, -0.00862145986823889, -0.0101822421485786,
-0.00862630869071877, -0.00419434673647331)), row.names = c("Homo sapiens - ULAC-0968",
"Homo sapiens - ULAC-0978", "Homo sapiens - ULAC-0996", "Pan troglodytes - HTB2804",
"Pan troglodytes - HTB411"), class = "data.frame")
What I would like is to create an extra column, named Species, built from the row names. In this case, the factor levels would be only Homo sapiens and Pan troglodytes.
How could I proceed?
CodePudding user response:
library(tidyverse)
df %>%
  # move the row names into a regular column
  rownames_to_column(var = "Species") %>%
  # keep only the text before " - " and store it as a factor
  mutate(Species = factor(sub(" - .*", "", Species)))
Output:
  Species             PC1      PC2
  <fct>             <dbl>    <dbl>
1 Homo sapiens    -0.0278 -0.0149
2 Homo sapiens    -0.0342 -0.00862
3 Homo sapiens    -0.0328 -0.0102
4 Pan troglodytes  0.0557 -0.00863
5 Pan troglodytes  0.0424 -0.00419
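If the specimen ID after the hyphen should be kept as well, one possible variation is to split the row names into two columns with tidyr's separate(). This is only a sketch: the intermediate column name label and the ID column are assumptions for illustration and are not part of the original question.

```r
library(tibble)  # rownames_to_column()
library(tidyr)   # separate()
library(dplyr)   # mutate() and %>%

df <- structure(list(PC1 = c(-0.0277818345657933, -0.0342426301759117,
-0.0328199061848987, 0.0557338197779853, 0.042369402931087),
PC2 = c(-0.0149291182738773, -0.00862145986823889, -0.0101822421485786,
-0.00862630869071877, -0.00419434673647331)), row.names = c("Homo sapiens - ULAC-0968",
"Homo sapiens - ULAC-0978", "Homo sapiens - ULAC-0996", "Pan troglodytes - HTB2804",
"Pan troglodytes - HTB411"), class = "data.frame")

res <- df %>%
  # "label" is a hypothetical temporary name for the row names
  rownames_to_column(var = "label") %>%
  # split at " - " (with spaces), so the hyphen inside "ULAC-0968" is untouched;
  # "ID" is a hypothetical name for the specimen column
  separate(label, into = c("Species", "ID"), sep = " - ", extra = "merge") %>%
  mutate(Species = factor(Species))

levels(res$Species)
# [1] "Homo sapiens"    "Pan troglodytes"
```

Splitting on " - " with the surrounding spaces matters here, because the specimen IDs themselves contain hyphens.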
CodePudding user response:
Base R option.
Use sub() to drop everything from the first hyphen onward, trim the trailing space with trimws(), and delete the row names.
df$Species <- trimws(sub('-.*', '', rownames(df)))
rownames(df) <- NULL
df
#      PC1      PC2         Species
#1 -0.0278 -0.01493    Homo sapiens
#2 -0.0342 -0.00862    Homo sapiens
#3 -0.0328 -0.01018    Homo sapiens
#4  0.0557 -0.00863 Pan troglodytes
#5  0.0424 -0.00419 Pan troglodytes
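Note that the base R code above stores Species as a character vector. Since the question asks for a factor, a small variation could wrap the result in factor(). The sketch below assumes every row name follows the "<species> - <ID>" pattern seen in the example.

```r
df <- structure(list(PC1 = c(-0.0277818345657933, -0.0342426301759117,
-0.0328199061848987, 0.0557338197779853, 0.042369402931087),
PC2 = c(-0.0149291182738773, -0.00862145986823889, -0.0101822421485786,
-0.00862630869071877, -0.00419434673647331)), row.names = c("Homo sapiens - ULAC-0968",
"Homo sapiens - ULAC-0978", "Homo sapiens - ULAC-0996", "Pan troglodytes - HTB2804",
"Pan troglodytes - HTB411"), class = "data.frame")

# Remove " - <ID>" (assumed pattern) and convert the remainder to a factor
df$Species <- factor(sub(" - .*$", "", rownames(df)))
rownames(df) <- NULL  # reset the row names to 1, 2, 3, ...

levels(df$Species)
# [1] "Homo sapiens"    "Pan troglodytes"
```

Matching " - " together with its surrounding spaces removes the need for trimws().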