I am working with a dataset and trying to perform PCA on the subset of the data that appears in multiple samples. I want to do this by recording the presence (1) or absence (0) of a character in each sample.
Below is a simplified example of what the starting dataframe looks like:
> df
Sample SForm
1 S1 A
2 S1 B
3 S2 A
4 S2 B
5 S2 C
6 S3 B
7 S3 C
I would like to get a dataframe that lists the presence or absence of each SForm value from df for every sample, as shown below:
> df_pca
S1 S2 S3
A 1 1 0
B 1 1 1
C 0 1 1
Any help that could be provided would be appreciated!
Answer:
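One solution using the tidyverse:
library(tidyverse)  # provides %>%, distinct(), count(), spread(), column_to_rownames()
# Recreate the example data; as.data.frame() names the columns V1 (sample) and V2 (SForm)
df = as.data.frame(rbind(c("S1", "A"),
                         c("S1", "B"),
                         c("S2", "A"),
                         c("S2", "B"),
                         c("S2", "C"),
                         c("S3", "B"),
                         c("S3", "C")))
df_pca = df %>%
  distinct(V1, V2) %>%                      # keep each sample/SForm pair once so every count is 1
  count(V1, V2) %>%                         # n = 1 for every pair that is present
  tidyr::spread(key = V1, value = n) %>%    # one column per sample; missing pairs become NA
  replace(is.na(.), 0) %>%                  # NA means absent, so recode it as 0
  column_to_rownames(var = "V2")            # use the SForm values as row names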
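
As an alternative that needs no extra packages, the same presence/absence table can be sketched in base R with table(). This is a minimal sketch rather than a definitive solution: it assumes the columns are named Sample and SForm as in the question, and it rebuilds the example data so it runs on its own.

```r
# Example data from the question (column names assumed to be Sample and SForm)
df <- data.frame(Sample = c("S1", "S1", "S2", "S2", "S2", "S3", "S3"),
                 SForm  = c("A", "B", "A", "B", "C", "B", "C"))

# Cross-tabulate SForm (rows) against Sample (columns); each cell holds a count
tab <- table(df$SForm, df$Sample)

# Any count above zero means "present"; unary + converts TRUE/FALSE to 1/0
df_pca <- as.data.frame.matrix(+(tab > 0))
```

Comparing against zero keeps every cell at 0 or 1 even if a Sample/SForm pair is repeated in the data. Note that prcomp() treats rows as observations, so t(df_pca) may be needed if each sample should be one observation.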