R - Fill a dataframe based on presence for PCA

Time:05-11

I am working with a dataset and want to perform PCA on the subset of the data that appears in multiple samples. My plan is to encode the presence (1) or absence (0) of each character in each sample.

Below is a simplified example of what the starting dataframe looks like:

> df
  Sample SForm
1     S1     A
2     S1     B
3     S2     A
4     S2     B
5     S2     C
6     S3     B
7     S3     C

I would like to get a dataframe that lists, for each sample, the presence or absence of each value in the SForm column of df, as shown below:

> df_pca
  S1 S2 S3
A  1  1  0
B  1  1  1
C  0  1  1

Any help that could be provided would be appreciated!

CodePudding user response:

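One solution using the tidyverse (dplyr, tidyr and tibble):

library(dplyr)   # provides %>% and count()
library(tibble)  # provides column_to_rownames()

# Recreate the example data. rbind() on unnamed vectors gives the default
# column names V1 (the sample) and V2 (the SForm value).
df = as.data.frame(rbind(c("S1", "A"), 
                         c("S1", "B"), 
                         c("S2", "A"), 
                         c("S2", "B"), 
                         c("S2", "C"), 
                         c("S3", "B"), 
                         c("S3", "C")))

# Note: n is 1 here because every sample/SForm pair occurs once;
# repeated pairs would give counts above 1 rather than a strict 0/1 flag.
df_pca = df %>% 
    count(V1, V2) %>%                         # one row per sample/SForm pair
    tidyr::spread(key = V1, value = n) %>%    # one column per sample
    replace(is.na(.), 0) %>%                  # missing combinations become 0
    column_to_rownames(var = "V2")            # SForm values become row names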
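
For comparison, here is a minimal base R sketch of the same idea that needs no extra packages. It assumes the columns are named Sample and SForm as in the question (not V1 and V2 as in the code above), and it is only one possible approach: table() cross-tabulates the two columns, and comparing the counts with 0 turns them into a strict presence/absence flag even when a sample/SForm pair is repeated.

```r
# Example data, using the column names from the question (an assumption;
# the tidyverse code above uses the default names V1 and V2 instead)
df <- data.frame(
  Sample = c("S1", "S1", "S2", "S2", "S2", "S3", "S3"),
  SForm  = c("A", "B", "A", "B", "C", "B", "C")
)

# Cross-tabulate SForm (rows) against Sample (columns)
counts <- table(df$SForm, df$Sample)

# Reduce counts to 1 (present) or 0 (absent)
presence <- (counts > 0) * 1

# Convert the table to a plain data frame with SForm values as row names
df_pca <- as.data.frame.matrix(presence)
df_pca
```

Because the result is an ordinary numeric data frame, it can be passed on to whichever PCA function you prefer.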