I have a dataset loaded from an .RData file. If I use head(), my data looks like this:
>head(df,1)
R123 R456
cg1 1.252 1.282
Using the "typeof()" command tells me the data type is a list. However, if I use "class()" my output is data.frame:
>typeof(df)
"list"
>class(df)
"data.frame"
Furthermore, I can use commands like df$R123, and my output looks like this:
>df$R123
1.252 1.895
which returns the values corresponding to cg1 and cg2 rows. Using df[1,] and df[,1] gives me an output like this:
>df[1,]
R123 R456
cg1 1.252 1.282
>df[,1]
1.252 1.895
I used rownames() to confirm that the cgX values are row names. These are my questions:
- Can someone explain what type of data format this is?
- Can someone explain how I would transform this data into a "long" format data frame?
I would like to get the dataset in the "long" format so that it may be easier to analyze. The ideal format would look like this:
Individual_ID cg_site value
R123 cg1 1.252
R123 cg2 1.895
R456 cg1 1.282
R456 cg2 1.572
If context helps: the data is about DNA methylation sites. The RXYZ is an ID number, the cgX is a location, and the numerical values are quantities of methylation activity. The real dataset is quite massive, which makes doing this transformation manually very difficult.
Answer:
It is just a data.frame that also has row names. If we want three columns (one with the column name, one with the row name, and one with the value), an easy option is to convert it to a matrix, treat that as a table object, and coerce it to a data.frame with as.data.frame.table:
as.data.frame.table(as.matrix(df))
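As a rough sketch of how this might look end to end, the example below rebuilds a two-row toy version of the data from the values in the question and applies the same call. The object name `long` is only illustrative, and the column names `Individual_ID`, `cg_site`, and `value` are taken from the desired output in the question. It also shows why typeof() and class() disagree: a data.frame is stored internally as a list of columns, while class() reports the "data.frame" class attribute set on that list.

```r
# Toy data rebuilt from the values shown in the question (assumed layout)
df <- data.frame(
  R123 = c(1.252, 1.895),
  R456 = c(1.282, 1.572),
  row.names = c("cg1", "cg2")
)

# typeof() reports the underlying storage, class() the class attribute
typeof(df)  # "list"
class(df)   # "data.frame"

# One row per (row name, column name) pair; by default the columns are
# named Var1 (row names), Var2 (column names) and Freq (values)
long <- as.data.frame.table(as.matrix(df),
                            responseName = "value",
                            stringsAsFactors = FALSE)

# Rename and reorder to match the layout asked for in the question
names(long)[1:2] <- c("cg_site", "Individual_ID")
long <- long[, c("Individual_ID", "cg_site", "value")]
long
```

Setting stringsAsFactors = FALSE keeps the ID and site columns as character vectors; leave it at the default if factors are preferred for modelling. Note that as.matrix() on a very large data.frame makes a full copy, so memory use is worth checking on the real dataset.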