Error in read.table, how to set column name as row.name?

Time:11-19

Can anyone explain what's going on here? Setting row.names = NULL makes no difference compared to when I don't specify it, yet when I set row.names = 1, it says duplicate row.names are not allowed. How do I resolve this to get column V1 as row names?

ak1a = read.table("/Users/abhaykanodia/Desktop/smallRNA/AK1a_counts.txt", row.names = NULL)
head(ak1a)
                  V1 V2
1 ENSG00000000003.15  2
2  ENSG00000000005.6  0
3 ENSG00000000419.14 21
4 ENSG00000000457.14  0
5 ENSG00000000460.17  2
6 ENSG00000000938.13  0
ak1a = read.table("/Users/abhaykanodia/Desktop/smallRNA/AK1a_counts.txt")
head(ak1a)
                  V1 V2
1 ENSG00000000003.15  2
2  ENSG00000000005.6  0
3 ENSG00000000419.14 21
4 ENSG00000000457.14  0
5 ENSG00000000460.17  2
6 ENSG00000000938.13  0
ak1a = read.table("/Users/abhaykanodia/Desktop/smallRNA/AK1a_counts.txt", row.names = 1)
Error in read.table("/Users/abhaykanodia/Desktop/smallRNA/AK1a_counts.txt",  : 
  duplicate 'row.names' are not allowed

CodePudding user response:

The help page for read.table says:

If there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names. Otherwise if row.names is missing, the rows are numbered.

Your file has no header line, so the rows are numbered both when you omit row.names and when you set row.names = NULL. That is why your first two calls give the same output.
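
A small sketch of the rule quoted from the help page, using inline text in place of the real file. The gene IDs and counts here are made up for illustration:

```r
# Header has one fewer field than the data rows, so read.table()
# uses the first column as row names automatically.
txt_short_header <- "count
geneA 2
geneB 0
geneC 21"
auto <- read.table(text = txt_short_header, header = TRUE)
rownames(auto)   # "geneA" "geneB" "geneC"

# No header line: rows are numbered, whether row.names is omitted or NULL.
txt_no_header <- "geneA 2
geneB 0
geneC 21"
omitted  <- read.table(text = txt_no_header)
explicit <- read.table(text = txt_no_header, row.names = NULL)
identical(omitted, explicit)   # TRUE
```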

You can also pass a vector of row names explicitly, as in this example:

df <- read.table(text="V1 V2
ENSG00000000003.15  2
ENSG00000000005.6  0
ENSG00000000419.14 21
ENSG00000000457.14  0
ENSG00000000460.17  2
ENSG00000000938.13  0", header=TRUE, row.names=letters[1:6])

which displays:

                  V1 V2
a ENSG00000000003.15  2
b  ENSG00000000005.6  0
c ENSG00000000419.14 21
d ENSG00000000457.14  0
e ENSG00000000460.17  2
f ENSG00000000938.13  0

CodePudding user response:

The first two calls are functionally the same. Your file has no header line, so when you omit the row.names argument read.table numbers the rows, which is also what row.names = NULL does.

The third one fails because a single number passed as row.names is interpreted as the index of the column that holds the row names, so row.names = 1 asks for column V1. Row names must be unique, hence the error: at least one value in V1 occurs more than once in your file.

You get the same error whenever you try to assign duplicate row names directly, for example:

test <- read.table(text="X Y
1 2
3 4", header=TRUE)
row.names(test) = c(1,1)

This gives the same error.
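
Per the read.table help page, a single number given as row.names is the column of the table that contains the row names. A minimal sketch with made-up IDs (not the real AK1a_counts.txt) shows the call failing only when that column has a repeated value:

```r
# Made-up IDs for illustration; "geneB" appears twice.
txt_dup <- "geneA 2
geneB 0
geneB 21"

# row.names = 1 asks for column 1 as row names, which fails on the repeat.
# The error message is captured instead of stopping the script.
res <- tryCatch(
  suppressWarnings(read.table(text = txt_dup, row.names = 1)),
  error = function(e) conditionMessage(e)
)
res

# With unique IDs the same call works and column 1 becomes the row names.
txt_ok <- "geneA 2
geneB 0
geneC 21"
ok <- read.table(text = txt_ok, row.names = 1)
rownames(ok)   # "geneA" "geneB" "geneC"
```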

If you just want to name your rows R1, R2, and so on, you could try something like this:

ak1a = read.table("/Users/abhaykanodia/Desktop/smallRNA/AK1a_counts.txt")
row.names(ak1a) = paste0("R", seq_len(nrow(ak1a)))
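
If the goal is still to have V1 as the row names, one possible approach is to look at which IDs repeat and then either disambiguate them or combine their rows. The sketch below uses a made-up stand-in data frame (ak1a_demo, dup_ids, opt1 and opt2 are illustrative names), and whether summing counts is appropriate depends on why the IDs repeat in your file:

```r
# Stand-in for read.table("AK1a_counts.txt"); the IDs are made up.
ak1a_demo <- read.table(text = "geneA 2
geneB 0
geneB 21
geneC 5")
ids <- as.character(ak1a_demo$V1)

# Inspect which IDs repeat before deciding how to handle them.
dup_ids <- unique(ids[duplicated(ids)])
dup_ids   # "geneB"

# Option 1: keep every row and disambiguate repeated IDs with a suffix.
opt1 <- ak1a_demo
rownames(opt1) <- make.unique(ids)
opt1$V1 <- NULL

# Option 2 (assumes summing is meaningful): add up counts per ID,
# then use the now-unique IDs as row names.
opt2 <- aggregate(V2 ~ V1, data = ak1a_demo, FUN = sum)
rownames(opt2) <- as.character(opt2$V1)
opt2$V1 <- NULL
```

On the real data, checking dup_ids first is probably the most useful step, since it shows which lines of the file cause the error.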