I have a single column (yes, just one column) with 200 rows whose elements are strings separated by commas.
Actual data in a dataframe:
"A, B, C, D"
"1, 10, 13, 4"
"0, 1, 6, 1"
"9, 3, 3, 0"
...
And from this single column I want to produce de following data frame:
A B C D
1 10 13 4
0 1 6 1
9 3 3 0
...
Where "A", "B", "C", "D" are the column-headers for this data frame and the rows also split by comma to each of the created column respectively. How can I achieve this in R with read.table making the first row as headers?
CodePudding user response:
Here are several different ways to extract and use the data sticking with read.table()
.
I started with two fake sets of data. In one with nothing of value in the column name (real column name is in row one).
df1 <- data.frame("V1" = c("A,B,C,D",
"AA,D,E,F3",
"Car1,Car2,Car3,Car4",
"a,b,c,d",
"a1,b1,c1,d1"))
# V1
# 1 A,B,C,D
# 2 AA,D,E,F3
# 3 Car1,Car2,Car3,Car4
# 4 a,b,c,d
# 5 a1,b1,c1,d1
In the other, the string that is listed as the column name is a list of the would-be names.
df2 <- data.frame("A,B,C,D" = c("AA,D,E,F3",
"Car1,Car2,Car3,Car4",
"a,b,c,d",
"a1,b1,c1,d1"),
check.names = F)
# A,B,C,D
# 1 AA,D,E,F3
# 2 Car1,Car2,Car3,Car4
# 3 a,b,c,d
# 4 a1,b1,c1,d1
To extract the names and values delimited by a comma, where the would-be headings are in row 1 (using df1
).
# single data.frame with headers concatenated in the first row
df.noHeader <- read.table(col.names = unlist(strsplit(df1[1,],
split = "[,]")),
sep = ",",
skip = 1, # since the headers were in row 1
text = unlist(df1, use.names = F))
# A B C D
# 1 AA D E F3
# 2 Car1 Car2 Car3 Car4
# 3 a b c d
# 4 a1 b1 c1 d1
For clarity, this is what works for when the names are in the column name of the original data frame.
# splitting the original header when splitting the data
df.header <- read.table(col.names = unlist(strsplit(names(df2),
split = "[,]")),
sep = ",",
text = unlist(df2))
# A B C D
# 1 AA D E F3
# 2 Car1 Car2 Car3 Car4
# 3 a b c d
# 4 a1 b1 c1 d1
If you had the headings in some other row, you only need to change the value in the call to strsplit()
, like this:
# if the headers were in row 2
df.noHeader <- read.table(col.names = unlist(strsplit(df1[2,], # <- see 2 here
split = "[,]")),
sep = ",",
skip = 2, # since the headers were in row 2
text = unlist(df1, use.names = F))
# AA D E F3
# 1 Car1 Car2 Car3 Car4
# 2 a b c d
# 3 a1 b1 c1 d1
# if the headers were in row 3
df.noHeader <- read.table(col.names = unlist(strsplit(df1[3,], # <- see 3 here
split = "[,]")),
sep = ",",
skip = 3, # since the headers were in row 3
text = unlist(df1, use.names = F))
# Car1 Car2 Car3 Car4
# 1 a b c d
# 2 a1 b1 c1 d1