Split strings by commas into columns-CodePudding

I have a single column (yes, just one column) with 200 rows whose elements are strings separated by commas.

Actual data in a dataframe:

"A, B, C, D"
"1, 10, 13, 4"
"0, 1, 6, 1"
"9, 3, 3, 0"
...

And from this single column I want to produce de following data frame:

A   B   C   D

1   10  13  4
0   1   6   1
9   3   3   0
     ...

Where "A", "B", "C", "D" are the column-headers for this data frame and the rows also split by comma to each of the created column respectively. How can I achieve this in R with read.table making the first row as headers?

CodePudding user response：

Here are several different ways to extract and use the data sticking with read.table().

I started with two fake sets of data. In one with nothing of value in the column name (real column name is in row one).

df1 <- data.frame("V1" = c("A,B,C,D", 
                           "AA,D,E,F3", 
                           "Car1,Car2,Car3,Car4",
                           "a,b,c,d",
                           "a1,b1,c1,d1"))
#                    V1
# 1             A,B,C,D
# 2           AA,D,E,F3
# 3 Car1,Car2,Car3,Car4
# 4             a,b,c,d
# 5         a1,b1,c1,d1

In the other, the string that is listed as the column name is a list of the would-be names.

df2 <- data.frame("A,B,C,D" = c("AA,D,E,F3", 
                                "Car1,Car2,Car3,Car4",
                                "a,b,c,d",
                                "a1,b1,c1,d1"), 
                  check.names = F)
#               A,B,C,D
# 1           AA,D,E,F3
# 2 Car1,Car2,Car3,Car4
# 3             a,b,c,d
# 4         a1,b1,c1,d1

To extract the names and values delimited by a comma, where the would-be headings are in row 1 (using df1).

# single data.frame with headers concatenated in the first row
df.noHeader <- read.table(col.names = unlist(strsplit(df1[1,], 
                                                      split = "[,]")),
                          sep = ",",
                          skip = 1, # since the headers were in row 1
                          text = unlist(df1, use.names = F)) 
#      A    B    C    D
# 1   AA    D    E   F3
# 2 Car1 Car2 Car3 Car4
# 3    a    b    c    d
# 4   a1   b1   c1   d1

For clarity, this is what works for when the names are in the column name of the original data frame.

# splitting the original header when splitting the data
df.header <- read.table(col.names = unlist(strsplit(names(df2), 
                                                    split = "[,]")),
                        sep = ",", 
                        text = unlist(df2))
#      A    B    C    D
# 1   AA    D    E   F3
# 2 Car1 Car2 Car3 Car4
# 3    a    b    c    d
# 4   a1   b1   c1   d1

If you had the headings in some other row, you only need to change the value in the call to strsplit(), like this:

# if the headers were in row 2
df.noHeader <- read.table(col.names = unlist(strsplit(df1[2,], # <- see 2 here
                                                      split = "[,]")),
                          sep = ",",
                          skip = 2,  # since the headers were in row 2
                          text = unlist(df1, use.names = F))
#     AA    D    E   F3
# 1 Car1 Car2 Car3 Car4
# 2    a    b    c    d
# 3   a1   b1   c1   d1 

# if the headers were in row 3
df.noHeader <- read.table(col.names = unlist(strsplit(df1[3,], # <- see 3 here
                                                      split = "[,]")),
                          sep = ",",
                          skip = 3, # since the headers were in row 3
                          text = unlist(df1, use.names = F))
#   Car1 Car2 Car3 Car4
# 1    a    b    c    d
# 2   a1   b1   c1   d1