How to turn four vectors of differing lengths into a long format dataframe?

Time:05-06

I am very new to R programming and have been given the following data to run a non-parametric test on. My problem is turning this data (in R) into a long-format data frame so that I can then make a histogram/box plot. We aren't allowed to simply save the data as a CSV and read it in; it has to be done in R.

A:1361,1466,1319,1426,1437,1541,1474,1386,1510,1373,1463,1305,1571,1224,1372
B:1581,1515,1606,1518,1395,1584,1671,1573,1454,1674,1459,1647
C:1482,1570,1575,1634,1542,1651,1189,1678,1391,1525
D:2084,1566,1990,1996,2052,1436,1808,1679,1981,2014,1759,1842,1603,1670,1845,2016,1621,2050,1690,1933

I've turned these into vectors, but I keep getting error messages when I try to turn them into a data frame (the vectors have different lengths). Any pointers would be a big help; I've been trying to troubleshoot for hours and my prof is no help.

Thanks

CodePudding user response:

You can use stack to put everything into one long-format data frame.
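If the four groups are already numeric vectors, as the question describes, a minimal sketch of the same idea needs no string parsing at all. data.frame() requires every column to have the same number of rows, which is why it fails here; a named list has no such restriction, and stack() turns it into long format. The variable names A to D are assumptions chosen to match the question's labels.

```r
# Assumed starting point: the four groups typed in as numeric vectors
A <- c(1361, 1466, 1319, 1426, 1437, 1541, 1474, 1386, 1510, 1373, 1463, 1305, 1571, 1224, 1372)
B <- c(1581, 1515, 1606, 1518, 1395, 1584, 1671, 1573, 1454, 1674, 1459, 1647)
C <- c(1482, 1570, 1575, 1634, 1542, 1651, 1189, 1678, 1391, 1525)
D <- c(2084, 1566, 1990, 1996, 2052, 1436, 1808, 1679, 1981, 2014, 1759, 1842,
       1603, 1670, 1845, 2016, 1621, 2050, 1690, 1933)

# Reproduce the error: columns of length 15, 12, 10 and 20 cannot be recycled
err <- tryCatch(data.frame(A, B, C, D), error = function(e) e)
print(conditionMessage(err))

# A named list accepts different lengths; stack() gives columns 'values' and 'ind'
long <- stack(list(A = A, B = B, C = C, D = D))
str(long)
```

Because the inputs are numeric, the values column is numeric too and can go straight into hist() or boxplot().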

I'll assume you are starting with a vector of strings:

vec <- c("A:1361,1466,1319,1426,1437,1541,1474,1386,1510,1373,1463,1305,1571,1224,1372", "B:1581,1515,1606,1518,1395,1584,1671,1573,1454,1674,1459,1647", "C:1482,1570,1575,1634,1542,1651,1189,1678,1391,1525", "D:2084,1566,1990,1996,2052,1436,1808,1679,1981,2014,1759,1842,1603,1670,1845,2016,1621,2050,1690,1933")

We can split each string at the colon and the commas, then build a named list:

vecspl <- strsplit(vec, "[:,]")
str(setNames(lapply(vecspl, `[`, -1), sapply(vecspl, `[[`, 1)))
# List of 4
#  $ A: chr [1:15] "1361" "1466" "1319" "1426" ...
#  $ B: chr [1:12] "1581" "1515" "1606" "1518" ...
#  $ C: chr [1:10] "1482" "1570" "1575" "1634" ...
#  $ D: chr [1:20] "2084" "1566" "1990" "1996" ...
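As the chr entries in the str() output show, strsplit() returns character vectors, so the stacked values column would also be character, which hist() and boxplot() do not accept. A minimal sketch that converts each group to numeric before stacking, assuming the same vec as above (the name groups is illustrative):

```r
vec <- c("A:1361,1466,1319,1426,1437,1541,1474,1386,1510,1373,1463,1305,1571,1224,1372",
         "B:1581,1515,1606,1518,1395,1584,1671,1573,1454,1674,1459,1647",
         "C:1482,1570,1575,1634,1542,1651,1189,1678,1391,1525",
         "D:2084,1566,1990,1996,2052,1436,1808,1679,1981,2014,1759,1842,1603,1670,1845,2016,1621,2050,1690,1933")

# Split at the colon and at every comma; the first piece is the group label
vecspl <- strsplit(vec, "[:,]")

# Drop the label, convert the remaining pieces to numbers, name each by its label
groups <- setNames(lapply(vecspl, function(v) as.numeric(v[-1])),
                   vapply(vecspl, `[[`, character(1), 1))
str(groups)

long <- stack(groups)  # 'values' is now numeric
```

lapply() is used instead of sapply() so the result is always a list, even if the groups happened to have equal lengths (where sapply() would simplify to a matrix).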

From here, we can stack it:

stack(setNames(lapply(vecspl, `[`, -1), sapply(vecspl, `[[`, 1)))
#    values ind
# 1    1361   A
# 2    1466   A
# 3    1319   A
# 4    1426   A
# 5    1437   A
# 6    1541   A
# 7    1474   A
# 8    1386   A
# 9    1510   A
# 10   1373   A
# 11   1463   A
# 12   1305   A
# 13   1571   A
# 14   1224   A
# 15   1372   A
# 16   1581   B
# 17   1515   B
# 18   1606   B
# 19   1518   B
# 20   1395   B
# 21   1584   B
# 22   1671   B
# 23   1573   B
# 24   1454   B
# 25   1674   B
# 26   1459   B
# 27   1647   B
# 28   1482   C
# 29   1570   C
# 30   1575   C
# 31   1634   C
# 32   1542   C
# 33   1651   C
# 34   1189   C
# 35   1678   C
# 36   1391   C
# 37   1525   C
# 38   2084   D
# 39   1566   D
# 40   1990   D
# 41   1996   D
# 42   2052   D
# 43   1436   D
# 44   1808   D
# 45   1679   D
# 46   1981   D
# 47   2014   D
# 48   1759   D
# 49   1842   D
# 50   1603   D
# 51   1670   D
# 52   1845   D
# 53   2016   D
# 54   1621   D
# 55   2050   D
# 56   1690   D
# 57   1933   D
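Once the long data frame has numeric values, the plots from the question take one line each. A sketch, building long directly from the question's numbers; the Kruskal-Wallis test at the end is an assumption, one common non-parametric choice for comparing several groups, since the question does not name a specific test.

```r
long <- stack(list(
  A = c(1361, 1466, 1319, 1426, 1437, 1541, 1474, 1386, 1510, 1373, 1463, 1305, 1571, 1224, 1372),
  B = c(1581, 1515, 1606, 1518, 1395, 1584, 1671, 1573, 1454, 1674, 1459, 1647),
  C = c(1482, 1570, 1575, 1634, 1542, 1651, 1189, 1678, 1391, 1525),
  D = c(2084, 1566, 1990, 1996, 2052, 1436, 1808, 1679, 1981, 2014, 1759, 1842,
        1603, 1670, 1845, 2016, 1621, 2050, 1690, 1933)
))

# One box per group: the formula splits 'values' by the factor 'ind'
boxplot(values ~ ind, data = long, xlab = "Group", ylab = "Value")

# Histogram of a single group, here D
hist(long$values[long$ind == "D"], main = "Group D", xlab = "Value")

# Kruskal-Wallis rank-sum test across the four groups (assumed test choice)
kw <- kruskal.test(values ~ ind, data = long)
print(kw)
```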

CodePudding user response:

You could scan the information, strip the labels with gsub, strsplit at the commas, use the first character of each line as the names, and then stack. From there you could continue: type.convert the values to numbers, aggregate the sums per group, and barplot the result.

x <- scan(text='A:1361,1466,1319,1426,1437,1541,1474,1386,1510,1373,1463,1305,1571,1224,1372
B:1581,1515,1606,1518,1395,1584,1671,1573,1454,1674,1459,1647
C:1482,1570,1575,1634,1542,1651,1189,1678,1391,1525
D:2084,1566,1990,1996,2052,1436,1808,1679,1981,2014,1759,1842,1603,1670,1845,2016,1621,2050,1690,1933',
      what='character', quiet=TRUE)

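# Native pipe |> needs R >= 4.1.0; the _ placeholder needs R >= 4.2.0
x |> 
  gsub(pattern='\\w:', replacement='') |>    # drop the "A:" ... "D:" labels
  strsplit(',') |>                           # split each line at the commas
  setNames(substr(x, 1, 1)) |>               # name each piece by its label
  stack() |>                                 # long format: values + ind
  type.convert(as.is=TRUE) |>                # character values -> integer
  aggregate(values ~ ind, data=_, sum) |>    # total per group
  barplot(values ~ ind, data=_, col=seq_len(length(x)) + 1)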
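The same steps can be written with intermediate variables, which makes each stage easy to inspect and also runs on R versions older than 4.2.0, where the _ placeholder is unavailable. A sketch; the names no_labels, pieces, long and sums are illustrative, not from the answer above.

```r
x <- scan(text = "A:1361,1466,1319,1426,1437,1541,1474,1386,1510,1373,1463,1305,1571,1224,1372
B:1581,1515,1606,1518,1395,1584,1671,1573,1454,1674,1459,1647
C:1482,1570,1575,1634,1542,1651,1189,1678,1391,1525
D:2084,1566,1990,1996,2052,1436,1808,1679,1981,2014,1759,1842,1603,1670,1845,2016,1621,2050,1690,1933",
          what = "character", quiet = TRUE)   # one string per line

no_labels <- gsub("\\w:", "", x)              # strip the "A:" ... "D:" prefixes
pieces <- strsplit(no_labels, ",")            # one character vector per group
names(pieces) <- substr(x, 1, 1)              # first character is the group label
long <- type.convert(stack(pieces), as.is = TRUE)  # values become integer
sums <- aggregate(values ~ ind, data = long, FUN = sum)
barplot(values ~ ind, data = sums, col = seq_len(nrow(sums)) + 1)
```

For a box plot instead of summed bars, the long data frame can be passed directly to boxplot(values ~ ind, data = long).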

