I have an R dataframe that contains a rather large number of length measurements.
It is structured as follows:
> head(rb.len)
# A tibble: 6 × 5
  Length  Year Value Total Perc_Distr
   <dbl> <dbl> <dbl> <dbl>      <dbl>
1    9.5  1981     1 16641          0
2   10.5  1981     3 16641          0
3   12.5  1981     3 16641          0
4   13.5  1981     4 16641          0
5   14.5  1981    17 16641          0
6   15.5  1981    31 16641          0
Individuals of the same length are grouped together, and the number of individuals (n) in each length class is listed in the column "Value" (e.g. 17 individuals of 14.5 cm were measured). For my further analysis I need each measurement in a separate row (so I need 17 rows with a measurement of 14.5 cm). Unfortunately, all I have learned so far is how to split columns whose observations contain multiple delimited values. As I have a single numeric count instead, I am unsure how to proceed.
I hope you can help. Thanks in advance!
CodePudding user response:
1) tidyr Using rb.len, shown reproducibly in the Note at the end, use uncount as shown. Add the argument .remove = FALSE to uncount if you prefer to retain the Value column.
library(dplyr)
library(tidyr)
rb.len %>% uncount(Value)
## Length other
## 1 9.5 a
## 2 9.5 a
## 3 10.5 b
## 4 10.5 b
## 5 10.5 b
2) Base R Using base R we have the following. Replace , ] with , -2] if you prefer to omit Value from the output (because Value is the second column of the rb.len shown in the Note).
rb.len[rep(1:nrow(rb.len), rb.len$Value), ]
## Length Value other
## 1 9.5 2 a
## 1.1 9.5 2 a
## 2 10.5 3 b
## 2.1 10.5 3 b
## 2.2 10.5 3 b
Note
rb.len <- data.frame(Length = c(9.5, 10.5), Value = 2:3, other = letters[1:2])
rb.len
## Length Value other
## 1 9.5 2 a
## 2 10.5 3 b
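To connect the base R approach back to the question's layout, here is a minimal sketch on a hypothetical sample (rb.q is an illustrative name, with values copied from the head() output in the question). It assumes Value holds non-negative whole-number counts, and it selects columns by name rather than by position, so it does not depend on where Value sits.

```r
# Hypothetical sample mirroring the first rows of the question's tibble
rb.q <- data.frame(
  Length = c(9.5, 10.5, 12.5, 13.5, 14.5, 15.5),
  Year   = 1981,
  Value  = c(1, 3, 3, 4, 17, 31)
)

# Repeat each row index Value times and keep only the measurement columns
long <- rb.q[rep(seq_len(nrow(rb.q)), rb.q$Value), c("Length", "Year")]
rownames(long) <- NULL  # discard the 1, 1.1, 1.2, ... row names

# Sanity checks: one row per individual, 17 rows for the 14.5 cm class
nrow(long) == sum(rb.q$Value)
sum(long$Length == 14.5) == 17
```

Both checks should come back TRUE if the expansion worked. Rows with a count of 0 simply disappear from the result.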
CodePudding user response:
I hope I've understood you correctly. There is a nice solution that repeats each whole row n times:
rb.len <- as.data.frame(lapply(rb.len, rep, rb.len$Value))
If you wish to repeat only some columns:
rb.len <- as.data.frame(lapply(rb.len[1:2], rep, rb.len$Value))
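As a quick check of the column-wise approach, the sketch below rebuilds the small rb.len from the Note in the first answer and compares the result with the row-index approach. It assumes the counts are taken from rb.len$Value; by_column and by_index are illustrative names only.

```r
# Small example data, same as the Note in the first answer
rb.len <- data.frame(Length = c(9.5, 10.5), Value = 2:3, other = letters[1:2])

# Column-wise: repeat every column's elements according to the counts
by_column <- as.data.frame(lapply(rb.len, rep, rb.len$Value))

# Row-wise: repeat row indices, then reset the row names for comparison
by_index <- rb.len[rep(seq_len(nrow(rb.len)), rb.len$Value), ]
rownames(by_index) <- NULL

# Compare the two results
all.equal(by_column, by_index)
```

The counts are read from rb.len$Value before the assignment happens, so overwriting rb.len in the same statement is safe.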