I looked for a solution in the forum but didn't find one.
I'm working with a fish database and I'm trying to transform my data frame from this (MRE):
df_initial <- structure(list(year = c(2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
2011L), haul = c(11L, 11L, 11L, 11L, 11L, 11L, 11L), species = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L), .Label = "Merluccius merluccius", class = "factor"),
length = c(29L, 33L, 34L, 37L, 10L, 11L, 12L), number = c(2L,
1L, 1L, 1L, 7L, 4L, 5L)), class = "data.frame", row.names = c(NA,
-7L))
to this:
df_final <- structure(list(year = c(2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
2011L, 2011L, 2011L, 2011L, 2011L, 2011L), haul = c(11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L), species = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = "Merluccius merluccius", class = "factor"),
length = c(29L, 29L, 33L, 34L, 37L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L), number = c(2L,
2L, 1L, 1L, 1L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 4L, 4L, 4L, 4L,
5L, 5L, 5L, 5L, 5L)), class = "data.frame", row.names = c(NA,
-21L))
Namely, I want to repeat each row as many times as its value in number, keeping all the columns.
I've tried several approaches using the function rep(), but I always get the same error: invalid 'times' argument. I've also tried changing the data type, with no success.
What am I doing wrong?
Here is the last code I ran:
df_final <- df_initial[rep(row.names(df_initial), df_initial$number), 1:5]
Any help will be more than welcome. Thanks in advance.
CodePudding user response:
The error is most likely caused by NA values in number. You'll have to deal with these first, either by dropping them or, if you want to retain them in the output, replacing NA with some value. Here's how to do both, using either base R or {tidyr}.
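To check whether this is what is happening, a minimal sketch along these lines may help. The vector counts is a made-up stand-in for df_initial$number, not data from the question:

```r
# Hypothetical counts vector with one missing value (stand-in for df_initial$number)
counts <- c(2L, NA, 1L)

# anyNA() reports whether the vector contains any missing values
has_na <- anyNA(counts)

# rep() refuses a 'times' vector that contains NA; capture the message instead of stopping
msg <- tryCatch(
  rep(seq_along(counts), counts),
  error = function(e) conditionMessage(e)
)

has_na
msg
```

If anyNA() returns TRUE for your real column, the fixes below should apply. If it returns FALSE, the cause is probably something else, for example negative values in number.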
Remove rows with NAs
base R:
# add NA values to example
df_initial$number[5:6] <- NA_integer_
df_cleaned <- df_initial[!is.na(df_initial$number), ]
df_final <- df_cleaned[rep(row.names(df_cleaned), df_cleaned$number), 1:5]
df_final
#> year haul species length number
#> 1 2011 11 Merluccius merluccius 29 2
#> 1.1 2011 11 Merluccius merluccius 29 2
#> 2 2011 11 Merluccius merluccius 33 1
#> 3 2011 11 Merluccius merluccius 34 1
#> 4 2011 11 Merluccius merluccius 37 1
#> 7 2011 11 Merluccius merluccius 12 5
#> 7.1 2011 11 Merluccius merluccius 12 5
#> 7.2 2011 11 Merluccius merluccius 12 5
#> 7.3 2011 11 Merluccius merluccius 12 5
#> 7.4 2011 11 Merluccius merluccius 12 5
tidyr:
library(tidyr)
df_final <- df_initial %>%
  drop_na(number) %>%
  uncount(weights = number, .remove = FALSE)
df_final
#> year haul species length number
#> 1 2011 11 Merluccius merluccius 29 2
#> 2 2011 11 Merluccius merluccius 29 2
#> 3 2011 11 Merluccius merluccius 33 1
#> 4 2011 11 Merluccius merluccius 34 1
#> 5 2011 11 Merluccius merluccius 37 1
#> 6 2011 11 Merluccius merluccius 12 5
#> 7 2011 11 Merluccius merluccius 12 5
#> 8 2011 11 Merluccius merluccius 12 5
#> 9 2011 11 Merluccius merluccius 12 5
#> 10 2011 11 Merluccius merluccius 12 5
Replace NAs
base R:
df_cleaned <- df_initial
df_cleaned$number[is.na(df_initial$number)] <- 1L
df_final <- df_cleaned[rep(row.names(df_cleaned), df_cleaned$number), 1:5]
df_final
#> year haul species length number
#> 1 2011 11 Merluccius merluccius 29 2
#> 1.1 2011 11 Merluccius merluccius 29 2
#> 2 2011 11 Merluccius merluccius 33 1
#> 3 2011 11 Merluccius merluccius 34 1
#> 4 2011 11 Merluccius merluccius 37 1
#> 5 2011 11 Merluccius merluccius 10 1
#> 6 2011 11 Merluccius merluccius 11 1
#> 7 2011 11 Merluccius merluccius 12 5
#> 7.1 2011 11 Merluccius merluccius 12 5
#> 7.2 2011 11 Merluccius merluccius 12 5
#> 7.3 2011 11 Merluccius merluccius 12 5
#> 7.4 2011 11 Merluccius merluccius 12 5
tidyr:
df_final <- df_initial %>%
  replace_na(list(number = 1L)) %>%
  uncount(weights = number, .remove = FALSE)
df_final
#> year haul species length number
#> 1 2011 11 Merluccius merluccius 29 2
#> 2 2011 11 Merluccius merluccius 29 2
#> 3 2011 11 Merluccius merluccius 33 1
#> 4 2011 11 Merluccius merluccius 34 1
#> 5 2011 11 Merluccius merluccius 37 1
#> 6 2011 11 Merluccius merluccius 10 1
#> 7 2011 11 Merluccius merluccius 11 1
#> 8 2011 11 Merluccius merluccius 12 5
#> 9 2011 11 Merluccius merluccius 12 5
#> 10 2011 11 Merluccius merluccius 12 5
#> 11 2011 11 Merluccius merluccius 12 5
#> 12 2011 11 Merluccius merluccius 12 5
Created on 2022-03-15 by the reprex package (v2.0.1)
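As a side note, the base R versions can also index by integer position instead of by row name, and then reset the row names so the result is numbered 1, 2, 3, ... like the tidyr output. This is only a sketch on a small made-up data frame (df, df_clean and df_long are illustrative names, not from the question):

```r
# Small hypothetical data frame with one missing count
df <- data.frame(length = c(29L, 33L, 10L), number = c(2L, NA, 3L))

# Drop rows whose count is missing
df_clean <- df[!is.na(df$number), ]

# Repeat each row position 'number' times, then index by position
idx <- rep(seq_len(nrow(df_clean)), df_clean$number)
df_long <- df_clean[idx, ]

# Reset row names to plain consecutive integers
rownames(df_long) <- NULL

df_long
```

Indexing by position avoids the 1.1, 7.1, ... style row names that appear when duplicated row names are made unique.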