first time question asker so I will try to explain my best. I am trying to tidy up a data set for a non-profit that counts benthic macroinvertebrates (stream bugs). The goal of this data set is to compare ID accuracy between volunteers and biologists. The original data was stored in an extremely complex Excel sheet. I finally have the data concatenated in a large .csv, but I still have an issue where 1 observation (1 observer at 1 moment in time) is in multiple rows. I would like to use pivot functions to get this taken care of, but it's not as cut and dry as some of the how-to's I've been reading. I created a simplified example of my issue here:
Workbook Location Taxa K_Vol K_Bio L_Vol L_Bio V_Vol V_Bio
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A BigCreek Worm 4 4 2 5 5 3
2 A BigCreek Larvae 4 4 3 2 2 3
3 A BigCreek Swimmer 1 1 2 3 3 3
4 A SmallCreek Worm 1 2 2 4 1 5
5 A SmallCreek Larvae 1 4 4 2 5 5
6 A SmallCreek Swimmer 2 2 3 4 3 1
7 B BigCreek Worm 5 1 3 3 4 1
8 B BigCreek Larvae 5 5 2 5 4 4
9 B BigCreek Swimmer 3 1 4 5 4 4
10 B SmallCreek Worm 4 1 3 3 4 4
11 B SmallCreek Larvae 1 4 5 5 2 5
12 B SmallCreek Swimmer 5 3 3 5 1 1
I would like it to look like this:
A tibble: 24 × 6
Workbook Location Counter Worm Larvae Swimmer
<chr> <chr> <chr> <dbl> <dbl> <dbl>
1 A BigCreek K_Vol 4 4 1
2 A BigCreek K_Bio 4 4 1
3 A BigCreek L_Vol 2 3 2
4 A BigCreek L_Bio 5 2 3
5 A BigCreek V_Vol 5 2 3
6 A BigCreek V_Bio 3 3 3
7 A SmallCreek K_Vol 1 1 2
8 A SmallCreek K_Bio 2 4 2
9 A SmallCreek L_Vol 2 4 3
10 A SmallCreek L_Bio 4 2 4
# … with 14 more rows
I have used this to get one column to behave more or less, but I am unsure how to proceed from here as this is a little more of a complicated transformation.
simplecurrent %>%
pivot_longer(c('K_Vol', 'K_Bio', 'L_Vol', 'L_Bio', 'V_Vol', 'V_Bio'),
names_to = "Counter", values_to = 'Worm') %>%
PS. The pipe at the end of that last bit is from me having the tibble print entirely, not needed here It produces this:
Workbook Location Taxa Counter Worm
<chr> <chr> <chr> <chr> <dbl>
1 A BigCreek Worm K_Vol 4
2 A BigCreek Worm K_Bio 4
3 A BigCreek Worm L_Vol 2
4 A BigCreek Worm L_Bio 5
5 A BigCreek Worm V_Vol 5
6 A BigCreek Worm V_Bio 3
7 A BigCreek Larvae K_Vol 4
8 A BigCreek Larvae K_Bio 4
9 A BigCreek Larvae L_Vol 3
10 A BigCreek Larvae L_Bio 2
11 A BigCreek Larvae V_Vol 2
and so on, all looking good. How do I get the other species to behave like "worm"?
CodePudding user response:
A data.table
option using melt
and dcast
.
library(data.table)
dcast(melt(setDT(simplecurrent), id.vars = c('Workbook', 'Location', 'Taxa'),
variable.name = 'Counter'),
Workbook Location Counter ~ Taxa, value.var = 'value')
# Workbook Location Counter Larvae Swimmer Worm
# 1: A BigCreek K_Vol 4 1 4
# 2: A BigCreek K_Bio 4 1 4
# 3: A BigCreek L_Vol 3 2 2
# 4: A BigCreek L_Bio 2 3 5
# 5: A BigCreek V_Vol 2 3 5
# 6: A BigCreek V_Bio 3 3 3
# 7: A SmallCreek K_Vol 1 2 1
# 8: A SmallCreek K_Bio 4 2 2
# 9: A SmallCreek L_Vol 4 3 2
#10: A SmallCreek L_Bio 2 4 4
#11: A SmallCreek V_Vol 5 3 1
#12: A SmallCreek V_Bio 5 1 5
#13: B BigCreek K_Vol 5 3 5
#14: B BigCreek K_Bio 5 1 1
#15: B BigCreek L_Vol 2 4 3
#16: B BigCreek L_Bio 5 5 3
#17: B BigCreek V_Vol 4 4 4
#18: B BigCreek V_Bio 4 4 1
#19: B SmallCreek K_Vol 1 5 4
#20: B SmallCreek K_Bio 4 3 1
#21: B SmallCreek L_Vol 5 3 3
#22: B SmallCreek L_Bio 5 5 3
#23: B SmallCreek V_Vol 2 1 4
#24: B SmallCreek V_Bio 5 1 4
# Workbook Location Counter Larvae Swimmer Worm
CodePudding user response:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = c(K_Vol:V_Bio),names_to = "Counter") %>%
pivot_wider(names_from = Taxa,values_from = value)
# A tibble: 24 x 6
Workbook Location Counter Worm Larvae Swimmer
<chr> <chr> <chr> <int> <int> <int>
1 A BigCreek K_Vol 4 4 1
2 A BigCreek K_Bio 4 4 1
3 A BigCreek L_Vol 2 3 2
4 A BigCreek L_Bio 5 2 3
5 A BigCreek V_Vol 5 2 3
6 A BigCreek V_Bio 3 3 3
7 A SmallCreek K_Vol 1 1 2
8 A SmallCreek K_Bio 2 4 2
9 A SmallCreek L_Vol 2 4 3
10 A SmallCreek L_Bio 4 2 4
# ... with 14 more rows
-- data
structure(list(Workbook = c("A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), Location = c("BigCreek", "BigCreek",
"BigCreek", "SmallCreek", "SmallCreek", "SmallCreek", "BigCreek",
"BigCreek", "BigCreek", "SmallCreek", "SmallCreek", "SmallCreek"
), Taxa = c("Worm", "Larvae", "Swimmer", "Worm", "Larvae", "Swimmer",
"Worm", "Larvae", "Swimmer", "Worm", "Larvae", "Swimmer"), K_Vol = c(4L,
4L, 1L, 1L, 1L, 2L, 5L, 5L, 3L, 4L, 1L, 5L), K_Bio = c(4L, 4L,
1L, 2L, 4L, 2L, 1L, 5L, 1L, 1L, 4L, 3L), L_Vol = c(2L, 3L, 2L,
2L, 4L, 3L, 3L, 2L, 4L, 3L, 5L, 3L), L_Bio = c(5L, 2L, 3L, 4L,
2L, 4L, 3L, 5L, 5L, 3L, 5L, 5L), V_Vol = c(5L, 2L, 3L, 1L, 5L,
3L, 4L, 4L, 4L, 4L, 2L, 1L), V_Bio = c(3L, 3L, 3L, 5L, 5L, 1L,
1L, 4L, 4L, 4L, 5L, 1L)), class = "data.frame", row.names = c(NA,
-12L))