Basically I need a new column called "census" containing columns dbh1, dbh3, and dbh4 as sequential rows, correlating with their tree number and species of interest.
The data looks like this (with 10k rows):
Tree | Spec | dbh1 | dbh3 | dbh4 |
---|---|---|---|---|
1 | PICO | 6 | 8.8 | 9 |
2 | ABLA | 21 | 24.1 | 25.4 |
3 | PICO | 12 | 14.3 | 15.2 |
4 | PIEN | 24 | 25.5 | 25.8 |
I need it to look like this:
Tree | Spec | Census |
---|---|---|
1 | PICO | 6 |
1 | PICO | 8.8 |
1 | PICO | 9 |
2 | ABLA | 21 |
2 | ABLA | 24.1 |
2 | ABLA | 25.4 |
3 | PICO | 12 |
3 | PICO | 14.3 |
3 | PICO | 15.2 |
4 | PIEN | 24 |
4 | PIEN | 25.5 |
4 | PIEN | 25.8 |
I've tried looking around the tidyverse and I'm sure I'm missing what could be a very simple answer.
CodePudding user response:
You can use pivot_longer()
from tidyr
to pivot your dataframe, and then remove the new column with the information of which original variable the values from Census came from.
Data
df <-
structure(list(Tree = 1:4, Spec = c("PICO", "ABLA", "PICO", "PIEN"
), dbh1 = c(6L, 21L, 12L, 24L), dbh3 = c(8.8, 24.1, 14.3, 25.5
), dbh4 = c(9, 25.4, 15.2, 25.8)), class = "data.frame", row.names = c(NA,-4L))
Code
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = starts_with("dbh"),values_to = "Census") %>%
select(-name)
Output
# A tibble: 12 x 3
Tree Spec Census
<int> <chr> <dbl>
1 1 PICO 6
2 1 PICO 8.8
3 1 PICO 9
4 2 ABLA 21
5 2 ABLA 24.1
6 2 ABLA 25.4
7 3 PICO 12
8 3 PICO 14.3
9 3 PICO 15.2
10 4 PIEN 24
11 4 PIEN 25.5
12 4 PIEN 25.8