i have following dataframe
head(ca.spe)
Dataset recording_date grid_cell
1 2 September 1987 7914/2
2 1 September 1937 8025/3
3 1 September 1933 8114/1
4 1 September 1926 8021/4
5 1 September 1926 8021/4
6 1 September 1865 7522/21
In this dataframe are some rows, that I want to change.
Rows with grid cells that have two or three numbers after the / need to be changed.
for example row 6 has 7522/21
I want to converte this row into two rows with 7522/1 and 7522/2 every other information of the row should be left untouched.
The result of the operation should look like this:
head(ca.spe)
Dataset recording_date grid_cell
1 2 September 1987 7914/2
2 1 September 1937 8025/3
3 1 September 1933 8114/1
4 1 September 1926 8021/4
5 1 September 1926 8021/4
6 1 September 1865 7522/1
7 1 September 1865 7522/2
Has somebody a clue how to do this?
Cheers Mike
CodePudding user response:
Here is one option with tidyverse
- separate
the column 'grid_cell' into two by splitting at the /
, then create rows by splitting between each character (or digit) in the 'grid_cell2' column and finally unite
the 'grid_cell' columns to create the 'grid_cell' column
library(dplyr)
library(tidyr)
ca.spe %>%
separate(grid_cell, into = c("grid_cell1", "grid_cell2"), sep = "/") %>%
separate_rows(grid_cell2, sep = "(?<=.)(?=.)") %>%
unite(grid_cell, starts_with('grid_cell'), sep = "/")
-output
# A tibble: 7 × 3
Dataset recording_date grid_cell
<int> <chr> <chr>
1 2 September 1987 7914/2
2 1 September 1937 8025/3
3 1 September 1933 8114/1
4 1 September 1926 8021/4
5 1 September 1926 8021/4
6 1 September 1865 7522/2
7 1 September 1865 7522/1
data
ca.spe <- structure(list(Dataset = c(2L, 1L, 1L, 1L, 1L, 1L),
recording_date = c("September 1987",
"September 1937", "September 1933", "September 1926", "September 1926",
"September 1865"), grid_cell = c("7914/2", "8025/3", "8114/1",
"8021/4", "8021/4", "7522/21")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
CodePudding user response:
Another option using base R, where we first split on /
, then create new columns with the part before and after. Then, we can split on the additional numbers (i.e., 21
into 2
and 1
). Then, we create a new dataframe by replicating the length for the additional rows, and paste the grid columns back together.
ca.spe$grid_cell1 <- sapply(strsplit(ca.spe$grid_cell, split = "/"), "[", 1 )
ca.spe$grid_cell2 <- sapply(strsplit(ca.spe$grid_cell, split = "/"), "[", 2 )
s <- strsplit(ca.spe$grid_cell2, split = "")
ca.spe <- data.frame(Dataset = rep(ca.spe$Dataset, sapply(s, length)),
recording_date = rep(ca.spe$recording_date, sapply(s, length)),
grid_cell = paste0(rep(ca.spe$grid_cell1, sapply(s, length)), "/", unlist(s)))
Output
Dataset recording_date grid_cell
1 1 2 September 1987 7914/2
2 2 1 September 1937 8025/3
3 3 1 September 1933 8114/1
4 4 1 September 1926 8021/4
5 5 1 September 1926 8021/4
6 6 1 September 1865 7522/2
7 6 1 September 1865 7522/1
Data
ca.spe <- structure(list(Dataset = 1:6, recording_date = c("2 September 1987",
"1 September 1937", "1 September 1933", "1 September 1926", "1 September 1926",
"1 September 1865"), grid_cell = c("7914/2", "8025/3", "8114/1",
"8021/4", "8021/4", "7522/21")), class = "data.frame", row.names = c(NA,
-6L))