I am trying to split up one column in a data frame that shows the period active(s) for several artists/ bands into two columns (start_of_career, end_of_career). The variable class is character. I tried to use tidyrs separate function for it and when I run it, I see that it is split in the console but not in the data frame itself, so I assume that it doesn't work properly.
Please see here a made up example of the data I want to split:
Column A | Column B |
---|---|
Artist A | 1995-present |
Artist B | 1995-1997, 2008, 2010-present |
As you can see, some rows will consists only of a start and end date, while others have several dates. All I actually need is the first number and the last, e.g. for Artist B I need only start_of_career 1995 and end_of_career "present". But I am somehow not able to solve this issue.
The code I used was:
library(tidyr)
df %>% separate(col = period_active, into = c('start_of_career', 'end_of_career'), sep = '-')
I also tried other separators(",", " "), but it didn't work either.
I also tried:
df$start_of_career = strsplit(df$period_active, split = '-')
But this didn't work as well.
CodePudding user response:
Using df
shown reproducibly in the Note at the end remove everything except first and last parts of Column B and then separate what is left.
library(dplyr)
library(tidyr)
dd %>%
mutate(`Column B` = sub("-.*-", "-", `Column B`)) %>%
separate(`Column B`, c("start", "end"))
## Column A start end
## 1 Artist A 1995 present
## 2 Artist B 1995 present
Note
df <-
structure(list(`Column A` = c("Artist A", "Artist B"), `Column B` = c("1995-present",
"1995-1997, 2008, 2010-present")), class = "data.frame", row.names = c(NA,
-2L))
CodePudding user response:
We could use separate_rows
and then filter for first and last row of group:
library(tidyr)
library(dplyr)
df %>%
separate_rows(Column.B) %>%
group_by(Column.A) %>%
filter(row_number()==1 | row_number()==n()) %>%
mutate(Colum.C = c("start", "end"))
Column.A Column.B Colum.C
<chr> <chr> <chr>
1 Artist A 1995 start
2 Artist A present end
3 Artist B 1995 start
4 Artist B present end
data:
structure(list(Column.A = c("Artist A", "Artist B"), Column.B = c("1995-present",
"1995-1997, 2008, 2010-present")), class = "data.frame", row.names = c(NA,
-2L))
CodePudding user response:
Using base R
df <- cbind(df[1], read.table(text = sub("-[0-9, ] ", "", df$`Column B`),
header = FALSE, col.names = c("start", "end"), sep = "-"))
-output
> df
Column A start end
1 Artist A 1995 present
2 Artist B 1995 present
We could do this with separate
as well
library(tidyr)
separate(df, `Column B`, into = c("start", "end"), sep = "-[^A-Za-z]*")
Column A start end
1 Artist A 1995 present
2 Artist B 1995 present
data
df <- structure(list(`Column A` = c("Artist A", "Artist B"),
`Column B` = c("1995-present",
"1995-1997, 2008, 2010-present")), class = "data.frame",
row.names = c(NA,
-2L))
CodePudding user response:
Using strsplit
and then subsequently pick the first and the last entry.
library(dplyr)
df %>%
rowwise() %>%
mutate(splitrow = strsplit(`Column B`, "-"),
start_of_career = splitrow[1],
end_of_career = splitrow[length(splitrow)],
splitrow = NULL) %>%
ungroup()
# A tibble: 2 × 4
`Column A` `Column B` start_of_career end_of_career
<chr> <chr> <chr> <chr>
1 Artist A 1995-present 1995 present
2 Artist B 1995-1997, 2008, 2010-present 1995 present
Data
df <- structure(list(`Column A` = c("Artist A", "Artist B"), `Column B` = c("1995-present",
"1995-1997, 2008, 2010-present")), class = "data.frame", row.names = c(NA,
-2L))
CodePudding user response:
Another option: use strsplit, and return the list of start and end values
f <- \(v) {
v = strsplit(v, "-|,| ")[[1]]
list(start = v[1],end = v[length(v)])
}
df %>%
mutate(df, `Column B` = lapply(`Column B`,f)) %>%
unnest_wider(`Column B`)
Output:
# A tibble: 2 × 3
`Column A` start end
<chr> <chr> <chr>
1 Artist A 1995 present
2 Artist B 1995 present
CodePudding user response:
Below code extract the first word before the dash and last word after.
for(i in 1:length(df))
{
df$start[i] <-sub("-.*", "", df$`Column B`[i])
df$end[i] <-sub("^. -", "", df$`Column B`[i])
}