Say for example I have a column that looks something like:
name:Michael,Age:31,City:NYC
How could I split this column into separate columns such that it would yield a result similar as a data frame to:
name | Age | City
1 Michael | 31 | NYC
I've attempted to use mtabulate from the qdapTools package but it only resulted with boolean value columns.
CodePudding user response:
We can use str_split_fixed
from the stringr
package:
library(stringr)
df[c("name", "Age", "City")] <- str_split_fixed(
gsub("\\w :", "", df$col), ",", 3
)
Data:
df <- data.frame(col=c("name:Michael,Age:31,City:NYC"),
stringsAsFactors=FALSE)
CodePudding user response:
We can separate
your single column dataframe into three columns, then remove the texts before colon :
.
library(dplyr)
library(tidyr)
df %>%
separate(col, into = c("name", "Age", "city"), sep = ",") %>%
mutate(across(everything(), ~gsub("^. ?:", "", .x)))
name Age city
1 Michael 31 NYC
Data
df <- structure(list(col = "name:Michael,Age:31,City:NYC"), class = "data.frame", row.names = c(NA,
-1L))
CodePudding user response:
library(tidyverse)
data <- "name:Michael,Age:31,City:NYC"
data |>
tibble() |>
separate_rows(data, sep = ",") |>
separate(data, into = c("name", "value")) |>
pivot_wider(names_from = "name", values_from = "value")
#> # A tibble: 1 × 3
#> name Age City
#> <chr> <chr> <chr>
#> 1 Michael 31 NYC
Created on 2022-08-08 by the reprex package (v2.0.1)
CodePudding user response:
In base R
, it can be done with read.dcf
out <- type.convert(as.data.frame(
read.dcf(textConnection(paste(gsub(",", "\n", df1$col1),
collapse = "\n\n")))
), as.is = TRUE)
-output
> out
name Age City
1 Michael 31 NYC
2 Michael 31 NYC
Or using tidyverse
library(dplyr)
library(tidyr)
df1 %>%
mutate(rn = row_number()) %>%
separate_rows(col1, sep = ",\\s*") %>%
separate(col1, into = c('col1', 'col2'), sep = ":") %>%
pivot_wider(names_from = col1, values_from = col2) %>%
select(-rn)
# A tibble: 2 × 3
name Age City
<chr> <chr> <chr>
1 Michael 31 NYC
2 Michael 31 NYC
data
df1 <- structure(list(col1 = c("name:Michael,Age:31,City:NYC",
"name:Michael,Age:31,City:NYC"
)), class = "data.frame", row.names = c(NA, -2L))