I am working with the R programming language.
I have a column in a data frame that looks something like this (the column is a CHARACTER variable):
head(b$`New Col`)
[1] "1073680,, 781230,, 292455," "128485,, 62890,, 65595," "372475,, 184745,, 187730,"
- The first row contains a single element: 1073680,, 781230,, 292455,
- The second row contains a single element: 128485,, 62890,, 65595,
- The third row contains a single element : 372475,, 184745,, 187730,
I want to split this column into 3 columns:
id col1 col2 col3
1 1 1073680 781230 292455
2 2 128485 62890 65595
3 3 372475 184745 187730
I know how to do this in Excel (e.g. remove last comma, and then used "fixed width delimited" using double commas).
But can someone please show me how to do this in R?
Thanks!
CodePudding user response:
You can do the same in R, with str_remove_all
and separate
. Here, I remove all commas, and separate by whitespace. Use convert = TRUE
to convert the separated values to numeric.
library(dplyr)
library(stringr)
library(tidyr)
df %>%
mutate(col = str_remove_all(col, ",")) %>%
separate(col, into = str_c("col", 1:3), convert = TRUE)
# ID col1 col2 col3
# 1 1 1073680 781230 292455
# 2 2 128485 62890 65595
# 3 3 372475 184745 187730
Edit: you actually don't need the first step, since separate
convert the first occurrences of the pattern defined by the length of the into
parameter (here, 3). It also chooses by default punctuation character(s) as the separator, so it is not needed to specify it.
df %>%
separate(col, into = str_c("col", 1:3), convert = TRUE)
# ID col1 col2 col3
# 1 1 1073680 781230 292455
# 2 2 128485 62890 65595
# 3 3 372475 184745 187730
Data
df <- data.frame(ID = 1:3, col = c("1073680,, 781230,, 292455,", "128485,, 62890,, 65595,", "372475,, 184745,, 187730," ))
CodePudding user response:
Just for fun. Another way:
library(tidyverse)
df %>%
separate_rows(col1, sep = ",, ") %>%
mutate(col = parse_number(col1), .keep="unused") %>%
group_by(ID) %>%
mutate(id = row_number()) %>%
pivot_wider(names_from = ID,
values_from = col,
names_glue = "col_{ID}")
id col_1 col_2 col_3
<int> <dbl> <dbl> <dbl>
1 1 1073680 128485 372475
2 2 781230 62890 184745
3 3 292455 65595 187730
CodePudding user response:
In addition to Maël's answer: if you're strictly after fixed-width separation, use separate
with position indices instead of delimiter strings.
Your example data:
b <- structure(list(New.Col = c("1073680,, 781230,, 292455,", " 128485,, 62890,, 65595,",
" 372475,, 184745,, 187730,")), class = "data.frame", row.names = c(NA,
-3L))
separate by fixed widths:
library(tidyr)
b <- b %>%
separate(col = `New.Col`,
into = c('col1', 'drop1', 'col2', 'drop2', 'col3'),
sep = c(7, 10, 17, 21, 27)
)
drop the garbage columns (containing the delimiters):
b %>% select(-starts_with('drop'))
CodePudding user response:
Using base R
df <- cbind(df[1], read.csv(text = trimws(gsub(", \\s ", ",", df$col),
whitespace = ","), header = FALSE, col.names = paste0("col", 1:3))
)
-output
df
ID col1 col2 col3
1 1 1073680 781230 292455
2 2 128485 62890 65595
3 3 372475 184745 187730