Home > front end >  R: Equivalent of "Fixed Width Delimitation"?
R: Equivalent of "Fixed Width Delimitation"?

Time:10-19

I am working with the R programming language.

I have a column in a data frame that looks something like this (the column is a CHARACTER variable):

head(b$`New Col`)

[1] "1073680,,  781230,,  292455," "128485,,   62890,,   65595,"  "372475,,  184745,,  187730," 
  • The first row contains a single element: 1073680,, 781230,, 292455,
  • The second row contains a single element: 128485,, 62890,, 65595,
  • The third row contains a single element : 372475,, 184745,, 187730,

I want to split this column into 3 columns:

  id    col1   col2   col3
1  1 1073680 781230 292455
2  2  128485  62890  65595
3  3  372475 184745 187730

I know how to do this in Excel (e.g. remove last comma, and then used "fixed width delimited" using double commas).

But can someone please show me how to do this in R?

Thanks!

CodePudding user response:

You can do the same in R, with str_remove_all and separate. Here, I remove all commas, and separate by whitespace. Use convert = TRUE to convert the separated values to numeric.

library(dplyr)
library(stringr)
library(tidyr)
df %>% 
  mutate(col = str_remove_all(col, ",")) %>% 
  separate(col, into = str_c("col", 1:3), convert = TRUE)

#   ID    col1   col2   col3
# 1  1 1073680 781230 292455
# 2  2  128485  62890  65595
# 3  3  372475 184745 187730

Edit: you actually don't need the first step, since separate convert the first occurrences of the pattern defined by the length of the into parameter (here, 3). It also chooses by default punctuation character(s) as the separator, so it is not needed to specify it.

df %>%
  separate(col, into = str_c("col", 1:3), convert = TRUE)
#   ID    col1   col2   col3
# 1  1 1073680 781230 292455
# 2  2  128485  62890  65595
# 3  3  372475 184745 187730

Data

df <- data.frame(ID = 1:3, col = c("1073680,,  781230,,  292455,", "128485,,   62890,,   65595,",  "372475,,  184745,,  187730," ))

CodePudding user response:

Just for fun. Another way:

library(tidyverse)

df %>% 
  separate_rows(col1, sep = ",,  ") %>% 
  mutate(col = parse_number(col1), .keep="unused") %>% 
  group_by(ID) %>% 
  mutate(id = row_number()) %>% 
  pivot_wider(names_from = ID,
              values_from = col, 
              names_glue = "col_{ID}")
   id   col_1  col_2  col_3
  <int>   <dbl>  <dbl>  <dbl>
1     1 1073680 128485 372475
2     2  781230  62890 184745
3     3  292455  65595 187730

CodePudding user response:

In addition to Maël's answer: if you're strictly after fixed-width separation, use separate with position indices instead of delimiter strings.

Your example data:

b <- structure(list(New.Col = c("1073680,,  781230,,  292455,", " 128485,,   62890,,   65595,", 
" 372475,,  184745,,  187730,")), class = "data.frame", row.names = c(NA, 
-3L))

separate by fixed widths:

library(tidyr)

b <- b %>%
separate(col = `New.Col`,
         into = c('col1', 'drop1', 'col2', 'drop2', 'col3'),
         sep = c(7, 10, 17, 21, 27)
)

drop the garbage columns (containing the delimiters):

b %>% select(-starts_with('drop'))

CodePudding user response:

Using base R

df <- cbind(df[1],  read.csv(text = trimws(gsub(", \\s ", ",", df$col), 
    whitespace = ","), header = FALSE, col.names = paste0("col", 1:3))
)

-output

df
ID    col1   col2   col3
1  1 1073680 781230 292455
2  2  128485  62890  65595
3  3  372475 184745 187730
  • Related