Replace single comma between two numbers

I have comma-separated data that originally consisted of two variables, one numerical and one categorical. The numerical values use a comma as the decimal separator. Here is a sample:

-5,50,D
-5,50,S
 0,00,T
-5,50,S
-5,28,S
-5,25,C

As the sample shows, if I split the file on commas I get a dataset of three columns when there are really only two. This is what I want instead:

-5.50,D
-5.50,S
 0.00,T
-5.50,S
-5.28,S
-5.25,C

I think a regex would be the best way to do this. Any suggestions for the code?

CodePudding user response:

Since you mentioned "columns," I assume this is a column in a dataframe? If so, you can use tidyr::extract():

library(tidyr)

extract(dat, x, into = c("num", "char"), "(-?\\d*,\\d*),(\\w*)")

    num char
1 -5,50    D
2 -5,50    S
3  0,00    T
4 -5,50    S
5 -5,28    S
6 -5,25    C

Example data:

dat <- data.frame(
  x = c("-5,50,D", "-5,50,S", "0,00,T", "-5,50,S", "-5,28,S", "-5,25,C")
)
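
Note that the num column returned by extract() above is still of type character and still uses a decimal comma. If numbers are needed, something along these lines should work in base R. This is only a minimal sketch; num_chr and num_dbl are illustrative names, not part of the answer above:

```r
# Strings as they come out of extract(): decimal comma, type character
num_chr <- c("-5,50", "0,00", "-5,28")

# Swap the decimal comma for a point, then convert to numeric
num_dbl <- as.numeric(sub(",", ".", num_chr, fixed = TRUE))

num_dbl
```

fixed = TRUE treats "," as a literal string, which is enough here because each value contains exactly one comma.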

CodePudding user response:

Here is another option. Replace the "," with "." and then separate the columns.

library(tidyverse)

dat |>
  mutate(x = sub("(.*)(?<=\\d),(?=\\d)(.*?$)", "\\1.\\2", x, perl = TRUE)) |>
  separate(x, into = c("num", "char"), sep = ",")
#>     num char
#> 1 -5.50    D
#> 2 -5.50    S
#> 3  0.00    T
#> 4 -5.50    S
#> 5 -5.28    S
#> 6 -5.25    C
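
The key part of the pattern above is the pair of look-arounds: (?<=\\d) requires a digit before the comma and (?=\\d) a digit after it, so only the decimal comma is replaced. The same idea can be sketched in base R without the tidyverse. The vector x and the column names below are assumptions chosen to mirror the example data:

```r
x <- c("-5,50,D", "0,00,T", "-5,25,C")

# Replace only a comma that sits between two digits.
# Look-arounds require perl = TRUE in base R.
fixed <- sub("(?<=\\d),(?=\\d)", ".", x, perl = TRUE)

# The strings are now valid CSV lines and can be parsed directly.
# colClasses keeps a value such as "T" from being read as logical TRUE.
res <- read.csv(
  text = fixed,
  header = FALSE,
  col.names = c("num", "char"),
  colClasses = c("numeric", "character")
)

str(res)
```

Parsing with read.csv() has the side effect that num ends up numeric rather than character, which may or may not be what you want.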

CodePudding user response:

library(tidyr)
library(dplyr)  # mutate() comes from dplyr
dat %>%
  # extract into two columns:
  extract(x, 
          into = c("num", "char"), 
          regex = "(.*),(.*)") %>%
  # change "," to ".":
  mutate(num = sub(",", ".", num))

    num char
1 -5.50    D
2 -5.50    S
3  0.00    T
4 -5.50    S
5 -5.28    S
6 -5.25    C

Here, the regex is deliberately minimal: it simply splits each string into two capture groups at the last comma. Because the first (.*) is greedy, it consumes as much as possible, including the first comma, so the literal , in the pattern can only match the final one.
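
To see why the split happens at the last comma, it may help to compare the greedy pattern with a lazy one on a plain character vector. This is only an illustrative base R sketch; x and the result names are made up for the demonstration:

```r
x <- c("-5,50,D", "0,00,T")

# Greedy (.*): the first group grabs as much as it can,
# so the literal "," in the pattern matches the LAST comma
num_greedy  <- sub("(.*),(.*)", "\\1", x)
char_greedy <- sub("(.*),(.*)", "\\2", x)

# Lazy (.*?): the first group stops at the FIRST comma instead,
# which would cut the number in half
num_lazy <- sub("(.*?),(.*)", "\\1", x, perl = TRUE)

num_greedy   # "-5,50" "0,00"
char_greedy  # "D" "T"
num_lazy     # "-5" "0"
```

So the greedy version is what makes the short pattern sufficient for this data; it relies on the categorical part never containing a comma itself.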

Data (thanks to zephryl):

dat <- data.frame(
  x = c("-5,50,D", "-5,50,S", "0,00,T", "-5,50,S", "-5,28,S", "-5,25,C")
)