I have some raw data from an instrument that I need to get into shape, so I can work with it. I guess it's an easy fix, but I need some help.
The df looks like this:
date <-c("Date1","Date2","Date3")
data <-c("1,234,567,2345;2,345,5677,256;3,576,345,3456", "1,564,567,2345;2,745,5677,256;3,577,345,8456", "1,234,567,2345;2,345,5677,256,;3,555,345,3456;....")
df<-data.frame(date, data)
So for every date there are size classes (1,2 or 3..) with three corresponding measurements/values, separated by a ",". The different size classes are separated by a ";". Now i would like to access the first value per size class and transfer it into a new df. So that should look like:
date<-c("Date1","Date2","Date3")
data_sizeclass_1<-c(234,564,234)
data_sizeclass_2<-c(345,745,345)
data_sizeclass_3<-c(576,577,555)
df<-data.frame(date,data_sizeclass_1,data_sizeclass_2,data_sizeclass_3)
I hope that it makes sense.
So far I managed to separate the data column into separate columns (with cSplit). I would be able to assemble the columns into a new df, but then I would have to choose each column manually, and since I have more than 200 size classes that would be a lot of work. I would like to find a solution to directly extract only one of the measurements into a new df. Thanks for your help.
CodePudding user response:
library(data.table)
#set df to data.table
dt <- as.data.table(df)
#find the amount of columns which need to be created (max nr of different sizeclasses)
colnr <- max(lengths(strsplit(df$data, ";")))
#create columns for each sizeclass
for(i in 1:colnr ){
dt[, paste0("data_sizeclass_", i) := sapply(strsplit(data, ";"), `[`, i)]
}
dt
date data data_sizeclass_1 data_sizeclass_2 data_sizeclass_3
1: Date1 1,234,567,2345;2,345,5677,256;3,576,345,3456; 1,234,567,2345 2,345,5677,256 3,576,345,3456
2: Date2 1,564,567,2345;2,745,5677,256;3,577,345,8456; 1,564,567,2345 2,745,5677,256 3,577,345,8456
3: Date3 1,234,567,2345;2,345,5677,256,;3,555,345,3456; 1,234,567,2345 2,345,5677,256, 3,555,345,3456
In the second part we use strsplit to extract the second element of data
#pivot longer
dt <- melt(dt, measure.vars = paste0("data_sizeclass_", 1:colnr))
#extract second element of strsplit, using , as separator
dt[, value := sapply(strsplit(value, ","), `[`, 2)]
#pivot wider (to original shape)
dt <- dcast(dt, date data ~ variable, value.var = "value")
dt
date data data_sizeclass_1 data_sizeclass_2 data_sizeclass_3
1: Date1 1,234,567,2345;2,345,5677,256;3,576,345,3456; 234 345 576
2: Date2 1,564,567,2345;2,745,5677,256;3,577,345,8456; 564 745 577
3: Date3 1,234,567,2345;2,345,5677,256,;3,555,345,3456; 234 345 555
CodePudding user response:
Maybe you can try the base R code below
with(
df,
cbind(
df,
`colnames<-`(
t(sapply(
strsplit(data, ";"),
function(x) as.numeric(regmatches(x, regexpr("(?<=,)\\d ", x, perl = TRUE)))
)),
paste0("data_sizeclass_", seq(nrow(df)))
)
)
)
which gives
date data data_sizeclass_1
1 Date1 1,234,567,2345;2,345,5677,256;3,576,345,3456 234
2 Date2 1,564,567,2345;2,745,5677,256;3,577,345,8456 564
3 Date3 1,234,567,2345;2,345,5677,256,;3,555,345,3456;.... 234
data_sizeclass_2 data_sizeclass_3
1 345 576
2 745 577
3 345 555
CodePudding user response:
Does this work?
library(tidyr)
library(dplyr)
library(stringr)
df %>%
# first get rid of digit at string beginning and, respectively, after semi-colon:
mutate(data = gsub("^\\d,|(?<=;)\\d,", "", data, perl = TRUE)) %>%
# separate into columns using semi-colon as separator:
separate(data,
into = c("data_sizeclass_1", "data_sizeclass_2", "data_sizeclass_3"),
sep = ";",
convert = TRUE) %>%
# finally extract the initial numbers in the new columns:
mutate(across(c(-date), ~ str_extract(., "\\d ")))
date data_sizeclass_1 data_sizeclass_2 data_sizeclass_3
1 Date1 234 345 576
2 Date2 564 745 577
3 Date3 234 345 555