I have the following problem. I have to split the text below into separate columns. The data is extracted by web scraping and I need to transform it to analyze it. As an example I have copied one row from which I will only need "id":357 and "slug":"journalism/audio" as an information. Do you know how can I transform it in R? The code below is from df column:
{"id":357,"name":"Audio","analytics_name":"Audio","slug":"journalism/audio","position":1,"parent_id":13,"parent_name":"Journalism","color":1228010,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/journalism/audio"}}}
A screenshot sample table of the data I want to transform
CodePudding user response:
Going off of the one line of data you've supplied here, something like this perhaps?
library(magrittr)
library(stringr)
library(tidyr)
library(dplyr)
#Toy data.
df <- data.frame(category = '{"id":357,"name":"Audio","analytics_name":"Audio","slug":"journalism/audio","position":1,"parent_id":13,"parent_name":"Journalism","color":1228010,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/journalism/audio"}}}')
df[2, ] <- df[1, ]
df %>%
mutate(ucol = row_number()) %>%
separate_rows(category, sep = ",") %>%
mutate(category = str_replace_all(category, '[\\"\\{\\}]', "")) %>%
filter(str_detect(category, "^id|^slug")) %>%
separate(category, sep = ":", into = c("key", "val")) %>%
pivot_wider(names_from = key, values_from = val)
# # A tibble: 2 × 3
# ucol id slug
# <int> <chr> <chr>
# 1 1 357 journalism/audio
# 2 2 357 journalism/audio
CodePudding user response:
Starting with a string like this
stri
[1] "\"id\":357,\"name\":\"Audio\",\"analytics_name\":\"Audio\",\"slug\":\"journalism/audio\",\"position\":1,\"parent_id\":13,\"parent_name\":\"Journalism\",\"color\":1228010,\"urls\":{\"web\":{\"discover\":\"http://www.kickstarter.com/discover/categories/journalism/audio"
First strsplit
the string into chunks by comma and remove the quotes
d <- gsub( "\"","", strsplit(stri, ",")[[1]] )
[1] "id:357"
[2] "name:Audio"
[3] "analytics_name:Audio"
[4] "slug:journalism/audio"
[5] "position:1"
[6] "parent_id:13"
[7] "parent_name:Journalism"
[8] "color:1228010"
[9] "urls:{web:{discover:http://www.kickstarter.com/discover/categories/journalism/audio"
Finally build the dataframe
dat <- data.frame( strsplit( d[grep("^id|^slug",d)], ":" ) )[2,]
colnames( dat ) <- data.frame( strsplit( d[grep("^id|^slug",d)], ":" ) )[1,]
dat
id slug
2 357 journalism/audio
Data
stri <- "\"id\":357,\"name\":\"Audio\",\"analytics_name\":\"Audio\",\"slug\":\"journalism/audio\",\"position\":1,\"parent_id\":13,\"parent_name\":\"Journalism\",\"color\":1228010,\"urls\":{\"web\":{\"discover\":\"http://www.kickstarter.com/discover/categories/journalism/audio"