I am struggling to convert the following Python code into R.
for genre in c_a:
    df['is_' + str(genre)] = df['genre'].apply(lambda x: genre in [y.strip() for y in x.split(',')])
Basically, I have an object (type "character", with 1341 values in it), and I'd like to add a new column for each of its values, and assign a 0/1 value to each new column by checking whether that value appears in the genre column.
For example:
Current Input:
Genre |
---|
dance pop, pop |
country, pop |
Expected Output:
Genre | dance pop | pop | country |
---|---|---|---|
dance pop, pop | 1 | 1 | 0 |
country, pop | 0 | 1 | 1 |
I am not familiar with the apply family or anonymous (lambda) functions in R. I only know how to solve the problem with a for loop, which is slow.
CodePudding user response:
Python:
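import pandas as pd
df = pd.DataFrame({"Genre": ["Dance pop, pop", "country, pop"]})
# Collect the unique genres, stripping the space left after each comma
genres = {g.strip() for row in df['Genre'] for g in row.split(',')}  # {'Dance pop', 'pop', 'country'}
for col in genres:
    # 1 if the genre is one of the row's comma-separated labels, else 0
    df[col] = df['Genre'].apply(lambda x: 1 if col in [y.strip() for y in x.split(',')] else 0)
df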
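For what it's worth, pandas also has a built-in for this pattern, so the explicit loop is not strictly needed. A minimal sketch, assuming the separator is always a comma followed by exactly one space (the frame is the toy data from the question, and the names `dummies` and `result` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Genre": ["dance pop, pop", "country, pop"]})

# Split each string on ", " and build one 0/1 indicator column per distinct label.
dummies = df["Genre"].str.get_dummies(sep=", ")

# Attach the indicator columns to the original frame.
result = df.join(dummies)
print(result)
```

If the spacing around the commas is irregular, one option is to normalize it first, for example with `df["Genre"].str.replace(r"\s*,\s*", ",", regex=True)`, and then pass `sep=","`.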
CodePudding user response:
You could use a tidyverse approach, but I doubt it would speed things up. Suppose your data is stored in a vector genre:
library(tidyverse)
genre <- c("dance pop, pop", "country, pop")
genre %>%
  data.frame(genre = .) %>%
  expand_grid(genres = unique(trimws(unlist(strsplit(genre, ","))))) %>%
  # as.integer() turns the TRUE/FALSE match into 1/0
  mutate(value = as.integer(str_detect(genre, genres))) %>%
  pivot_wider(names_from = genres, values_from = value)
This returns
# A tibble: 2 x 4
  genre          `dance pop`   pop country
  <chr>                <int> <int>   <int>
1 dance pop, pop           1     1       0
2 country, pop             0     1       1
- First we create a data.frame with a new genres column that contains all unique genres extracted from the genre vector.
- Next we look for a match between the genres and the genre column, converting it into a binary value.
- Finally we bring it into a rectangular shape using pivot_wider.
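For comparison with the loop in the question, the same three steps can be written out in plain Python. This is only a sketch with illustrative names (`rows`, `wide`). It also compares whole labels after splitting, whereas str_detect matches its pattern anywhere in the string, so the two can disagree when one genre name is contained in another (for example "pop" inside "dance pop").

```python
genre = ["dance pop, pop", "country, pop"]

# Step 1: split each row into trimmed labels and collect the unique ones
# in first-seen order.
rows = [[label.strip() for label in g.split(",")] for g in genre]
genres = list(dict.fromkeys(label for row in rows for label in row))

# Step 2: for every (row, label) pair, record 1 if the label is present, else 0.
# Step 3: keep one dict per row, i.e. the "wide" shape.
wide = [
    {"genre": g, **{label: int(label in row) for label in genres}}
    for g, row in zip(genre, rows)
]

for record in wide:
    print(record)
```

Because membership is tested against the list of split labels, a row containing only "dance pop" would get 0 in the pop column here.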