Add new columns based on values in other columns

I am struggling to convert the following Python code into R.

# c_a holds the unique genre names
for genre in c_a:
    df['is_' + str(genre)] = df['genre'].apply(lambda x: genre in [y.strip() for y in x.split(',')])

Basically, I have an object (type "character", with 1341 values in it). I'd like to add a new column for each value of this variable and assign a 0/1 value to each new column, depending on whether that column's genre is included in the genre column.
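"Included" is easiest to get right as exact membership among the comma-separated entries (after stripping spaces) rather than a plain substring search, because a substring search would report `pop` as present in a row that only contains `dance pop`. A minimal Python sketch of that check; `has_genre` is a hypothetical helper name, not part of any library:

```python
def has_genre(cell: str, genre: str) -> int:
    """Return 1 if `genre` is one of the comma-separated entries of `cell`, else 0."""
    # "dance pop, pop" -> ["dance pop", "pop"]
    entries = [y.strip() for y in cell.split(",")]
    return 1 if genre in entries else 0


# Exact-entry check vs. plain substring check on a row that only contains "dance pop"
print(has_genre("dance pop", "pop"))       # 0: "pop" is not an entry on its own
print("pop" in "dance pop")                # True: the substring search over-matches
print(has_genre("dance pop, pop", "pop"))  # 1
```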

For example:

Current Input:

Genre
dance pop, pop
country, pop

Expected Output:

Genre	dance pop	pop	country
dance pop, pop	1	1	0
country, pop	0	1	1

I am not familiar with apply and lambda functions in R. I only know how to solve the problem with a for loop, which is slow.

Answer:

Python:

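import pandas as pd

df = pd.DataFrame({"Genre": ["Dance pop, pop", "country, pop"]})
# Split each row on commas and strip the surrounding spaces: "Dance pop, pop" -> ["Dance pop", "pop"]
split_genres = df['Genre'].apply(lambda x: [y.strip() for y in x.split(',')])
# Unique genres in first-seen order (a plain set would keep " pop" with its leading space and has no fixed order)
for col in dict.fromkeys(g for row in split_genres for g in row):
    df[col] = split_genres.apply(lambda gs: 1 if col in gs else 0)
print(df)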
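pandas can also build the indicator columns without an explicit loop: `Series.str.get_dummies` splits each string on a separator and returns one 0/1 column per distinct value. This sketch assumes every entry is separated by exactly ", " (comma plus one space); inconsistent spacing would produce separate columns such as " pop" and "pop". Note that the columns come back in sorted order rather than first-seen order.

```python
import pandas as pd

df = pd.DataFrame({"Genre": ["Dance pop, pop", "country, pop"]})

# One 0/1 column per distinct entry; the separator includes the space,
# so no extra stripping is needed (assumes consistent ", " separators)
dummies = df["Genre"].str.get_dummies(sep=", ")

# Attach the indicator columns next to the original Genre column
df = df.join(dummies)
print(df)
```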

Answer:

You could use a tidyverse approach, but I doubt it would speed things up. Suppose your data is stored in a vector genre:

library(tidyverse)

genre <- c("dance pop, pop", "country, pop")

genre %>% 
  data.frame(genre = .) %>% 
  expand_grid(genres = unique(trimws(unlist(strsplit(genre, ","))))) %>% 
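  # 1 if genres is one of the comma-separated entries of genre, else 0
  mutate(value = as.integer(map2_lgl(genre, genres, ~ .y %in% trimws(strsplit(.x, ",")[[1]])))) %>% 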
  pivot_wider(names_from = genres)

This returns

# A tibble: 2 x 4
  genre          `dance pop`   pop country
  <chr>                <int> <int>   <int>
1 dance pop, pop           1     1       0
2 country, pop             0     1       1

First we create a data.frame and use expand_grid to pair every row with each unique genre extracted from the genre vector (split on commas and trimmed), stored in a new genres column.
Next we check each value of genres against the genre column of its row, converting the result into a binary 0/1 value.
Finally we bring it into a wide, rectangular shape with one column per genre using pivot_wider.
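The same three steps (pair every row with every genre, compute a 0/1 value, pivot from long to wide) can be mirrored in pandas for comparison. This is a sketch under a few assumptions: `merge(how="cross")` requires pandas 1.2 or later, and a helper `row` column is added so that duplicate genre strings in different rows do not collide during the pivot. Names such as `long`, `wide` and `unique_genres` are illustrative only.

```python
import pandas as pd

genre = ["dance pop, pop", "country, pop"]
df = pd.DataFrame({"row": range(len(genre)), "genre": genre})

# Unique genres in first-seen order, with surrounding spaces trimmed
unique_genres = list(dict.fromkeys(g.strip() for cell in genre for g in cell.split(",")))

# Step 1: pair every row with every genre (the role expand_grid plays in R)
long = df.merge(pd.DataFrame({"genres": unique_genres}), how="cross")

# Step 2: 1 if the genre is one of the row's comma-separated entries, else 0
long["value"] = [
    int(g in [y.strip() for y in cell.split(",")])
    for cell, g in zip(long["genre"], long["genres"])
]

# Step 3: long -> wide, one column per genre (the role pivot_wider plays in R);
# pivot sorts the columns, so reselect them in first-seen order
wide = long.pivot(index="row", columns="genres", values="value")[unique_genres]
result = df.join(wide, on="row").drop(columns="row")
print(result)
```

The cross join materializes rows × genres pairs, so with 1341 genres this trades memory for simplicity; the `get_dummies`-style approach avoids building the full grid.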