Python to R Translation-CodePudding

I have a couple of lines of code in Python that I am trying to replicate in R, but I'm admittedly not skilled enough at this point to figure it out.

Here's the code in Python:

import pandas as pd
df = pd.DataGram ({'col_a' : ["blue shovel 1024", "red shovel 1022", "green bucket 3021", "green rake 3021", 
"yellow shovel 1023"], 'col_b' : ["blue", "red", "green", "blue", "yellow"]},

columns = ["col_a", "col_b"])

unique_words = list(df.col_b.unique())
unique
["blue", "red", "green", "yellow"]

df['result] = df['col_a'].apply(lambda x:','.join([item for item in str(x).split () \
                                                  if item in unique_words]))

Results of running the code above gives you this:

    col_a                      col_b          result
1   blue shovel 1024           blue           blue
2   red shovel 1022            red            red
3   green buckets 3021         green          green
4   green rake 3021            blue           green
5   yellow shovel 1023         yellow         yellow

The goal of this code is to make a list of unique values in col_b and then search for any of those unique values in col_a and if it find them, place them in the result column. Note that in row 4, the result is green. This is correct because even though col_b shows a value of blue for row 4, the actual value in col_a is green.

I've tried rewriting this section:

df['result] = df['col_a'].apply(lambda x:','.join([item for item in str(x).split () \
                                                  if item in unique_words]))

in R (my thought was to write a function and try an lapply(), but either I'm doing it wrong or that's not the right approach. Thank you in advance for any suggestions or help, and I'll check back to see if there are any questions I can answer or information I can help clarify. Thank you again!

CodePudding user response：

library(tidyverse)

df <- tibble(
  col_a = c("blue shovel 1024", "red shovel 1022", "green bucket 3021", "green rake 3021", "yellow shovel 1023"),
  col_b = c("blue", "red", "green", "blue", "yellow")
)
df
#> # A tibble: 5 x 2
#>   col_a              col_b 
#>   <chr>              <chr> 
#> 1 blue shovel 1024   blue  
#> 2 red shovel 1022    red   
#> 3 green bucket 3021  green 
#> 4 green rake 3021    blue  
#> 5 yellow shovel 1023 yellow

unique_words <- unique(df$col_b)
unique_words
#> [1] "blue"   "red"    "green"  "yellow"
unique_words_regex <- unique_words %>% paste0(collapse = "|")

df <- mutate(df, result = col_a %>% str_extract(unique_words_regex))
df
#> # A tibble: 5 x 3
#>   col_a              col_b  result
#>   <chr>              <chr>  <chr> 
#> 1 blue shovel 1024   blue   blue  
#> 2 red shovel 1022    red    red   
#> 3 green bucket 3021  green  green 
#> 4 green rake 3021    blue   green 
#> 5 yellow shovel 1023 yellow yellow

^{Created on 2021-12-15 by the reprex package (v2.0.1)}