I have a couple of lines of code in Python that I am trying to replicate in R, but I'm admittedly not skilled enough at this point to figure it out.
Here's the code in Python:
import pandas as pd
df = pd.DataFrame({'col_a': ["blue shovel 1024", "red shovel 1022", "green bucket 3021", "green rake 3021",
                             "yellow shovel 1023"], 'col_b': ["blue", "red", "green", "blue", "yellow"]},
                  columns=["col_a", "col_b"])
unique_words = list(df.col_b.unique())
print(unique_words)
# ['blue', 'red', 'green', 'yellow']
# For each row, keep the whitespace-separated words of col_a that appear in unique_words
df['result'] = df['col_a'].apply(lambda x: ','.join([item for item in str(x).split()
                                                      if item in unique_words]))
Running the code above gives this result:
                col_a   col_b  result
0    blue shovel 1024    blue    blue
1     red shovel 1022     red     red
2   green bucket 3021   green   green
3     green rake 3021    blue   green
4  yellow shovel 1023  yellow  yellow
The goal of this code is to build a list of the unique values in col_b, then search each entry of col_a for any of those values and, if it finds them, place them in the result column. Note that the result for the fourth row (index 3) is green. This is correct: even though col_b shows blue for that row, the color that actually appears in col_a is green.
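It may help to pull the per-row logic of the lambda out into a plain function before porting it to R. This is a minimal sketch assuming whitespace-separated words and exact, case-sensitive matches; `extract_known_words` is a hypothetical name, and the rows "blue green rake" and "purple hoe 1000" are made-up inputs used only to show how multiple matches and no matches are handled.

```python
def extract_known_words(text, known_words):
    """Return the words of `text` that appear in `known_words`, joined by commas.

    Mirrors the lambda in the question: split on whitespace, keep exact
    (case-sensitive) matches in the order they occur, and join with ",".
    """
    known = set(known_words)  # fast membership test; output order follows `text`
    return ",".join(word for word in str(text).split() if word in known)


unique_words = ["blue", "red", "green", "yellow"]

print(extract_known_words("green rake 3021", unique_words))        # green
print(extract_known_words("blue green rake", unique_words))        # blue,green (hypothetical row)
print(repr(extract_known_words("purple hoe 1000", unique_words)))  # '' (hypothetical row, no match)
```

Because every match is kept and joined, a row containing two known colors yields a comma-separated string rather than a single value, which is worth keeping in mind when choosing an R equivalent.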
I've tried rewriting this section in R:
df['result'] = df['col_a'].apply(lambda x: ','.join([item for item in str(x).split()
                                                      if item in unique_words]))
My thought was to write a function and apply it with lapply(), but either I'm doing it wrong or that's not the right approach. Thank you in advance for any suggestions or help; I'll check back to see whether there are any questions I can answer or anything I can clarify. Thank you again!
Answer:
library(tidyverse)
df <- tibble(
col_a = c("blue shovel 1024", "red shovel 1022", "green bucket 3021", "green rake 3021", "yellow shovel 1023"),
col_b = c("blue", "red", "green", "blue", "yellow")
)
df
#> # A tibble: 5 x 2
#> col_a col_b
#> <chr> <chr>
#> 1 blue shovel 1024 blue
#> 2 red shovel 1022 red
#> 3 green bucket 3021 green
#> 4 green rake 3021 blue
#> 5 yellow shovel 1023 yellow
unique_words <- unique(df$col_b)
unique_words
#> [1] "blue" "red" "green" "yellow"
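# Build one regex that matches any of the colors: "blue|red|green|yellow"
unique_words_regex <- unique_words %>% paste0(collapse = "|")
# str_extract() returns the first match in each string (NA if there is none)
df <- mutate(df, result = col_a %>% str_extract(unique_words_regex))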
df
#> # A tibble: 5 x 3
#> col_a col_b result
#> <chr> <chr> <chr>
#> 1 blue shovel 1024 blue blue
#> 2 red shovel 1022 red red
#> 3 green bucket 3021 green green
#> 4 green rake 3021 blue green
#> 5 yellow shovel 1023 yellow yellow
Created on 2021-12-15 by the reprex package (v2.0.1)
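One difference worth checking: `str_extract()` returns only the first match, and an alternation like "blue|red|green|yellow" also matches inside longer words, whereas the Python version keeps every whole-word match. The Python sketch below mimics both behaviors side by side, using `re.search` as a stand-in for `str_extract()`, on the five rows from the question plus two made-up rows ("blue green rake" and "reddish shovel" are hypothetical). The helper names `first_regex_match` and `all_token_matches` are illustrative only. If those edge cases matter in your data, using `str_extract_all()` with the pattern wrapped in `\\b` word boundaries and then pasting each row's matches together with `collapse = ","` may be a closer R equivalent; treat that as a suggestion to verify rather than a tested solution.

```python
import re

unique_words = ["blue", "red", "green", "yellow"]
# Same alternation the R answer builds with paste0(collapse = "|");
# re.escape guards against regex metacharacters in the words.
pattern = re.compile("|".join(re.escape(w) for w in unique_words))


def first_regex_match(text):
    """Like str_extract(): the leftmost match of the pattern, or None (R's NA)."""
    m = pattern.search(text)
    return m.group(0) if m else None


def all_token_matches(text):
    """Like the question's lambda: every whole-word match, comma-joined."""
    return ",".join(t for t in text.split() if t in unique_words)


rows = [
    "blue shovel 1024",
    "red shovel 1022",
    "green bucket 3021",
    "green rake 3021",
    "yellow shovel 1023",
    "blue green rake",  # hypothetical: two colors in one row
    "reddish shovel",   # hypothetical: color appears only as a substring
]

for row in rows:
    print(f"{row!r:22} regex={first_regex_match(row)!r:8} words={all_token_matches(row)!r}")
```

On the five original rows both approaches agree; they differ only on the hypothetical rows ("blue" vs "blue,green", and "red" vs an empty string).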