I have a column in a dataset that lists all of the software installed on a given computer. I created multiple binary columns from this column so that each program has its own column. My R code is below:
data <- data %>%
  mutate(
    MS_Office_installed = ifelse(grepl("MS Office", installed_software), 1, 0),
    Adobe_Acrobat_installed = ifelse(grepl("Adobe Acrobat", installed_software), 1, 0),
    Slack_installed = ifelse(grepl("Slack", installed_software), 1, 0),
    Mathcard_installed = ifelse(grepl("Mathcard", installed_software), 1, 0),
    Google_Chrome_installed = ifelse(grepl("Google Chrome", installed_software), 1, 0)
  )
How can I duplicate this in Python? Some observations have no software installed and contain NaN.
CodePudding user response:
You can use str.contains here. For example:
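df["MS_Office_installed"] = df["installed_software"].str.contains(r'\bMS Office\b', regex=True, na=False).astype(int)
Passing na=False makes rows with NaN count as "not installed"; without it, str.contains returns NaN for those rows and astype(int) raises an error. Use similar logic for the other desired binary columns.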
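To avoid repeating the same line for each program, the columns can be built in a loop. This is only a sketch: the sample DataFrame and the software_names list are made up for illustration, and the pattern is a plain substring match (closer to what grepl does in the R code) rather than the word-boundary pattern shown earlier.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real dataset.
df = pd.DataFrame({
    "installed_software": [
        "MS Office, Slack",
        "Google Chrome; Adobe Acrobat",
        np.nan,  # a computer with no software listed
        "Mathcard",
    ]
})

# Names taken from the R code in the question.
software_names = ["MS Office", "Adobe Acrobat", "Slack", "Mathcard", "Google Chrome"]

for name in software_names:
    # "MS Office" becomes "MS_Office_installed", and so on.
    column = name.replace(" ", "_") + "_installed"
    # re.escape keeps any regex metacharacters in a name literal.
    pattern = re.escape(name)
    df[column] = (
        df["installed_software"]
        .str.contains(pattern, regex=True, na=False)  # NaN counts as no match
        .astype(int)
    )

print(df)
```

If whole-word matching is needed, wrapping the escaped name in \b on both sides should give the same behaviour as the single-column example.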