I have a dataframe in which I merged rows according to certain variables. This worked well, but for some character variables the values are now duplicated.
The variable has two possible values, "Con" or "Lab", and the merged rows now show values such as "ConCon" or "LabLabLab".
How do I recode these values? Ideally I need a command that turns any value containing "Lab" (e.g. "LabLabLabLab") into "Lab", and likewise for "Con".
Any input would be greatly appreciated. Thank you!
CodePudding user response:
In R:
df <- data.frame(id = 1:5, party = c("Con", "ConCon", "LabLabLab", "LabLabLabLab", "ConConCon"))
df$party <- gsub("^(Con|Lab).*", "\\1", df$party)
df
##   id party
## 1  1   Con
## 2  2   Con
## 3  3   Lab
## 4  4   Lab
## 5  5   Con
CodePudding user response:
Assuming a mixed value such as "LabCon" cannot occur, you can do this in Python:
legal_words = ["Con", "Lab"]
to_change_words = ["Con", "ConCon", "LabLabLab", "LabLab", "Lab"]
# Replace each entry with the legal word it contains.
for i, word in enumerate(to_change_words):
    for legal in legal_words:
        if legal in word:
            to_change_words[i] = legal
            break  # stop at the first legal word found
print(to_change_words)
This prints:
['Con', 'Con', 'Lab', 'Lab', 'Lab']
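If mixed values such as "LabCon" might turn up after all, a stricter variant is to collapse only values that are pure repeats of one legal word and leave everything else untouched. The following is a minimal sketch using a regex backreference; the names `REPEATED` and `collapse` and the sample values are made up for illustration and do not come from the question.

```python
import re

# A legal word followed by zero or more exact repeats of that same word.
REPEATED = re.compile(r"(Con|Lab)\1*")

def collapse(value):
    """Return 'Con' or 'Lab' for pure repeats; leave anything else unchanged."""
    match = REPEATED.fullmatch(value)
    return match.group(1) if match else value

# Hypothetical sample values, including a mixed and an unrelated one.
values = ["Con", "ConCon", "LabLabLab", "LabCon", "Other"]
collapsed = [collapse(v) for v in values]
print(collapsed)
# ['Con', 'Con', 'Lab', 'LabCon', 'Other']
```

Leaving "LabCon" unchanged makes such rows easy to spot afterwards instead of silently assigning them to one party.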