I am writing a function to prepare a data frame in R to be used later in a regression. I want to rename any column whose name contains the word distance. Specifically, I want to drop the descriptive word immediately preceding distance (that is, both the word and the period that come right before distance).
I have:
country.distance.median country.distance.mean population life.q state.distance.mean
210 189 10000 0.6 100
3100 2100 20000 0.7 300
37 36 500 0.3 10
I would like:
distance.median distance.mean population life.q distance.mean
210 189 10000 0.6 100
3100 2100 20000 0.7 300
37 36 500 0.3 10
Because this will be contained in a function, the number and position of columns is variable, so I need a solution that does not rely on column position. Note that it should not change the column name "life.q", so the solution needs to recognize and select columns based on the distance string. Note that the word in front of distance may change as well (for example, the column 'state.distance.mean').
(It should also have the ability to be used as an if statement within a function.)
Thank you for your time and thoughts. :)
CodePudding user response:
You may try using sub() here:
names(df) <- sub("^country\\.(?=distance\\.)", "", names(df), perl=TRUE)
df
distance.median distance.mean population life.q
1 210 189 10000 0.6
2 3100 2100 20000 0.7
3 37 36 500 0.3
More generally, to remove the first word and the dot that follows it, provided that there is another dot later in the name, you may try:
names(df) <- sub("^[^.]+\\.(?=.*\\.)", "", names(df), perl=TRUE)
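If only columns that actually contain distance should be renamed, whatever word comes before it, the lookahead can be tied to that word instead of to "any later dot". The following is a minimal sketch under that assumption, not a definitive implementation. The function name strip_distance_prefix and the sample data frame are hypothetical and only illustrate the idea; the grepl() call shows how the same pattern can serve as the condition of an if statement.

```r
# Hypothetical helper; the name is an assumption, not from the original post.
strip_distance_prefix <- function(df) {
  # One leading word plus its dot, only when "distance" follows immediately
  # and is itself followed by a dot or the end of the name.
  pattern <- "^[^.]+\\.(?=distance(\\.|$))"
  # grepl() returns TRUE/FALSE per column name, so it works as an if condition.
  if (any(grepl(pattern, names(df), perl = TRUE))) {
    names(df) <- sub(pattern, "", names(df), perl = TRUE)
  }
  df
}

# Sample data modelled on the question (assumed values).
df <- data.frame(
  country.distance.median = c(210, 3100, 37),
  country.distance.mean   = c(189, 2100, 36),
  population              = c(10000, 20000, 500),
  life.q                  = c(0.6, 0.7, 0.3),
  state.distance.mean     = c(100, 300, 10)
)

df <- strip_distance_prefix(df)
names(df)
```

With this input, country.distance.mean and state.distance.mean both become distance.mean, so the result has duplicated column names, as in the desired output in the question. If that causes trouble in the later regression, wrapping the new names in make.unique() is one option.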