beginner here! I have a dataframe with over 600 columns. I want to apply some simple operations to specific columns that are all named similarly (see example below) but spread out throughout the dataframe. How can I build a loop for all these columns based on their names and the ascending numbers at the end of the column name?
Dataframe example:
Name Col_x_1 Company Col_x_2 Start_Year End_Year Col_x_3
asd 4 Col Test 2 Col 1902 1933 1 Col
kfj 5 Cols Test_2 10 Col 1933 1954 0
ale 0 Test_3 11 Cols 1988 1999 5 Col
...
Output example: I want to remove the string following the space behind the value in the columns named "Col_x_1, Col_x_2, ..." (going up until 56) and turn the columns into dbl.
Name Col_x_1 Company Col_x_2 Start_Year End_Year Col_x_3
asd 4 Test 2 1902 1933 1
kfj 5 Test_2 10 1933 1954 0
ale 0 Test_3 11 1988 1999 5
...
How can I build a loop to perform this simple operation? Thank you in advance!
CodePudding user response:
You could use tidyverse
:
library(tidyverse)
df %>%
mutate(across(starts_with("Col_x"), ~str_extract(., "^\\d ") %>% as.numeric()))
This returns
# A tibble: 3 × 7
Name Col_x_1 Company Col_x_2 Start_Year End_Year Col_x_3
<chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 asd 4 Test 2 1902 1933 1
2 kfj 5 Test_2 10 1933 1954 0
3 ale 0 Test_3 11 1988 1999 5
Or, depending on your actual data
df %>%
mutate(across(starts_with("Col_x"), ~str_remove_all(., "(?<=\\d)\\s [A-z] ") %>% as.numeric()))
The first one extracts the starting number (assuming it's an integer) and removes the remaining content. The second one looks for a number followed by a space and some letters and removes them.
Finally both convert the remaining part into a numeric.
We use starts_with()
to select the columns by a names pattern. There are other possibilities like ends_with()
, matches()
or contains()
. The selection of the appropriate function depends on the actual structure of the names.
CodePudding user response:
If you must use a loop, you could use grep
to get the columns names that contains "Col_x_" in the beginning and gsub
to remove the words that you want
names_dat <- grep("^Col_x_",names(dat), value = TRUE)
for (i in names_dat) {
dat[[i]] <- as.numeric(gsub("Cols?","", dat[[i]]))
}