I have a data frame that looks like this. I want inside the dplyr pipeline to replace only the 7 first rows of the threshold column with the values that come from the manufacturer column.
library(tidyverse)
mpg %>%
arrange(cty) %>%
mutate(threshold=NA) %>%
select(manufacturer,cty, threshold)
#> # A tibble: 234 × 3
#> manufacturer cty threshold
#> <chr> <int> <lgl>
#> 1 dodge 9 NA
#> 2 dodge 9 NA
#> 3 dodge 9 NA
#> 4 dodge 9 NA
#> 5 jeep 9 NA
#> 6 chevrolet 11 NA
#> 7 chevrolet 11 NA
#> 8 chevrolet 11 NA
#> 9 dodge 11 NA
#> 10 dodge 11 NA
#> # … with 224 more rows
Created on 2022-08-31 with reprex v2.0.2
I want my data to look like this
#> manufacturer cty threshold
#> <chr> <int> <lgl>
#> 1 dodge 9 dodge
#> 2 dodge 9 dodge
#> 3 dodge 9 dodge
#> 4 dodge 9 dodge
#> 5 jeep 9 jeep
#> 6 chevrolet 11 chevrolet
#> 7 chevrolet 11 chevrolet
#> 8 chevrolet 11 NA
#> 9 dodge 11 NA
#> 10 dodge 11 NA
any help and guidance as always are highly appreciated
CodePudding user response:
USe case_when
to create a logical condition with row_number()
for replacement. In addition, there is no need to create a blank column i.e. the NAs
can be filled by default in case_when
library(dplyr)
library(ggplot2)
mpg %>%
arrange(cty) %>%
select(manufacturer, cty) %>%
mutate(threshold = case_when(row_number() < 7 ~manufacturer))
-output
# A tibble: 234 × 3
manufacturer cty threshold
<chr> <int> <chr>
1 dodge 9 dodge
2 dodge 9 dodge
3 dodge 9 dodge
4 dodge 9 dodge
5 jeep 9 jeep
6 chevrolet 11 chevrolet
7 chevrolet 11 <NA>
8 chevrolet 11 <NA>
9 dodge 11 <NA>
10 dodge 11 <NA>
# … with 224 more rows
CodePudding user response:
Here is version using row_number()
with ifelse
and %in%
:
mpg %>%
arrange(cty) %>%
mutate(threshold=NA) %>%
select(manufacturer,cty, threshold) %>%
mutate(threshold = ifelse(row_number() %in% 1:7, manufacturer, threshold))
manufacturer cty threshold
<chr> <int> <chr>
1 dodge 9 dodge
2 dodge 9 dodge
3 dodge 9 dodge
4 dodge 9 dodge
5 jeep 9 jeep
6 chevrolet 11 chevrolet
7 chevrolet 11 chevrolet
8 chevrolet 11 NA
9 dodge 11 NA
10 dodge 11 NA
# ... with 224 more rows
# i Use `print(n = ...)` to see more rows
CodePudding user response:
Here's yet another version. Similar to the ones above, but using the function rowid_to_column()
.
library(dplyr)
mpg %>%
arrange(cty) %>%
select(manufacturer, cty) %>%
rowid_to_column() %>%
mutate(threshold = ifelse(rowid < 8, manufacturer, NA))
# A tibble: 234 x 4
rowid manufacturer cty threshold
<int> <chr> <int> <chr>
1 1 dodge 9 dodge
2 2 dodge 9 dodge
3 3 dodge 9 dodge
4 4 dodge 9 dodge
5 5 jeep 9 jeep
6 6 chevrolet 11 chevrolet
7 7 chevrolet 11 chevrolet
8 8 chevrolet 11 NA
9 9 dodge 11 NA
10 10 dodge 11 NA
# ... with 224 more rows
# i Use `print(n = ...)` to see more rows