I would like to reproduce following in SQL what I have in R. So I have grouped my data with date, order and route. Then I look if there is NAs inside that group. If all are missing, I will insert value from route to test, otherwise it will take the value from tag if not all are missing for that group.
I would like to do this in SQL, how will I achieve same result there? So starting with test dataframe and how can I add test column, where it will get value from route, if all are NA in date, order, route group and from tag otherwise.
test %>%
group_by(date, order, route) %>%
mutate(test= case_when(all(is.na(tag)) ~ route, TRUE ~ tag))
# A tibble: 10 x 5
# Groups: date, order, route [2]
date order route tag test
<chr> <chr> <chr> <chr> <chr>
1 2022-01-01 A AB NA AB
2 2022-01-01 A AB NA AB
3 2022-01-01 A AB NA AB
4 2022-01-01 A AB NA AB
5 2022-01-01 A AB NA AB
6 2022-01-01 A AC ABC ABC
7 2022-01-01 A AC ABC ABC
8 2022-01-01 A AC ABC ABC
9 2022-01-01 A AC ABC ABC
10 2022-01-01 A AC ABC ABC
DATA:
test <- tibble(date = rep('2022-01-01', 10),
order = rep('A', 10),
route = c(rep('AB', 5), rep('AC', 5)),
tag = c(rep(NA, 5), rep("ABC", 5)))
CodePudding user response:
Here we have an example of what I understand that you are wanting to do. As you have not specified I have included 2 columns, one where the value of tag is only taken for the specific line and the other where it is taken for all the group.
create table test ( "date" date, "order" varchar(10), route varchar(10), tag varchar(10), test varchar(10)); insert into test values ('2022-01-01','A','AB','NA','AB'), ('2022-01-01','A','AB','NA','AB'), ('2022-01-01','A','AB','NA','AB'), ('2022-01-01','B','BB','NA','AB'), ('2022-01-01','B','BB','NA','AB'), ('2022-01-01','B','BB','this','AB'), ('2022-01-01','A','AC','ABC','ABC'), ('2022-01-01','A','AC','ABC','ABC'), ('2022-01-01','A','AC','ABC','ABC');
GO
9 rows affected
with groups as ( select route, max(tag) as tag, count(tag) all_tag, sum(case when tag = 'NA' then 1 else 0 end) nas from test group by route) select t."date", t."order", t.route, t.tag, t.test, g.all_tag, g.nas, case when g.all_tag = g.nas then t.route else t.tag end as calc_1, case when g.all_tag = g.nas then t.route else g.tag end as calc_2 from test t join groups g on t.route = g.route order by t."date",t."order",t.route GO
date | order | route | tag | test | all_tag | nas | calc_1 | calc_2 :--------- | :---- | :---- | :--- | :--- | ------: | --: | :----- | :----- 2022-01-01 | A | AB | NA | AB | 3 | 3 | AB | AB 2022-01-01 | A | AB | NA | AB | 3 | 3 | AB | AB 2022-01-01 | A | AB | NA | AB | 3 | 3 | AB | AB 2022-01-01 | A | AC | ABC | ABC | 3 | 0 | ABC | ABC 2022-01-01 | A | AC | ABC | ABC | 3 | 0 | ABC | ABC 2022-01-01 | A | AC | ABC | ABC | 3 | 0 | ABC | ABC 2022-01-01 | B | BB | NA | AB | 3 | 2 | NA | this 2022-01-01 | B | BB | NA | AB | 3 | 2 | NA | this 2022-01-01 | B | BB | this | AB | 3 | 2 | this | this
db<>fiddle here