Home > database >  SQL fill column if all values missing for a group
SQL fill column if all values missing for a group

Time:04-12

I would like to reproduce following in SQL what I have in R. So I have grouped my data with date, order and route. Then I look if there is NAs inside that group. If all are missing, I will insert value from route to test, otherwise it will take the value from tag if not all are missing for that group.

I would like to do this in SQL, how will I achieve same result there? So starting with test dataframe and how can I add test column, where it will get value from route, if all are NA in date, order, route group and from tag otherwise.

test %>% 
  group_by(date, order, route) %>% 
  mutate(test= case_when(all(is.na(tag)) ~ route, TRUE ~ tag))

# A tibble: 10 x 5
# Groups:   date, order, route [2]
   date       order route tag   test 
   <chr>      <chr> <chr> <chr> <chr>
 1 2022-01-01 A     AB    NA    AB   
 2 2022-01-01 A     AB    NA    AB   
 3 2022-01-01 A     AB    NA    AB   
 4 2022-01-01 A     AB    NA    AB   
 5 2022-01-01 A     AB    NA    AB   
 6 2022-01-01 A     AC    ABC   ABC  
 7 2022-01-01 A     AC    ABC   ABC  
 8 2022-01-01 A     AC    ABC   ABC  
 9 2022-01-01 A     AC    ABC   ABC  
10 2022-01-01 A     AC    ABC   ABC 

DATA:

test <- tibble(date = rep('2022-01-01', 10),
               order = rep('A', 10),
               route = c(rep('AB', 5), rep('AC', 5)),
               tag = c(rep(NA, 5), rep("ABC", 5)))

CodePudding user response:

Here we have an example of what I understand that you are wanting to do. As you have not specified I have included 2 columns, one where the value of tag is only taken for the specific line and the other where it is taken for all the group.

create table test (
"date" date,
"order" varchar(10),
route varchar(10),
tag varchar(10),
test varchar(10));
insert into test values
('2022-01-01','A','AB','NA','AB'), 
('2022-01-01','A','AB','NA','AB'), 
('2022-01-01','A','AB','NA','AB'), 

('2022-01-01','B','BB','NA','AB'),  
('2022-01-01','B','BB','NA','AB'),
('2022-01-01','B','BB','this','AB'),

('2022-01-01','A','AC','ABC','ABC'),
('2022-01-01','A','AC','ABC','ABC'),
('2022-01-01','A','AC','ABC','ABC');
GO

9 rows affected

with groups as (
select 
  route,
  max(tag) as tag,
  count(tag) all_tag,
  sum(case when tag = 'NA' then 1 else 0 end) nas
from test 
group by route) 
select
  t."date",
  t."order",
  t.route,
  t.tag, 
  t.test,
  g.all_tag,
  g.nas,
  case when g.all_tag = g.nas then t.route else t.tag end as calc_1,
  case when g.all_tag = g.nas then t.route else g.tag end as calc_2
from test t
join groups g 
on t.route = g.route
order by
  t."date",t."order",t.route
GO
date       | order | route | tag  | test | all_tag | nas | calc_1 | calc_2
:--------- | :---- | :---- | :--- | :--- | ------: | --: | :----- | :-----
2022-01-01 | A     | AB    | NA   | AB   |       3 |   3 | AB     | AB    
2022-01-01 | A     | AB    | NA   | AB   |       3 |   3 | AB     | AB    
2022-01-01 | A     | AB    | NA   | AB   |       3 |   3 | AB     | AB    
2022-01-01 | A     | AC    | ABC  | ABC  |       3 |   0 | ABC    | ABC   
2022-01-01 | A     | AC    | ABC  | ABC  |       3 |   0 | ABC    | ABC   
2022-01-01 | A     | AC    | ABC  | ABC  |       3 |   0 | ABC    | ABC   
2022-01-01 | B     | BB    | NA   | AB   |       3 |   2 | NA     | this  
2022-01-01 | B     | BB    | NA   | AB   |       3 |   2 | NA     | this  
2022-01-01 | B     | BB    | this | AB   |       3 |   2 | this   | this  

db<>fiddle here

  • Related