So I have a .csv file with thousands of rows that have duplicates area names in column A and "Completed" values on column B (which can be "Completed" or "In Progress" in the same area).
Area | Completed |
---|---|
Chicago | In Progress |
Chicago | Completed |
Chicago | In Progress |
Chicago | In Progress |
San Francisco | Completed |
San Francisco | Completed |
San Francisco | Completed |
San Francisco | Completed |
Los Angeles | In Progress |
Los Angeles | In Progress |
Los Angeles | In Progress |
Los Angeles | In Progress |
I need to make it so that the end product is the following
Area | Completed |
---|---|
Chicago | Particularly Completed |
San Francisco | Completed |
Los Angeles | In Progress |
The idea is to remove the duplicate area values and have the column B be determined by the original values with the following methodology:
- if all of the values in an area are "Completed" then column B is Completed
- if all of the values in an area are "In Progress" then column B is "In Progress"
- if one area contains values "In Progress" and "Completed" then the column B is Particularly Completed
So far I've thought about using a python script for this, but want to know if doing this would be possible with just excel as well?
CodePudding user response:
Formula I have used in D2
cell
=UNIQUE(A2:A13)
Then in E2
cell
=IF(COUNTA(UNIQUE(FILTER($B$2:$B$13,$A$2:$A$13=D2)))>1,"Particularly Complete",UNIQUE(FILTER($B$2:$B$13,$A$2:$A$13=D2)))
and drag down till need.
CodePudding user response:
A correct answer has already been indicated. Anyway, if one doesn't have the UNIQUE
or the FILTER
function, here is another solution:
A | B | C | D | E |
---|---|---|---|---|
Area | Completed | |||
Chicago | In Progress | =IF(COUNTIF($A$2:A2,A2)>1,"",A2) | =IFERROR(INDEX(C:C,AGGREGATE(15,6,ROW(C:C)/(C:C<>""),ROW(D2)-ROW($D$2) 1)),"") | =IF(D2="","",IF(COUNTIF(A:A,D2)=COUNTIFS(A:A,D2,B:B,"Completed"),"Completed",IF(COUNTIF(A:A,D2)=COUNTIFS(A:A,D2,B:B,"In Progress"),"In Progress",IF(AND(COUNTIFS(A:A,D2,B:B,"Completed")>0,COUNTIFS(A:A,D2,B:B,"In Progress")>0),"Particularly Completed","Unpredicted Outcome")))) |
Chicago | Completed | |||
Chicago | In Progress | |||
Chicago | In Progress | |||
San Francisco | Completed | |||
San Francisco | Completed | |||
San Francisco | Completed | |||
San Francisco | Completed | |||
Los Angeles | In Progress | |||
Los Angeles | In Progress | |||
Los Angeles | In Progress | |||
Los Angeles | In Progress |
Formulas in column C, D and E has to be dragged down. Column C filters the Area list leaving only unique areas. Column D sort the list of unique areas of column C. Column E gives the actual response. Once could easily obtain a list of unique areas with little work (which could be better if the list is really long), so the "real deal" is just the formula in column E which work on said list.