I have a table of people records with various demographic information (Race, Ethnicity, Gender etc.).
For null
values in [Ethnicity] (i.e., Hispanic Y/N), I want to search the corresponding [Race] value for a string "non-Hispanic", since for some records these have been stored as a combined value under [Race] (e.g., "White (non-Hispanic)"
). and I'd like to clean/normalize both fields ([Race] is cleaned in a separate downstream step).
However, I'm unsure why my code is not successfully identifying matches to the first two conditions, since I know there are many instances of "White (non-Hispanic)"
at the very least:
cleanData =
Table.ReplaceValue(rawData, each [Ethnicity], each
if [Ethnicity] = null and (
Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or
Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase))
then "Non-hispanic" else
if [Ethnicity] = null and
Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase)
then "Hispanic" else
[Ethnicity], Replacer.ReplaceText, {"Ethnicity"}
),
Both fields are type Text, and I'm not hitting an error - just a lack of expected behavior. The null
values in [Ethnicity] are unchanged.
Sample input:
Race | Ethnicity |
---|---|
White | Yes |
Asian | No |
White (non-Hispanic) | Decline to respond |
White (non-Hispanic) | null |
White (Hispanic) | null |
Asian | null |
Sample output:
Race | Ethnicity |
---|---|
White | Yes |
Asian | No |
White (non-Hispanic) | Decline to Respond |
White (non-Hispanic) | No |
White (Hispanic) | Yes |
Asian | null |
CodePudding user response:
Its the Replacer.ReplaceText which should be Replacer.ReplaceValue
cleanData = Table.ReplaceValue(rawData, each [Ethnicity], each
if [Ethnicity] = null then
if (Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase))
then "Non-hispanic" else
if Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase) then "Hispanic" else [Ethnicity]
else [Ethnicity]
,Replacer.ReplaceValue,{"Ethnicity"}),
or
cleanData = Table.ReplaceValue(rawData, each [Ethnicity], each
if [Ethnicity] = null and
(Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase) )
then "Non-hispanic" else
if [Ethnicity] = null and Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase) then "Hispanic" else [Ethnicity]
,Replacer.ReplaceValue,{"Ethnicity"}),
CodePudding user response:
@horseyride you can abstract it to use a list of any size
Text.ContainsAny = (source as text, searchStrings as list) as logical =>
let
matches = List.Transform( searchStrings, (string) =>
Text.Contains(source, string, Comparer.OrdinalIgnoreCase) )
in
List.AnyTrue(matches),
then you can write
if [Ethnicity] = null and Text.ContainsAny( [Race], {"non-hispanic", "not-hispanic"} ) then "Non-hispanic"
else ...
I here's another way using Table.AddColumn
, then rename. That might sound worse than using replacing a column, in-place. After testing, I was getting the same final folded SQL query -- using either method.
let
Source = Table.FromRows(
Json.Document( Binary.Decompress( Binary.FromText( "i45WCs/ILElVUNJRikwtVorViVZyLM5MzAMJ OWD RAFGnn5eboemcUFiXmZyZogaZfU5JzMvFSFknyFotTigvy8FLzKkSUxJeCWoppRgm5GLAA=", BinaryEncoding.Base64 ), Compression.Deflate ) ),
let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Race = _t, Ethnicity = _t]
),
// eliminate whitespace, or control chars
#"Changed Type" = Table.TransformColumnTypes( Source, {{"Race", type text}, {"Ethnicity", type nullable text}} ),
#"Cleaned Text" = Table.TransformColumns( Source, {{"Ethnicity", Text.Clean, type nullable text}}),
#"Replaced Value" = Table.ReplaceValue( #"Cleaned Text", "", null, Replacer.ReplaceValue, {"Ethnicity"} ),
merge_columns = Table.AddColumn(
#"Changed Type",
"Ethnicity2",
(row) =>
let
isBlank = row[Ethnicity] = null or row[Ethnicity] = "",
race = row[Race],
replacement =
if (
Text.Contains(race, "non-Hispanic", Comparer.OrdinalIgnoreCase)
or Text.Contains(race, "not-Hispanic", Comparer.OrdinalIgnoreCase)
) then
"Non-Hispanic"
else if Text.Contains(race, "Hispanic", Comparer.OrdinalIgnoreCase) then
"Hispanic"
else
race
in
if isBlank then
replacement
else
race,
type text
),
#"ReplaceColumns" = Table.RenameColumns(
Table.RemoveColumns(merge_columns, {"Ethnicity"}), {{"Ethnicity2", "Ethnicity"}}
)
in
#"ReplaceColumns"