Home > other >  Unsure why conditional column transform is not identifying values to update (Text.Contains)
Unsure why conditional column transform is not identifying values to update (Text.Contains)

Time:07-02

I have a table of people records with various demographic information (Race, Ethnicity, Gender etc.).

For null values in [Ethnicity] (i.e., Hispanic Y/N), I want to search the corresponding [Race] value for a string "non-Hispanic", since for some records these have been stored as a combined value under [Race] (e.g., "White (non-Hispanic)"). and I'd like to clean/normalize both fields ([Race] is cleaned in a separate downstream step).

However, I'm unsure why my code is not successfully identifying matches to the first two conditions, since I know there are many instances of "White (non-Hispanic)" at the very least:

cleanData = 
    Table.ReplaceValue(rawData, each [Ethnicity], each
        if [Ethnicity] = null and (
            Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or 
            Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase)) 
            then "Non-hispanic" else
        if [Ethnicity] = null and 
            Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase) 
            then "Hispanic" else
        [Ethnicity], Replacer.ReplaceText, {"Ethnicity"}
        ),

Both fields are type Text, and I'm not hitting an error - just a lack of expected behavior. The null values in [Ethnicity] are unchanged.

Sample input:

Race Ethnicity
White Yes
Asian No
White (non-Hispanic) Decline to respond
White (non-Hispanic) null
White (Hispanic) null
Asian null

Sample output:

Race Ethnicity
White Yes
Asian No
White (non-Hispanic) Decline to Respond
White (non-Hispanic) No
White (Hispanic) Yes
Asian null

CodePudding user response:

Its the Replacer.ReplaceText which should be Replacer.ReplaceValue

cleanData =  Table.ReplaceValue(rawData,  each [Ethnicity], each
    if [Ethnicity] = null then 
        if (Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or  Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase)) 
        then "Non-hispanic" else 
        if  Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase)  then "Hispanic"  else [Ethnicity]
    else [Ethnicity]
 ,Replacer.ReplaceValue,{"Ethnicity"}),

or

cleanData =  Table.ReplaceValue(rawData,  each [Ethnicity], each
    if [Ethnicity] = null and
        (Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or  Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase) )
        then "Non-hispanic" else 
         if [Ethnicity] = null  and Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase)  then "Hispanic"  else [Ethnicity]
 ,Replacer.ReplaceValue,{"Ethnicity"}),

CodePudding user response:

@horseyride you can abstract it to use a list of any size

    Text.ContainsAny = (source as text, searchStrings as list) as logical =>
        let
            matches = List.Transform( searchStrings, (string) =>
                Text.Contains(source, string, Comparer.OrdinalIgnoreCase) )
        in
            List.AnyTrue(matches),

then you can write

if [Ethnicity] = null and Text.ContainsAny( [Race], {"non-hispanic", "not-hispanic"} ) then "Non-hispanic"
else ...

enter image description here

I here's another way using Table.AddColumn, then rename. That might sound worse than using replacing a column, in-place. After testing, I was getting the same final folded SQL query -- using either method.

let
    Source = Table.FromRows(
        Json.Document( Binary.Decompress( Binary.FromText( "i45WCs/ILElVUNJRikwtVorViVZyLM5MzAMJ OWD RAFGnn5eboemcUFiXmZyZogaZfU5JzMvFSFknyFotTigvy8FLzKkSUxJeCWoppRgm5GLAA=", BinaryEncoding.Base64 ), Compression.Deflate ) ),
        let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Race = _t, Ethnicity = _t]
    ),
    // eliminate whitespace, or control chars
    #"Changed Type"   = Table.TransformColumnTypes( Source, {{"Race", type text}, {"Ethnicity", type nullable text}} ),
    #"Cleaned Text"   = Table.TransformColumns( Source, {{"Ethnicity", Text.Clean, type nullable text}}),
    #"Replaced Value" = Table.ReplaceValue( #"Cleaned Text", "", null, Replacer.ReplaceValue, {"Ethnicity"} ),

    merge_columns = Table.AddColumn(
        #"Changed Type",
        "Ethnicity2",
        (row) =>
            let
                isBlank = row[Ethnicity] = null or row[Ethnicity] = "",
                race = row[Race],
                replacement =
                    if (
                        Text.Contains(race, "non-Hispanic", Comparer.OrdinalIgnoreCase)
                        or Text.Contains(race, "not-Hispanic", Comparer.OrdinalIgnoreCase)
                    ) then
                        "Non-Hispanic"
                    else if Text.Contains(race, "Hispanic", Comparer.OrdinalIgnoreCase) then
                        "Hispanic"
                    else
                        race
            in
                if isBlank then
                    replacement
                else
                    race,
        type text
    ),
    #"ReplaceColumns" = Table.RenameColumns(
        Table.RemoveColumns(merge_columns, {"Ethnicity"}), {{"Ethnicity2", "Ethnicity"}}
    )
in
    #"ReplaceColumns"
  • Related