passing conditions through dictionary lowercase not recognized-CodePudding

In this sample data:

data = [{'source': ' Off-grid energy'},
 {'source': 'off-grid generation'},
 {'source': 'Off grid energy '},
 {'source': 'OFFGRID energy'},
 {'source': 'apple sauce'},
 {'source': 'green energy'},
 {'source': 'Green electricity '},
 {'source': 'tomato  sauce'},
 {'source': 'BIOMASS as an energy source'},
 {'source': 'produced heat (biogas).'}]

I want to create a new column based on conditions:

my_conditions = {
    "green": df["source"].str.contains("green"),
    "bio-gen": df["source"].str.contains("bio"),
    "off-grid": df["source"].str.contains("off-grid")
}

I preprocess by lowercasing df["source"]:

df['source'] = df["source"].str.lower()

Then using Numpy's select:

df['category-lower'] = np.select(my_conditions.values(),\
                           my_conditions.keys(),\
                           default="other")

I can't figure out why the lowercasing is not recognized (see row 0, 6, 8)

CodePudding user response：

You've probably applied .str.lower() after the my_condition was constructed. Try instead:

import re

# apply .str.lower() here, or use flags=re.I (ignorecase in .str.contains)
# df['source'] = df["source"].str.lower() 

my_conditions = {
    "green": df["source"].str.contains("green", flags=re.I),
    "bio-gen": df["source"].str.contains("bio", flags=re.I),
    "off-grid": df["source"].str.contains("off-grid", flags=re.I),
}

df["category-lower"] = np.select(
    my_conditions.values(), my_conditions.keys(), default="other"
)

print(df)

Prints:

                        source category-lower
0              Off-grid energy       off-grid
1          off-grid generation       off-grid
2             Off grid energy           other
3               OFFGRID energy          other
4                  apple sauce          other
5                 green energy          green
6           Green electricity           green
7                tomato  sauce          other
8  BIOMASS as an energy source        bio-gen
9      produced heat (biogas).        bio-gen