Given the following dataframe:
import pandas as pd

df = pd.DataFrame({
    "code": ["codeA", "Codeb", "codeB", "codea", "N/A", "N/A"],
    "warehouse": [20, 30, 10, 30, 10, 70],
})
I need to set the value in the code column according to three conditions:
- the value is codeA: set it to "product A"
- the value is codeB: set it to "product B"
- the value is anything other than codeA or codeB: set it to ""
Pseudocode:
# account for case: make the match case insensitive
if value REGEX '(?i)codeA':
    value = "product A"
else if value REGEX '(?i)codeB':
    value = "product B"
else:
    value = ""
Would I use a function with apply?
I can handle the first two like this:
df['code'].replace(to_replace="(?i)CodeA", value="Product A", inplace=True, regex=True)
df['code'].replace(to_replace="(?i)CodeB", value="Product B", inplace=True, regex=True)
However, I'm stuck on how to say "if it doesn't match either, set it to ''". I'm also wondering whether there's a more efficient way to do this with an "else" clause.
NOTE: The ideal solution would account for human error in the input, e.g., by being case insensitive. I already strip the values beforehand to handle leading and trailing spaces.
CodePudding user response:
Use a dict mapping after normalizing the case, then fill the unmatched values with '':
d = {'codea': 'Product A', 'codeb': 'Product B'}
df['code'] = df['code'].str.replace(' ', '').str.casefold()
df['code'] = df['code'].map(d).fillna('')
Output:
        code  warehouse
0  Product A         20
1  Product B         30
2  Product B         10
3  Product A         30
4                    10
5                    70
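If you would rather keep the if / else if / else shape of the pseudocode, one possible sketch uses numpy.select with case-insensitive matches. The new column name "product" is only an illustration, and the code assumes whole-value matches are what you want (str.fullmatch) rather than substring matches.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "code": ["codeA", "Codeb", "codeB", "codea", "N/A", "N/A"],
    "warehouse": [20, 30, 10, 30, 10, 70],
})

# Strip stray whitespace first, as the question already does.
code = df["code"].str.strip()

# One boolean mask per branch; na=False treats missing values as "no match".
conditions = [
    code.str.fullmatch("codeA", case=False, na=False),
    code.str.fullmatch("codeB", case=False, na=False),
]
choices = ["product A", "product B"]

# The first matching condition wins; `default` plays the role of the else branch.
# "product" is a hypothetical column name, so the original codes are preserved.
df["product"] = np.select(conditions, choices, default="")
print(df)
```

The dict approach above is simpler when every code maps by exact (normalized) key; np.select is mainly useful when the conditions are patterns or otherwise more than a lookup.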
CodePudding user response:
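A more general approach to setting a column's value based on a mapping from another column is to use map. Note that map looks up keys exactly, so this is case sensitive and the output shown assumes the codes are already normalized to codeA / codeB: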
mapping = {'codeA': 'product A', 'codeB': 'product B'}
df['mapped_product'] = df.code.map(mapping)
df
Out:
    code  warehouse mapped_product
0  codeA         20      product A
1  codeB         30      product B
2  codeB         10      product B
3  codeA         30      product A
4    N/A         10            NaN
5    N/A         70            NaN
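With the mixed-case data from the question, an exact-key map leaves "Codeb" and "codea" unmatched. A minimal sketch of one way around that, assuming it is acceptable to compare case-folded values: build a case-folded copy of the mapping (folded_mapping is a name introduced here for illustration) and look up case-folded codes, leaving the original column untouched.

```python
import pandas as pd

df = pd.DataFrame({
    "code": ["codeA", "Codeb", "codeB", "codea", "N/A", "N/A"],
    "warehouse": [20, 30, 10, 30, 10, 70],
})
mapping = {"codeA": "product A", "codeB": "product B"}

# Exact lookup: only rows whose code matches a key character for character are mapped.
exact = df["code"].map(mapping)

# Case-insensitive lookup: fold both the keys and the column values before mapping.
folded_mapping = {key.casefold(): value for key, value in mapping.items()}
df["mapped_product"] = (
    df["code"].str.casefold().map(folded_mapping).fillna("")  # "" for anything unmatched
)
print(exact)
print(df)
```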
If all you're doing is a string replace, you could do it this way:
df['mapped_product'] = df.code.str.replace('code', 'product ')
df
Out:
    code  warehouse mapped_product
0  codeA         20      product A
1  codeB         30      product B
2  codeB         10      product B
3  codeA         30      product A
4    N/A         10            N/A
5    N/A         70            N/A
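The plain str.replace is case sensitive and leaves non-matching values such as N/A as they are. If the product name really is just derived from the code's suffix, a hedged variant is to extract the suffix with a case-insensitive regex and blank out everything that does not match. The pattern below is an assumption: it only accepts "code" followed by a single A or B.

```python
import pandas as pd

df = pd.DataFrame({
    "code": ["codeA", "Codeb", "codeB", "codea", "N/A", "N/A"],
    "warehouse": [20, 30, 10, 30, 10, 70],
})

# Capture the trailing letter, ignoring case; rows that do not match yield NaN.
pattern = r"(?i)^code([ab])$"
letter = df["code"].str.strip().str.extract(pattern, expand=False).str.upper()

# Missing values propagate through the concatenation, then become "".
df["mapped_product"] = ("product " + letter).fillna("")
print(df)
```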