I have a dataframe called df_freight and I would like to create a new column called "LM", based on a condition in another column called "Cost rate". The condition is: if the value contains the code "lm", write "LM"; otherwise write "Not lm".
df_freight = pd.DataFrame(
{'Cost rate': ['11.53 LM', '12.22kg','22 LM','sdfdfsdf'],
'TO Number': ['x12', 'x13','x14','x15']})
df_fright["LM"] = df_fright.apply(lambda row: "LM" if row["Cost rate"].str.contans("lm") else "Not lm", axis=1)
but I am getting an AttributeError:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-94-d277ddb08fc7> in <module>
----> 1 df_fright["LM"] = df_fright.apply(lambda row: "LM" if row["Cost rate"].str.contans("lm") else "Not lm", axis=1)
~\Anaconda3\envs\general\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
7766 kwds=kwds,
7767 )
-> 7768 return op.get_result()
7769
7770 def applymap(self, func, na_action: Optional[str] = None) -> DataFrame:
~\Anaconda3\envs\general\lib\site-packages\pandas\core\apply.py in get_result(self)
183 return self.apply_raw()
184
--> 185 return self.apply_standard()
186
187 def apply_empty_result(self):
~\Anaconda3\envs\general\lib\site-packages\pandas\core\apply.py in apply_standard(self)
274
275 def apply_standard(self):
--> 276 results, res_index = self.apply_series_generator()
277
278 # wrap results
~\Anaconda3\envs\general\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
288 for i, v in enumerate(series_gen):
289 # ignore SettingWithCopy here in case the user mutates
--> 290 results[i] = self.f(v)
291 if isinstance(results[i], ABCSeries):
292 # If we have a view on v, we need to make a copy because
<ipython-input-94-d277ddb08fc7> in <lambda>(row)
----> 1 df_fright["LM"] = df_fright.apply(lambda row: "LM" if row["Cost rate"].str.contans("lm") else "Not lm", axis=1)
AttributeError: 'str' object has no attribute 'str'
Isn't the syntax correct?
CodePudding user response:
row["Cost rate"] is already a string, so you don't have to use .str. Also, to check whether a substring is contained in a string, use in instead of contains().
import pandas as pd
df_freight = pd.DataFrame(
{'Cost rate': ['11.53 LM', '12.22kg', '22 LM', 'sdfdfsdf'],
'TO Number': ['x12', 'x13', 'x14', 'x15']})
df_freight["LM"] = df_freight.apply(lambda row: "LM" if "lm" in row["Cost rate"] else "Not lm", axis=1)
print(df_freight)
> Cost rate TO Number LM
0 11.53 LM x12 Not lm
1 12.22kg x13 Not lm
2 22 LM x14 Not lm
3 sdfdfsdf x15 Not lm
This returns "Not lm" for every row because the in check is case-sensitive: "lm" does not match "LM". Lowercase the value with .lower() before checking:
import pandas as pd
df_freight = pd.DataFrame(
{'Cost rate': ['11.53 LM', '12.22kg', '22 LM', 'sdfdfsdf'],
'TO Number': ['x12', 'x13', 'x14', 'x15']})
df_freight["LM"] = df_freight.apply(lambda row: "LM" if "lm" in row["Cost rate"].lower() else "Not lm", axis=1)
print(df_freight)
> Cost rate TO Number LM
0 11.53 LM x12 LM
1 12.22kg x13 Not lm
2 22 LM x14 LM
3 sdfdfsdf x15 Not lm
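To see why the original call failed, it may help to compare the types involved. The following is a minimal sketch reusing the question's data: the .str accessor exists on a pandas Series (a whole column), while each row["Cost rate"] inside apply(..., axis=1) is a plain Python str, which has no .str attribute. The variable names column and cell are only illustrative.

```python
import pandas as pd

df_freight = pd.DataFrame(
    {'Cost rate': ['11.53 LM', '12.22kg', '22 LM', 'sdfdfsdf'],
     'TO Number': ['x12', 'x13', 'x14', 'x15']})

# The whole column is a Series of strings, so the .str accessor is available
column = df_freight["Cost rate"]
column_has_str = hasattr(column, "str")

# A single cell is a plain Python string, which has no .str attribute
cell = df_freight.loc[0, "Cost rate"]
cell_has_str = hasattr(cell, "str")

print(type(column).__name__, column_has_str)
print(type(cell).__name__, cell_has_str)
```

This is why .str.contains() works on df_freight["Cost rate"] directly but not on the value seen inside the row-wise lambda.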
CodePudding user response:
You can use a vectorized operation:
import numpy as np
df_freight["LM"] = np.where(df_freight['Cost rate'].str.contains('lm', case=False),
                            'LM', 'Not lm')
print(df_freight)
# Output
Cost rate TO Number LM
0 11.53 LM x12 LM
1 12.22kg x13 Not lm
2 22 LM x14 LM
3 sdfdfsdf x15 Not lm
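The vectorized approach can also be split into two steps so the intermediate result is visible. This is only a sketch, and it assumes the column might contain missing values (which is not the case in the question's data): na=False makes str.contains treat them as non-matches, so the mask stays purely boolean. The name mask is illustrative.

```python
import numpy as np
import pandas as pd

df_freight = pd.DataFrame(
    {'Cost rate': ['11.53 LM', '12.22kg', '22 LM', 'sdfdfsdf'],
     'TO Number': ['x12', 'x13', 'x14', 'x15']})

# Step 1: boolean mask, True where the text contains "lm" in any letter case
mask = df_freight['Cost rate'].str.contains('lm', case=False, na=False)

# Step 2: map True/False to the two labels
df_freight['LM'] = np.where(mask, 'LM', 'Not lm')
print(df_freight)
```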
CodePudding user response:
Use this code instead. It works, and it is the simplest option mentioned here: no .str accessor, no axis argument, and no vectorized helper needed.
df_freight['LM'] = df_freight['Cost rate'].apply(lambda x: 'LM' if "LM" in x else "Not lm")
Output
Cost rate TO Number LM
0 11.53 LM x12 LM
1 12.22kg x13 Not lm
2 22 LM x14 LM
3 sdfdfsdf x15 Not lm
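Note that "LM" in x is case-sensitive, so it matches the sample data but would miss a value written as "lm". A possible case-insensitive variant, sketched here, lowercases each value first. The extra '7 lm' row is made up for illustration and is not part of the question's data.

```python
import pandas as pd

# '7 lm' is a hypothetical extra value to show the lowercase case
df_freight = pd.DataFrame(
    {'Cost rate': ['11.53 LM', '12.22kg', '22 LM', 'sdfdfsdf', '7 lm']})

# Lowercase each value before the substring check so "LM" and "lm" both match
df_freight['LM'] = df_freight['Cost rate'].apply(
    lambda x: 'LM' if 'lm' in x.lower() else 'Not lm')
print(df_freight)
```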