I have DataFrame in Python Pandas like below:
COL_1 | COL_2 | COL_3
------|---------------------|---------
111 | CV_COUNT_ABC_XM_BF | CV_SUM_ABC_XM_BF
222 | CV_COUNT_DEF_XM_BF | CV_SUM_CC_XM_BF
333 | CV_COUNT_CC_XM_BF | LACK
444 | LACK | CV_SUM_DEF_XM_BF
... | ... | ...
And I need to modify above DataFrame to have in COL_2 and COL_3 values like:
if there is "LACK" in COL_2 or COL_3 stay it
if there is something other than "LACK" take value:
between "CV_COUNNT_" and "_6M_BF" or between "CV_SUM_" and "_XM_BF"
So, as a result I need something like below:
COL_1 | COL_2 | COL_3
------|-------------------|---------
111 | ABC | ABC
222 | DEF | CC
333 | CC | LACK
444 | LACK | DEF
... | ... | ...
CodePudding user response:
We can use np.where
along with str.extract
here:
df["COL_2"] = np.where(df["COL_2"] == 'LACK', df["COL_2"],
df["COL_2"].str.extract(r'CV_COUNT_(\w ?)_XM_BF'))
df["COL_3"] = np.where(df["COL_3"] == 'LACK', df["COL_3"],
df["COL_3"].str.extract(r'CV_SUM_(\w ?)_XM_BF'))
CodePudding user response:
You can use replace
:
df = df.replace({'COL_2': {'CV_COUNT_(\w )_XM_BF': r'\1'},
'COL_3': {'CV_SUM_(\w )_XM_BF': r'\1'}}, regex=True)
print(df)
# Output
COL_1 COL_2 COL_3
0 111 ABC ABC
1 222 DEF CC
2 333 CC LACK
3 444 LACK DEF
A shortest version: df.replace({'CV_(?:COUNT|SUM)_(\w )_XM_BF': r'\1'}, regex=True)