How to modify Data Frame so as to take values between some other character in column in Python Panda-CodePudding

I have DataFrame in Python Pandas like below:

COL_1 | COL_2               | COL_3
------|---------------------|---------
111   | CV_COUNT_ABC_XM_BF  | CV_SUM_ABC_XM_BF
222   | CV_COUNT_DEF_XM_BF  | CV_SUM_CC_XM_BF
333   | CV_COUNT_CC_XM_BF   | LACK
444   | LACK                | CV_SUM_DEF_XM_BF
...   | ...                 |  ...

And I need to modify above DataFrame to have in COL_2 and COL_3 values like:

if there is "LACK" in COL_2 or COL_3 stay it

if there is something other than "LACK" take value:

between "CV_COUNNT_" and "_6M_BF" 

or 

between "CV_SUM_" and "_XM_BF"

So, as a result I need something like below:

COL_1 | COL_2             | COL_3
------|-------------------|---------
111   | ABC               | ABC
222   | DEF               | CC
333   | CC                | LACK
444   | LACK              | DEF
...   | ...               | ...

CodePudding user response：

We can use np.where along with str.extract here:

df["COL_2"] = np.where(df["COL_2"] == 'LACK', df["COL_2"],
                       df["COL_2"].str.extract(r'CV_COUNT_(\w ?)_XM_BF'))
df["COL_3"] = np.where(df["COL_3"] == 'LACK', df["COL_3"],
                   df["COL_3"].str.extract(r'CV_SUM_(\w ?)_XM_BF'))

CodePudding user response：

You can use replace:

df = df.replace({'COL_2': {'CV_COUNT_(\w )_XM_BF': r'\1'},
                 'COL_3': {'CV_SUM_(\w )_XM_BF': r'\1'}}, regex=True)
print(df)

# Output
   COL_1 COL_2 COL_3
0    111   ABC   ABC
1    222   DEF    CC
2    333    CC  LACK
3    444  LACK   DEF

A shortest version: df.replace({'CV_(?:COUNT|SUM)_(\w )_XM_BF': r'\1'}, regex=True)