Home > Blockchain >  How to modify Data Frame so as to take values between some other character in column in Python Panda
How to modify Data Frame so as to take values between some other character in column in Python Panda

Time:01-24

I have DataFrame in Python Pandas like below:

COL_1 | COL_2               | COL_3
------|---------------------|---------
111   | CV_COUNT_ABC_XM_BF  | CV_SUM_ABC_XM_BF
222   | CV_COUNT_DEF_XM_BF  | CV_SUM_CC_XM_BF
333   | CV_COUNT_CC_XM_BF   | LACK
444   | LACK                | CV_SUM_DEF_XM_BF
...   | ...                 |  ...

And I need to modify above DataFrame to have in COL_2 and COL_3 values like:

  • if there is "LACK" in COL_2 or COL_3 stay it

  • if there is something other than "LACK" take value:

    between "CV_COUNNT_" and "_6M_BF" 
    
    or 
    
    between "CV_SUM_" and "_XM_BF"
    

So, as a result I need something like below:

COL_1 | COL_2             | COL_3
------|-------------------|---------
111   | ABC               | ABC
222   | DEF               | CC
333   | CC                | LACK
444   | LACK              | DEF
...   | ...               | ...

CodePudding user response:

We can use np.where along with str.extract here:

df["COL_2"] = np.where(df["COL_2"] == 'LACK', df["COL_2"],
                       df["COL_2"].str.extract(r'CV_COUNT_(\w ?)_XM_BF'))
df["COL_3"] = np.where(df["COL_3"] == 'LACK', df["COL_3"],
                   df["COL_3"].str.extract(r'CV_SUM_(\w ?)_XM_BF'))

CodePudding user response:

You can use replace:

df = df.replace({'COL_2': {'CV_COUNT_(\w )_XM_BF': r'\1'},
                 'COL_3': {'CV_SUM_(\w )_XM_BF': r'\1'}}, regex=True)
print(df)

# Output
   COL_1 COL_2 COL_3
0    111   ABC   ABC
1    222   DEF    CC
2    333    CC  LACK
3    444  LACK   DEF

A shortest version: df.replace({'CV_(?:COUNT|SUM)_(\w )_XM_BF': r'\1'}, regex=True)

  • Related