Home > Net >  Combine 2 string columns in pandas with different conditions in both columns with another condition
Combine 2 string columns in pandas with different conditions in both columns with another condition

Time:12-21

I have the following data frame, there are other categories besides cat1 but I only want to make a change to the part of the dataframe where the category is cat1.

name   short code     category
thyrax thx   thxar.po cat1
gha    gh    gh.cd    cat1
play   pl    pl.v     cat1
xxdx   xd    xda.v    cat1
......

For cases where everything after the . in the code column is anything besides .cd, I want the short column to take what's in the short column what is after the . in the code column and become something like thx.po, but in cases where there is a cd, I want it to become .cn. I want the output to look like this,

name   short    code     category
thyrax thx.po   thxar.po cat1
gha    gh.cn    gh.cd    cat1
play   pl.v     pl.v     cat1
xxdx   xd.v     xda.v    cat1
......

I don't know how to add a condition that IF the category is cat1 and the code after the . is .cd, make it into short what's after the . cn.

I want the same condition for everything unless what is after the . to be copied, but if what is after the . is a .cn, I want it to be a .cd. What is the best way to do it?

I have gotten this code so far,

df['short'] = (df['short'].add("." df['code'].str.split(".").str[-1]).where(df['category'].eq("cat1"),df['short']))

But I can't figure out how to add the condition where if in the code column, what appears after the . is .cd and to have something different happen.

So basically my conditions are this,

firstly, category must be cat1,

then take what is in the short column and merge it with what is after the . in the code column.

If what is after the . in the code column is cd, make it into cn.

CodePudding user response:

Try:

mask = df['category'] == 'cat1'
df.loc[mask, 'short']  = '.'   df.loc[mask, 'code'] \
                                 .str.split('.').str[1].replace({'cd': 'cn'})
print(df)

# Output:
     name   short      code category
0  thyrax  thx.po  thxar.po     cat1
1     gha   gh.cd     gh.cd     cat1
2    play    pl.v      pl.v     cat1
3    xxdx    xd.v     xda.v     cat1

CodePudding user response:

Try this:

df['short'] = df['short'].astype(str)   np.where(df['category'].eq('cat1'), df['code'].astype(str).str.extract('(\.. )')[0].replace('.cd', '.cn'), '')

Output:

>>> df
     name   short      code category
0  thyrax  thx.po  thxar.po     cat1
1     gha      gh     gh.cd     cat2
2     gha   gh.cn     gh.cd     cat1
3    play    pl.v      pl.v     cat1
4    xxdx      xd     xda.v     cat2
5    xxdx      xd     xda.v     cat2
6    xxdx    xd.v     xda.v     cat1

(I added some dummy rows in the above data to demonstrate that it works with the right category, cat1.)

CodePudding user response:

def custom_apply_function(row):
    if row['category'] != 'cat1':
        return row.short
    code_after_dot = row.code.split('.')[1]
    if code_after_dot == 'cd':
        code_after_dot = 'cn'
    new_short = row.short   '.'   code_after_dot
    return new_short

df.apply(axis=1, func=custom_apply_function)

returns

0    thx.po
1     gh.cn
2      pl.v
3      xd.v

Other answers in this thread are certainly faster due to vectorisation. In case you'd like to complicate the conditions further though, I would go with the more explicit version.

  • Related