Combine 2 string columns in pandas with different conditions in both columns-CodePudding

I have 2 columns in pandas, with data that looks like this.

code fx         category
AXD  AXDG.R     cat1
AXF  AXDG_e.FE  cat1 
333  333.R      cat1
....

There are other categories but I am only interested in cat1.

I want to combine everything from the code column, and everything after the . in the fx column and replace the code column with the new combination without affecting the other rows.

code    fx         category
AXD.R   AXDG.R     cat1
AXF.FE  AXDG_e.FE  cat1
333.R   333.R      cat1
.....

Here is my code, I think I have to use regex but I'm not sure how to combine it in this way and only in the columns which I choose without affecting the rest of the rows.

df.loc[df['category']== 'cat1', 'code'] = df[df['category'] == 'cat1']['code'].str.replace(r'[a-z](?=\.)', '', regex=True).str.replace(r'_?(?=\.)','', regex=True).str.replace(r'G(?=\.)', '', regex=True)

I'm not sure how to select the second column also. Any help would be greatly appreciated.

CodePudding user response：

You can extract the part of fx and append it to code:

df['code']  = df['fx'].str.extract('(\..*$)')[0]

output:

     code         fx category
0   AXD.R     AXDG.R     cat1
1  AXF.FE  AXDG_e.FE     cat1
2   333.R      333.R     cat1

to limit to cat1 only:

df.loc[df['category'].eq('cat1'), 'code']  = df['fx'].str.extract('(\..*$)')[0]

CodePudding user response：

There are other categories but I am only interested in cat1

You can use str.split with series.where to add the extention for cat1:

df['code'] = (df['code'].add("." df['fx'].str.split(".").str[-1])
             .where(df['category'].eq("cat1"),df['code']))

print(df)

     code         fx category
0   AXD.R     AXDG.R     cat1
1  AXF.FE  AXDG_e.FE     cat1
2   333.R      333.R     cat1

CodePudding user response：

You can use Series.str.extract:

df['code'] = df['code'].astype(str)   np.where(df['category'].eq('cat1'), df['fx'].astype(str).str.extract('(\.. )')[0], '')

Output:

>>> df
     code         fx category
0   AXD.R     AXDG.R     cat1
1  AXF.FE  AXDG_e.FE     cat1
2   333.R      333.R     cat1

CodePudding user response：

We can get the expected result using split like so :

>>> df['code'] = df['code']   '.'   df['fx'].str.split(pat=".", expand=True)[1]
>>> df
    code    fx          category    
0   AXD.R   AXDG.R      cat1        
1   AXF.FE  AXDG_e.FE   cat1        
2   333.R   333.R       cat1

To filter only on cat1, as @anky did very well, we can add a where statement:

>>> df['code'] = (df['code']   '.'   df['fx'].str.split(pat=".", expand=True)[1]).where(df['category'].eq("cat1"), df['code'])

CodePudding user response：

Replace alphanumerics before the dot. Append the result to column code.

df['code']  =df['fx'].str.replace('(^[\w] (?=\.))','',regex=True)




    code         fx  category
0   AXD.R     AXDG.R     cat1
1  AXF.FE  AXDG_e.FE     cat1
2   333.R      333.R     cat1