I have 2 columns in pandas, with data that looks like this.
code fx category
AXD AXDG.R cat1
AXF AXDG_e.FE cat1
333 333.R cat1
....
There are other categories but I am only interested in cat1.
I want to combine everything from the code
column, and everything after the .
in the fx
column and replace the code column with the new combination without affecting the other rows.
code fx category
AXD.R AXDG.R cat1
AXF.FE AXDG_e.FE cat1
333.R 333.R cat1
.....
Here is my code, I think I have to use regex but I'm not sure how to combine it in this way and only in the columns which I choose without affecting the rest of the rows.
df.loc[df['category']== 'cat1', 'code'] = df[df['category'] == 'cat1']['code'].str.replace(r'[a-z](?=\.)', '', regex=True).str.replace(r'_?(?=\.)','', regex=True).str.replace(r'G(?=\.)', '', regex=True)
I'm not sure how to select the second column also. Any help would be greatly appreciated.
CodePudding user response:
You can extract
the part of fx
and append it to code
:
df['code'] = df['fx'].str.extract('(\..*$)')[0]
output:
code fx category
0 AXD.R AXDG.R cat1
1 AXF.FE AXDG_e.FE cat1
2 333.R 333.R cat1
to limit to cat1
only:
df.loc[df['category'].eq('cat1'), 'code'] = df['fx'].str.extract('(\..*$)')[0]
CodePudding user response:
There are other categories but I am only interested in cat1
You can use str.split
with series.where
to add the extention for cat1:
df['code'] = (df['code'].add("." df['fx'].str.split(".").str[-1])
.where(df['category'].eq("cat1"),df['code']))
print(df)
code fx category
0 AXD.R AXDG.R cat1
1 AXF.FE AXDG_e.FE cat1
2 333.R 333.R cat1
CodePudding user response:
You can use Series.str.extract
:
df['code'] = df['code'].astype(str) np.where(df['category'].eq('cat1'), df['fx'].astype(str).str.extract('(\.. )')[0], '')
Output:
>>> df
code fx category
0 AXD.R AXDG.R cat1
1 AXF.FE AXDG_e.FE cat1
2 333.R 333.R cat1
CodePudding user response:
We can get the expected result using split
like so :
>>> df['code'] = df['code'] '.' df['fx'].str.split(pat=".", expand=True)[1]
>>> df
code fx category
0 AXD.R AXDG.R cat1
1 AXF.FE AXDG_e.FE cat1
2 333.R 333.R cat1
To filter only on cat1
, as @anky did very well, we can add a where
statement:
>>> df['code'] = (df['code'] '.' df['fx'].str.split(pat=".", expand=True)[1]).where(df['category'].eq("cat1"), df['code'])
CodePudding user response:
Replace alphanumerics before the dot. Append the result to column code.
df['code'] =df['fx'].str.replace('(^[\w] (?=\.))','',regex=True)
code fx category
0 AXD.R AXDG.R cat1
1 AXF.FE AXDG_e.FE cat1
2 333.R 333.R cat1