I have the following data frame, there are other categories besides cat1
but I only want to make a change to the part of the dataframe where the category is cat1
.
name short code category
thyrax thx thxar.po cat1
gha gh gh.cd cat1
play pl pl.v cat1
xxdx xd xda.v cat1
......
For cases where everything after the .
in the code column is anything besides .cd
, I want the short column to take what's in the short column what is after the .
in the code
column and become something like thx.po
, but in cases where there is a cd
, I want it to become .cn
. I want the output to look like this,
name short code category
thyrax thx.po thxar.po cat1
gha gh.cn gh.cd cat1
play pl.v pl.v cat1
xxdx xd.v xda.v cat1
......
I don't know how to add a condition that IF
the category is cat1
and the code after the .
is .cd
, make it into short
what's after the .
cn.
I want the same condition for everything unless what is after the .
to be copied, but if what is after the .
is a .cn
, I want it to be a .cd
. What is the best way to do it?
I have gotten this code so far,
df['short'] = (df['short'].add("." df['code'].str.split(".").str[-1]).where(df['category'].eq("cat1"),df['short']))
But I can't figure out how to add the condition where if in the code column, what appears after the .
is .cd
and to have something different happen.
So basically my conditions are this,
firstly, category must be cat1
,
then take what is in the short column and merge it with what is after the .
in the code column.
If what is after the .
in the code column is cd
, make it into cn
.
CodePudding user response:
Try:
mask = df['category'] == 'cat1'
df.loc[mask, 'short'] = '.' df.loc[mask, 'code'] \
.str.split('.').str[1].replace({'cd': 'cn'})
print(df)
# Output:
name short code category
0 thyrax thx.po thxar.po cat1
1 gha gh.cd gh.cd cat1
2 play pl.v pl.v cat1
3 xxdx xd.v xda.v cat1
CodePudding user response:
Try this:
df['short'] = df['short'].astype(str) np.where(df['category'].eq('cat1'), df['code'].astype(str).str.extract('(\.. )')[0].replace('.cd', '.cn'), '')
Output:
>>> df
name short code category
0 thyrax thx.po thxar.po cat1
1 gha gh gh.cd cat2
2 gha gh.cn gh.cd cat1
3 play pl.v pl.v cat1
4 xxdx xd xda.v cat2
5 xxdx xd xda.v cat2
6 xxdx xd.v xda.v cat1
(I added some dummy rows in the above data to demonstrate that it works with the right category, cat1
.)
CodePudding user response:
def custom_apply_function(row):
if row['category'] != 'cat1':
return row.short
code_after_dot = row.code.split('.')[1]
if code_after_dot == 'cd':
code_after_dot = 'cn'
new_short = row.short '.' code_after_dot
return new_short
df.apply(axis=1, func=custom_apply_function)
returns
0 thx.po
1 gh.cn
2 pl.v
3 xd.v
Other answers in this thread are certainly faster due to vectorisation. In case you'd like to complicate the conditions further though, I would go with the more explicit version.