Using the following DataFrame, I can split the "ccy" string and create two new columns:
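import pandas as pd

df_so = pd.DataFrame.from_dict({0: 'gbp_usd',
                                1: 'eur_usd',
                                2: 'usd_cad',
                                3: 'usd_jpy',
                                4: 'eur_usd',
                                5: 'eur_usd'}, orient='index', columns=["ccy"])
# n must be passed by keyword; recent pandas versions reject it as a positional argument
df_so[['base_ccy', 'quote_ccy']] = df_so['ccy'].str.split('_', n=1, expand=True)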
This gives the following DataFrame:
index | ccy | base_ccy | quote_ccy |
---|---|---|---|
0 | gbp_usd | gbp | usd |
1 | eur_usd | eur | usd |
2 | usd_cad | usd | cad |
3 | usd_jpy | usd | jpy |
4 | eur_usd | eur | usd |
5 | eur_usd | eur | usd |
How do I do the same str.split using DataFrame.assign within my tweak function below?
I can get the same result with a list comprehension, but is there a simpler/cleaner way using assign?
def tweak_df(df_):
    return (df_.assign(base_currency=lambda df_: [i[0] for i in df_['ccy'].str.split('_', n=1)],
                       quote_currency=lambda df_: [i[1] for i in df_['ccy'].str.split('_', n=1)],
                       )
            )

tweak_df(df_so)
This yields the same values as the table above (in columns named base_currency and quote_currency), but the code is not very intuitive, and simple is better than complex.
CodePudding user response:
A possible solution:
df_so.assign(**tweak_df(df_so))
Output:
       ccy  base_ccy  quote_ccy  base_currency  quote_currency
0  gbp_usd       gbp        usd            gbp             usd
1  eur_usd       eur        usd            eur             usd
2  usd_cad       usd        cad            usd             cad
3  usd_jpy       usd        jpy            usd             jpy
4  eur_usd       eur        usd            eur             usd
5  eur_usd       eur        usd            eur             usd
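For anyone wondering why the ** unpacking works here: a DataFrame behaves like a mapping from column names to Series, so each column is handed to assign as a keyword argument. Below is a minimal sketch of that mechanism. The frames `df` and `extra` are made up for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical two-row frames, only to illustrate the unpacking.
df = pd.DataFrame({"ccy": ["gbp_usd", "eur_usd"]})
extra = pd.DataFrame({"base_currency": ["gbp", "eur"],
                      "quote_currency": ["usd", "usd"]})

# ** uses extra.keys() and extra[key], so this call behaves like
# df.assign(base_currency=extra["base_currency"],
#           quote_currency=extra["quote_currency"])
out = df.assign(**extra)
print(out)
```

This only works when every column label is a string, because keyword argument names must be strings.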
CodePudding user response:
I actually think the first version you suggested is the best.
df_so[['base_ccy', 'quote_ccy']] = df_so['ccy'].str.split('_', n=1, expand=True)
If you want to use assign instead, you can unpack the split result into it, using rename to give the new columns string names:
df_so.assign(**df_so['ccy'].str.split('_', n=1, expand=True)
             .rename(columns={0: "base_ccy", 1: "quote_ccy"}))
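To fit this into the tweak function from the question, something along these lines should work. It is a sketch rather than the only way to do it: the intermediate name `parts` is my own, and the sample frame is rebuilt from the question's data so the snippet runs on its own.

```python
import pandas as pd

# Sample data rebuilt from the question so the snippet is self-contained.
df_so = pd.DataFrame(
    {"ccy": ["gbp_usd", "eur_usd", "usd_cad", "usd_jpy", "eur_usd", "eur_usd"]}
)

def tweak_df(df_):
    # Split once on the first underscore; expand=True returns a DataFrame
    # whose columns are labelled 0 and 1.
    parts = (df_["ccy"].str.split("_", n=1, expand=True)
             .rename(columns={0: "base_ccy", 1: "quote_ccy"}))
    # assign returns a new DataFrame and leaves df_ unchanged.
    return df_.assign(**parts)

result = tweak_df(df_so)
print(result)
```

The rename is needed because the split produces integer column labels, and ** unpacking requires string keys. Splitting once and reusing the result also avoids running str.split twice, as the list comprehension version does.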