I want to apply two transformations to the amount column of the following df:
                                      Address     type       amount
0  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow  250,000 VSO
1  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow  250,000 VSO
2  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow  250,000 VSO
- I want to cut the ' VSO' substring from all rows.
- I want to convert every remaining string to a float, using the number format of the locale set with locale.setlocale(locale.LC_ALL, 'en_us').
The current code I have is:
locale.setlocale(locale.LC_ALL, 'en_us')
df_test['amount'].str.split(' VSO')[0]
locale.atof((str(df_test['amount'].values)))
This raises the following error:
ValueError: could not convert string to float: "['250000 VSO' '250000 VSO' '250000 VSO' '33333 VSO' '33333 VSO'\n '10400000 VSO' '170833 VSO' '170833 VSO' '170833 VSO' '170833 VSO'\n
CodePudding user response:
Try apply after removing the trailing " VSO" with rstrip:
import locale
locale.setlocale(locale.LC_ALL, 'en_us')
df["amount"] = df["amount"].str.rstrip(" VSO").apply(locale.atof)
>>> df
                                      Address     type    amount
0  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow  250000.0
1  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow  250000.0
2  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow  250000.0
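One caveat: rstrip(" VSO") strips any trailing run of the characters space, V, S and O rather than the literal suffix. That is harmless here because the amounts end in digits. If the 'en_us' locale is not available on a given system, a locale-independent variant is possible. The sketch below assumes every amount uses "," as the thousands separator and "." as the decimal point; parse_amount is a hypothetical helper name, not part of pandas or locale.

```python
def parse_amount(text, suffix=" VSO"):
    """Convert a string such as '250,000 VSO' to a float.

    Hypothetical helper; assumes ',' is the thousands separator.
    """
    text = text.strip()
    # Remove the literal suffix only if it is actually present.
    if text.endswith(suffix):
        text = text[: -len(suffix)]
    # Drop thousands separators, then convert.
    return float(text.replace(",", ""))


amounts = ["250,000 VSO", "33,333 VSO", "10,400,000 VSO"]
parsed = [parse_amount(a) for a in amounts]
print(parsed)  # [250000.0, 33333.0, 10400000.0]
```

With a DataFrame, the same function could be applied element-wise, for example df["amount"] = df["amount"].apply(parse_amount).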
CodePudding user response:
I think that @not_speshal answers the question perfectly.
In case the string changes slightly (for example, if VSO is replaced by another token), we can use the following regex:
>>> df['amount'] = df.amount.str.extract(r"(\d+(?:,\d+)*)")[0].str.replace(',', '').astype(float)
>>> df
                                      Address     type    amount
0  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow  250000.0
1  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow  250000.0
2  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow  250000.0
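A quick way to sanity-check a pattern like this before running it on the whole column is to try it on a few sample strings with the standard re module. The sketch below assumes the amounts are whole numbers with optional "," thousands separators. The sample strings other than "250,000 VSO" and the helper name extract_amount are made up for illustration.

```python
import re

# Digits, optionally followed by further comma-separated digit groups.
AMOUNT_RE = re.compile(r"\d+(?:,\d+)*")


def extract_amount(text):
    """Return the first number found in text as a float, or None if absent."""
    match = AMOUNT_RE.search(text)
    if match is None:
        return None
    return float(match.group(0).replace(",", ""))


samples = ["250,000 VSO", "10,400,000 VSO", "33333 XYZ", "no amount"]
results = [extract_amount(s) for s in samples]
print(results)  # [250000.0, 10400000.0, 33333.0, None]
```

Checking a value with more than one separator, such as "10,400,000", is worthwhile because a pattern that allows only a single comma would silently truncate it.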