I am trying to convert a string column in my DataFrame to int.
The amount column contains values like this:
123,123
(343,344)
I want to convert them to this:
123123
343344
This is the code I wrote for that:
def strToInt(str2):
    '''print(type(str2))'''
    if type(str2) == str:
        temp = str2.replace("(", "").replace(")","").replace(",","")
        '''print("GOT :" str2 " RETURN :" str(int(temp)))'''
        if checkInt(temp):
            return int(temp)
    return None

def checkInt(s):
    if s[0] in ('-', ' '):
        return s[1:].isdigit()
    return s.isdigit()

print(df['amount'])
df['amount'] = df[['amount']].apply(lambda a: strToInt(a))
print(df['amount'])
print(df.columns)
print(df['amount'])
But I am getting all null values. I checked the function strToInt on individual values and it gives the correct output,
but after apply every value is NaN.
Before:
0 45,105
1 24,250
2 8,35,440
3 3,00,900
4 1,69,920
After:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
Need your help please.
CodePudding user response:
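You can probably use a regex with the vectorized string methods to make things more efficient:
```
import pandas as pd
df = pd.DataFrame({'amount': ['123,456', '(123,456)', '-123,465', '(-123,456)']})
# keep only digits and the minus sign, then cast the cleaned strings to int
df['amount'] = df['amount'].str.replace(r'[^-\d]', '', regex=True).astype(int)
print(df['amount'])
```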
output:
```
0 123456
1 123456
2 -123465
3 -123456
Name: amount, dtype: int64
```
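One caveat with the regex above: `astype(int)` raises if a cell ends up empty or missing after cleaning. The following is only a sketch of one possible way to get the "return None when it cannot be parsed" behaviour of the question's function. It assumes pandas 1.0 or later (for the nullable `Int64` dtype), and the sample values `'n/a'` and `None` are made up for illustration. Because the values pass through float before the cast, very large integers could lose precision.

```python
import pandas as pd

# Hypothetical sample: two parsable amounts, one junk string, one missing value.
s = pd.Series(['45,105', '(8,35,440)', 'n/a', None])

# Same regex as in the answer: drop everything except digits and '-'.
cleaned = s.str.replace(r'[^-\d]', '', regex=True)

# errors='coerce' turns unparsable strings (here the empty string) into NaN;
# the nullable Int64 dtype then keeps integers alongside missing values.
result = pd.to_numeric(cleaned, errors='coerce').astype('Int64')

print(result)
```

Unparsable and missing entries show up as `<NA>` instead of raising an error.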
CodePudding user response:
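Pass the function to the column df['amount'] (a Series), not to the one-column DataFrame df[['amount']]. DataFrame.apply hands the whole column to the function as a Series, so the type(str2) == str check fails and strToInt returns None: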
df['amount'] = df['amount'].apply(strToInt)
print (df)
amount
0 45105
1 24250
2 835440
3 300900
4 169920
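To see the difference between the two calls, here is a small sketch that records the type each `apply` hands to the function. The two list names are invented for this illustration, and the sample frame uses only two of the question's values.

```python
import pandas as pd

df = pd.DataFrame({'amount': ['45,105', '24,250']})

# Hypothetical helper lists that collect the type name of each argument.
seen_by_dataframe_apply = []
seen_by_series_apply = []

# DataFrame.apply: the function receives each column as a Series.
df[['amount']].apply(lambda a: seen_by_dataframe_apply.append(type(a).__name__))

# Series.apply: the function receives each element, here a plain str.
df['amount'].apply(lambda a: seen_by_series_apply.append(type(a).__name__))

print(set(seen_by_dataframe_apply))
print(set(seen_by_series_apply))
```

The first set contains only `Series` and the second only `str`, which is why the `type(str2) == str` check in strToInt only succeeds with `df['amount'].apply`.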
Solution from the comments that works with a negative sign:
a = np.where(df['amount'].str.startswith('-'), -1, 1)
df['amount'] = df['amount'].str.replace(r'\D', '', regex=True).astype(int).mul(a)
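Note that `startswith('-')` does not see a minus sign that sits inside parentheses, such as `(-123,456)`. If that form can occur in your data, a possible variant is sketched below. It assumes that a `-` anywhere in the string means the value is negative, which may not hold for your data, and the sample values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample, including a minus sign wrapped in parentheses.
df = pd.DataFrame({'amount': ['45,105', '-24,250', '(-8,35,440)']})

# Assumption: a '-' anywhere in the string marks a negative value.
sign = np.where(df['amount'].str.contains('-', regex=False), -1, 1)

# Strip every non-digit, cast to int, then re-apply the sign.
df['amount'] = df['amount'].str.replace(r'\D', '', regex=True).astype(int).mul(sign)

print(df['amount'].tolist())  # [45105, -24250, -835440]
```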