I have a data frame where numeric data is stored as strings with a prefix that I need to remove. On top of this, each value has double quotes inside the string quotes, i.e. ' "" '.
import pandas as pd
dict_1 = {"Col1": [1001, 1002, 1003, 1004, 1005],
          "Col2": ['"Rs. 5131"', '"Rs. 0"', '"Rs 351157"', '"Rs 535391"', '"Rs. 6513"']}
a = pd.DataFrame(dict_1)
a.head(6)
| | Col1 | Col2 |
|----|----------|-------------|
| 0 |1001 |"Rs. 5131" |
| 1 |1002 |"Rs. 0" |
| 2 |1003 |"Rs 351157" |
| 3 |1004 |"Rs 535391" |
| 4 |1005 |"Rs. 6513" |
As you can see, I want to remove the quotes embedded in Col2, and along with this I have to remove the Rs prefix.
I tried the following code, which slices a single value:
b = a['Col2'][0]
b = b[5:]
b = b[:-1]
b
But the issue is that in some observations the prefix is written as Rs. and in others as Rs without a period, so a fixed slice does not work for every row.
Can someone help me with the code? In the end I also need to convert this column to integer. The issue is that in some data
CodePudding user response:
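You can simply use the removeprefix and removesuffix string methods (available from Python 3.9). First take one value:
b = a['Col2'][0]
b = b.removeprefix('"')     # drop the leading double quote
b = b.removesuffix('"')     # drop the trailing double quote
b = b.removeprefix("Rs. ")  # prefix written with a period
b = b.removeprefix("Rs ")   # prefix written without a period
Both methods return the string unchanged when the prefix or suffix is not present, so Rs. with a period and Rs without one are both handled without any error.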
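To apply the same idea to the whole column rather than a single value, the steps can be wrapped in a small function and passed to `Series.apply`. This is only a sketch: `strip_rupee_prefix` is a hypothetical helper name, it assumes Python 3.9 or later, and it assumes every value follows one of the two patterns shown in the question.

```python
import pandas as pd


def strip_rupee_prefix(value: str) -> int:
    """Hypothetical helper: drop the wrapping quotes and the Rs/Rs. prefix."""
    # removeprefix/removesuffix leave the string unchanged if nothing matches
    value = value.removeprefix('"').removesuffix('"')
    value = value.removeprefix("Rs. ").removeprefix("Rs ")
    return int(value)


a = pd.DataFrame({
    "Col1": [1001, 1002, 1003, 1004, 1005],
    "Col2": ['"Rs. 5131"', '"Rs. 0"', '"Rs 351157"', '"Rs 535391"', '"Rs. 6513"'],
})

# Apply the helper to every row of Col2
a["Col2"] = a["Col2"].apply(strip_rupee_prefix)
```

`int()` raises a `ValueError` on any value that does not match the assumed patterns, which makes unexpected rows visible instead of silently passing them through.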
CodePudding user response:
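Or use .str.replace() with a regular expression in which the period after Rs is optional:
a["Col2"] = a["Col2"].str.replace(r'Rs\.?\s*', '', regex=True).str.replace('"', '', regex=False)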
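As a possible variation, the quotes and the prefix can be removed in a single pass by putting both alternatives into one pattern, followed by `astype(int)` for the integer conversion the question asks about. The sketch below assumes the sample data from the question; the pattern `"|Rs\.?\s*` is one choice among several.

```python
import pandas as pd

a = pd.DataFrame({
    "Col1": [1001, 1002, 1003, 1004, 1005],
    "Col2": ['"Rs. 5131"', '"Rs. 0"', '"Rs 351157"', '"Rs 535391"', '"Rs. 6513"'],
})

# Match either a double quote, or "Rs" with an optional literal period
# and any whitespace that follows it, and replace each match with nothing.
cleaned = a["Col2"].str.replace(r'"|Rs\.?\s*', "", regex=True)

# Only digits remain, so the column can be cast to integers
a["Col2"] = cleaned.astype(int)
```

Note that `.str.replace` works on substrings, whereas plain `Series.replace` without `regex=True` only replaces values that match in full.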
CodePudding user response:
Another option:
import pandas as pd
dict_1 = {"Col1": [1001, 1002, 1003, 1004, 1005],
"Col2": ['"Rs. 5131"', '"Rs. 0"', '"Rs 351157"', '"Rs 535391"', '"Rs. 6513"']}
a = pd.DataFrame(dict_1)
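a['Col2'] = a['Col2'].replace({'"': ''}, regex=True)      # strip the embedded double quotes
a['Col2'] = a['Col2'].replace({r'Rs\.': ''}, regex=True)  # remove "Rs." (period escaped so it is literal)
a['Col2'] = a['Col2'].replace({'Rs': ''}, regex=True)     # remove the remaining "Rs" without a period
a['Col2'] = a['Col2'].replace({' ': ''}, regex=True)      # remove leftover spaces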
print(a.head(6))
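The replace calls above leave Col2 as strings, while the question also asks for integers. One way to get there, sketched under the assumption that each value contains exactly one run of digits, is to ignore the prefix entirely and pull out the digits with `.str.extract`:

```python
import pandas as pd

a = pd.DataFrame({
    "Col1": [1001, 1002, 1003, 1004, 1005],
    "Col2": ['"Rs. 5131"', '"Rs. 0"', '"Rs 351157"', '"Rs 535391"', '"Rs. 6513"'],
})

# Capture the first run of digits in each value; expand=False returns a Series
digits = a["Col2"].str.extract(r"(\d+)", expand=False)

# Convert the digit strings to integers
a["Col2"] = digits.astype(int)
print(a.head(6))
```

Rows without any digits would come back as missing values from `.str.extract`, and `astype(int)` would then raise, so such rows need handling first if they can occur.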