A have a csv file with columns as below:
Last_Price, Price, Marked
3,2.89,
1.99,2.09,
3.9,3.79,
I created a pandas dataframe named subdf. I want to compare last digit of Last_Price and Price columns, If they are not equal I assign 'X' to the column 'Marked' otherwise leave it as blank or NaN.
I have tried:
subdf['Marked'] = np.where([x.strip()[-1] for x in subdf['Last_Price']] == [x.strip()[-1] for x in subdf['Price']],
'X',
np.nan)
It says: AttributeError: 'float' object has no attribute 'strip'
I tried the below as well but it didn't work. It doesn't capture last digit. I guess I need a for loop as well.
str(subdf['Price']).strip()[-1]
CodePudding user response:
Here's an example.
It assumes that all prices have 2 digits or less after the decimal point. And that for "3" you want to compare for the "0" in "3.00" and not the last digit "3".
Multiplying by 10 moves only the last digit to the end
Then doing a MOD (%) division by 1 removes the front.
Please note: I removed the spaces from the csv captions to allow proper importing of the names
import pandas as pd
import io
TESTDATA = """Last_Price,Price,Marked
3,2.89,
1.99,2.09,
3.9,3.79,"""
subdf = pd.read_csv(io.StringIO(TESTDATA), sep=",")
subdf.loc[((subdf['Last_Price'] * 10) % 1) !=
((subdf['Price'] * 10) % 1), 'Marked'] = 'X'
print(subdf)
The result
Last_Price Price Marked
0 3.00 2.89 X
1 1.99 2.09 NaN
2 3.90 3.79 X
If you really want to compare against the string output, so the "9" from "3.9" vs "3.79" match, then import the dataframe as a str
import pandas as pd
import io
TESTDATA = """Last_Price,Price,Marked
3,2.89,
1.99,2.09,
3.9,3.79,"""
subdf = pd.read_csv(io.StringIO(TESTDATA), sep=",", dtype='str')
subdf.loc[(subdf['Last_Price'].str.slice(-1)) !=
(subdf['Price'].str.slice(-1)), 'Marked'] = 'X'
print(subdf)
New result
Last_Price Price Marked
0 3 2.89 X
1 1.99 2.09 NaN
2 3.9 3.79 NaN