I am trying to select rows by a condition on one column of the following dataset:
import pandas as pd
already_read = [("Il nome della rosa","Umberto Eco", 1980),
("L'amore che ti meriti","Daria Bignardi", 2014),
("Memorie dal sottsuolo", " Fëdor Dostoevskij", 1864),
("Oblomov", "Ivan Alexandrovich Goncharov ", '/')]
index = range(1,5,1)
data = pd.DataFrame(already_read, columns = ["Books'Title", "Authors", "Publishing Year"], index = index)
data
In the following way:
data[(data['Publishing Year'] >= 1850) & (data['Publishing Year'] <= 1950)]
As you can see, the column I have chosen contains mixed data (int and str),
and indeed I get this error after running the code:
TypeError: '>=' not supported between instances of 'str' and 'int'
Since I'm taking my very first steps with Python, could you please suggest a way to run this code so that the string value is either excluded or read as an integer, possibly with an if statement (or another method)?
Thanks
CodePudding user response:
One way to go would be to use apply on the column (Series.apply) with a custom function. Something like this:
def check_int(x):
    # Only compare real ints; anything else (e.g. the '/' string) is excluded
    if isinstance(x, int):
        return 1850 <= x <= 1950
    return False

data[data['Publishing Year'].apply(lambda x: check_int(x))]
Here check_int will return False for every value that is not an int, and apply the range check only to the ints. So, we are getting:
data['Publishing Year'].apply(lambda x: check_int(x))
1    False
2    False
3     True
4    False
Name: Publishing Year, dtype: bool
And next we use this pd.Series of booleans to select from the data:
data[data['Publishing Year'].apply(lambda x: check_int(x))]
             Books'Title             Authors  Publishing Year
3  Memorie dal sottsuolo   Fëdor Dostoevskij             1864
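Since the question also asks about another method, here is a minimal sketch of an alternative that avoids the custom function. It assumes it is acceptable to treat any non-numeric entry (such as '/') as missing. It uses pd.to_numeric with errors="coerce" to turn those entries into NaN, and Series.between for the range check. The names years, mask and selected are just illustrative and are not from the original post.

```python
import pandas as pd

# Same data as in the question
already_read = [("Il nome della rosa", "Umberto Eco", 1980),
                ("L'amore che ti meriti", "Daria Bignardi", 2014),
                ("Memorie dal sottsuolo", " Fëdor Dostoevskij", 1864),
                ("Oblomov", "Ivan Alexandrovich Goncharov ", '/')]
data = pd.DataFrame(already_read,
                    columns=["Books'Title", "Authors", "Publishing Year"],
                    index=range(1, 5))

# Assumption: anything that cannot be parsed as a number should count as missing.
# errors="coerce" converts such entries (here '/') to NaN instead of raising.
years = pd.to_numeric(data["Publishing Year"], errors="coerce")

# between() is inclusive on both ends by default; NaN rows evaluate to False
mask = years.between(1850, 1950)

selected = data[mask]
print(selected)
```

Note that the original column in data is left untouched here. Only the temporary years Series is numeric, so the '/' entry is still visible in the full table.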