The blank cells with no data can be checked with:
if pd.isna(dataframe.loc[index_name, column_name] == True)
but if the cell has #N/A, the above command does not work nor
dataframe.loc[index, column_name] == '#N/A'.
On reading that cell, it shows NaN, but the above codes does not work. My main target is to capture the release dates and store it in a list.
CodePudding user response:
If you're reading your dataframe tft
from a spreadsheet (and it seems to be the case here), you can use the parameter na_values
of pandas.read_excel
to consider some values (e.g #N/A
) as NaN
values like below :
tft= pd.read_excel("path_to_the_file.xlsx", na_values=["#N/A"])
Otherwise, if you want to preserve those #N/A
values/strings, you can check/select them like this :
tft.loc[tft["Release Data"].eq("#N/A")] #will return a dataframe
In the first scenario, your code would be like this :
rel_date= []
for i in range(len(tft)):
if pd.isna(tft["Release Date"])
continue
else:
rel_date.append(int(str(tft.loc[i, "Release Date"]).split()[1]))
However, there is no need for the loop here, you can make a list of the release dates with this :
rel_date= (
tft["Release Date"]
.str.extract(("Release (\d{8})"), expand=False)
.dropna()
.astype(int)
.drop_duplicates()
.tolist()
)
print(rel_date)
[20220603, 20220610]