Home > Net >  How to detect #N/A in a data frame (data taken from xlsx file) using pandas?
How to detect #N/A in a data frame (data taken from xlsx file) using pandas?

Time:11-24

The blank cells with no data can be checked with: if pd.isna(dataframe.loc[index_name, column_name] == True)

but if the cell has #N/A, the above command does not work nor dataframe.loc[index, column_name] == '#N/A'.

On reading that cell, it shows NaN, but the above codes does not work. My main target is to capture the release dates and store it in a list.

Dataframe Code

CodePudding user response:

If you're reading your dataframe tft from a spreadsheet (and it seems to be the case here), you can use the parameter na_values of pandas.read_excel to consider some values (e.g #N/A) as NaN values like below :

tft= pd.read_excel("path_to_the_file.xlsx", na_values=["#N/A"])

Otherwise, if you want to preserve those #N/A values/strings, you can check/select them like this :

tft.loc[tft["Release Data"].eq("#N/A")] #will return a dataframe

In the first scenario, your code would be like this :

rel_date= []

for i in range(len(tft)):
    if pd.isna(tft["Release Date"])
        continue
    else:
        rel_date.append(int(str(tft.loc[i, "Release Date"]).split()[1]))

However, there is no need for the loop here, you can make a list of the release dates with this :

rel_date= (
            tft["Release Date"]
                .str.extract(("Release (\d{8})"), expand=False)
                .dropna()
                .astype(int)
                .drop_duplicates()
                .tolist()
          )
print(rel_date)
[20220603, 20220610]
  • Related