Replace NaNs with unique reference in Pandas data frame

Time:10-05

I have a data frame in which some unique references are missing, and I'd like to generate unique references for those rows. I thought I'd use the index (row number) for this, since it increments, but any value that differs per row would do.

So far I've managed to create a column holding the index values (I probably don't need this step, but it's the closest I've come to getting it to work):

# Create column with the index values so they can be used to create unique refs for missing planning references
ah_df['Index Values'] = ah_df.index.values

Then I've tried to reference this when trying to replace the NaNs, giving each of my new references the prefix 'Unknown Ref':

# Creates unique references to replace the blanks
ah_df.loc[ah_df["Planning Reference"].isnull(), 'Planning Reference'] = "Unknown Ref" + str(ah_df['Index Values'])

This 'works' insofar as it gives me something, but the index part is not the expected incrementing number. Instead I'm getting this:

"Unknown Ref0 0\n1 1\n2 2."

What am I doing wrong?

Thanks :)

CodePudding user response:

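Calling str() on a whole Series returns the printed representation of the entire Series as a single string, which is why every row receives the same long value. To convert each element to a string instead, use Series.astype: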

ah_df.loc[ah_df["Planning Reference"].isnull(), 'Planning Reference'] = "Unknown Ref" + ah_df['Index Values'].astype(str)
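
To see the difference between the two conversions, here is a minimal sketch on a made-up three-row frame (the sample values are assumptions, not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real ah_df
ah_df = pd.DataFrame({"Planning Reference": ["A/100", np.nan, np.nan]})
ah_df["Index Values"] = ah_df.index.values

# str() on a Series returns ONE string: the printed form of the whole Series
whole_repr = str(ah_df["Index Values"])

# astype(str) converts element by element, so the result is still a Series
per_row = "Unknown Ref" + ah_df["Index Values"].astype(str)

# Assigning a Series aligns on the index, so only the NaN rows are filled
mask = ah_df["Planning Reference"].isnull()
ah_df.loc[mask, "Planning Reference"] = per_row

print(ah_df["Planning Reference"].tolist())
# ['A/100', 'Unknown Ref1', 'Unknown Ref2']
```

Because the right-hand side is a Series, pandas matches it to the masked rows by label, so it does not matter that it has more rows than there are NaNs.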

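The new column is not necessary; you can use the index directly. Convert it to a Series first (Index.to_series) so that the assignment aligns on the row labels, since a plain Index of full length would not match the number of NaN rows:

ah_df.loc[ah_df["Planning Reference"].isnull(), 'Planning Reference'] = "Unknown Ref" + ah_df.index.to_series().astype(str)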

If you need a counter that starts at 0 and counts only the NaN rows:

m = ah_df["Planning Reference"].isnull()
ah_df.loc[m, 'Planning Reference'] = [f"Unknown Ref{i}" for i in range(m.sum())]
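
A runnable sketch of the counter approach on made-up data (the frame and its index are assumptions chosen for illustration). Note that sum has to be called, as m.sum(), so that range receives an integer rather than a method:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the non-default index shows that the counter
# does not depend on the row labels
ah_df = pd.DataFrame(
    {"Planning Reference": [np.nan, "A/100", np.nan, "B/200", np.nan]},
    index=[10, 11, 12, 13, 14],
)

m = ah_df["Planning Reference"].isnull()

# m.sum() counts the True values, i.e. the number of missing references
n_missing = int(m.sum())

# A plain list is assigned positionally, so its length must equal n_missing
ah_df.loc[m, "Planning Reference"] = [f"Unknown Ref{i}" for i in range(n_missing)]

print(ah_df["Planning Reference"].tolist())
# ['Unknown Ref0', 'A/100', 'Unknown Ref1', 'B/200', 'Unknown Ref2']
```

Unlike the index-based variants, this numbers the filled rows consecutively regardless of where they sit in the frame.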