I've got a data frame with missing unique references, and I'd like to generate unique refs for these in the dataset. I thought I'd use the index/row values for this as it's an incremental number, but I just need any number that changes.
So far I've managed to create a column holding the index value (I'm sure I probably don't need this step, but it's the closest I've got to making it work):
# Create column with the index values so they can be used to create unique refs for missing planning references
ah_df['Index Values'] = ah_df.index.values
Then I've tried to reference this when trying to replace the NaNs, giving each of my new references the prefix 'Unknown Ref':
# Creates unique references to replace the blanks
ah_df.loc[ah_df["Planning Reference"].isnull(),'Planning Reference'] = "Unknown Ref" + str(ah_df['Index Values'])
This 'works' insofar as it gives me something, but the index part is not giving me the expected incremental number. Instead I'm getting this:
"Unknown Ref0 0\n1 1\n2 2."
What am I doing wrong?
Thanks :)
CodePudding user response:
To convert the values to strings, use Series.astype:
ah_df.loc[ah_df["Planning Reference"].isnull(),'Planning Reference'] = "Unknown Ref" + ah_df['Index Values'].astype(str)
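The original attempt misbehaves because str() applied to a whole Series returns the printed representation of the entire Series as one string, while astype(str) converts each element separately. Here is a minimal sketch on a made-up frame; the sample values are assumptions for illustration only, not the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for ah_df
ah_df = pd.DataFrame({"Planning Reference": ["A1", np.nan, "B2", np.nan]})
ah_df["Index Values"] = ah_df.index.values

# str() on a Series yields ONE string: the repr of the whole Series
whole = str(ah_df["Index Values"])

# astype(str) converts element by element and keeps the index labels
per_row = ah_df["Index Values"].astype(str)

mask = ah_df["Planning Reference"].isnull()
# The right-hand Series is aligned on the index, so only the masked rows are filled
ah_df.loc[mask, "Planning Reference"] = "Unknown Ref" + per_row

print(ah_df["Planning Reference"].tolist())
# ['A1', 'Unknown Ref1', 'B2', 'Unknown Ref3']
```

Because the assignment aligns on the index, each filled row gets its own index value rather than the whole Series.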
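Alternatively, the new column is not necessary; use Index.astype on the index labels of the missing rows (selecting them with the mask keeps both sides of the assignment the same length):
m = ah_df["Planning Reference"].isnull()
ah_df.loc[m,'Planning Reference'] = "Unknown Ref" + ah_df.index[m].astype(str)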
If you need a counter starting from 0 only for the NaNs:
m = ah_df["Planning Reference"].isnull()
ah_df.loc[m,'Planning Reference'] = [f"Unknown Ref{i}" for i in range(m.sum())]
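To see the counter variant end to end, here is a small sketch on a hypothetical frame (the sample values are assumptions). Summing a boolean mask counts its True values, so the list has exactly one entry per missing reference:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for ah_df
ah_df = pd.DataFrame({"Planning Reference": ["A1", np.nan, "B2", np.nan]})

m = ah_df["Planning Reference"].isnull()
# m.sum() is the number of missing references (True counts as 1)
ah_df.loc[m, "Planning Reference"] = [f"Unknown Ref{i}" for i in range(m.sum())]

print(ah_df["Planning Reference"].tolist())
# ['A1', 'Unknown Ref0', 'B2', 'Unknown Ref1']
```

The numbers here run 0, 1, ... over the missing rows only, so they no longer match the row index; pick this form if consecutive numbering matters more than traceability back to the row.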