I'm trying to read a csv file with a column of data that has a scrambled ID number that includes the occasional consecutive $$ along with #, numbers, and letters.
SCRAMBLE_ID |
---|
AL9LLL677 |
AL9$AM657 |
$L9$$4440 |
#L9$306A1 |
etc.
I tried the following:
df = pd.read_csv('MASTER~1.CSV',
dtype = {'SCRAMBLE_ID': str})
which rendered the third entry as L9$4440 (L9 appear in serif font, italicized, and the first and second $ vanish).
Faced with an entire column of ID numbers configured in this manner, what is the best way of dealing with such data? I can imagine:
- PRIOR TO pd.read_csv: replacing the offending symbols with substitutes that don't create this problem (and what would those be), OR,
- is there a way of preserving the IDs as is but making them into a data type that ignores these symbols while keeping them present?
Thank you. I've attached a screenshot of the .csv side by side with resulting df (Jupyter notebook) below.
csv column to pandas df with $$
CodePudding user response:
I cannot replicate this using the same values as you in a mock CSV file.
Are you sure that the formatting based on the $ symbol is not occurring in wherever you are rendering your dataframe values? Have you checked to see if the data in the dataframe is what you expect or are you just rendering it externally?