Issue: I have gotten a csv
file (wtih delimiter ~
) that came from a third party, and about 4000 records, and has 150 columns with real column names such as FirstName~LastName~OrderID~City~.....
. But when the file is loaded into a pandas dataframe df
and when I use print(list(df.columns))
it displays the column names as follows (I've simplified it for brevity):
['ÿþA', 'Unnamed: 1', 'Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4',,,,,'Unnamed: 49']
Question: What I may be doing wrong, and how can we fix the issue to simply display the real column names in df? I'm using latest version of python
. I see some relevant articles such as this one but they are all related to one column.
Remark: It's a UTF-16 LE BOM
file. I discovered the issue when in my code, I referenced a column as df['OrdeID']
and I got well know KeyError that means you are refencing a column that does not exist.
Code:
import pandas as pd
df = pd.read_csv('/dbfs/FileStore/tables/MyDataFile.txt', sep='~', low_memory=False, quotechar='"', header='infer', encoding='cp1252')
print(df['OrdId'])
MyDataFile.txt sample:
FirstName~LastName~OrderID~City~.....
Kim~Doe~1234~New York~...............
Bob~Mason~456~Seattle~...............
..................
CodePudding user response:
Are you sure you have the right encoding?
I see your data file starts with ÿþ
when read in a cp1252 encoding. That looks like a UTF16 byte order mark (BOM.) Wikipedia has a table of these, and if you look at that table, you'll see it's a match with UTF16-LE (little endian.)
Once you figure out the right encoding, you can tell Pandas what encoding to use by calling pd.read_csv(..., encoding='...')
. To figure out what to put in the encoding field, you can consult this table. If you want UTF16-LE, that's 'utf_16_le'
.
More information:
What is this "ÿþA"? This is the same question, but about R instead of Python.
CodePudding user response:
Hey you can not use it directly and wanna use another way out by renaming it as per my understanding its non exixtent
try using
df.rename(columns={'Unnamed: 0':'new name0','Unnamed: 1':'new name1'}, inplace=True)