Home > Net >  Why pandas dataframe displaying column names as 'unnamed: 1', unnamed: 2',.......
Why pandas dataframe displaying column names as 'unnamed: 1', unnamed: 2',.......

Time:06-18

Issue: I have gotten a csv file (wtih delimiter ~) that came from a third party, and about 4000 records, and has 150 columns with real column names such as FirstName~LastName~OrderID~City~...... But when the file is loaded into a pandas dataframe df and when I use print(list(df.columns)) it displays the column names as follows (I've simplified it for brevity):

['ÿþA', 'Unnamed: 1', 'Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4',,,,,'Unnamed: 49']

Question: What I may be doing wrong, and how can we fix the issue to simply display the real column names in df? I'm using latest version of python. I see some relevant articles such as this one but they are all related to one column.

Remark: It's a UTF-16 LE BOM file. I discovered the issue when in my code, I referenced a column as df['OrdeID'] and I got well know KeyError that means you are refencing a column that does not exist.

Code:

import pandas as pd

df = pd.read_csv('/dbfs/FileStore/tables/MyDataFile.txt', sep='~', low_memory=False, quotechar='"', header='infer', encoding='cp1252')

print(df['OrdId'])

MyDataFile.txt sample:

FirstName~LastName~OrderID~City~.....
Kim~Doe~1234~New York~...............
Bob~Mason~456~Seattle~...............
..................

CodePudding user response:

Are you sure you have the right encoding?

I see your data file starts with ÿþ when read in a cp1252 encoding. That looks like a UTF16 byte order mark (BOM.) Wikipedia has a table of these, and if you look at that table, you'll see it's a match with UTF16-LE (little endian.)

Once you figure out the right encoding, you can tell Pandas what encoding to use by calling pd.read_csv(..., encoding='...'). To figure out what to put in the encoding field, you can consult this table. If you want UTF16-LE, that's 'utf_16_le'.

More information:

Pandas docs on read_csv

What is this "ÿþA"? This is the same question, but about R instead of Python.

CodePudding user response:

Hey you can not use it directly and wanna use another way out by renaming it as per my understanding its non exixtent

try using

df.rename(columns={'Unnamed: 0':'new name0','Unnamed: 1':'new name1'}, inplace=True)

  • Related