pandas: obtaining a ranking from a CSV file

So I'm doing an assignment. I am given a file called 'population.csv'.

The file lists the population of each country by year. Using pandas, I want to get the 20 largest populations for a given column (year).

import pandas as pd
df = pd.read_csv('population.csv', sep='\t')
print(df.nlargest(20,columns='2018'))

I am getting a weird error shown here:

Traceback (most recent call last):
  File "C:\Users\nacho\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\indexes\base.py", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '2018'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\nacho\Desktop\Personal\Fordham\Senior\Spring 2022\CompSci\Labs\Lab 8\lab8.py", line 7, in <module>
    print(df.nlargest(5,columns='2018'))
  File "C:\Users\nacho\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\frame.py", line 6684, in nlargest
    return algorithms.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()
  File "C:\Users\nacho\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\algorithms.py", line 1137, in nlargest
    return self.compute("nlargest")
  File "C:\Users\nacho\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\algorithms.py", line 1274, in compute
    dtype = frame[column].dtype
  File "C:\Users\nacho\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\nacho\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\indexes\base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: '2018'

CodePudding user response：

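Never mind... The file is actually separated by commas. The professor said it was separated by \t when it wasn't. With sep='\t', pandas found no tabs, so it read each line as one single column, and there was no column named '2018' to look up. Dropping the sep argument (the default separator is a comma) makes the KeyError go away.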
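For anyone hitting the same KeyError, here is a minimal sketch of what the wrong separator does. The data below is a made-up stand-in for population.csv (the country names and numbers are hypothetical), read from a string so the snippet runs on its own:

```python
import io
import pandas as pd

# Hypothetical stand-in for population.csv (comma-separated);
# the country names and numbers are invented for illustration.
csv_text = "Country,2017,2018\nA,100,110\nB,300,290\nC,200,210\n"

# Wrong delimiter: no tab is found, so each line becomes a single field
# and the whole header line ends up as one column name.
wrong = pd.read_csv(io.StringIO(csv_text), sep="\t")
print(list(wrong.columns))

# Default delimiter (comma): the year columns exist as separate columns.
right = pd.read_csv(io.StringIO(csv_text))
print(list(right.columns))

# nlargest now finds the '2018' column and ranks by it.
top2 = right.nlargest(2, columns="2018")
print(top2)
```

Printing df.columns right after read_csv is a quick way to check whether the separator was right before calling nlargest.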

CodePudding user response：

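You can select the top 20 rows of a DataFrame by a specific column by sorting on that column in descending order and then taking the first 20 rows. Note that head(20) on its own only returns the first 20 rows in file order, so the sort is needed, and the column name has to match the header in the file (here '2018'):

top20 = df.sort_values('2018', ascending=False).head(20)

print("Top 20 rows of the DataFrame by the 2018 column: ")
print(top20)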
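As a rough comparison of the options, the sketch below uses a small made-up DataFrame (the Country column and the values are assumptions, not taken from the real file) to show how head, nlargest, and sort_values followed by head differ:

```python
import pandas as pd

# Hypothetical data; the real file would be loaded with pd.read_csv.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "2018": [110, 290, 210, 50],
})

# head(n) keeps the first n rows in their existing order.
first_two = df.head(2)

# nlargest(n, columns=...) ranks by the values in that column.
largest_two = df.nlargest(2, columns="2018")

# Sorting in descending order and then taking head(n) picks the same rows here.
sorted_two = df.sort_values("2018", ascending=False).head(2)

print(largest_two[["Country", "2018"]])
```

Both ranked variants pick the same rows in this example; nlargest is just the shorter spelling when only the top n rows are needed.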