pandas obtaining ranking from csv

Time:04-21

I'm working on an assignment where I'm given a file called 'population.csv'.

The file lists the population of each country by year. Using pandas, I want to get the 20 largest populations for a given column (year).

import pandas as pd
df = pd.read_csv('population.csv', sep='\t')
print(df.nlargest(20,columns='2018'))

I am getting a weird error shown here:

Traceback (most recent call last):
  File "C:\Users\nacho\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\indexes\base.py", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '2018'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\nacho\Desktop\Personal\Fordham\Senior\Spring 2022\CompSci\Labs\Lab 8\lab8.py", line 7, in <module>
    print(df.nlargest(5,columns='2018'))
  File "C:\Users\nacho\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\frame.py", line 6684, in nlargest
    return algorithms.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()
  File "C:\Users\nacho\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\algorithms.py", line 1137, in nlargest
    return self.compute("nlargest")
  File "C:\Users\nacho\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\algorithms.py", line 1274, in compute
    dtype = frame[column].dtype
  File "C:\Users\nacho\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\nacho\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\indexes\base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: '2018'

CodePudding user response:

NEVERMIND... The file is actually separated by commas. The professor said it was separated by \t when it wasn't... Dropping sep='\t' (a comma is the read_csv default) fixes it.
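
For anyone hitting the same KeyError, here is a minimal sketch of why the wrong separator causes it. The sample data below is made up and only stands in for population.csv; the 'Country' column name is an assumption, not something taken from the real file. With sep='\t' on comma-separated text, the whole header line is read as one column name, so '2018' never becomes a column.

```python
import io
import pandas as pd

# Hypothetical sample standing in for population.csv (values are made up).
SAMPLE = "Country,2017,2018\nA,10,12\nB,30,31\nC,20,19\n"

# Wrong separator: the text has no tabs, so each line becomes a single field.
wrong = pd.read_csv(io.StringIO(SAMPLE), sep="\t")
print(list(wrong.columns))  # a single column named after the whole header line

# Correct separator: comma is the read_csv default, so sep can be omitted.
right = pd.read_csv(io.StringIO(SAMPLE))
print(list(right.columns))

# With real columns in place, nlargest can look up '2018'.
top2 = right.nlargest(2, columns="2018")
print(top2)
```

Printing df.columns right after read_csv is probably the quickest way to spot this kind of problem before calling nlargest.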

CodePudding user response:

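You can select the first 20 rows of a specific column this way, where 'year' stands for the column label (for example '2018'). Note that head(20) returns rows in file order, so sort by that column first (or use nlargest) if you want the 20 largest values:

top20 = df[['year']].head(20)

print("First 20 rows of the DataFrame for the year column: ")
print(top20)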
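
Since head(20) on its own keeps the file order, an actual ranking needs a sort. The following is a minimal sketch on made-up data (the 'Country' column and all values are assumptions for illustration); on the real file you would set n to 20.

```python
import io
import pandas as pd

# Hypothetical sample standing in for population.csv (values are made up).
SAMPLE = "Country,2017,2018\nA,10,12\nB,30,31\nC,20,19\nD,5,40\n"
df = pd.read_csv(io.StringIO(SAMPLE))

n = 3  # use 20 for the real data

# Option 1: sort descending by the year column, then take the first n rows.
by_sort = df.sort_values("2018", ascending=False).head(n)

# Option 2: nlargest selects the n rows with the largest '2018' values.
by_nlargest = df.nlargest(n, columns="2018")

# Add an explicit rank column (1 = largest population in 2018).
ranked = by_sort.assign(
    rank=by_sort["2018"].rank(ascending=False, method="min").astype(int)
)
print(ranked[["rank", "Country", "2018"]])
```

Both options select the same rows here; nlargest is shorter, while sort_values makes it easier to add an explicit rank column afterwards.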