In the last line of this code I want to set the index to 'Country', but when I look at the columns of the data frame, one is still called 'index'. I have tried without inplace (creating a new df) and with the option drop=True, but that doesn't work.
import pandas as pd
import numpy as np
Energy = pd.read_excel('./assets/Energy Indicators.xls', header=None, footer=None, usecols=range(2,6))
Energy = Energy[18:245].reset_index()
Energy.rename(columns={2 : 'Country', 3 :'Energy Supply', 4 : 'Energy Supply per Capita', 5 : '% Renewable'}, inplace=True)
Energy.replace('...', np.nan, inplace=True)
Energy.replace(["Republic of Korea", "United States of America", "United Kingdom of Great Britain and Northern Ireland", "China, Hong Kong Special Administrative Region"],["South Korea", "United States", "United Kingdom", "Hong Kong"], inplace = True)
Energy['Country'] = Energy['Country'].str.replace(r"\(.*\)", "", regex=True)
Energy['Country'] = Energy['Country'].str.replace(r'\d ', '', regex=True)
Energy['Energy Supply'] = Energy['Energy Supply'].apply(lambda x : x * 1000000)
Energy.set_index('Country', inplace=True)
print(Energy.index)
print(Energy.columns.values)
The output is:
Index(['Afghanistan', 'Albania', 'Algeria', 'American Samoa', 'Andorra',
'Angola', 'Anguilla', 'Antigua and Barbuda', 'Argentina', 'Armenia',
...
'United States Virgin Islands', 'Uruguay', 'Uzbekistan', 'Vanuatu',
'Venezuela ', 'Viet Nam', 'Wallis and Futuna Islands', 'Yemen',
'Zambia', 'Zimbabwe'],
dtype='object', name='Country', length=227)
['index' 'Energy Supply' 'Energy Supply per Capita' '% Renewable']
How do you set the index?
CodePudding user response:
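The 'index' you see in your columns is not your index; it is a column left over from when you did Energy = Energy[18:245].reset_index(). By default, reset_index() keeps the old row labels as a new column named 'index'. Passing drop=True to reset_index() (rather than to set_index()) discards them instead.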
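To illustrate where the extra column comes from, here is a minimal sketch. The small frame `raw` is a hypothetical stand-in for the Excel data (integer column labels, as you get with header=None), not the real file:

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet: integer column labels, made-up values.
raw = pd.DataFrame({
    2: ["Afghanistan", "Albania", "Algeria"],
    3: [321, 102, 1959],
})

# Slicing and then calling reset_index() moves the old row labels
# into a new column named 'index'.
kept = raw[1:3].reset_index()
kept_columns = list(kept.columns)

# With drop=True the old row labels are discarded instead.
dropped = raw[1:3].reset_index(drop=True)
dropped_columns = list(dropped.columns)

print(kept_columns)
print(dropped_columns)
```

In the question's code, that would mean writing Energy = Energy[18:245].reset_index(drop=True), assuming the old row numbers are not needed later.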
CodePudding user response:
You have done it right!
When you did Energy.set_index('Country', inplace=True), it did work! That's why printing Energy.index gave you the countries as the result. Index is a class within pandas; see the pandas documentation for more.
The output of print(Energy.index) also indicates that the index is set to the countries (note name='Country').
The next output, print(Energy.columns.values), shows an 'index' column because you called reset_index() previously. Hope this helps!
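If the frame already has the leftover column, one option is to drop it after setting the index. This is only a sketch: the frame `energy` below is a hypothetical miniature of the one in the question, with made-up values.

```python
import pandas as pd

# Hypothetical frame resembling the question's data after reset_index() and rename().
energy = pd.DataFrame({
    "index": [18, 19],
    "Country": ["Afghanistan", "Albania"],
    "Energy Supply": [321, 102],
})

# set_index moves 'Country' out of the columns and into the row index.
energy = energy.set_index("Country")
index_name = energy.index.name
columns_before = list(energy.columns)  # 'Country' is gone, 'index' is still a column

# Remove the leftover column explicitly.
energy = energy.drop(columns="index")
columns_after = list(energy.columns)

print(index_name)
print(columns_before)
print(columns_after)
```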