I have data that looks like this.
        state  sex  salary
jordan  CA     m    100
lebron  NY     m    200
There are 4 columns; however, the first one does not have a column name. The other 3 columns are `state`, `sex`, and `salary`. How do I initialize a data frame with the above data?
I tried the following.
import pandas as pd
data = [['jordan','CA','m',100], ['lebron','NY','m',200]]
df = pd.DataFrame(data, columns = ['','state','sex','Age'])
When I run `df.columns`, I see:
Index(['', 'state', 'sex', 'Age'], dtype='object')
However, I expect to see:
Index(['state', 'sex', 'Age'], dtype='object')
So how can I initialize the DataFrame so that the column holding the names `jordan` and `lebron` is not actually a column?
Answer:
Pass the names as the index rather than as a column:
import pandas as pd
data = [['CA','m',100], ['NY','m',200]]
df = pd.DataFrame(data, columns=['state','sex','Age'], index=['jordan', 'lebron'])
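If you already have the original four-element rows, a minimal sketch (assuming each row's first element is always the name) is to split that element off as the label and pass the rest as the values. The variable names `names` and `values` are illustrative, and the last column is called `salary` here to match the sample data rather than `Age`:

```python
import pandas as pd

# Rows as given in the question: the first element of each row is the name
data = [['jordan', 'CA', 'm', 100], ['lebron', 'NY', 'm', 200]]

# Split each row into its label (first element) and its values (the rest)
names = [row[0] for row in data]
values = [row[1:] for row in data]

df = pd.DataFrame(values, columns=['state', 'sex', 'salary'], index=names)

print(df.columns.tolist())  # ['state', 'sex', 'salary']
print(df.index.tolist())    # ['jordan', 'lebron']
```

This avoids retyping the data when the rows come from somewhere else already in the four-element shape.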
Alternatively, you can convert your existing DataFrame like this:
import pandas as pd
data = [['jordan','CA','m',100], ['lebron','NY','m',200]]
df = pd.DataFrame(data, columns=['','state','sex','Age'])
# Move the unnamed column into the index; set_index drops it from the columns by default
df = df.set_index('')
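One side effect worth checking: after moving the `''` column into the index, the index keeps `''` as its name. A minimal sketch, assuming you want a completely unnamed index, clears it by assigning `None` to `df.index.name`:

```python
import pandas as pd

data = [['jordan', 'CA', 'm', 100], ['lebron', 'NY', 'm', 200]]
df = pd.DataFrame(data, columns=['', 'state', 'sex', 'Age'])

# Moving the '' column into the index keeps '' as the index's name
df = df.set_index('')
print(repr(df.index.name))  # ''

# Clear the leftover name so the index is fully anonymous
df.index.name = None
print(repr(df.index.name))  # None
print(df.columns.tolist())  # ['state', 'sex', 'Age']
```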
Answer:
To add to this: if you are loading the data from a CSV file, you can use `index_col` to specify which column to use as the index.
Assuming the data is in a file named `temp.csv` like this:
,state,sex,salary
jordan,CA,m,100
lebron,NY,m,200
you can read it in with:
import pandas as pd
df = pd.read_csv("temp.csv", index_col=0)
Then you get:
df.index # Index(['jordan', 'lebron'], dtype='object')
df.columns # Index(['state', 'sex', 'salary'], dtype='object')
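To try this without creating `temp.csv`, the sketch below assumes the same CSV content is held in a string and read through `io.StringIO`; it also shows what `read_csv` does when `index_col` is omitted (the blank header becomes a regular column named `Unnamed: 0`):

```python
import io
import pandas as pd

# Stand-in for temp.csv so the example runs without a file on disk
csv_text = """,state,sex,salary
jordan,CA,m,100
lebron,NY,m,200
"""

# Without index_col, the blank header becomes a regular column named 'Unnamed: 0'
df_plain = pd.read_csv(io.StringIO(csv_text))
print(df_plain.columns.tolist())  # ['Unnamed: 0', 'state', 'sex', 'salary']

# With index_col=0, the first column is used as the index instead
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df.columns.tolist())  # ['state', 'sex', 'salary']
print(df.index.tolist())    # ['jordan', 'lebron']
```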