The values of my first column are ending up in the index, while that column's name is applied to the first column outside the index. Every header is therefore shifted by one position, so I cannot simply use df.reset_index. For instance, my dataframe looks like this:
|  | CHA_NUMB | CHA_NAME | UN_CHA_ID |
|---|---|---|---|
| 1 | m_3_1 | 12345 | lcha |
| 2 | t_1_2 | 12456 | lcha |
| 3 | blah | 90244 | lcha |
| 4 | blah | 23435 | lcha |
When it should look like this:
|  | CHA_NUMB | CHA_NAME | UN_CHA_ID |
|---|---|---|---|
| 0 | 1 | m_3_1 | 12345 |
| 1 | 2 | t_1_2 | 12456 |
| 2 | 3 | blah | 90244 |
| 3 | 4 | blah | 23435 |
I tried resetting the index but it didn't work. Resetting the index makes the dataframe look like this:
|  | index | CHA_NUMB | CHA_NAME | UN_CHA_ID |
|---|---|---|---|---|
| 0 | 0 | m_3_1 | 12345 | lcha |
| 1 | 1 | t_1_2 | 12456 | lcha |
| 2 | 2 | blah | 90244 | lcha |
| 3 | 3 | blah | 23435 | lcha |
CodePudding user response:
First use DataFrame.reset_index to move the index values into a regular column, then drop the last column by positional indexing with DataFrame.iloc, and finally restore the original column names with DataFrame.set_axis:
df = df.reset_index().iloc[:, :-1].set_axis(df.columns, axis=1)
print (df)
CHA_NUMB CHA_NAME UN_CHA_ID
0 1 m_3_1 12345
1 2 t_1_2 12456
2 3 blah 90244
3 4 blah 23435
Alternative:
cols = df.columns
df = df.reset_index().iloc[:, :-1]
df.columns = cols
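To make the shift easier to follow, here is a minimal, self-contained sketch. The frame `bad` is a hypothetical reconstruction of the data from the question (the real first column sits in the index, and the last column holds the leftover `lcha` values); the names `bad` and `fixed` are only for illustration.

```python
import pandas as pd

# Hypothetical reconstruction of the misaligned frame from the question:
# the real CHA_NUMB values (1..4) are in the index, so each header labels
# the data of the column to its right.
bad = pd.DataFrame(
    {
        "CHA_NUMB": ["m_3_1", "t_1_2", "blah", "blah"],
        "CHA_NAME": [12345, 12456, 90244, 23435],
        "UN_CHA_ID": ["lcha", "lcha", "lcha", "lcha"],
    },
    index=[1, 2, 3, 4],
)

fixed = (
    bad.reset_index()                  # index values become the first column
       .iloc[:, :-1]                   # drop the last column with the leftover data
       .set_axis(bad.columns, axis=1)  # reuse the original names, now aligned
)

print(fixed)
```

Note that this discards whatever was in the last column, so it only fits if that data is not needed.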
EDIT: If the header row of the file does not match the data, you can skip it with skiprows=1 and pass header=None so that no row is read as column names (without further arguments the columns would default to a RangeIndex). Then use usecols to select the first and third columns, and set the column names with the names parameter:
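import pandas as pd

# skip the mismatched header row and name the two selected columns explicitly
df = pd.read_csv(file,
                 header=None,
                 skiprows=1,
                 usecols=[0, 2],
                 names=['CHA_NUMB', 'UN_CHA_ID'])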
print (df)
CHA_NUMB UN_CHA_ID
0 1 12345
1 2 12456
2 3 90244
3 4 23435
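The read_csv approach can be tried without a file on disk. The sketch below assumes the file has three header fields but four fields per data row, which is one way to end up with the first column in the index; the `csv_text` content is made up to match the question, not taken from the actual file.

```python
import io

import pandas as pd

# Assumed file content: 3 header names, but 4 fields in every data row.
csv_text = """CHA_NUMB,CHA_NAME,UN_CHA_ID
1,m_3_1,12345,lcha
2,t_1_2,12456,lcha
3,blah,90244,lcha
4,blah,23435,lcha
"""

# With default settings pandas sees one more data field than header names
# and uses the first column as the index, which reproduces the problem.
misread = pd.read_csv(io.StringIO(csv_text))

# Skipping the header row and naming the selected columns avoids the shift.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,      # do not treat any row as column names
    skiprows=1,       # skip the mismatched header line
    usecols=[0, 2],   # keep only the first and third columns
    names=["CHA_NUMB", "UN_CHA_ID"],
)

print(df)
```

If every column is needed, the same idea should work by passing four names and leaving out usecols.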