I have a csv file that looks like this:
Val1 Val2 Val3 Val4
Row1 Row2 Row3 Row4 Row5 Row6 Row7 Row8
1 2 3 4 5 6 7 8
I read it as df=pd.read_csv('file.csv', header=[0,1])
That produces the following:
Val1 Unnamed_1 Unnamed_2 Val2 Unnamed_3 ...
Row1 Row2 Row3 Row4 Row5 ...
1 2 3 4 5 ...
Namely, it fills the empty headers at level 0 with "Unnamed_x". I was trying to manually correct that to get what I wanted, which is val1
spans Row1
and Row2
, then Val2
rows 3, 4, and 5, etc. For that I used the following inelegant solution:
tups = []
for col in df.columns:
if "Unnamed" not in col[0]:
tups.append(col)
current = col[0]
else:
cols = (current, col[1])
tups.append(col)
df.columns = pd.MultiIndex.from_tuples(tups, names=['Vals', 'Rows'])
However, that simply copies the columns over, namely:
Val1 Val1 Val2 Val2 Val2 Val3 Val3 Val4
Row1 Row2 Row3 Row4 Row5 Row6 Row7 Row8
1 2 3 4 5 6 7 8
and when I call df['Val1']
, then I, expectedly, get the error that multi index have to be unique.
I am very confused because this seems a relatively straightforward thing to do, but for some reason it isn't working. I made sure to closely follow the documentation in the official pandas website. I would appreciate if someone can help me figure out how to fix this; or even better, if there is a way to directly tell pandas that the empty columns in the header should have the same value as the value to the left of them.
CodePudding user response:
I tried a slightly different approach: reading the headers and the content and then sticking them together. No issues with the MultiIndex for me
headers = pd.read_csv(StringIO("""Val1,,Val2,,,Val3,Val4,
Row1,Row2,Row3,Row4,Row5,Row6,Row7,Row8
1,2,3,4,5,6,7,8"""),header=None,nrows=2)
df=pd.read_csv(StringIO("""Val1,,Val2,,,Val3,Val4,
Row1,Row2,Row3,Row4,Row5,Row6,Row7,Row8
1,2,3,4,5,6,7,8"""),header=None,skiprows=2)
newCols = pd.MultiIndex.from_frame(headers.T.ffill())
df.columns = newCols
>>> df
0 Val1 Val2 Val3 Val4
1 Row1 Row2 Row3 Row4 Row5 Row6 Row7 Row8
2 1 2 3 4 5 6 7 8
>>> df['Val1']
1 Row1 Row2
2 1 2
>>>