new_data = {'mid':mids, 'human':all_tags, 'new':new_tags, 'old':old_tags}
df = pd.DataFrame(new_data.items(), columns=['mid', 'human', 'new', 'old'])
new_data is a dictionary in which each value is a list, and all the lists have equal length. I tried to convert it into a DataFrame, but I get this error:
ValueError: 4 columns passed, passed data had 2 columns
How can I convert new_data into a DataFrame?
CodePudding user response:
Remove .items():
import pandas as pd
new_data = {'mid': [1, 2], 'human': [1, 2], 'new': [1, 2], 'old': [1, 2]}
df = pd.DataFrame(new_data, columns=['mid', 'human', 'new', 'old'])
Note: Passing columns here is redundant, because the column names equal the dictionary keys anyway. So just use:
>>> pd.DataFrame(new_data)
   mid  human  new  old
0    1      1    1    1
1    2      2    2    2
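As a quick check of the note above, here is a minimal sketch that builds the frame both ways and compares the results. It reuses the sample values from this answer; the names with_columns and without_columns are only illustrative.

```python
import pandas as pd

new_data = {'mid': [1, 2], 'human': [1, 2], 'new': [1, 2], 'old': [1, 2]}

# Build the frame with and without the explicit column list.
with_columns = pd.DataFrame(new_data, columns=['mid', 'human', 'new', 'old'])
without_columns = pd.DataFrame(new_data)

# Column names come from the dictionary keys, in insertion order.
print(list(without_columns.columns))

# True when both frames hold the same data under the same labels.
print(with_columns.equals(without_columns))
```

The columns argument only becomes useful with a dictionary when you want a different order or a subset of the keys.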
The reason behind the error:
If you try this, here is what you'll get:
>>> pd.DataFrame(new_data.items())
       0       1
0    mid  [1, 2]
1  human  [1, 2]
2    new  [1, 2]
3    old  [1, 2]
Why?
Check this:
>>> list(new_data.items())
[('mid', [1, 2]), ('human', [1, 2]), ('new', [1, 2]), ('old', [1, 2])]
This is a "list of lists" format (well, a list of tuples in this case). When pd.DataFrame() receives this, it assumes you are going row by row, so each tuple becomes one row. Each tuple has two elements (the key and its list), which is why the result has only two columns. And that is why your assignment of column names fails: there are 2 columns, but you are providing 4 column names.
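To see both behaviours side by side, here is a small sketch using the same sample values as above. The names rows, by_row and error_message are only illustrative, and the exact error text may differ between pandas versions.

```python
import pandas as pd

new_data = {'mid': [1, 2], 'human': [1, 2], 'new': [1, 2], 'old': [1, 2]}

# .items() yields (key, list) pairs; each pair becomes one row with 2 cells.
rows = list(new_data.items())
by_row = pd.DataFrame(rows)
print(by_row.shape)  # 4 rows (one per key), 2 columns (key, list)

# Asking for 4 column names on 2-column data raises a ValueError.
try:
    pd.DataFrame(rows, columns=['mid', 'human', 'new', 'old'])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Passing the dictionary itself treats each key as a column.
df = pd.DataFrame(new_data)
print(df.shape)  # 2 rows, 4 columns
```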