I have a problem I haven't been able to solve for weeks, yet it sounds so simple that I can't imagine it's impossible. Suppose I have data that should be represented as columns, at least two of them. I know there are many ways to create the DataFrame, and the easiest is to pass a dict like
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'b', 'c'],
    'col2': ['d', 'e', 'f'],
    'col3': [1, 2, 3],
})
but I would like to create it with the syntax:
df = pd.DataFrame(data="here the lists which represent columns", index='...', columns=['...'])
If data is a single flat list of values, with index=list('ABCD') and columns=['col1'], it works: the list becomes one column and df.shape is (4, 1).
If the data parameter looks like this:
data = [['a', 'b', 'c'], ['d', 'e', 'f']]
then each inner list is interpreted as a row, so the result has shape (2, 3) and the first row is 'a', 'b', 'c' instead of that being the first column. If I also pass index=list('ABC') and columns=['col1', 'col2'], I get ValueError: "2 columns passed, passed data had 3 columns".
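A minimal sketch of the behaviour described above, assuming a recent pandas version: a list of lists is read row by row, which is why two column labels do not match the data. The exact wording of the error can vary between pandas versions, so the sketch only prints it.

```python
import pandas as pd

data = [['a', 'b', 'c'], ['d', 'e', 'f']]

# Each inner list becomes one row: 2 lists of 3 values -> 2 rows x 3 columns.
rows = pd.DataFrame(data)
print(rows.shape)  # (2, 3)

# Two column labels cannot describe the three columns the data actually has.
error_message = None
try:
    pd.DataFrame(data, index=list('ABC'), columns=['col1', 'col2'])
except ValueError as err:
    error_message = str(err)
print(error_message)
```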
A small workaround is to build the frame row-wise and transpose it:
df = pd.DataFrame(data=[['a', 'b', 'c'], ['d', 'e', 'f']], index=['col1', 'col2'])
df = df.T
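One caveat with the transpose workaround, shown here as a sketch that assumes the data also contains a numeric list like col3 from the dict example: when the column lists mix strings and numbers, the row-wise frame stores every column as object, and the transpose keeps that, so col3 is no longer an integer column. DataFrame.infer_objects() can re-infer the dtypes afterwards.

```python
import pandas as pd

data = [['a', 'b', 'c'], ['d', 'e', 'f'], [1, 2, 3]]

# Row-wise construction, using the wanted column names as the index, then transpose.
df = pd.DataFrame(data=data, index=['col1', 'col2', 'col3']).T

# Each original column mixed strings and ints, so everything is stored as object,
# and the transpose keeps that: col3 is object rather than an integer dtype.
print(df.dtypes)

# infer_objects() re-infers better dtypes column by column after the transpose.
fixed = df.infer_objects()
print(fixed.dtypes)
```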
Is there a way I haven't thought of? Changing data from a list to a Series or an np.array didn't help either.
Answer:
You can pass a dict built by zipping the column names with the lists:
pd.DataFrame(dict(zip(['col1','col2'],data)))
  col1 col2
0    a    d
1    b    e
2    c    f
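The same zip idea can also produce the exact data=/index=/columns= call from the question: zip(*data) transposes the column lists into row tuples, which is the shape pd.DataFrame expects for a list of rows. A sketch, assuming all column lists have the same length (zip silently stops at the shortest one):

```python
import pandas as pd

data = [['a', 'b', 'c'], ['d', 'e', 'f']]

# zip(*data) yields the rows ('a', 'd'), ('b', 'e'), ('c', 'f').
rows = list(zip(*data))

df = pd.DataFrame(data=rows, index=list('ABC'), columns=['col1', 'col2'])
print(df)
```

Because pandas infers each column's dtype from the row tuples, a numeric list such as [1, 2, 3] would still come out as an integer column, unlike with the transpose workaround.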
Answer:
You can map each list to its column name explicitly:
data = [['a', 'b', 'c'], ['d', 'e', 'f'], [1,2,3]]
df = pd.DataFrame(data={
'col1': data[0],
'col2': data[1],
'col3': data[2]
})
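Writing one key per list gets tedious with many columns. A sketch of the same mapping built with a dict comprehension over a list of column labels (the names list is illustrative), which also shows that index= can be passed together with a dict of plain lists, as long as each list's length matches the index:

```python
import pandas as pd

data = [['a', 'b', 'c'], ['d', 'e', 'f'], [1, 2, 3]]
names = ['col1', 'col2', 'col3']  # illustrative column labels

# Pair each label with its list; this works for any number of columns.
columns = {name: values for name, values in zip(names, data)}

# A dict of plain lists can be combined with index=; each list must match its length.
df = pd.DataFrame(columns, index=list('ABC'))
print(df)
print(df.dtypes)  # col3 stays an integer column
```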