Create pandas DataFrame with more than one column using "data" parameter with lists as inp

Time:11-07

I have a problem I haven't been able to figure out for weeks, yet it sounds so simple that I cannot imagine it's not possible. Assume I have data that should be represented in two or more columns. I know there are many ways to create the df, and the easiest would be to build it from a dict like

df = pd.DataFrame(
    {
        'col1' : ['a', 'b', 'c'],
        'col2' : ['d', 'e', 'f'],
        'col3' : [1, 2, 3],
    })

but I would like to create it with the syntax:

df = pd.DataFrame(data="here the lists which represent columns", index='...', columns=['...'])

If I pass one single list of values as data, with index=list('ABCD') and columns=['col1'], it works. The list becomes a column of the DataFrame, and df.shape is (4, 1).
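For reference, the single-list case described above can be sketched as follows. The values in the list are hypothetical, since the question does not say what the list contains.

```python
import pandas as pd

# Hypothetical values; the question does not show the actual list.
values = [10, 20, 30, 40]

# A flat list is treated as one column with four rows.
df = pd.DataFrame(data=values, index=list('ABCD'), columns=['col1'])
print(df.shape)  # (4, 1)
```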

If the data parameter looks like this:

data = [['a', 'b', 'c'], ['d', 'e', 'f']] 

the output will be a df with shape (2,3), because every list is interpreted as a row, where the 1st row is "a", "b", "c" and so on. If index=list('ABC') and columns=[['col1','col2']] are added, I get the ValueError "2 columns passed, passed data had 3 columns".
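A quick way to check how pandas lays out nested lists is to inspect the shape and catch the error. This is only a sketch: it uses a flat columns list, and the exact wording of the message may differ between pandas versions.

```python
import pandas as pd

data = [['a', 'b', 'c'], ['d', 'e', 'f']]

# Each inner list becomes one row, so the frame has 2 rows and 3 columns.
df = pd.DataFrame(data=data)
print(df.shape)  # (2, 3)

# Naming only two columns fails because every row carries three values.
message = None
try:
    pd.DataFrame(data=data, columns=['col1', 'col2'])
except ValueError as err:
    message = str(err)
    print(message)
```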

A little workaround would be:

df = pd.DataFrame(data=[['a', 'b', 'c'], ['d', 'e', 'f']], index=['col1', 'col2'])
df = df.T

Is there a way I didn't think of? Changing the input of "data" from a list to a Series or np.array didn't help either.

CodePudding user response:

You may want to pass a dict instead:

pd.DataFrame(dict(zip(['col1','col2'],data)))
  col1 col2
0    a    d
1    b    e
2    c    f
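If the custom index from the question is also wanted, the same dict can be combined with the index parameter. A minimal sketch, assuming the two lists from the question and a hypothetical names list for the column labels:

```python
import pandas as pd

data = [['a', 'b', 'c'], ['d', 'e', 'f']]
names = ['col1', 'col2']  # hypothetical helper list of column labels

# Pair each column name with its list, then pass the dict as data.
# The index must have as many labels as each list has values.
df = pd.DataFrame(data=dict(zip(names, data)), index=list('ABC'))
print(df)
```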

CodePudding user response:

You can assign your data to the column header like this:

data = [['a', 'b', 'c'], ['d', 'e', 'f'], [1,2,3]]
df = pd.DataFrame(data={
  'col1': data[0],
  'col2': data[1],
  'col3': data[2]
})
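To keep the data/index/columns call shape the question asks for, one option is to regroup the column lists into rows with Python's built-in zip before passing them in. This is a sketch of an alternative, not something proposed in the answers above; the rows variable is introduced only for illustration.

```python
import pandas as pd

data = [['a', 'b', 'c'], ['d', 'e', 'f'], [1, 2, 3]]

# zip(*data) regroups the column lists into row tuples:
# ('a', 'd', 1), ('b', 'e', 2), ('c', 'f', 3)
rows = list(zip(*data))

df = pd.DataFrame(data=rows, index=list('ABC'), columns=['col1', 'col2', 'col3'])
print(df)
```

This avoids the transpose step from the workaround, at the cost of building the intermediate list of rows.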