Combine unequal length lists to dataframe pandas with values repeating

Time:05-27

How to add a list to a dataframe column such that the values repeat for every row of the dataframe?

mylist = ['one error','delay error']
df['error'] = mylist

This raises a length-mismatch error because df has 2000 rows. I can still add it if I convert mylist to a Series, but that only fills the first row, and the output looks like this:

import numpy as np
import pandas as pd

d = {'col1': [1, 2, 3, 4, 5], 
    'col2': [3, 4, 9, 11, 17], 
    'error':['one error',np.nan,np.nan,np.nan,np.nan]}
df = pd.DataFrame(data=d)
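
For reference, here is a minimal sketch of the two behaviours described above, assuming the five-row example frame with its default integer index rather than the real 2000-row one. Assigning a plain list of the wrong length raises a ValueError, while assigning a Series aligns on the index, so only the rows whose labels match get a value and the rest become NaN.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [3, 4, 9, 11, 17]})
mylist = ["one error", "delay error"]

# A plain list must match the number of rows, otherwise pandas raises.
raised = False
try:
    df["error"] = mylist
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# A Series is aligned on the index: labels 0 and 1 receive values,
# labels 2, 3 and 4 have no match and are filled with NaN.
df["error"] = pd.Series(mylist)
missing = df["error"].isna().tolist()
print(missing)  # [False, False, True, True, True]
```

Because the gaps come from index alignment, ffill() can only copy the last non-missing value downward; it cannot restart the list from its first element.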

However I would want the solution to look like this:

d = {'col1': [1, 2, 3, 4, 5], 
    'col2': [3, 4, 9, 11, 17], 
    'error':[['one error','delay error'],['one error','delay error'],['one error','delay error'],['one error','delay error'],['one error','delay error']]}
df = pd.DataFrame(data=d)

I have tried ffill() but it didn't work.

CodePudding user response:

You can assign into the array returned by df['error'].to_numpy() (the error column must already exist). Note that you have to use [mylist] instead of mylist, even though it's already a list ;)

>>> mylist = ['one error']
>>> df['error'].to_numpy()[:] = [mylist]
>>> df
   col1  col2        error
0     1     3  [one error]
1     2     4  [one error]
2     3     9  [one error]
3     4    11  [one error]
4     5    17  [one error]

>>> mylist = ['abc', 'def', 'ghi']
>>> df['error'].to_numpy()[:] = [mylist]
>>> df
   col1  col2            error
0     1     3  [abc, def, ghi]
1     2     4  [abc, def, ghi]
2     3     9  [abc, def, ghi]
3     4    11  [abc, def, ghi]
4     5    17  [abc, def, ghi]

CodePudding user response:

It's not a very clean way to do it, but you can first build a new list with one copy of mylist per row of the dataframe, and only then assign it to the column.

mylist = ['one error','delay error']
new_mylist = [mylist for _ in range(len(df))]

df['error'] = new_mylist
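
A self-contained sketch of this approach on the five-row example frame from the question follows. One detail is an addition of mine, not part of the answer: each row gets list(mylist), a shallow copy, so that the rows do not all point at the same list object.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [3, 4, 9, 11, 17]})
mylist = ["one error", "delay error"]

# Build one list per row. list(mylist) copies the list, so editing the
# list stored in one row later does not silently change the other rows.
new_mylist = [list(mylist) for _ in range(len(df))]

df["error"] = new_mylist
print(df)
```

If the lists are never going to be modified, reusing the same mylist object for every row as in the answer works just as well.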

CodePudding user response:

Repeat mylist N times, where N is one more than the integer quotient of the dataframe's length divided by the list's length, so the repeated list is at least as long as the dataframe. Then slice it to len(df) before assigning, so that it does not exceed the length of the column:

df['error'] = (mylist * (len(df) // len(mylist) + 1))[:len(df)]

   col1  col2        error
0     1     3    one error
1     2     4  delay error
2     3     9    one error
3     4    11  delay error
4     5    17    one error
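
As an alternative way to get the same cycling pattern, here is a sketch that uses itertools.cycle and itertools.islice from the standard library instead of list multiplication. It is a swapped-in technique, not the answer's own code, and it assumes the five-row example frame from the question.

```python
from itertools import cycle, islice

import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [3, 4, 9, 11, 17]})
mylist = ["one error", "delay error"]

# cycle() repeats mylist endlessly; islice() stops after len(df) items,
# so the result always has exactly one value per row.
df["error"] = list(islice(cycle(mylist), len(df)))

print(df["error"].tolist())
# ['one error', 'delay error', 'one error', 'delay error', 'one error']
```

With 5 rows and 2 list items, the multiplication version builds 5 // 2 + 1 = 3 copies (6 values) and slices off the extra one, whereas this version never builds more values than it needs.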