Pandas - pad groups up to max value

Time:09-23

I have a dataframe with columns 'i' (index), 'x' (observation number for i) and 'v' (value).

If an index has fewer than n observations, I want to add a row with value 0 for each missing observation number.

Example with n = 5

import pandas as pd
test = pd.DataFrame({'i':[1,1,1,1,2,2],'x':[1,2,3,4,1,2],'v':[1,2,3,4,5,6]})
desired = pd.DataFrame({'i':[1,1,1,1,1,2,2,2,2,2],'x':[1,2,3,4,5,1,2,3,4,5],'v':[1,2,3,4,0,5,6,0,0,0]})

I.e. because i=1 was missing x=5 and i=2 was missing x = 3,4,5, these have been added with v = 0.

What is a reasonably efficient way of doing this?

CodePudding user response:

You can use:

x_values = [1,2,3,4,5]

(test.set_index(['i', 'x'])
     .reindex(pd.MultiIndex.from_product([test['i'].unique(),
                                          x_values],
                                         names=['i', 'x']))
     .fillna(0)
     .reset_index()
)

output:

   i  x    v
0  1  1  1.0
1  1  2  2.0
2  1  3  3.0
3  1  4  4.0
4  1  5  0.0
5  2  1  5.0
6  2  2  6.0
7  2  3  0.0
8  2  4  0.0
9  2  5  0.0
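Note that the output above shows 'v' as floats, because the reindex introduces NaN before fillna(0) runs. A minimal sketch of a variant that keeps the integer dtype, assuming the x values always run from 1 to n (the names n, full_index and padded are illustrative, not from the original answer), passes fill_value=0 to reindex directly:

```python
import pandas as pd

test = pd.DataFrame({'i': [1, 1, 1, 1, 2, 2],
                     'x': [1, 2, 3, 4, 1, 2],
                     'v': [1, 2, 3, 4, 5, 6]})
n = 5  # assumed target number of observations per index

# Every (i, x) combination that should exist after padding.
full_index = pd.MultiIndex.from_product(
    [test['i'].unique(), range(1, n + 1)], names=['i', 'x'])

# fill_value=0 fills the missing rows directly, so no NaN is introduced
# and 'v' is not upcast to float.
padded = (test.set_index(['i', 'x'])
              .reindex(full_index, fill_value=0)
              .reset_index())
print(padded)
```

This relies on each (i, x) pair being unique in the input, since reindex raises an error on a duplicated index.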

CodePudding user response:

To make it clearer what I was trying to do, I have added a crude solution here, although the answer posted by another user seems more elegant.

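# One row per distinct index value, plus a constant key 'j' to join on
idx = pd.DataFrame(data=test.i.unique(),columns=['i'])
idx['j'] = 1
# One row per observation number 1..5, with the same constant key
numbers = pd.DataFrame(data={'x':range(1,6),'j':[1] * 5})
# Joining on the constant key gives every (i, x) combination
cross_product=numbers.merge(idx,on='j').drop(columns='j')
# The outer merge adds the missing (i, x) rows with NaN in 'v'
combined = test.merge(cross_product,on=['i','x'],how='outer')
combined.fillna(0).sort_values(by=['i','x'])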
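
The constant-key trick above can also be written with a cross join. The following is a sketch rather than the poster's code: it assumes pandas 1.2 or later (where merge accepts how='cross') and x values running from 1 to n; the names ids, grid and n are illustrative.

```python
import pandas as pd

test = pd.DataFrame({'i': [1, 1, 1, 1, 2, 2],
                     'x': [1, 2, 3, 4, 1, 2],
                     'v': [1, 2, 3, 4, 5, 6]})
n = 5  # assumed target number of observations per index

ids = pd.DataFrame({'i': test['i'].unique()})
numbers = pd.DataFrame({'x': range(1, n + 1)})

# Cross join: every index value paired with every observation number.
grid = ids.merge(numbers, how='cross')

# Left-merge the observed values onto the full grid, then fill the gaps.
# The merge upcasts 'v' to float because of NaN, so cast back to int.
combined = (grid.merge(test, on=['i', 'x'], how='left')
                .fillna({'v': 0})
                .astype({'v': int}))
print(combined)
```

Because the left merge keeps the row order of grid, the result is already sorted by 'i' and 'x' and needs no sort_values call.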