I have a dataframe with columns 'i' (index), 'x' (observation number within i) and 'v' (value).
If an index has fewer than n observations, I want to add rows with value 0 for the missing observation numbers.
Example with n = 5:
import pandas as pd
test = pd.DataFrame({'i':[1,1,1,1,2,2],'x':[1,2,3,4,1,2],'v':[1,2,3,4,5,6]})
desired = pd.DataFrame({'i':[1,1,1,1,1,2,2,2,2,2],'x':[1,2,3,4,5,1,2,3,4,5],'v':[1,2,3,4,0,5,6,0,0,0]})
I.e. because i=1 was missing x=5 and i=2 was missing x = 3,4,5, these have been added with v = 0.
What is a reasonably efficient way of doing this?
CodePudding user response:
You can use:
x_values = [1,2,3,4,5]
(test.set_index(['i', 'x'])
     .reindex(pd.MultiIndex.from_product([test['i'].unique(), x_values],
                                         names=['i', 'x']))
     .fillna(0)
     .reset_index()
)
output:
   i  x    v
0  1  1  1.0
1  1  2  2.0
2  1  3  3.0
3  1  4  4.0
4  1  5  0.0
5  2  1  5.0
6  2  2  6.0
7  2  3  0.0
8  2  4  0.0
9  2  5  0.0
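Note that 'v' comes back as float above, because reindex first inserts NaN for the missing rows and fillna only replaces them afterwards. If the integer dtype matters, a minimal sketch of a variant is to pass fill_value=0 to reindex directly. The names n, full_index and result are illustrative only, and the sketch assumes the observation numbers always run from 1 to n:

```python
import pandas as pd

test = pd.DataFrame({'i': [1, 1, 1, 1, 2, 2],
                     'x': [1, 2, 3, 4, 1, 2],
                     'v': [1, 2, 3, 4, 5, 6]})
n = 5  # assumed target number of observations per index

# Every (i, x) pair that should exist in the result.
full_index = pd.MultiIndex.from_product(
    [test['i'].unique(), range(1, n + 1)], names=['i', 'x'])

# fill_value=0 fills the new rows directly, so no NaN is introduced
# and 'v' keeps its integer dtype.
result = (test.set_index(['i', 'x'])
              .reindex(full_index, fill_value=0)
              .reset_index())
```

This should give the same rows as the version with fillna(0), only with an integer 'v' column.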
CodePudding user response:
To make it clearer what I was trying to do, I have added my own crude solution here, although the answer posted by another user seems more elegant.
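# Dummy key 'j' so that merging on it yields the cross product of i and x.
idx = pd.DataFrame(data=test.i.unique(), columns=['i'])
idx['j'] = 1
numbers = pd.DataFrame(data={'x': range(1, 6), 'j': [1] * 5})
cross_product = numbers.merge(idx, on='j').drop(columns='j')
# The outer merge adds the missing (i, x) pairs with v = NaN, which is then filled with 0.
combined = test.merge(cross_product, on=['i', 'x'], how='outer')
combined.fillna(0).sort_values(by=['i', 'x'])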
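The same merge idea can probably be written without the dummy key 'j', since merge accepts how='cross' from pandas 1.2 onwards. This is only a sketch under that version assumption; ids, grid and n are illustrative names, and the cast back to int64 assumes 'v' holds integers as in the example:

```python
import pandas as pd

test = pd.DataFrame({'i': [1, 1, 1, 1, 2, 2],
                     'x': [1, 2, 3, 4, 1, 2],
                     'v': [1, 2, 3, 4, 5, 6]})
n = 5  # assumed target number of observations per index

ids = pd.DataFrame({'i': test['i'].unique()})
numbers = pd.DataFrame({'x': range(1, n + 1)})

# Cross join: one row for every (i, x) combination (requires pandas >= 1.2).
grid = ids.merge(numbers, how='cross')

# Left merge keeps every row of the grid; pairs missing from test get NaN,
# which is replaced by 0 before casting 'v' back to integers.
combined = (grid.merge(test, on=['i', 'x'], how='left')
                .fillna({'v': 0})
                .astype({'v': 'int64'})
                .sort_values(by=['i', 'x'])
                .reset_index(drop=True))
```

Starting from the grid and merging with how='left' means only the requested (i, x) pairs appear in the result, whereas the outer merge would also keep any rows of test with x greater than n.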