For example, there are three columns in a dataframe, x, y, z
. x
and y
have 3 different values with 0.5 intervals. They are coordinates so they map with each other and there will be 3*3=9 rows with some z
values. But the actual dataframe has only let say 7 rows. How to add two missing rows with NaN
value in z
column? Below are the example input and output. Thank you!
Input:
DataFrame:
x y z
0 -0.5 0 5
1 -0.5 -0.5 10
2 0 -0.5 7
3 0 0.5 6
4 0 0 12
5 0.5 0 8
6 0.5 0.5 2
Output:
DataFrame:
x y z
0 -0.5 0 5
1 -0.5 -0.5 10
2 0 -0.5 7
3 0 0.5 6
4 0 0 12
5 0.5 0 8
6 0.5 0.5 2
7 -0.5 0.5 NaN // missing row
8 0.5 -0.5 NaN // missing row
CodePudding user response:
One option is with complete from pyjanitor, to add the missing rows, based on a combination of x
and y
:
# pip install pyjanitor
import pandas as pd
import janitor
df.complete('x', 'y')
x y z
0 -0.5 0.0 5.0
1 -0.5 -0.5 10.0
2 -0.5 0.5 NaN
3 0.0 0.0 12.0
4 0.0 -0.5 7.0
5 0.0 0.5 6.0
6 0.5 0.0 8.0
7 0.5 -0.5 NaN
8 0.5 0.5 2.0
complete is just an efficient helper (wrapper around pandas functions); if your data does not have duplicates that can throw off pivot
, then use it:
df.pivot('x', 'y', 'z').stack(dropna=False).rename('z').reset_index()
x y z
0 -0.5 -0.5 10.0
1 -0.5 0.0 5.0
2 -0.5 0.5 NaN
3 0.0 -0.5 7.0
4 0.0 0.0 12.0
5 0.0 0.5 6.0
6 0.5 -0.5 NaN
7 0.5 0.0 8.0
8 0.5 0.5 2.0
CodePudding user response:
import numpy as np
new_row = {"x": -0.5, "y": 0.5, "z": np.nan}
df.append(new_row, ignore_index=True)
I don't know exactly how you calculate "x" and "y" values, but I guess your question is directed towards the NaN
value in the "z" column.
CodePudding user response:
You can find the product of [-0.5,0,0.5]
with itertools.product
import itertools
lst = pd.concat([df['x'], df['y']]).unique().tolist()
p = list(itertools.product(lst, repeat=2))
print(p)
[(-0.5, -0.5), (-0.5, 0.0), (-0.5, 0.5), (0.0, -0.5), (0.0, 0.0), (0.0, 0.5), (0.5, -0.5), (0.5, 0.0), (0.5, 0.5)]
Then fill the missing index:
out = df.set_index(['x', 'y']).reindex(p).reset_index()
print(out)
x y z
0 -0.5 -0.5 10.0
1 -0.5 0.0 5.0
2 -0.5 0.5 NaN
3 0.0 -0.5 7.0
4 0.0 0.0 12.0
5 0.0 0.5 6.0
6 0.5 -0.5 NaN
7 0.5 0.0 8.0
8 0.5 0.5 2.0