Home > Mobile >  Pandas dataframe add rows with NaN in one column based on values in other columns
Pandas dataframe add rows with NaN in one column based on values in other columns

Time:05-06

For example, there are three columns in a dataframe, x, y, z. x and y have 3 different values with 0.5 intervals. They are coordinates so they map with each other and there will be 3*3=9 rows with some z values. But the actual dataframe has only let say 7 rows. How to add two missing rows with NaN value in z column? Below are the example input and output. Thank you!

Input:

DataFrame:
     x    y   z
0  -0.5   0   5
1  -0.5 -0.5  10
2    0  -0.5  7
3    0   0.5  6
4    0    0   12
5   0.5   0   8
6   0.5  0.5  2

Output:

DataFrame:
     x    y   z
0  -0.5   0   5
1  -0.5 -0.5  10
2    0  -0.5  7
3    0   0.5  6
4    0    0   12
5   0.5   0   8
6   0.5  0.5  2
7  -0.5  0.5  NaN   // missing row
8   0.5 -0.5  NaN   // missing row

CodePudding user response:

One option is with complete from pyjanitor, to add the missing rows, based on a combination of x and y :

# pip install pyjanitor
import pandas as pd
import janitor

df.complete('x', 'y')

     x    y     z
0 -0.5  0.0   5.0
1 -0.5 -0.5  10.0
2 -0.5  0.5   NaN
3  0.0  0.0  12.0
4  0.0 -0.5   7.0
5  0.0  0.5   6.0
6  0.5  0.0   8.0
7  0.5 -0.5   NaN
8  0.5  0.5   2.0

complete is just an efficient helper (wrapper around pandas functions); if your data does not have duplicates that can throw off pivot, then use it:

 df.pivot('x', 'y', 'z').stack(dropna=False).rename('z').reset_index()

     x    y     z
0 -0.5 -0.5  10.0
1 -0.5  0.0   5.0
2 -0.5  0.5   NaN
3  0.0 -0.5   7.0
4  0.0  0.0  12.0
5  0.0  0.5   6.0
6  0.5 -0.5   NaN
7  0.5  0.0   8.0
8  0.5  0.5   2.0

CodePudding user response:

import numpy as np

new_row = {"x": -0.5, "y": 0.5, "z": np.nan}

df.append(new_row, ignore_index=True)

I don't know exactly how you calculate "x" and "y" values, but I guess your question is directed towards the NaN value in the "z" column.

CodePudding user response:

You can find the product of [-0.5,0,0.5] with itertools.product

import itertools

lst = pd.concat([df['x'], df['y']]).unique().tolist()
p = list(itertools.product(lst, repeat=2))
print(p)

[(-0.5, -0.5), (-0.5, 0.0), (-0.5, 0.5), (0.0, -0.5), (0.0, 0.0), (0.0, 0.5), (0.5, -0.5), (0.5, 0.0), (0.5, 0.5)]

Then fill the missing index:

out = df.set_index(['x', 'y']).reindex(p).reset_index()
print(out)

     x    y     z
0 -0.5 -0.5  10.0
1 -0.5  0.0   5.0
2 -0.5  0.5   NaN
3  0.0 -0.5   7.0
4  0.0  0.0  12.0
5  0.0  0.5   6.0
6  0.5 -0.5   NaN
7  0.5  0.0   8.0
8  0.5  0.5   2.0
  • Related