Sample data:
sample_data = [
{'Case #': 'A25', 'Parent Case #': 'A24', 'Data': 'Blah blah'},
{'Case #': 'B46', 'Parent Case #': nan, 'Data': 'Waka waka'},
{'Case #': 'B89', 'Parent Case #': 'B46', 'Data': 'Moo moo'},
{'Case #': 'C12', 'Parent Case #': nan, 'Data': 'Meow'},
{'Case #': 'C44', 'Parent Case #': nan, 'Data': 'Woof'},
{'Case #': 'C77', 'Parent Case #': 'C12', 'Data': 'Hiss'},
{'Case #': 'D55', 'Parent Case #': 'D2', 'Data': 'Ribbet'}
]
df = pd.DataFrame(sample_data)
The data consists of cases that may or may not have parent cases (i.e., they may be children or not). No grandchildren / max depth = 1.
However, some of the referenced parents are not present in this data set, and so these cases are effectively orphans.
For the purposes of my data, simply removing the reference to the parent will suffice for orphans. I can identify these orphans like so:
df.loc[~df["Parent Case #"].isna() & ~df2["Parent Case #"].isin(df2["Case #"].values)]
For these two matching rows, I want to remove the "Parent Case #" reference (make that value nan
/ empty for only these two rows). How do I do this? I feel like I am just missing one final step. I'm not sure how to do assignment using my condition with its & logic.
CodePudding user response:
You need only specify column for assign missing values:
#changed df2 to df
df.loc[df["Parent Case #"].notna() & ~df["Parent Case #"].isin(df["Case #"]), 'Parent Case #'] = np.nan
print (df)
Case # Parent Case # Data
0 A25 NaN Blah blah
1 B46 NaN Waka waka
2 B89 B46 Moo moo
3 C12 NaN Meow
4 C44 NaN Woof
5 C77 C12 Hiss
6 D55 NaN Ribbet
Another idea - missing values are reassign for all rows:
df.loc[~df["Parent Case #"].isin(df["Case #"]), 'Parent Case #'] = np.nan