Change specific values based on loc condition-CodePudding

Sample data:

sample_data = [
  {'Case #': 'A25', 'Parent Case #': 'A24', 'Data': 'Blah blah'},
  {'Case #': 'B46', 'Parent Case #': nan, 'Data': 'Waka waka'},
  {'Case #': 'B89', 'Parent Case #': 'B46', 'Data': 'Moo moo'},
  {'Case #': 'C12', 'Parent Case #': nan, 'Data': 'Meow'},
  {'Case #': 'C44', 'Parent Case #': nan, 'Data': 'Woof'},
  {'Case #': 'C77', 'Parent Case #': 'C12', 'Data': 'Hiss'},
  {'Case #': 'D55', 'Parent Case #': 'D2', 'Data': 'Ribbet'}
]

df = pd.DataFrame(sample_data)

The data consists of cases that may or may not have parent cases (i.e., they may be children or not). No grandchildren / max depth = 1.

However, some of the referenced parents are not present in this data set, and so these cases are effectively orphans.

For the purposes of my data, simply removing the reference to the parent will suffice for orphans. I can identify these orphans like so:

df.loc[~df["Parent Case #"].isna() & ~df2["Parent Case #"].isin(df2["Case #"].values)]

For these two matching rows, I want to remove the "Parent Case #" reference (make that value nan / empty for only these two rows). How do I do this? I feel like I am just missing one final step. I'm not sure how to do assignment using my condition with its & logic.

CodePudding user response：

You need only specify column for assign missing values:

#changed df2 to df
df.loc[df["Parent Case #"].notna() & ~df["Parent Case #"].isin(df["Case #"]), 'Parent Case #'] = np.nan
print (df)
  Case # Parent Case #       Data
0    A25           NaN  Blah blah
1    B46           NaN  Waka waka
2    B89           B46    Moo moo
3    C12           NaN       Meow
4    C44           NaN       Woof
5    C77           C12       Hiss
6    D55           NaN     Ribbet

Another idea - missing values are reassign for all rows:

df.loc[~df["Parent Case #"].isin(df["Case #"]), 'Parent Case #'] = np.nan