I am trying to work with parent child relational data in pandas and having some issues getting the proper parent/child mapping on portions of my data.
I attempted to use ffill and fillna to no avail, but I may have conducted that incorrectly.
I have tried two methods with issues on both. Any assistance getting over this hurdle would be amazing. Thank you for your help.
code:
import pandas as pd
df = pd.DataFrame(
{
"child_string": ["string42","string23","string23","string54","string28","string86","string15","string1"],
"child": [None, 8675, 8675, 8676, 2048, 5442, 1942, 3185],
"parent": [None, 2048, 2048, 2048, 1942, 1942, 3185, None],
"interesting": ["some_unique_field1", "some_unique_field2", "some_unique_field3", "some_unique_field4", "some_unique_field5", "some_unique_field6", "some_unique_field7", "some_unique_field8"]
}
)
# This gives me the right output except for parent string for string 1 and string 42
print(df.merge(
df[['child', 'child_string']].rename(columns={"child":"parent", "child_string": "parent_string"}),
on='parent',
how='left'
))
# This fails with an invalid index error.
df['parent_string'] = df['parent'].map(df.set_index('child').child_string)
print(df)
Expected output:
child_string, child, parent, interesting parent_string
string42, NaN, NaN, some_unique_field1, NaN
string23, 8675, 2048, some_unique_field2, string28
string23, 8675, 2048, some_unique_field3, string28
string54, 8676, 2048, some_unique_field4, string28
string28, 2048, 1942, some_unique_field5, string15
string86, 5442, 1942, some_unique_field6, string15
string15, 1942, 3185, some_unique_field7, string1
string1, 3185, NaN, some_unique_field8, NaN
CodePudding user response:
You can create a dictionary that has the information of "child" column as key
's and "child_string" as values.
child_info = df[['child_string','child']].dropna()
child_string_to_child_dict = dict(zip(child_info.child,child_info.child_string))
>>> child_string_to_child_dict
{8675.0: 'string23',
8676.0: 'string54',
2048.0: 'string28',
5442.0: 'string86',
1942.0: 'string15',
3185.0: 'string1'}
Then you can map
that dictionary on your "parent" column
df['parent_string'] = df['parent'].map(child_string_to_child_dict)
Result:
child_string child parent interesting parent_string
0 string42 NaN NaN some_unique_field1 NaN
1 string23 8675.0 2048.0 some_unique_field2 string28
2 string23 8675.0 2048.0 some_unique_field3 string28
3 string54 8676.0 2048.0 some_unique_field4 string28
4 string28 2048.0 1942.0 some_unique_field5 string15
5 string86 5442.0 1942.0 some_unique_field6 string15
6 string15 1942.0 3185.0 some_unique_field7 string1
7 string1 3185.0 NaN some_unique_field8 NaN
A similar approach to what you tried