How do you convert a given dataframe with a hierarchical structure and arbitrary shape (say, similar to the one below) into a new dataframe with a parent and child column?
Edit: Note that a constraint is that a child cannot be its own parent.
data = {'level1': ['A', 'A', 'B', 'B', 'C'],
'level2': ['James', 'Robert', 'Patricia', 'Patricia', 'John'],
'level3': ['Stockholm', 'Denver', 'Moscow', 'Moscow', 'Palermo'],
'level4': ['red', 'Denver', 'yellow', 'purple', 'blue']
}
df = pd.DataFrame(data)
level1 level2 level3 level4
0 A James Stockholm red
1 A Robert Denver Denver
2 B Patricia Moscow yellow
3 B Patricia Moscow purple
4 C John Palermo blue
Desired output is something like this:
parent child
0 A James
1 A Robert
2 B Patricia
3 C John
4 James Stockholm
5 Robert Denver
6 Patricia Moscow
7 John Palermo
8 Stockholm red
9 Moscow yellow
10 Moscow purple
11 Palermo blue
CodePudding user response:
What I can think of is using a for loop over the columns of the dataframe:
columns = df.columns
length = len(columns)
parent = []
children = []
for i in range(length):
if i != length - 1 :
parent = df[columns[i]].to_list()
children = df[columns[i 1]].to_list()
newDf = pd.DataFrame({"parent":parent, "children":children}).drop_duplicates()
newDf[newDf["parent"] != newDf["children"]]
Output
parent children
0 A James
1 A Robert
2 B Patricia
4 C John
5 James Stockholm
6 Robert Denver
7 Patricia Moscow
9 John Palermo
10 Stockholm red
12 Moscow yellow
13 Moscow purple
14 Palermo blue