How to modify dataframe based on column values-CodePudding

I want to add relationships to column 'relations' based on rel_list. Specifically, for each tuple, i.e. ('a', 'b'), I want to replace the relationships column value '' with 'b' in the first row, but no duplicate, meaning that for the 2nd row, don't replace '' with 'a', since they are considered as duplicated. The following code doesn't work fully correct:

import pandas as pd

data = {
  "names": ['a', 'b', 'c', 'd'],
  "ages": [50, 40, 45, 20],
  "relations": ['', '', '', '']
}
rel_list = [('a', 'b'), ('a', 'c'), ('c', 'd')]

df = pd.DataFrame(data)

for rel_tuple in rel_list:
  head = rel_tuple[0]
  tail = rel_tuple[1]

  df.loc[df.names == head, 'relations'] = tail

print(df)

The current result of df is:

     names  ages relations
0     a    50         c
1     b    40          
2     c    45         d
3     d    20

However, the correct one is:

    names  ages relations
0     a    50         b
0     a    50         c
1     b    40          
2     c    45         d
3     d    20

There are new rows that need to be added. The 2nd row in this case, like above. How to do that?

CodePudding user response：

You can craft a dataframe and merge:

(df.drop('relations', axis=1)
   .merge(pd.DataFrame(rel_list, columns=['names', 'relations']),
          on='names',
          how='outer'
         )
  # .fillna('') # uncomment to replace NaN with empty string
 )

Output:

  names  ages relations
0     a    50         b
1     a    50         c
2     b    40       NaN
3     c    45         d
4     d    20       NaN

CodePudding user response：

Instead of updating df you can create a new one and add relations row by row:

import pandas as pd

data = {
  "names": ['a', 'b', 'c', 'd'],
  "ages": [50, 40, 45, 20],
  "relations": ['', '', '', '']
}
rel_list = [('a', 'b'), ('a', 'c'), ('c', 'd')]

df = pd.DataFrame(data)
new_df = pd.DataFrame(data)
new_df.loc[:, 'relations'] = ''

for head, tail in rel_list:
    new_row = df[df.names == head]
    new_row.loc[:,'relations'] = tail
    new_df = new_df.append(new_row)

print(new_df)

Output:

names  ages relations
0     a    50          
1     b    40          
2     c    45          
3     d    20          
0     a    50         b
0     a    50         c
2     c    45         d

Then, if needed, in the end you can delete all rows without value in 'relations':

new_df = new_df[new_df['relations']!='']