Insert dictionary of lists as column into a sliced dataframe

In a follow-up to my previous question, I am trying to add another column to the following sliced dataframe:

>>> import pandas as pd
>>> # one list per column, so "value" and "cluster" stay integers
>>> df = pd.DataFrame({'value': [1, 1, 1, 1, 2, 2, 2], 'cluster': [0, 0, 0, 1, 0, 0, 1], 'text': ['some text', 'other text', 'more text', 'new text', 'text sample', 'sample', 'sample text'], 'keywords': ['kw1, kw2', 'kw1, kw2, kw3', 'kw1', 'kw1, kw2, kw3, kw4', 'kw1', 'kw1, kw2, kw3', 'kw1, kw2']})
>>> result = df.groupby(['value', 'cluster', 'text']).keywords.sum().to_frame()
>>> result
value      cluster      text      keywords
  1           0      some text       kw1, kw2
                     other text      kw1, kw2, kw3
                     more text       kw1
              1      new text        kw1, kw2, kw3, kw4
  2           0      text sample     kw1
                     sample          kw1, kw2, kw3
              1      sample text     kw1, kw2

Based on the last question, the content of the column I want to add should be based on a dictionary like this:

>>> summary2 = {0: ['some, summary', 'this, too, summ'], 1: ['kws, of, summ', 'summ, based, kw']}

My plan is to match the dictionary keys to the "value" column (key 0 for value 1, key 1 for value 2) and the position of each item within a key's list to the cluster, so that I get this output:

value      cluster   summary            text       keywords
  1           0      some, summary    some text       kw1, kw2
                                      other text      kw1, kw2, kw3
                                      more text       kw1
              1      this, too, summ   new text        kw1, kw2, kw3, kw4
  2           0      kws, of, summ    text sample     kw1
                                      sample          kw1, kw2, kw3
              1      summ, based, kw  sample text     kw1, kw2

What I've tried so far is the following:

result['summary2'] = result.groupby(['value','cluster']).ngroup().map({item: k for k, v in summary2.items() for item in v})

However, the resulting column contains only NaNs.

CodePudding user response：

If you want to fill the "summary" column by position (i.e. without mapping value/cluster to the dictionary keys), you could try:

# flatten all dictionary values into one list
lst = [item for sublist in summary2.values() for item in sublist]

# map each group number from ngroup() to the summary at the same position
result['summary'] = result.groupby(["value", "cluster"]).ngroup().map(dict(enumerate(lst)))

# move summary into the index and reorder the levels if needed
result = result.set_index("summary", append=True).reorder_levels(["value", "cluster", "summary", "text"])

>>> result
                                                     keywords
value cluster summary         text                           
1     0       some, summary   more text                   kw1
                              other text        kw1, kw2, kw3
                              some text              kw1, kw2
      1       this, too, summ new text     kw1, kw2, kw3, kw4
2     0       kws, of, summ   sample            kw1, kw2, kw3
                              text sample                 kw1
      1       summ, based, kw sample text            kw1, kw2
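
For reference, the attempt in the question returns NaN because its comprehension builds a dict keyed by the summary strings, while ngroup() produces integer group numbers (0 to 3 here), so no key ever matches. A minimal self-contained sketch of the difference follows. It assumes the frame is rebuilt from a plain dict of columns (my own reconstruction of the question's data); the names group_ids, broken, flat and fixed are only for illustration.

```python
import pandas as pd

# Reconstruction of the question's data (assumed layout: one list per column).
df = pd.DataFrame({
    "value": [1, 1, 1, 1, 2, 2, 2],
    "cluster": [0, 0, 0, 1, 0, 0, 1],
    "text": ["some text", "other text", "more text", "new text",
             "text sample", "sample", "sample text"],
    "keywords": ["kw1, kw2", "kw1, kw2, kw3", "kw1", "kw1, kw2, kw3, kw4",
                 "kw1", "kw1, kw2, kw3", "kw1, kw2"],
})
result = df.groupby(["value", "cluster", "text"]).keywords.sum().to_frame()
summary2 = {0: ["some, summary", "this, too, summ"],
            1: ["kws, of, summ", "summ, based, kw"]}

# One integer per (value, cluster) group, numbered in sorted group order.
group_ids = result.groupby(["value", "cluster"]).ngroup()

# The question's mapping is keyed by summary strings, so integer ids never match.
broken = group_ids.map({item: k for k, v in summary2.items() for item in v})

# Keying by position (0, 1, 2, ...) lines up with the ngroup() numbering.
flat = [item for sublist in summary2.values() for item in sublist]
fixed = group_ids.map(dict(enumerate(flat)))
```

Note that this relies on the dictionary's insertion order matching the sorted order of the (value, cluster) groups, which holds for the sample data but is not guaranteed in general.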

CodePudding user response：

IIUC, you can try using apply on the rows:

result = df.groupby(['value', 'cluster', 'text']).keywords.sum().to_frame()
summary2 = {0: ['some, summary', 'this, too, summ'], 1: ['kws, of, summ', 'summ, based, kw']}

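# row.name is the (value, cluster, text) index tuple: value - 1 is the dictionary key,
# cluster is the position in that key's list (both levels must be integers)
result = (result.assign(summary=result.apply(lambda row: summary2[row.name[0]-1][row.name[1]], axis=1))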
          .set_index('summary', append=True)
          .reorder_levels(["value", "cluster", "summary", "text"]))

print(result)

                                                     keywords
value cluster summary         text
1     0       some, summary   more text                   kw1
                              other text        kw1, kw2, kw3
                              some text              kw1, kw2
      1       this, too, summ new text     kw1, kw2, kw3, kw4
2     0       kws, of, summ   sample            kw1, kw2, kw3
                              text sample                 kw1
      1       summ, based, kw sample text            kw1, kw2
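
If a row-wise apply feels heavy, the same key-based lookup can be sketched with an explicit table. This is only an illustration under the same assumptions as above: value starts at 1 and corresponds to dictionary key value - 1, and cluster is the position within that key's list. The name lookup and the dict-based construction of df are my own, not part of the original posts.

```python
import pandas as pd

# Reconstruction of the question's data (assumed layout: one list per column).
df = pd.DataFrame({
    "value": [1, 1, 1, 1, 2, 2, 2],
    "cluster": [0, 0, 0, 1, 0, 0, 1],
    "text": ["some text", "other text", "more text", "new text",
             "text sample", "sample", "sample text"],
    "keywords": ["kw1, kw2", "kw1, kw2, kw3", "kw1", "kw1, kw2, kw3, kw4",
                 "kw1", "kw1, kw2, kw3", "kw1, kw2"],
})
result = df.groupby(["value", "cluster", "text"]).keywords.sum().to_frame()
summary2 = {0: ["some, summary", "this, too, summ"],
            1: ["kws, of, summ", "summ, based, kw"]}

# Explicit (value, cluster) -> summary table; assumes dictionary key = value - 1.
lookup = {(key + 1, pos): text
          for key, texts in summary2.items()
          for pos, text in enumerate(texts)}

# Each index entry is a (value, cluster, text) tuple; look up by the first two parts.
result["summary"] = [lookup[(value, cluster)] for value, cluster, _ in result.index]

# Move summary into the index, as in the answers above.
result = (result.set_index("summary", append=True)
                .reorder_levels(["value", "cluster", "summary", "text"]))
```

Unlike the positional approach, this does not depend on the order in which groups are numbered, and a (value, cluster) pair missing from the dictionary raises a KeyError instead of silently producing NaN.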