Python: Indicate the elements in groupby

I ran into a problem with groupby: I cannot get an element that is not used as the grouping key to show up in the output.

I am expecting the output to be in the format of:

{
    "0": {
          "food_type": "drink", 
          "review": "bad", 
          "example": {"0": "cola", "1": "milk"}
    },
    "1": {
         "food_type": "fruit",
         "review": "good",
         "example": {"0": "apple", "1": "banana", "2": "orange"}
    },
    "2": {
         "food_type": "vegetable", 
         "review": "normal",
         "example": {"0": "cabbage", "1": "carrot"}
    },
}

This is the code I used (suggested by Andrej Kesely). It groups the rows by the element at index [0].

I ran into a problem when trying to add more key-value pairs as lst grows larger. Here I added the "review" part:

from itertools import groupby

lst = [
    ["fruit", 'good', "apple"],
    ["fruit", 'good', "orange"],
    ["fruit", 'good', "banana"],
    ["vegetable", 'normal', "cabbage"],
    ["vegetable", 'normal', "carrot"],
    ["drink", 'bad', "cola"],
    ["drink", 'bad', "milk"],
]

out = {}
for i, (v, g) in enumerate(groupby(sorted(lst), lambda k: k[0])):
    out[str(i)] = {
        "food_type": v,
        "review": (v for i, (_, v, _) in enumerate(g)),
        "example": {str(i): v for i, (_, _, v) in enumerate(g)},
    }

The output:

{
    "0": {
          "food_type": "drink", 
          "review": <generator object <genexpr> at 0x7fcc04f3fdd0>, 
          "example": {"0": "cola", "1": "milk"}
    },
    "1": {
         "food_type": "fruit",
         "review": <generator object <genexpr> at 0x7fcc04f3fc50>,
         "example": {"0": "apple", "1": "banana", "2": "orange"}
    },
    "2": {
         "food_type": "vegetable", 
         "review": <generator object <genexpr> at 0x7fcc04d044d0>,
         "example": {"0": "cabbage", "1": "carrot"}
    },
}

I hope someone with more experience with groupby can give me some suggestions. Thank you!

CodePudding user response：

If I understand you correctly, this should work:

import pandas as pd

lst = [
    ["fruit", 'good', "apple"],
    ["fruit", 'good', "orange"],
    ["fruit", 'good', "banana"],
    ["vegetable", 'normal', "cabbage"],
    ["vegetable", 'normal', "carrot"],
    ["drink", 'bad', "cola"],
    ["drink", 'bad', "milk"],
]


(
    pd.DataFrame(
        lst, columns=["food_type", "review", "example"]
    ).groupby(["food_type", "review"])["example"].unique()
    .reset_index()
    .assign(example = lambda df: (
        df["example"].apply(lambda x: {k:v for k, v in zip(range(len(x)), x)})
    ))
    .T.to_dict()
)

Output (note that the keys here are integers rather than strings):

{0: {'food_type': 'drink', 'review': 'bad', 'example': {0: 'cola', 1: 'milk'}},
 1: {'food_type': 'fruit',
  'review': 'good',
  'example': {0: 'apple', 1: 'orange', 2: 'banana'}},
 2: {'food_type': 'vegetable',
  'review': 'normal',
  'example': {0: 'cabbage', 1: 'carrot'}}}

CodePudding user response：

First of all, we need to turn the generator into a string. We can use ', '.join for this:

"review": ', '.join(v for i, (_, v, _) in enumerate(g)),

But upon doing this, I found that good was repeated three times, and normal and bad were each repeated twice. To make it look like your desired output, we need to remove the duplicates. Sets are a common way to store data when you don't want duplicates. We can use a set comprehension instead of a generator expression, which is as easy as wrapping it in {} braces, like so:

"review": ', '.join({v for i, (_, v, _) in enumerate(g)}),

Using that will generate the reviews you want. One caveat: the group iterator g returned by groupby can only be consumed once, so if "example" is built from g after this line, it will come out empty. Convert the group to a list first (g = list(g)) and build both "review" and "example" from that list.
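For reference, here is a minimal sketch of an itertools-only version. Each group yielded by groupby is a one-shot iterator, so the sketch copies it into a list (the name rows is just an illustrative choice) before reading it twice. It also assumes every row of a given food_type carries the same review, so the review is simply taken from the first row of the group.

```python
from itertools import groupby

lst = [
    ["fruit", "good", "apple"],
    ["fruit", "good", "orange"],
    ["fruit", "good", "banana"],
    ["vegetable", "normal", "cabbage"],
    ["vegetable", "normal", "carrot"],
    ["drink", "bad", "cola"],
    ["drink", "bad", "milk"],
]

out = {}
# groupby only merges adjacent rows, so sort first to bring equal keys together.
for i, (food_type, g) in enumerate(groupby(sorted(lst), key=lambda row: row[0])):
    # The group iterator can be consumed only once; keep a list so that
    # both "review" and "example" can be built from the same rows.
    rows = list(g)
    out[str(i)] = {
        "food_type": food_type,
        # Assumption: the review is identical for all rows in the group.
        "review": rows[0][1],
        "example": {str(j): row[2] for j, row in enumerate(rows)},
    }

print(out)
```

If a food_type could have more than one review, grouping on the pair (row[0], row[1]) instead of row[0] alone would be one way to keep them apart.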