How to map list items in a DataFrame to unique integers? - CodePudding

I am very new to pandas. Can anybody tell me how to map the items of the lists in a DataFrame column to unique integers?

Data

[phone, laptop]

[life, death, mortal]

[happy]

Expected output:

[1,2]

[3,4,5]

[6]

I tried map() and enumerate(), but both give me errors.

CodePudding user response：

For efficiency, use a list comprehension.

For a simple running count, where every item gets the next integer:

from itertools import count

c = count(1)

df['new'] = [[next(c) for x in l ] for l in df['Data']]

To give a repeated item the same identifier each time it appears:

from itertools import count

c = count(1)
d = {}

df['new'] = [[d[x] if x in d else d.setdefault(x, next(c)) for x in l ] for l in df['Data']]

Output:

                    Data        new
0        [phone, laptop]     [1, 2]
1  [life, death, mortal]  [3, 4, 5]
2                [happy]        [6]
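The two variants only differ once an item repeats, which the sample data above does not show. Here is a minimal sketch of the same logic written as a plain function so it can be checked without a DataFrame. The names number_items and share_duplicates are made up for this illustration, and the third row deliberately repeats "phone".

```python
from itertools import count


def number_items(rows, share_duplicates=False):
    """Replace every item in a list of lists with an integer, starting at 1.

    number_items and share_duplicates are hypothetical names for this sketch.
    """
    counter = count(1)
    seen = {}  # item -> integer already assigned to it
    result = []
    for row in rows:
        numbered = []
        for item in row:
            if not share_duplicates:
                # running count: every position gets a fresh integer
                numbered.append(next(counter))
            else:
                # only draw a new integer the first time an item is seen
                if item not in seen:
                    seen[item] = next(counter)
                numbered.append(seen[item])
        result.append(numbered)
    return result


rows = [["phone", "laptop"], ["life", "death", "mortal"], ["happy", "phone"]]
plain = number_items(rows)
shared = number_items(rows, share_duplicates=True)
print(plain)   # [[1, 2], [3, 4, 5], [6, 7]]
print(shared)  # [[1, 2], [3, 4, 5], [6, 1]]
```

With a DataFrame, the same function could be applied as df['new'] = number_items(df['Data']).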

CodePudding user response：

You could explode the lists, replace the values, and then group by the original index to undo the explode operation:

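import pandas as pd

df = pd.DataFrame({"data": [["phone", "laptop"],
                            ["life", "death", "mortal"],
                            ["happy", "phone"]]})

# one row per list item; the original row index is repeated
df_expl = df.explode("data")
df["data_mapped"] = (
    # number the exploded rows 1..n, then regroup by the original index
    df_expl.assign(data=lambda d: range(1, len(d) + 1))
           .groupby(df_expl.index)["data"].apply(list))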

print(df)

                    data data_mapped
0        [phone, laptop]      [1, 2]
1  [life, death, mortal]   [3, 4, 5]
2         [happy, phone]      [6, 7]

This always increments the counter, even when an item appears more than once.

If a repeated item should instead get the same integer every time, use factorize instead of range:

df_expl = df.explode("data")
df["data_mapped"] = (
    # factorize codes start at 0, so add 1 to start numbering at 1
    df_expl.assign(data=df_expl["data"].factorize()[0] + 1)
           .groupby(df_expl.index)["data"].apply(list))

print(df)

# output:
                    data data_mapped
0        [phone, laptop]      [1, 2]
1  [life, death, mortal]   [3, 4, 5]
2         [happy, phone]      [6, 1]
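If you also need to know which integer belongs to which item, factorize returns the unique values as its second result. The following is a rough sketch, assuming the same example frame as above; the name lookup is just an illustrative variable, and the mapping is applied with a list comprehension rather than groupby.

```python
import pandas as pd

df = pd.DataFrame({"data": [["phone", "laptop"],
                            ["life", "death", "mortal"],
                            ["happy", "phone"]]})

# factorize returns (codes, uniques); uniques are in order of first appearance
_, uniques = df.explode("data")["data"].factorize()

# item -> integer lookup, shifted so numbering starts at 1
lookup = {item: code + 1 for code, item in enumerate(uniques)}

df["data_mapped"] = [[lookup[item] for item in row] for row in df["data"]]

print(lookup)
print(df)
```

Keeping the dictionary around lets you translate the integers back to the original items later, or reuse the same numbering on another column.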