I have the following DataFrame:
| tag | list |
| -------- | ----------------------------------------------------|
| icecream | [['A',0.9],['B',0.6],['C',0.5],['D',0.3],['E',0.1]] |
| potato | [['U',0.8],['V',0.7],['W',0.4],['X',0.3],['Y',0.2]] |
The list column holds a list of lists: each inner list contains an item and a value between 0 and 1, and the inner lists are sorted by that value in descending order.
For each item, I want the top 3 entries from its list, excluding the item itself. The resulting DataFrame should be:
| item | top_3 |
| ---- | --------------------------------|
| A | [['B',0.6],['C',0.5],['D',0.3]] |
| B | [['A',0.9],['C',0.5],['D',0.3]] |
| C | [['A',0.9],['B',0.6],['D',0.3]] |
| D | [['A',0.9],['B',0.6],['C',0.5]] |
| E | [['A',0.9],['B',0.6],['C',0.5]] |
| U | [['V',0.7],['W',0.4],['X',0.3]] |
| V | [['U',0.8],['W',0.4],['X',0.3]] |
| W | [['U',0.8],['V',0.7],['X',0.3]] |
| X | [['U',0.8],['V',0.7],['W',0.4]] |
| Y | [['U',0.8],['V',0.7],['W',0.4]] |
I can extract the values, but I am stuck on excluding the item itself when building top_3. This is what I have done:
import pandas as pd
data = [['icecream', [['A', 0.9], ['B', 0.6], ['C', 0.5], ['D', 0.3], ['E', 0.1]]],
        ['potato', [['U', 0.8], ['V', 0.7], ['W', 0.4], ['X', 0.3], ['Y', 0.2]]]]
df = pd.DataFrame(data, columns=['tag', 'list'])
# map each item to the tag it belongs to
temp = {}
for idx, row in df.iterrows():
    for item in row["list"]:
        temp[item[0]] = row["tag"]
# map each tag to its full ranked list
top_items = {}
for idx, row in df.iterrows():
    top_items[row["tag"]] = row["list"]
# take the first 3 entries of the item's list
similar = []
for item, category in temp.items():
    top_3 = top_items.get(category)
    sample = top_3[:3]
    similar.append([item, sample])
df = pd.DataFrame(similar)
df.columns = ["item", "top_3"]
My result:
| item | top_3 |
| ---- | --------------------------------|
| A | [['A',0.9],['B',0.6],['C',0.5]] |
| B | [['A',0.9],['B',0.6],['C',0.5]] |
| C | [['A',0.9],['B',0.6],['C',0.5]] |
| D | [['A',0.9],['B',0.6],['C',0.5]] |
| E | [['A',0.9],['B',0.6],['C',0.5]] |
| U | [['U',0.8],['V',0.7],['W',0.4]] |
| V | [['U',0.8],['V',0.7],['W',0.4]] |
| W | [['U',0.8],['V',0.7],['W',0.4]] |
| X | [['U',0.8],['V',0.7],['W',0.4]] |
| Y | [['U',0.8],['V',0.7],['W',0.4]] |
As you can see, top_3 is wrong for A, B, C, U, V and W: in every case the code takes the first 3 entries of the list without excluding the item itself.
I tried adding filters but could not get them to work.
If there is a better way to extract the data than what I did, please let me know how to optimize it.
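One more compact way to build the same frame, sketched under the assumption that each inner list is already sorted in descending order (as stated in the question), is to skip the two helper dictionaries and iterrows() and build the rows directly with a comprehension. The name k is an illustrative parameter for how many other items to keep; it is not part of the original code.

```python
import pandas as pd

data = [['icecream', [['A', 0.9], ['B', 0.6], ['C', 0.5], ['D', 0.3], ['E', 0.1]]],
        ['potato', [['U', 0.8], ['V', 0.7], ['W', 0.4], ['X', 0.3], ['Y', 0.2]]]]
df = pd.DataFrame(data, columns=['tag', 'list'])

k = 3  # illustrative: number of other items to keep per item

# For every item in every ranked list, keep the first k entries whose name
# differs from the item itself. Relies on the lists already being sorted.
rows = [
    [name, [pair for pair in ranked if pair[0] != name][:k]]
    for ranked in df['list']
    for name, _ in ranked
]
result = pd.DataFrame(rows, columns=['item', 'top_3'])
print(result)
```

Because the filtering happens before slicing, every item gets three neighbours even when it is itself among the top three.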
Answer:
In this part you are missing a condition: you just take the first 3 items, ignoring that the item itself must be excluded when it is among the top 3.
for item, category in temp.items():
    top_3 = top_items.get(category)
    sample = top_3[:3]
    similar.append([item, sample])
The solution is to remove the item from top_3 first, and only then take the "sample":
for item, category in temp.items():
    top_3 = top_items.get(category)
    top_3_without_item = [x for x in top_3 if x[0] != item]
    sample = top_3_without_item[:3]
    similar.append([item, sample])
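The order of the two steps matters. A small sketch on the question's icecream list (the variable names here are illustrative) shows that filtering after slicing would leave an item in the top three with only two neighbours, while filtering first still yields three:

```python
ranked = [['A', 0.9], ['B', 0.6], ['C', 0.5], ['D', 0.3], ['E', 0.1]]
item = 'A'

# Slice first, then filter: 'A' is removed from an already-truncated list.
slice_then_filter = [x for x in ranked[:3] if x[0] != item]

# Filter first, then slice (the fix above): three other items remain.
filter_then_slice = [x for x in ranked if x[0] != item][:3]

print(slice_then_filter)  # [['B', 0.6], ['C', 0.5]]
print(filter_then_slice)  # [['B', 0.6], ['C', 0.5], ['D', 0.3]]
```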
Answer:
As a starting point, you can explode your list column and then merge the result with itself. Next, remove the rows where the two list columns are equal, and finally group and keep the top 3 values:
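# one row per [item, value] pair
out = df.explode('list')
# merge the exploded frame with itself, drop self-pairs, sort by value
# (element 1) descending, then keep the first 3 entries per item
out = (out.merge(out, on='tag').query('list_x != list_y')
          .sort_values('list_y', key=lambda x: x.str[1], ascending=False)
          .assign(item=lambda x: x.pop('list_x').str[0])
          .groupby(['tag', 'item'])['list_y'].apply(lambda x: x.head(3).tolist())
          .rename('top_3').reset_index())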
Output:
>>> out
tag item top_3
0 icecream A [[B, 0.6], [C, 0.5], [D, 0.3]]
1 icecream B [[A, 0.9], [C, 0.5], [D, 0.3]]
2 icecream C [[A, 0.9], [B, 0.6], [D, 0.3]]
3 icecream D [[A, 0.9], [B, 0.6], [C, 0.5]]
4 icecream E [[A, 0.9], [B, 0.6], [C, 0.5]]
5 potato U [[V, 0.7], [W, 0.4], [X, 0.3]]
6 potato V [[U, 0.8], [W, 0.4], [X, 0.3]]
7 potato W [[U, 0.8], [V, 0.7], [X, 0.3]]
8 potato X [[U, 0.8], [V, 0.7], [W, 0.4]]
9 potato Y [[U, 0.8], [V, 0.7], [W, 0.4]]
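To see why the query step is needed, the sketch below counts the rows at each stage of the self-merge described in this answer, i.e. the exploded frame merged with itself on tag (exploded, pairs and others are illustrative names). Merging produces every (item, other item) combination within a tag, including each item paired with itself; the filter then removes those self-pairs.

```python
import pandas as pd

data = [['icecream', [['A', 0.9], ['B', 0.6], ['C', 0.5], ['D', 0.3], ['E', 0.1]]],
        ['potato', [['U', 0.8], ['V', 0.7], ['W', 0.4], ['X', 0.3], ['Y', 0.2]]]]
df = pd.DataFrame(data, columns=['tag', 'list'])

# One row per [item, value] pair: 5 + 5 = 10 rows.
exploded = df.explode('list')

# Self-merge on 'tag': each pair is matched with every pair of the same tag,
# giving 5 * 5 rows per tag; the clashing column is suffixed list_x / list_y.
pairs = exploded.merge(exploded, on='tag')

# Drop the rows where an entry was matched with itself (5 per tag).
others = pairs.query('list_x != list_y')

print(len(exploded), len(pairs), len(others))  # 10 50 40
```

The row count grows with the square of each list's length, which is harmless for short lists like these but worth keeping in mind for long ones.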
Answer:
You can replicate each list with the number of elements it has using