sorting csv data in pandas-CodePudding

I'm looking for your advice, I have a CSV file with data:

I'm trying to find all unique keywords in 'keyword' columns and add them to a nested dict e.g.

data = { 
         "sony a5000": {"7/25/22": "422,484", "8/25/22":"405,284", "9/25/22": "402,210" ....,
         "regalo para hombre": {"7/25/22": "422,485", "8/25/22": "382,485", "9/25/22": "327,321" ....,

I have used duplicated() method to find all duplicate keywords in a column and add them to a variable (res = df[df.duplicated('keyword')]) but, I can't manage to add the items into a DICT, I tried for loop but without results.

can you please give me a clue what function or method I should use in pandas to convert the data into dict.

CodePudding user response：

Sample data:

df = pd.DataFrame({
    
    "keyword": ["sony", "sony", "maroon", "maroon"],
    "searches": ["422,484", "422,485", "422,486", "422,487"],
    "date": ["7/5/22", "7/5/23", "7/5/24", "7/5/25"]
    
})

split the data

def extract_values(group):
    d = {}
    for index, row in group.iterrows():
        d[row["date"]] = row["searches"]
        
    return {row["keyword"] : d}
        

result = df.groupby("keyword").apply(lambda x: extract_values(x))
result.tolist()

sample output

[{'maroon': {'7/5/24': '422,486', '7/5/25': '422,487'}},
 {'sony': {'7/5/22': '422,484', '7/5/23': '422,485'}}]

CodePudding user response：

You can use dictionary comprehension for that:

{g[0]:{k:v for k, v in zip(g[1].date, g[1].searches)} for g in df.groupby('keyword')}

Input:

df = pd.DataFrame({"keyword": ["sony", "sony", "maroon", "maroon"], "searches": ["422,484", "422,485", "422,486", "422,487"], "date": ["7/5/22", "7/5/23", "7/5/24", "7/5/25"]})

Output:

{'maroon': {'7/5/24': '422,486', '7/5/25': '422,487'},
 'sony': {'7/5/22': '422,484', '7/5/23': '422,485'}}