Efficient dictionary creation from pandas dataframe for looping

I have the following dataframe:

import pandas as pd

V1 = ['a', 'a', 'c', 'd']
V2 = ['test1', 'test2', 'test3', 'test4']

df = pd.DataFrame({'V1': V1, 'V2': V2})
print(df.head())

V1     V2
a    test1 
a    test2
c    test3 
d    test4

I would like to loop over it as follows:

for [unique element in v1 column]:
    for [corresponding elements in V2]:

I thought about building a dictionary with the following format:

dic = {'a': ['test1', 'test2'], 'c': ['test3'], 'd': ['test4']}

for elt in dic:
    for i in dic[elt]:
        ...  # process value i of group elt

Is there a better or more efficient way to do this? If not, how can I build such a dictionary efficiently?

Many thanks for your help!

CodePudding user response：

You can aggregate the values into lists with GroupBy.agg and then convert the resulting Series to a dictionary with Series.to_dict:

# your DataFrame
df = pd.DataFrame({'V1': V1, 'V2': V2})

d = df.groupby('V1')['V2'].agg(list).to_dict()
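
One detail worth sketching here: groupby sorts the group keys by default, so the dictionary keys come out in sorted order rather than in order of first appearance. The example below is a minimal illustration with the rows deliberately reordered (the reordered data and the variable names sorted_groups and ordered_groups are assumptions for the demo, not part of the question). Passing sort=False keeps first-appearance order.

```python
import pandas as pd

# Same values as in the question, but with 'c' first to make ordering visible.
df = pd.DataFrame({
    "V1": ["c", "a", "a", "d"],
    "V2": ["test3", "test1", "test2", "test4"],
})

# Default behaviour: group keys are sorted.
sorted_groups = df.groupby("V1")["V2"].agg(list).to_dict()

# sort=False: groups appear in order of first occurrence in the column.
ordered_groups = df.groupby("V1", sort=False)["V2"].agg(list).to_dict()

# Loop over the dictionary exactly as described in the question.
for key, values in ordered_groups.items():
    for value in values:
        print(key, value)
```

If the loop order does not matter, the default is fine; sort=False is only needed when the original row order should carry over to the dictionary.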

CodePudding user response：

This uses plain Python, no pandas. The code below makes a single pass over the data, so it runs in O(n) time.

from collections import defaultdict

V1 = ['a','a','c','d']
V2 = ['test1', 'test2'  , 'test3'  , 'test4' ]

my_dict = defaultdict(list)

for x, y in zip(V1, V2):
    my_dict[x].append(y)

print(my_dict)

Output:

defaultdict(<class 'list'>, {'a': ['test1', 'test2'], 'c': ['test3'], 'd': ['test4']})

for name, d in my_dict.items():
    print(f'entering group {name}')
    for value in d:
        print(f' value {value}')

Output:

entering group a
 value test1
 value test2
entering group c
 value test3
entering group d
 value test4

You can use one of the pandas groupby solutions if you expect very large datasets; otherwise, a simple and efficient solution like the one above is good enough for general use cases.
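
One caveat with defaultdict, sketched below: merely looking up a missing key inserts it with an empty list, which can silently add groups while you loop or test for keys. The helper name group_values is hypothetical, introduced only for this illustration; it wraps the same single-pass loop and returns a plain dict so that missing keys raise KeyError instead.

```python
from collections import defaultdict


def group_values(keys, values):
    """Group values by key in a single pass (hypothetical helper)."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    # Convert to a plain dict so later lookups cannot create new keys.
    return dict(groups)


V1 = ['a', 'a', 'c', 'd']
V2 = ['test1', 'test2', 'test3', 'test4']

# A defaultdict grows when a missing key is merely read.
raw = defaultdict(list)
for k, v in zip(V1, V2):
    raw[k].append(v)
_ = raw['missing']            # inserts 'missing': [] as a side effect
grew = 'missing' in raw

# The plain dict returned by the helper does not have this side effect.
grouped = group_values(V1, V2)
try:
    grouped['missing']
    raised = False
except KeyError:
    raised = True
```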

CodePudding user response：

An option to build the dictionary using pandas would be:

dic = pd.Series(V2, index=V1).groupby(level=0).agg(list).to_dict()

output: {'a': ['test1', 'test2'], 'c': ['test3'], 'd': ['test4']}

With plain Python, use collections.defaultdict:

from collections import defaultdict
dic = defaultdict(list)
for k, v in zip(V1, V2):
    dic[k].append(v)

dict(dic)
# {'a': ['test1', 'test2'], 'c': ['test3'], 'd': ['test4']}

To loop over your values from the initial dataframe:

df = pd.DataFrame({'V1':V1,'V2':V2})

for name, d in df.groupby('V1'):
    print(f'entering group {name}')
    for value in d['V2']:
        print(f' value {value}')

output:

entering group a
 value test1
 value test2
entering group c
 value test3
entering group d
 value test4
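
As a small variation on the loop above, and only a sketch: selecting the 'V2' column before iterating makes each group a Series rather than a sub-DataFrame, so the inner loop does not need to index the column again. The lines list is an assumption added here so the result can be inspected after the loop.

```python
import pandas as pd

df = pd.DataFrame({
    "V1": ["a", "a", "c", "d"],
    "V2": ["test1", "test2", "test3", "test4"],
})

lines = []
# Iterating a column-selected groupby yields (group name, Series of V2 values).
for name, values in df.groupby("V1")["V2"]:
    lines.append(f"entering group {name}")
    for value in values:
        lines.append(f" value {value}")

print("\n".join(lines))
```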