I have the following dataframe:
import pandas as pd

V1 = ['a', 'a', 'c', 'd']
V2 = ['test1', 'test2', 'test3', 'test4']
df = pd.DataFrame({'V1': V1, 'V2': V2})
print(df.head())
V1 V2
a test1
a test2
c test3
d test4
I would like to loop over it as follows:
for [unique element in V1 column]:
    for [corresponding elements in V2]:
I thought about building a dictionary with the following format:
dic = { 'a':['test1', 'test2'], 'c':['test3'] , 'd':['test4'] }
for elt in dic:
    for i in dic[elt]:
Is there a better or more efficient way to do this? If not, how can I build such a dictionary efficiently?
Many thanks for your help!
CodePudding user response:
You can aggregate the values of each group into a list with GroupBy.agg and then convert the resulting Series to a dictionary with Series.to_dict:
#your DataFrame
df = pd.DataFrame({'V1':V1,'V2':V2})
d = df.groupby('V1')['V2'].agg(list).to_dict()
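For reference, here is a self-contained sketch of the approach above, followed by the nested loop from the question. It assumes the sample data from the question; the names key, values and pairs are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
V1 = ['a', 'a', 'c', 'd']
V2 = ['test1', 'test2', 'test3', 'test4']
df = pd.DataFrame({'V1': V1, 'V2': V2})

# Group V2 by the values of V1, collect each group into a list,
# then turn the resulting Series into a plain dict.
d = df.groupby('V1')['V2'].agg(list).to_dict()

# Nested loop over each key and its list of values.
pairs = []
for key, values in d.items():
    for value in values:
        pairs.append((key, value))
        print(key, value)
```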
CodePudding user response:
Using plain Python, no pandas! The code below takes only O(n) time, so it is pretty fast.
from collections import defaultdict
V1 = ['a','a','c','d']
V2 = ['test1', 'test2' , 'test3' , 'test4' ]
my_dict = defaultdict(list)
for x, y in zip(V1, V2):
    my_dict[x].append(y)
print(my_dict)
output
defaultdict(<class 'list'>, {'a': ['test1', 'test2'], 'c': ['test3'], 'd': ['test4']})
for name, d in my_dict.items():
    print(f'entering group {name}')
    for value in d:
        print(f' value {value}')
output
entering group a
value test1
value test2
entering group c
value test3
entering group d
value test4
If you expect very large datasets, you can use the pandas groupby solutions instead; otherwise a simple and efficient solution like the one above is good enough for general use cases.
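If a plain dict is preferred over a defaultdict, one possible variant uses dict.setdefault. This is only a sketch under the same sample data, not part of the answer above; the name groups is hypothetical.

```python
V1 = ['a', 'a', 'c', 'd']
V2 = ['test1', 'test2', 'test3', 'test4']

groups = {}
for key, value in zip(V1, V2):
    # setdefault returns the list already stored under key,
    # or stores a new empty list and returns that.
    groups.setdefault(key, []).append(value)

print(groups)
# {'a': ['test1', 'test2'], 'c': ['test3'], 'd': ['test4']}
```

Like the defaultdict version, this makes a single pass over the zipped lists.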
CodePudding user response:
An option to build the dictionary using pandas would be:
dic = pd.Series(V2, index=V1).groupby(level=0).agg(list).to_dict()
output: {'a': ['test1', 'test2'], 'c': ['test3'], 'd': ['test4']}
With plain Python, use collections.defaultdict:
from collections import defaultdict
dic = defaultdict(list)
for k, v in zip(V1, V2):
    dic[k].append(v)
dict(dic)
# {'a': ['test1', 'test2'], 'c': ['test3'], 'd': ['test4']}
To loop over your values from the initial dataframe:
df = pd.DataFrame({'V1':V1,'V2':V2})
for name, d in df.groupby('V1'):
    print(f'entering group {name}')
    for value in d['V2']:
        print(f' value {value}')
output:
entering group a
value test1
value test2
entering group c
value test3
entering group d
value test4
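The loop above can also be written against the grouped V2 column directly, so that each iteration yields only the values of interest. A minimal sketch, assuming the question's sample data; the names values and collected are illustrative and not from the original answer.

```python
import pandas as pd

V1 = ['a', 'a', 'c', 'd']
V2 = ['test1', 'test2', 'test3', 'test4']
df = pd.DataFrame({'V1': V1, 'V2': V2})

collected = []
# Iterating a grouped column yields (group key, Series of that group's values).
for name, values in df.groupby('V1')['V2']:
    print(f'entering group {name}')
    for value in values:
        print(f' value {value}')
    collected.append((name, values.tolist()))
```

Note that groupby sorts the group keys by default; passing sort=False keeps them in order of first appearance instead.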