I have the below pandas data frame.
d = {'id1': ['85643', '85644','85643','8564312','8564314','85645','8564316','85646','8564318','85647','85648','85649','85655','56731','34566','78931','78931'],'ID': ['G-00001', 'G-00001','G-00002','G-00002','G-00002','G-00001','G-00001','G-00001','G-00001','G-00001','G-00002','G-00002','G-00002','G-00002','G-00003','G-00003','G-00003'],'col1': [671, 2,5,3,4,5,60,0,0,6,3,2,4,32,3,1,23],'Goal': [np.nan, 56,78,np.nan,89,73,np.nan ,np.nan ,np.nan, np.nan, np.nan, 34,np.nan, 7, 84,np.nan,5 ], 'col2': [793, 4,8,32,43,55,610,0,0,16,23,72,48,3,28,5,3],'col3': [500, 22,89,33,44,55,60,1,5,6,3,2,4,13,12,14,98],'Date': ['2021-06-13', '2021-06-13','2021-06-14','2021-06-13','2021-06-14','2021-06-15','2021-06-15','2021-06-13','2021-06-16','2021-06-13','2021-06-13','2021-06-13','2021-06-16','2021-05-23','2021-05-13','2021-03-26','2021-05-13']}
dff = pd.DataFrame(data=d)
dff
id1 ID col1 Goal col2 col3 Date
0 85643 G-00001 671 NaN 793 500 2021-06-13
1 85644 G-00001 2 56.0000 4 22 2021-06-13
2 85643 G-00002 5 78.0000 8 89 2021-06-14
3 8564312 G-00002 3 NaN 32 33 2021-06-13
4 8564314 G-00002 4 89.0000 43 44 2021-06-14
5 85645 G-00001 5 73.0000 55 55 2021-06-15
6 8564316 G-00001 60 NaN 610 60 2021-06-15
7 85646 G-00001 0 NaN 0 1 2021-06-13
8 8564318 G-00001 0 NaN 0 5 2021-06-16
9 85647 G-00001 6 NaN 16 6 2021-06-13
10 85648 G-00002 3 NaN 23 3 2021-06-13
11 85649 G-00002 2 34.0000 72 2 2021-06-13
12 85655 G-00002 4 NaN 48 4 2021-06-16
13 56731 G-00002 32 7.0000 3 13 2021-05-23
14 34566 G-00003 3 84.0000 28 12 2021-05-13
15 78931 G-00003 1 NaN 5 14 2021-03-26
16 78931 G-00003 23 5.0000 3 98 2021-05-13
Also, I have the below dictionary
dic = {'G-00001':{'aasd':['G-00001','85646','85648','345_2','85655','85659','85647'],
'vaasd':['G-00001','85649','34554','85655','22183','45335','8564316']},
'G-00002':{'aasd2':['G-00002','85343','85649','85655','78931','45121','56731']},
'G-00003':{'gsd3':['G-00003','34566','8564312','45121','78931']}}
I want to get a unique count of id1, based on the list inside the dictionary based on theire ID column. For example, if we consider ID - 'G-0002' has only one list aasd2. 'aasd2':['G-0002','85343','85649','85655','78931','45121','56731']. I want to get how many unique id1's are in the 'G-0002' ID column in pandas' data frame. So it should be for aasd2- 3 values ('85649','85655','56731'). Those are the only three id1 values in pandas for aasd2 in 'G-0002' ID.
So I want to create a table just like below for the list name and count of id1s
listName count of id1s
aasd 2
vaasd 1
aasd2 3
gsd3 2
Is it possible to do this in python? Any suggestion would be appreciated. Thanks in advance!
CodePudding user response:
From my previous answer, I slightly modified the code:
data = []
for g, d in dic.items():
for k, l in d.items():
data.extend([(g, v, k) for v in l])
df1 = pd.DataFrame(data, columns=['ID', 'id1', 'id2'])
out = dff.merge(df1, on=['ID', 'id1']) \
.drop_duplicates(['ID', 'id1']) \
.value_counts('id2')
print(out)
# Output:
id2
aasd2 3
aasd 2
gsd3 2
vaasd 1
dtype: int64
CodePudding user response:
Look through the dictionary and create a dataframe and merge with dff
then take nunique
and create a dictionary with the results:
d={}
for k,v in dic.items():
for k1,v1 in v.items():
tmp = pd.DataFrame(v1,columns=['id1']).iloc[1:].assign(ID=k)
d[k1] = tmp.merge(dff[['id1','ID']])['id1'].nunique()
print(d)
{'aasd': 2, 'vaasd': 1, 'aasd2': 3, 'gsd3': 2}
#for output as series
print(pd.Series(d)
aasd 2
vaasd 1
aasd2 3
gsd3 2
dtype: int64