Finding all missing elements in list from original list

In a project I am working on, I need to calculate distances between vectors. The problem is that I originally only have the non-zero values and the addresses of the sensors that produced them (that is, the sensors that did not report a zero amplitude):

   id        address amplitudes
0   1  [a:1,b:1,c:1]    [2,3,5]
1   2  [a:1,c:1,d:1]    [2,4,4]
2   3  [b:1,d:1,f:1]    [3,4,6]

Now, the list of all sensors is ORIG = ['a:1','b:1','c:1','d:1','e:1','f:1']

What I actually want to achieve is the following dataframe:

   id              address     amplitudes
0   1  [a:1,b:1,c:1,0,0,0]  [2,3,5,0,0,0]
1   2  [a:1,0,c:1,d:1,0,0]  [2,0,4,4,0,0]
2   3  [0,b:1,0,d:1,0,f:1]  [0,3,0,4,0,6]

I originally thought about applying a function of this type:

import string

def missing_letter_basic(s):
    # Returns only the first lowercase letter that does not occur in s
    for letter in string.ascii_lowercase:
        if letter not in s: return letter
    raise Exception("No missing letter")

to determine the missing addresses:

DF1['miss'] = DF1['address'].apply(missing_letter_basic)

   id        address amplitudes miss
0   1  [a:1,b:1,c:1]    [2,3,5]    d
1   2  [a:1,c:1,d:1]    [2,4,4]    b
2   3  [b:1,d:1,f:1]    [3,4,6]    a

but as you can see, it doesn't do anything intelligent. For one, it doesn't list all the missing letters, and I have no idea how to go forward.

Any ideas?

CodePudding user response：

The idea is to create a list of dictionaries so that a DataFrame can be built from it, and then add the missing values with DataFrame.fillna and DataFrame.reindex:

import pandas as pd

# if necessary (the columns hold strings such as "[a:1,b:1,c:1]" rather than lists)
#df['address'] = df['address'].str.strip('[]').str.split(',')
#df['amplitudes'] = df['amplitudes'].str.strip('[]').str.split(',')

ORIG = ['a:1','b:1','c:1','d:1','e:1','f:1']

# one dict per row, address -> amplitude; missing sensors become 0
d = [dict(zip(a, b)) for a, b in zip(df['address'], df['amplitudes'])]
df1 = pd.DataFrame(d).fillna(0).astype(int).reindex(ORIG, axis=1, fill_value=0)

# one dict per row, address -> address; missing sensors become 0
d2 = [dict(zip(a, a)) for a in df['address']]
df2 = pd.DataFrame(d2).fillna(0).reindex(ORIG, axis=1, fill_value=0)

# write the padded rows back as lists
df['address'] = df2.to_numpy().tolist()
df['amplitudes'] = df1.to_numpy().tolist()
print(df)
   id                   address          amplitudes
0   1  [a:1, b:1, c:1, 0, 0, 0]  [2, 3, 5, 0, 0, 0]
1   2  [a:1, 0, c:1, d:1, 0, 0]  [2, 0, 4, 4, 0, 0]
2   3  [0, b:1, 0, d:1, 0, f:1]  [0, 3, 0, 4, 0, 6]
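
For comparison, here is a minimal sketch of the same padding done with plain list comprehensions instead of intermediate DataFrames. It is not the approach from the answer above, only an alternative under stated assumptions: the address and amplitudes columns already hold Python lists (not strings), and the names pad_row and out are hypothetical helpers introduced just for this illustration. It also collects all missing addresses per row, which is what the question originally tried to do with missing_letter_basic.

```python
import pandas as pd

ORIG = ['a:1', 'b:1', 'c:1', 'd:1', 'e:1', 'f:1']

# Sample data from the question, assuming the columns already hold lists
df = pd.DataFrame({
    'id': [1, 2, 3],
    'address': [['a:1', 'b:1', 'c:1'],
                ['a:1', 'c:1', 'd:1'],
                ['b:1', 'd:1', 'f:1']],
    'amplitudes': [[2, 3, 5], [2, 4, 4], [3, 4, 6]],
})

def pad_row(addresses, amplitudes, all_addresses=ORIG):
    """Hypothetical helper: align one row to the full sensor list."""
    lookup = dict(zip(addresses, amplitudes))
    # Keep the address where the sensor reported a value, otherwise 0
    padded_addr = [a if a in lookup else 0 for a in all_addresses]
    # Keep the amplitude where the sensor reported a value, otherwise 0
    padded_amp = [lookup.get(a, 0) for a in all_addresses]
    # Every sensor from the full list that is absent in this row
    missing = [a for a in all_addresses if a not in lookup]
    return padded_addr, padded_amp, missing

rows = [pad_row(a, b) for a, b in zip(df['address'], df['amplitudes'])]

out = df[['id']].copy()
out['address'] = [r[0] for r in rows]
out['amplitudes'] = [r[1] for r in rows]
out['miss'] = [r[2] for r in rows]
print(out)
```

Whether this or the reindex approach is preferable probably depends on the data size; the version above loops in Python row by row, which is simple to read but may be slower on large frames.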