Finding all missing elements in list from original list

Time:11-22

In a project I am working on, I need to calculate distances between vectors. The problem is that I originally have only the non-zero values and the addresses of the sensors that produced them (those that did not give zero amplitude values):

   id        address amplitudes
0   1  [a:1,b:1,c:1]    [2,3,5]
1   2  [a:1,c:1,d:1]    [2,4,4]
2   3  [b:1,d:1,f:1]    [3,4,6]

Now, the list of all sensors is ORIG = ['a:1','b:1','c:1','d:1','e:1','f:1']

What I actually want to achieve is the following dataframe:

   id              address     amplitudes
0   1  [a:1,b:1,c:1,0,0,0]  [2,3,5,0,0,0]
1   2  [a:1,0,c:1,d:1,0,0]  [2,0,4,4,0,0]
2   3  [0,b:1,0,d:1,0,f:1]  [0,3,0,4,0,6]
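Put row by row, the target layout means: walk ORIG in order and, for each sensor, emit its name and amplitude if the row reported it, and 0 otherwise. Here is a minimal plain-Python sketch of that rule. It assumes `address` and `amplitudes` are already Python lists (not strings), and `pad_row` is a hypothetical helper name used only for illustration:

```python
ORIG = ['a:1', 'b:1', 'c:1', 'd:1', 'e:1', 'f:1']

def pad_row(address, amplitudes, orig=ORIG):
    """Hypothetical helper: align one row's sparse readings to the full sensor list."""
    # map each reported sensor to its amplitude
    reading = dict(zip(address, amplitudes))
    # sensors this row did not report become 0 in both output lists
    padded_address = [a if a in reading else 0 for a in orig]
    padded_amplitudes = [reading.get(a, 0) for a in orig]
    return padded_address, padded_amplitudes

print(pad_row(['a:1', 'c:1', 'd:1'], [2, 4, 4]))
# -> (['a:1', 0, 'c:1', 'd:1', 0, 0], [2, 0, 4, 4, 0, 0])
```

Because the output order comes from ORIG rather than from the row, every padded vector has the same length and sensor order. That is what makes distances between rows meaningful.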

I originally thought about applying a function of this type:

import string

def missing_letter_basic(s):
    # return the first lowercase letter that does not occur anywhere in s
    for letter in string.ascii_lowercase:
        if letter not in s:
            return letter
    raise Exception("No missing letter")

to determine the missing addresses:

DF1['miss'] = DF1['address'].apply(missing_letter_basic)

   id        address amplitudes miss
0   1  [a:1,b:1,c:1]    [2,3,5]    d
1   2  [a:1,c:1,d:1]    [2,4,4]    b
2   3  [b:1,d:1,f:1]    [3,4,6]    a

But as you can see, it doesn't do anything intelligent. For one thing, it returns only the first missing letter instead of all the missing addresses, and I have no idea how to go forward from here.

Any ideas?
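To list every missing address rather than the first missing letter, each row can be compared against ORIG instead of against the alphabet. The sketch below assumes the `address` column already holds lists, and `missing_addresses` is a hypothetical helper name. The result keeps ORIG's order:

```python
import pandas as pd

ORIG = ['a:1', 'b:1', 'c:1', 'd:1', 'e:1', 'f:1']

def missing_addresses(address, orig=ORIG):
    """Hypothetical helper: every sensor in orig that this row did not report."""
    seen = set(address)
    return [a for a in orig if a not in seen]

DF1 = pd.DataFrame({
    'id': [1, 2, 3],
    'address': [['a:1', 'b:1', 'c:1'], ['a:1', 'c:1', 'd:1'], ['b:1', 'd:1', 'f:1']],
    'amplitudes': [[2, 3, 5], [2, 4, 4], [3, 4, 6]],
})
DF1['miss'] = DF1['address'].apply(missing_addresses)
print(DF1['miss'].tolist())
# -> [['d:1', 'e:1', 'f:1'], ['b:1', 'e:1', 'f:1'], ['a:1', 'c:1', 'e:1']]
```

This only answers "what is missing". Building the padded vectors additionally requires placing the amplitudes at the right positions.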

Answer:

The idea is to create a list of dictionaries, which makes it possible to build a DataFrame and then add the missing values with DataFrame.fillna and DataFrame.reindex:

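import pandas as pd

# if necessary: convert strings such as '[a:1,b:1,c:1]' into real lists first
#df['address'] = df['address'].str.strip('[]').str.split(',')
#df['amplitudes'] = df['amplitudes'].str.strip('[]').str.split(',')

ORIG = ['a:1','b:1','c:1','d:1','e:1','f:1']

# one dict per row (address -> amplitude); absent sensors become NaN, then 0,
# and reindex orders the columns like ORIG, adding 'e:1', which no row reports
d = [dict(zip(a, b)) for a, b in zip(df['address'], df['amplitudes'])]
df1 = pd.DataFrame(d).fillna(0).astype(int).reindex(ORIG, axis=1, fill_value=0)

# same trick with each address mapped to itself, so present sensors keep their name
d2 = [dict(zip(a, a)) for a in df['address']]
df2 = pd.DataFrame(d2).fillna(0).reindex(ORIG, axis=1, fill_value=0)

# write the padded rows back into df as lists
df['address'] = df2.to_numpy().tolist()
df['amplitudes'] = df1.to_numpy().tolist()
print(df)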
   id                   address          amplitudes
0   1  [a:1, b:1, c:1, 0, 0, 0]  [2, 3, 5, 0, 0, 0]
1   2  [a:1, 0, c:1, d:1, 0, 0]  [2, 0, 4, 4, 0, 0]
2   3  [0, b:1, 0, d:1, 0, f:1]  [0, 3, 0, 4, 0, 6]
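The same result can probably also be reached without building dictionaries by hand. You can explode the lists into one row per reading, pivot back to one column per sensor, and reindex to ORIG. This is a swapped-in alternative, not the answer's method. It is a sketch that assumes the columns already hold real lists and that you have pandas 1.3 or newer, which is needed to explode several columns at once:

```python
import numpy as np
import pandas as pd

ORIG = ['a:1', 'b:1', 'c:1', 'd:1', 'e:1', 'f:1']

df = pd.DataFrame({
    'id': [1, 2, 3],
    'address': [['a:1', 'b:1', 'c:1'], ['a:1', 'c:1', 'd:1'], ['b:1', 'd:1', 'f:1']],
    'amplitudes': [[2, 3, 5], [2, 4, 4], [3, 4, 6]],
})

# one row per (id, sensor, amplitude); exploding several columns needs pandas >= 1.3
long = df.explode(['address', 'amplitudes'])

# one row per id (in df's order), one column per sensor in ORIG order;
# NaN wherever a row did not report that sensor
wide = (long.pivot(index='id', columns='address', values='amplitudes')
            .reindex(index=df['id'], columns=ORIG))

present = wide.notna().to_numpy()
# amplitudes: absent sensors become 0
amps = wide.astype(float).fillna(0).astype(int).to_numpy().tolist()
# addresses: the sensor name where present, 0 otherwise
# (an object array keeps 0 as a number instead of turning it into the string '0')
names = np.array(ORIG, dtype=object)
addrs = np.where(present, names, 0).tolist()

df['address'] = addrs
df['amplitudes'] = amps
print(df)
```

The pivot requires each (id, address) pair to occur at most once per row, which holds as long as a sensor is never listed twice in the same row.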