In a project I am working on, I need to calculate distances between vectors. The problem is that I originally have only the non-zero amplitude values and the addresses of the sensors that produced them (the sensors that did not give a zero amplitude):
   id        address amplitudes
0   1  [a:1,b:1,c:1]    [2,3,5]
1   2  [a:1,c:1,d:1]    [2,4,4]
2   3  [b:1,d:1,f:1]    [3,4,6]
Now, the list of all sensors is ORIG = ['a:1','b:1','c:1','d:1','e:1','f:1']
What I actually want to achieve is the following dataframe:
   id              address     amplitudes
0   1  [a:1,b:1,c:1,0,0,0]  [2,3,5,0,0,0]
1   2  [a:1,0,c:1,d:1,0,0]  [2,0,4,4,0,0]
2   3  [0,b:1,0,d:1,0,f:1]  [0,3,0,4,0,6]
I originally thought about applying a function of this type:
import string

def missing_letter_basic(s):
    # Return the first lowercase letter that does not occur in s
    for letter in string.ascii_lowercase:
        if letter not in s:
            return letter
    raise Exception("No missing letter")
to determine the missing addresses:
DF1['miss'] = DF1['address'].apply(missing_letter_basic)
   id        address amplitudes miss
0   1  [a:1,b:1,c:1]    [2,3,5]    d
1   2  [a:1,c:1,d:1]    [2,4,4]    b
2   3  [b:1,d:1,f:1]    [3,4,6]    a
But as you can see, it doesn't do anything intelligent. For one, it returns only the first missing letter instead of listing all of them, and I have no idea how to go forward.
Any ideas?
CodePudding user response:
The idea is to create a list of dictionaries, build a DataFrame from it, and add the missing values with DataFrame.fillna and DataFrame.reindex:
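import pandas as pd

# If the columns hold strings such as '[a:1,b:1,c:1]', split them into lists first:
#df['address'] = df['address'].str.strip('[]').str.split(',')
#df['amplitudes'] = df['amplitudes'].str.strip('[]').str.split(',')
ORIG = ['a:1','b:1','c:1','d:1','e:1','f:1']
# One dict per row, mapping address -> amplitude
d = [dict(zip(a, b)) for a, b in zip(df['address'], df['amplitudes'])]
# Addresses become columns; fill the gaps with 0 and order the columns as in ORIG
df1 = pd.DataFrame(d).fillna(0).astype(int).reindex(ORIG, axis=1, fill_value=0)
# Same approach with address -> address to get the padded address lists
d2 = [dict(zip(a, a)) for a in df['address']]
df2 = pd.DataFrame(d2).fillna(0).reindex(ORIG, axis=1, fill_value=0)
# Write the aligned rows back as lists
df['address'] = df2.to_numpy().tolist()
df['amplitudes'] = df1.to_numpy().tolist()
print(df)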
   id                   address          amplitudes
0   1  [a:1, b:1, c:1, 0, 0, 0]  [2, 3, 5, 0, 0, 0]
1   2  [a:1, 0, c:1, d:1, 0, 0]  [2, 0, 4, 4, 0, 0]
2   3  [0, b:1, 0, d:1, 0, f:1]  [0, 3, 0, 4, 0, 6]
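
If the intermediate DataFrames feel heavy, the same alignment can probably be done with plain list comprehensions. What follows is only a minimal sketch: it assumes address and amplitudes already hold Python lists of equal length per row, and the helper name align_to_orig is made up for illustration (it is not a pandas function). Since the stated goal is distances between vectors, the last step stacks the aligned amplitudes into a NumPy array and computes pairwise Euclidean distances via broadcasting; Euclidean is an assumption here, as the question does not say which distance is needed.

```python
import numpy as np
import pandas as pd

ORIG = ['a:1', 'b:1', 'c:1', 'd:1', 'e:1', 'f:1']

# Sample data copied from the question
df = pd.DataFrame({
    'id': [1, 2, 3],
    'address': [['a:1', 'b:1', 'c:1'], ['a:1', 'c:1', 'd:1'], ['b:1', 'd:1', 'f:1']],
    'amplitudes': [[2, 3, 5], [2, 4, 4], [3, 4, 6]],
})

def align_to_orig(addresses, values, all_addresses=ORIG, fill=0):
    """Hypothetical helper: put each value in its sensor's slot, `fill` elsewhere."""
    lookup = dict(zip(addresses, values))
    return [lookup.get(addr, fill) for addr in all_addresses]

# Build both aligned columns before overwriting anything in df
aligned_amp = [align_to_orig(a, v) for a, v in zip(df['address'], df['amplitudes'])]
aligned_addr = [align_to_orig(a, a) for a in df['address']]

df['address'] = aligned_addr
df['amplitudes'] = aligned_amp

# Pairwise Euclidean distances between the aligned amplitude vectors
X = np.array(aligned_amp, dtype=float)
dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
print(df)
print(dist)
```

Here dist[i, j] is the distance between row i and row j, so the matrix is symmetric with zeros on the diagonal. For a large number of rows, the broadcasting step builds an n x n x len(ORIG) array, so a dedicated routine may be a better fit.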