I am trying to compute the dot product of a dataframe and a column vector (another dataframe). The dataframe looks like this:
import re
import pandas as pd

df = pd.DataFrame([
    [0, 23, 0, 0, 1, 0],
    [1, 33, 1, 0, 0, 0],
    [2, 40, 1, 0, 1, 1]],
    columns=['SN', 'Age', 'Nice_540', 'Nice_200', 'Nice_153', 'Nice_124'])
SN |Age |Nice_540 |Nice_200| Nice_153|Nice_124|
----|-----|--------|--------|----------|--------|
0 | 23 | 0 | 0 | 1 | 0 |
1 | 33 | 1 | 0 | 0 | 0 |
2 | 40 | 1 | 0 | 1 | 1 |
I want to matrix-multiply each row's last four columns by the numbers in their column headings (540, 200, etc.), so that a new last column "FINAL" looks like this:
SN | Age |Nice_540 |Nice_200| Nice_153|Nice_124|FINAL |
----|-----|--------|--------|----------|--------|--------|
0 | 23 | 0 | 0 | 1 | 0 | 153 |
1 | 33 | 1 | 0 | 0 | 0 | 540 |
2 | 40 | 1 | 0 | 1 | 1 | 817 |
The first two values in the last column are right (153 and 540), but for the last one I get 540153124 instead of 817, which is what a dot product should give.
This is what I have done so far:
AAAA = df.columns.to_list()
AAAA = str(AAAA)
string_pattern = r"\d{3}"
regex_pattern = re.compile(string_pattern)
BBBB = regex_pattern.findall(AAAA)
cols=df.filter(regex='Nice_',axis=1).columns
seR = pd.DataFrame(BBBB,index=cols)
df["FINAL"] = df[cols].dot(seR)
What am I doing wrong to get 540153124 and not 817? Thanks in advance.
Answer:
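The regex returns the numbers as strings, so the dot product repeats and concatenates them ('540' + '153' + '124' gives '540153124') instead of adding them. Convert the numbers to integers first. You could do: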
d = df.filter(like='Nice_')
vals = d.columns.str.split('_').str[-1].astype(int)
df['FINAL'] = d.dot(vals)
output:
SN Age Nice_540 Nice_200 Nice_153 Nice_124 FINAL
0 0 23 0 0 1 0 153
1 1 33 1 0 0 0 540
2 2 40 1 0 1 1 817
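If you would rather keep the regex approach from the question, the sketch below shows where the concatenation comes from and one possible fix. It assumes every numeric suffix has exactly three digits, as in the example. The names `as_text`, `row2_text`, `weights` and `final_values` are made up for illustration.

```python
import re

import pandas as pd

df = pd.DataFrame(
    [[0, 23, 0, 0, 1, 0],
     [1, 33, 1, 0, 0, 0],
     [2, 40, 1, 0, 1, 1]],
    columns=['SN', 'Age', 'Nice_540', 'Nice_200', 'Nice_153', 'Nice_124'])

cols = df.filter(like='Nice_').columns

# re.findall returns strings. In Python, int * str repeats the string and
# str + str concatenates, so row 2 collapses to one long string.
as_text = re.findall(r"\d{3}", " ".join(cols))
row2_text = "".join(flag * num for flag, num in zip([1, 0, 1, 1], as_text))

# Converting the matches to integers makes the dot product numeric.
# The Series index matches the selected columns so that dot() can align them.
weights = pd.Series([int(n) for n in as_text], index=cols)
df['FINAL'] = df[cols].dot(weights)
final_values = df['FINAL'].tolist()
```

Here `row2_text` reproduces the unexpected '540153124', while `final_values` holds the numeric sums. Using a Series rather than a one-column DataFrame for the weights also means `dot` returns a Series that can be assigned straight to `df['FINAL']`.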