I want to create a dataframe/matrix where each entry is a concatenation of the corresponding index and column names. I can do it with a nested for loop [see below] but was wondering whether there is a better/faster way to do this?
My problem partly comes from just not knowing how to phrase correctly what I am trying to do. I think it is like matrix multiplication, but with strings and concatenation instead of multiplication. [So any comments on how this problem could be better phrased are welcome too]
I found that itertools has a product() function that does somewhat what I want but creates a generator instead of returning a matrix/dataframe.
[Motivation; In biology we have plates with 96 wells for sample storage etc. They have columns 1-12 and rows A-H. Some machines return the well code (A3, G12 etc.). I want to generate the codes to map between plates and experiment outputs.]
import pandas as pd
rows = [x for x in "ABCDEFGH"]
columns = [str(x) for x in range(1, 13)]
wells = pd.DataFrame(columns=columns, index=rows)
for i in rows:
for j in columns:
wells.loc[i, j] = "".join([i, j])
>wells
Out[37]:
1 2 3 4 5 6 7 8 9 10 11 12
A A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12
B B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 B11 B12
C C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12
D D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12
E E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12
F F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12
G G1 G2 G3 G4 G5 G6 G7 G8 G9 G10 G11 G12
H H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H11 H12
CodePudding user response:
You can use np.char.add()
and broadcasting for this:
rows = [x for x in "ABCDEFGH"]
columns = [str(x) for x in range(1, 13)]
data = np.char.add(np.array(rows)[:, None], np.array(columns)[None, :])
wells = pd.DataFrame(data=data, columns=columns, index=rows)
# 1 2 3 4 5 6 7 8 9 10 11 12
# A A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12
# B B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 B11 B12
# C C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12
# D D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12
# E E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12
# F F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12
# G G1 G2 G3 G4 G5 G6 G7 G8 G9 G10 G11 G12
# H H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H11 H12
CodePudding user response:
Why not use list comprehension:
wells = pd.DataFrame([[r c for c in columns] for r in rows], columns=columns, index=rows)