I have a pandas DataFrame in which one column holds a NumPy array of 600 floats per row:
        key              text                  vectors
id
0   SAG-123  'some long text'  [2.88, 0.17, ..., 1.56]
1   SAG-596       'more text'  [1.45, 2.24, ..., 3.46]
2   SAG-888  'even more text'  [1.45, 2.24, ..., 4.81]
I would like to transform it into the following format:
        key              text     0     1  ...   599
id
0   SAG-123  'some long text'  2.88  0.17  ...  1.56
1   SAG-596       'more text'  1.45  2.24  ...  3.46
2   SAG-888  'even more text'  1.45  2.24  ...  4.81
CodePudding user response:
Complete minimal example:
import pandas as pd

# Create example dataframe
d = {
    "key": ["SAG-123", "SAG-596", "SAG-888"],
    "text": ["some long text", "more text", "even more text"],
    "vectors": [[2.88, 0.17, 1.56], [1.45, 2.24, 3.46], [1.45, 2.24, 4.81]],
}
df = pd.DataFrame(data=d)

# Transform dataframe: add one new column per vector position
for i in range(len(df["vectors"][0])):
    tmp_lst = []
    for v in df["vectors"]:
        tmp_lst.append(v[i])
    df[i] = tmp_lst

# Drop the unwanted column
df.drop("vectors", axis=1, inplace=True)
print(df)
Output:
       key            text     0     1     2
0  SAG-123  some long text  2.88  0.17  1.56
1  SAG-596       more text  1.45  2.24  3.46
2  SAG-888  even more text  1.45  2.24  4.81
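The nested loop above can also be written without explicit Python loops. The following is a minimal sketch, not part of the original answer, that builds the new columns in one step from the list of vectors. It assumes every vector has the same length; the names `expanded` and `out` are only illustrative.

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame(
    {
        "key": ["SAG-123", "SAG-596", "SAG-888"],
        "text": ["some long text", "more text", "even more text"],
        "vectors": [[2.88, 0.17, 1.56], [1.45, 2.24, 3.46], [1.45, 2.24, 4.81]],
    }
)

# Build one column per vector position; reuse df's index so the join aligns rows
expanded = pd.DataFrame(df["vectors"].tolist(), index=df.index)

# Swap the list column for the expanded numeric columns
out = df.drop(columns="vectors").join(expanded)
print(out)
```

With 600 values per row this avoids 600 separate column assignments, which pandas may flag with a fragmentation warning.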
CodePudding user response:
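You can use pandas.Series.astype with pandas.Series.str.split. This relies on the vectors being NumPy arrays, whose string form separates the values with spaces (plain Python lists would leave commas behind):
# Pop the column, turn each array into text, strip the brackets and split on whitespace
vectors = df.pop("vectors").astype(str).str.strip("[]").str.split(expand=True)
# The split yields strings, so convert back to floats before joining
out = df.join(vectors.astype(float))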
# Output :
print(out)
        key              text     0     1    2     3
id
0   SAG-123  'some long text'  2.88  0.17  ...  1.56
1   SAG-596       'more text'  1.45  2.24  ...  3.46
2   SAG-888  'even more text'  1.45  2.24  ...  4.81
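One caveat with going through text: NumPy prints floats with 8 digits of precision by default, so values with more digits may come back rounded. The sketch below is an illustration rather than part of the original answer. It uses made-up data and the illustrative names `exact` and `via_text` to compare the string round trip with stacking the arrays directly via `numpy.vstack`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first value has more than 8 significant digits
df = pd.DataFrame(
    {
        "key": ["SAG-123", "SAG-596"],
        "vectors": [np.array([1.123456789012, 2.0]), np.array([3.5, 4.25])],
    }
)

# Stack the per-row arrays into one 2-D array of shape (n_rows, n_values)
matrix = np.vstack(df["vectors"].tolist())
exact = df.drop(columns="vectors").join(pd.DataFrame(matrix, index=df.index))

# For comparison: the text round trip goes through NumPy's printed form
via_text = (
    df["vectors"].astype(str).str.strip("[]").str.split(expand=True).astype(float)
)

print(exact.loc[0, 0])     # value kept as stored
print(via_text.loc[0, 0])  # value after passing through text
```

Stacking keeps the values as floats the whole way, so it does not depend on NumPy's print settings.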