Numpy values as new column in Pandas dataframe

Time:12-14

I have a pandas dataframe in which each cell of one column holds a numpy array of 600 floats.

    key       text              vectors
id
0   SAG-123   'some long text'  [2.88, 0.17, ..., 1.56]
1   SAG-596   'more text'       [1.45, 2.24, ..., 3.46]
2   SAG-888   'even more text'  [1.45, 2.24, ..., 4.81]

I would like to transform it into the following format:

    key       text                0    1  ... 599
id
0   SAG-123   'some long text'  2.88 0.17 ... 1.56
1   SAG-596   'more text'       1.45 2.24 ... 3.46
2   SAG-888   'even more text'  1.45 2.24 ... 4.81

CodePudding user response:

Complete minimal example:

import pandas as pd

# Create example dataframe
d = {"key": ["SAG-123", "SAG-596", "SAG-888"], "text": ["some long text", "more text", "even more text"],
  "vectors": [[2.88, 0.17, 1.56], [1.45, 2.24, 3.46], [1.45, 2.24, 4.81]]
}
df = pd.DataFrame(data=d)

# Transform dataframe
for i in range(len(df["vectors"][0])):
  tmp_lst = []
  for v in df["vectors"]:
    tmp_lst.append(v[i])
  df[i] = tmp_lst

# Drop the unwanted column
df.drop("vectors", axis=1, inplace=True)

print(df)

Output:

       key            text     0     1     2
0  SAG-123  some long text  2.88  0.17  1.56
1  SAG-596       more text  1.45  2.24  3.46
2  SAG-888  even more text  1.45  2.24  4.81
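
The nested loops above can also be replaced by building all the new columns in one step. This is only a minimal sketch, assuming every row's vector has the same length; the names `expanded` and `out` are illustrative and do not come from the answer above:

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["SAG-123", "SAG-596", "SAG-888"],
    "text": ["some long text", "more text", "even more text"],
    "vectors": [[2.88, 0.17, 1.56], [1.45, 2.24, 3.46], [1.45, 2.24, 4.81]],
})

# One row per vector, one column per position; reuse df's index so rows align.
expanded = pd.DataFrame(df["vectors"].tolist(), index=df.index)

# Swap the "vectors" column for the expanded columns 0, 1, 2, ...
out = df.drop(columns="vectors").join(expanded)

print(out)
```

The same construction should also work when the cells hold NumPy arrays instead of lists, since `tolist()` then returns a list of arrays that the `DataFrame` constructor treats as rows.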

CodePudding user response:

You can use pandas.Series.astype with pandas.Series.str.split:

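# Assumes each cell is a NumPy array: its string form looks like "[2.88 0.17 1.56]" (no commas)
df_vectors = df.pop("vectors").astype(str).str.strip("[]").str.split(expand=True).astype(float)

# df no longer has "vectors" after pop(), so a single join is enough
out = df.join(df_vectors)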

# Output :

print(out)

        key              text     0     1    2     3
id                                                  
0   SAG-123  'some long text'  2.88  0.17  ...  1.56
1   SAG-596       'more text'  1.45  2.24  ...  3.46
2   SAG-888  'even more text'  1.45  2.24  ...  4.81
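
One caveat of the string-based approach: the array is first turned into text using NumPy's print settings, so values with many decimal places may come back rounded. The sketch below illustrates this under the assumption of default NumPy print options and NumPy arrays in the cells; the sample value and the names `via_str` and `original` are made up for the demonstration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "key": ["SAG-123", "SAG-596"],
    "vectors": [np.array([1.123456789012, 0.17]), np.array([1.45, 2.24])],
})

original = df["vectors"][0][0]

# String round trip: format each array as text, split on whitespace, parse back.
via_str = (
    df["vectors"].astype(str).str.strip("[]").str.split(expand=True).astype(float)
)

# The parsed value is close to, but not identical with, the original float.
print(original, via_str.iloc[0, 0])
print(np.isclose(original, via_str.iloc[0, 0]))
```

If exact values matter, an approach that never converts the numbers to text avoids this rounding.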