Why am I losing information with .str.split(expand=True)?

Time:01-09

I'm trying to expand a DataFrame column made up of strings, something like this:

ATTGG
CATGC
GTGCC

into one column per character in a new DataFrame.

The command I used is:

newdf = pd.DataFrame(df['col'].str.split("", expand=True))

When I print the result, I find that the first row and the first column are actually the index:

0 1 2 3 4 5
1 C A T G C
2 G T G C C

and that my first row of data (ATTGG) is cut off, presumably because of the index.

Why is my first row cut off? What can I do to fix this?

Answer:

Convert each string to a list of characters before creating the DataFrame:

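import pandas as pd
# map(list) turns 'ATTGG' into ['A', 'T', 'T', 'G', 'G']; each list becomes one row
newdf = pd.DataFrame.from_records(df['col'].map(list))
print(newdf)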

# Output
   0  1  2  3  4
0  A  T  T  G  G
1  C  A  T  G  C
2  G  T  G  C  C
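As for why the original command gives odd-looking output: as far as we can tell, when `regex` is not set, pandas treats a single-character `pat` as a literal separator but compiles any other pattern, including the empty string `""`, as a regular expression. On Python 3.7 or later, an empty regex matches at every position, including before the first character and after the last, so each string also produces an empty piece at both ends. With `expand=True` that would mean seven columns for a five-letter string, with the first and last holding empty strings, and a blank first column is easy to mistake for part of the index. The minimal sketch below reproduces the splitting with the standard `re` module only; `seqs` is just a hypothetical name for the question's three strings.

```python
import re

# Hypothetical name for the three sample strings from the question
seqs = ["ATTGG", "CATGC", "GTGCC"]

# An empty pattern matches at every position, including index 0 and the end
# of the string, so each split has an empty piece at both ends (Python 3.7+).
split_rows = [re.split("", s) for s in seqs]
print(split_rows[0])  # ['', 'A', 'T', 'T', 'G', 'G', '']

# list() returns only the characters, which is what map(list) relies on
list_rows = [list(s) for s in seqs]
print(list_rows[0])  # ['A', 'T', 'T', 'G', 'G']

# Dropping the empty first and last pieces gives the same characters
trimmed = [row[1:-1] for row in split_rows]
print(trimmed == list_rows)  # True
```

If your pandas version behaves this way, slicing off the first and last columns of the `expand=True` result should also work, but mapping `list` never creates those empty columns in the first place.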