How do I split a single column dataframe into multiple columns by index

Time:11-04

I've browsed a few answers but haven't found exactly what I'm looking for yet.

I have a pandas DataFrame with a single column, structured as follows (example):

0 alex
1 7 
2 female
3 nora
4 3
5 female 
...
999 fred 
1000 15 
1001 male 

I want to split that single column into three columns holding name, age, and gender, so that it looks something like this:

    name  age  gender
0   alex  7    female
1   nora  3    female
...
333 fred  15   male

Is there a way to do this? I was thinking about using the index, but I'm not sure how to actually do it.

CodePudding user response:

Assuming your column is labelled 0 (the default for an unnamed column):

import numpy as np
import pandas as pd

list_a = list(df[0])
a = np.array(list_a).reshape(-1, 3).tolist()
df2 = pd.DataFrame(a, columns=["name", "age", "gender"])
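
One caveat with the reshape approach: np.array() on a list that mixes strings and integers produces a string array, so the ages come back as '7' rather than 7. Below is a minimal sketch of one way around that, assuming the column is labelled 0 and the row count is a multiple of 3. The sample data and the pd.to_numeric step are illustrative additions, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data from the question; 0 is the default label for an unnamed column.
df = pd.DataFrame(["alex", 7, "female", "nora", 3, "female", "fred", 15, "male"])

# Without dtype=object, NumPy coerces the mixed list to strings: 7 becomes '7'.
coerced = np.array(df[0].tolist())

# dtype=object keeps every value as the Python object it already was.
values = df[0].to_numpy(dtype=object)

# reshape(-1, 3) raises ValueError if the row count is not a multiple of 3.
df2 = pd.DataFrame(values.reshape(-1, 3), columns=["name", "age", "gender"])

# Optional: give the age column a numeric dtype instead of object.
df2["age"] = pd.to_numeric(df2["age"])
print(df2)
```

If the ages are only ever displayed, the string version may be fine; the numeric conversion matters once you want to sort or aggregate on age.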

CodePudding user response:

Not the most efficient solution perhaps, but you can use pd.concat() and put them all next to each other, if they're always in order:

import pandas as pd

df = pd.DataFrame({'Value':['alex',7,'female','nora',3,'female','fred',15,'male']})
# x = 0 picks the names, x = 1 the genders, x = 2 the ages
df2 = pd.concat([df[(df.index + x) % 3 == 0].reset_index(drop=True) for x in range(3)],axis=1)
df2.columns = ["name", "gender", "age"]

Returns:

    name    gender  age
0   alex    female  7
1   nora    female  3
2   fred    male    15
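
The same modulo idea can also feed a pivot instead of a concat, which keeps the columns in name, age, gender order. This is a rough sketch of that swapped-in technique, assuming the rows are strictly in name, age, gender order; the helper column names row and attribute are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Value": ["alex", 7, "female", "nora", 3, "female", "fred", 15, "male"]})
attributes = ["name", "age", "gender"]

# Row positions 0..n-1; identical to the index when it is the default RangeIndex.
pos = np.arange(len(df))

tidy = df.assign(
    row=pos // 3,                             # which output row the value lands in
    attribute=np.array(attributes)[pos % 3],  # which output column it lands in
)

# pivot sorts the column labels alphabetically, so select them in the wanted order.
df2 = tidy.pivot(index="row", columns="attribute", values="Value")[attributes]

# Drop the leftover axis names so the result prints like a plain table.
df2 = df2.rename_axis(index=None, columns=None)
print(df2)
```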

CodePudding user response:

Consider unstack:

import pandas as pd

df = pd.DataFrame(["alex", 7, "female", "nora", 3, "female", "fred", 15, "male"])

people = range(len(df) // 3)
attributes = ["name", "age", "gender"]

multi_index = pd.MultiIndex.from_product([people, attributes])

df.set_index(multi_index).unstack(level=1).droplevel(level=0, axis=1).reindex(columns=attributes)

Output:

   name age  gender
0  alex   7  female
1  nora   3  female
2  fred  15    male

CodePudding user response:

Here is one way to do it:

# step through the DataFrame and take the name, age and gender values
# as arrays; the slices start at positions 0, 1 and 2 respectively

name = df['Value'][::3].values
age = df['Value'][1::3].values
gender = df['Value'][2::3].values

# create a DataFrame from the values
out = pd.DataFrame({'name': name,
                    'age': age,
                    'gender': gender})
out
    name    age  gender
0   alex    7    female
1   nora    3    female
2   fred    15   male
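
The slicing steps above can be wrapped up for any number of fields. This is only a sketch: the helper name split_by_stride is hypothetical, and the length check is an added assumption about how uneven input should be handled.

```python
import pandas as pd


def split_by_stride(series, names):
    """Hypothetical helper: spread a flat Series across len(names) columns."""
    n = len(names)
    if len(series) % n != 0:
        raise ValueError(f"{len(series)} rows cannot be split evenly into {n} columns")
    # series.iloc[i::n] takes every n-th value, starting at position i.
    return pd.DataFrame(
        {name: series.iloc[i::n].to_numpy() for i, name in enumerate(names)}
    )


df = pd.DataFrame({"Value": ["alex", 7, "female", "nora", 3, "female", "fred", 15, "male"]})
out = split_by_stride(df["Value"], ["name", "age", "gender"])
print(out)
```

Raising on a length mismatch is one design choice; silently dropping an incomplete trailing record would hide data problems, so the sketch fails loudly instead.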