How to organize a data frame with several variables in Python?

When I build a data frame with one variable, it works well.

import numpy as np
import pandas as pd

a = np.random.normal(45, 9, 10000)
source = {"Genotype": ["CV1"]*10000, "AGW": a}
df = pd.DataFrame(source)
df

However, when I add more variables, it does not work.

import numpy as np
import pandas as pd

a = np.random.normal(45, 9, 10000)
b = np.random.normal(35, 10, 10000)

source = {"Genotype": ["CV1"]*10000 + ["CV2"]*10000,
          "AGW": a + b}
df = pd.DataFrame(source)
df

It raises "ValueError: All arrays must be of the same length".

I think the AGW column computes the actual sum a + b, which gives 10,000 values, instead of stacking the two arrays vertically. I want to make a data frame with two columns and 20,000 rows.

Could you let me know how to do it?

Thanks!!

Answer:

Use numpy.hstack to join the two NumPy arrays end to end:

source = {"Genotype": ["CV1"]*10000 + ["CV2"]*10000,
          "AGW": np.hstack((a, b))}
df = pd.DataFrame(source)
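
To see why the original attempt fails, it may help to compare the two operations directly. The sketch below assumes the same sample sizes as in the question and uses a seeded generator (my own choice, only so the run is repeatable). On NumPy arrays, + adds matching elements, while on Python lists it concatenates, which is why the Genotype list ends up with 20,000 items but the AGW array stays at 10,000.

```python
import numpy as np

# Seeded generator: an assumption added for reproducibility, not part of the question.
rng = np.random.default_rng(0)
a = rng.normal(45, 9, 10000)
b = rng.normal(35, 10, 10000)

elementwise = a + b          # adds matching elements, length stays 10000
stacked = np.hstack((a, b))  # places b after a, length becomes 20000

print(elementwise.shape)  # (10000,)
print(stacked.shape)      # (20000,)
```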

Or convert the arrays to lists and concatenate them:

source = {"Genotype": ["CV1"]*10000 + ["CV2"]*10000,
          "AGW": list(a) + list(b)}
df = pd.DataFrame(source)
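
If more genotypes are added later, one possible alternative is to build a small frame per genotype and stack them with pandas.concat. This is only a sketch: the samples dictionary and the seeded generator are hypothetical names introduced here for illustration, not something from the question.

```python
import numpy as np
import pandas as pd

# Seeded generator: an assumption added for reproducibility.
rng = np.random.default_rng(0)

# Hypothetical mapping from genotype label to its simulated AGW values.
samples = {
    "CV1": rng.normal(45, 9, 10000),
    "CV2": rng.normal(35, 10, 10000),
}

# Build one frame per genotype; the scalar label is broadcast to every row.
frames = [
    pd.DataFrame({"Genotype": name, "AGW": values})
    for name, values in samples.items()
]

# Stack the frames row-wise and renumber the index from 0.
df = pd.concat(frames, ignore_index=True)

print(df.shape)  # (20000, 2)
```

With this layout, adding a third genotype only needs one more entry in the dictionary, and the label column always matches the length of its values.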