When I build a data frame with one variable, it works well.
import numpy as np
import pandas as pd
a = np.random.normal(45, 9, 10000)
source = {"Genotype": ["CV1"]*10000, "AGW": a}
df=pd.DataFrame(source)
df
However, when I add more variables, it does not work.
import numpy as np
import pandas as pd
a = np.random.normal(45, 9, 10000)
b = np.random.normal(35, 10, 10000)
source = {"Genotype": ["CV1"]*10000 + ["CV2"]*10000,
"AGW": a + b}
df=pd.DataFrame(source)
df
and it says "ValueError: All arrays must be of the same length"
I think the AGW column actually computes a + b element by element, which gives 10,000 rows, instead of stacking the two arrays vertically. I want to make a data frame with two columns and 20,000 rows.
Could you let me know how to do it?
Thanks!!
CodePudding user response:
Use numpy.hstack to join the two NumPy arrays:
source = {"Genotype": ["CV1"]*10000 + ["CV2"]*10000,
"AGW": np.hstack((a, b))}
df=pd.DataFrame(source)
Or join lists:
source = {"Genotype": ["CV1"]*10000 + ["CV2"]*10000,
"AGW": list(a) + list(b)}
df=pd.DataFrame(source)
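For reference, here is a minimal self-contained sketch of why the error appears and how stacking avoids it. The names n, rng, elementwise and stacked are only illustrative, and the seeded generator is an assumption added for reproducibility, not part of the original question.

```python
import numpy as np
import pandas as pd

n = 10000
rng = np.random.default_rng(0)  # seed is an assumption, used only for reproducibility
a = rng.normal(45, 9, n)
b = rng.normal(35, 10, n)

# "+" on NumPy arrays adds element by element, so the length stays n.
elementwise = a + b

# hstack places b after a, so the length becomes 2 * n,
# which matches the length of the Genotype list below.
stacked = np.hstack((a, b))

df = pd.DataFrame({
    "Genotype": ["CV1"] * n + ["CV2"] * n,  # "+" on lists concatenates them
    "AGW": stacked,
})

print(len(elementwise), len(stacked), df.shape)
# prints: 10000 20000 (20000, 2)
```

The first 10,000 rows hold the CV1 values and the last 10,000 hold the CV2 values, so each label stays aligned with its own sample.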