How to combine multiple numpy arrays into a dataframe?

Time:10-25

I have 4 numpy arrays with 5 values each. I need to combine them into a dataframe so that I can run ANOVA tests and Tukey honest significant difference tests.

The arrays are:

low = np.array([59.5, 53.3, 56.8, 63.1, 58.7]) # 1.6 mmhos/cm
med = np.array([55.2, 59.1, 52.8, 54.5, np.nan]) # 3.8
medh = np.array([51.7, 48.8, 53.9, 49.0, np.nan]) # 6.0
high = np.array([44.6, 48.5, 41.0, 47.3, 46.1]) # 10.2

and I need to combine these into a dataframe that when printed would yield the following:

         Yield  EC  
0        59.5   Low      
1        53.3   Low  
2        56.8   Low  
3        63.1   Low    
4        58.7   Low  
5        55.2   Med  
6        59.1   Med  
7        52.8   Med  
8        54.5   Med  
9        NaN    Med  
10       51.7   Medh  
11       48.8   Medh  
12       53.9   Medh  
13       49.0   Medh  
14       NaN    Medh  
15       44.6   high  
16       48.5   high  
17       41.0   high  
18       47.3   high  
19       46.1   high  

What would be the best way to accomplish this? I tried combining the arrays into a single numpy array and passing that to pd.DataFrame, but I get the error message "must pass 2-d input":

data_vals = np.array([[low],[med],[medh],[high]])
tomato_df = pd.DataFrame(data = data_vals)

CodePudding user response:

One approach is a nested list comprehension that pairs every value with the label of the array it came from:

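arrays = [low, med, medh, high]
labels = ["Low", "Med", "Medh", "High"]

# Build one [value, label] row per measurement, in array order
rows = [[v, name] for arr, name in zip(arrays, labels) for v in arr]
res = pd.DataFrame(rows, columns=["Yield", "EC"])
print(res)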

Output

    Yield    EC
0    59.5   Low
1    53.3   Low
2    56.8   Low
3    63.1   Low
4    58.7   Low
5    55.2   Med
6    59.1   Med
7    52.8   Med
8    54.5   Med
9     NaN   Med
10   51.7  Medh
11   48.8  Medh
12   53.9  Medh
13   49.0  Medh
14    NaN  Medh
15   44.6  High
16   48.5  High
17   41.0  High
18   47.3  High
19   46.1  High
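
As a side note, the attempt in the question fails because the extra brackets around each array produce a 3-D array, while pd.DataFrame accepts at most 2-D input. Below is a minimal sketch of a loop-free alternative, assuming all you need is the long format shown above. The names arrays, labels and tomato_df are only illustrative. It flattens the values with np.concatenate and builds the matching label column with np.repeat:

```python
import numpy as np
import pandas as pd

low = np.array([59.5, 53.3, 56.8, 63.1, 58.7])
med = np.array([55.2, 59.1, 52.8, 54.5, np.nan])
medh = np.array([51.7, 48.8, 53.9, 49.0, np.nan])
high = np.array([44.6, 48.5, 41.0, 47.3, 46.1])

arrays = [low, med, medh, high]
labels = ["Low", "Med", "Medh", "High"]

# Wrapping each array in its own list adds an extra axis, so the result
# is 3-D, which is why pd.DataFrame rejects it.
print(np.array([[low], [med], [medh], [high]]).shape)  # (4, 1, 5)

tomato_df = pd.DataFrame({
    # All 20 values end to end, in the order Low, Med, Medh, High
    "Yield": np.concatenate(arrays),
    # Each label repeated once per value in its array
    "EC": np.repeat(labels, [len(a) for a in arrays]),
})
print(tomato_df)
```

Because the repeat counts come from len(a), this should also work if the arrays have different lengths.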

CodePudding user response:

You can convert each array to a DataFrame, add its EC label as a column, and then combine the pieces:

df_low = pd.DataFrame(low)
df_low['EC'] = 'Low'
df_med = pd.DataFrame(med)
df_med['EC'] = 'Med'
df_medh = pd.DataFrame(medh)
df_medh['EC'] = 'Medh'
df_high = pd.DataFrame(high)
df_high['EC'] = 'High'

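# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df = pd.concat([df_low, df_med, df_medh, df_high])
# The value column is named 0 by default, so rename it
df = df.rename(columns={df.columns[0]: 'yield'})
df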

    yield   EC
0   59.5    Low
1   53.3    Low
2   56.8    Low
3   63.1    Low
4   58.7    Low
0   55.2    Med
1   59.1    Med
2   52.8    Med
3   54.5    Med
4   NaN     Med
0   51.7    Medh
1   48.8    Medh
2   53.9    Medh
3   49.0    Medh
4   NaN     Medh
0   44.6    High
1   48.5    High
2   41.0    High
3   47.3    High
4   46.1    High
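
Note that the index in the output above restarts at 0 for each group, whereas the question asks for a running index from 0 to 19. Here is a minimal sketch of the same idea that gets that, assuming a reasonably recent pandas. The names groups, frames and clean are only illustrative. It builds the per-group frames in a loop and passes ignore_index=True to pd.concat:

```python
import numpy as np
import pandas as pd

low = np.array([59.5, 53.3, 56.8, 63.1, 58.7])
med = np.array([55.2, 59.1, 52.8, 54.5, np.nan])
medh = np.array([51.7, 48.8, 53.9, 49.0, np.nan])
high = np.array([44.6, 48.5, 41.0, 47.3, 46.1])

groups = {"Low": low, "Med": med, "Medh": medh, "High": high}

# One small frame per group; the scalar label is broadcast to every row
frames = [pd.DataFrame({"Yield": values, "EC": label})
          for label, values in groups.items()]

# ignore_index=True discards the per-group 0..4 index and renumbers 0..19
df = pd.concat(frames, ignore_index=True)
print(df)

# Optional: drop the two missing yields if a later test cannot handle NaN
clean = df.dropna(subset=["Yield"])
print(len(clean))  # 18
```

Whether to drop the NaN rows depends on the test you run afterwards, so treat that last step as optional.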