Home > Back-end >  Split an array into several arrays by comma (or by reshaping/changing read order of the array) pytho
Split an array into several arrays by comma (or by reshaping/changing read order of the array) pytho

Time:01-03

Context: I'm trying to create a dictionary with "column name: values" from a multi-index dataframe and to get the values of the dataframe (df_var) I'm using df_var.values which return a single array separating each column by comma. Is there a way to split this array into several arrays so that we can access each column?

I'm doing this because I want to create a boxplot using a dictionary

The original multi-index data frame (the dataframe has more columns, I'm just showing the first 2):

    Sexo
Masculino   Feminino
0   4245.0  4620.0
1   4274.0  4655.0
2   4304.5  4689.5
3   4322.0  4708.5
4   4318.0  4717.5
5   4288.0  4710.5
6   4247.5  4683.5
7   4224.5  4650.5
8   4223.5  4613.5
9   4191.0  4567.0
10  4133.7  4546.9

Doing df_var.values I get this:

array([[4245. , 4620. ],
       [4274. , 4655. ],
       [4304.5, 4689.5],
       [4322. , 4708.5],
       [4318. , 4717.5],
       [4288. , 4710.5],
       [4247.5, 4683.5],
       [4224.5, 4650.5],
       [4223.5, 4613.5]])

The goal is to separate each column from this array and I'm getting this:

dict_boxplot = {col:val for (col,val) in zip(subcolumns,df_var.values.reshape(2,60))}

subcolumns is simply a dynamic list with the sub-column names (Such as Masculino and Feminino). I say dynamic because this could take 2,3,4, etc elements (column names) I was trying to do reshape to put the columns as rows and while it seems to do something close to what I want, the values are mixed (the values are read from left to right, mixing the data from Masculino and Feminino). I get something like this (in the dict_boxplot):

{'Feminino': array([4812.9, 5170.3, 4800.9, 5159.4, 4795.6, 5156.9, 4800.5, 5164.2,
        4813.4, 5178.1, 4830.9, 5195.2, 4850.2, 5213.7, 4873. , 5236. ,
        4898.8, 5261.4, 4928.2, 5289.7, 4965.3, 5324.6, 5002.9, 5359.8,
        5028.4, 5391.3, 5042.3, 5416.5, 5050.5, 5433.3, 5056.3, 5447.1,
        5061.6, 5460.7, 5067.1, 5475.9, 5068. , 5490.2, 5065. , 5503.3,
        5058.6, 5514.5, 5042. , 5515.6, 5013.1, 5501.8, 4976.9, 5480.4,
        4940.8, 5460.2, 4912.6, 5445.5, 4892. , 5433.5, 4875.1, 5425.2,
        4860. , 5423.8, 4856.2, 5430.1]),
 'Masculino': array([4245. , 4620. , 4274. , 4655. , 4304.5, 4689.5, 4322. , 4708.5,
        4318. , 4717.5, 4288. , 4710.5, 4247.5, 4683.5, 4224.5, 4650.5,
        4223.5, 4613.5, 4191. , 4567. , 4133.7, 4546.9, 4093.6, 4550.1,
        4079. , 4551.5, 4070.5, 4562.6, 4129.7, 4624.7, 4314.8, 4778.6,
        4459.6, 4896.3, 4518.4, 4937.3, 4579.1, 4979.2, 4639.9, 5021.3,
        4700.7, 5065.6, 4746.8, 5104.5, 4777.1, 5134.7, 4800.6, 5157.3,
        4820.2, 5176. , 4834. , 5189.7, 4838.5, 5194.3, 4837.1, 5192.9,
        4831.8, 5187.8, 4824.1, 5180.9])}

And the boxplot has the wrong data (since they're mixed)

fig, ax = plt.subplots()
ax.boxplot(dict_boxplot.values())
ax.set_xticklabels(dict_boxplot.keys())

enter image description here

How could I do it so that each array is the specific column values? Thank you in advance

CodePudding user response:

This would work:

columns = {column : [o[i] for o in df_var.values] for (i, column) in enumerate(subcolumns)}
  • Related