Context: I'm trying to create a dictionary with "column name: values" from a multi-index dataframe and to get the values of the dataframe (df_var) I'm using df_var.values
which return a single array separating each column by comma. Is there a way to split this array into several arrays so that we can access each column?
I'm doing this because I want to create a boxplot using a dictionary
The original multi-index data frame (the dataframe has more columns, I'm just showing the first 2):
Sexo
Masculino Feminino
0 4245.0 4620.0
1 4274.0 4655.0
2 4304.5 4689.5
3 4322.0 4708.5
4 4318.0 4717.5
5 4288.0 4710.5
6 4247.5 4683.5
7 4224.5 4650.5
8 4223.5 4613.5
9 4191.0 4567.0
10 4133.7 4546.9
Doing df_var.values
I get this:
array([[4245. , 4620. ],
[4274. , 4655. ],
[4304.5, 4689.5],
[4322. , 4708.5],
[4318. , 4717.5],
[4288. , 4710.5],
[4247.5, 4683.5],
[4224.5, 4650.5],
[4223.5, 4613.5]])
The goal is to separate each column from this array and I'm getting this:
dict_boxplot = {col:val for (col,val) in zip(subcolumns,df_var.values.reshape(2,60))}
subcolumns
is simply a dynamic list with the sub-column names (Such as Masculino
and Feminino
). I say dynamic because this could take 2,3,4, etc elements (column names)
I was trying to do reshape
to put the columns as rows and while it seems to do something close to what I want, the values are mixed (the values are read from left to right, mixing the data from Masculino and Feminino). I get something like this (in the dict_boxplot
):
{'Feminino': array([4812.9, 5170.3, 4800.9, 5159.4, 4795.6, 5156.9, 4800.5, 5164.2,
4813.4, 5178.1, 4830.9, 5195.2, 4850.2, 5213.7, 4873. , 5236. ,
4898.8, 5261.4, 4928.2, 5289.7, 4965.3, 5324.6, 5002.9, 5359.8,
5028.4, 5391.3, 5042.3, 5416.5, 5050.5, 5433.3, 5056.3, 5447.1,
5061.6, 5460.7, 5067.1, 5475.9, 5068. , 5490.2, 5065. , 5503.3,
5058.6, 5514.5, 5042. , 5515.6, 5013.1, 5501.8, 4976.9, 5480.4,
4940.8, 5460.2, 4912.6, 5445.5, 4892. , 5433.5, 4875.1, 5425.2,
4860. , 5423.8, 4856.2, 5430.1]),
'Masculino': array([4245. , 4620. , 4274. , 4655. , 4304.5, 4689.5, 4322. , 4708.5,
4318. , 4717.5, 4288. , 4710.5, 4247.5, 4683.5, 4224.5, 4650.5,
4223.5, 4613.5, 4191. , 4567. , 4133.7, 4546.9, 4093.6, 4550.1,
4079. , 4551.5, 4070.5, 4562.6, 4129.7, 4624.7, 4314.8, 4778.6,
4459.6, 4896.3, 4518.4, 4937.3, 4579.1, 4979.2, 4639.9, 5021.3,
4700.7, 5065.6, 4746.8, 5104.5, 4777.1, 5134.7, 4800.6, 5157.3,
4820.2, 5176. , 4834. , 5189.7, 4838.5, 5194.3, 4837.1, 5192.9,
4831.8, 5187.8, 4824.1, 5180.9])}
And the boxplot has the wrong data (since they're mixed)
fig, ax = plt.subplots()
ax.boxplot(dict_boxplot.values())
ax.set_xticklabels(dict_boxplot.keys())
How could I do it so that each array is the specific column values? Thank you in advance
CodePudding user response:
This would work:
columns = {column : [o[i] for o in df_var.values] for (i, column) in enumerate(subcolumns)}