I have a list like this:
[[{'contributionScore': 0.841473400592804, 'variable': 'series_2'},
{'contributionScore': 0.6113986968994141, 'variable': 'series_3'},
{'contributionScore': 0.5985525250434875, 'variable': 'series_1'},
{'contributionScore': 0.5641148686408997, 'variable': 'series_4'},
{'contributionScore': 0.138543963432312, 'variable': 'series_0'}],
[{'contributionScore': 1.1316605806350708, 'variable': 'series_1'},
{'contributionScore': 0.5188271403312683, 'variable': 'series_4'},
{'contributionScore': 0.38711458444595337, 'variable': 'series_3'},
{'contributionScore': 0.35055238008499146, 'variable': 'series_0'},
{'contributionScore': 0.06044715642929077, 'variable': 'series_2'}]]
How can I obtain a dataframe with a column for each series?
I'd like to get a dataframe with contributionScore for each series.
Thanks!
Answer:
You should be able to create a DataFrame with pd.DataFrame(). Since each element of the list can become a DataFrame of its own, you can build them with a list comprehension and join them with pd.concat().
Let's say the list is called raw_list:
import pandas as pd
df = pd.concat([pd.DataFrame(x) for x in raw_list])
The first rows of the output look like this:
contributionScore variable
0 0.841473 series_2
1 0.611399 series_3
2 0.598553 series_1
3 0.564115 series_4
4 0.138544 series_0
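For reference, here is a self-contained version of this long-format step, assuming the data is stored in raw_list exactly as in the question. Passing ignore_index=True (a standard pd.concat parameter) numbers the rows 0 to 9 instead of repeating 0 to 4 for each inner list; that parameter is a suggestion, not part of the answer above.

```python
import pandas as pd

# The list from the question, assumed to be named raw_list
raw_list = [
    [{'contributionScore': 0.841473400592804, 'variable': 'series_2'},
     {'contributionScore': 0.6113986968994141, 'variable': 'series_3'},
     {'contributionScore': 0.5985525250434875, 'variable': 'series_1'},
     {'contributionScore': 0.5641148686408997, 'variable': 'series_4'},
     {'contributionScore': 0.138543963432312, 'variable': 'series_0'}],
    [{'contributionScore': 1.1316605806350708, 'variable': 'series_1'},
     {'contributionScore': 0.5188271403312683, 'variable': 'series_4'},
     {'contributionScore': 0.38711458444595337, 'variable': 'series_3'},
     {'contributionScore': 0.35055238008499146, 'variable': 'series_0'},
     {'contributionScore': 0.06044715642929077, 'variable': 'series_2'}]]

# One DataFrame per inner list, stacked vertically with a fresh 0..n-1 row index
long_df = pd.concat([pd.DataFrame(x) for x in raw_list], ignore_index=True)
print(long_df)
```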
EDIT:
Given the OP's comment, we should pivot each inner DataFrame before concatenating:
df = pd.concat([pd.DataFrame(x).pivot_table(columns='variable') for x in raw_list])
The output is:
variable series_0 series_1 series_2 series_3 series_4
contributionScore 0.138544 0.598553 0.841473 0.611399 0.564115
contributionScore 0.350552 1.131661 0.060447 0.387115 0.518827
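Since each inner list already describes one row, another option (not from the answer above) is to turn each inner list into a {variable: contributionScore} dict and pass the list of dicts straight to pd.DataFrame. A minimal sketch, assuming the list is named raw_list; it avoids pivot_table's default mean aggregation and labels the rows 0 and 1.

```python
import pandas as pd

# The list from the question, assumed to be named raw_list
raw_list = [
    [{'contributionScore': 0.841473400592804, 'variable': 'series_2'},
     {'contributionScore': 0.6113986968994141, 'variable': 'series_3'},
     {'contributionScore': 0.5985525250434875, 'variable': 'series_1'},
     {'contributionScore': 0.5641148686408997, 'variable': 'series_4'},
     {'contributionScore': 0.138543963432312, 'variable': 'series_0'}],
    [{'contributionScore': 1.1316605806350708, 'variable': 'series_1'},
     {'contributionScore': 0.5188271403312683, 'variable': 'series_4'},
     {'contributionScore': 0.38711458444595337, 'variable': 'series_3'},
     {'contributionScore': 0.35055238008499146, 'variable': 'series_0'},
     {'contributionScore': 0.06044715642929077, 'variable': 'series_2'}]]

# One {series_name: score} dict per inner list, i.e. one row per inner list
rows = [{d['variable']: d['contributionScore'] for d in inner} for inner in raw_list]
# Build the frame, then sort the columns so they read series_0 ... series_4
wide_df = pd.DataFrame(rows).sort_index(axis=1)
print(wide_df)
```

If a series is missing from one of the inner lists, pandas fills that cell with NaN.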
Answer:
I am a bit unsure what is meant by the question:
How can I obtain a dataframe with a column for each series?
If you meant a single column holding all the series data, labelled by the "variable" column, then Celius Stingher's answer should be good enough.
If you meant each series as its own column, here is an extension of Celius's answer:
import pandas as pd
## As in the answer above: one long DataFrame with all the scores
df = pd.concat([pd.DataFrame(x) for x in raw_list])
## Sorted list of the unique series names
series_list = sorted(df['variable'].unique())
## Map each series name to the list of its contributionScore values, then turn that dict into a DataFrame
series_df = pd.DataFrame({series: list(df[df['variable'] == series]["contributionScore"]) for series in series_list})
The output will look like
series_0 series_1 series_2 series_3 series_4
0 0.138544 0.598553 0.841473 0.611399 0.564115
1 0.350552 1.131661 0.060447 0.387115 0.518827
Note that this only works when every series has the same number of contribution scores (in the example above, each series has two).
If the series have different numbers of contribution scores, replace the last statement (the one that builds series_df) with:
## Build a one-column DataFrame per series and concatenate them side by side, so the columns can have different lengths
series_df = pd.concat([pd.DataFrame({series : list(df[df['variable'] == series]["contributionScore"])}) for series in series_list], axis = 1)
For example, if series_3 had three contribution scores, the result would look like this:
series_0 series_1 series_2 series_3 series_4
0 0.138544 0.598553 0.841473 0.611399 0.564115
1 0.350552 1.131661 0.060447 0.387115 0.518827
2 NaN NaN NaN 1.200000 NaN
Here pd.concat joins DataFrames of different lengths, aligning them on their row index and filling the gaps with NaN. Passing a dict of unequal-length lists straight to pd.DataFrame() raises a ValueError instead. The axis=1 argument tells pd.concat to place the DataFrames from the list side by side as columns rather than stacking them as rows.
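To see the difference, here is a small sketch with hypothetical data (the lists below are made up for illustration, echoing the example where series_3 gets an extra score of 1.2): pd.DataFrame on a dict of unequal-length lists fails, while pd.concat(..., axis=1) pads the shorter column with NaN.

```python
import pandas as pd

# Hypothetical data: series_3 has one more score than series_0
scores = {'series_0': [0.138544, 0.350552],
          'series_3': [0.611399, 0.387115, 1.2]}

try:
    pd.DataFrame(scores)
except ValueError as err:
    # pandas refuses to build a frame from lists of different lengths
    print('pd.DataFrame failed:', err)

# One single-column frame per series, placed side by side and aligned on the row index
series_df = pd.concat([pd.DataFrame({name: vals}) for name, vals in scores.items()], axis=1)
print(series_df)
```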