Having now checked a multitude of Stack Overflow threads on this, I'm struggling to apply the answers to my particular use case so hoping someone can help me on my specific problem.
I'm trying to explode data out of a dictionary into two separate columns while maintaining a multi-index.
Here is what I currently have:
| short_url | platform | css_problem_files |
|-----------|----------|----------------------------------------------------------------------|
| /url_1/ | desktop | {css_file_1: css_value, css_file_2: css_value, css_fle_3: css_value} |
| | mobile | {css_file_1: css_value, css_file_2: css_value, css_fle_3: css_value} |
| /url_2/ | desktop | {css_file_1: css_value, css_file_2: css_value, css_fle_3: css_value} |
| | mobile | {css_file_1: css_value, css_file_2: css_value, css_fle_3: css_value} |
and here is what I would like to achieve:
| short_url | platform | css_file | css_value |
|-----------|----------|------------|-----------|
| /url_1/ | desktop | css_file_1 | css_value |
| | | css_file_2 | css_value |
| | | css_file_3 | css_value |
| | mobile | css_file_1 | css_value |
| | | css_file_2 | css_value |
| | | css_file_3 | css_value |
| /url_2/ | desktop | css_file_1 | css_value |
| | | css_file_2 | css_value |
| | | css_file_3 | css_value |
| | mobile | css_file_1 | css_value |
| | | css_file_2 | css_value |
| | | css_file_3 | css_value |
The only thing I've come up with that's remotely close to what I need is the below, however this is creating over 200K rows when I'd expect it to be only in the thousands (and I've not included platform yet):
m = pd.DataFrame([*df['css_problem_files']], df.index).stack()\
.rename_axis([None,'css_files']).reset_index(1, name='pct usage')
out = df[['short_url']].join(m)
Any assistance or a point in the right direction would be greatly appreciated
CodePudding user response:
If you turn the dictionaries into lists of key-value pairs, you can explode them and then transform the result into two new columns with .apply(pd.Series)
(and rename them to your liking) like so:
df = (df
.css_problem_files.apply(dict.items) # turn into key value list
.explode() # explode
.apply(pd.Series) # turn into columns
.rename(columns={0: "css_file", 1: "css_value"}) # rename
)
CodePudding user response:
It was annoying to parse your data frame from a Markdown table. It would have been much easier if it were in a DataFrame constructor.
df = pd.read_csv(io.StringIO('''short_url | platform | css_problem_files
/url_1/ | desktop | {'css_file_1': 'css_value', 'css_file_2': 'css_value', 'css_fle_3': 'css_value'}
/url_1/ | mobile | {'css_file_1': 'css_value', 'css_file_2': 'css_value', 'css_fle_3': 'css_value'}
/url_2/ | desktop | {'css_file_1': 'css_value', 'css_file_2': 'css_value', 'css_fle_3': 'css_value'}
/url_2/ | mobile | {'css_file_1': 'css_value', 'css_file_2': 'css_value', 'css_fle_3': 'css_value'}'''), sep=r'\s \|\s ', index_col=[0, 1])
df['css_problem_files'] = df['css_problem_files'].apply(eval)
pd.DataFrame.apply(pd.Series)
is the most straightforward way to make this happen, though it is really inefficient (see answer for illustration thereof). This uses the DataFrame constructor directly, which avoids that performance issue. It preserves the index by copying it over to the new data frame and provides columns explicitly:
temp_df = df['css_problem_files'].apply(dict.items).explode()
pd.DataFrame(temp_df.to_list(), index=temp_df.index, columns=['css_file', 'css_value'])
css_file css_value short_url platform /url_1/ desktop css_file_1 css_value desktop css_file_2 css_value desktop css_fle_3 css_value mobile css_file_1 css_value mobile css_file_2 css_value mobile css_fle_3 css_value /url_2/ desktop css_file_1 css_value desktop css_file_2 css_value desktop css_fle_3 css_value mobile css_file_1 css_value mobile css_file_2 css_value mobile css_fle_3 css_value