import pandas as pd
pd.DataFrame({'genre': 'Pop',
'country': 'CA',
'artist_name': 'Olivia Rodrigo',
'title_name': 'good 4 u',
'release_date': '2021-05-13',
'core_genre': 'Pop',
'metrics': [],
'week_id': 202101,
'top_isrc': 'USUG12101245'})
This returns the column names but an otherwise empty DataFrame, because metrics is an empty list. Losing the row is a problem. It would be better if this returned a 1-row DataFrame with an empty list in the metrics column.
Here is an example of the data without missing metrics:
{'genre': 'Pop',
'country': 'CA',
'artist_name': 'Olivia Rodrigo',
'title_name': 'drivers license',
'release_date': '2021-01-07',
'core_genre': 'Pop',
'metrics': [{'name': 'Song w/SES On-Demand',
'value': [{'name': 'tp', 'value': 1},
{'name': 'lp', 'value': 0},
{'name': 'ytd', 'value': 1},
{'name': 'atd', 'value': 1}]},
{'name': 'Song w/SES On-Demand Audio',
'value': [{'name': 'tp', 'value': 0},
{'name': 'lp', 'value': 0},
{'name': 'ytd', 'value': 0},
{'name': 'atd', 'value': 0}]},
{'name': 'Streaming On-Demand Total',
'value': [{'name': 'tp', 'value': 414},
{'name': 'lp', 'value': 0},
{'name': 'ytd', 'value': 414},
{'name': 'atd', 'value': 414}]},
{'name': 'Streaming On-Demand Audio',
'value': [{'name': 'tp', 'value': 69},
{'name': 'lp', 'value': 0},
{'name': 'ytd', 'value': 69},
{'name': 'atd', 'value': 69}]}],
'week_id': 202101,
'top_isrc': 'USUG12004749'}
pd.DataFrame() handles this quite nicely, creating one row for each of the 4 dicts in the metrics list. I assume that pd.DataFrame() returns 0 rows for the first example (0 dicts in the list) for the same reason it returns 4 rows for this second example (4 dicts in the list). However, the lost row of data is a problem. How can we handle this?
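The assumption that the row count follows the length of the metrics list can be checked with a small sketch. The shortened record below is hypothetical and keeps only two of the original fields for brevity.

```python
import pandas as pd

# Hypothetical, shortened record: only two of the original fields are kept.
base = {'title_name': 'good 4 u', 'week_id': 202101}

# Zero dicts in the list -> zero rows (the column names still survive).
empty = pd.DataFrame({**base, 'metrics': []})

# Two dicts in the list -> two rows, with the scalar fields repeated.
two = pd.DataFrame({**base, 'metrics': [{'name': 'a'}, {'name': 'b'}]})

print(empty.shape)  # (0, 3)
print(two.shape)    # (2, 3)
```

The scalar values are broadcast to the length of the list, so a list of length 0 leaves nothing to broadcast to.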
Answer:
To get a single row whose metrics cell holds an empty list, wrap the empty list in another list:
df = pd.DataFrame({'genre': 'Pop',
'country': 'CA',
'artist_name': 'Olivia Rodrigo',
'title_name': 'good 4 u',
'release_date': '2021-05-13',
'core_genre': 'Pop',
'metrics': [[]],
'week_id': 202101,
'top_isrc': 'USUG12101245'})
This gives:
   genre  country     artist_name  title_name  release_date  core_genre  metrics  week_id      top_isrc
0    Pop       CA  Olivia Rodrigo    good 4 u    2021-05-13         Pop       []   202101  USUG12101245
Alternatively, you could pass a list containing an empty dict, [{}], which also yields one row (the metrics cell then holds {} rather than []).
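If the records arrive one at a time and only some of them have empty metrics, the wrapping can be applied conditionally. The following is a minimal sketch, not a pandas feature: record_to_frame is a hypothetical helper name, and the shortened record is for illustration only.

```python
import pandas as pd

def record_to_frame(record):
    """Build a DataFrame from one record, keeping a row even when
    'metrics' is empty. Hypothetical helper, not a pandas API."""
    # An empty (or missing) metrics list is replaced by [[]], so pandas
    # sees one cell value instead of a zero-length column.
    metrics = record.get('metrics') or [[]]
    return pd.DataFrame({**record, 'metrics': metrics})

record = {'title_name': 'good 4 u', 'metrics': [], 'week_id': 202101}
df = record_to_frame(record)
print(len(df))               # 1
print(df.loc[0, 'metrics'])  # []
```

Records with a non-empty metrics list pass through unchanged, so they still produce one row per dict in the list.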
Comment:
It's interesting that passing a bare empty list returns no rows at all. I suppose that, from pandas's point of view, it is hard to distinguish a vector of row values from a single row value that is itself a vector, and the default behaviour is, apparently, to throw the whole row away. Interesting.
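The two readings of a list mentioned in this comment can be compared side by side. This is only an illustrative sketch; the column names id and vals are made up for the example.

```python
import pandas as pd

# A bare list is read as a column vector: one row per element.
as_column = pd.DataFrame({'id': 1, 'vals': [10, 20, 30]})

# Wrapping it in another list makes it a single cell value.
as_cell = pd.DataFrame({'id': 1, 'vals': [[10, 20, 30]]})

print(len(as_column))  # 3
print(len(as_cell))    # 1
```

In the dict form, pandas takes the first reading for any bare list, which is why an empty list produces a zero-length column rather than one row holding an empty list.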