There is a list of dicts d, in which x is an embedded list, e.g.,
d = [{"name":"Python", "x":[0,1,2,3,4,5]}, # x has 300 elements
{"name":"C ", "x":[0,1,0,3,4,4]},
{"name":"Java","x":[0,4,5,6,1]}]
I want to transform d into a DataFrame and automatically add one column for each element of x, with every added column name carrying the prefix "abc", e.g.,
df.columns = ["name", "abc0", "abc1", ..., "abc300"]
I'm looking for an efficient way, as d has lots of dicts. When I added the columns manually, Python says:
PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
CodePudding user response:
I hope this is what you need. If it helps, please upvote and accept the answer.
import pandas as pd
d = {
"name": "abc",
"x":[i for i in range(300)] # 300 elements
}
df = pd.DataFrame(d)
df = df.T
df.columns = [i + str(idx) for idx, i in enumerate(df.iloc[0])]
df.drop(index=df.index[0], axis=0, inplace=True)
df
Out[91]:
abc0 abc1 abc2 abc3 abc4 abc5 abc6 abc7 abc8 abc9 ... abc290 abc291 abc292 \
x 0 1 2 3 4 5 6 7 8 9 ... 290 291 292
abc293 abc294 abc295 abc296 abc297 abc298 abc299
x 293 294 295 296 297 298 299
[1 rows x 300 columns]
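Note that after the transpose every column has object dtype, because the "name" and "x" rows were mixed in the same columns. As a minimal sketch of an alternative (not the approach above), the single row can be built directly from a dict comprehension, which keeps the values as integers. The variable name row is only illustrative.

```python
import pandas as pd

d = {
    "name": "abc",
    "x": list(range(300)),  # 300 elements
}

# Build one row directly: the keys become the column names abc0 ... abc299.
row = {f"{d['name']}{i}": value for i, value in enumerate(d["x"])}

# index=["x"] only mirrors the row label shown in the output above.
df = pd.DataFrame([row], index=["x"])
```

This skips the transpose, the column renaming, and the drop.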
CodePudding user response:
Are you looking for something like this:
import pandas as pd
d = [{"name":"Python", "x":[0,1,2,3,4,5]}, # x has 300 elements
{"name":"C ", "x":[0,1,0,3,4,4]},
{"name":"Java","x":[0,4,5,6,1]}]
df = pd.DataFrame(
{
"name": record["name"],
**{f"abc{i}": n for i, n in enumerate(record["x"])}
}
for record in d
)
Result for your sample:
name abc0 abc1 abc2 abc3 abc4 abc5
0 Python 0 1 2 3 4 5.0
1 C 0 1 0 3 4 4.0
2 Java 0 4 5 6 1 NaN
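The warning in the question suggests joining all columns at once with pd.concat(axis=1). A minimal sketch of that idea, assuming the same sample data and that shorter x lists should be padded with NaN: expand all x lists in one step, prefix the column labels, and attach the names with a single concat. The names wide and names are only illustrative.

```python
import pandas as pd

d = [{"name": "Python", "x": [0, 1, 2, 3, 4, 5]},
     {"name": "C ", "x": [0, 1, 0, 3, 4, 4]},
     {"name": "Java", "x": [0, 4, 5, 6, 1]}]

# Expand every x list into one row; shorter lists are padded with NaN.
# The integer column labels 0, 1, 2, ... become "abc0", "abc1", "abc2", ...
wide = pd.DataFrame([record["x"] for record in d]).add_prefix("abc")

# Attach the name column with one concat instead of repeated inserts.
names = pd.Series([record["name"] for record in d], name="name")
df = pd.concat([names, wide], axis=1)
```

Whether this is faster than the dict-per-record version depends on the data, so it is worth timing both on the real list.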
CodePudding user response:
You can take all the content of the list of dictionaries and turn it into a list of strings with the following list comprehension:
column_names = [p['name'] + str(p['x'][idx]) for p in d for idx in range(len(p['x']))]
For your example, you obtain
['Python0', 'Python1', 'Python2', 'Python3', 'Python4', 'Python5', 'C 0', 'C 1', 'C 0', 'C 3', 'C 4', 'C 4', 'Java0', 'Java4', 'Java5', 'Java6', 'Java1']
and then you can construct an empty DataFrame with
df = pandas.DataFrame(columns=column_names)
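The labels above combine each name with the values in x, so they can repeat ('C 0' appears twice in the example). As a sketch of a variant closer to the column names asked for in the question, assuming the prefix "abc" and that the number of columns is set by the longest x list (width is an illustrative name):

```python
import pandas as pd

d = [{"name": "Python", "x": [0, 1, 2, 3, 4, 5]},
     {"name": "C ", "x": [0, 1, 0, 3, 4, 4]},
     {"name": "Java", "x": [0, 4, 5, 6, 1]}]

# The longest x list decides how many "abc" columns are needed.
width = max(len(p["x"]) for p in d)

# One "name" column followed by abc0, abc1, ..., abc{width - 1}.
column_names = ["name"] + [f"abc{i}" for i in range(width)]

# Empty frame that carries only the column labels.
df = pd.DataFrame(columns=column_names)
```

This only creates the labels; the rows still have to be filled in separately.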