Fast conversion of Pandas DataFrame to key->row presentation

Time:09-08

I need a key-to-row index for my Pandas DataFrame, where the key is the value in the DataFrame's id column and the value is that row's data.

Access is sparse: I only need data for a few keys, but I do not know ahead of time which keys those will be.

I am currently doing this with iterrows:

pair_map = {}
# iterrows() yields (index label, row as a pd.Series) for every row
for pair_id, data in df.iterrows():
    pair_map[pair_id] = data
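For reference, here is a minimal runnable version of the loop above. The DataFrame and its column names (id, a, b) are hypothetical. Because iterrows() yields the index label rather than the id column, this sketch assumes id should become the index first.

```python
import pandas as pd

# Hypothetical sample data; "id", "a" and "b" are illustrative column names
df = pd.DataFrame({"id": [10, 20, 30], "a": [1, 2, 3], "b": ["x", "y", "z"]})

# iterrows() yields (index label, row as pd.Series), so make "id" the index
pair_map = {}
for pair_id, data in df.set_index("id").iterrows():
    pair_map[pair_id] = data

print(pair_map[20]["b"])  # y
```

Each dict value is a separate pd.Series built in Python, one per row. That per-row construction is why this loop gets slow at 100k to 1M rows.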

However, with a very large number of rows (~100k to 1M), this becomes slow. Is there a faster way to build a sparse key-to-row index for a Pandas DataFrame, so that accessing any row by key is fast? Even better would be a sparse index where the row data is pulled out of Pandas on demand (though I do not think this is possible).
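The on-demand variant can be sketched as follows, assuming the id values are unique. The idea is to skip the dict entirely: set id as the index once, then look rows up with .loc only when they are needed. The sample data, column names and the get_row helper are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with a unique "id" column
df = pd.DataFrame({"id": [10, 20, 30], "a": [1, 2, 3], "b": ["x", "y", "z"]})

# Build the index once; verify_integrity=True raises ValueError on duplicate ids.
# No per-row Python objects are created at this point.
indexed = df.set_index("id", verify_integrity=True)

def get_row(key):
    """Hypothetical helper: return the row for `key` as a pd.Series, on demand."""
    return indexed.loc[key]

row = get_row(30)
print(row["a"], row["b"])  # 3 z
```

Each .loc call has some fixed overhead. When only a few keys are ever requested, though, that is likely cheaper than materialising a million-entry dict up front.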

Answer:

Try this:

df.T.to_dict()

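I don't know whether transposing a DataFrame with 1M rows (which become 1M columns after the transpose) is feasible. Also, if you are looking for a dict whose values are pd.Series, this is not the solution: here each value is a plain dict mapping column name to cell value.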
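A possible way around the transpose, assuming the keys are already in the index (call set_index first if they are in a column), is DataFrame.to_dict(orient="index"). It builds the same mapping of each row label to a {column: value} dict directly. The sample data below is hypothetical.

```python
import pandas as pd

# Hypothetical sample data, keyed by an "id" column moved into the index
df = pd.DataFrame({"id": [10, 20], "a": [1, 2], "b": ["x", "y"]}).set_index("id")

via_transpose = df.T.to_dict()           # the approach from this answer
via_orient = df.to_dict(orient="index")  # same mapping, no transposed copy

# Both give {row label: {column name: value}}
assert via_transpose == via_orient
```

Skipping the transpose avoids building an intermediate frame with one column per row. With mixed column dtypes, that intermediate frame would also be object dtype.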

Answer:

I believe you want a dict with the "ID" column as the key and each row's values as a list:

pair_map = df.set_index("ID").transpose().to_dict("list")
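If the transpose turns out to be the bottleneck, the sketch below may produce the same {ID: [row values]} dict without transposing. It assumes the ID values are unique; with duplicates, later rows would silently overwrite earlier ones. Only the "ID" column name comes from the answer; the other columns and data are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the "ID" column name comes from the answer
df = pd.DataFrame({"ID": [1, 2, 3], "x": [0.5, 1.5, 2.5], "label": ["a", "b", "c"]})

# The answer's version, for comparison
via_transpose = df.set_index("ID").transpose().to_dict("list")

# Convert the non-ID columns to a list of row lists in one call,
# then pair each ID with its row
values = df.drop(columns="ID").to_numpy().tolist()
via_zip = dict(zip(df["ID"], values))

assert via_zip == via_transpose
```

Whether this is measurably faster will depend on the data and the pandas version, so it is worth timing both on the real DataFrame.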