I have a dictionary that looks like the following:
date_pair_dict = {
"15-02-2022 15-02-2022": ["key 1 val 1", "key 1 val 2", "key 1 val 3"],
"15-02-2022 16-02-2022": ["key 2 val 1", "key 2 val 2", "key 2 val 3"],
"16-02-2022 16-02-2022": ["key 3 val 1", "key 3 val 2", "key 3 val 3"],
"16-02-2022 17-02-2022": ["key 4 val 1", "key 4 val 2", "key 4 val 3"]
}
And a list of headers:
headers = ["date pair header", "header val 1", "header val 2", "header val 3"]
I would like to create a pandas.DataFrame
and write this to Excel, where the format would be the following expected output:
date pair header | header val 1 | header val 2 | header val 3 |
---|---|---|---|
15-02-2022 15-02-2022 | key 1 val 1 | key 1 val 2 | key 1 val 3 |
15-02-2022 16-02-2022 | key 2 val 1 | key 2 val 2 | key 2 val 3 |
16-02-2022 16-02-2022 | key 3 val 1 | key 3 val 2 | key 3 val 3 |
16-02-2022 17-02-2022 | key 4 val 1 | key 4 val 2 | key 4 val 3 |
Right now, I'm using this (arguably very sad) method:
import pandas
date_pair_dict = {
"15-02-2022 15-02-2022": ["key 1 val 1", "key 1 val 2", "key 1 val 3"],
"15-02-2022 16-02-2022": ["key 2 val 1", "key 2 val 2", "key 2 val 3"],
"16-02-2022 16-02-2022": ["key 3 val 1", "key 3 val 2", "key 3 val 3"],
"16-02-2022 17-02-2022": ["key 4 val 1", "key 4 val 2", "key 4 val 3"]
}
headers = ["date pair header", "header val 1", "header val 2", "header val 3"]
list_of_keys, list_of_val_1, list_of_val_2, list_of_val_3 = [], [], [], []
for key in date_pair_dict.keys():
list_of_keys.append(key)
val_1, val_2, val_3 = date_pair_dict.get(key)
list_of_val_1.append(val_1)
list_of_val_2.append(val_2)
list_of_val_3.append(val_3)
dataframe = pandas.DataFrame(
{
headers[0]: list_of_keys,
headers[1]: list_of_val_1,
headers[2]: list_of_val_2,
headers[3]: list_of_val_3,
}
)
Which is not scalable whatsoever. In reality, this date_pair_dict
can have any number of keys, each corresponding to a list of any length. The length of these lists will however always remain the same, and will be known beforehand (I will always predefine the headers
list).
Additionally, I believe this runs the risk of me having a dataframe that does not share the same order as the original keys, due to me doing the following:
for key in dictionary.keys():
....
The keys are date pairs, and need to remain in order when used as the first column of the dataframe.
Is there a better way to do this, preferably using a dictionary comprehension?
CodePudding user response:
Like you said you can use a comprehension on your dict key/value pairs:
import pandas as pd
date_pair_dict = {
"15-02-2022 15-02-2022": ["key 1 val 1", "key 1 val 2", "key 1 val 3"],
"15-02-2022 16-02-2022": ["key 2 val 1", "key 2 val 2", "key 2 val 3"],
"16-02-2022 16-02-2022": ["key 3 val 1", "key 3 val 2", "key 3 val 3"],
"16-02-2022 17-02-2022": ["key 4 val 1", "key 4 val 2", "key 4 val 3"]
}
headers = ["date pair header", "header val 1", "header val 2", "header val 3"]
df = pd.DataFrame([[k] v for k,v in date_pair_dict.items()], columns=headers)
print(df)
Output:
date pair header header val 1 header val 2 header val 3
0 15-02-2022 15-02-2022 key 1 val 1 key 1 val 2 key 1 val 3
1 15-02-2022 16-02-2022 key 2 val 1 key 2 val 2 key 2 val 3
2 16-02-2022 16-02-2022 key 3 val 1 key 3 val 2 key 3 val 3
3 16-02-2022 17-02-2022 key 4 val 1 key 4 val 2 key 4 val 3
CodePudding user response:
IIUC, you can simply do:
df = pd.DataFrame.from_dict(date_pair_dict, orient='index').reset_index()
df.columns = headers
It's always a good idea to exploit existing methods before trying to come up with your own constructor.
CodePudding user response:
Use list comprehension to create a list of rows from the input dictionary, after which it is straightforward:
import pandas as pd
date_pair_dict = {
"15-02-2022 15-02-2022": ["key 1 val 1", "key 1 val 2", "key 1 val 3"],
"15-02-2022 16-02-2022": ["key 2 val 1", "key 2 val 2", "key 2 val 3"],
"16-02-2022 16-02-2022": ["key 3 val 1", "key 3 val 2", "key 3 val 3"],
"16-02-2022 17-02-2022": ["key 4 val 1", "key 4 val 2", "key 4 val 3"]
}
headers = ["date pair header", "header val 1", "header val 2", "header val 3"]
date_pairs = [[k] v for k, v in date_pair_dict.items()]
df = pd.DataFrame(date_pairs)
df.columns = headers
print(df)
# date pair header header val 1 header val 2 header val 3
# 0 15-02-2022 15-02-2022 key 1 val 1 key 1 val 2 key 1 val 3
# 1 15-02-2022 16-02-2022 key 2 val 1 key 2 val 2 key 2 val 3
# 2 16-02-2022 16-02-2022 key 3 val 1 key 3 val 2 key 3 val 3
# 3 16-02-2022 17-02-2022 key 4 val 1 key 4 val 2 key 4 val 3
CodePudding user response:
Some fancy rename
approach:
>>> pandas.DataFrame([[key,*values] for key,values in date_pair_dict.items()]).rename(columns=headers.__getitem__))
date pair header header val 1 header val 2 header val 3
0 15-02-2022 15-02-2022 key 1 val 1 key 1 val 2 key 1 val 3
1 15-02-2022 16-02-2022 key 2 val 1 key 2 val 2 key 2 val 3
2 16-02-2022 16-02-2022 key 3 val 1 key 3 val 2 key 3 val 3
3 16-02-2022 17-02-2022 key 4 val 1 key 4 val 2 key 4 val 3
CodePudding user response:
You can use list traversal to construct the output dictionary:
import pandas
date_pair_dict = {
"15-02-2022 15-02-2022": ["key 1 val 1", "key 1 val 2", "key 1 val 3"],
"15-02-2022 16-02-2022": ["key 2 val 1", "key 2 val 2", "key 2 val 3"],
"16-02-2022 16-02-2022": ["key 3 val 1", "key 3 val 2", "key 3 val 3"],
"16-02-2022 17-02-2022": ["key 4 val 1", "key 4 val 2", "key 4 val 3"]
}
headers = ["date pair header", "header val 1", "header val 2", "header val 3"]
e = [[k] v for k,v in date_pair_dict.items()]
d = {}
for i in range(len(e[0])):
d[headers[i]] = [a[i] for a in e]
d = pandas.DataFrame(d)