I have some Python code that creates a dictionary in which each key maps to a variable-length list of pandas Series.
{"key1":[series1,series2,series3], "key2":[series4,series5,series6,..],...}
All of the DataFrames contain two columns, Word and frequency.
What would be the most appropriate way to convert this data structure to a CSV file in the following format?
key, Word, frequency
key1, series1[0][0], series1[0][1]
key1, series1[1][0], series1[1][1]
......
key10, series76[100][0], series76[100][1] #<-Arbitrary indexes.
I have tried iterating through the dictionary and writing the rows myself. However, remnants of the Series' printed representation get saved to the CSV, such as:
Length: 65, dtype: int64]
Ideally, I would like to use DataFrame.to_csv() here so that I don't need to parse this data manually.
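For context, here is a minimal sketch of where the stray text most likely comes from. It assumes each Series holds the words in its index and the frequencies as its values (the layout the first answer below also assumes); the toy Series s is hypothetical. Writing a Series object into a CSV cell stringifies its whole repr, footer included, whereas iterating over Series.items() yields clean (word, frequency) pairs.

```python
import csv
import io

import pandas as pd

# Hypothetical Series: words in the index, frequencies as the values.
s = pd.Series([2, 3], index=["aaa", "bbb"])

# Writing the Series object itself calls str() on it, so pandas'
# repr footer ("dtype: int64") ends up inside the CSV cell.
naive = io.StringIO()
csv.writer(naive).writerow(["key1", s])
print("dtype: int64" in naive.getvalue())  # True

# Iterating the Series yields plain (label, value) pairs instead.
clean = io.StringIO()
writer = csv.writer(clean, lineterminator="\n")
for word, freq in s.items():
    writer.writerow(["key1", word, freq])
print(clean.getvalue())
```

Iterating by hand works, but the answers below avoid the loop entirely by building one DataFrame and calling to_csv once.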
Answer:
Use pd.concat with a list comprehension:
import pandas as pd

s1 = pd.Series([2, 3], index=['aaa', 'bbb'])
s2 = pd.Series([1, 2, 3], index=['ccc', 'fff', 'ggg'])
s3 = pd.Series([4, 5], index=['ddd', 'eee'])
d = {"key1": [s1, s2], "key2": [s3]}
# one small DataFrame per Series, tagged with its dictionary key
df = pd.concat([pd.DataFrame({'key': k, 'word': x.index, 'freq': x.to_numpy()})
                for k, v in d.items()
                for x in v
                if isinstance(x, pd.Series)], ignore_index=True)
print(df)
key word freq
0 key1 aaa 2
1 key1 bbb 3
2 key1 ccc 1
3 key1 fff 2
4 key1 ggg 3
5 key2 ddd 4
6 key2 eee 5
df.to_csv("output.csv", index=False)
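If the CSV header should match the question exactly (key, Word, frequency), the same idea can be written per Series with rename_axis and reset_index. This is a sketch assuming, as above, that the words live in each Series' index; the column names Word and frequency are taken from the question. Calling to_csv without a path returns the CSV text as a string, which makes the result easy to inspect.

```python
import pandas as pd

# Toy data in the layout above: words in the index, frequencies as values.
s1 = pd.Series([2, 3], index=["aaa", "bbb"])
s2 = pd.Series([1, 2, 3], index=["ccc", "fff", "ggg"])
s3 = pd.Series([4, 5], index=["ddd", "eee"])
d = {"key1": [s1, s2], "key2": [s3]}

# Turn each Series into a two-column frame (Word, frequency),
# tag it with its dictionary key, then stack everything.
frames = [
    s.rename_axis("Word").reset_index(name="frequency").assign(key=k)
    for k, series_list in d.items()
    for s in series_list
]
df = pd.concat(frames, ignore_index=True)[["key", "Word", "frequency"]]

# With no path argument, to_csv returns the CSV text as a string.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file name instead, as in df.to_csv("output.csv", index=False), writes the same text to disk.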
Answer:
One approach:
import pandas as pd
# toy data
df1 = pd.DataFrame([["hello", 1], ["world", 1]], columns=["word", "frequency"])
df2 = pd.DataFrame([["quick", 10], ["brown", 1], ["fox", 3]], columns=["word", "frequency"])
df3 = pd.DataFrame([["rice", 9], ["salt", 1], ["sugar", 7]], columns=["word", "frequency"])
data = {"key1": [df1, df2], "key2": [df3]}
# split into parallel tuples of keys and frames (one key per frame)
keys, values = zip(*[(key, value) for key, values in data.items() for value in values])
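# use the keys parameter of concat to label each frame's rows with its key,
# then drop the per-frame row labels and turn the key level into a column
df = pd.concat(values, keys=keys).droplevel(-1).reset_index().rename(columns={"index": "key"})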
df.to_csv("output.csv", index=False)
Output (output.csv)
key,word,frequency
key1,hello,1
key1,world,1
key1,quick,10
key1,brown,1
key1,fox,3
key2,rice,9
key2,salt,1
key2,sugar,7
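To see what the keys parameter contributes, here is a small sketch on two of the toy frames, with the keys passed directly instead of through the zip step. concat labels each row with a (key, original row label) pair; droplevel(-1) discards the inner row label, and reset_index turns the remaining key level into a column named "index", which the rename then changes to "key".

```python
import pandas as pd

df1 = pd.DataFrame([["hello", 1], ["world", 1]], columns=["word", "frequency"])
df3 = pd.DataFrame([["rice", 9]], columns=["word", "frequency"])

# Outer index level: the key; inner level: each frame's own row labels.
stacked = pd.concat([df1, df3], keys=["key1", "key2"])
print(stacked.index.tolist())

# Dropping the inner level leaves only the key, which reset_index
# turns into a column named "index" before it is renamed to "key".
flat = stacked.droplevel(-1).reset_index().rename(columns={"index": "key"})
print(flat.columns.tolist())  # ['key', 'word', 'frequency']
```

Repeated keys (such as key1 for both df1 and df2 in the answer) are allowed here; they simply repeat in the outer index level.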