I am working on a time series/LSTM assignment using a stock dataset: https://www.kaggle.com/camnugent/sandp500
The dataset contains about 500 companies, each with its own set of rows. I want to put the companies into a dictionary, with each company's name as the key.
This is what I have so far:
dataframe = pd.read_csv('all_stocks_5yr.csv', parse_dates=['date'])
dataframe['date'] = pd.to_datetime(dataframe['date'])
grouped_df = dataframe.groupby('Name')
for i in grouped_df:
    df_dict = grouped_df[i].to_dict
CodePudding user response:
Iterating over a groupby object yields (name, sub-DataFrame) pairs, one per company, so you can build the dictionary directly:
gp = dataframe.groupby("Name")
my_dict = {}  # This is the output you want
for name, group in gp:  # name is the company name, group is a DataFrame with that company's rows
    my_dict[name] = group
print(my_dict)
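The same groupby idea can be written as a dict comprehension. Below is a minimal sketch on a tiny made-up frame; the sample values and the `df_dict` name are assumptions for illustration, and only the `Name`, `date` and `close` column names are taken from the dataset:

```python
import pandas as pd

# Tiny made-up sample standing in for all_stocks_5yr.csv (values are illustrative only)
dataframe = pd.DataFrame({
    "date": pd.to_datetime(["2013-02-08", "2013-02-11", "2013-02-08", "2013-02-11"]),
    "close": [14.75, 14.46, 67.85, 68.56],
    "Name": ["AAL", "AAL", "AAPL", "AAPL"],
})

# Iterating a GroupBy yields (key, sub-DataFrame) pairs, one pair per company
df_dict = {
    name: group.reset_index(drop=True)
    for name, group in dataframe.groupby("Name")
}

print(list(df_dict))    # company names used as keys
print(df_dict["AAL"])   # DataFrame holding only the AAL rows
```

Each value stays a DataFrame, so it can still be sorted by date or sliced before being turned into LSTM input.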
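Another way to handle this is to iterate over the dataframe row by row. This is much slower on a large dataset, and each company ends up with a list of row Series rather than a DataFrame:
my_dict = {}
for index, record in dataframe.iterrows():  # record is one row as a Series
    if record['Name'] in my_dict:
        my_dict[record['Name']].append(record)
    else:
        my_dict[record['Name']] = [record]
print(my_dict)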
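If you want a list of plain Python rows per company rather than Series objects, one possible sketch combines groupby with `to_dict("records")`. The sample frame and the `records_by_name` name are made up for illustration, and dates are kept as strings here to keep the output simple:

```python
import pandas as pd

# Made-up sample rows; the real data would come from all_stocks_5yr.csv
dataframe = pd.DataFrame({
    "date": ["2013-02-08", "2013-02-11", "2013-02-08"],
    "close": [14.75, 14.46, 67.85],
    "Name": ["AAL", "AAL", "AAPL"],
})

# For each company, drop the redundant Name column and convert its rows to dicts
records_by_name = {
    name: group.drop(columns="Name").to_dict("records")
    for name, group in dataframe.groupby("Name")
}

print(records_by_name["AAL"])
```

This avoids `iterrows()` entirely, since the row-to-dict conversion happens once per group.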