I have a pandas DataFrame like this:
import numpy as np
import pandas as pd
myDF = pd.DataFrame({"COLUMN_NAME": ["Col1", "Col2", "Col3", "Col4"],
                     "RULE_1": ["NULL", "DUPLICATE", "TEXT-ONLY", "INTEGER-ONLY"],
                     "RULE_2": ["DUPLICATE", np.nan, "DUPLICATE", np.nan]})
How can I convert this into a dictionary that looks like this:
my_dict = {"Col1": ["NULL", "DUPLICATE"], "Col2": ["DUPLICATE"], "Col3": ["TEXT-ONLY", "DUPLICATE"], "Col4": ["INTEGER-ONLY"]}
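I'm stuck writing nested loops and haven't found a working solution. This is as far as I got:
from collections import defaultdict
rule_dict = myDF.to_dict(orient="index")  # assumed shape: {row_index: {column_name: value}}
final_rules_dict = defaultdict(list)
for row_index in rule_dict:
    row_dict = rule_dict[row_index]
    for col_name in row_dict:
        if col_name == "COLUMN_NAME":
            ...  # stuck here: not sure what to append, or under which key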
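A minimal sketch of how the row-by-row loop above could be finished. It assumes the rows are read with `to_dict(orient="records")` (one dict per row) instead of the `rule_dict` used in the question, and that missing rules (`np.nan`) should be skipped rather than stored:

```python
from collections import defaultdict

import numpy as np
import pandas as pd

myDF = pd.DataFrame({"COLUMN_NAME": ["Col1", "Col2", "Col3", "Col4"],
                     "RULE_1": ["NULL", "DUPLICATE", "TEXT-ONLY", "INTEGER-ONLY"],
                     "RULE_2": ["DUPLICATE", np.nan, "DUPLICATE", np.nan]})

final_rules_dict = defaultdict(list)
# One dict per row, e.g. {"COLUMN_NAME": "Col1", "RULE_1": "NULL", "RULE_2": "DUPLICATE"}
for row_dict in myDF.to_dict(orient="records"):
    col_name = row_dict["COLUMN_NAME"]  # this value becomes the output key
    for key, rule in row_dict.items():
        # Skip the key column itself and any missing (NaN) rule
        if key == "COLUMN_NAME" or pd.isna(rule):
            continue
        final_rules_dict[col_name].append(rule)

my_dict = dict(final_rules_dict)
print(my_dict)
```

Because `defaultdict` only creates a key on the first append, a name whose rules are all missing would not appear in `my_dict` at all.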
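Answer:
Here is a solution that works with the format you have given:
import json
import numpy as np
import pandas as pd
myDF = pd.DataFrame({"COLUMN_NAME": ["Col1", "Col2", "Col3", "Col4"],
                     "RULE_1": ["NULL", "DUPLICATE", "TEXT-ONLY", "INTEGER-ONLY"],
                     "RULE_2": ["DUPLICATE", np.nan, "DUPLICATE", np.nan]})
# Transpose so RULE_1/RULE_2 become rows, then promote the first row (the names) to column labels
myDF = myDF.T.reset_index(drop=True)
myDF.columns = myDF.iloc[0]
# The JSON round trip turns NaN into null, so missing rules come back as None (they are not dropped)
json.loads(pd.io.json.dumps(myDF[1:].to_dict(orient='list')))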
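A sketch of the same transpose idea without the JSON round trip, filtering missing values with `pd.notna` while building the lists. `t` and `rules` are illustrative names, not part of the answer above:

```python
import numpy as np
import pandas as pd

myDF = pd.DataFrame({"COLUMN_NAME": ["Col1", "Col2", "Col3", "Col4"],
                     "RULE_1": ["NULL", "DUPLICATE", "TEXT-ONLY", "INTEGER-ONLY"],
                     "RULE_2": ["DUPLICATE", np.nan, "DUPLICATE", np.nan]})

# Transpose: rows become COLUMN_NAME, RULE_1, RULE_2; columns become 0..3
t = myDF.T.reset_index(drop=True)
t.columns = t.iloc[0]   # first row (Col1..Col4) becomes the column labels
rules = t.iloc[1:]      # keep only the RULE_1 / RULE_2 rows

# Build each list directly, keeping only non-missing rules
my_dict = {col: [v for v in rules[col] if pd.notna(v)] for col in rules.columns}
print(my_dict)
```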
Answer:
You can prepare an empty dict based on the "COLUMN_NAME" field:
my_dict = {"Col1": [], "Col2": [], "Col3": [], "Col4": []}
Then iterate over the myDF columns (ignoring the "COLUMN_NAME" field, of course) and, for each of those keys rule_key, append the value from every row:
for rule_key in myDF.columns.drop("COLUMN_NAME"):
    for i, my_dict_key in enumerate(myDF["COLUMN_NAME"]):
        # note: missing rules are appended as NaN here
        my_dict[my_dict_key].append(myDF[rule_key][i])
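A self-contained version of that loop, sketched under the assumption that missing rules should be skipped rather than appended as NaN. It also builds the empty dict from "COLUMN_NAME" instead of hard-coding the names:

```python
import numpy as np
import pandas as pd

myDF = pd.DataFrame({"COLUMN_NAME": ["Col1", "Col2", "Col3", "Col4"],
                     "RULE_1": ["NULL", "DUPLICATE", "TEXT-ONLY", "INTEGER-ONLY"],
                     "RULE_2": ["DUPLICATE", np.nan, "DUPLICATE", np.nan]})

# One empty list per name in COLUMN_NAME, in row order
my_dict = {name: [] for name in myDF["COLUMN_NAME"]}

for rule_key in myDF.columns:
    if rule_key == "COLUMN_NAME":  # not a rule column
        continue
    for i, my_dict_key in enumerate(myDF["COLUMN_NAME"]):
        value = myDF[rule_key].iloc[i]  # positional lookup, independent of the index labels
        if pd.notna(value):             # skip missing rules instead of appending NaN
            my_dict[my_dict_key].append(value)

print(my_dict)
```

Unlike a `defaultdict`-based loop, every name gets a key here, so a name with no rules ends up with an empty list.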
Answer:
# orient='list' gives {'Col1': ['NULL', 'DUPLICATE'], ...}; missing rules stay in the lists as NaN
d = myDF.set_index('COLUMN_NAME').T.to_dict(orient='list')
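A sketch of a variant of the `set_index` idea that also drops the missing rules. It iterates over the indexed rows with `iterrows()` and calls `dropna()` on each row before converting it to a list; `rules` is an illustrative name:

```python
import numpy as np
import pandas as pd

myDF = pd.DataFrame({"COLUMN_NAME": ["Col1", "Col2", "Col3", "Col4"],
                     "RULE_1": ["NULL", "DUPLICATE", "TEXT-ONLY", "INTEGER-ONLY"],
                     "RULE_2": ["DUPLICATE", np.nan, "DUPLICATE", np.nan]})

# Index by COLUMN_NAME so each row holds only the rule values
rules = myDF.set_index("COLUMN_NAME")

# dropna() removes missing rules from each row before it becomes a list
my_dict = {name: row.dropna().tolist() for name, row in rules.iterrows()}
print(my_dict)
```

`iterrows()` is not fast on large frames, but for a small rules table like this one readability matters more.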
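Answer:
I used your approach (transpose, then use the first row as column names) to get the dictionary, and added a few more transformations to drop the null values.
from collections import defaultdict
myDF2 = myDF.T.reset_index(drop=True)
myDF2.columns = myDF2.iloc[0]
myDF3 = myDF2[1:]  # only the RULE_1 / RULE_2 rows
dfdict = defaultdict(list)
# stack() gives a Series indexed by (row, column); dropna() removes any missing rules
for (_row, col), rule in myDF3.stack().dropna().items():
    dfdict[col].append(rule)
dict(dfdict)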
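The same "long format, then group" idea can also be sketched without transposing, using `melt` and `groupby`. This swaps in a different technique from the answer above; the value column name `RULE` is an arbitrary choice:

```python
import numpy as np
import pandas as pd

myDF = pd.DataFrame({"COLUMN_NAME": ["Col1", "Col2", "Col3", "Col4"],
                     "RULE_1": ["NULL", "DUPLICATE", "TEXT-ONLY", "INTEGER-ONLY"],
                     "RULE_2": ["DUPLICATE", np.nan, "DUPLICATE", np.nan]})

my_dict = (
    myDF.melt(id_vars="COLUMN_NAME", value_name="RULE")  # one row per (name, rule column)
        .dropna(subset=["RULE"])                         # drop missing rules
        .groupby("COLUMN_NAME", sort=False)["RULE"]      # keep names in first-seen order
        .agg(list)                                       # collect each name's rules into a list
        .to_dict()
)
print(my_dict)
```

`melt` emits all RULE_1 values before all RULE_2 values, and `groupby` preserves row order within each group, so each list keeps the RULE_1, RULE_2 order.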