I have a dictionary called "file_dic" with the structure {key: file_path}. I want to read each file path into a pandas DataFrame, grab its columns, and check whether they exist in the files at the other paths in the dictionary. My solution works, but I want to avoid a nested for loop. What would be the best way to do this? I'm trying to learn to write better code.
import pandas as pd

file_diff = {}
for i in file_dic.keys():
    # nrows=1 keeps the read cheap: only the header and one data row are parsed
    temp_col1 = pd.read_csv(file_dic[i], nrows=1).columns.tolist()
    for j in file_dic.keys():
        if j != i:
            temp_col2 = pd.read_csv(file_dic[j], nrows=1).columns.tolist()
            # columns present in file i but missing from file j
            diff_cols = sorted(set(temp_col1).difference(temp_col2))
            file_diff[str(i) + ' columns not in ' + str(j)] = diff_cols
df = pd.DataFrame.from_dict(file_diff, orient='index').T
CodePudding user response:
As per the comments, your second loop isn't necessary. You can use a count
variable to check whether you are on the first key (the first file) and a previous
variable to keep track of the file you read on the previous iteration:
import pandas as pd

file_diff = {}
count = 0
for i in file_dic.keys():
    if count == 0:  # first file: nothing to compare against yet
        previous = pd.read_csv(file_dic[i], nrows=1).columns.tolist()
        previous_key = i
    else:
        # read the header of the current file
        temp_col2 = pd.read_csv(file_dic[i], nrows=1).columns.tolist()
        # columns present in the previous file but missing from the current one
        diff_cols = sorted(set(previous).difference(temp_col2))
        file_diff[str(previous_key) + ' columns not in ' + str(i)] = diff_cols
        previous = temp_col2
        previous_key = i
    count = 1
df = pd.DataFrame.from_dict(file_diff, orient='index').T
This way, previous stores the columns of the previously read file, and they are compared with the columns of the newly read file (temp_col2).
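The loop above compares each file only with the one read just before it. If every ordered pair of files is needed, as in the question, one possible approach is to read each header once and let itertools.permutations generate the pairs. This is only a sketch under some assumptions: read_headers and column_diffs are hypothetical helper names (not from the question), file_dic is assumed to map keys to CSV paths, and nrows=0 is used to read just the header row.

```python
from itertools import permutations

import pandas as pd


def read_headers(file_dic):
    """Read each CSV header exactly once; return {key: set of column names}.

    Hypothetical helper: nrows=0 parses only the header row.
    """
    return {
        key: set(pd.read_csv(path, nrows=0).columns)
        for key, path in file_dic.items()
    }


def column_diffs(headers):
    """For every ordered pair (a, b), list the columns of a that b lacks."""
    return {
        f"{a} columns not in {b}": sorted(headers[a] - headers[b])
        for a, b in permutations(headers, 2)
    }


# Usage, assuming file_dic = {key: csv_path} as in the question:
# file_diff = column_diffs(read_headers(file_dic))
# df = pd.DataFrame.from_dict(file_diff, orient='index').T
```

The number of comparisons is still quadratic in the number of files, since every ordered pair is reported, but each file is read only once instead of once per pair, and the explicit nested loop is replaced by a single comprehension.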