I've hit a blocker.
In the code below I concatenated two DataFrames, then wanted to convert the result into a dictionary to do some sanitization, and then convert it back to a DataFrame.
But when I convert it back to a DataFrame, it seems to contain only one of the two DataFrames and not the combined version?
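import pandas as pd
# Concatenate the two DataFrames
opel_Concat = pd.concat([df, df2])
# Convert to a dictionary (the result must be assigned, to_dict() does not work in place)
opel_Dict = opel_Concat.to_dict()
# Convert back to a DataFrame
opel_Df = pd.DataFrame.from_dict(opel_Dict)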
DF2 contains 4328 rows × 17 columns
So it seems to only be considering df1?
CodePudding user response:
If you look closely at what happens when the concatenated DataFrame is converted to a dictionary, you will see the issue.
Say, df1 is
   a  b
0  1  3
1  2  4
2  3  5
and df2 is
   a  b
0  2  9
1  4  8
2  5  7
3  6  6
We concatenate them to get
   a  b
0  1  3
1  2  4
2  3  5
0  2  9
1  4  8
2  5  7
3  6  6
Notice the index? Yes, it is repeating.
Now what happens when I convert this concatenated DataFrame to dict?
{'a': {0: 2, 1: 4, 2: 5, 3: 6}, 'b': {0: 9, 1: 8, 2: 7, 3: 6}}
Even though the concatenated DataFrame has 7 rows, only 4 entries per column appear here. The index labels are used as dictionary keys, so whenever a label is duplicated, the later row overwrites the earlier one.
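To see the overwrite for yourself, here is a minimal sketch that rebuilds the two small example frames from above. The names df1, df2 and concat_df are only illustrative, and the data is the toy data from this answer, not the data from the question.

```python
import pandas as pd

# Toy frames matching the example above
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5]})
df2 = pd.DataFrame({"a": [2, 4, 5, 6], "b": [9, 8, 7, 6]})

# Plain concat keeps each frame's original index labels
concat_df = pd.concat([df1, df2])
print(len(concat_df))             # 7
print(concat_df.index.is_unique)  # False: labels 0, 1, 2 appear twice

# The default to_dict() keys each column's values by index label,
# so duplicated labels collide and the last value wins
as_dict = concat_df.to_dict()
print(len(as_dict["a"]))          # 4
```

Checking `index.is_unique` before calling `to_dict()` is a cheap way to catch this kind of silent row loss.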
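So you can solve this by resetting the index before converting, so that every row gets a unique label (drop=True keeps the old index from being added as an extra column):
concat_df.reset_index(drop=True).to_dict()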
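As an alternative sketch of the same idea, the index can be renumbered at concatenation time with the `ignore_index=True` argument of `pd.concat`, after which the round trip through a dictionary keeps every row. The frames below are again the toy data from this answer, and the sanitization step is left as a placeholder comment because the question does not show it.

```python
import pandas as pd

# Toy frames matching the example above
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5]})
df2 = pd.DataFrame({"a": [2, 4, 5, 6], "b": [9, 8, 7, 6]})

# ignore_index=True gives the result a fresh index 0..6
concat_df = pd.concat([df1, df2], ignore_index=True)

as_dict = concat_df.to_dict()
# ... sanitize as_dict here (placeholder, not shown in the question) ...

# Every index label is now unique, so no rows are lost on the way back
restored = pd.DataFrame.from_dict(as_dict)
print(restored.shape)  # (7, 2)
```

This should behave the same as calling `reset_index(drop=True)` on the concatenated frame; which one to use is mostly a matter of taste.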