I am trying to select columns across two different dataframes, and the columns don't match, so it makes sense that I keep getting an error. Here is the part of the code I want to use:
df_3 = df_3[df_1.columns]
df = pd.concat([df_1, df_3])
I know you cannot concatenate them until it is a 1:1 match, but I am mainly confused about the first line: can somebody explain what "df_3 = df_3[df_1.columns]" does before the concatenation? That will help me put the right columns into the right dataframes. (I am a beginner, in case you couldn't tell.)
When I run the code above, I get the following error:
KeyError: "['STATUS', 'ID', 'ATTEMPT', 'TYPE'] not in index"
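A minimal way to reproduce this kind of error, using small hypothetical dataframes (the values are made up; only the column names matter): df_1 has a STATUS column that df_3 lacks, so selecting df_1's columns from df_3 fails.

```python
import pandas as pd

# Hypothetical frames: df_1 has a STATUS column that df_3 does not have
df_1 = pd.DataFrame({"ID": [1], "STATUS": ["open"]})
df_3 = pd.DataFrame({"ID": [2]})

missing_error = None
try:
    # Same selection as in the question: ask df_3 for every column of df_1
    df_3[df_1.columns]
except KeyError as err:
    missing_error = err  # keep the exception for inspection
    print("KeyError:", err)
```

The exact wording of the message can differ between pandas versions, but it names the labels that could not be found in df_3.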
Answer:
define what "df_3 = df_3[df_1.columns]" is doing ... ?
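Suppose that df_3 has columns a, b, c, d, and df_1 has just columns a, b, c. Then df_1.columns holds the labels ['a', 'b', 'c'] (as a pandas Index object). The expression you mentioned will project df_3 down from 4 columns to just those 3, dropping 'd'. The selected columns also come out in the same order as they appear in df_1.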
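The a, b, c, d example can be checked directly. This sketch builds two small frames with made-up values that match that description, applies the selection from the question, and then concatenates:

```python
import pandas as pd

# Frames matching the example: df_3 has a, b, c, d; df_1 has only a, b, c
df_3 = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})
df_1 = pd.DataFrame({"a": [10], "b": [20], "c": [30]})

print(list(df_1.columns))  # ['a', 'b', 'c']

# Keep only df_1's columns in df_3; 'd' is dropped
df_3 = df_3[df_1.columns]
print(list(df_3.columns))  # ['a', 'b', 'c']

# Now both frames have the same columns, so the rows stack cleanly
df = pd.concat([df_1, df_3])
print(df.shape)  # (2, 3)
```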
In general, the expression df[some_list], such as df[['a', 'b', 'c']], will project down to the columns mentioned in that list.
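A short illustration of list-based selection on a made-up frame. It shows that the result follows the order of the list, and contrasts it with passing a single label, which is why the list form uses double brackets:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

# A list of labels inside the brackets returns a DataFrame with just those columns
subset = df[["c", "a"]]
print(list(subset.columns))  # ['c', 'a']  (order follows the list)

# A single label (not a list) returns a Series instead
col = df["a"]
print(type(col).__name__)  # Series
```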
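It is important that the original dataframe have all of those column names, or else pandas raises a KeyError. That is what your error message says: df_1 has the columns 'STATUS', 'ID', 'ATTEMPT' and 'TYPE', but df_3 has no columns with exactly those names, so df_3[df_1.columns] cannot select them. If you're unsure which columns will be present, take the set intersection of the two frames' columns beforehand and select only those.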
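A sketch of the set-intersection approach, using hypothetical frames named like the ones in the question. It uses the pandas Index methods difference and intersection rather than Python sets, since they work directly on df.columns, and it assumes that dropping the non-shared columns is acceptable for your data:

```python
import pandas as pd

# Hypothetical frames whose columns only partly overlap
df_1 = pd.DataFrame({"ID": [1], "STATUS": ["open"], "TYPE": ["x"]})
df_3 = pd.DataFrame({"ID": [2], "TYPE": ["y"], "EXTRA": [True]})

# Which of df_1's columns are missing from df_3 (the labels a KeyError would name)
missing = df_1.columns.difference(df_3.columns)
print(list(missing))  # ['STATUS']

# Columns present in both frames
common = df_1.columns.intersection(df_3.columns)

# Select only the shared columns from each frame, then stack the rows
df = pd.concat([df_1[common], df_3[common]])
print(df.shape)  # (2, 2)
```

Note that pd.concat itself aligns columns by name: with the default join='outer', a column missing from one frame is filled with NaN in that frame's rows, while join='inner' keeps only the columns the frames share.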