I would like to iterate through the rows of a dataframe and concatenate each row to a different dataframe, basically building up a new dataframe from some of the rows.
For example, with the IPCSection and IPCClass dataframes:
allcolumns = np.concatenate((IPCSection.columns, IPCClass.columns), axis = 0)
finalpatentclasses = pd.DataFrame(columns=allcolumns)
for isec, secrow in IPCSection.iterrows():
    for icl, clrow in IPCClass.iterrows():
        if (secrow[0] in clrow[0]):
            pdList = [finalpatentclasses, pd.DataFrame(secrow), pd.DataFrame(clrow)]
            finalpatentclasses = pd.concat(pdList, axis=0, ignore_index=True)
display(finalpatentclasses)
I want the NaN values to disappear and all the data to move under the correct columns. I tried axis=1, but it messes up the column names. Append does not work either: all values are placed diagonally in the table, again with NaN values.
CodePudding user response:
The problem with the current implementation is that pd.DataFrame(secrow) and pd.DataFrame(clrow) each turn a row (a Series) into a single-column dataframe whose index holds the original column names. Calling pd.concat with axis=0 and ignore_index=True then stacks those values vertically in a new column instead of spreading them across the columns of the final dataframe, and every cell left without a value is filled with NaN. This is the misalignment you see in the output.
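To see why the values end up stacked vertically, it may help to compare the shape of a row wrapped in pd.DataFrame with the shape of the same row transposed. This is a minimal sketch; the one-row `sections` frame and its column names are hypothetical stand-ins for IPCSection.

```python
import pandas as pd

# Hypothetical one-row stand-in for IPCSection; the column names are made up.
sections = pd.DataFrame({"SectionCode": ["A"], "SectionTitle": ["Human necessities"]})
secrow = sections.iloc[0]          # a Series, like the rows that iterrows() yields

as_column = pd.DataFrame(secrow)   # the Series becomes ONE column
as_row = secrow.to_frame().T       # transposing gives ONE row instead

print(as_column.shape)  # (2, 1)
print(as_row.shape)     # (1, 2)
```

Only the transposed form has the original column names as its columns, so only that form lines up with the final dataframe when concatenated with axis=0.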
To solve this problem, you can create a new dataframe that has the same columns as the final dataframe, and then assign the values from secrow and clrow to the appropriate columns in the new dataframe. After that, you can append the new dataframe to the final dataframe using the pd.concat function with axis=0, as before.
Here is a modified version of the code that should produce the desired output:
import numpy as np
import pandas as pd

allcolumns = np.concatenate((IPCSection.columns, IPCClass.columns), axis=0)
finalpatentclasses = pd.DataFrame(columns=allcolumns)
for isec, secrow in IPCSection.iterrows():
    for icl, clrow in IPCClass.iterrows():
        if secrow.iloc[0] in clrow.iloc[0]:
            # Put the values from secrow and clrow side by side, in column order
            values = np.concatenate((secrow.values, clrow.values), axis=0)
            # Create a one-row dataframe with the same columns as the final dataframe
            newrow = pd.DataFrame([values], columns=allcolumns)
            # Append the new dataframe to the final dataframe
            finalpatentclasses = pd.concat([finalpatentclasses, newrow], axis=0, ignore_index=True)
display(finalpatentclasses)
This should result in a final dataframe that has the values from secrow and clrow concatenated horizontally under the correct columns, with no NaN values.
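As a variation on the loop above, the matching rows could also be collected in a plain Python list and turned into a dataframe once at the end, which avoids calling pd.concat on every iteration. The sketch below is self-contained; the sample data and column names are hypothetical, since the real IPCSection and IPCClass frames are not shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real frames.
IPCSection = pd.DataFrame({
    "SectionCode": ["A", "B"],
    "SectionTitle": ["Human necessities", "Operations"],
})
IPCClass = pd.DataFrame({
    "ClassCode": ["A01", "A21", "B01"],
    "ClassTitle": ["Agriculture", "Baking", "Physical processes"],
})

allcolumns = np.concatenate((IPCSection.columns, IPCClass.columns), axis=0)

rows = []
for _, secrow in IPCSection.iterrows():
    for _, clrow in IPCClass.iterrows():
        # Same test as in the question: section code contained in class code
        if secrow.iloc[0] in clrow.iloc[0]:
            rows.append(list(secrow.values) + list(clrow.values))

# Build the final dataframe in one step from the collected rows
finalpatentclasses = pd.DataFrame(rows, columns=allcolumns)
print(finalpatentclasses)
```

With this sample data, three section/class pairs match, and each ends up as one complete row.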
CodePudding user response:
Alright, I have figured it out. The idea is to create a newrow dataframe, concatenate the values of both rows into one list, add that list to newrow as a single row, and then concatenate newrow with the final dataframe.
Here is the code:
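import numpy as np
import pandas as pd

allcolumns = np.concatenate((IPCSection.columns, IPCClass.columns), axis=0)
finalpatentclasses = pd.DataFrame(columns=allcolumns)
for isec, secrow in IPCSection.iterrows():
    for icl, clrow in IPCClass.iterrows():
        # Start from an empty dataframe with the full set of columns
        newrow = pd.DataFrame(columns=allcolumns)
        # Join the values of both rows into one flat array
        values = np.concatenate((secrow.values, clrow.values), axis=0)
        # Add the combined values as a single row
        newrow.loc[len(newrow.index)] = values
        finalpatentclasses = pd.concat([finalpatentclasses, newrow], axis=0)
# Renumber the rows; drop=True discards the old index, in which every row was labelled 0
finalpatentclasses.reset_index(drop=True, inplace=True)
display(finalpatentclasses)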
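Since the loop above pairs every section row with every class row, a cross join may give the same table without explicit iteration. This is only a sketch under assumptions: it needs pandas 1.2 or later for how="cross", it assumes the two frames share no column names, and the sample data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for the real frames.
IPCSection = pd.DataFrame({
    "SectionCode": ["A", "B"],
    "SectionTitle": ["Human necessities", "Operations"],
})
IPCClass = pd.DataFrame({
    "ClassCode": ["A01", "A21", "B01"],
    "ClassTitle": ["Agriculture", "Baking", "Physical processes"],
})

# Every section row paired with every class row (pandas >= 1.2)
pairs = IPCSection.merge(IPCClass, how="cross")

# Optional: keep only the pairs where the section code occurs in the class code
mask = [sec in cl for sec, cl in zip(pairs["SectionCode"], pairs["ClassCode"])]
matched = pairs[mask].reset_index(drop=True)
print(matched)
```

The filtering step corresponds to the `in` test from the question; leave it out to keep all combinations, as the loop above does.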