Home > front end >  Match column values and copy row to new df
Match column values and copy row to new df

Time:12-19

I am working out an assignment I failed miserably at. I need to go through a dataframe, select rows based on conditions and then copy the row to another dataframe. Trying with df.append(), it seems to be working, but buries my machine and spits out deprecation warnings for each row. It tried pd.concat() but can't get the syntax right. My error is that it's not matching the other columns I don't care about.

The there are around 20k rows, so it should take too long. I'm clearly out there on this.

Yes, I am using iterrows, too. If I need to provide more detail, let me know.

Thanks

KeyError: "None of [Index([1.0, 'A', '9/1/2004', 'Math', 4, '1'], dtype='object')] are in the [columns]"

Here's what I have:

#get rows that are quantitative and match at least one other row on studentID, classDate and IQ
df_isquant = pd.DataFrame([])


for index, row in df_quant.iterrows():
    if row['IQ']== '1':
        for yndex, roe in df_quant.iterrows():
            if roe['IQ'] == row['IQ'] and roe['StudentID'] == row['StudentID'] and roe['ClassDate'] == row['ClassDate']:
                pd.concat(df_isquant[row])
#             df_isquant.append(row)

I am searching for rows that have a value of '1' and if they do, seeing if the row matches any other on 'StudentID', 'IQ' and 'ClassDate'. If it does, copy to another dataframe. I could also simply created another column and use a boolean to mark the rows that fit that description, which might make this easier. But this gave me enough grief that i need the answer now.

CodePudding user response:

Given the provided logic ("I am searching for rows that have a value of '1' and if they do, seeing if the row matches any other on 'StudentID', 'IQ' and 'ClassDate'."), use boolean indexing and concat:

# condition on IQ
m1 = df_quant['IQ'].eq('1')
# are there other rows matching the 3 columns
m2 = df_quant[['ID', 'StudentID', 'ClassDate']].duplicated(keep=False)

# concat
df_isquant = pd.concat([df_isquant, df_quant[m1&m2]])
  • Related