How to write a value calculated from an iterated DataFrame row into a new column of the same row

Unfortunately, I can't get my code to write a value calculated from one row back into that same row, so that the result is a dataframe with two new columns of calculated values.

My dataframe looks like this:

VP	text1	text2
1	Text1	Text2
2	Text3	Text4
3	Text5	Text 6

My goal should look like this:

VP	text1	text2	error_count1	error_count2
1	Text1	Text2	2	5
2	Text3	Text4	4	7
3	Text5	Text 6	8	9

I tried this:

def compare_texts(text1: str, text2: str, data: pd.DataFrame, switch: bool ):
    """
    Compare each text from data with text1 and text2. Return the errors found.

    :param text1: Correct Text 1
    :param text2: Correct Text 2
    :param data: dataframe of participant data
    :param switch: if True, compare the texts crosswise and use the *_rev columns

    :return data: new dataframe
    """

    # Add new empty columns to hold the results.
    if switch == False:
        data["error_count1"]        = ""
        data["error_count2"]        = ""
    else:
        data["error_count1_rev"]    = ""
        data["error_count2_rev"]    = ""

    for index, row in data.iterrows():
        # get participant data into variables to pass as parameter
        participant = row['VP']
        pp_text1 = row['text1']
        pp_text2 = row['text2']

        if switch == False:
            error_count_1 = Levenshtein.distance(words(pp_text1), words(text1))
            error_count_2 = Levenshtein.distance(words(pp_text2), words(text2))

            data[index,'error_count1'] = error_count_1  # Here is the problematic code that needs to be adjusted
            data[index,'error_count2'] = error_count_2  
        else:    # Switch compared text, because we changed texts in week 3. 
            error_count_1 = Levenshtein.distance(words(pp_text2), words(text1))
            error_count_2 = Levenshtein.distance(words(pp_text1), words(text2))

            data['error_count1_rev'] = error_count_1
            data['error_count2_rev'] = error_count_2 

    return data

But the end result, unfortunately, looks like this:

VP	text1	text2	error_count1	error_count2	error_count1	error_count2	error_count1	error_count2
1	Text1	Text2	2	5	4	7	8	9
2	Text3	Text4	2	5	4	7	8	9
3	Text5	Text 6	2	5	4	7	8	9

If I omit "index", the value calculated for the last row is stored in every row of the column.

So I need a way to store each value only in the row it was calculated from, in the corresponding column.

CodePudding user response：

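Solution

Use loc to assign to a single cell:

data.loc[index,'error_count1'] = error_count_1

By the way

I tested your code and got the result below. data[idx,'add col'] treats the tuple (idx, 'add col') as a column label, so every iteration creates a new column and fills all of its rows with the same value: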

for idx, row in data.iterrows():
    data[idx,'add col'] = idx

  text1 text2  (0, add col)  (1, add col)  (2, add col)
0   ABC   ABC             0             1             2
1   ABC   abc             0             1             2
2   XYZ   ABC             0             1             2
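
For comparison, here is a minimal runnable sketch of the loc version. The sample rows, the reference texts, and the count_errors helper are assumptions made up for illustration; count_errors is only a stand-in for the Levenshtein.distance(words(...), words(...)) call from the question. The result columns are created with 0 before the loop so that they keep an integer dtype.

```python
import pandas as pd

# Made-up sample data in the shape described in the question.
data = pd.DataFrame({
    "VP": [1, 2, 3],
    "text1": ["a b c", "a b", "a"],
    "text2": ["x y", "x", "x y z"],
})

def count_errors(participant_text: str, correct_text: str) -> int:
    # Hypothetical stand-in for the question's Levenshtein call:
    # the difference in word count between the two texts.
    return abs(len(participant_text.split()) - len(correct_text.split()))

# Create the result columns once, before iterating.
data["error_count1"] = 0
data["error_count2"] = 0

for index, row in data.iterrows():
    # .loc[row_label, column_label] writes exactly one cell.
    data.loc[index, "error_count1"] = count_errors(row["text1"], "a b c")
    data.loc[index, "error_count2"] = count_errors(row["text2"], "x y z")

print(data)
```

Each row now holds only its own two values, and no tuple-named columns are created.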

CodePudding user response：

I suggest using pandas.DataFrame.apply for this task. Consider the following simple example: say you have text1 and text2, and your task is to find whether they are the same, both case-sensitively and case-insensitively. Then you might do:

import pandas as pd
df = pd.DataFrame({'text1': ['ABC', 'ABC', 'XYZ'], 'text2': ['ABC', 'abc', 'ABC']})

def same(row):
    return {
        "sensitive": row["text1"] == row["text2"],
        "insensitive": row["text1"].lower() == row["text2"].lower(),
    }

# result_type="expand" turns the returned dicts into DataFrame columns.
dfsame = df.apply(same, axis=1, result_type="expand")
dffinal = pd.concat([df, dfsame], axis=1)
print(dffinal)

Output:

  text1 text2  sensitive  insensitive
0   ABC   ABC       True         True
1   ABC   abc      False         True
2   XYZ   ABC      False        False
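
Applied to the layout from the question, the same apply pattern might look like the sketch below. Everything specific in it is an assumption for illustration: the sample rows, the reference texts CORRECT_1 and CORRECT_2, and word_distance, a small pure-Python word-level edit distance used here as a stand-in for Levenshtein.distance(words(...), words(...)).

```python
import pandas as pd

def word_distance(a: str, b: str) -> int:
    """Word-level edit distance (hypothetical stand-in for the question's Levenshtein call)."""
    s, t = a.split(), b.split()
    prev = list(range(len(t) + 1))
    for i, s_word in enumerate(s, start=1):
        curr = [i]
        for j, t_word in enumerate(t, start=1):
            cost = 0 if s_word == t_word else 1
            # deletion, insertion, substitution
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

# Made-up reference texts and participant data.
CORRECT_1 = "the quick brown fox"
CORRECT_2 = "jumps over the lazy dog"

df = pd.DataFrame({
    "VP": [1, 2, 3],
    "text1": ["the quick brown fox", "the brown fox", "a quick red fox"],
    "text2": ["jumps over the lazy dog", "jumps the dog", "jumps over the lazy dog today"],
})

def error_counts(row):
    # One dict per row; the keys become the new column names.
    return {
        "error_count1": word_distance(row["text1"], CORRECT_1),
        "error_count2": word_distance(row["text2"], CORRECT_2),
    }

counts = df.apply(error_counts, axis=1, result_type="expand")
result = pd.concat([df, counts], axis=1)
print(result)
```

This leaves the original df untouched and returns a new frame with the two extra columns, which avoids assigning cell by cell inside a loop.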