Рow to swap values in columns depending on a certain condition?-CodePudding

I have a dataset:

id1          id2         val 
first10     second5      10
second3     first19      14
first2      second7      8
first10     second10     1
second8     first22      9

I want to swap values in columns id1 and id2 to have in id1 only values containing "first" and in id2 only values containing "second". So desired result is:

id1          id2         val 
first10     second5      10
first19     second3      14
first2      second7      8
first10     second10     1
first22     second8      9

How could I do that? i know about str.contains() but how should condition look like?

CodePudding user response：

here is one way do it, using np.where by checking if the first id starts with second, then swap, else keep as is

df['id1'],df['id2']=np.where(df['id1'].str.startswith('second'),
                           [df['id2'],df['id1']],
                           [df['id1'],df['id2']]
                          )
df

        id1         id2     val
0   first10     second5     10
1   first19     second3     14
2   first2      second7     8
3   first10     second10    1
4   first22     second8     9

CodePudding user response：

Maybe not the most elegant solution, but you can achieve it, transforming DataFrame in a list of dict and then making it a DataFrame again. For example:

df_list = []
for i in range(df.shape[0]):
    row = dict(df.iloc[i])
    new_row = {}
    for key in row:
        val = row[key]
        if type(val) == str and val.startswith('f'):
            new_row['id1'] = val
        elif type(val) == str and val.startswith('s'):
            new_row['id2'] = val
        else:
            new_row['val'] = val
    df_list.append(new_row)

new_df = pd.DataFrame(df_list)

CodePudding user response：

My suggestion is to define a function that can take a row of the dataframe and swap the id1 and id2 values in the condition required, and then apply it to the dataframe with df.apply(func, axis=1).

def conditional_swap(row):
    if row["id1"].startswith("second"):
        row["id1"], row["id2"] = row["id2"], row["id1"]
    return row

df.apply(conditional_swap, axis=1)

(axis=1 ensures that it is applied row-wise, not column-wise)

Hope this helps!