I have a dataset:
id1 id2 val
first10 second5 10
second3 first19 14
first2 second7 8
first10 second10 1
second8 first22 9
I want to swap values in columns id1 and id2 to have in id1 only values containing "first" and in id2 only values containing "second". So desired result is:
id1 id2 val
first10 second5 10
first19 second3 14
first2 second7 8
first10 second10 1
first22 second8 9
How could I do that? i know about str.contains() but how should condition look like?
CodePudding user response:
here is one way do it, using np.where by checking if the first id starts with second, then swap, else keep as is
df['id1'],df['id2']=np.where(df['id1'].str.startswith('second'),
[df['id2'],df['id1']],
[df['id1'],df['id2']]
)
df
id1 id2 val
0 first10 second5 10
1 first19 second3 14
2 first2 second7 8
3 first10 second10 1
4 first22 second8 9
CodePudding user response:
Maybe not the most elegant solution, but you can achieve it, transforming DataFrame in a list of dict and then making it a DataFrame again. For example:
df_list = []
for i in range(df.shape[0]):
row = dict(df.iloc[i])
new_row = {}
for key in row:
val = row[key]
if type(val) == str and val.startswith('f'):
new_row['id1'] = val
elif type(val) == str and val.startswith('s'):
new_row['id2'] = val
else:
new_row['val'] = val
df_list.append(new_row)
new_df = pd.DataFrame(df_list)
CodePudding user response:
My suggestion is to define a function that can take a row of the dataframe and swap the id1
and id2
values in the condition required, and then apply it to the dataframe with df.apply(func, axis=1)
.
def conditional_swap(row):
if row["id1"].startswith("second"):
row["id1"], row["id2"] = row["id2"], row["id1"]
return row
df.apply(conditional_swap, axis=1)
(axis=1
ensures that it is applied row-wise, not column-wise)
Hope this helps!