I need to add an array as a column to a DataFrame:
results['TEST'] = results.apply(lambda x: results_02, axis=1)
As a result I get a DataFrame like this:
ID TEST
1 [1,2,3,4,5,6,7,8,9,10]
2 [1,2,3,4,5,6,7,8,9,10]
3 [1,2,3,4,5,6,7,8,9,10]
4 [1,2,3,4,5,6,7,8,9,10]
5 [1,2,3,4,5,6,7,8,9,10]
6 [1,2,3,4,5,6,7,8,9,10]
But I want to add a condition: if results['ID'] is in results_02, the row should get all values of results_02 except that ID. I need to do this for every row.
So the resulting DataFrame should look like this:
ID TEST
1 [2,3,4,5,6,7,8,9,10]
2 [1,3,4,5,6,7,8,9,10]
3 [1,2,4,5,6,7,8,9,10]
4 [1,2,3,5,6,7,8,9,10]
5 [1,2,3,4,6,7,8,9,10]
6 [1,2,3,4,5,7,8,9,10]
I thought I could do it using:
results['TEST'] = results.apply(lambda x: results_02[:10] if x not in results_02[:10] else results_02.remove(x)[:10], axis=1)
But I'm getting an error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
What is the best and most efficient way to solve this problem?
EDIT_1: DF
data = {'ID': [250274, 244473, 240274, 247178, 248667]}
df = pd.DataFrame(data)
results_02 = [250274, 244473, 240274, 247178, 248667]
CodePudding user response:
You can try this:
import numpy as np
import pandas as pd
data = {'ID': [250274, 244473, 240274, 247178, 248667]}
results = pd.DataFrame(data)
result_02 = np.array([250274, 244473, 240274, 247178, 248667])
mask = results.values != result_02
results['TEST'] = [result_02[mask_row] for mask_row in mask]
results
----------------------------------------------
ID TEST
0 250274 [244473, 240274, 247178, 248667]
1 244473 [250274, 240274, 247178, 248667]
2 240274 [250274, 244473, 247178, 248667]
3 247178 [250274, 244473, 240274, 248667]
4 248667 [250274, 244473, 240274, 247178]
----------------------------------------------
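To see why the comparison produces one mask row per data frame row, here is a small sketch of the broadcasting step on its own. The short IDs 1, 2, 3 and the names ids and candidates are made up for readability and are not from the question.

```python
import numpy as np

# Shape (3, 1), like results.values for a frame with a single ID column
ids = np.array([[1], [2], [3]])
# Shape (3,), like result_02
candidates = np.array([1, 2, 3])

# Broadcasting compares every ID with every candidate,
# which gives a (3, 3) boolean mask: one row of flags per ID
mask = ids != candidates

# Each mask row selects the candidates that differ from that row's ID
rows = [candidates[mask_row].tolist() for mask_row in mask]
print(mask.shape)
print(rows)
```

Each mask row is False only at the position of that row's own ID, so indexing with it drops exactly that value.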
If your data frame contains several columns and you are interested only in the ID
column, then you have to build the mask from a reshaped ID array, so that it has one row per ID.
import numpy as np
import pandas as pd
data = {'ID': [250274, 244473, 240274, 247178, 248667], 'some_col': ['A', 'B', 'C', 'D', 'E']}
results = pd.DataFrame(data)
result_02 = np.array([250274, 244473, 240274, 247178, 248667])
mask = results.ID.values.reshape(-1, 1) != result_02
results['TEST'] = [result_02[mask_row] for mask_row in mask]
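If you would rather stay close to the apply attempt from the question, a plain-Python sketch along these lines should also work. The attempt in the question fails for two reasons: with axis=1 the lambda receives a whole row (a Series) instead of a single ID, so the `not in` test compares a Series against list elements and raises the ValueError, and list.remove returns None, so its result cannot be sliced. The sketch assumes results_02 is a plain list, as in the question.

```python
import pandas as pd

results = pd.DataFrame({
    'ID': [250274, 244473, 240274, 247178, 248667],
    'some_col': ['A', 'B', 'C', 'D', 'E'],
})
results_02 = [250274, 244473, 240274, 247178, 248667]

# Apply on the ID column only, so the lambda receives one scalar ID at a time.
# The list comprehension builds a new list and leaves results_02 unchanged.
results['TEST'] = results['ID'].apply(
    lambda row_id: [value for value in results_02 if value != row_id]
)
print(results)
```

This is likely slower than the NumPy mask on large frames, but it keeps the values as ordinary Python lists and is easy to extend with extra conditions.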
EDIT
I am not quite sure what you mean by your comment. I suppose you want something like this?
import numpy as np
import pandas as pd
data = {
'ID1': [250274, 244473, 240274, 247178, 248667],
'ID2': [244473, 240274, 247178, 248667, 250274],
}
results = pd.DataFrame(data)
result_02 = np.array([250274, 244473, 240274, 247178, 248667])
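# Keep only the values of result_02 that do not appear in the current row
# (np.isin is used here because np.in1d is deprecated in NumPy 2.0)
results['TEST'] = [result_02[~np.isin(result_02, row)] for row in results.values]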
------------------------------------------------
ID1 ID2 TEST
0 250274 244473 [240274, 247178, 248667]
1 244473 240274 [250274, 247178, 248667]
2 240274 247178 [250274, 244473, 248667]
3 247178 248667 [250274, 244473, 240274]
4 248667 250274 [244473, 240274, 247178]
------------------------------------------------
If not, please make your comment more precise.