Check array values and add resulted array as column to pandas dataframe

Time:06-16

I need to add an array as a column to a DataFrame:

results['TEST'] = results.apply(lambda x: results_02, axis=1)

As a result, I get a DataFrame like this:

ID TEST
1  [1,2,3,4,5,6,7,8,9,10]
2  [1,2,3,4,5,6,7,8,9,10]
3  [1,2,3,4,5,6,7,8,9,10]
4  [1,2,3,4,5,6,7,8,9,10]
5  [1,2,3,4,5,6,7,8,9,10]
6  [1,2,3,4,5,6,7,8,9,10]

But I want to add a condition: if the row's results['ID'] is in results_02, the row should get every value from results_02 except that ID. I need to do this for every row.

So the resulting DataFrame needs to look like this:

ID TEST
1  [2,3,4,5,6,7,8,9,10]
2  [1,3,4,5,6,7,8,9,10]
3  [1,2,4,5,6,7,8,9,10]
4  [1,2,3,5,6,7,8,9,10]
5  [1,2,3,4,6,7,8,9,10]
6  [1,2,3,4,5,7,8,9,10]

I thought I could do it using:

results['TEST'] = results.apply(lambda x: results_02[:10] if x not in results_02[:10] else results_02.remove(x)[:10], axis=1)

But I'm getting an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What is the best and most efficient way to solve this problem?

EDIT_1: DF

data = {'ID': [250274, 244473, 240274, 247178, 248667]}

df = pd.DataFrame(data)
results_02 = [250274, 244473, 240274, 247178, 248667]

CodePudding user response:

You can try this:

import numpy as np
import pandas as pd

data = {'ID': [250274, 244473, 240274, 247178, 248667]}

results = pd.DataFrame(data)
result_02 = np.array([250274, 244473, 240274, 247178, 248667])

mask = results.values != result_02
results['TEST'] = [result_02[mask_row] for mask_row in mask]
results

----------------------------------------------
    ID       TEST
0   250274  [244473, 240274, 247178, 248667]
1   244473  [250274, 240274, 247178, 248667]
2   240274  [250274, 244473, 247178, 248667]
3   247178  [250274, 244473, 240274, 248667]
4   248667  [250274, 244473, 240274, 247178]
----------------------------------------------
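To make the mask step a bit more concrete, here is a minimal sketch on shortened data (three IDs instead of five, purely for illustration). results.values has shape (3, 1) and result_02 has shape (3,), so NumPy broadcasting compares every ID against every candidate value and yields one boolean row per DataFrame row:

```python
import numpy as np
import pandas as pd

# Shortened illustrative data (an assumption, not the full set from the question)
results = pd.DataFrame({'ID': [250274, 244473, 240274]})
result_02 = np.array([250274, 244473, 240274])

# (3, 1) compared with (3,) broadcasts to a (3, 3) boolean matrix:
# mask[i, j] is True when the ID in row i differs from result_02[j]
mask = results.values != result_02

print(mask.shape)
print(mask)
```

Each row of the mask is then used to index result_02, which keeps only the values that differ from that row's ID.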

If your DataFrame contains several columns and you are only interested in the ID column, build the mask from that column alone and reshape the ID array into a single column so that it broadcasts against result_02.

import numpy as np
import pandas as pd

data = {'ID': [250274, 244473, 240274, 247178, 248667], 'some_col': ['A', 'B', 'C', 'D', 'E']}

results = pd.DataFrame(data)
result_02 = np.array([250274, 244473, 240274, 247178, 248667])

mask = results.ID.values.reshape(-1, 1) != result_02
results['TEST'] = [result_02[mask_row] for mask_row in mask]
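As a rough illustration of why the reshape is needed, the sketch below compares the two forms on shortened data. The names ids, flat_mask and column_mask are mine and only serve the example. Without the reshape, two 1-D arrays of equal length are compared element by element, which gives a 1-D mask instead of one mask row per ID:

```python
import numpy as np

# Shortened illustrative data (an assumption, not the full set from the question)
ids = np.array([250274, 244473, 240274])
result_02 = np.array([250274, 244473, 240274])

# 1-D vs 1-D: position-wise comparison, shape (3,)
flat_mask = ids != result_02

# Column vector vs 1-D: every ID against every value, shape (3, 3)
column_mask = ids.reshape(-1, 1) != result_02

print(flat_mask.shape, column_mask.shape)
```

Only the second form produces a separate boolean row for each ID, which is what the list comprehension iterates over.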

EDIT

I am not quite sure what you mean by your comment. I suppose you want something like this?

import numpy as np
import pandas as pd

data = {
    'ID1': [250274, 244473, 240274, 247178, 248667],
    'ID2': [244473, 240274, 247178, 248667, 250274],
}



results = pd.DataFrame(data)
result_02 = np.array([250274, 244473, 240274, 247178, 248667])

# np.isin replaces np.in1d, which is deprecated in recent NumPy releases
results['TEST'] = [result_02[~np.isin(result_02, row)] for row in results.values]

------------------------------------------------
    ID1     ID2     TEST
0   250274  244473  [240274, 247178, 248667]
1   244473  240274  [250274, 247178, 248667]
2   240274  247178  [250274, 244473, 248667]
3   247178  248667  [250274, 244473, 240274]
4   248667  250274  [244473, 240274, 247178]
------------------------------------------------

If not, please make your comment more precise.
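As a side note on the error in the question: with axis=1 the lambda receives a whole row as a Series, so `x not in results_02` compares that Series with each list element, and pandas refuses to reduce the result to a single True or False. In addition, list.remove() modifies the list in place and returns None, so slicing its result would fail as well. If you prefer to stay with apply, a minimal sketch (one possible fix, not the only one) is to apply over the ID column, so each call receives a scalar, and to build a new list rather than mutating results_02:

```python
import pandas as pd

results = pd.DataFrame({'ID': [250274, 244473, 240274, 247178, 248667]})
results_02 = [250274, 244473, 240274, 247178, 248667]

# Series.apply passes one scalar ID per call, so the comparison is unambiguous.
# The comprehension builds a fresh list and leaves results_02 untouched.
results['TEST'] = results['ID'].apply(
    lambda id_: [v for v in results_02 if v != id_]
)

print(results)
```

This is easier to read than the mask version, but it runs a Python-level loop per row, so the NumPy approach is likely to be faster on large frames.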
