Multiple vectorized condition - Length of values between two data frames not matching

Time:02-25

I am trying to perform a rather simple task using vectorized conditions. The two dataframes differ in size, but I do not understand why that should be an issue.

import numpy as np
import pandas as pd

df1_data = {'In-Person Status': {0: 'No', 1: 'Yes', 2: 'No', 3: 'Yes', 4: 'No', 5: 'Yes'},
 'ID': {0: 5, 1: 45, 2: 22, 3: 34, 4: 46, 5: 184}}
df1 = pd.DataFrame(df1_data)

df2_data = {'Age': {0: 22, 1: 34, 2: 51, 3: 8}, 'ID': {0: 5, 1: 2145, 2: 5022, 3: 34}}
df2 = pd.DataFrame(df2_data)

I am using the following code:

conditions = [
    (df2['ID'].isin(df1['ID'])) & (df1['In-Person Status'] == 'No')
]

value = ['True']

df2['Result'] = np.nan
df2['Result'] = np.select(conditions, value, 'False')

Desired output:

 Age             ID       Result 
 22             0005       True
 34             2145       False
 51             5022       False
 8              0034       False

Although the task might be very simple, I am getting the following error message: ValueError: Length of values (72610) does not match length of index (1634)

I would very much appreciate any suggestions.
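A minimal reproduction of the error with the sample frames above (a sketch; it relies on pandas aligning the two operands of `&` on their index labels): `df2['ID'].isin(df1['ID'])` has one value per row of `df2` (4), while `df1['In-Person Status'] == 'No'` has one value per row of `df1` (6). The `&` aligns them and returns a Series over the union of both indexes (6 labels), so `np.select` produces 6 values for a 4-row column. With the real data, the 72610 presumably comes from the same union of the two frames' indexes.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "In-Person Status": ["No", "Yes", "No", "Yes", "No", "Yes"],
    "ID": [5, 45, 22, 34, 46, 184],
})
df2 = pd.DataFrame({"Age": [22, 34, 51, 8], "ID": [5, 2145, 5022, 34]})

in_df1 = df2["ID"].isin(df1["ID"])        # 4 values, indexed like df2
is_no = df1["In-Person Status"] == "No"   # 6 values, indexed like df1

# & aligns both Series on their index labels; labels missing on one side
# count as False, and the result covers the union of indexes (0..5).
combined = in_df1 & is_no
print(len(in_df1), len(is_no), len(combined))  # 4 6 6

result = np.select([combined], [True], False)  # 6 values

error_message = None
try:
    df2["Result"] = result  # df2 only has 4 rows
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```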

CodePudding user response:

We can join the two dfs as suggested in the comments, then drop the rows where Age is NaN (rows of df1 with no counterpart in df2). The last few lines are optional; they only format the result to match your output.

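# join() pairs rows by index label, not by ID; df2's overlapping ID column becomes 'ID_left'
dfj = df1.join(df2, rsuffix='_left')

# True where df1's ID appears among df2's IDs and the status is 'No'
conditions = [(dfj['ID'].isin(dfj['ID_left'])) & (dfj['In-Person Status'] == 'No')]

value = [True]
dfj['Result'] = np.select(conditions, value, False)

# Rows of df1 without a matching index in df2 have NaN in Age; drop them
dfj = dfj.dropna(axis=0, how='any', subset=['Age'])

# Optional formatting to match the desired output
dfj = dfj[['Age', 'ID_left', 'Result']]

dfj.columns = ['Age', 'ID', 'Result']

# Here ID is float after the join (e.g. 5.0), so str(5.0).zfill(6)[0:4] gives '0005'
dfj['ID'] = dfj['ID'].apply(lambda x: str(x).zfill(6)[0:4])

dfj['Age'] = dfj['Age'].astype(int)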

Output:

    Age ID      Result
0   22  0005    True
1   34  2145    False
2   51  5022    False
3   8   0034    False
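Note that `df1.join(df2, ...)` without `on=` pairs rows by index, not by `ID`, so the output above matches the desired one because the shared IDs (5 and 34) happen to sit at the same index positions (0 and 3) in both sample frames. A sketch of a key-based alternative, assuming a row of `df2` should be `True` exactly when its `ID` appears in `df1` with `In-Person Status` equal to `'No'`: select those IDs from `df1` first, then test each `df2['ID']` with `isin`. Every condition then lives on `df2`'s index, so the assigned array always has `len(df2)` values. The zero-padding step assumes IDs of at most four digits, as in the sample.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "In-Person Status": ["No", "Yes", "No", "Yes", "No", "Yes"],
    "ID": [5, 45, 22, 34, 46, 184],
})
df2 = pd.DataFrame({"Age": [22, 34, 51, 8], "ID": [5, 2145, 5022, 34]})

# IDs in df1 whose status is "No"; both parts of this filter come from df1,
# so they share the same length and index.
no_ids = df1.loc[df1["In-Person Status"] == "No", "ID"]

# One boolean per row of df2, indexed like df2, so the lengths always match.
conditions = [df2["ID"].isin(no_ids)]
df2["Result"] = np.select(conditions, [True], default=False)

# Optional: show IDs zero-padded to four characters, as in the desired output.
df2["ID"] = df2["ID"].astype(str).str.zfill(4)
print(df2)
```

`np.select` is kept here only to mirror the question's structure; with a single condition, `df2['ID'].isin(no_ids)` on its own already is the boolean `Result` column.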