I am trying to perform a rather simple task using vectorized conditions. The two dataframes differ in size, but I do not understand why that may be an issue.
import numpy as np
import pandas as pd

df1_data = {'In-Person Status': {0: 'No', 1: 'Yes', 2: 'No', 3: 'Yes', 4: 'No', 5: 'Yes'},
            'ID': {0: 5, 1: 45, 2: 22, 3: 34, 4: 46, 5: 184}}
df1 = pd.DataFrame(df1_data)
df2_data = {'Age': {0: 22, 1: 34, 2: 51, 3: 8}, 'ID': {0: 5, 1: 2145, 2: 5022, 3: 34}}
df2 = pd.DataFrame(df2_data)
I am using the following code:
conditions = [
(df2['ID'].isin(df1['ID'])) & (df1['In-Person Status'] == 'No')
]
value = ['True']
df2['Result'] = np.nan
df2['Result'] = np.select(conditions, value, 'False')
Desired output:
Age ID Result
22 0005 True
34 2145 False
51 5022 False
8 0034 False
Although the task might be very simple, I am getting the following error message: ValueError: Length of values (72610) does not match length of index (1634)
I would very much appreciate any suggestions.
CodePudding user response:
We can join the two DataFrames on their index, as suggested in the comments, then drop the rows with NaN in the Age column. The last few lines are optional and only format the result to match your output.
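# Index-based join; df2's overlapping 'ID' column gets the suffix '_left'
dfj = df1.join(df2, rsuffix='_left')
conditions = [(dfj['ID'].isin(dfj['ID_left'])) & (dfj['In-Person Status'] == 'No')]
value = [True]
dfj['Result'] = np.select(conditions, value, False)
# Rows that exist only in df1 have no Age after the join, so drop them
dfj = dfj.dropna(axis=0, how='any', subset=['Age'])
# Optional formatting: keep df2's columns and restore the original names
dfj = dfj[['Age', 'ID_left', 'Result']]
dfj.columns = ['Age', 'ID', 'Result']
# The join turned the IDs into floats (e.g. '5.0'); pad with zeros and cut off the '.0'
dfj['ID'] = dfj['ID'].apply(lambda x: str(x).zfill(6)[0:4])
dfj['Age'] = dfj['Age'].astype(int)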
Output:
Age ID Result
0 22 0005 True
1 34 2145 False
2 51 5022 False
3 8 0034 False
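
As a possible alternative that does not depend on how the rows of the two DataFrames line up, here is a minimal sketch. It assumes the goal is to flag each row of df2 whose ID appears in df1 with an In-Person Status of 'No'. In the original code the two masks come from frames of different lengths (4 rows vs. 6 rows), so the combined condition is not one value per row of df2. Filtering df1 first and then calling isin on df2 alone keeps the result the same length as df2. The name no_ids is only illustrative, and the zero-padding step is optional.

```python
import pandas as pd

df1 = pd.DataFrame({
    'In-Person Status': ['No', 'Yes', 'No', 'Yes', 'No', 'Yes'],
    'ID': [5, 45, 22, 34, 46, 184],
})
df2 = pd.DataFrame({'Age': [22, 34, 51, 8], 'ID': [5, 2145, 5022, 34]})

# IDs from df1 whose status is 'No' (no_ids is a hypothetical helper name)
no_ids = df1.loc[df1['In-Person Status'] == 'No', 'ID']

# One boolean per row of df2, so the assignment always matches df2's length
df2['Result'] = df2['ID'].isin(no_ids)

# Optional: zero-pad the IDs to four characters for display
df2['ID'] = df2['ID'].astype(str).str.zfill(4)

print(df2)
```

This compares IDs by value rather than by row position, so it should keep working when the two frames are in a different order or have very different sizes.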