I'm working on an Ethereum fraud detection dataset, with 0 denoting normal and 1 denoting fraudulent.
I have train_account.csv as:
account | flag |
---|---|
a17249 | 0 |
a03683 | 1 |
a22146 | 0 |
and transactions.csv as:
from_account | to_account |
---|---|
a00996 | b31499 |
a07890 | a22146 |
a22146 | b31504 |
I want to make a test_account.csv, where only the accounts are given and the task is to decide whether each one is fraudulent (1) or normal (0):
account |
---|
a27890 |
a03683 |
a22146 |
Rules I'm following to make the table below:
- If an account is present in the account column of train_account.csv, the flag column takes the value recorded for that account in train_account.csv.
- If not, I check whether the account is present in transactions['from_account'] or transactions['to_account']. If yes, the flag for that account in test_account is 0, else 1.
account | flag |
---|---|
a27890 | 1 |
a03683 | 1 |
a22146 | 0 |
I'm planning to add a flag column based on the above rule, without adding or removing any rows.
PS: I'm a beginner and have no clue how to do this. Thanks in advance.
I tried looping over the column, but I'm not sure how to do the check and add the value to the result. Something like this:
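flags = []
for acc in test_account['account']:
    if acc in train_account['account'].values:
        # known account: copy its flag from train_account
        flags.append(train_account.loc[train_account['account'] == acc, 'flag'].iloc[0])
    elif acc in transactions['from_account'].values or acc in transactions['to_account'].values:
        # not in train_account but seen in a transaction: normal
        flags.append(0)
    else:
        # not in train_account and not in any transaction: fraudulent
        flags.append(1)
test_account['flag'] = flags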
CodePudding user response:
Try this:
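import numpy as np
# step 1: look up each test account's flag in train_account (NaN if the account is not there)
# this assumes every account appears only once in train_account
known = test_account['account'].map(train_account.set_index('account')['flag'])
# step 2: gather every account that appears in transactions
accounts = np.append(transactions['from_account'].values, transactions['to_account'].values)
# accounts missing from train_account get 0 if they appear in transactions, otherwise 1
fallback = np.where(test_account['account'].isin(accounts), 0, 1)
test_account['flag'] = np.where(known.notna(), known, fallback).astype(int)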
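To check the rule from the question end to end, here is a small self-contained sketch that rebuilds the three sample tables as DataFrames and applies the rule to them. The helper name `add_flag` is made up for illustration, and the sketch assumes each account appears at most once in train_account; treat it as one possible way to do it rather than the only one.

```python
import numpy as np
import pandas as pd

# Sample data copied from the tables in the question
train_account = pd.DataFrame(
    {"account": ["a17249", "a03683", "a22146"], "flag": [0, 1, 0]}
)
transactions = pd.DataFrame(
    {
        "from_account": ["a00996", "a07890", "a22146"],
        "to_account": ["b31499", "a22146", "b31504"],
    }
)
test_account = pd.DataFrame({"account": ["a27890", "a03683", "a22146"]})


def add_flag(test_df, train_df, tx_df):
    """Hypothetical helper: return a copy of test_df with a 'flag' column."""
    # Flag from train_df where the account is known, NaN otherwise.
    # Assumes accounts are unique in train_df (map needs a unique index).
    known = test_df["account"].map(train_df.set_index("account")["flag"])
    # True where the account appears on either side of any transaction.
    tx_accounts = pd.concat([tx_df["from_account"], tx_df["to_account"]])
    in_tx = test_df["account"].isin(tx_accounts)
    out = test_df.copy()
    # Known flag first; otherwise 0 if seen in transactions, else 1.
    out["flag"] = np.where(known.notna(), known, np.where(in_tx, 0, 1)).astype(int)
    return out


result = add_flag(test_account, train_account, transactions)
print(result)
```

On the sample data this gives the flags 1, 1, 0, the same as the expected table in the question. The helper works on a copy, so the original test_account keeps its rows and columns unchanged.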