
Fraud Detection Analysis

How do you know if users are abusing your system?


—Most technical information involving this project has been omitted to protect confidentiality—

Fraud detection and prevention is a necessary step for every application, business and service. Food delivery, insurance, sports and crypto apps alike need layers of security to keep users safe and to catch fraudulent activity when it happens.

In transportation and logistics apps, fraud detection is needed to make sure that users and drivers are not creating multiple accounts or placing fraudulent orders, and that drivers are not trying to scam users on the platform. Traditionally, this work was done manually: exporting CSVs, building pivot tables, running complex comparisons to check whether users shared characteristics with other users, or stepping through driver behaviour by hand to spot fraud.

Utilizing deep learning, we can now determine whether users or drivers are trying to commit fraud without manually scanning through hundreds of thousands of accounts. The models look deeper into user and driver behaviour features to find patterns that humans would have trouble discovering on their own. Feature selection, together with dimensionality-reduction techniques such as PCA, narrows the data down to the features most important for detecting fraud, and we also look for features that suspicious users have in common.
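As a rough illustration of the PCA step, the sketch below standardizes a feature matrix, fits PCA, and ranks the original features by their absolute loadings on the leading components, weighted by the variance each component explains. PCA is a linear dimensionality-reduction method rather than a deep learning technique, and loading-based ranking is only one heuristic among several. The feature names, the `rank_features_by_pca` helper and the synthetic data are hypothetical, since the project's real features are confidential.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical behaviour features; the project's real features are confidential
feature_names = ["orders_per_day", "cancel_rate", "avg_order_value",
                 "distinct_devices", "promo_uses"]

# Synthetic data: promo_uses is made to track orders_per_day closely
rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(feature_names)))
X[:, 4] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=500)


def rank_features_by_pca(X, names, n_components=2):
    """Rank original features by their |loading| on the leading principal components."""
    X_std = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale
    pca = PCA(n_components=n_components).fit(X_std)
    # components_ has shape (n_components, n_features); weight each component
    # by the share of variance it explains, then sum per feature
    importance = np.abs(pca.components_).T @ pca.explained_variance_ratio_
    order = np.argsort(importance)[::-1]
    return [names[i] for i in order], pca.explained_variance_ratio_


ranked, explained = rank_features_by_pca(X, feature_names)
print(ranked)      # the strongly correlated pair should rank first
print(explained)
```

Standardizing first matters: without it, a feature measured in large units (such as order value in cents) would dominate the components purely because of its scale.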

Simple fraud detection starts with banking information: checking whether and where payments are transferred, and catching cases where users or drivers make obvious mistakes while misusing the application.
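A minimal sketch of one such banking check, assuming a hypothetical table with one row per account and a hashed bank account number (`bank_hash`): flag every account whose payout account is shared with other users or drivers. The column names, the `flag_shared_bank_accounts` helper and the `min_accounts` threshold are illustrative assumptions, not the project's schema.

```python
import pandas as pd

# Hypothetical schema: one row per account and its registered bank account
accounts = pd.DataFrame({
    "user_id":   ["u1", "u2", "u3", "u4", "u5"],
    "role":      ["driver", "driver", "user", "user", "driver"],
    "bank_hash": ["b9", "b9", "c1", "c2", "b9"],  # hashed bank account number
})


def flag_shared_bank_accounts(df, min_accounts=2):
    """Flag every account whose bank_hash is used by at least min_accounts distinct users."""
    # transform broadcasts the per-group distinct count back onto every row
    n_users = df.groupby("bank_hash")["user_id"].transform("nunique")
    return df.assign(shared_bank=n_users >= min_accounts)


flagged = flag_shared_bank_accounts(accounts)
print(flagged[flagged["shared_bank"]])
```

Hashing the account number before analysis keeps the comparison possible without exposing raw banking details to analysts.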

Once we have established a baseline profile of fraudsters, we can use more complex techniques, such as feature engineering, to find other drivers and users who fit that profile.
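To make "feature engineering" concrete, here is a small sketch that turns a raw order log into one row of behavioural features per user (order count, distinct devices, cancellation rate, promo usage). The order table, its column names and the chosen features are hypothetical examples, not the features used in the project.

```python
import pandas as pd

# Hypothetical order log; column names are illustrative, not the project's schema
orders = pd.DataFrame({
    "user_id":    ["u1", "u1", "u1", "u2", "u2", "u3"],
    "device_id":  ["d1", "d2", "d3", "d4", "d4", "d5"],
    "status":     ["cancelled", "cancelled", "completed",
                   "completed", "completed", "completed"],
    "promo_used": [True, True, True, False, True, False],
})


def build_user_features(df):
    """Aggregate raw orders into one row of behavioural features per user."""
    g = df.groupby("user_id")
    return pd.DataFrame({
        "n_orders": g.size(),
        "distinct_devices": g["device_id"].nunique(),
        "cancel_rate": g["status"].apply(lambda s: (s == "cancelled").mean()),
        "promo_rate": g["promo_used"].mean(),
    })


features = build_user_features(orders)
print(features)
```

Here `u1` stands out: three devices for three orders, a high cancellation rate and a promo on every order, which is the kind of profile a classifier can then learn from.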

A simple supervised model: a random forest classifier

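from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# X: feature matrix (one row per user or driver), y: labels (1 = fraud, 0 = legitimate)

# Splitting the data into training and testing sets; stratify keeps the rare
# fraud class at the same proportion in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Initializing the Random Forest classifier; class_weight="balanced" offsets class imbalance
clf = RandomForestClassifier(class_weight="balanced", random_state=42)

# Training the classifier
clf.fit(X_train, y_train)

# Predicting on the test set
y_pred = clf.predict(X_test)

# Precision and recall on the fraud class matter more than accuracy on imbalanced data
print(classification_report(y_test, y_pred))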



Security
Security of users and drivers


We look for users and drivers who attempt to place orders for free, and use risk scoring based on order frequency, location and past behaviour.
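A toy version of such a risk score might combine the three signals named above into a weighted sum in [0, 1], with a hard rule for orders placed with nothing paid. The `OrderContext` fields, weights and caps are all hypothetical assumptions for illustration; a production system would tune them (or learn them) from labelled cases.

```python
from dataclasses import dataclass

# Hypothetical weights and caps; a real system would tune them on labelled cases
WEIGHTS = {"frequency": 0.4, "location": 0.3, "history": 0.3}
MAX_ORDERS_PER_HOUR = 10
MAX_DISTANCE_KM = 50.0
MAX_PRIOR_INCIDENTS = 3


@dataclass
class OrderContext:
    orders_last_hour: int        # frequency signal
    km_from_usual_area: float    # location signal
    prior_incidents: int         # past-behaviour signal (e.g. chargebacks)
    order_total: float
    amount_paid: float


def risk_score(ctx: OrderContext) -> float:
    """Return a risk score in [0, 1]; higher means more suspicious."""
    # Hard rule: an order placed with nothing paid is treated as maximum risk
    if ctx.order_total > 0 and ctx.amount_paid <= 0:
        return 1.0
    # Scale each signal to [0, 1], capping extreme values at 1
    frequency = min(ctx.orders_last_hour / MAX_ORDERS_PER_HOUR, 1.0)
    location = min(ctx.km_from_usual_area / MAX_DISTANCE_KM, 1.0)
    history = min(ctx.prior_incidents / MAX_PRIOR_INCIDENTS, 1.0)
    # Weights sum to 1, so the combined score also stays in [0, 1]
    return (WEIGHTS["frequency"] * frequency
            + WEIGHTS["location"] * location
            + WEIGHTS["history"] * history)


normal = risk_score(OrderContext(1, 2.0, 0, 20.0, 20.0))
suspicious = risk_score(OrderContext(12, 80.0, 2, 20.0, 20.0))
free_order = risk_score(OrderContext(1, 0.0, 0, 25.0, 0.0))
print(normal, suspicious, free_order)
```

Keeping the score bounded makes it easy to map onto actions, for example allowing low scores, sending mid-range scores to manual review, and blocking the highest.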

We also look for relationships between accounts, users and devices to find unusual connections and clusters, and run identity verification to check that individuals are who they claim to be.
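One simple way to surface clusters of connected accounts is to treat accounts and shared identifiers (device IDs, phone numbers and so on) as nodes of a graph and find its connected components with union-find. The sketch below assumes a hypothetical list of (account, identifier) links; it is an illustration of the idea, not the project's graph pipeline.

```python
from collections import defaultdict

# Hypothetical links between accounts and identifiers (device IDs, phone numbers, ...)
links = [
    ("acct_1", "device_A"),
    ("acct_2", "device_A"),
    ("acct_2", "phone_555"),
    ("acct_3", "phone_555"),
    ("acct_4", "device_B"),
]


def account_clusters(links):
    """Group accounts connected through any chain of shared identifiers (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag nodes by kind so an account ID can never collide with an identifier string
    for account, identifier in links:
        union(("account", account), ("identifier", identifier))

    clusters = defaultdict(set)
    for account, _ in links:
        clusters[find(("account", account))].add(account)
    return sorted(sorted(members) for members in clusters.values())


for cluster in account_clusters(links):
    if len(cluster) > 1:
        print("possibly linked accounts:", cluster)
```

Note that `acct_1` and `acct_3` never share an identifier directly; they end up in the same cluster only through `acct_2`, which is exactly the kind of indirect link that is hard to spot in a pivot table.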

Identity verification: model lifecycle utilized in this project. Source: https://dis-blog.thalesgroup.com/mobile/2018/07/11/identity-verification-service-combating-fraud-and-improving-customer-care

The data was then run through exploratory data analysis, using simple correlation operations to identify the features most strongly correlated with fraud.
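A minimal sketch of that correlation step, using a small hypothetical labelled table: `DataFrame.corrwith` computes each feature's Pearson correlation with the 0/1 fraud label (the point-biserial correlation), and the features are then ordered by absolute value. This only captures linear, one-feature-at-a-time relationships, so it works as a screening step rather than a final feature selection.

```python
import pandas as pd

# Hypothetical labelled feature table (is_fraud: 1 = confirmed fraud)
df = pd.DataFrame({
    "cancel_rate":      [0.9, 0.8, 0.1, 0.0, 0.2, 0.7],
    "distinct_devices": [4, 3, 1, 1, 2, 5],
    "account_age_days": [2, 5, 400, 900, 300, 1],
    "is_fraud":         [1, 1, 0, 0, 0, 1],
})


def rank_by_correlation(df, target="is_fraud"):
    """Rank features by absolute Pearson correlation with the fraud label."""
    corr = df.drop(columns=target).corrwith(df[target])
    # Sort by magnitude but keep the sign, since negative correlation is informative too
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


ranked = rank_by_correlation(df)
print(ranked)
```

Keeping the sign matters: in this toy data, a young account is as strong a signal as a high cancellation rate, just in the opposite direction.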



Fraud detection


Libraries:

Some of the tools required to make this project work:
Python - Python is a programming language that lets you work quickly and integrate systems more effectively.
OpenCV - OpenCV is a library of programming functions mainly for real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage, then Itseez…
Pandas - Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
AWS SageMaker - Amazon SageMaker is a cloud-based machine-learning platform that lets developers create, train and deploy machine-learning models in the cloud. It can also be used to deploy ML models on embedded systems and edge devices.