Catch Me If You Can

Background

Stripe processes billions of dollars' worth of transactions every day. As guardians of the Internet ecosystem, it is our duty to ensure that legitimate merchants can safely transact with their customers, and that we quickly detect and block illegitimate or fraudulent activity.

To detect fraud, we run ML models at scale, such as Radar, that flag fraudulent transactions as they come in. These models examine many attributes of each incoming transaction to judge its authenticity. One input we can look at is the outcome reported by credit card networks such as Visa and Mastercard. These networks communicate with banks and return response codes that reflect the outcome of each credit card transaction.

While the card networks act as a safety net, once we have enough data to determine that a merchant is fraudulent, we should proactively block that merchant to protect consumers from malicious activity such as the use of stolen cards.

Today, we will build a very simple fraud detection model to determine whether a merchant is fraudulent.

Part 1

Each Stripe merchant has an associated Merchant Category Code (MCC) specifying the kind of business the merchant operates. Certain businesses (e.g., airlines, event venues) are riskier than others, so we have different tolerances of fraud for them.

We will start with a very basic algorithm for detecting fraud: if a merchant's total number of fraudulent transactions is ever at or above the threshold for its MCC (threshold > 1), we will mark the merchant as fraudulent. Additionally, we will only begin marking merchants as fraudulent once we have seen at least a minimum number of transactions.

Inputs:

A comma-separated list of response codes that are not fraudulent
A comma-separated list of response codes that are fraudulent
A table of MCCs and their corresponding fraud thresholds
A table of merchants by account ID and their corresponding MCC
The minimum number of transactions to evaluate a merchant as fraudulent or not
A table of charges (charge ID, account ID, amount, etc.)

Output:

Return a lexicographically sorted comma-separated list of merchants (by account ID) that are fraudulent.

Example:

Input:
"approved","invalid_pin","expired_card"
"do_not_honor","stolen_card","lost_card"
retail,5
airline,2
restaurant,10
acct_1,airline
acct_2,venue
acct_3,retail
0
CHARGE,ch_1,acct_1,100,do_not_honor
CHARGE,ch_2,acct_1,200,approved
CHARGE,ch_3,acct_1,300,do_not_honor
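
A minimal sketch of the Part 1 rule in Python, run against the example input above. The names parse_codes and find_fraudulent_merchants are hypothetical, and the sketch makes several assumptions that the problem statement leaves open: the inputs are already parsed into dictionaries and tuples, the minimum transaction count applies per merchant, merchants whose MCC has no threshold (such as acct_2 with venue) are skipped, and a code that appears in neither list counts as a transaction but not as fraud, so the non-fraudulent list is not needed here.

```python
from collections import defaultdict


def parse_codes(line):
    """Split a comma-separated list of codes, dropping the surrounding quotes."""
    return {code.strip().strip('"') for code in line.split(",") if code.strip()}


def find_fraudulent_merchants(fraud_codes, mcc_thresholds, merchant_mcc,
                              min_transactions, charges):
    """Flag a merchant once its fraudulent-charge count reaches its MCC threshold."""
    total = defaultdict(int)   # charges seen per merchant
    fraud = defaultdict(int)   # fraudulent charges seen per merchant
    flagged = set()            # a flagged merchant is never removed
    for _charge_id, account_id, _amount, code in charges:
        threshold = mcc_thresholds.get(merchant_mcc.get(account_id))
        if threshold is None:
            continue  # Assumption: no known threshold, so the merchant is skipped.
        total[account_id] += 1
        if code in fraud_codes:
            fraud[account_id] += 1
        if total[account_id] >= min_transactions and fraud[account_id] >= threshold:
            flagged.add(account_id)
    return ",".join(sorted(flagged))


# The example input, parsed by hand into the assumed shapes.
fraud_codes = parse_codes('"do_not_honor","stolen_card","lost_card"')
mcc_thresholds = {"retail": 5, "airline": 2, "restaurant": 10}
merchant_mcc = {"acct_1": "airline", "acct_2": "venue", "acct_3": "retail"}
charges = [
    ("ch_1", "acct_1", 100, "do_not_honor"),
    ("ch_2", "acct_1", 200, "approved"),
    ("ch_3", "acct_1", 300, "do_not_honor"),
]
result = find_fraudulent_merchants(fraud_codes, mcc_thresholds, merchant_mcc, 0, charges)
print(result)
```

Under these assumptions the example flags acct_1: the do_not_honor response on ch_3 brings its fraud count to the airline threshold of 2.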

Part 2

We deployed this model to production, but now we are getting complaints from large users like Ticketmaster and Amazon, who have a large number of transactions and are being marked as fraudulent even though most transactions are legitimate.

In this step, we will create a new algorithm that uses the percentage of fraudulent transactions as our threshold. If a merchant's fraud percentage is ever at or above the threshold, they will be marked as fraudulent and remain fraudulent even if their percentage drops with more transactions.

The output will again be a list of merchants that are fraudulent.
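
The percentage rule could be sketched in Python as follows. This is an illustration under stated assumptions, not a reference solution: the function name is hypothetical, thresholds are assumed to be fractions between 0 and 1 (the problem does not say how percentages are encoded), the minimum transaction count from Part 1 is assumed to still apply per merchant, and merchants without a known threshold are skipped.

```python
from collections import defaultdict


def find_fraudulent_merchants_by_rate(fraud_codes, mcc_thresholds, merchant_mcc,
                                      min_transactions, charges):
    """Flag a merchant once its fraud rate reaches its MCC threshold; flags are sticky."""
    total = defaultdict(int)
    fraud = defaultdict(int)
    flagged = set()
    for _charge_id, account_id, _amount, code in charges:
        threshold = mcc_thresholds.get(merchant_mcc.get(account_id))
        if threshold is None:
            continue  # Assumption: no known threshold, so the merchant is skipped.
        total[account_id] += 1
        if code in fraud_codes:
            fraud[account_id] += 1
        if account_id in flagged:
            continue  # Already fraudulent; a later drop in the rate does not clear it.
        seen = total[account_id]  # at least 1 here, so the division is safe
        if seen >= min_transactions and fraud[account_id] / seen >= threshold:
            flagged.add(account_id)
    return ",".join(sorted(flagged))


# Hypothetical data: one fraudulent charge followed by three approved ones.
fraud_codes = {"do_not_honor", "stolen_card", "lost_card"}
mcc_thresholds = {"airline": 0.5}  # assumed to mean 50%
merchant_mcc = {"acct_1": "airline"}
charges = [
    ("ch_1", "acct_1", 100, "do_not_honor"),
    ("ch_2", "acct_1", 200, "approved"),
    ("ch_3", "acct_1", 300, "approved"),
    ("ch_4", "acct_1", 400, "approved"),
]
result = find_fraudulent_merchants_by_rate(
    fraud_codes, mcc_thresholds, merchant_mcc, 2, charges)
print(result)
```

With a minimum of 2 transactions, acct_1 is flagged after ch_2, when its rate is 1 of 2, and stays flagged even though the rate later falls to 1 of 4. Raising the minimum to 3 means the rate is first checked at 1 of 3, which is below the threshold, so nothing is flagged.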

Part 3

Our fraud detection model is now working well in production. However, some merchants are incorrectly being marked as fraudulent by the credit card networks. To address this concern, we will include dispute information to help merchants challenge fraud claims.
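
The problem statement does not say how dispute information is encoded or how it should affect the result, so the Python sketch below is only one possible reading. It assumes hypothetical records of the form ("DISPUTE", charge_id) mixed in with ("CHARGE", charge_id, account_id, amount, code) records, assumes a disputed charge still counts as a transaction but no longer counts as fraudulent, and assumes the percentage rule from Part 2 with thresholds given as fractions. The function name is also hypothetical.

```python
from collections import defaultdict


def find_fraudulent_merchants_with_disputes(fraud_codes, mcc_thresholds, merchant_mcc,
                                            min_transactions, events):
    """Apply the Part 2 rate rule, treating disputed charges as non-fraudulent."""
    # Assumption: a dispute applies to its charge no matter where it appears in
    # the event list, so all disputed charge IDs are collected up front.
    disputed = {event[1] for event in events if event[0] == "DISPUTE"}
    total = defaultdict(int)
    fraud = defaultdict(int)
    flagged = set()
    for event in events:
        if event[0] != "CHARGE":
            continue
        _kind, charge_id, account_id, _amount, code = event
        threshold = mcc_thresholds.get(merchant_mcc.get(account_id))
        if threshold is None:
            continue  # Assumption: no known threshold, so the merchant is skipped.
        total[account_id] += 1
        if code in fraud_codes and charge_id not in disputed:
            fraud[account_id] += 1
        seen = total[account_id]
        if seen >= min_transactions and fraud[account_id] / seen >= threshold:
            flagged.add(account_id)
    return ",".join(sorted(flagged))


# Hypothetical data: acct_1 successfully disputes its only fraud-coded charge.
fraud_codes = {"do_not_honor", "stolen_card", "lost_card"}
mcc_thresholds = {"airline": 0.5, "retail": 0.5}  # assumed fractions
merchant_mcc = {"acct_1": "airline", "acct_3": "retail"}
events = [
    ("CHARGE", "ch_1", "acct_1", 100, "do_not_honor"),
    ("CHARGE", "ch_2", "acct_1", 200, "approved"),
    ("CHARGE", "ch_3", "acct_3", 300, "stolen_card"),
    ("DISPUTE", "ch_1"),
]
result = find_fraudulent_merchants_with_disputes(
    fraud_codes, mcc_thresholds, merchant_mcc, 1, events)
print(result)
```

Under this reading, acct_1 is cleared because ch_1 is disputed, while acct_3 is still flagged for the undisputed ch_3. A stricter reading, in which a dispute only takes effect from the point it arrives, would need the events replayed in order instead.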
