Drift Protocol Technical Incident Report — 2022/05/11

May 26, 2022

Background

Drift Protocol is an open-sourced dynamic AMM (DAMM) and decentralised orderbook (DLOB) perpetual swap exchange built on Solana. Drift provides leveraged trading of perpetual contracts for top crypto-assets.

Drift Protocol was paused on 2022/05/11 in response to a rapid increase in the rate of withdrawal of user funds. Drift successfully paused the protocol prior to a complete depletion of user funds. The pause was critical to ensure that no further funds were drained from the protocol.

This post-mortem provides a chronology of the incident, explains the underlying issues that led to the rapid withdrawal of funds, demonstrates how an attacker could have exploited the issues, describes how the issues forced the exchange to be paused and proposes immediate fixes and areas for further improvement ahead of Drift v2.

Drift’s core contributors are now full steam ahead working on a vastly improved Drift v2 that fixes the immediate technical issues present in v1 and builds upon v1’s foundation to ship a more robust leveraged trading system with sufficient risk mechanisms and guardrails, while taking into account community feedback.

Drift v1 is now sunset with all positions settled and the total settled collateral of $19.5m ($4.95m from the remaining Insurance Fund and $14.55m from financing) repaid to affected traders in full. Drift v2 is expected to launch in July with improvements outlined in Section 2.

1. Incident Details

In this section, Drift provides a chronology of events, a detailed breakdown of the issues that took place and a full code-based proof of concept to detail the bug.

a. Incident Chronology

May 11 12 AM to 12 PM UTC — Sharp spike in withdrawals noticed
Within about 12 hours starting May 11 12 AM UTC, a net $8.72m of collateral was withdrawn from the system, taking the pool from $13.66m to $4.94m. The core developers noticed the alarming rate of withdrawals from the protocol and started to investigate the issue.
May 11 1:15 PM UTC — First pause to de-risk markets
Due to volatility in LUNA, trading was halted so that the core developers could reduce risk on the exchange. Risk was reduced by increasing the initial margin requirement, the base market spreads, and the exchange fee.
May 11 2:39 PM UTC — Exchange unpaused and k reserves reduced
After derisking the exchange, the core developers believed that the risk was low enough for the exchange to resume operating. Upon unpausing, the core developers continued to investigate the root issue and reduced k reserves across the board for LUNA, SOL, AVAX, BTC, and ETH. Reducing k (the reserves) helps reduce growth in a Long-Short Imbalance.
May 11 7:29 PM UTC — Second pause to investigate withdrawal bug
The core developers identified the realised PNL and Insurance Fund withdrawal bug as the source of the drastic protocol withdrawals. The protocol does not have a withdrawal kill-switch or social loss mechanism so the core developers were forced to pause the exchange to distribute the remaining funds back to users.
May 16 — Methodology for Position Settlement
With the exchange paused, positions needed to be settled. The core developers worked with the community to develop a methodology for settling positions and realised collateral.
May 18 — “Drift Drain” Exploit POC code published [link]
This code showcases how the protocol bug allows an attacker to drain the entire vault in a single transaction.
May 20 — Redemption UI Live
May 21 — POC working patch for withdrawal bug pushed
May 25 — Settlement Resolution
Drift secured financing to cover the outstanding shortfall of $14.5m owed to traders in full.
May 26 — Technical Incident Report Published
May 27 — Redemption Live with Full Outstanding Settled Shortfall Amount

b. Root Issues

There were two levels of issues at play: a surface-level unchecked withdrawal and PNL accounting bug, and a deeper leveraged trading design issue. This section delves into the two root causes of the incident.

Based on community feedback, the document uses the term issues instead of bug or design flaw. The Drift core developers acknowledge that there were challenges with the v1 implementation and are eager to resolve them in Drift v2.

i. Unchecked Withdrawals

At the surface level, this issue stemmed from the system mischaracterising realised collateral and allowing any profits to be withdrawn without any checks, gates or earmarking of funds and without a built-in socialised loss and clawback mechanism.

The withdrawal bug is described in more detail in Section 1c and is illustrated in a proof-of-concept with code and visualisations in Section 1d.

ii. Unchecked Leverage leading to Levered Losses

The root problem of levered losses exists in all leveraged markets, whether vAMM or orderbook-based. With Unchecked Leverage, issues can arise when leverage is overextended on one side (long or short) of the market, such that large moves in price lead to an imbalance of realised profits and realised losses in the system.

If the amount of unrealised losses is greater than realised gains in a system, the system has an unrealised levered loss. While this is a problem for all leveraged markets, whether vAMM or orderbook-based, unchecked leverage can be amplified by long-short imbalances in a vAMM.

c. Issue Details

The vAMM enables asynchronous trading between users. Being asynchronous means one user enters into a long or short position before additional users enter their positions (versus synchronous trading where a long and short position are entered simultaneously).

There are two important properties of asynchronous trading:

PNL is attributed to users based on the order in which they open and close positions. The vAMM cannot guarantee the order in which positions are opened/closed and thus the order in which PNL is realised;
There can be a different notional value of longs and shorts (also referred to as a long-short imbalance).

There were three issues that, catalysed by the severe LUNA drawdown on May 11 2022, caused the rapid withdrawal of user funds and thus the emergency action that resulted in the protocol being paused:

Ability for users to withdraw positive PNL without equal amounts of negative PNL being realised;
Ability for users to withdraw immediately from the Insurance Fund with no limitations or guardrails;
Unchecked leverage with respect to a market’s long-short imbalance.

The first issue with Drift v1 is that it does not account for the order in which users realise PNL. Users can realise positive PNL and withdraw their profit before an equivalent amount of negative PNL is realised. The issue was discovered as the sudden decrease in the price of LUNA-PERP allowed users to withdraw large amounts of realised PNL, without equivalent amounts of negative PNL being realised.

This accounting issue ultimately leads to a shortfall between the sum of users’ withdrawable balance (the minimum of a user’s realised and free collateral) and the vault balance.

At the time the exchange was paused, the withdrawable collateral amount was $10.11M (shortfall of $5.18m). The script for this calculation can be found here.

The withdrawable collateral shortfall is the sum, across all users, of the minimum of each user’s realised and free collateral, minus the current vault balance. Note that this is not the same amount as the settled collateral, which is described in Section 1e.
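
As a minimal sketch of this calculation (not the script linked above), the shortfall can be computed by summing each user's withdrawable balance and subtracting the vault balance. The `UserCollateral` type and the sample balances are hypothetical and chosen only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class UserCollateral:
    realised: float  # realised collateral (deposits plus realised PNL), in $M
    free: float      # free collateral (not locked up as margin), in $M


def withdrawable_shortfall(users, vault_balance):
    """Sum of min(realised, free) over all users, minus the vault balance."""
    withdrawable = sum(min(u.realised, u.free) for u in users)
    return withdrawable - vault_balance


# Hypothetical balances: withdrawable amounts are 5.0, 4.0 and 1.0.
users = [UserCollateral(6.0, 5.0), UserCollateral(4.0, 4.5), UserCollateral(2.0, 1.0)]
shortfall = withdrawable_shortfall(users, vault_balance=7.0)
```

A positive result means users are collectively entitled to withdraw more than the vault holds.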

The second issue is that Drift v1 allowed the positive realised PNL without offsetting realised losses from a single market to spill over and affect the solvency of all markets.

Users were able to withdraw their positive PNL immediately from the Insurance Fund balance budgeted for markets’ fee pools, instead of isolating the risks to a single pool.

The third issue is that Drift v1 extended a consistent amount of leverage as the long-short imbalance increased (i.e. as the mark price and terminal price diverge). As the long-short imbalance increased, shorts should have been allowed less leverage. Instead, by providing unchecked leverage to shorts, the exchange offered excessive free collateral and amplified the amount of withdrawals from the exchange. This left an implicit levered loss if the net user position (sum of all user positions) were forced to close.

The combination of unchecked withdrawals and unchecked leverage creates an exploit that enables a potential attacker to drain the protocol in one transaction. This is illustrated in the section below (1d).

d. Issue Exploit Proof-of-Concept

The core developers have written a code simulation of the withdrawal bug to illustrate the severity of the issue, where an attacker could exploit the system to extract all user funds in the vault within a single transaction. This PoC would have earned the maximum $500k bug bounty as per Drift’s Immunefi bounty if it had been uncovered by a white-hat or a community member.

This code example is a proof of a possible exploit had there been a malicious party in the exchange. In Drift’s May 11 incident, there is no proof that there was a malicious attacker. This POC simply illustrates how a hypothetical attacker could have exploited the entire system.

POC

Assume there are ten users who have each deposited $1M into the protocol (for a total of $10M deposits). In this example, an attacker can deposit $1.75M, exploit the vulnerability, and withdraw $11.75M from the system.

The SOL market is initialised at $53 with the AMM reserves that have the same magnitude as the SOL AMM on mainnet. The attack requires the oracle to be invalid (i.e. have stale data or be low confidence) in order for the attacker to bypass the oracle-mark divergence guardrails, thus the oracle is mocked to be invalid.

1. The attacker creates two Drift User Accounts and deposits $875k into each account (bringing the total deposits to $11.75M).

2. Account 1 opens a 20x long position on SOL-PERP and pushes the price to $146. With $875k of collateral, this is a $17.5M notional long position on SOL-PERP.

3. Account 2 takes the same long trade of $17.5M notional long on SOL-PERP. This pushes the mark price of the contract to $285.45.

4. Account 1 closes their long position to realise a profit of $12M and then withdraws $11.75M (all user funds).
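
The four steps above can be approximated with a toy constant-product (xy = k) simulation. This is a sketch under stated assumptions, not the published PoC code: the `ConstantProductVamm` class is hypothetical, fees, peg multipliers, margin checks and oracle guardrails are ignored, and the $26.5M quote reserve is an assumed value chosen so that the toy curve lands close to the prices quoted above.

```python
class ConstantProductVamm:
    """Toy x*y=k vAMM: no fees, no peg multiplier, no margin or oracle checks."""

    def __init__(self, quote_reserve, price):
        self.quote = quote_reserve
        self.base = quote_reserve / price
        self.k = self.quote * self.base

    def mark_price(self):
        return self.quote / self.base

    def open_long(self, quote_in):
        """Add quote to the pool and return the base size acquired."""
        new_quote = self.quote + quote_in
        new_base = self.k / new_quote
        acquired = self.base - new_base
        self.quote, self.base = new_quote, new_base
        return acquired

    def close_long(self, base_in):
        """Return base to the pool and return the quote proceeds."""
        new_base = self.base + base_in
        new_quote = self.k / new_base
        proceeds = self.quote - new_quote
        self.quote, self.base = new_quote, new_base
        return proceeds


# Assumed reserve size (hypothetical), picked to roughly match the quoted prices.
amm = ConstantProductVamm(quote_reserve=26_500_000, price=53)
vault = 10 * 1_000_000 + 2 * 875_000  # innocent deposits plus attacker deposits
notional = 20 * 875_000               # 20x leverage on $875k of collateral

size_1 = amm.open_long(notional)      # step 2: Account 1 goes long
price_after_1 = amm.mark_price()
amm.open_long(notional)               # step 3: Account 2 goes long
price_after_2 = amm.mark_price()
profit_1 = amm.close_long(size_1) - notional  # step 4: Account 1 closes
# Nothing ties the withdrawal to realised losses, so it is capped only by the vault.
withdrawn = min(875_000 + profit_1, vault)
```

Under these assumptions the mark price moves to roughly $146 and then $285, Account 1 realises a profit of about $12M, and the withdrawal is limited only by the vault balance.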

The instructions are shown in the code snippet below, where the attacker pushes the price using two accounts, closes at a huge profit and withdraws all realised profits, draining the rest of the user funds.

This POC demonstrates how an attacker can exploit the realised PNL accounting issue to withdraw all user funds (within a single transaction). Figures 1 and 2 show how all innocent deposits within a system can be drained by a single attacker manipulating the mark price of the system.

Figure 1 shows Mark Price vs Time as the attacker accounts open and close highly levered positions at constant leverage. Figure 2 shows how this attack drains innocent users’ funds.

To further visualise the attack, we also show how the attacker’s accounts change in terms of UPNL and collateral over time.

Figure 3 shows how the attacker’s first account (Account 1) realises a $12M gain by closing its 20x long position at T=5.

Figure 4 shows the attacker’s second account (Account 2) suffers a $12M loss at T=6 due to Account 1 closing their position. However, Account 2 only has $857K worth of collateral. When Account 1 withdraws their $12M gain, the funds are taken from innocent users’ collateral, draining the Insurance Fund.

Figure 3 shows the attacker’s UPNL and subsequent realised PNL by closing its 20x long. Figure 4 shows the attacker’s second account which suffers a loss of $12m despite its collateral being only $864K. As a result, the shortfall is directly withdrawn from innocent users’ funds.

The attack can be prevented by the solutions outlined in Section 2 below. Both immediate solutions and longer term solutions to mitigate any such attack are described in detail. Prior to launching Drift v2, the Drift core contributors will open-source test cases of this scenario and similar situations of the system under extreme stress to showcase its robustness.

e. Issue Consequences

The game theory behind the DAMM is that users will continue to hold positions amidst a long-short imbalance to receive funding payments and re-peg emissions subsidised by the collection of fees in the Insurance Fund.

As a result of this incident, without fees in the Insurance Fund, the game theory breaks down, with the most likely outcome being a race between users to exit their positions and withdraw their collateral. Once the core developers discovered the issues outlined above, the exchange was paused such that the remaining collateral could be socialised and distributed across all users.

To socialise and distribute collateral across all users, the vAMM must enter its terminal state i.e. all users’ positions must be closed simultaneously. When closed simultaneously, all users take from fixed exit liquidity as the price reverts to its terminal price. This leads to users’ realised PNL being different than their unrealised PNL at the time the exchange was paused.

While users collectively had an unrealised PNL of $14.9M when the exchange paused, they have a realised PNL of -$10.7M in the terminal state (for a divergence of $25.6M). In the LUNA market, the unrealised PNL was $4.4M while the realised PNL is -$7.6M in the terminal state (for a divergence of $12M) [4].

With all users settled at the terminal state, users’ realised collateral balance is $5.75M. With $4.94M left deposited in the protocol, this means the system had a levered loss (or shortfall) of $810K* in the terminal state.
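
The arithmetic behind these figures can be checked with a few lines. This is only a worked example using the numbers quoted above (in $M); the variable names are illustrative and it is not the script referenced in this section.

```python
# All figures in $M, taken from the text above.
unrealised_pnl_at_pause = 14.9
realised_pnl_terminal = -10.7
# Divergence between PNL at the pause and PNL in the terminal state: about 25.6.
pnl_divergence = unrealised_pnl_at_pause - realised_pnl_terminal

luna_unrealised_pnl = 4.4
luna_realised_pnl_terminal = -7.6
# Same calculation for the LUNA market alone: about 12.
luna_divergence = luna_unrealised_pnl - luna_realised_pnl_terminal

realised_collateral_terminal = 5.75
vault_balance = 4.94
# Levered loss: realised collateral owed in the terminal state minus funds held, about 0.81.
levered_loss = realised_collateral_terminal - vault_balance
```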

*Due to a miscalculation a previous post stated the levered loss was $10.4M. The correct and up-to-date calculation can be found here.

Abruptly settling the vAMM to its terminal state is an unexpected and unfavorable event for users.

Understanding this, Drift committed to settling the realised collateral balance to $20.89M based on the methodology outlined here.

After excluding team and investor accounts, the total settled collateral amount claimable by users is $19.49M. With only $4.94M left deposited in the protocol, the settled collateral shortfall is $14.5M.

Given the abrupt nature of the pause and settlement, Drift settled with the intention of subsidising users such that they are able to settle at far above the vAMM terminal price. Drift is doing this to ensure goodwill in the community.

A script to calculate the PNL divergence, levered loss and other stats can be found here.

2. Solutions

Drift v1 has now been sunset with all positions settled. Drift has secured $14.5m in financing to cover the full settlement shortfall to users, which will be redeemable on May 27, 2022.

The core developers have since been working on pushing out immediate fixes (Section 2a) that will resolve the withdrawal bug as well as improving the design and guardrails (Section 2b) around the DAMM mechanism.

Drift is also increasing the bounty available for any future mechanism exploits from $500k to $1m as per our Immunefi bounty.

Given the solutions outlined below, Drift is confident of a strong and robust v2 launch within the next two months.

a. Immediate Fixes

Immediate fixes can be made to prevent the shortfall in realised collateral and to block users from withdrawing funds budgeted for market fee pools.

During periods of market imbalance, to prevent the shortfall in realised collateral (and to block the exploit outlined above), one naive solution is for the protocol to only allow users to withdraw their realised gains if there has been an offsetting realised loss in the same market. A simple example of this change has been implemented here, where realised losses are tracked on the market account and realised gains only credited to a user’s collateral balance when there has been offsetting loss.
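
A minimal sketch of this naive gating rule is shown below. It is not the linked implementation: the `MarketPnlPool` class and its method names are hypothetical, and the sketch deliberately omits how pending gains would be released once later losses are realised.

```python
class MarketPnlPool:
    """Hypothetical per-market ledger: gains are credited only against realised losses."""

    def __init__(self):
        self.unclaimed_losses = 0.0  # realised losses not yet paid out as gains
        self.pending_gains = {}      # user -> realised gains awaiting offsetting losses

    def realise_loss(self, amount):
        """Record a realised loss in this market."""
        self.unclaimed_losses += amount

    def realise_gain(self, user, amount):
        """Return the portion of the gain credited to withdrawable collateral now."""
        credited = min(amount, self.unclaimed_losses)
        self.unclaimed_losses -= credited
        self.pending_gains[user] = self.pending_gains.get(user, 0.0) + (amount - credited)
        return credited


pool = MarketPnlPool()
pool.realise_loss(3.0)                           # only $3M of losses realised so far
credited = pool.realise_gain("account_1", 12.0)  # a $12M gain is mostly held back
```

With this rule, the $12M gain in the proof-of-concept could not be withdrawn until matching losses had actually been realised in the same market.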

A more practical solution is to add an auxiliary pool of capital that lends capital to traders who want to withdraw positive realised PNL prior to there being offsetting losses. This pool of capital would earn premiums paid for by early withdrawers, as well as fees from the protocol, enabling users to withdraw gains instantly from the pool.

The core developers have also already added a way to block users from instantly withdrawing Insurance Funds budgeted for market fee pools, the PR for which can be found here. The vault for each market’s fee pool could also be siloed into their own token accounts to further isolate risk and prevent spillover from volatile markets.

b. Design Improvements

Future design improvements for Drift v2 must address the protocol’s Long-Short Imbalance, Unchecked Leverage, and the Long-Short-Leverage Imbalance. The issue with PNL accounting is amplified by leverage and long-short imbalance. The goal of the protocol has always been to reduce long-short imbalance to an acceptable range.

Design Issues

Long-Short Imbalance

Both the virtual automated market maker (vAMM) and orderbook (OB) are different implementations of a market. While an OB forces and matches longs and shorts, a vAMM allows for long-short imbalances. When longs and shorts are imbalanced, there is an implicit leverage multiplier in short positions when mark price is below 50% of the terminal price. For example, a user who is 1x short at $50 will have no collateral remaining if the price rises to 2x or more (i.e. to $100 or above).

Unchecked Leverage

Since both OBs and vAMMs allow for the under-collateralisation of positions through leverage, long and short collateralisations can be unequal, creating an attack vector like the one described above. With this in mind, the benefits of a risk management system that measures per-side leverage and places constraints on pricing, realised PNL and withdrawals are clear. Even something simple but strict and/or bounded is better than 100% unchecked leverage and withdrawals.

To demonstrate the issue surrounding leverage, the core developers have also detailed how orderbook systems need constraints on the above; otherwise, their collateral vaults are susceptible to leverage-based drain attacks by a single attacker [5]. The core developers have worked closely with other members of the DeFi community to understand these risks to user funds.

Long-Short-Leverage Imbalance

Combining these two risks (Long-Short Imbalance being vAMM specific and Unchecked Leverage applying to any exchange), it is clearly important to have a holistic measure of risks to solvency within the protocol. In addition to other planned improvements for Drift v2, a focus on complete solvency and built-in mechanisms to handle insolvency (i.e. socialised losses) is paramount.

Design Solutions

With that said, below are proposed features for Drift v2 that the core developers believe will solve the issues outlined in this article:

1. Passive Maker and Liquidity Provider System

While Drift v1 had the concept of 1) individual makers concentrating liquidity with high specificity to dampen volatility (pushed out in May) and 2) dynamic virtual reserves partially collateralised with a fee pool, Drift v2 will be exploring passive liquidity provisions to further increase the collateralisation of imbalanced positions and unused liquidity on the exchange.

By increasing the collateralisation of virtual liquidity, even before imbalances occur, the implicit leverage within the protocol is reduced across multiple market scenarios. Passive makers can take the opposing side of takers to provide liquidity, collect a rebate and allow the designated maker (dAMM) to maintain target leverage without increasing solvency risks. Drift has always been committed to adding an extensible user maker order system that can compose with its dynamic virtual liquidity.

2. Formulaic Market Parameters

Formulaic re-pegging / k-adjustment: The lack of formulaic parameters from Day 1 of the protocol being live has exacerbated long-short imbalance. In v2, formulaic parameters will be built in from the start (they were in review for v1).
Dynamic fee / spreads: Dynamic fees from Day 1 will help to reduce long-short imbalance in the long run, especially on volatile markets. Fixed spreads, the precursor to dynamic spreads, were shipped in April.
Dynamic leverage: In Drift v1, leverage was extended to users on a constant basis, regardless of the long-short-leverage imbalance or the health of the market, even though constant leverage has potentially compounding effects on the health of the market. Leverage can be offered to users dynamically, scaling up or down based on the health of any individual market (based on current leverage, current mark-terminal divergence, or long-short-leverage imbalance).
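
To make the dynamic leverage idea concrete, the sketch below scales the maximum leverage down as the mark price diverges from the terminal price. The function, its formula and its parameter values are purely illustrative assumptions; they are not the formula planned for Drift v2.

```python
def max_leverage(mark_price, terminal_price, base_leverage=20.0, sensitivity=4.0, floor=1.0):
    """Hypothetical rule: shrink allowed leverage as mark-terminal divergence grows."""
    divergence = abs(mark_price - terminal_price) / terminal_price
    return max(floor, base_leverage / (1.0 + sensitivity * divergence))


balanced = max_leverage(53.0, 53.0)    # no divergence: full base leverage
stretched = max_leverage(146.0, 53.0)  # mark far above terminal: leverage is cut
```

Under a rule of this shape, the second 20x position in the proof-of-concept would have been refused, since the market was already heavily imbalanced after the first trade.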

3. Advanced Price Curves

Drift v1 used a simple constant product curve (xy = k) to describe liquidity. This was insufficient to protect the vAMM from quoting poorly during highly volatile price action, where a more advanced curve (e.g. a Gaussian function) with bids and asks would have been able to dampen the impact on the vAMM.

4. Advanced Protocol Circuit Breakers

Drift v1 would have benefitted from the ability to implement isolated circuit breakers without needing to shut the entire system down. This includes:

Ability to pause withdrawals without pausing trading,
Mechanism to close and auto-settle individual markets (see Suggestion 5),
Limit up/down circuit breakers (price bands) at volatility levels where risk measures exceed a set threshold.
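
One way to picture isolated circuit breakers is as independent flags and limits that are checked per action. The sketch below is an assumption-laden illustration: the `CircuitBreakers` type, its field names and the 10% price band are hypothetical and do not describe an existing Drift interface.

```python
from dataclasses import dataclass


@dataclass
class CircuitBreakers:
    withdrawals_paused: bool = False
    trading_paused: bool = False
    price_band: float = 0.10  # hypothetical: max fractional move from a reference price

    def can_withdraw(self):
        """Withdrawals can be halted independently of trading."""
        return not self.withdrawals_paused

    def can_trade(self, reference_price, fill_price):
        """Reject fills while trading is paused or outside the price band."""
        if self.trading_paused:
            return False
        move = abs(fill_price - reference_price) / reference_price
        return move <= self.price_band


# Withdrawals are paused while trading stays open within the band.
breakers = CircuitBreakers(withdrawals_paused=True)
```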

5. Built-in socialised loss mechanism when there is an Insurance Fund deficit

There are several design choices for socialised losses (after insurance fund depletion), each with tradeoffs on which users are targets:

Spread loss equally to all users in system
Spread loss equally to all users in market
Spread loss equally to winners in market
Auto-deleveraging mechanism (ADL), which spreads loss by function of leverage to winners in market
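
The options above differ mainly in who is included and how each user is weighted. As a rough sketch, a single hypothetical helper can express both an equal split and a leverage-weighted (ADL-style) split; the function name, user labels and leverage figures are assumptions, and the sketch does not cap a user's share at their available balance.

```python
def socialise_loss(loss, weights):
    """Split `loss` across users in proportion to `weights` (user -> non-negative weight)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one user must carry a positive weight")
    return {user: loss * w / total for user, w in weights.items()}


# Equal split across the chosen group (all users, or winners only): weight 1 each.
equal = socialise_loss(900.0, {"a": 1, "b": 1, "c": 1})
# ADL-style split: weight winners by leverage (hypothetical leverage figures).
by_leverage = socialise_loss(900.0, {"a": 1, "b": 3, "c": 5})
```

Choosing the group passed in (system, market, or winners in a market) and the weighting function is where the tradeoffs between the four designs arise.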

On top of the protocol design improvements, Drift is committed to improving trader understanding on vAMMs. Improving documentation of concepts is paramount. Drift also intends to improve its showcasing of data and context around data. For instance, precisions defined in a glossary are difficult to use, and the addition of visualisations through advanced analytics would best illustrate concepts in documentation, leading to improved showcasing of vAMM mechanics and solvency within the protocol.

Concluding Remarks

Drift sincerely thanks the community for their constant feedback and contributions as Drift continues to strive toward creating a reliable, secure, and robust derivatives DEX. While the incident has been tumultuous, it has been packed with key learnings and takeaways that will serve to significantly improve future iterations of the protocol. Drift’s core thesis on the value of virtual liquidity persists. Drift remains fully committed to making the outlined changes and to launching v2 as swiftly as possible within the next two months.

Related FAQs

How did user funds get drained?
Users withdrew large amounts of positive PNL without offsetting losses, leaving a large unrealised levered loss in the system. Drift provides a proof-of-concept to demonstrate how a malicious attacker could exploit the issue to drain the funds in a single transaction.
Why did trading get halted?
Trading was halted in the first pause so that the core developers could introduce emergency actions to reduce risk, by increasing market spreads and margin requirements.
The protocol did not have a kill-switch where only withdrawals were halted. The protocol was paused in the second pause to prevent a further drain of user funds and to preserve the remaining Insurance Fund to be able to socialise back to users.
Why did the bug not get patched immediately for trading to continue?
By the time the issues and solutions were fully understood, the volatility of market conditions, the duration of the pause, and the large amount of missing funds and levered loss in the system had led to the sunset of v1, the settlement of positions in Stages 1 and 2, and the commitment to build v2.
Why is there a haircut on settlement price compared to mark price?
Owing to the sequential nature of AMM liquidity, markets in an AMM do not typically settle without a haircut in the settlement price. There is limited liquidity, as can be seen in the local versus terminal PNL discussion above. Drift is doing its best with a Stage 2 to subsidise users such that they are able to settle at above the average vAMM settlement price. Drift is doing this to ensure goodwill in the community given the abrupt nature of the pause.

Appendix

Definitions

Terminology is also available in this documentation: https://docs.drift.trade/dYut-glossary.

1. Insurance Fund — Funds from protocol fees collected including liquidation penalties. These are added to “per market” Insurance Funds (or fee pool) that are earmarked and partitioned for certain operations (funding on imbalance, repeg, k adjustments) [link]

Drift’s Fee Pool — 6W9yiHDCW9EpropkFV8R3rPiL8LVWUHSiys3YeW6AT6S
Drift’s Insurance Fund (from Liquidation Penalties): Bzjkrm1bFwVXUaV9HTnwxFrPtNso7dnwPQamhqSxtuhZ

2. Local State PnL — Sum of all potential payouts of each user in the current state, with each user’s local slippage calculated by Drift’s SDK exit price.

3. Terminal State PnL — Sum of all payouts of users in aggregate in the end state, where all positions are closed with zero open interest.

4. Local Price — The local exit price for each user which marks a user’s total collateral.

5. Terminal Price — Terminal price is the mark price if all users closed their positions, which is equal to the peg multiplier if all k adjustments were done during a balanced market.

6. Long-Short Imbalance — The state of the vAMM where the notional value of shorts ≠ the notional value of longs.

7. Leveraged Long-Short Imbalance — The state of a leveraged market where the collateral backing the open interest of shorts ≠ the collateral backing the open interest of longs.

8. Total User Collateral — The sum of user account balances and unrealised PNL, whether calculated at local settlement prices or terminal prices.

9. User Realised Collateral — The sum of user realised collateral, whether calculated at local settlement price or terminal prices.

10. PNL Divergence — Total PNL Divergence = Total User Collateral (Local) - User Realised Collateral (Terminal)

References

[0] — Protocol Bug, https://driftprotocol.medium.com/exchange-outage-incident-smart-contract-bug-371d03a6d7b

[1] — Settlement Logic, https://driftprotocol.medium.com/drift-settlement-plan-2d1af1c1525

[2] — Redeem UI, https://driftprotocol.medium.com/redemption-ui-is-live-5fba00599448

[3] — Drift DAMM Deep Dive, https://driftprotocol.notion.site/Drift-dAMM-deep-dive-ff154003aedb4efa83d6e7f4440cd4ab

[4] — PNL Divergence Per Market

[5] — Exploit on Orderbook-based Leveraged Trading Systems

Similar to Drift’s POC, an attacker can create two accounts (one to have levered gain, other to have levered loss) such that they come out with all the funds in the protocol.

The short examples below show how this works against an OB protocol that matches at mark price and settles at either mark price or oracle price.

Example 1: In an OB protocol that matches at mark price and settles at mark price

Account 1: Takes the entire ask side of the book with leverage (for example, going long SOL across all asks from $50 to $120)
Account 1: Places a bid at a MAX_PRICE (no asks in market)
Account 2: Sells short at the bid (leverages more)
Account 1: Withdraws the levered gain from the protocol

Example 2: In an OB protocol that matches at mark price and settles at oracle price

Account 1: Takes the entire ask side of the book with leverage (for example, going long SOL across all asks from $50 to $120)
Account 1: Places a bid at a MAX_PRICE (no asks in market)
Account 2: Sells short at the bid (leverage again) and settles the levered gain at the oracle and withdraws the levered gain from the protocol such that it depletes the vault

MAX_PRICE would be $1e10 or the max constraint imposed by the protocol. If the constraint imposed is large enough that levered gains can be retrieved, then the protocol is faulty. To avoid this attack, any protocol thus needs constraints on levered gains to prevent an exploit by a multi-account attacker. It is also well known that at large notional sizes, a flash loan opens up the attack to any sophisticated adversary.

[6] — Drift Incident Report Technical Postmortem on Notion (Alternative Formatting), https://driftprotocol.notion.site/Drift-Incident-Report-Post-Mortem-9cd6856a80eb44e79fd1df65af6f1ae1