Neural-Learning ALM Strategy
The Reinforcement-Learning strategy
Introduction
Teahouse recognizes that successful liquidity provision demands more than traditional static strategies. The key challenge is generating swap fees that consistently outweigh rebalancing costs and impermanent loss. Our neural-learning approach represents a breakthrough in addressing this fundamental market inefficiency by introducing adaptive, smart liquidity management.
TL;DR
This is Teahouse’s first machine learning-driven LP strategy, aiming to solve the persistent challenge of consistent crypto pool profitability.
We trained an RL model to dynamically adjust liquidity positions based on complex market data.
Backtesting reveals that the strategy's success is critically dependent on pools with substantial trading volume.
The strategy achieved a 284.72% PnL over the roughly six-month test period, equivalent to a 569.44% APR.
Background
At Teahouse, we believe successful liquidity provision rests on a fundamental principle: swap fees earned must exceed the combined costs of rebalancing and impermanent loss (IL) to generate true profit.
Our analysis of consistently performing strategies revealed that pools with low or predictable IL — such as pegged token pairs (USDC.e/USDC, USDC/USDT) and ETH liquid staking/restaking pairs (WETH/ETH LST, WETH/ETH LRT) — show promising results. For instance, the wstETH/WETH pool on Arbitrum achieved a +13.5% PnL (10.35% APR) over 476 days.
Knowing that well-performing pools share a common trait, namely low IL (i.e., low price volatility), we wanted to explore whether the core feature of concentrated AMMs, maximizing swap fee earnings through strategic, frequent rebalancing of narrow positions, could cover the associated costs. Could frequent position adjustments generate enough fee income to offset those costs and maximize overall returns?
While traditional liquidity provision strategies remain effective in many scenarios, they can face challenges in dynamic markets where conditions require frequent, intelligent rebalancing decisions. Static approaches may not fully capture the complexities of high-volume pools or rapidly changing price environments.
Our solution employs neural learning for sophisticated cost-benefit analysis, enabling position moves that optimize the reward-to-risk ratio over time. After extensive experimentation with various machine learning approaches, we identified Reinforcement Learning (RL) as the most suitable framework for this challenge.
Strategy Design
The core innovation of our Neural-Learning ALM Strategy lies in its ability to make intelligent decisions about liquidity positioning through a Reinforcement Learning (RL) model, implemented with the TDQN (Double Deep Q-Network) algorithm for its strong ability to learn effective decision-making policies. The RL model learns and adapts continuously, allowing the strategy to maximize swap fee earnings while minimizing impermanent loss as it manages concentrated liquidity positions.
Data Formulation
Before diving into how Reinforcement Learning works, here are some of the key data components that are used in training our RL model:
Trading data set
We work with a sorted set of trading data D over a time interval I, consisting of T total data points: D = {d1, ..., dT}. Each data point di consists of four key elements:
Log return: ri ∈ R
Log volatility: σi ∈ Σ
Trading volume: vi ∈ V
Trading (swap) data: tri ∈ T
Formally, each data point can be expressed as: di = (ri, σi, vi, tri)
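To make the formulation concrete, here is a minimal Python sketch of one data point and the data set; the names `TradingDataPoint` and `swap_events` are ours for illustration and are not part of the formal definition:

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class TradingDataPoint:
    """One element di = (ri, σi, vi, tri) of the trading data set D."""
    log_return: float       # ri: log return over the interval
    log_volatility: float   # σi: log volatility over the interval
    volume: float           # vi: trading volume over the interval
    swap_events: Sequence   # tri: raw trading (swap) data for the interval

# D = {d1, ..., dT}: a time-ordered list of T data points
TradingDataSet = List[TradingDataPoint]
```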
Rebalancing system
Observations (O): The data input at each time point
Each observation oi combines three elements: (ri, σi, vi)
We have T observations: O = {o1, ..., oT}
Actions (A): There are three possible rebalancing actions, as sketched in code below:
1: Swap into a new position
-1: Remove the current position
0: Hold / no action
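A minimal sketch of the observation and action spaces, continuing the data-point sketch above (the names `Action` and `observation_from` are hypothetical):

```python
from enum import IntEnum
from typing import Tuple

class Action(IntEnum):
    """The three possible rebalancing actions A = {1, -1, 0}."""
    OPEN_NEW_POSITION = 1    # swap into a new position
    REMOVE_POSITION = -1     # remove the current position
    HOLD = 0                 # hold / no action

# oi = (ri, σi, vi): the observation at time point i
Observation = Tuple[float, float, float]

def observation_from(d: "TradingDataPoint") -> Observation:
    """Build the observation oi from a data point di (drops the raw swap data tri)."""
    return (d.log_return, d.log_volatility, d.volume)
```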
Decision making components
Reward Function (r): r : A × T → R
Policy Function (π): π : O → A
Q-value Function (Qθ): Qθ : O × A → R
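In code, these three components correspond roughly to the following type signatures, reusing the `Action` and `Observation` types sketched above:

```python
from typing import Callable, Sequence

# Reward function r : A × T → R
# maps the action taken and the interval's swap data to a realized reward.
RewardFn = Callable[[Action, Sequence], float]

# Policy function π : O → A
# maps an observation to the action the agent will take.
PolicyFn = Callable[[Observation], Action]

# Q-value function Qθ : O × A → R
# estimates the expected future reward of an action in a given state,
# parameterized by the neural network weights θ.
QValueFn = Callable[[Observation, Action], float]
```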
The Q-value function is the key mechanism that enables the RL model to learn the potential value of taking specific actions in different states. Let's go over how RL works in the next section.
Reinforcement Learning Model
A Reinforcement Learning (RL) model powered by deep neural networks (DNN) is structured around two core components: the agent and the environment. The interaction between these two components drives the learning process. The agent learns through feedback from the environment, receiving rewards for positive outcomes. By continuously interacting with the environment, the agent accumulates experience, enabling it to make progressively more effective decisions.
In the context of providing liquidity, the rewards are influenced by factors such as the swap fees earned, the costs of rebalancing positions, and realized impermanent loss (IL). These factors collectively influence the agent's assessment of rewards for different actions, driving its decision-making process.
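As a simplified sketch of how a per-step reward might be computed from these components (the exact accounting the strategy uses is not spelled out here, so treat the breakdown as illustrative):

```python
def step_reward(fees_earned: float,
                rebalance_cost: float,
                realized_il: float) -> float:
    """Net reward for one step: swap fees earned minus the costs the action
    incurred (gas/swap costs of rebalancing and realized impermanent loss)."""
    return fees_earned - rebalance_cost - realized_il

# Example: 12 USDC of fees, 3 USDC of rebalancing cost, 4 USDC of realized IL
# gives a net reward of 5 USDC for the step.
print(step_reward(12.0, 3.0, 4.0))  # 5.0
```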
Here is a simplified flowchart of the RL model:
Note: DNN indicates the Deep Neural Network used to approximate the Q-function, mapping state-action pairs to predicted Q-values based on learned parameters (θ).
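As a rough illustration, the DNN could be a small feed-forward network that takes an observation and outputs one Q-value per action. The following PyTorch sketch uses our own assumptions about layer sizes and is not the production architecture:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Approximates Qθ : O × A → R by outputting one Q-value per action."""
    def __init__(self, obs_dim: int = 3, n_actions: int = 3, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),  # Q-values for {open, remove, hold}
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Q-values for a single observation (ri, σi, vi)
q_net = QNetwork()
q_values = q_net(torch.tensor([[0.001, 0.02, 1_500_000.0]]))
```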
The TDQN Algorithm
The Double Deep Q-Network (TDQN) algorithm represents a sophisticated approach to deep reinforcement learning, building upon the foundational Deep Q-Network (DQN) methodology. At its core, DQN combines Q-Learning with Deep Neural Networks to predict potential future rewards (Q-values) for different actions within a given state.
Our strategy employs the TDQN algorithm, which introduces a critical improvement over traditional DQN by addressing Q-value estimation bias. Through a dual-network system, TDQN mitigates the overestimation problem inherent in standard Q-learning approaches, enabling more accurate and reliable decision-making.
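The key mechanical difference from standard DQN is how the learning target is built: the online network selects the next action, while a separate target network evaluates it. A minimal PyTorch sketch of that target, with illustrative variable names and reusing the `QNetwork` sketch above:

```python
import torch

def double_dqn_target(reward: torch.Tensor,
                      next_obs: torch.Tensor,
                      done: torch.Tensor,
                      online_net,    # network used for action selection
                      target_net,    # slowly-updated copy used for evaluation
                      gamma: float = 0.99) -> torch.Tensor:
    """y = r + γ · Q_target(s', argmax_a Q_online(s', a)) for non-terminal steps."""
    with torch.no_grad():
        # The online network picks the best next action...
        next_actions = online_net(next_obs).argmax(dim=1, keepdim=True)
        # ...but the target network evaluates it, reducing overestimation bias.
        next_q = target_net(next_obs).gather(1, next_actions).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q
```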
Training Process and Learning Dynamics
The training process leverages a replay buffer to sample historical experiences, systematically updating Q-values through the Bellman equation. The agent dynamically balances the exploration of novel strategies with the exploitation of proven actions, enabling adaptive learning. By implementing the TDQN algorithm, the model can continuously refine its approach to liquidity provisioning on Uniswap v3, strategically optimizing for maximum rewards while minimizing potential risks.
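Two ingredients named here, the replay buffer and the exploration/exploitation balance, could be sketched as follows (capacity, batch size, and epsilon are illustrative hyperparameters, not the values used in training):

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores past (obs, action, reward, next_obs, done) transitions for training."""
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int = 64):
        return random.sample(self.buffer, batch_size)

def epsilon_greedy(q_values, epsilon: float = 0.1) -> int:
    """Explore a random action with probability epsilon, otherwise exploit
    the action with the highest predicted Q-value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])
```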
At a given time t, the environment (the token pair pool) provides:
The current state st (the LP position)
A reward rt (swap fee earned), which is a function of the previous action at−1
The reward relationship is formally expressed as: rt = r(at−1, trt)
The agent then determines the current action at based on the current state st, expressed as: at = π(st)
The policy function is defined as: π(st) = argmax_{a ∈ A} Qθ(st, a)
Combining the two: at = argmax_{a ∈ A} Qθ(st, a)
After the environment receives the action at, it returns the next state st+1 and the reward rt+1. This process continues iteratively, allowing the agent to learn and adapt its strategy over time.
To summarize, the agent:
Learns from the reward outcomes
Adapts decision-making policy for future actions
Balances between exploring new strategies and exploiting known successful actions
This means the model picks the action it thinks will produce the highest future reward. The interaction between these components creates a continuous learning cycle where the strategy improves its decision-making based on real-world outcomes.
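Putting the pieces together, one pass through this cycle could look like the sketch below. Here `env` stands for a hypothetical pool-backtesting environment with `reset` and `step` methods (not a published Teahouse API), and `QNetwork`, `ReplayBuffer`, and `epsilon_greedy` come from the sketches above:

```python
import torch

def run_episode(env, q_net: "QNetwork", buffer: "ReplayBuffer", epsilon: float = 0.1):
    """Observe, act, receive a reward, and store the experience, step by step."""
    obs, done = env.reset(), False
    while not done:
        # Predict Q-values for the current observation and pick an action
        # (the returned index maps onto the action set {open, remove, hold}).
        q_values = q_net(torch.tensor([obs], dtype=torch.float32))[0].tolist()
        action = epsilon_greedy(q_values, epsilon)
        # The environment returns the next state and the reward of the action taken.
        next_obs, reward, done = env.step(action)
        buffer.push((obs, action, reward, next_obs, done))
        obs = next_obs
```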
Backtesting and Findings
Our Strategy Team backtested the Neural-Learning ALM Strategy by applying it to the WETH/USDC (0.05%) pool on Uniswap. Here are the backtesting parameters:
Pair pool: WETH/USDC (0.05%)
Network: Arbitrum
Data span: Aug. 31, 2021 to Sep. 04, 2024 (1,100 days)
Initial investment: 10,000 USDC
Backtested period: April 7 to September 9, 2024
Performance under different parameters
This figure presents the results for decision intervals I = 1 hour and 3 hours, each under the rebalancing ranges R = ±0.5%, ±0.8%, and ±1%. For example, the blue line in the figure represents the 1-hour interval under the ±0.5% range.
Findings
The figure below shows the results from one of our test sets, with green blocks representing the strategy's positions at different time points. In this example, the strategy achieved a remarkable 284.72% PnL, growing the initial 10,000 USDC to 38,472 USDC.
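For reference, the reported numbers are consistent with a simple check (treating the backtest window as half a year for annualization, as in the TL;DR):

```python
initial = 10_000                    # USDC
pnl = 2.8472                        # 284.72% PnL over the backtested window
final_value = initial * (1 + pnl)   # 38,472 USDC
simple_apr = pnl * 2                # ≈ 569.44% when annualized as a 6-month window
print(final_value, simple_apr)      # 38472.0 5.6944
```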
Key Observations from Backtesting:
Looking at the positions (green blocks), we can see that the RL model demonstrated non-trivial decision-making rather than simply tracking the ETH price: the strategy kept some positions open even while the price was moving.
The positions held during periods of price fluctuation suggest that the strategy either anticipated the price swinging back or was responding to liquidity constraints, such as insufficient trading (swap) volume to cover the potential rebalancing costs.
The strategy's performance showed a significant correlation with trading volume. Comparative tests of the WETH/USDC (0.05%) pool on Arbitrum consistently yielded positive results.
Conclusion
The Neural-Learning ALM Strategy represents a significant leap forward in Teahouse’s decentralized finance trading strategies. Our experiment using a machine-learning model to optimize liquidity provision strategies in Uniswap v3 demonstrates its potential beyond traditional algorithmic parameter-tuning methods.
Key Finding: Success is heavily dependent on trading volume – the strategy works best in high-liquidity pools with consistent trading activity.
While promising, our current implementation has clear limitations, as the strategy's effectiveness is directly tied to pool volume. Our next steps for enhancing the strategy involve:
Expanding to high-volume pools
Refining our machine-learning models with more advanced reward functions to capture even more complex market dynamics.
As DeFi markets are relatively young compared to traditional financial markets like stocks and foreign exchange, we see significant potential for model improvement by exposing our strategy to diverse market environments and continuously expanding our dataset.
Reinforcement Learning approaches to liquidity provision offer notable advantages over static strategies, especially in markets with steady trading volumes. The success of this implementation lays a strong foundation for advancing automated market-making strategies in decentralized finance.