The Polygon Staking bugs have been fixed: our post-mortem.
The Polygon bug has been fixed, and we learned a number of lessons that will help us reduce mishaps in the future. Here’s our post-mortem.
The Panther team has successfully finished fixing the Polygon Staking bugs as per the vote of the DAO. Users can now stake their tokens on the Polygon network to vote on protocol decisions and earn rewards.
We’ve learned some important lessons through this process, which we’d like to share with you in this post. We trust these lessons will improve how we make decisions, organize development work, and test future solutions, so that we avoid similar issues.
Arriving at the bug.
Issues in our code first became apparent before we released Polygon staking, in Panther’s staging environment (which included a public beta group). In this environment, we initially found two issues:
- A human configuration error set the staking period in the staging environment to around 50 years, as opposed to 56 days.
- We used time remaining as a variable instead of time elapsed in our rewards-calculating function.
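To show why the second bug matters, here is a minimal Python sketch of a linear release schedule. The function names and constants are illustrative assumptions, not Panther’s contract code. Swapping time remaining in for time elapsed inverts the schedule, so almost everything becomes releasable at the start instead of at the end. The last line also shows the effect of the first bug: with a period of roughly 50 years, almost nothing unlocks within 56 days.

```python
# Hypothetical sketch, not the deployed contract: linear release of a fixed allocation.
DAY = 86_400            # seconds in a day
DURATION = 56 * DAY     # intended staking period, in seconds


def releasable_by_elapsed(allocation: float, elapsed: float, duration: float = DURATION) -> float:
    """Cumulative amount unlocked after `elapsed` seconds (intended behavior)."""
    elapsed = min(max(elapsed, 0), duration)  # clamp to [0, duration]
    return allocation * elapsed / duration


def releasable_by_remaining(allocation: float, elapsed: float, duration: float = DURATION) -> float:
    """Same formula with time remaining used instead of time elapsed (the staging bug)."""
    elapsed = min(max(elapsed, 0), duration)
    remaining = duration - elapsed
    return allocation * remaining / duration


if __name__ == "__main__":
    total = 2_000_000
    for day in (0, 28, 56):
        elapsed = day * DAY
        print(f"day {day:2d}: elapsed-based {releasable_by_elapsed(total, elapsed):>12,.0f}, "
              f"remaining-based {releasable_by_remaining(total, elapsed):>12,.0f}")
    # Misconfigured period (about 50 years): after 56 days only ~0.3% is unlocked.
    fifty_years = 50 * 365 * DAY
    print(releasable_by_elapsed(total, 56 * DAY, fifty_years))
```

The two versions agree only at the midpoint, which is one reason a spot check halfway through the period would not reveal this kind of bug.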
After fixing both, and assuming these two mistakes explained the incorrect reward amounts, we felt confident enough to deploy. Unbeknownst to us, however, a third, bigger problem had gone unfixed, masked by the first two bugs.
Let’s see if you can spot the problem.
Before explaining what happened and how we fixed it, we invite you to find the remaining issue yourself, as a sort of puzzle. Here is all the information you need to solve it:
The staking mechanisms on Ethereum and Polygon necessarily differ in design. Ethereum’s staking Reward Pool (RewardPool) doesn’t hold tokens itself; instead, it distributes the tokens coming from the protocol’s Rewards Vesting Pool (VestingPool) as users stake and unstake. Polygon’s version of RewardPool, MaticRewardPool, by contrast, holds the full 2M $ZKP that were bridged from Ethereum to be allocated as rewards.
All non-distributed $ZKP (apart from these 2M $ZKP) is controlled by a number of vesting pools on Ethereum Mainnet; for staking rewards, that is the Protocol Rewards vesting pool. For $ZKP to exist on other blockchains such as Polygon, it must first be bridged from the existing supply on Ethereum. MaticRewardPool is therefore a simpler contract: it only needs to distribute 2M $ZKP, without also retrieving them from vesting pools.
Now that you have the background, try to identify the issue. In the flawed contract, the share of the 2M $ZKP that is releasable to stakers at any given time is computed dynamically by the _releasableAmount() function, located in the MaticRewardPool component of the Polygon staking mechanism. As per Panther DAO Proposal #3, rewards are to be released linearly over 56 days, computed every block (which on Polygon means every 2.3 seconds on average).
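For a sense of scale, the intended schedule can be worked out in a few lines. This is plain arithmetic on the figures above (2M $ZKP, 56 days, about 2.3 seconds per block); the block time is an average, so the per-block and block-count figures are approximate.

```python
# Back-of-the-envelope figures for the intended schedule:
# 2M $ZKP released linearly over 56 days, one update per Polygon block.
ALLOCATION = 2_000_000       # $ZKP bridged to MaticRewardPool
DURATION_S = 56 * 86_400     # 56 days in seconds = 4,838,400
BLOCK_TIME_S = 2.3           # average Polygon block time (approximate)

per_day = ALLOCATION / 56                # tokens unlocked per day
per_second = ALLOCATION / DURATION_S     # tokens unlocked per second
per_block = per_second * BLOCK_TIME_S    # tokens unlocked per average block
blocks = DURATION_S / BLOCK_TIME_S       # approximate number of blocks in 56 days

print(f"per day:   {per_day:,.2f} ZKP")   # 35,714.29
print(f"per block: {per_block:.4f} ZKP")  # 0.9507
print(f"blocks:    {blocks:,.0f}")        # 2,103,652
```

In other words, a correct linear schedule should unlock well under one $ZKP per block and about 35.7k $ZKP per day.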
The formula used in the code was the following:

$$\text{releasable} = b \times \frac{\mathit{te}}{\mathit{tt}}$$
In this formula, b stands for balance (remaining amount of tokens), te is time elapsed, and tt is total time.
Given that rewards were being released too quickly, and that MaticRewardPool, through _releasableAmount(), decides how many tokens are available, can you pinpoint why MaticRewardPool was releasing tokens earlier than intended?
If you run the formula repeatedly, compounding by the second (or over any similarly short interval), you’ll see that the staking contract depletes the 2 million $ZKP pool in slightly over 24 hours.
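The compounding effect can be reproduced with a short simulation. This Python sketch transcribes the flawed formula literally; two assumptions are ours: te is measured from the start of the 56-day period, and each result is paid out immediately, shrinking b before the next call. The function names are hypothetical.

```python
# Hypothetical transcription of the flawed formula, releasable = b * te / tt.
# Assumptions (ours): te counts from the start of the period, and each payout
# immediately reduces the balance b used by the next call.
ALLOCATION = 2_000_000       # 2M $ZKP held by MaticRewardPool
TOTAL_TIME = 56 * 86_400     # tt in seconds


def flawed_releasable(balance: float, time_elapsed: float) -> float:
    return balance * time_elapsed / TOTAL_TIME


def paid_after(duration_s: int, step_s: int) -> float:
    """Total paid out by the flawed formula when called every `step_s` seconds."""
    balance, paid, t = float(ALLOCATION), 0.0, 0
    while t < duration_s:
        t += step_s
        payout = flawed_releasable(balance, t)
        balance -= payout
        paid += payout
    return paid


if __name__ == "__main__":
    linear_day_one = ALLOCATION / 56             # what a linear schedule releases in one day
    flawed_day_one = paid_after(86_400, 3_600)   # hourly calls for one day
    print(f"linear: {linear_day_one:,.0f}  flawed: {flawed_day_one:,.0f}")
```

Two things compound here: te keeps growing, so each call pays out a larger share of what is left, and more frequent calls apply that share more often. Even with hourly calls, this sketch pays out more than ten times the linear schedule’s one-day amount. The exact on-chain depletion time depends on call frequency and on contract details the sketch does not model.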
The formula was incorrect in two ways: it used a dynamic token balance instead of the total 2M $ZKP allocated, and it didn’t subtract what had already been released. The correct formula, then, should look like this:

$$\text{releasable} = 2\text{M} \times \frac{\mathit{te}}{\mathit{tt}} - \text{releasedAmount}$$
The correct formula introduces two new variables: 2M (the 2 million $ZKP allocated to the pool) and releasedAmount (the rewards already released at any given point). Together, they make the release linear over time.
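Here is a minimal Python sketch of the corrected formula. The class and method names are hypothetical; the point it illustrates is that subtracting releasedAmount makes the cumulative total depend only on te, regardless of how often or how irregularly the function is called.

```python
# Hypothetical sketch of the corrected formula:
# releasable = 2M * te / tt - releasedAmount
ALLOCATION = 2_000_000       # 2M $ZKP allocated to the pool
TOTAL_TIME = 56 * 86_400     # tt in seconds


class LinearRewardPool:
    def __init__(self) -> None:
        self.released_amount = 0.0  # releasedAmount: everything paid out so far

    def releasable(self, time_elapsed: float) -> float:
        te = min(time_elapsed, TOTAL_TIME)  # stop accruing at the end of the period
        return ALLOCATION * te / TOTAL_TIME - self.released_amount

    def release(self, time_elapsed: float) -> float:
        amount = self.releasable(time_elapsed)
        self.released_amount += amount
        return amount


if __name__ == "__main__":
    pool = LinearRewardPool()
    # Irregular call times (seconds since start); te only moves forward.
    for t in (2.3, 3_600, 86_400, 86_401.5, 28 * 86_400, 60 * 86_400):
        pool.release(t)
        print(t, round(pool.released_amount, 2))
```

Because releasedAmount absorbs every earlier payout, the amount released by time te is always 2M × te / tt, and it stops at exactly 2M once te reaches tt. Capping te at tt is our addition for the sketch.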
To fix this, the Panther team had to redesign several components of the Polygon staking contracts, since some of them relied on the way MaticRewardPool previously calculated releasable rewards.
Right after the governance proposal was approved, and before applying the fixes, the team disabled unstaking by disabling the StakeRewardAdviser contract. MaticRewardPool was replaced with the new RewardTreasury contract, while StakeRewardAdviser was superseded by a more powerful version of it, aptly named StakeRewardController.
Unlike the previous layout, in which MaticRewardPool vested tokens directly to RewardMaster, the new layout has RewardTreasury relay rewards to RewardMaster through StakeRewardController. StakeRewardController ensures rewards are distributed correctly, as it is the only contract authorized to move them out of RewardTreasury. A third, much simpler contract, StakesReporter, takes over RewardTreasury’s responsibility of reporting the rewards balance to the web dApp.
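The new layout is easiest to picture as a chain of responsibilities. The Python sketch below models it with plain classes. The method names (transfer_out, relay, receive, rewards_balance) and the string-based caller check are hypothetical stand-ins for on-chain access control, not Panther’s contract interfaces.

```python
# Hypothetical model of the new reward flow (method names are ours):
# RewardTreasury -> StakeRewardController -> RewardMaster; StakesReporter is read-only.


class RewardTreasury:
    """Holds the reward tokens; only the authorized controller may move them."""

    def __init__(self, balance: int, controller: str) -> None:
        self.balance = balance
        self.controller = controller  # identity of StakeRewardController

    def transfer_out(self, caller: str, amount: int) -> int:
        if caller != self.controller:
            raise PermissionError("only StakeRewardController may move rewards")
        if amount > self.balance:
            raise ValueError("insufficient treasury balance")
        self.balance -= amount
        return amount


class RewardMaster:
    """Receives rewards for distribution to stakers."""

    def __init__(self) -> None:
        self.distributed = 0

    def receive(self, amount: int) -> None:
        self.distributed += amount


class StakeRewardController:
    NAME = "StakeRewardController"

    def __init__(self, treasury: RewardTreasury, master: RewardMaster) -> None:
        self.treasury = treasury
        self.master = master

    def relay(self, amount: int) -> None:
        # Pull from the treasury under the controller's own identity, then forward.
        self.master.receive(self.treasury.transfer_out(self.NAME, amount))


class StakesReporter:
    """Read-only view the dApp uses to show the remaining rewards balance."""

    def __init__(self, treasury: RewardTreasury) -> None:
        self.treasury = treasury

    def rewards_balance(self) -> int:
        return self.treasury.balance


if __name__ == "__main__":
    treasury = RewardTreasury(2_000_000, controller=StakeRewardController.NAME)
    master = RewardMaster()
    controller = StakeRewardController(treasury, master)
    reporter = StakesReporter(treasury)
    controller.relay(1_000)
    print(reporter.rewards_balance(), master.distributed)  # 1999000 1000
    try:
        treasury.transfer_out("SomeOtherContract", 1)
    except PermissionError as err:
        print("blocked:", err)
```

Keeping a single authorized path out of the treasury means that every payout goes through the component responsible for getting the amounts right, while reporting stays read-only.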
To restart staking rewards distribution, an extra 2M $ZKP was taken out of the Protocol Rewards Pool on Ethereum and bridged to Polygon. These tokens serve as a “loan” of sorts, paying staking rewards immediately rather than waiting for the prematurely vested funds to be reclaimed. StakeRewardController will reclaim those previously vested rewards automatically as users unstake tokens from the web UI.
How did we make these mistakes?
Essentially, the Panther tech team let a task’s simplicity (in this case, producing the Polygon contracts) distract us from properly reviewing its execution. Having already delivered the much more complicated Ethereum contracts, among many other things, we saw their Polygon counterparts as easy, and we lowered our guard.
Lessons learned and steps forward
If there is one takeaway from this situation, it is the importance of testing early, thoroughly, and for as long as needed.
Small, simple code can hide bugs in plain sight just as effectively as large, complex code. However exciting and important it is to deploy on time, our team must keep re-testing after finding and fixing bugs, since other issues may still be hiding. Likewise, it is critical to test with realistic sample data and in as many real-world scenarios as possible, without relying on test helpers that reimplement the functionality being tested. Three experienced and talented engineers can all be blind to the same bug; thorough testing is what gives us the best chance of uncovering it.
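The point about test helpers can be made concrete. In the hypothetical Python sketch below, a check that compares the flawed formula against a helper reimplementing the same formula passes. A check against an independent, realistic expectation (half of the 2M $ZKP released halfway through the 56 days, with frequent updates) fails. The function names and the hourly update interval are illustrative assumptions.

```python
# Hypothetical illustration: a helper that mirrors the code under test shares its bug.
TOTAL_TIME = 56 * 86_400     # tt in seconds
ALLOCATION = 2_000_000       # 2M $ZKP


def releasable(balance: float, time_elapsed: float) -> float:
    # Code under test: the flawed formula b * te / tt
    return balance * time_elapsed / TOTAL_TIME


def helper_expected(balance: float, time_elapsed: float) -> float:
    # Test helper that reimplements the same logic, so it can never disagree.
    return balance * time_elapsed / TOTAL_TIME


def check_against_helper() -> bool:
    return releasable(2_000_000, 3_600) == helper_expected(2_000_000, 3_600)


def check_against_schedule(step_s: int = 3_600) -> bool:
    # Independent expectation from the spec: half the allocation at the halfway point.
    balance, paid, t = float(ALLOCATION), 0.0, 0
    while t < TOTAL_TIME // 2:
        t += step_s
        payout = releasable(balance, t)
        balance -= payout
        paid += payout
    return abs(paid - ALLOCATION / 2) < 1.0


print(check_against_helper())    # True: the helper shares the bug
print(check_against_schedule())  # False: the realistic check exposes it
```

The schedule check encodes what the DAO proposal actually specifies, so it stays valid no matter how _releasableAmount() is implemented.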
Moving forward, the Panther team will certainly continue to live up to these standards in our mission to infuse DeFi and Web3 with compliant privacy.
Panther is a decentralized protocol that enables interoperable privacy in DeFi using zero-knowledge proofs.
Users can mint fully-collateralized, composable tokens called zAssets, which can be used to execute private, trusted DeFi transactions across multiple blockchains.
Panther helps investors protect their personal financial data and trading strategies, and provides financial institutions with a clear path to compliantly participate in DeFi.