OpenVM: Public Input Broken After ERE Update?
Hey guys! So, it looks like we've hit a snag with OpenVM, and I wanted to share what I've found. After updating the workload to the latest ERE, I noticed tests failing while verifying the public output from execution. Let's dive into the details and see what's up.
The Issue: Public Input Hash Mismatch
The error message that popped up was pretty clear: "Public inputs hash mismatch." The test expected one hash but got a string of zeros instead. This happened after simply updating the ERE dependency, which pointed to something within those changes. Specifically, the test stateless_validator::tests::execute_mainnet_blocks was failing, with an output mismatch for rpc_block_22974580. The expected public input hash was a specific sequence of bytes, but the actual output was all zeros. An all-zero value suggests the public inputs were never produced (or never returned) during execution, rather than being computed from different data.
I know that may sound a bit techy, so here it is in simpler terms. Imagine you're doing a math problem and expect the answer to be 42, but the calculation gives you 0. That's essentially what's happening here: the system is supposed to produce a specific hash (a unique fingerprint of the output), but it's handing us a blank slate. The frustrating part? The only change was the ERE update, so the problem was most likely introduced by the new ERE code and how it handles public inputs within the OpenVM environment. It's like upgrading your favorite software and suddenly getting wrong answers: super annoying, right? So the hunt began, with a bit of code-diving and testing to pinpoint the exact commit that introduced the issue, which turned out to be quite the adventure.
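To make the "expected hash vs. zeros" distinction concrete, here is a minimal Rust sketch of a check that separates an all-zero output (suggesting the public values were never written back) from an ordinary mismatch (suggesting they were computed from different data). The enum and function names are hypothetical; they are not part of ERE, the OpenVM SDK, or the workload's test code.

```rust
/// Outcome of comparing a block's expected public output with what execution returned.
/// Hypothetical helper: not part of ERE, the OpenVM SDK, or the workload's tests.
#[derive(Debug, PartialEq)]
enum PublicOutputCheck {
    /// Bytes are identical.
    Match,
    /// Output is non-empty and entirely zero: public values were likely never written back.
    AllZeros,
    /// Output differs but is not all zero: values were likely computed from different data.
    Mismatch,
}

fn check_public_output(expected: &[u8], actual: &[u8]) -> PublicOutputCheck {
    if expected == actual {
        PublicOutputCheck::Match
    } else if !actual.is_empty() && actual.iter().all(|&b| b == 0) {
        PublicOutputCheck::AllZeros
    } else {
        PublicOutputCheck::Mismatch
    }
}
```

For example, check_public_output(&[0xab; 32], &[0; 32]) returns PublicOutputCheck::AllZeros, the same all-zero pattern reported for rpc_block_22974580 (the 0xab bytes are a placeholder, not the real expected hash).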
Bisecting to Find the Culprit
Okay, so here's where things got more interesting. Since the workload master was only a few ERE commits behind the update, I decided to bisect to pinpoint the exact commit that introduced the issue. For those not familiar, bisecting is a process of elimination: you start from a known-good and a known-bad commit, test the commit halfway between them, and keep whichever half still contains the first bad commit, repeating until a single commit is left. It's like a guessing game where each guess halves the remaining possibilities. And guess what? I found it!
The problematic commit turned out to be this one from the ERE repository; it is the exact change that made the public input verification fail. To be precise, the workload was using commit 046ad63 of the zkevm-benchmark-workload repository, and upgrading ERE past this commit introduced the bug. Notably, the commit changed something related to execute_metered, and that modification appears to have inadvertently broken how OpenVM returns public inputs. Understanding exactly what changed in this commit, and how it interacts with the OpenVM SDK, is key to diagnosing the problem.
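The bisect process described above is a binary search over an ordered commit range. Below is a minimal Rust sketch of it; is_bad is a hypothetical callback standing in for "build the workload against this ERE commit and run execute_mainnet_blocks". In practice git bisect does this bookkeeping for you; the sketch only shows why so few builds are needed.

```rust
/// Binary search for the first bad commit, as a bisect does.
/// Assumes `commits` is ordered oldest to newest, `commits[0]` is known good,
/// the last entry is known bad, and every commit after the first bad one is
/// also bad. `is_bad` is a hypothetical stand-in for "build the workload
/// against this ERE commit and run execute_mainnet_blocks".
fn first_bad<F: FnMut(&str) -> bool>(commits: &[&str], mut is_bad: F) -> usize {
    assert!(commits.len() >= 2, "need a known-good and a known-bad commit");
    // Invariant: commits[good] is good and commits[bad] is bad.
    let (mut good, mut bad) = (0, commits.len() - 1);
    while bad - good > 1 {
        let mid = good + (bad - good) / 2;
        if is_bad(commits[mid]) {
            bad = mid; // the first bad commit is at or before mid
        } else {
            good = mid; // the first bad commit is after mid
        }
    }
    bad
}
```

With eight placeholder commits c0 to c7 where everything from c5 onward is bad, first_bad returns index 5 after only three is_bad calls (c3, c5, c4).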
How to Reproduce the Issue
If you want to see this in action yourself, here’s how you can reproduce the issue:
- Clone the zkevm-benchmark-workload repository and check out this specific branch: https://github.com/eth-act/zkevm-benchmark-workload/tree/jsign-ere-bisect
- Pull the necessary Docker images:
docker pull ghcr.io/eth-act/ere/ere-base:0.0.13-4163db1
docker pull ghcr.io/eth-act/ere/ere-base-openvm:0.0.13-4163db1
docker pull ghcr.io/eth-act/ere/ere-cli-openvm:0.0.13-4163db1
- Tag the Docker images to the correct names:
docker tag ghcr.io/eth-act/ere/ere-base:0.0.13-4163db1 ere-base:0.0.13-4163db1
docker tag ghcr.io/eth-act/ere/ere-base-openvm:0.0.13-4163db1 ere-base-openvm:0.0.13-4163db1
docker tag ghcr.io/eth-act/ere/ere-cli-openvm:0.0.13-4163db1 ere-cli-openvm:0.0.13-4163db1
- Run the test with the following command:
ZKVM=openvm RUST_LOG=info,sp1_core_executor=warn cargo test -p integration-tests --release -- --test-threads=1 execute_mainnet_blocks
If all goes as expected (or rather, unexpectedly), you should see the same "Public inputs hash mismatch" failure for rpc_block_22974580 described above. This setup pins the exact image versions and configuration that trigger the bug, so you can confirm the problem is tied to the specified commit and environment. Reproducing an issue locally is always good practice: it makes sure you're chasing the right problem, and it's super helpful when discussing it with the team and working out a fix.
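The pull-and-retag steps follow a single pattern: pull each image from ghcr.io/eth-act/ere/ and retag it under the same name without the registry prefix. The minimal Rust helper below just generates those command strings for a list of images and a version tag, which can make it easier to bump the tag when trying other ERE versions. It is an illustration only; repro_commands is a hypothetical name, and the shell commands listed above remain the actual reproduction steps.

```rust
/// Registry prefix of the images used in the reproduction steps.
const REGISTRY_PREFIX: &str = "ghcr.io/eth-act/ere/";

/// Hypothetical helper: build all `docker pull` commands, then all
/// `docker tag` commands, for the given images at one version tag.
fn repro_commands(images: &[&str], version: &str) -> Vec<String> {
    // Fully qualified GHCR reference for one image.
    let remote = |image: &str| format!("{}{}:{}", REGISTRY_PREFIX, image, version);
    let mut cmds: Vec<String> = images
        .iter()
        .map(|&image| format!("docker pull {}", remote(image)))
        .collect();
    // Retag each pulled image to the local name without the registry prefix.
    cmds.extend(
        images
            .iter()
            .map(|&image| format!("docker tag {} {}:{}", remote(image), image, version)),
    );
    cmds
}
```

Calling repro_commands(&["ere-base", "ere-base-openvm", "ere-cli-openvm"], "0.0.13-4163db1") yields the six docker commands above in the same order.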
Possible Cause: OpenVM SDK Bug
Given that the problematic commit changed the code to use execute_metered, this might be a bug in the OpenVM SDK, perhaps in how metered execution handles returning the public inputs, or in some related mechanism. I'm not entirely sure yet, but it's definitely worth investigating further: the switch to execute_metered may have introduced a subtle incompatibility or error in how public inputs are processed. Pinpointing it is essential for getting OpenVM back on track.
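To illustrate the kind of regression suspected here, the toy Rust model below contrasts an execution path that copies the guest's public values into its result with a hypothetical metered path that only plumbs metrics through and leaves the public-values buffer zero-initialized. This is not how OpenVM or ERE is implemented; every type and function here is invented for illustration, and whether the real bug looks like this is exactly what still needs investigating.

```rust
/// Toy result type (hypothetical, not from the OpenVM SDK or ERE).
struct ExecutionResult {
    public_values: Vec<u8>,
    cycles: u64,
}

/// Placeholder metric so both paths report the same "cycle count".
const PLACEHOLDER_CYCLES: u64 = 1_000;

/// Plain execution: the guest's revealed bytes are copied into the result.
fn execute(guest_output: &[u8]) -> ExecutionResult {
    ExecutionResult { public_values: guest_output.to_vec(), cycles: PLACEHOLDER_CYCLES }
}

/// Hypothetical buggy metered path: metrics are returned correctly, but the
/// public-values buffer is left at its zero-initialized default.
fn execute_metered_buggy(guest_output: &[u8]) -> ExecutionResult {
    ExecutionResult { public_values: vec![0u8; guest_output.len()], cycles: PLACEHOLDER_CYCLES }
}
```

In this model both paths report identical metrics, so nothing looks wrong until the public values are compared, which matches the symptom of a correct-length but all-zero output.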
Next Steps and Possible Solutions
So, what's next? Here are a few things we should consider:
- Investigate the execute_metered changes: We need to dive deep into the commit that introduced execute_metered and understand exactly how it affects the public input return mechanism.
- Review the OpenVM SDK: It's worth reviewing the OpenVM SDK to see if there are any known issues or updates related to metered execution.
- Test with different configurations: Try running the tests with different configurations to see if the issue is specific to certain environments or settings.
- Collaborate with the ERE team: Reach out to the ERE team to get their insights and collaborate on a fix.
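One way to act on the first and third points is a differential check: run the same inputs through plain and metered execution and flag any input whose public output differs. The Rust sketch below assumes two hypothetical closures, plain and metered, that each return the public output bytes for a named input; wiring them to the real ERE or OpenVM executors is left open, since those APIs are not covered here.

```rust
/// Hypothetical differential check: run every input through two execution
/// paths and return the names of inputs whose public outputs disagree.
fn diverging_inputs<'a, A, B>(inputs: &[&'a str], mut plain: A, mut metered: B) -> Vec<&'a str>
where
    A: FnMut(&str) -> Vec<u8>,
    B: FnMut(&str) -> Vec<u8>,
{
    let mut diverging = Vec::new();
    for &name in inputs {
        // Compare the public output bytes produced by each path.
        if plain(name) != metered(name) {
            diverging.push(name);
        }
    }
    diverging
}
```

With a stub metered path that zeroes the output for one placeholder input, only that input is reported, which would point straight at the blocks (such as rpc_block_22974580) affected by a metered-execution regression.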
By working together and systematically investigating the issue, we can hopefully find a solution and get OpenVM back to its reliable self. This is just part of the process, and I'm confident we'll get to the bottom of it!
Conclusion
Alright, that's the scoop on the OpenVM public input issue. It seems like a recent update to the ERE dependency is causing some problems with the public input verification. By bisecting the commits, I was able to pinpoint the exact change that introduced the bug. Now, it's time to roll up our sleeves and dig deeper to find a fix. I'll keep you all updated as we make progress. Thanks for reading, and stay tuned for more!
Cc: @han0110