#bitcoin-core-pr-reviews Log

17:00:44 <marcofleon> #startmeeting 
17:00:44 <corebot`> marcofleon: Meeting started at 2025-10-08T17:00+0000
17:00:45 <corebot`> marcofleon: Current chairs: marcofleon
17:00:46 <corebot`> marcofleon: Useful commands: #action #info #idea #link #topic #motion #vote #close #endmeeting
17:00:47 <corebot`> marcofleon: See also: https://hcoop-meetbot.readthedocs.io/en/stable/
17:00:48 <corebot`> marcofleon: Participants should now identify themselves with '#here' or with an alias like '#here FirstLast'
17:00:57 <marcofleon> hey everyone!
17:01:01 <stickies-v> hi!
17:01:03 <stringintech> Hi!
17:01:04 <kevkevin> hi
17:01:05 <dzxzg> hi
17:01:08 <monlovesmango> hello!
17:01:15 <danielabrozzoni> hi :)
17:01:37 <marcofleon> we're reviewing #33300 today
17:01:39 <corebot`> https://github.com/bitcoin/bitcoin/issues/33300 | fuzz: compact block harness by Crypt-iQ · Pull Request #33300 · bitcoin/bitcoin · GitHub
17:02:24 <marcofleon> feel free to jump in whenever as we go through the questions, or circle back to previous ones, etc
17:02:45 <marcofleon> okay first one
17:02:45 <yuvicc> hi
17:02:47 <marcofleon> Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach? Were you able to get the fuzz test running?
17:02:51 <stickies-v> if anyone's new the review club today, feel free to say hi - even if you're just lurking!
17:03:00 <marcofleon> oh yeah^
17:03:08 <kevkevin> I did not read the PR but I am lurking today!
17:03:08 <marcofleon> lurkers are welcome
17:03:17 <yuvicc> same here
17:03:36 <stringintech> while trying to get the fuzz test running on mac, i ran into a couple of compile-time and runtime issues. small write-up for the compile fix here: https://stringintech.github.io/blog/p/fuzzing-bitcoin-core-using-afl-on-apple-silicon . i will try to enhance/extend it once i wrap my head around the runtime issues (reported one on the PR page).
17:03:55 <stringintech> and a very light review. mostly to see how random input bytes turn into meaningful data for this fuzz test.
17:04:02 <danielabrozzoni> Concept ACK, I am still finishing my review :) I looked at commits one by one, trying to understand why each change was needed. I was able to run the fuzz tests.
17:04:26 <dzxzg> Concept ACK, I lightly read `strc/test/fuzz/cmpctblock.cpp` and got the fuzz test running
17:04:45 <monlovesmango> i was able to review a bit run the fuzz test. generally ack but have sociwmae fuzz related conceptual questions.
17:04:56 <monlovesmango> a couple*
17:05:14 <stickies-v> also get in my usual cycle of not fuzzing for long enough and then no longer being able to get it to compile on macos arm
17:05:17 <marcofleon> stringintech: nice, yeah I saw your comments. I still need to look into it more to understand but I do wonder why that wasn't showing up on linux
17:05:51 <marcofleon> cool so we got people fuzzing!
17:06:07 <stringintech> marcofleon: yeah, that was also confusing for me...
17:06:21 <marcofleon> Did anybody get to trying the exercise?
17:06:32 <marcofleon> I just think it's fun to see a crash, that's why I ask
17:06:45 <stickies-v> concept ACK, spent too much time thinking about the (not very elegant, but maybe necessary) new startup option to copy datadirs
17:07:20 <marcofleon> monlovesmango: feel free to ask fuzz related questions whenever, I can try my best to answer
17:07:50 <marcofleon> stickies-v:  yeah agreed that part was a bit confusing to me too. We'll get into it
17:07:53 <marcofleon> okay second q
17:08:03 <marcofleon> Where in the codebase are the main compact block helpers and processing logic? Name some of the relevant classes and functions. (Hint: nothing like a little search for NetMsgType::CMPCTBLOCK to get you started)
17:08:10 <danielabrozzoni> I tried to see it crash, but couldn't reproduce! It might be that I haven't run the fuzzer for long enough, and my exec/sec are quite low
17:08:57 <monlovesmango> well I don't want to derail anything but usually with fuzz tests you see asserts everywhere, but here I didn't see any. so my question is where/how are we asserting that results are expected?
17:09:19 <stringintech> marcofleon: i could not unfortunately. blocked by the runtime issues. but had a question on this... how much the initial input we start with matters?
17:09:21 <marcofleon> danielabrozzoni: I had that thought after the notes went up unfortunately... "oh maybe it needs to run too long to actually reproduce"
17:10:20 <marcofleon> Okay in terms of the asserts, I think this harness is meant to be catching assumes/asserts we have in production code
17:10:53 <marcofleon> just generally testing compact block logic. But that's good that you're thinking about that monlovesmango because yeah in general you want good oracles for fuzz tests
17:11:00 <stickies-v> marcofleon: `PeerManagerImpl::NewPoWValidBlock` creates a `NetMsgType::CMPCTBLOCK` message and through `m_connman.ForEachNode` sends it to our high-bandwidth peers
17:11:11 <dzxzg> The `PartiallyDownloadedBlock` class for all of the compact block construction object stuff, and a lot of the logic is inside of the `ProcessMessage` NetMsgType::CMPCTBLOCK case
17:11:34 <marcofleon> and then stickies, initial input doesn't matter, the fuzzer will find its way
17:12:08 <monlovesmango> it seemed like class CBlockHeaderAndShortTxIDs in blockencodings.h was putting together the compact block message. is that the question?
17:12:12 <danielabrozzoni> marcofleon: Ah, got it! I'll try to run it for longer, just for fun. Maybe I could also modify the harness so that it will get to the crash quicker, as an exercise
17:12:18 <marcofleon> unless you need very specific strings for serialization or network things etc... so I guess it can matter. but not too often afaik
17:12:30 <monlovesmango> marcofleon: aahhh that makes a lot of sense thanks!
17:12:33 <stringintech> thanks!
17:13:01 <stringintech> for question 2: this might be way too simplified but seems we can divide main parts into 3 categories:
17:13:06 <stringintech> 1) SEND a peer a cmpct block in HIGH bandwidth mode: in `PeerManagerImpl::NewPoWValidBlock()` callback (triggered by `ChainstateManager::AcceptBlock()`) and in `PeerManagerImpl::SendMessages()` (in case not already sent the cmpct block in `PeerManagerImpl::NewPoWValidBlock()` to this peer)
17:13:14 <stringintech> 2) SEND a peer a cmpct block in LOW bandwidth mode: process peer cmpct block request in `PeerManagerImpl::ProcessGetBlockData()` and potentially read the block from disk and construct the cmpct block - if not already done - and send it; constructing the cmpct block happens through `CBlockHeaderAndShortTxIDs` constructor in blockencodings.cpp
17:13:20 <stringintech> 3) RECEIVE a cmpct block from a peer: in `PeerManagerImpl::ProcessMessage()` processing `NetMsgType::CMPCTBLOCK` msg type, we process the cmpct block sent to us and try to construct the full block using txns in our mempool (using `PartiallyDownloadedBlock::InitData()` in blockencodings.cpp) and potentially send a `NetMsgType::GETBLOCKTXN` msg to
17:13:20 <stringintech> the same peer for the txns we did not have where we can later process the response in `PeerManagerImpl::ProcessCompactBlockTxns()` and finally call `PartiallyDownloadedBlock::FillBlock()` to see if we have the full block now.
17:13:25 <marcofleon> Yeah that question was just to get people going through the compact block code
17:13:32 <stringintech> in (1) and (2) peer may respond with a `NetMsgType::GETBLOCKTXN` msg where we potentially send the requested txns of the block in `PeerManagerImpl::SendBlockTransactions()`
17:13:51 <marcofleon> seems like you all got it
17:14:04 <danielabrozzoni> Also `MaybeSetPeerAsAnnouncingHeaderAndIDs` takes care of switching a peer to high bandwidth mode, and if we have too many high bandwidth peers, downgrading a differernt one to low bandwidth
17:15:10 <marcofleon> Nice yes. Should we move on, I don't think I see any errors in the answers?
17:15:20 <marcofleon> The fuzz test sends SENDCMPCT messages with high_bandwidth randomly set. How many high bandwidth peers are allowed and does the fuzz harness test this limit? More generally, why would a peer choose to be high or low bandwidth?
17:16:39 <stringintech> there is no limit to how many peers select us as a HB peer, right?
17:16:40 <stringintech> i found a limit on the other side: how many peers WE can select as HB which is “3” and it is enforced when PeerManagerImpl::BlockChecked() calls`PeerManagerImpl::MaybeSetPeerAsAnnouncingHeaderAndIDs()` (maintains 3 fastest peers as HB).
17:16:59 <stickies-v> stringintech: yeah "high bandwidth peers" is a bit ambiguous i think
17:17:38 <marcofleon> hmm good point. Because it's a two way street in terms of selection
17:18:31 <monlovesmango> I think you would prefer to be high bandwidth to remove latency?
17:18:44 <dzxzg> A peer doesn't choose to be high or low bandwidth, we will ask all of our peers to be low bandwidth compact block peers initially, and then in MaybeSetPeerAsAnnouncing* we'll select high bandwidth peers and send them a SENDCMPCT asking them to be a high bandwidth peers
17:18:45 <marcofleon> Yeah so we can have 3 max. And then I guess we could be a hb peer to all of our peers... although I'd have to confirm
17:19:33 <marcofleon> dzxzg: nice got it
17:19:50 <dzxzg> If someone has asked you to be their high-bandwidth peer, you'll send them a CMPCTBLOCK message in `NewPoWValidBlock()` before you have even completetd block validation
17:20:30 <marcofleon> yeah the sending the block even before validation was new to me before stickies let me know
17:20:42 <marcofleon> monlovesmango: yes that sounds about right
17:20:57 <marcofleon> the block gets sent right away so latency is very low
17:21:26 <monlovesmango> dzxzg: that actually makes a lot of sense
17:21:53 <monlovesmango> (about choosing to be high/low bandwidth)
17:22:10 <marcofleon> Does the fuzz harness test the 3 peer limit in MaybeSetPeerAsAnnouncingHeaderAndIDs?
17:23:49 <stringintech> we are setting up only 3 peers; so no?
17:24:02 <marcofleon> correct
17:24:20 <marcofleon> Would need to be more than 3 to cover that logic
17:24:40 <marcofleon> Compare -testdatadir and the new -fuzzcopydatadir. Why is the latter useful for performance, and why isn’t a fresh TestingSetup that mines blocks on each iteration acceptable here?
17:24:55 <marcofleon> stickies-v: Go off
17:25:00 <dzxzg> lOL
17:25:14 <stickies-v> Wait is that an intentional choice? The choice to connect to 3 peers seemed unrelated to the number of high bandwidth peers?
17:25:40 <marcofleon> No Eugene told me it wasn't intentional
17:25:56 <marcofleon> it's from the processmessage fuzz test I think
17:26:02 <stickies-v> okay makes sense
17:26:18 <marcofleon> But in the coverage report you can see some of the lines not hit in MaybeSet
17:26:26 <marcofleon> so making that more than 3 should hit it eventually
17:26:56 <stickies-v> yeah but then we'd probably have to add assertions that our hb accounting invariants hold
17:27:01 <dzxzg> yeah, it would be fine to increase it past 3, we actually don't even enforce anything related to high bandwidth on the receiving side
17:27:10 <monlovesmango> fresh TestingSetup would take too much time for each fuzz iteration?
17:28:09 <dzxzg> (https://github.com/bitcoin/bitcoin/pull/32606)
17:28:45 <stringintech> is this performance related (considering that mining should not take so long when in fuzzing mode) or that we want all runs to have the same initial state?
17:28:48 <marcofleon> monlovesmango: exactly. Having to mine a new 200 block chain every iteration is just too slow
17:29:16 <marcofleon> stringintech: good question, I think it's mainly for performance
17:29:49 <marcofleon> the determinism is there too with using the same chain. But this whole commit was mainly done to significantly speed up the test
17:30:22 <stickies-v> but isn't the point of `FuzzTargetOptions::init` to do that kind of expensive initialization as a one-off?
17:30:40 <marcofleon> dzxzg: I'll have to check that PR out
17:32:04 <stringintech> stickies-v: i think that's what we are doing know. mining blocks and store in a static dir in init() and then copy that static dir in each run.
17:32:10 <marcofleon> stickies do you mean `initialize_cmpctblock` in the harness?
17:32:13 <stringintech> doing now*
17:33:25 <stickies-v> well so usually test suites have "fixtures" to do expensive one-off initialization that can be shared across tests, e.g. we have the same in our unit tests using boost
17:33:50 <stickies-v> it appeared to me that the `FuzzTargetOptions::init` option serves a similar purpose (but i don't know the fuzzing code well enough)
17:34:16 <stickies-v> if we just want to avoid mining a 200 chain for every iteration, why isn't the `.init` option enough for that?
17:35:14 <marcofleon> Wait is this in fuzz.cpp? why can't I find this
17:35:28 <marcofleon> the fuzztargetoptions::init
17:35:41 <marcofleon> oh I see nvm
17:35:50 <stickies-v> yes, it's in fuzz/fuzz.cpp, and it's what `.init=initialize_cmpctblock` gets passed to
17:37:24 <marcofleon> Yeah so that's what allows us to have the one time initialize functions in fuzz targets
17:37:35 <marcofleon> runs once before the iteration
17:37:47 <marcofleon> but what's your question?
17:38:04 <stickies-v> why don't we just generate the 200 blocks in the .init function and avoid copying datadirs?
17:38:45 <stickies-v> also don't mean to derail the review club with my lack of fuzzing knowledge so we can move on from this anytime
17:38:56 <marcofleon> I think beacuse the test is stateful
17:39:06 <stringintech89> does each run alter its data dir?
17:39:31 <marcofleon> the testingsetup is modified in the iteration
17:40:03 <marcofleon> Yeah I think so, usually the testing setup stuff would be done in the init as a one off. But for this test specifically, its done in the iteration which is unusual
17:40:20 <stickies-v> aha, okay, that's unfortunate
17:40:21 <dzxzg> Is the problem that blocks are being mined and then announced?
17:41:21 <dzxzg> Maybe there is an obvious reason why this is dumb, but it really seems like there should be a way to have a memory-only chainstate object, and then we could just reset to that at the end of each fuzz epoch(not sure if epoch is the right word)
17:41:47 <dzxzg> That would also make this fuzz test faster by avoiding hitting the disk at all.
17:42:09 <dzxzg> Or maybe it's not a bad idea, just easier said than done to implement that
17:42:10 <marcofleon> My first question when I saw this test was why can't we do this in memory. And Niklas was saying its just the validation code doesn't allow it. We need to do some refactoring
17:43:03 <stringintech89> just asking out of curiosity: wouldn't testing memory only, reduce coverage?
17:43:09 <dzxzg> marcofleon: Figures 😿
17:43:10 <marcofleon> This datadir commit is something I need to think about more too. For now let's move on. I think the approach might be okay though
17:43:36 <marcofleon> Look at create_block in the harness. How many transactions do the generated blocks contain, and where do they come from? What compact block scenarios might be missed with only a few transactions in a block?
17:44:01 <marcofleon> stringintech89: yeah it would, by quite a bit I believe
17:44:16 <stringintech89> thanks
17:44:44 <dzxzg> stringintech89: It would reduce coverage of this particular fuzz test, but that coverage is redundant I think this test should just fuzz compact block logic which doesn't do anything special with the disk, I think existing validation fuzzers should cover that stuff
17:45:24 <dzxzg> (IMO)
17:45:31 <stringintech89> ah makes sense! thanks
17:46:21 <stringintech89> for the next question: max 3 txns: coinbase, at most one from mempool, at most one not in mempool. was wondering if including more than one from mempool (or more than one not in mempool) would cover different code paths and lead to more coverage.
17:46:22 <monlovesmango> it seems like create_block only includes 2 txs max? one from mempool and other adhoc created
17:46:34 <monlovesmango> oh duh, forgot coinbase :)
17:46:53 <marcofleon> haha yes coinbase too. 1-3 txs
17:47:37 <dzxzg> Why not more?
17:48:42 <marcofleon> I think crypt-iQ was saying he might change that part of the test a bit
17:49:02 <marcofleon> Or just generally improve it, I don't see why there couldn't be some more
17:49:17 <dzxzg> marcofleon: Makes sense, thanks
17:50:02 <marcofleon> I'm gonna switch question 6 and 7 for now in the interest of time if that's okay. Becauase i like 7 better :)
17:50:11 <marcofleon> Commit ed813c4 sorts m_dirty_blockindex by block hash instead of pointer address. What non-determinism does this fix? The author notes this slows production code for no production benefit. Why can’t EnableFuzzDeterminism() be used here? How do you think this non-determinism should be best handled (if not the way the PR currently does)?
17:50:26 <monlovesmango> what compact block scenarios might be missed with this low tx count?
17:50:42 <dzxzg> the second half of question 5, this doesn't exercise some of the prefill index stuff, where prefills have an index counting from the last prefill
17:50:52 <dzxzg> for example
17:51:07 <marcofleon> oh yes! My bad
17:51:23 <marcofleon> I was also thinking of shortid collisions
17:51:34 <dzxzg> oh yeah that too
17:51:46 <monlovesmango> dzxzg: marcofleon: great answers thanks :)
17:52:28 <dzxzg> We definitely want to have coverage of shortid collisions!
17:54:21 <stringintech89> for question 7: it tries to make the order we write changes to block index db deterministic
17:55:43 <stringintech89> and if we want to include the new sort in the set type itself, EnableFuzzDeterminism() does not work and we would need macros perhaps.
17:55:53 <monlovesmango> why does block index db need to be deterministic?
17:56:13 <monlovesmango> or why do the writes need to deterministic?
17:56:35 <marcofleon> Yeah basically the iteration order won't be deterministic with just a normal std::set which orders based on memory address. so it's changed to block hash to make it the same order every time
17:57:38 <stringintech89> but as marcofleon mentioned on the PR page, maybe we can include a runtime sorting logic just when fuzzing; then EnableFuzzDeterminism() can be used perhaps
17:57:59 <eugenesiegel> hi, sorry I thought this review club was at a different time. runtime sorting doesn't work because the std::sort call itself will be non-deterministic
17:58:44 <eugenesiegel> different executions will have different memory addresses so the compare function will be called a different amount of times
17:59:20 <marcofleon> Oh that makes sense. so it doesn't actually solve the non-determinism
17:59:32 <dzxzg> Just drawing it out a bit more to make sure I understand why we care if it's in the same order every run, we write from the set of dirty block indexes in the order of the set, and if we don't write blockindexes in the same order every time, we would have different behavior running the same seed two times?
17:59:33 <marcofleon> also monlovesmango: I think it's just for general stability
18:00:18 <marcofleon> in gneral fuzz tests should be as deterministic as possible. But not sure if it's worth it in this case, because this non-determinism might not super impactful?
18:00:42 <monlovesmango> ok. so just for my understanding is there anything explicity comparing this between executions?
18:00:59 <monlovesmango> ohhh ok so when there is a failure it can be preproduced, got it
18:01:06 <eugenesiegel> dzxzg: we won't technically have different behavior since iirc the db is a key-value store. The issue is that the fuzzer will think it's gaining (or losing) coverage when nothing has actually happened
18:01:07 <monlovesmango> reproduced*
18:01:20 <marcofleon> But I could be wrong there, it could be for reproducing a potential crash, this would get in the way of reproducing it
18:01:43 <dzxzg> I wonder what the perf cost is to normal node running of adding the pointer accesses
18:01:45 <marcofleon> okay we've reached the end, thank you for coming! we can of course continue discussing
18:01:50 <marcofleon> #endmeeting