#bitcoin-core-pr-reviews Log

17:00:26 <glozow> #startmeeting 
17:00:26 <corebot`> glozow: Meeting started at 2025-06-18T17:00+0000
17:00:27 <corebot`> glozow: Current chairs: glozow
17:00:28 <corebot`> glozow: Useful commands: #action #info #idea #link #topic #motion #vote #close #endmeeting
17:00:29 <corebot`> glozow: See also: https://hcoop-meetbot.readthedocs.io/en/stable/
17:00:30 <corebot`> glozow: Participants should now identify themselves with '#here' or with an alias like '#here FirstLast'
17:00:33 <monlovesmango> heyy
17:00:53 <pseudoramdom> hi hi
17:01:08 <glozow> welcome to PR Review Club! we're looking at Improve TxOrphanage denial of service bounds today: https://bitcoincore.reviews/31829
17:01:10 <marcofleon> hola
17:01:27 <theStack> hi
17:02:13 <glozow> did anybody get a chance to review the PR or the notes?
17:02:34 <instagibbs> reviewing the PR
17:02:48 <marcofleon> yes, mostly notes and focused on new txorphanage
17:02:51 <monlovesmango> yes, mostly. haven't reviewed test commits
17:03:08 <glozow> instagibbs marcofleon monlovesmango: awesome!
17:03:12 <glozow> Why is the current TxOrphanage global maximum size limit of 100 transactions with random eviction problematic? Can you think of a concrete attack scenario that would affect 1-parent-1-child (1p1c) relay?
17:03:14 <pseudoramdom> Just read the notes as well
17:03:20 <marcofleon> i'm also sending the fuzz tests cmooon
17:04:34 <monlovesmango> bc it enables peers to evict other peer's orphans from your orphanage
17:04:55 <marcofleon> Attack scenario could be an attacker node spamming the orphanage to prevent a child paying for a low fee parent
17:05:22 <marcofleon> if the child keeps getting evicted, then i guess parent would be dropped from mempool?
17:06:18 <glozow> marcofleon: in a 1p1c scenario, the parent isn't in mempool yet. We opportunistically pair it with its child in orphanage if we find one. But if the child gets evicted, then we're out of luck
17:06:38 <glozow> Can you summarize the changes to the eviction algorithm at a high level?
17:06:39 <monlovesmango> if a malicious peer floods you with orphans, its pretty likely that you will evict a child that could have otherwise been resolved as 1p1c?
17:06:41 <pseudoramdom> +1. Low fee rate parent + CPFP scenario, attacker floods with orphan tx?
17:08:00 <marcofleon> well eviction is no longer random, it's based on the "worst behaving" peer, and its the oldest announcement
17:08:25 <marcofleon> highest Dos score peer will have their annoucement removed
17:08:49 <glozow> marcofleon: yep! and we'll get into how we calculate DoS score in a later question
17:08:55 <glozow> Why is it desirable to allow peers to exceed their individual limits while the global limits are not reached?
17:10:14 <marcofleon> Because there could be a peer that is sending a lot of orphans, not necessarily dishonestly
17:10:21 <instagibbs> in the non-adversarial case, it could allow a lot more "honest" CPFPs through
17:10:38 <pseudoramdom> Not all peers may be active actively broadcasting at the same time?
17:10:39 <marcofleon> just makes sense to not waste the space by having an inflexible limit per peer
17:10:41 <glozow> marcofleon: instagibbs: yes exactly. often, peers are using a lot of resources simply because they are the most helpful peer
17:11:15 <pseudoramdom> Is it possible for attacker to game the DDoS scoring?
17:11:31 <monlovesmango> rather than having a common pool of orphans, this pr will restructure orphanage to track orphanage counts and usage by peer. each peer will be subject to a orphan announcment count limit that is the global max announcement count divided by the number of peers, and allowed a set amount of weight for the orphans they announce
17:12:36 <glozow> This was why I originally thought of doing a "token bucket" approach where we'd allow peers resources based on an amount of tokens, and then either replenished tokens if the orphans were useful or destroyed them if it was just spam
17:13:05 <glozow> The new algorithm evicts announcements instead of transactions. What is the difference and why does it matter?
17:13:34 <monlovesmango> pseudoramdom: I was thinking about this too. I think if you flood a node with peers with counts that are just over the limit you could theoretically evict a high weight tx from a peer with high weight announcements but low announcement count
17:13:38 <glozow> marcofleon: yeah, you might as well use the space
17:14:31 <marcofleon> annoucements are wtxid, peer. so if a peer is misbehaving then the orphan will only be removed for that peer. So a peer can't affect the orphan announcments of other peers
17:14:32 <monlovesmango> but i'm not sure thats really too much of a concern..?
17:15:24 <glozow> pseudoramdom: do you mean for an attacker to try to get us to evict a specific orphan? or to appear less resource-intensive than another peer and get them chosen for eviction instead?
17:15:36 <glozow> marcofleon: yes bingo!
17:16:13 <pseudoramdom> I was thinking of latter. Staying just under the limit but still managing to evict certain orphans.
17:16:17 <glozow> I think this relates to pseudoramdom's question: if they did something tricky to try to get a particular orphan of theirs chosen for eviction, that's fine, because we'll keep the transaction as long as another peer has announced it
17:17:40 <marcofleon> as long as you have at least one honest peer
17:18:04 <marcofleon> the orphan should (hopefully) remain in the orphanage
17:18:15 <glozow> pseudoramdom: the other peer won't experience eviction unless they exceed the limits. This still presents a limitation - peers might get evicted if they're just sending stuff at a far faster rate than we manage to process them - but the point is you can't influence the eviction of another peer's announcements
17:18:36 <monlovesmango> any peer that stays under peer limits can't have their orphan evicted by another peer
17:18:47 <glozow> monlovesmango: correct
17:19:05 <glozow> Why is there an announcement “limit” but a memory “reservation”?
17:19:05 <sipa> hi!
17:19:31 <glozow> sipa: hello hello
17:20:52 <glozow> Actually, I feel like you can call them both reservations, haha
17:21:08 <monlovesmango> I think bc announcement count affects CPU usage? and its not much of a concern to allocate a certain amount of memory to each peer. guessing here, think I read something like that in the PR notes
17:21:32 <sipa> glozow: both have a global limit, and a per-peer reservation
17:21:38 <marcofleon> but one can be exceeded and the other is a decreasing share of the pie
17:21:52 <sipa> the difference is the announcement global limit is a constant, but the global memory limit is a function of the number of peers
17:22:17 <sipa> so the "constant" is a global announcement limit, and per-peer memory reservation
17:22:18 <marcofleon> or i think we make the assumption that there is more memory that can be used up to a certain point, but for announcements we're trying to figure out which peer is "overusing" their share
17:22:19 <glozow> sipa: I wouldn't call it an announcement "reservation" though, because you aren't guaranteed it. if more peers appear, your announcement limit decreases.
17:22:19 <monlovesmango> hahah so many ways to state the same things
17:22:30 <glozow> On the other hand, your memory reservation is guaranteed and constant no matter how the peer set changes
17:22:52 <sipa> glozow: that just means the reservation is dynamic :p
17:23:10 <sipa> but yeah, the term reservation is weird in that context
17:24:09 <marcofleon> memory reservation is guaranteed per peer yes?
17:24:20 <marcofleon> oh yeah you said it above
17:24:36 <sipa> "Yes sir, your reservation for 4 tonight at FancyDining is confirmed." - "What do you mean my reservation was dropped to 3, because another group made a reservation?!"
17:24:47 <glozow> marcofleon: yes. you're also guaranteed a certain number of announcements, but it's dynamic
17:25:03 <glozow> sipa: yeah that's what I mean is weird about "reservation"
17:25:13 <sipa> fair enough
17:25:25 <sipa> We shall commence the bikeshedding for a better term now.
17:25:30 <glozow> How does the per-peer memory usage reservation change as the number of peers increases? How does the per-peer announcement limit change as the number of peers increases?
17:25:53 <marcofleon> the per peer memory usage doesn't change iiuc
17:26:01 <monlovesmango> wait, why is one a function of the number of peers and the other not?
17:26:04 <glozow> marcofleon: yes 💯
17:26:11 <marcofleon> but the per peer annoucement limit decreases?
17:26:17 <glozow> yup
17:26:25 <glozow> monlovesmango: indeed, why?
17:26:57 <glozow> What is the purpose of the announcement limit?
17:27:06 <monlovesmango> is it bc anncouncement count affects CPU usage? and so we want to limit this globally?
17:27:31 <glozow> monlovesmango: how does our "budget" for CPU usage change with more peers sending us orphans?
17:28:16 <sipa> monlovesmango: more specifically, the *global* announcement limit directly affects the *latency* of trimming announcements - it's not because we have more peers that we can tolerate a longer latency in processing transactions
17:28:45 <sipa> monlovesmango: but for memory usage, it is normal and expected that your maximum memory usage goes up with more peers - if you're memory-constrained, you should limit your number of peers anyway
17:28:50 <glozow> (the answer is it doesn't. we can't tolerate more announcements when we have more peers)
17:29:15 <sipa> (sorry if i spoiled it?)
17:29:38 <marcofleon> in LimitOrphans we're removing announcements one by one right? and so that's why we're using that limit as a proxy for cpu usage
17:29:51 <monlovesmango> sipa: thank you that answered my question
17:30:04 <monlovesmango> marcofleon: that is also a good point
17:30:11 <sipa> monlovesmango: yup, the number of iterations that loop in LimitOrphans scales directly with the *global* announcement limit
17:30:30 <monlovesmango> cool cool we can move on thanks all :)
17:30:38 <glozow> Why is it ok to remove orphan expiration?
17:31:18 <marcofleon> because we take care of oldest orphans now whenever we start evicting
17:31:37 <glozow> marcofleon: yes exactly, that's the intuition for why the number of announcements is the number we are interested in. not the number of unique orphans (which is what we used to limit)
17:31:56 <sipa> glozow: FWIW, have you benchmarked how long LimitOrphans can take?
17:32:20 <monlovesmango> bc the orhpanage now has other concrete metrics by which we can reliable evict orphans which guarantee the oldest and evicted first and orphans that are no longer needed are removed
17:32:37 <glozow> marcofleon: yep! but wait, doesn't this mean we can be holding on to orphans for days, or weeks, etc?
17:32:57 <monlovesmango> yes..?
17:33:33 <glozow> sipa: not since the rewrite. I can find my old benchmarks and run them. IIRC the `AddTx`s is what takes a really long time
17:34:20 <marcofleon> hmm yeah i guess we can hold onto it for a while now. as long as there's no conflicting txs that arrive in a block or something
17:34:28 <sipa> glozow: sure, but the reason for the existence of the global announcement limit is the time that LimitOrphans can take, not AddTx... so perhaps it's worth benchmarking, and seeing if we can perhaps tolerate a higher global announcement limit (or, otherwise, be sad to find out it needs to be reduced)
17:34:47 <glozow> marcofleon: is it problematic?
17:35:55 <marcofleon> i don't think so, as long as limits aren't being exceeded, seems fine to me
17:36:02 <glozow> sipa: yeah definitely. I just wanted to add some context for anybody who looks at the old benchmarks. What do you think is an acceptable amount of time?
17:36:10 <marcofleon> unless i'm missing something...
17:36:17 <sipa> glozow: probably in the millisecond range?
17:36:18 <glozow> marcofleon: I agree with you
17:36:42 <glozow> sipa: 👍
17:37:39 <glozow> Should we also remove EraseForBlock and instead rely on eviction to remove orphans that confirm or conflict with a block? Why or why not?
17:38:58 <marcofleon> could maybe be worked out somehow, but feels like a separate thing
17:39:04 <marcofleon> so i would say no
17:39:34 <marcofleon> it's not the same reason that an orphan is being removed
17:39:53 <monlovesmango> I would also say no, bc otherwise the caller would have to have knowledge of what is in the orphanage and individually evict txs
17:40:18 <glozow> fwiw, I think the worst case for EraseForBlock is probably worse than LimitOrphans. But EraseForBlock happens on the scheduler thread so speed might not be as much of an issue
17:40:33 <sipa> glozow: it also costs an attacker mining a valid block
17:41:11 <glozow> sipa: is that true? You could look at what's in the projected next block and just create conflicting transactions with a lot of nonexistent utxos
17:42:13 <sipa> glozow: sure, but the victim will still never experience it more than once per block, which is expensive. good point though that it's not necessarily the attacker themselves that pay this cost
17:43:39 <glozow> right. it's not a very worthwhile attack imo
17:45:05 <glozow> And the benefit of freeing up this space is probably worth it
17:45:12 <glozow> What is the purpose of reimplementing TxOrphanageImpl using a boost::multi_index_container along with the eviction changes?
17:46:04 <marcofleon> it's easier on the eyes :)
17:46:08 <instagibbs> glozow we could just look for txid matches instead of scanning inputs :)
17:46:09 <monlovesmango> so that you can look up orphan announcements by either wtxid or peer?
17:47:13 <glozow> instagibbs: indeed. It would require adding a txid index, but maybe that's what we're evicting in practice anyway! Could measure what it looks like in the wild
17:47:31 <glozow> monlovesmango: we can already do that though!
17:47:31 <instagibbs> oh right, wtxid would be the thing on hand
17:48:05 <glozow> instagibbs: right but same thing, maybe we're always evicting exact block txns
17:48:13 <instagibbs> 👍
17:48:32 <sipa> i think it was me who suggested using a multi_index, and the reason was because i saw the older implementation was effectively implementing a multi-index inefficiently, by having separate data structures for per-peer and per-wtxid information about announcements
17:48:48 <monlovesmango> glozow: haha it was just my best guess, didn't actually get around to understanding boost::multi_index_container better
17:49:20 <glozow> yeah it is the more natural data structure. I was also pleasantly surprised to realize that we only needed 2 indexes
17:49:36 <sipa> yeah, i was assuming we'd need 3
17:49:41 <sipa> nice find
17:50:18 <glozow> I've also been told many times that the existing `TxOrphanage` is hard to review
17:50:35 <glozow> so good to hear from marcofleon that it's easier this way
17:50:41 <glozow> What is a peer’s “DoS Score” and how is it calculated?
17:51:51 <monlovesmango> its the max bettween announcement_count/peer_announcement_limit and announcement_usage/peer_announcement_usage_reservation
17:52:33 <sipa> monlovesmango: think of an SQL database with multiple indexes on various columns, but then in-memory entirely, and in a C++y way; it's more efficient (both in memory and CPU) than having multiple independent maps (one for each index), and much easier to keep consistent (because there is no way for the different indexes to contain different information)
17:52:38 <marcofleon> maximum of cpu score and memory score. cpu score is a peers number of announcments / their per peer limit. and memory score is sum of the weight of a peers announced tx weights / the reserved memory usage per peer
17:52:58 <glozow> monlovesmango: marcofleon: yep. how does this compare to having 2 separate scores, and trimming "while CPU score is exceeded or memory score is exceeded" ?
17:54:08 <marcofleon> hmmm well a peer can have a dos score of more than 1
17:54:17 <marcofleon> or maybe i'm confused with the q
17:54:44 <monlovesmango> sipa: that helps my conceptual understanding a lot thanks
17:55:06 <monlovesmango> glozow: this is much simplier for sure
17:55:23 <glozow> Er, my point was "it's the same thing"
17:55:25 <sipa> i think the question is: why do we have a *single* DoS score = max(mem_score, ann_score), as opposed to two different DoS scores that are never compared with one another, and a rule "trim the worst announcement offenders while there are any" + 'trim the worst memory offenders while there are any"
17:55:35 <monlovesmango> bc we only have to track one score rather than two
17:56:41 <marcofleon> wait this is actually a good question, i'm not immediately seeing what the benefit of one score is over two
17:56:48 <marcofleon> is it more gameable somehow?
17:56:51 <monlovesmango> i think this way also allows more the advantage of allowing peers to exceed limits/reservations
17:56:55 <sipa> i don't think the two approaches are equivalent, fwiw, but the difference is small
17:57:22 <monlovesmango> as long as global limits are not reached
17:58:06 <marcofleon> hmm so global limits reached, we get dos scores and target a peer based on that
17:59:01 <glozow> So we're comparing this approach to having 2 loops. "While global announcement limit is exceeded, pick the peer with the most announcements, evict from them. Then, while global memory limit is exceeded, pick the per with the most memory usage, evict from them."
17:59:56 <monlovesmango> well practically speaking, usually only one limit will be reached at a time so it would usually only be one loop no?
17:59:57 <glozow> This approach basically rolls them into 1 loop. "While global announcement or memory limit is exceeded, pick the peer with the highest score (max ratio of both) and evict from them."
18:00:02 <Emc99> What is dos?
18:00:12 <instagibbs> denial of service
18:00:29 <Emc99> Thanks
18:00:36 <glozow> oh oops we are out of time!
18:00:38 <marcofleon> is the two loop approach worse in some other way i'm not seeing other than it's two loops
18:01:43 <marcofleon> thanks for hosting and answering qs glozow! good stuff as usual
18:01:43 <glozow> fwiw, I think that having a ratio-based score is good if we want to consider giving different peers different reservation amounts
18:01:44 <monlovesmango> one small flaw with this approach is that if count limit is reached first, the highest DOS peer might actually be violating the memory reservation and removing it won't immediately resolve the global limit being exceeded
18:02:05 <monlovesmango> thanks for hosting glozow!!
18:02:37 <glozow> #endmeeting