17:00:26 <glozow> #startmeeting 17:00:26 <corebot`> glozow: Meeting started at 2025-06-18T17:00+0000 17:00:27 <corebot`> glozow: Current chairs: glozow 17:00:28 <corebot`> glozow: Useful commands: #action #info #idea #link #topic #motion #vote #close #endmeeting 17:00:29 <corebot`> glozow: See also: https://hcoop-meetbot.readthedocs.io/en/stable/ 17:00:30 <corebot`> glozow: Participants should now identify themselves with '#here' or with an alias like '#here FirstLast' 17:00:33 <monlovesmango> heyy 17:00:53 <pseudoramdom> hi hi 17:01:08 <glozow> welcome to PR Review Club! we're looking at Improve TxOrphanage denial of service bounds today: https://bitcoincore.reviews/31829 17:01:10 <marcofleon> hola 17:01:27 <theStack> hi 17:02:13 <glozow> did anybody get a chance to review the PR or the notes? 17:02:34 <instagibbs> reviewing the PR 17:02:48 <marcofleon> yes, mostly notes and focused on new txorphanage 17:02:51 <monlovesmango> yes, mostly. haven't reviewed test commits 17:03:08 <glozow> instagibbs marcofleon monlovesmango: awesome! 17:03:12 <glozow> Why is the current TxOrphanage global maximum size limit of 100 transactions with random eviction problematic? Can you think of a concrete attack scenario that would affect 1-parent-1-child (1p1c) relay? 17:03:14 <pseudoramdom> Just read the notes as well 17:03:20 <marcofleon> i'm also sending the fuzz tests cmooon 17:04:34 <monlovesmango> bc it enables peers to evict other peer's orphans from your orphanage 17:04:55 <marcofleon> Attack scenario could be an attacker node spamming the orphanage to prevent a child paying for a low fee parent 17:05:22 <marcofleon> if the child keeps getting evicted, then i guess parent would be dropped from mempool? 17:06:18 <glozow> marcofleon: in a 1p1c scenario, the parent isn't in mempool yet. We opportunistically pair it with its child in orphanage if we find one. But if the child gets evicted, then we're out of luck 17:06:38 <glozow> Can you summarize the changes to the eviction algorithm at a high level? 17:06:39 <monlovesmango> if a malicious peer floods you with orphans, its pretty likely that you will evict a child that could have otherwise been resolved as 1p1c? 17:06:41 <pseudoramdom> +1. Low fee rate parent + CPFP scenario, attacker floods with orphan tx? 17:08:00 <marcofleon> well eviction is no longer random, it's based on the "worst behaving" peer, and its the oldest announcement 17:08:25 <marcofleon> highest Dos score peer will have their annoucement removed 17:08:49 <glozow> marcofleon: yep! and we'll get into how we calculate DoS score in a later question 17:08:55 <glozow> Why is it desirable to allow peers to exceed their individual limits while the global limits are not reached? 17:10:14 <marcofleon> Because there could be a peer that is sending a lot of orphans, not necessarily dishonestly 17:10:21 <instagibbs> in the non-adversarial case, it could allow a lot more "honest" CPFPs through 17:10:38 <pseudoramdom> Not all peers may be active actively broadcasting at the same time? 17:10:39 <marcofleon> just makes sense to not waste the space by having an inflexible limit per peer 17:10:41 <glozow> marcofleon: instagibbs: yes exactly. often, peers are using a lot of resources simply because they are the most helpful peer 17:11:15 <pseudoramdom> Is it possible for attacker to game the DDoS scoring? 17:11:31 <monlovesmango> rather than having a common pool of orphans, this pr will restructure orphanage to track orphanage counts and usage by peer. each peer will be subject to a orphan announcment count limit that is the global max announcement count divided by the number of peers, and allowed a set amount of weight for the orphans they announce 17:12:36 <glozow> This was why I originally thought of doing a "token bucket" approach where we'd allow peers resources based on an amount of tokens, and then either replenished tokens if the orphans were useful or destroyed them if it was just spam 17:13:05 <glozow> The new algorithm evicts announcements instead of transactions. What is the difference and why does it matter? 17:13:34 <monlovesmango> pseudoramdom: I was thinking about this too. I think if you flood a node with peers with counts that are just over the limit you could theoretically evict a high weight tx from a peer with high weight announcements but low announcement count 17:13:38 <glozow> marcofleon: yeah, you might as well use the space 17:14:31 <marcofleon> annoucements are wtxid, peer. so if a peer is misbehaving then the orphan will only be removed for that peer. So a peer can't affect the orphan announcments of other peers 17:14:32 <monlovesmango> but i'm not sure thats really too much of a concern..? 17:15:24 <glozow> pseudoramdom: do you mean for an attacker to try to get us to evict a specific orphan? or to appear less resource-intensive than another peer and get them chosen for eviction instead? 17:15:36 <glozow> marcofleon: yes bingo! 17:16:13 <pseudoramdom> I was thinking of latter. Staying just under the limit but still managing to evict certain orphans. 17:16:17 <glozow> I think this relates to pseudoramdom's question: if they did something tricky to try to get a particular orphan of theirs chosen for eviction, that's fine, because we'll keep the transaction as long as another peer has announced it 17:17:40 <marcofleon> as long as you have at least one honest peer 17:18:04 <marcofleon> the orphan should (hopefully) remain in the orphanage 17:18:15 <glozow> pseudoramdom: the other peer won't experience eviction unless they exceed the limits. This still presents a limitation - peers might get evicted if they're just sending stuff at a far faster rate than we manage to process them - but the point is you can't influence the eviction of another peer's announcements 17:18:36 <monlovesmango> any peer that stays under peer limits can't have their orphan evicted by another peer 17:18:47 <glozow> monlovesmango: correct 17:19:05 <glozow> Why is there an announcement “limit” but a memory “reservation”? 17:19:05 <sipa> hi! 17:19:31 <glozow> sipa: hello hello 17:20:52 <glozow> Actually, I feel like you can call them both reservations, haha 17:21:08 <monlovesmango> I think bc announcement count affects CPU usage? and its not much of a concern to allocate a certain amount of memory to each peer. guessing here, think I read something like that in the PR notes 17:21:32 <sipa> glozow: both have a global limit, and a per-peer reservation 17:21:38 <marcofleon> but one can be exceeded and the other is a decreasing share of the pie 17:21:52 <sipa> the difference is the announcement global limit is a constant, but the global memory limit is a function of the number of peers 17:22:17 <sipa> so the "constant" is a global announcement limit, and per-peer memory reservation 17:22:18 <marcofleon> or i think we make the assumption that there is more memory that can be used up to a certain point, but for announcements we're trying to figure out which peer is "overusing" their share 17:22:19 <glozow> sipa: I wouldn't call it an announcement "reservation" though, because you aren't guaranteed it. if more peers appear, your announcement limit decreases. 17:22:19 <monlovesmango> hahah so many ways to state the same things 17:22:30 <glozow> On the other hand, your memory reservation is guaranteed and constant no matter how the peer set changes 17:22:52 <sipa> glozow: that just means the reservation is dynamic :p 17:23:10 <sipa> but yeah, the term reservation is weird in that context 17:24:09 <marcofleon> memory reservation is guaranteed per peer yes? 17:24:20 <marcofleon> oh yeah you said it above 17:24:36 <sipa> "Yes sir, your reservation for 4 tonight at FancyDining is confirmed." - "What do you mean my reservation was dropped to 3, because another group made a reservation?!" 17:24:47 <glozow> marcofleon: yes. you're also guaranteed a certain number of announcements, but it's dynamic 17:25:03 <glozow> sipa: yeah that's what I mean is weird about "reservation" 17:25:13 <sipa> fair enough 17:25:25 <sipa> We shall commence the bikeshedding for a better term now. 17:25:30 <glozow> How does the per-peer memory usage reservation change as the number of peers increases? How does the per-peer announcement limit change as the number of peers increases? 17:25:53 <marcofleon> the per peer memory usage doesn't change iiuc 17:26:01 <monlovesmango> wait, why is one a function of the number of peers and the other not? 17:26:04 <glozow> marcofleon: yes 💯 17:26:11 <marcofleon> but the per peer annoucement limit decreases? 17:26:17 <glozow> yup 17:26:25 <glozow> monlovesmango: indeed, why? 17:26:57 <glozow> What is the purpose of the announcement limit? 17:27:06 <monlovesmango> is it bc anncouncement count affects CPU usage? and so we want to limit this globally? 17:27:31 <glozow> monlovesmango: how does our "budget" for CPU usage change with more peers sending us orphans? 17:28:16 <sipa> monlovesmango: more specifically, the *global* announcement limit directly affects the *latency* of trimming announcements - it's not because we have more peers that we can tolerate a longer latency in processing transactions 17:28:45 <sipa> monlovesmango: but for memory usage, it is normal and expected that your maximum memory usage goes up with more peers - if you're memory-constrained, you should limit your number of peers anyway 17:28:50 <glozow> (the answer is it doesn't. we can't tolerate more announcements when we have more peers) 17:29:15 <sipa> (sorry if i spoiled it?) 17:29:38 <marcofleon> in LimitOrphans we're removing announcements one by one right? and so that's why we're using that limit as a proxy for cpu usage 17:29:51 <monlovesmango> sipa: thank you that answered my question 17:30:04 <monlovesmango> marcofleon: that is also a good point 17:30:11 <sipa> monlovesmango: yup, the number of iterations that loop in LimitOrphans scales directly with the *global* announcement limit 17:30:30 <monlovesmango> cool cool we can move on thanks all :) 17:30:38 <glozow> Why is it ok to remove orphan expiration? 17:31:18 <marcofleon> because we take care of oldest orphans now whenever we start evicting 17:31:37 <glozow> marcofleon: yes exactly, that's the intuition for why the number of announcements is the number we are interested in. not the number of unique orphans (which is what we used to limit) 17:31:56 <sipa> glozow: FWIW, have you benchmarked how long LimitOrphans can take? 17:32:20 <monlovesmango> bc the orhpanage now has other concrete metrics by which we can reliable evict orphans which guarantee the oldest and evicted first and orphans that are no longer needed are removed 17:32:37 <glozow> marcofleon: yep! but wait, doesn't this mean we can be holding on to orphans for days, or weeks, etc? 17:32:57 <monlovesmango> yes..? 17:33:33 <glozow> sipa: not since the rewrite. I can find my old benchmarks and run them. IIRC the `AddTx`s is what takes a really long time 17:34:20 <marcofleon> hmm yeah i guess we can hold onto it for a while now. as long as there's no conflicting txs that arrive in a block or something 17:34:28 <sipa> glozow: sure, but the reason for the existence of the global announcement limit is the time that LimitOrphans can take, not AddTx... so perhaps it's worth benchmarking, and seeing if we can perhaps tolerate a higher global announcement limit (or, otherwise, be sad to find out it needs to be reduced) 17:34:47 <glozow> marcofleon: is it problematic? 17:35:55 <marcofleon> i don't think so, as long as limits aren't being exceeded, seems fine to me 17:36:02 <glozow> sipa: yeah definitely. I just wanted to add some context for anybody who looks at the old benchmarks. What do you think is an acceptable amount of time? 17:36:10 <marcofleon> unless i'm missing something... 17:36:17 <sipa> glozow: probably in the millisecond range? 17:36:18 <glozow> marcofleon: I agree with you 17:36:42 <glozow> sipa: 👍 17:37:39 <glozow> Should we also remove EraseForBlock and instead rely on eviction to remove orphans that confirm or conflict with a block? Why or why not? 17:38:58 <marcofleon> could maybe be worked out somehow, but feels like a separate thing 17:39:04 <marcofleon> so i would say no 17:39:34 <marcofleon> it's not the same reason that an orphan is being removed 17:39:53 <monlovesmango> I would also say no, bc otherwise the caller would have to have knowledge of what is in the orphanage and individually evict txs 17:40:18 <glozow> fwiw, I think the worst case for EraseForBlock is probably worse than LimitOrphans. But EraseForBlock happens on the scheduler thread so speed might not be as much of an issue 17:40:33 <sipa> glozow: it also costs an attacker mining a valid block 17:41:11 <glozow> sipa: is that true? You could look at what's in the projected next block and just create conflicting transactions with a lot of nonexistent utxos 17:42:13 <sipa> glozow: sure, but the victim will still never experience it more than once per block, which is expensive. good point though that it's not necessarily the attacker themselves that pay this cost 17:43:39 <glozow> right. it's not a very worthwhile attack imo 17:45:05 <glozow> And the benefit of freeing up this space is probably worth it 17:45:12 <glozow> What is the purpose of reimplementing TxOrphanageImpl using a boost::multi_index_container along with the eviction changes? 17:46:04 <marcofleon> it's easier on the eyes :) 17:46:08 <instagibbs> glozow we could just look for txid matches instead of scanning inputs :) 17:46:09 <monlovesmango> so that you can look up orphan announcements by either wtxid or peer? 17:47:13 <glozow> instagibbs: indeed. It would require adding a txid index, but maybe that's what we're evicting in practice anyway! Could measure what it looks like in the wild 17:47:31 <glozow> monlovesmango: we can already do that though! 17:47:31 <instagibbs> oh right, wtxid would be the thing on hand 17:48:05 <glozow> instagibbs: right but same thing, maybe we're always evicting exact block txns 17:48:13 <instagibbs> 👍 17:48:32 <sipa> i think it was me who suggested using a multi_index, and the reason was because i saw the older implementation was effectively implementing a multi-index inefficiently, by having separate data structures for per-peer and per-wtxid information about announcements 17:48:48 <monlovesmango> glozow: haha it was just my best guess, didn't actually get around to understanding boost::multi_index_container better 17:49:20 <glozow> yeah it is the more natural data structure. I was also pleasantly surprised to realize that we only needed 2 indexes 17:49:36 <sipa> yeah, i was assuming we'd need 3 17:49:41 <sipa> nice find 17:50:18 <glozow> I've also been told many times that the existing `TxOrphanage` is hard to review 17:50:35 <glozow> so good to hear from marcofleon that it's easier this way 17:50:41 <glozow> What is a peer’s “DoS Score” and how is it calculated? 17:51:51 <monlovesmango> its the max bettween announcement_count/peer_announcement_limit and announcement_usage/peer_announcement_usage_reservation 17:52:33 <sipa> monlovesmango: think of an SQL database with multiple indexes on various columns, but then in-memory entirely, and in a C++y way; it's more efficient (both in memory and CPU) than having multiple independent maps (one for each index), and much easier to keep consistent (because there is no way for the different indexes to contain different information) 17:52:38 <marcofleon> maximum of cpu score and memory score. cpu score is a peers number of announcments / their per peer limit. and memory score is sum of the weight of a peers announced tx weights / the reserved memory usage per peer 17:52:58 <glozow> monlovesmango: marcofleon: yep. how does this compare to having 2 separate scores, and trimming "while CPU score is exceeded or memory score is exceeded" ? 17:54:08 <marcofleon> hmmm well a peer can have a dos score of more than 1 17:54:17 <marcofleon> or maybe i'm confused with the q 17:54:44 <monlovesmango> sipa: that helps my conceptual understanding a lot thanks 17:55:06 <monlovesmango> glozow: this is much simplier for sure 17:55:23 <glozow> Er, my point was "it's the same thing" 17:55:25 <sipa> i think the question is: why do we have a *single* DoS score = max(mem_score, ann_score), as opposed to two different DoS scores that are never compared with one another, and a rule "trim the worst announcement offenders while there are any" + 'trim the worst memory offenders while there are any" 17:55:35 <monlovesmango> bc we only have to track one score rather than two 17:56:41 <marcofleon> wait this is actually a good question, i'm not immediately seeing what the benefit of one score is over two 17:56:48 <marcofleon> is it more gameable somehow? 17:56:51 <monlovesmango> i think this way also allows more the advantage of allowing peers to exceed limits/reservations 17:56:55 <sipa> i don't think the two approaches are equivalent, fwiw, but the difference is small 17:57:22 <monlovesmango> as long as global limits are not reached 17:58:06 <marcofleon> hmm so global limits reached, we get dos scores and target a peer based on that 17:59:01 <glozow> So we're comparing this approach to having 2 loops. "While global announcement limit is exceeded, pick the peer with the most announcements, evict from them. Then, while global memory limit is exceeded, pick the per with the most memory usage, evict from them." 17:59:56 <monlovesmango> well practically speaking, usually only one limit will be reached at a time so it would usually only be one loop no? 17:59:57 <glozow> This approach basically rolls them into 1 loop. "While global announcement or memory limit is exceeded, pick the peer with the highest score (max ratio of both) and evict from them." 18:00:02 <Emc99> What is dos? 18:00:12 <instagibbs> denial of service 18:00:29 <Emc99> Thanks 18:00:36 <glozow> oh oops we are out of time! 18:00:38 <marcofleon> is the two loop approach worse in some other way i'm not seeing other than it's two loops 18:01:43 <marcofleon> thanks for hosting and answering qs glozow! good stuff as usual 18:01:43 <glozow> fwiw, I think that having a ratio-based score is good if we want to consider giving different peers different reservation amounts 18:01:44 <monlovesmango> one small flaw with this approach is that if count limit is reached first, the highest DOS peer might actually be violating the memory reservation and removing it won't immediately resolve the global limit being exceeded 18:02:05 <monlovesmango> thanks for hosting glozow!! 18:02:37 <glozow> #endmeeting