Increased b-msghand thread utilization due to runestone transactions on 2026-02-17

I noticed my peer-observer nodes spending a lot of time in the b-msghand thread on 2026-02-17, starting around 1:30 am UTC. Note that the screenshots are in UTC+1.

(red means we spend 1000ms out of 1000ms (i.e. 100%) of our time in the thread)

Around the same time, the mempools started to fill up from around 6% to 60%

and the sum(inv-to-send) sizes across all peers increased significantly, to about 1.7M on some nodes,

with the maximum inv-to-send sizes being about 60k.

edit: It was also visible in the Bitcoin protocol Ping-Pong time monitoring I do on localhost:


Randomly sampling from some of the transactions my nodes saw:

All include a UNCOMMON•GOODS rune, some also inscriptions, and NFT related content.


Interestingly, the last time I noticed this was also on a 17th: Increased b-msghand thread utilization due to many 352 byte runestone inscriptions on 2025-11-17

Another broadcast happened at 17:20 UTC.

Here’s a snapshot of the continuous-profiling flamegraphs I’ve been experimenting with, from the 1:30 am broadcast: Pprof.me

It doesn’t show anything too suspicious I think. The getrawaddrman RPC taking long is known to me, but unrelated to this.

Looking at a different node at the broadcast around 17:20 UTC shows that we spend about 20% of our time in CompareMiningScoreWithTopology() (i.e. the new name of CompareDepthAndScore).

Would one possibility be to have the inv-to-send sets always sorted from the beginning to avoid doing it every time we decide to INV something?

Sorting before sending ensures that you’re not leaking info about the order in which you received the txs by the order in which you announce them.
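For illustration, sorting the batch just before announcing could look like the following minimal sketch. The `TxAnnouncement` type, the `mining_score` field, and the function name are all hypothetical stand-ins, not Bitcoin Core's actual types:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical announcement entry: a wtxid plus a mining score.
struct TxAnnouncement {
    uint64_t wtxid;       // stand-in for the real 256-bit wtxid
    double mining_score;  // e.g. an ancestor-feerate-style score
};

// Sort the batch by mining score (highest first) before INVing, so the
// announcement order no longer reflects the order we received the txs in.
std::vector<TxAnnouncement> SortForAnnouncement(std::vector<TxAnnouncement> batch)
{
    std::sort(batch.begin(), batch.end(),
              [](const TxAnnouncement& a, const TxAnnouncement& b) {
                  return a.mining_score > b.mining_score;
              });
    return batch;
}
```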

I think having a global queue for rate-limiting outbound announcements is the solution here, fwiw. Here’s a gist:

I’ve been running that code with a very low rate limit for inbounds (4tx/s instead of the 14tx/s target from core) to ensure it behaves well when backlogs occur, and it seems to.
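As an aside, a global announcement rate limit of the kind described above could be sketched as a token bucket. This is purely illustrative (it is neither the linked gist nor Core's implementation); only the 14 tx/s figure comes from this thread:

```cpp
#include <algorithm>
#include <cstddef>

// Toy token bucket for a global tx-announcement rate limit. All names are
// made up for illustration.
class TxAnnounceRateLimiter {
public:
    TxAnnounceRateLimiter(double tx_per_sec, double burst)
        : m_rate(tx_per_sec), m_burst(burst), m_tokens(burst) {}

    // Advance the clock by dt seconds, refilling tokens up to the burst cap.
    void Tick(double dt) { m_tokens = std::min(m_burst, m_tokens + m_rate * dt); }

    // How many of `want` queued announcements we may send right now.
    size_t Take(size_t want)
    {
        size_t n = std::min(want, static_cast<size_t>(m_tokens));
        m_tokens -= static_cast<double>(n);
        return n;
    }

private:
    double m_rate;
    double m_burst;
    double m_tokens;
};
```

With a backlog, `Take()` keeps returning the refill amount per tick, so announcements drain at the configured rate instead of all at once.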


To add: here are the transaction inv-sizes to a local spy-node-like peer (does not INV any transactions to the node).

And here is the rate at which my local spy-node-like peer received WTx invs.

Normally, one would expect the rate at which we send INVs to remain fairly constant, since we’re on a fixed timer. However, it seems like for some nodes it actually slowed INV-sending down by about 50%, while for others it (at least briefly) sped up. My theory for the slowdown is: the node is overloaded to a point where it can’t send out INVs at a normal rate.

Sure. I think I wasn’t clear in my description: Don’t keep a set (as we do currently), but keep a (sorted) queue. When inserting a to-be-announced transaction, put it in the right place in the queue. This avoids having to re-sort the whole set every 5s (or 2s for outbounds) for each peer. The queue is already sorted, so we can just pick the top 70 + x transactions to INV. Then, do some cleanup, removing evicted/confirmed transactions by iterating the queue once.
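The insert-in-order idea could be sketched roughly like this (the `QueueEntry` type, the use of a vector, and the 70-entry cap as a parameter are illustrative choices, not a proposed implementation):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Per-peer to-be-announced queue entry; fields are stand-ins.
struct QueueEntry {
    int wtxid;
    double score;
};

// Insert keeping descending-score order: O(log n) search plus O(n) shift,
// instead of re-sorting the whole set on every INV timer tick.
void InsertSorted(std::vector<QueueEntry>& q, QueueEntry e)
{
    auto it = std::lower_bound(
        q.begin(), q.end(), e,
        [](const QueueEntry& a, const QueueEntry& b) { return a.score > b.score; });
    q.insert(it, e);
}

// Take up to `max_inv` entries from the front; since the queue is sorted,
// these are already the best-scored transactions.
std::vector<QueueEntry> TakeTop(std::vector<QueueEntry>& q, size_t max_inv)
{
    size_t n = std::min(max_inv, q.size());
    std::vector<QueueEntry> out(q.begin(), q.begin() + n);
    q.erase(q.begin(), q.begin() + n);
    return out;
}
```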

Maybe that’s similar to your global queue. I’ll have a look.

As far as data structures go, a sorted queue is best stored as a set anyway; the problem is that with cluster mempool the sort order doesn’t stay consistent as txs are added – a CPFP tx can bump the ordering of a pre-existing parent, for example. That breaks set invariants, so it risks UB if using std::set, and I couldn’t see a simpler way of dealing with it properly without just doing a re-sort.
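To make the UB concern concrete: mutating a key in place while it sits inside a `std::set`/`std::multiset` silently breaks the container's ordering invariants. The safe pattern is erase, update, re-insert, which is what makes score bumps costly. A small sketch with illustrative types (not proposed code):

```cpp
#include <set>

// Stand-in queue entry, ordered by descending score.
struct Entry {
    int wtxid;
    double score;
    bool operator<(const Entry& other) const { return score > other.score; }
};

// Safely apply a score bump (e.g. from a CPFP child): we must take the
// entry out of the multiset, change its key, and put it back. Changing
// `score` through an iterator instead would be undefined behaviour.
void BumpScore(std::multiset<Entry>& s, int wtxid, double new_score)
{
    for (auto it = s.begin(); it != s.end(); ++it) {
        if (it->wtxid == wtxid) {
            Entry e = *it;
            s.erase(it);
            e.score = new_score;
            s.insert(e);
            return;
        }
    }
}
```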


Cleaned up the code, added tests etc, and PRed as:

There seems to have been a small mass-broadcast event a few hours ago. Here are some observations:

On bob, running #34628 with default txsendrate, the inv-to-send queues stayed a lot smaller. This is expected as we drain the per-peer queue immediately with 34628.

On bob, we spend a lot less time in the b-msghand thread:

And the response time for a localhost ping didn’t increase for bob, while it did for the other nodes.

The INVs sent by bob were larger than the ones by the other nodes. With 34628, we fully drain the per-peer queue, so INVs can be larger than before, where we capped them at 70 (unless the queue got large).


Double-size INVs might not be very meaningful – when there’s a backlog with #34628 the timing can get out of sync, acting like “send-inv, send-inv, populate-inv-queue, populate-inv-queue, send-inv”, so that the second send-inv is empty and the third send-inv has two tranches of txs. Because populate-inv-queue targets a particular tx rate, send-inv should still average out over time to the same 70-tx-per-send-inv rate, even though individual send-invs can be two (or more) times that.

Are you able to do a rolling-average over 30s or 60s for that graph by any chance? I think that should bring the outgoing inv rates in line with each other.

I have “number of WTx entries sent per second, averaged over 60s”. Note that these are to an -addnode peer (outbound), so we send INVs every two seconds.
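A rolling average like the one in that graph can be computed with a simple sliding window over per-second counts. The class below is an illustrative sketch (only the 60s window size comes from this thread):

```cpp
#include <cstddef>
#include <deque>
#include <numeric>

// Sliding-window average of per-second sample counts, e.g. WTx INV
// entries sent per second averaged over the last `window_secs` seconds.
class RollingRate {
public:
    explicit RollingRate(size_t window_secs) : m_window(window_secs) {}

    // Record one second's worth of samples, dropping the oldest second
    // once the window is full.
    void AddSecond(double tx_count)
    {
        m_samples.push_back(tx_count);
        if (m_samples.size() > m_window) m_samples.pop_front();
    }

    double Average() const
    {
        if (m_samples.empty()) return 0.0;
        return std::accumulate(m_samples.begin(), m_samples.end(), 0.0) /
               static_cast<double>(m_samples.size());
    }

private:
    size_t m_window;
    std::deque<double> m_samples;
};
```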

Node jade also runs the patch and uses the new -txsendrate=4 arg. During the mass broadcast, we see it announcing 10 transactions per second to the outbound addnode peer. -txsendrate configures the relay rate to inbound connections. As we relay to outbounds at 2.5x that rate, we’d expect 2.5 * 4 = 10, which exactly matches our observation.

It’s visible that node jade takes a lot longer than the other nodes to process the announcements.

Is jade also avoiding high b-msghand thread time? If so, it’s behaving as expected: 4tx/s is too slow for mainnet for inbound connections (5000tx/block is about 8 tx/s), so it’s mostly a good way of observing the backlog behaviour, rather than something sensible to run.

The ‘rate of WTx over 60s’ graph looks good to me. Interesting that the floor rose (~2tx/s between 3:10 and 3:20 vs 5tx/s between 3:40 and 3:50).

Yes! Though jade also has a stronger and different CPU than the others, so the numbers aren’t directly comparable.

Looking at a recent event, now that the inbound slots on bob have filled up fully, it’s visible that the time we spend in b-msghand is mostly transaction validation, but not transaction propagation anymore. On the other nodes, we have a higher b-msghand utilization for longer, as we first do transaction validation and then the expensive transaction propagation.
