Using independent counters should also speed up p2jb and avoid the slowdown caused by mutex locking. My suggested changes to the p2jb Lua script:
prepare_fds() function
local function prepare_fds()
-- 1. Expand the cores table to use more system threads (avoiding core 11)
local OVERFLOW_CORES = { 0...