Using independent counters should also speed up p2jb (and avoid the slowdown from mutex locking). My suggested changes to the p2jb Lua script:
prepare_fds() function
local function prepare_fds()
-- 1. Expand the cores table to use more system threads...
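To illustrate what "independent counters" means here, a rough sketch in plain Lua: each worker gets its own counter slot, and the slots are only summed once at the end, so nothing has to be locked on every increment. This is not p2jb's actual code. The names new_counters, bump and total are made up for the example, and the loop below only simulates workers, since p2jb's real thread primitives are not shown in this thread.

```lua
-- Hypothetical sketch: one counter slot per worker instead of a single
-- shared, mutex-protected counter. None of these names come from p2jb.
local function new_counters(num_workers)
    local counters = {}
    for i = 1, num_workers do
        counters[i] = 0  -- each worker only ever writes to its own slot
    end
    return counters
end

-- Called by worker `id`; it touches no other worker's slot.
local function bump(counters, id)
    counters[id] = counters[id] + 1
end

-- Called once by the coordinator after the workers have finished.
local function total(counters)
    local sum = 0
    for _, n in ipairs(counters) do
        sum = sum + n
    end
    return sum
end

-- Simulated run: 4 workers, 10 increments each (sequential here, for illustration only).
local counters = new_counters(4)
for id = 1, 4 do
    for _ = 1, 10 do
        bump(counters, id)
    end
end
print(total(counters))
```

The trade-off is that reading the total costs one pass over all slots, which is usually fine if the total is only needed occasionally.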
In p2jb, does increasing the number of threads reduce the wait time?
I doubled the number of threads in the attached file if you want to try it out. The original is from the repo of the Japanese games.