cjack, 03-08-2026, 14:04
Hi WhoCares!
Thank you so much for the detailed bug report! Both issues were spot-on.

v1.5.1 is now available from the dashboard download link with the following fixes:

1) Printf mixing (agent): The hb_failed status line was written with \r but no terminating \n, so it shared a line with the agent's normal output and the two overwrote each other. Fixed: the line is now delimited with \n, so the "SERVER OFFLINE" message prints cleanly on its own line.
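A quick way to see why a missing \n garbles the output is to simulate how a terminal treats the two control characters. This is a minimal Python sketch under simplifying assumptions: the terminal model is a toy, the buggy status line is assumed to have ended in \r (as in-place status updates usually do), and the normal log line `step 1234 ok` is hypothetical. The agent's actual output code is not shown in the post.

```python
def render_terminal(stream: str) -> list[str]:
    """Toy terminal: '\\r' returns the cursor to column 0 so later characters
    overwrite the current line; '\\n' starts a new line at column 0."""
    lines: list[list[str]] = [[]]
    col = 0
    for ch in stream:
        if ch == "\r":
            col = 0
        elif ch == "\n":
            lines.append([])
            col = 0
        else:
            line = lines[-1]
            if col < len(line):
                line[col] = ch      # overwrite an existing character
            else:
                line.append(ch)     # extend the line
            col += 1
    return ["".join(line) for line in lines]


# Hypothetical output sequences: the status line first, then one normal log line.
buggy = "hb_failed: SERVER OFFLINE\r" + "step 1234 ok\n"
fixed = "hb_failed: SERVER OFFLINE\n" + "step 1234 ok\n"

print(render_terminal(buggy))  # ['step 1234 okERVER OFFLINE', '']
print(render_terminal(fixed))  # ['hb_failed: SERVER OFFLINE', 'step 1234 ok', '']
```

With \r alone the cursor goes back to column 0 and the next printf overwrites the status text in place; with \n each message keeps its own line.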

2) Heartbeat timeouts every ~300s (server): This was the more critical one. save_state() held the global lock for the entire disk write, serializing millions of DPs with struct.pack in a loop while every API endpoint waited. Save time grows with the DP table, and at ~15M+ DPs the lock was held long enough to trigger agent heartbeat timeouts.
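To make the failure mode concrete, here is a minimal Python sketch of the pattern as described. Apart from the name save_state() and the use of struct.pack, everything is an assumption: the 40-byte record layout (RECORD), the dp_table dict, state_lock, the `write` sink parameter, and the handle_heartbeat() endpoint are hypothetical stand-ins, not the server's actual code.

```python
import struct
import threading
from typing import Callable

# Hypothetical record layout: 32-byte DP key + little-endian uint64 distance.
RECORD = struct.Struct("<32sQ")

state_lock = threading.Lock()    # the single global lock every endpoint uses
dp_table: dict[bytes, int] = {}  # hypothetical DP table: key -> distance


def save_state(write: Callable[[bytes], object]) -> None:
    """Pre-fix pattern (sketch): serialize and write while holding the lock."""
    with state_lock:
        for key, dist in dp_table.items():
            # One pack + write per DP: the time spent here grows linearly with
            # the table, and every endpoint below waits until the loop ends.
            write(RECORD.pack(key, dist))


def handle_heartbeat(worker_id: str, last_seen: dict[str, float], now: float) -> None:
    """Any endpoint that touches shared state must take the same lock."""
    with state_lock:
        last_seen[worker_id] = now
```

Because the whole pack-and-write loop runs inside `with state_lock:`, handle_heartbeat() cannot record a heartbeat until the entire table has been written (e.g. `save_state(f.write)` on an open file), and that wait keeps growing as DPs accumulate.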

Fix: the new save_state_background() takes a fast snapshot of all data structures under the lock (milliseconds), then releases the lock and writes to disk outside it. Agents no longer see any interruption during auto-save.
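The snapshot-then-write idea can be sketched the same way. This is a hypothetical Python sketch, not the actual save_state_background(): it reuses an assumed record layout, lock, and DP table, copies the table while holding the lock, and then packs and writes the copy in a separate thread so no endpoint has to wait for it.

```python
import struct
import threading
from typing import Callable

RECORD = struct.Struct("<32sQ")  # hypothetical layout: 32-byte DP key + uint64 distance

state_lock = threading.Lock()    # global lock shared with the API endpoints
dp_table: dict[bytes, int] = {}  # hypothetical DP table: key -> distance


def _write_snapshot(snapshot: list[tuple[bytes, int]],
                    write: Callable[[bytes], object]) -> None:
    # Runs without state_lock, so heartbeats and other endpoints proceed meanwhile.
    for key, dist in snapshot:
        write(RECORD.pack(key, dist))


def save_state_background(write: Callable[[bytes], object]) -> threading.Thread:
    """Sketch: copy shared state under the lock, then serialize it outside the lock.

    The caller must keep the sink behind `write` open until the returned
    thread has been joined.
    """
    with state_lock:
        # Only the copy happens under the lock. A shallow copy is enough here
        # because keys and values are immutable (bytes/int); mutable values
        # would need their own copies.
        snapshot = list(dp_table.items())
    t = threading.Thread(target=_write_snapshot, args=(snapshot, write), daemon=True)
    t.start()
    return t
```

Used as `with open("state.bin", "wb") as f: save_state_background(f.write).join()`, the lock is held only for the copy, and any DPs added after the snapshot simply go into the next save. One thing this sketch does not handle is two overlapping saves; a real server would likely want to skip or queue a save that starts while the previous one is still writing.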

The fleet is currently at 28 workers (~21 G/s), 9.3% progress, 99.9% efficiency. No more periodic disconnections.

Thanks again for catching these. The heartbeat timeout in particular would have kept getting worse as the DP table grows toward a collision.