We’re having trouble with the automated health check hanging on a single tentacle (out of 396 tentacles). The check was kicked off 2 days ago and seems to have got completely stuck on this one machine. It’s now holding up subsequent health checks and tentacle upgrades, and the Cancel button seems to have no effect.
Any ideas?
== Running: Check deployment target: B0775-WEB0158-S ==
11:41:21 Verbose | Performing health check on machine
In 3.8.8 we introduced this fix, which might help in your scenario: https://github.com/OctopusDeploy/Issues/issues/3142 . It’s worth upgrading and giving that a try. But please be aware that the issue you are seeing might still be related to one of the open issues listed in the above links.
Once again, apologies for these issues. We are definitely working hard to solve them.
I’m currently working on this kind of issue. In addition to Dalmiro’s suggestions, could you capture a process dump of the affected tentacle process? If so, you can upload it at https://file.ac/vNOYB89Fu_k/
If you restart the tentacle service, that should cause the health check task to stop.
Unfortunately we had to get the issue fixed quickly, and didn’t have time to capture the process dump.
What seems to have happened is that we had (somehow) two instances of the tentacle service running, and the “rogue” one seemed to be locking up port 10933, which prevented communication with the correct instance from the server.
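For anyone hitting something similar, here is a rough sketch of how you might check which processes are listening on the tentacle port. It assumes a Windows machine and the usual five-column row layout of `netstat -ano` (protocol, local address, foreign address, state, PID). The function names `listening_pids` and `check_local_machine` are made up for this example and are not part of Octopus or Tentacle.

```python
import subprocess

TENTACLE_PORT = 10933  # listening port mentioned in this thread


def listening_pids(netstat_output, port):
    """Return the set of PIDs that netstat reports as LISTENING on `port`.

    Assumes rows shaped like:
      TCP    0.0.0.0:10933    0.0.0.0:0    LISTENING    1234
    """
    pids = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP rows and TCP rows that are not in LISTENING state.
        if len(fields) != 5 or fields[0] != "TCP" or fields[3] != "LISTENING":
            continue
        # The local address may be IPv4 (0.0.0.0:10933) or IPv6 ([::]:10933),
        # so take whatever follows the last colon as the port.
        local_port = fields[1].rsplit(":", 1)[-1]
        if local_port == str(port):
            pids.add(int(fields[4]))
    return pids


def check_local_machine(port=TENTACLE_PORT):
    """Run netstat on this machine and return the PIDs listening on `port`."""
    result = subprocess.run(
        ["netstat", "-ano"], capture_output=True, text=True, check=True
    )
    return listening_pids(result.stdout, port)
```

Running `check_local_machine()` on the affected machine returns the set of listening PIDs. If it contains a PID that does not belong to the expected tentacle service, that would be consistent with a stray instance holding the port.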
We solved it by doing a tentacle reinstall.
Would be nice to know why the health check didn’t time out, but unfortunately we can’t now provide you with the evidence you need to look into it.
Which version are you running? Are they polling tentacles? We recently fixed a problem with the health check not cancelling, so that might be the underlying cause.
If it is a polling tentacle, it makes sense that the health check did not finish, as the right tentacle never picked up the health check message.