In the past week we had 2 occurrences of this problem, with different tentacles. Octopus was unable to connect to a tentacle and perform a health check.
There was nothing wrong with the tentacle and it could be reached from a browser on the Octopus server. Rebooting the server which hosts Octopus solves the problem and seems to be the only solution. Restarting and upgrading the Octopus service didn’t solve the problem. We are currently running on 3.10.1 and the problem occurred once using this version.
Hi Arnoud,
Thank you for getting in touch. It sounds similar to some of the issues I’m trying to get to the bottom of.
Does restarting just the Tentacle resolve the issue?
Could you please send me the Health Check task log (Step 6&7) that failed along with the Tentacle and Server log files?
Also if you could create a process dump of the Tentacle and Server processes when the issue occurs that would be great. No need to send them in yet, it just may save time later.
Robert W
Hi Robert,
Restarting the Tentacle doesn’t resolve the issue. The first time we thought it had to do with the client. We restarted the Tentacle. Reinstalled the Tentacle. Upgraded the Tentacle. Rebooted the server hosting the Tentacle. Nothing worked.
Checking the Tentacle ip and port in a browser on the Octopus server gave a message everything was fine. Restarting and upgrading Octopus Deploy didn’t help. Only rebooting the server hosting Octopus Deploy helped.
The second time on another server I tried restarting the Tentacle, but that didn’t work so I directly went to rebooting the server hosting Octopus.
The Octopus log is from the first issue. I edited this file to show only the log entries for two servers in a cluster. One had the problem, the other was fine, with the same configuration. The Tentacle log is from the second time it occurred (around 14:16).
OctopusServer-20-02.txt (7 KB)
OctopusTentacle.0.txt (1 MB)
Hi Arnoud,
Thank you for the logs. The health check task log would be useful too.
Could you describe what you mean by cluster? Have you got Octopus Server set up in an HA configuration? Are they sharing a database, or completely separate?
How many polling and how many listening tentacles do you have?
Robert W
Hi Robert,
I attached the health check task log.
The machine is behind a load balancer with one other webserver, sharing a database. We have 13 listening tentacles and no polling tentacles.
Kind regards,
Arnoud
ServerTasks-26311.log.txt (8 KB)
Hi Arnoud,
Ah, I think I understand the setup: a single Octopus server with two Tentacles. The Tentacle machines are set up in a load-balanced cluster (which should make no difference to Octopus).
From the tentacle log I can see that connections are being made multiple times per second, which is quite strange. I’ll look into this.
Now that the tentacles are updated, the next time this happens, would you be able to send in:
- The octopus and tentacle log files for the period
- Any tasks that are running and whether they are stuck
- A process dump of the Octopus.Server.exe. An analysis of the file may reveal what is going on, so that might be enough
I’ve created a secure upload location: https://file.ac/5wNQ25brGjI/
Robert W
Hi Robert,
This morning the problem occurred again. I collected the logs and a process dump analysis; they are uploaded to the secure upload.
The problem occurred when trying to make a deployment to a development server for 2 (of 2) tenants.
Again restarting the tentacle didn’t solve the problem. Also restarting the Octopus service didn’t help. Rebooting the Octopus server machine solved the problem again.
Kind regards,
Arnoud Dekker
Hi Arnoud,
Thank you for that info. I can see from the logs that the constant stream of connection requests is still occurring. I’m going to try and reproduce it, but in the meantime, could you scan back in the logs to see when it started and send in that section of the log? I’m looking for the Accepted TCP client (Tentacle) and Opening a new connection (Server) log messages that occur every second.
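If it helps with scanning back, here is a minimal Python sketch that counts the two log messages named above per minute and reports the first minute in which they become frequent. It assumes each log line starts with a timestamp of the form `YYYY-MM-DD HH:MM:SS`; the function names and the threshold of 30 per minute are made up for illustration, so adjust them to what the log files actually contain.

```python
import re
from collections import Counter

# Assumption: each log line begins with a timestamp like "2017-02-20 14:16:03".
# The captured group is the timestamp truncated to the minute.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}")

# The two messages mentioned in this thread (Tentacle side and Server side).
MARKERS = ("Accepted TCP client", "Opening a new connection")


def connections_per_minute(lines):
    """Count marker lines per minute, keyed by 'YYYY-MM-DD HH:MM'."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        if match and any(marker in line for marker in MARKERS):
            counts[match.group(1)] += 1
    return counts


def first_busy_minute(counts, threshold=30):
    """Return the earliest minute with at least `threshold` hits, or None."""
    busy = sorted(minute for minute, hits in counts.items() if hits >= threshold)
    return busy[0] if busy else None


def scan_file(path, threshold=30):
    """Convenience wrapper: scan one log file and return the first busy minute."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return first_busy_minute(connections_per_minute(handle), threshold)
```

Calling `scan_file` on the oldest available log first shows whether the stream was already present at the start of the retained logs.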
Unfortunately the analysis file didn’t show anything. Do you mind uploading the dump file itself?
Robert W
Hi Robert,
I’m afraid the log doesn’t go back further than a week. It seems it was already happening at that time. I uploaded the oldest log files and also the dump file.
Kind regards,
Arnoud
Hi Arnoud,
Do you by any chance have a setting named Halibut.TcpClientPooledConnectionTimeout in your Octopus.Server.exe.config file (C:\program files\Octopus Deploy\Octopus)? If so, what is its value? Could you comment it out and see if the problem persists?
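A quick way to check for the setting without opening the file by hand is sketched below in Python. It assumes the setting lives under the standard .NET `<appSettings>` element as an `<add key="..." value="..."/>` entry, which is an assumption about where this key sits rather than something confirmed in this thread; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SETTING = "Halibut.TcpClientPooledConnectionTimeout"


def find_app_setting(config_xml, key=SETTING):
    """Return the value of the matching <add> entry under <appSettings>, or None.

    Assumption: the config follows the usual .NET layout
    <configuration><appSettings><add key="..." value="..."/></appSettings></configuration>.
    Entries that are commented out are XML comments and are therefore not found.
    """
    root = ET.fromstring(config_xml)
    for entry in root.iterfind("./appSettings/add"):
        if entry.get("key") == key:
            return entry.get("value")
    return None
```

Pass in the contents of the config file (read as bytes or text). A result of `None` means the key is absent or commented out.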
Robert W
Hi Robert,
I forgot about this setting. Sorry for not mentioning it. I think I added it last week, as I read on the Octopus forum that it might offer a solution to the connection problem.
The setting was:
I removed it and restarted Octopus.
It looks like the number of connections has returned to normal now.
Please also let me know when you would like me to upgrade Octopus. For now I’ll stay on 3.10.1, unless you want me to upgrade.
Kind regards,
Arnoud Dekker
So you were having the non-responsive Tentacle problem before adding this setting, or was it a different problem?
It occurred before and after changing this setting.
After it occurred the first time I searched around and came across the “Halibut.TcpClientPooledConnectionTimeout” setting.
But yesterday we had the problem again while this setting was active.
So it seems to have no influence on the problem, other than creating more connections than necessary.
Hi Arnoud,
Ok, thank you. Initially I thought the 0 setting was causing a connection to be created and immediately discarded. However, I checked the code and I don’t think that is the case. This gives us a clue: the server is sending a constant stream of something to the Tentacle.
I couldn’t find anything significant in the dump file, which is still something as it means there isn’t a stray task running blocking the connection.
Could you set up Tentacle Ping between the Octopus server and the Tentacle server and see if it shows any errors when the problem does occur?
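If running the ping tool is not practical at that moment, a rough stand-in is a plain TCP connection test from the Octopus server to the Tentacle port. The Python sketch below is not Tentacle Ping and does not perform the TLS handshake; it only checks that a TCP connection can be opened. The function names are hypothetical, and the host and port are whatever the Tentacle listens on (10933 in the trace example in this thread).

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Try one TCP connection; return (True, seconds) or (False, error text)."""
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - started
    except OSError as error:  # includes timeouts and refused connections
        return False, str(error)


def probe_repeatedly(host, port, attempts=5, delay=1.0, timeout=5.0):
    """Run several probes in a row so intermittent failures become visible."""
    results = []
    for attempt in range(attempts):
        results.append(probe(host, port, timeout))
        if attempt + 1 < attempts:
            time.sleep(delay)
    return results
```

Running it both while the problem is occurring and after the reboot gives a simple before-and-after comparison at the TCP level.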
Also, when this occurs again, could you please turn on trace logging for a few minutes (no need to restart the service). This will reveal what kind of messages are being sent (the log message looks like TRACE https://localhost:10933/ 48 Sent: IScriptService::StartScript[1]).
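Once trace logging has run for a few minutes, the sent messages can be tallied by type to see what dominates the stream. This Python sketch assumes the trace lines contain `Sent: <Service>::<Method>[<number>]` as in the sample message quoted in this post; the exact log format beyond that sample is not known here, and the function name is hypothetical.

```python
import re
from collections import Counter

# Assumption: trace lines contain "Sent: <Service>::<Method>[<number>]",
# matching the single sample line quoted in this thread.
SENT = re.compile(r"Sent: (\w+)::(\w+)\[\d+\]")


def tally_sent_messages(lines):
    """Count sent messages per 'Service::Method' across the given log lines."""
    counts = Counter()
    for line in lines:
        match = SENT.search(line)
        if match:
            counts[f"{match.group(1)}::{match.group(2)}"] += 1
    return counts
```

Calling `tally_sent_messages(open(path)).most_common(5)` on the trace log lists the most frequent message types first.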
Also, is it always one of the two tentacles that are in the load balanced set?
Is there anything between the Octopus server and the Tentacle server, e.g. a proxy or firewall appliance?
Robert W