We have a (admittedly busy) Dev server with a Tentacle installed that deploys a number of sites (20-30) and services (8-10) and a Cloud hosted server instance.
For no particular reason the service loses connection with our Cloud instance and never reconnects. We simply get the 01:00:00 timout messages repeatedly on the health-checks and it constantly fails on the Deploy Site step when acquiring the package.
Our other Tentacles seem fine and fast.
A service restart does not fix the issue.
There is nothing in the logs indicating an issue with connectivity around the time.
A Reinstall via the Octopus app does not seem to fix it.
A Delete and then Install New Tentacle did appear to finally fix something.
We checked the firewall rules on the server and network and everything appeared correct and nothing had changed in either respect.
We donât know what to do about this (this is not the first time it has happened on this server) however we cannot see any reasons or factors in causing the problem. We deployed to that server successfully earlier today and yet 6 hours later the Tentacle is not responding.
Without any further leads or information we donât know what to do, and at the same time we understand that you might not be able to do anything either! I just wanted to submit this report that there might be an issue somewhere that might need looking at if someone can repro it? Vague I know.
If anyone however has seen this behaviour before then any quicker workarounds would be much appreciated.
I would recommend giving that document a thorough read-through. If none of the suggestions in the above link help you, could you please provide me with the answers to the questions below so I can get some more context and provide you with better answers.
What type of tentacle are you deploying?
Is it on-prem or a cloud server that the tentacle is on? (if cloud, which cloud?)
Is the server the tentacle on behind a Proxy?
What OS is on the server the tentacle is on?
From your experience, does this seem to happen at a specific time everyday or perhaps a set amount of time after the install or completely random? Does it coincide with a scheduled task on that server?
Can you provide a quick run-down of what you tried from the aforementioned troubleshooting links and what were the results
Iâm looking forward to hearing back from you so we can solve this issue.
Thanks for getting back to me, and I totally appreciate this is a vague report. We are not sure how to reproduce currently but I can let you know the answers to your queries.
We only use Listening Tentacles
The Tentacles are on on-prem (HyperV) VMs in our hosting centre
The server is behind a proxy for serving public requests but we have a specific deployment DNS entry and firewall rule to allow 10933 traffic from our Octopus Cloud instance directly.
All our Tentacles are currently on Windows Server (2016/2019).
I will see if I can narrow down some circumstances around the failure. So far it has seemed random. No set time after install and we have other Tentacles running that have not so far failed. I will see about coinciding scheduled tasks
We havenât seen the issue occur again this week so I will go through the troubleshooting guide(s) next time it happens and feedback the result.
Thanks again for looking at this for us. I will let you know if we discover anything else about this issue.
Our DB jump-box is currently the biggest culprit.
This time there wasnât the issue with it being unable to restart.
We deployed a version of our software at 10:05 this morning and that was all green.
Went to deploy the next version at 16:36 (BST) and the Acquire Packages stepped failed with a timeout connecting to the tentacle
An error occurred when sending a request to âhttps://tentacle domain:10933/â, before the request could begin: The client was unable to establish the initial connection within the timeout 00:01:00.
15:42:02 Verbose | at Halibut.Transport.SecureListeningClient.HandleError(Exception lastError, Boolean retryAllowed)
| at Halibut.Transport.SecureListeningClient.ExecuteTransaction(Action1 protocolHandler, CancellationToken cancellationToken) | at Halibut.HalibutRuntime.SendOutgoingHttpsRequest(RequestMessage request, CancellationToken cancellationToken) | at Halibut.HalibutRuntime.SendOutgoingRequest(RequestMessage request, CancellationToken cancellationToken) | at Halibut.ServiceModel.HalibutProxy.Invoke(MethodInfo targetMethod, Object[] args) | --- End of stack trace from previous location where exception was thrown --- | at System.Reflection.DispatchProxyGenerator.Invoke(Object[] args) | at generatedProxy_1.StartScript(StartScriptCommand ) | at Octopus.Server.Orchestration.Targets.Tentacles.TentacleRemoteEndpointFacade.ExecuteCommand(StartScriptCommand command) in C:\buildAgent\work\47b5b27d3625d6ca\source\Octopus.Server\Orchestration\Targets\Tentacles\TentacleRemoteEndpointFacade.cs:line 61 | at Octopus.Server.Orchestration.ServerTasks.Deploy.ActionExecution.Immediate.ExecutionTargets.TentacleExecutionTarget.Execute(ScriptCollection bootstrapperScripts, IReadOnlyList1 boostrapperArguments, IReadOnlyList1 files, Nullable1 forceIsolationLevel, Boolean raw, Nullable1 isolationMutexTimeout) in C:\buildAgent\work\47b5b27d3625d6ca\source\Octopus.Server\Orchestration\ServerTasks\Deploy\ActionExecution\Immediate\ExecutionTargets\TentacleExecutionTarget.cs:line 47 | at Octopus.Server.Orchestration.ServerTasks.Deploy.ActionExecution.Immediate.ImmediateExecutor.RunOnTarget(ScriptCollection finalBootstrapperScripts, IReadOnlyList1 bootstrapperArguments, IReadOnlyList1 allFiles, Nullable1 isolation, Nullable1 isolationMutexTimeout, ICalamariFlavour flavour, IReadOnlyList1 deploymentTools) in C:\buildAgent\work\47b5b27d3625d6ca\source\Octopus.Server\Orchestration\ServerTasks\Deploy\ActionExecution\Immediate\ImmediateExecutor.cs:line 183
| at Octopus.Server.Orchestration.ServerTasks.Deploy.ActionExecution.Immediate.ImmediateExecutor.ExecuteCalamari(ICalamariFlavour calamariFlavour, String calamariCommand, IReadOnlyList1 calamariArguments, IReadOnlyList1 files, IReadOnlyList1 deploymentTools, VariableCollection extraVariables, TargetManifest targetManifest, Nullable1 isolationMutexTimeout) in C:\buildAgent\work\47b5b27d3625d6ca\source\Octopus.Server\Orchestration\ServerTasks\Deploy\ActionExecution\Immediate\ImmediateExecutor.cs:line 138
| at Octopus.Server.Orchestration.ServerTasks.Deploy.ActionExecution.CommandBuilders.CalamariCommandBuilder.Execute() in C:\buildAgent\work\47b5b27d3625d6ca\source\Octopus.Server\Orchestration\ServerTasks\Deploy\ActionExecution\CommandBuilders\CalamariCommandBuilder.cs:line 143
| at Octopus.Server.Orchestration.ServerTasks.Deploy.Steps.Acquire.PackageAcquisitionExecutionHandlers.<>c__DisplayClass3_0.b__0(IActionHanderContext ctx) in C:\buildAgent\work\47b5b27d3625d6ca\source\Octopus.Server\Orchestration\ServerTasks\Deploy\Steps\Acquire\PackageAcquisitionExecutionHandlers.cs:line 32
| at Octopus.Server.Orchestration.ServerTasks.Deploy.ActionDispatch.AdHocActionDispatcher.InvokeActionHandler(Machine target, ActionHandlerInvocation actionHandler, ActionAndTargetScopedVariables actionAndTargetScopedVariables, IExecutor executor, TargetManifest targetManifest) in C:\buildAgent\work\47b5b27d3625d6ca\source\Octopus.Server\Orchestration\ServerTasks\Deploy\ActionDispatch\AdHocActionDispatcher.cs:line 167
| at Octopus.Server.Orchestration.ServerTasks.Deploy.ActionDispatch.AdHocActionDispatcher.Dispatch(Machine machine, ActionHandlerInvocation actionHandler, VariableCollection variables) in C:\buildAgent\work\47b5b27d3625d6ca\source\Octopus.Server\Orchestration\ServerTasks\Deploy\ActionDispatch\AdHocActionDispatcher.cs:line 62
| at Octopus.Server.Orchestration.ServerTasks.Deploy.Steps.Acquire.PackageAcquisitionExecutionHandlers.FindPackage(String packageId, IVersion version, String packageHash, PackageSearchOptions packageSearchOptions) in C:\buildAgent\work\47b5b27d3625d6ca\source\Octopus.Server\Orchestration\ServerTasks\Deploy\Steps\Acquire\PackageAcquisitionExecutionHandlers.cs:line 40
| at Octopus.Server.Orchestration.Targets.Common.PackageStagingMediator.TryFindPackageInCache(StoredPackage packageToPush, PackageAcquisitionExecutionHandlers execs, StagedPackage& stagedPackage) in C:\buildAgent\work\47b5b27d3625d6ca\source\Octopus.Server\Orchestration\Targets\Common\PackageStagingMediator.cs:line 118
| at Octopus.Server.Orchestration.Targets.Common.PackageStagingMediator.PushPackage(PackageIdentity packageIdentity, StoredPackage packageToPush, IDeploymentPackageDownloader deploymentPackageDownloader, DeltaCompressionOptions deltaCompressionOptions, PackageCacheOptions packageCacheOptions, PackageAcquisitionExecutionHandlers execs) in C:\buildAgent\work\47b5b27d3625d6ca\source\Octopus.Server\Orchestration\Targets\Common\PackageStagingMediator.cs:line 99
| at Octopus.Server.Orchestration.Targets.Common.PackageStagingMediator.DownloadPackageAndPushToAgent(PackageIdentity packageIdentity, IDeploymentPackageDownloader deploymentPackageDownloader, DeltaCompressionOptions deltaCompressionOptions, PackageCacheOptions packageCacheOptions, PackageAcquisitionExecutionHandlers execs) in C:\buildAgent\work\47b5b27d3625d6ca\source\Octopus.Server\Orchestration\Targets\Common\PackageStagingMediator.cs:line 68
| at Octopus.Server.Orchestration.ServerTasks.Deploy.Steps.Acquire.AcquireMachinePackageTask.Acquire(AcquiredPackageMap acquiredPackageMap, ITaskContext taskContext, LogContext logContext) in C:\buildAgent\work\47b5b27d3625d6ca\source\Octopus.Server\Orchestration\ServerTasks\Deploy\Steps\Acquire\AcquireMachinePackageTask.cs:line 101
| at Octopus.Server.Orchestration.ServerTasks.Deploy.ExecutionTaskController1.ExecuteWithTransientErrorDetection(Action action, DeploymentTarget deploymentTarget) in C:\buildAgent\work\47b5b27d3625d6ca\source\Octopus.Server\Orchestration\ServerTasks\Deploy\ExecutionTaskController.cs:line 636 | at Octopus.Server.Orchestration.ServerTasks.OctoThreadClosure1.Execute() in C:\buildAgent\work\47b5b27d3625d6ca\source\Octopus.Server\Orchestration\ServerTasks\OctoThreadClosure.cs:line 71
We logged into the server and found that the service was not running.
There is nothing logged in the Tentacle logs regarding an error or shutdown. The server uptime is over 9 days so itâs not a restart. Nothing in the Windows Event log.
Just⌠stopped.
Itâs becoming frustrating that the tentacles just seem unreliable.
We have enough work to do without constantly babysitting Windows services. However we do also understand that this really is quite a vague issue to deal with. Thereâs no network interruptions from what we can see. Is there anything else we might be missing? Or that we can try?
I suspect the issue youâre seeing is most likely going to be an environmental issue, as we havenât been getting reports of this behaviour elsewhere. So at the moment, the best I can do is give you some ideas around where else to look. If a service has stopped, you would expect there to be some details around what terminated it, whether itâs in the Octopus Logs, server logs or elsewhere.
Common culprits that I have seen affect Tentacle services include Anti-virus and DSC configurations.
I can propose a workaround, however I fully understand this is not the advice you are after.
I too have been slow getting back to you so apologies for that.
The watchdog service might be useful⌠and the document you linked is actually very telling!
âThis typically occurs during a restart after Windows Updates have been installed.â
This would be our most likely scenario as we are mainly noticing this issue after updates, and we have recently improved our Windows Update schedule.
The âWhy Does It Happenâ section is also ringing bells for us, not only in the fact that we donât really know why it happens, but the suspicion that it might be to do with slow CLR initialisation might also lead us to the cause of other issues we also have.
We recently switched to Net Core for our main service worker, and I have been investigating why this service too does not start or stop reliably. It could all be related.
I feel that this might be the best solution to our problem for now so we are very grateful to you for pointing out this hidden gem of functionality.
Another day another timeout.
Itâs not an issue that can be solved with Watchdog it seems.
That is running normally. However our build server (TeamCity) has tried to contact the Tentacle and recieved a timeout error.
[14:36:13] Failed: Acquire packages
[14:36:13] Acquiring packages
[14:36:13] Making a list of packages to acquire
[14:36:13] The step failed: Activity failed with error 'An error occurred when sending a request to âhttps://âtentacle urlâ:10933/â, before the request could begin: The client was unable to establish the initial connection within the timeout 00:01:00.
[14:36:13] The client was unable to establish the initial connection within the timeout 00:01:00.â.
Iâve checked the machine and it is running, the service is running, and the logs show the client authenticating at that time. No other log lines until a watchdog check shows up 4 minutes later.
I say contact the Tentacle. I of course mean TC is talking to Octopus Server (Cloud) and that in turn canât talk to the Tentacle
Is there anything we might be missing in network gateway setup that might be causing issues with the Server talking to our Tentacle?
If the Server Azure worker restarts and / or moves IP is it going to mess up a routing table somewhere?
Iâm just guessing as network engineering is not my first discipline and my colleague set up our firewalls and routers. Is there anything I should be checking there?
I might get them to look at the logs at least and see if there is any unusual blocking occurring.
Looks like I might have to reopen this issue.
Our DB jump box seems ok for now, but our new linux tentacles are falling apart.
They seem to fail every day.
Server logs show it being unable to communicate with the tentacle. The tentacle just seems to stop listening. Itâs logs match the times shown. One hour itâll be there and the next hour it wonât
Iâve already enabled Trace logging on one of them to see if we can catch the error.
This is causing us issues as it is failing deploys for no real reason.
2020-09-17 14:10:45.5140 86913 5 INFO listen://[::]:10933/ 5 Accepted TCP client: [::ffff:13.66.133.169]:7360
2020-09-17 14:10:45.5140 86913 5 TRACE listen://[::]:10933/ 5 Performing TLS server handshake
2020-09-17 14:10:45.8390 86913 58 TRACE listen://[::]:10933/ 58 Secure connection established, client is not yet authenticated, client connected with Tls12
2020-09-17 14:10:45.9698 86913 58 TRACE listen://[::]:10933/ 58 Begin authorization
2020-09-17 14:10:45.9698 86913 58 INFO listen://[::]:10933/ 58 Client at [::ffff:13.66.133.169]:7360 authenticated as 4C808322E57AAD30A9B009817540F5F118A54FC0
2020-09-17 14:10:45.9698 86913 58 TRACE listen://[::]:10933/ 58 Begin message exchange
2020-09-17 14:10:46.3056 86913 58 TRACE listen://[::]:10933/ 58 Received: IScriptService::StartScript[1] / c6426903-685c-44f5-8d95-62de6f6fc742
2020-09-17 14:10:46.3056 86913 59 TRACE [ServerTasks-33555] [RunningScript] [Read Lock] [no locks] Trying to acquire lock.
2020-09-17 14:10:46.3056 86913 59 TRACE [ServerTasks-33555] [RunningScript] [Read Lock] ["ServerTasks-33555" (has a read lock)] Lock taken.
2020-09-17 14:10:46.3155 86913 58 TRACE listen://[::]:10933/ 58 Sent: Halibut.Transport.Protocol.ResponseMessage
2020-09-17 14:10:46.6080 86913 59 TRACE [ServerTasks-33555] [RunningScript] [Read Lock] ["ServerTasks-33555" (has a read lock)] Releasing lock.
2020-09-17 14:10:46.8122 86913 58 TRACE listen://[::]:10933/ 58 Received: IScriptService::GetStatus[2] / 495b22bb-7906-41bd-8675-da97e3a1bbf2
2020-09-17 14:10:46.8243 86913 58 TRACE listen://[::]:10933/ 58 Sent: Halibut.Transport.Protocol.ResponseMessage
2020-09-17 14:10:47.1229 86913 58 TRACE listen://[::]:10933/ 58 Received: IScriptService::CompleteScript[3] / 2dd72212-8e7e-4f9a-a4e1-b028e584839d
2020-09-17 14:10:47.1247 86913 58 TRACE listen://[::]:10933/ 58 Sent: Halibut.Transport.Protocol.ResponseMessage
2020-09-17 14:20:47.1265 86913 58 TRACE listen://[::]:10933/ 58 No messages received from client for timeout period. Connection closed and will be re-opened when required
Logs are showing it closing itâs connection and then it doesnât respond to the next call. Both our linux targets are currently hanging.
A Check Health connectivity call is also failing.
At the moment only a restart of the target service seems to bring it back but this is a poor solution. We also cannot run a watchdog as this is Ubuntu