Uploading packages to Tentacle hangs over slow network

raoulus · May 28, 2014, 10:09am

Hello,

Deploying a project over a slow network (350kb/s) is almost impossible.
The “Acquire packages” step never ends because the packages hang in the “Uploading” operation.
Some packages are transferred and some hang at random. I can only deploy by cancelling the deployment and retrying (many times) so that the Tentacle can use the cached packages.

The process is made up of 48 steps, almost all of which are package installations. Package sizes are 5-10 MB; only one is almost 100 MB.
We have 5 environments. 2 of them are on the local network and package uploading works properly. The other 3 environments are reached over 3 different VPNs (transfer rate about 400kb/s) and all have the same problem.

Octopus Server and Tentacle versions: 2.4.5.46 (but same problems on 2.3.*)

I attach the raw log of the deployment.
If you need them, I also have the Octopus server and Tentacle logs at trace log level.

Any ideas?

Thank you very much!!!

ServerTasks-8228.log.txt (67 KB)

Vanessa_Love · May 29, 2014, 12:40am

Hi,

Thanks for getting in touch!

In this situation, the following setting may help with the bandwidth:

As you can see from the second screenshot, in the project -> process when you are defining the package step, select “Each tentacle will download the package directly from the NuGet Server”.

Can you try this setting and let us know if it helps.

Thanks!

Vanessa

anton_s · June 2, 2014, 4:24am

Hi,

There is definitely something wrong with the last 2 releases of Octopus relating to uploads. I previously logged an issue about the progress bar no longer showing upload progress in the previous version.

In the current version, uploads over slower links (mine is at least 5 to 10Mbit) now often just hang. They do not fail (as they did many versions back), they just hang.
Cancelling stops the deployment immediately, with none of the waiting that is usual when an upload is actually doing something.

I suspect a link!

raoulus · June 4, 2014, 8:45am

I changed each step of my project to “Each tentacle will download the package directly from the NuGet Server”. We use Klondike as our NuGet server.

Good news: the download of the packages from each tentacle worked without problem.

Bad news: this approach requires both formal approval and network reconfiguration.
The network has to be configured for bidirectional communication (server --> tentacle:10933 AND tentacle --> NuGet server).
This is a big issue because customers are not always willing to adapt (because of security policies) and are extremely slow to change network configuration (months).
3 of our 4 production environments are very difficult to adapt to the new configuration, so for them “Octopus server sends the package to the Tentacle” is the right option.
Is it possible to change the NuGet server URL depending on the deployment environment? Every environment sees the NuGet server on a different IP (because of the VPN network map).

Nicholas_Blumhardt · June 5, 2014, 3:45am

Thanks for all of the info. We’ve had a few more reports now of problems with this version, so we’re going to try reproducing this locally in different ways and post results here.

Regards,
Nick

raoulus · June 5, 2014, 10:36am

We had the same problem on versions 2.3.*.
If you want, next week I can run a “failing” deployment and share the server and Tentacle trace logs.

anton_s · June 6, 2014, 9:00am

This is VERY frustrating.

It happens pretty much on every release. I just did one now, was getting no feedback, cancelled deployment, retried and it immediately went to unpacking the uploaded files… so it did complete the upload but got stuck.

I am willing to bet as in a previous report that this is related to the fact that 2 versions ago, the UI would show the progress bar moving as the upload progressed, but this no longer happens - there is no feedback in the UI for upload progress… the problems are related!

Tx

Nicholas_Blumhardt · June 9, 2014, 10:31pm

Thanks for the follow up Anton. We’re working on a patch currently - waiting for some early confirmation that it is effective before sending it on. I’ll post details here as soon as it is ready.

Regards,
Nick

Nicholas_Blumhardt · June 10, 2014, 5:33am

Okay - thanks for hanging on folks! There are some new builds at:

Octopus (x86 and x64):

Tentacle (auto-update is also fine):

An early test of these on an affected site indicates that the progress bar issue is also resolved, though we’re still waiting for confirmation that the upload issue is completely fixed.

If you’re able to try these out and let us know your experience we’d be very grateful.

These are identical to 2.4.7 except for four other minor issues being fixed as well:

Importing invalid JSON in step template generates a blank error message (bug)
Make it clear that API-only users need to be assigned to teams (enhancement)
Remove the “Add Windows Firewall exception” checkbox on Windows 2003 (bug)
Pushed package does not appear in built-in repository (failure not reported)

(Via: https://github.com/OctopusDeploy/Issues/issues?milestone=32&page=1&state=closed)

Thanks again for the help and patience, looking forward to your feedback.

Nick

anton_s · June 10, 2014, 7:17am

Hi Nick,

The update looks good! I am getting the upload progress bar showing and the
upload worked fine - so it looks good.

Will let you know if something goes wrong.

Cheers

raoulus · June 10, 2014, 11:36am

Hi Nick,

I can see the improvement that Anton is talking about.
I still have some problems:

Some uploads fail to start because of some kind of timeout, with the message: “The message expired before it was picked up by the remote machine.”
Consider that I have almost 50 packages to transfer, so it is expected that some packages will start later. Is it possible to increase the timeout? Maybe that would be enough.
When the above problem occurs, the “Task summary” tab shows one yellow row for each failed upload requiring failure guidance (yes, I selected the guided mode). That is fine. When I click “assign to me” or “show details”, the step expands correctly, but at every auto-refresh the step closes and another one opens, and it keeps alternating this way.
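
A rough back-of-the-envelope sketch of why uploads still in the wait list can outlast a fixed message expiry. It assumes the uploads go one after another over a saturated link, that “400kb/s” means roughly 0.4 MB/s, and that the small packages average 7.5 MB. The function names and the expiry value are hypothetical; the real Octopus timeout is not stated in this thread.

```python
def queue_wait_seconds(sizes_mb, link_mb_per_s):
    """Seconds each package waits before its upload can start,
    assuming uploads run one after another over a saturated link."""
    waits = []
    elapsed = 0.0
    for size in sizes_mb:
        waits.append(elapsed)              # this package starts once all earlier ones finish
        elapsed += size / link_mb_per_s    # time spent transferring this package
    return waits


def expired(waits, expiry_s):
    """Indexes of packages whose wait exceeds a (hypothetical) message expiry."""
    return [i for i, wait in enumerate(waits) if wait > expiry_s]


# Assumed figures: 47 packages of about 7.5 MB plus one of 100 MB at 0.4 MB/s.
sizes = [7.5] * 47 + [100.0]
waits = queue_wait_seconds(sizes, 0.4)
print(f"last package waits about {waits[-1]:.0f} s before it starts")
```

With these assumed numbers the last package waits close to 15 minutes before its transfer even begins, so any expiry shorter than that would hit the tail of the queue.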

Attached is my “Task log”.

Raoul

ServerTasks-8741.log.txt (106 KB)

anton_s · June 10, 2014, 12:34pm

I am also interested in this scenario.

Previously in Octopus v1 I had to split 60 package uploads/steps into 3
different projects because of a similar sounding timeout.
It’s about 60 items and growing, with a total size of 500 MB (also growing!).

I will try it soon.

anton_s · June 10, 2014, 1:24pm

Unfortunately the following just happened:

Three deployments had some form of upload issue.

Event viewer has messages like this:

2014-06-10 15:20:05.5588 FATAL
Pipefish.Errors.PipefishCommunicationException: The message expired while waiting on disc.

Then the Octopus service stopped. I had to manually start it again.

Any logs I can provide?

anton_s · June 10, 2014, 1:25pm

Update: I have done other deployments since and they seem okay, but there were issues earlier, which I leave below:

"
Also on a local network deployment, a step fails:

“The message expired while waiting on disc.”

and another remote deployment - stuck for 20 minutes on a failed step:

Upload of file C:\Octopus\OctopusServer\XXX with hash ee2f500a4b9de64a48ac701d6acb65b301ceb2cd to NNNN failed
The message expired while waiting on disc.
"

Nicholas_Blumhardt · June 10, 2014, 10:14pm

Thanks Anton and Raoul, seems like we’ve made progress. I suspect I have still set the timeout too low - there’s a delicate balance to strike between supporting slow uploads, and being responsive to connection problems under normal conditions. We may need to make it configurable.

In this case however I think the timeout on the file transfer is probably not the main issue - the errors are appearing before it would expire - but the extra congestion is causing timeouts in other control messages that form part of the process. I’ll work through this today and see what kind of solution is possible. I’m glad that we’re at least getting error messages to work with now, rather than silent “hangs”, so there’s something!

Anton, the crashing Octopus server’s unexpected; there’s an OctopusServer.*.txt log file on the server machine that should provide some details - would you please be able to send it to me (via the forum, or nblumhardt at the Octopus domain)?

Regards,
Nick

Nicholas_Blumhardt · June 10, 2014, 10:59pm

Possible breakthrough courtesy of Paul: Octopus already supports a setting to manage this scenario!

In your project, create a variable called Octopus.Acquire.MaxParallelism and set the value to 1.

This will perform one download at a time (per Tentacle), hopefully improving the latency issue.

When this succeeds, it may be possible to increase the value (to 2, 3, 4…) to gain some throughput, but it isn’t necessary.

Note you’ll need to create a new release for it to take effect.
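
To illustrate what a parallelism limit of this kind does, here is a minimal sketch using a semaphore. It is not Octopus code and says nothing about how Octopus implements the variable; `upload`, `acquire_packages` and `run` are hypothetical names, and the sleep stands in for a real transfer.

```python
import asyncio


async def upload(name, limiter, state):
    # Hypothetical stand-in for uploading one package to a Tentacle.
    async with limiter:                    # wait here until a slot is free
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)          # pretend to transfer the package
        state["active"] -= 1
        state["done"].append(name)


async def acquire_packages(packages, max_parallelism):
    # At most max_parallelism uploads hold the semaphore at any moment.
    limiter = asyncio.Semaphore(max_parallelism)
    state = {"active": 0, "peak": 0, "done": []}
    await asyncio.gather(*(upload(p, limiter, state) for p in packages))
    return state


def run(packages, max_parallelism):
    return asyncio.run(acquire_packages(packages, max_parallelism))


if __name__ == "__main__":
    result = run(["A", "B", "C", "D"], max_parallelism=1)
    print("peak concurrent uploads:", result["peak"])
```

With a limit of 1 each upload has the whole link to itself; raising the limit trades that for more transfers in flight at once.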

Can you please try this out and let me know what you find? If it solves the issue we’ll consider providing first-class UI support for this feature.

Many thanks!
Nick

anton_s · June 11, 2014, 1:50am

Awesome! I have set this and will let you know how things go in future. I think I requested this in version 1

Yesterday’s log is 60 MB and has a ton of issues… but I will spare you inspecting it and instead send logs from future deployments with the above parallelism set to 3.

Interestingly, the bulk of the size of the log is taken up by messages like:

`2014-06-10 17:03:48.7318 ERROR Undeliverable message detected: Pipefish.Messages.Delivery.DeliveryFailureEvent from: FileReceiver-AaM-_Lss_Ivo@SQ-PAX-DB-3-B4D4163A to: FileSender-BsA-_MvuIUOC@SQ-VMSERV-124AD339

Body is: {“FailedMessage”:{“Id”:“1c91602a-49f0-4fc2-8783-567f8c342d29”,“From”:{“SerializedValue”:“FileSender-BsA-_MvuIUOC@SQ-VMSERV-124AD339”},“Headers”:{“In-Reply-To”:“fca7149d-7687-40e7-8530-1cfe5df65537”,“Supports-Eager-Transfer-Receipt”:“True”,“Is-Tracked”:“True”,“Is-Ephemeral”:“True”,“Expires-At”:“635380100123016422”},“To”:{“SerializedValue”:“FileReceiver-AaM-_Lss_Ivo@SQ-DB-3-B4D4163A”},“MessageType”:“Octopus.Shared.FileTransfer.SendNextChunkReply”,“Body”:{"$type":“Octopus.Shared.FileTransfer.SendNextChunkReply, Octopus.Shared”,“Data”:"E4ncbg0iPX7gkWJCfAIbPASQUpDCT4CC7DaPuxKI87riPxV2SYua1OaeCFP0fvyrXQjTnuRYEjEPYWhg2AMNV7XHJ07r5u7P13G+kN4CFD/3TB/2m/27arlsuMLQr5lII1pCwW2B+Axxxxxxxxxx CONTINuES…`

It looks like it’s serializing part of the uploaded file to the log… making it very big.

Semi related to all of this:

I can appreciate the complexities of all this, but it would be nice if a Tentacle could somehow prioritise completing the steps of a project that is already running:

At the moment, if you deploy 2 projects at the same time to the same Tentacle, the Tentacle gets heavily bogged down acquiring packages.
So I’ve had cases where the packages for Project 1 have been acquired and its steps start. I then start Project 2, and while the Tentacle is acquiring the packages for Project 2, the processing of steps in Project 1 effectively grinds to a halt.

Just something to consider with all this

raoulus · June 11, 2014, 7:57am

I tried Octopus.Acquire.MaxParallelism but nothing seems to have changed.
(This is what I did, in case I misunderstood: selected the project in the UI, selected the “Variables” tab, added the variable, created a new release.)

I have 4 nodes (4 Tentacles) but I saw a dozen uploads running at the same time.
The message expiration happens on the uploads that are still in the wait list, not on those already uploading.

anton_s · June 11, 2014, 9:11am

I concur:

I have it set to 3, but all my packages are uploading at the same time, and it is a new release…

Paul_Stovell · June 12, 2014, 12:36am

Hi,

Do you see a message like this in the deployment log before the uploads?

Parallelism limited to 3 tentacles concurrently

I’ve just tried this and it seems to work fine for me - I set it to 1, created a release and deployed it, and watched the upload - it waited for the first to upload before starting the second.

The variable on my variables tab looks like the attached screenshot. Note that it isn’t scoped to anything - not to a step, not to an environment, etc. This is important as I don’t believe we take scope into consideration when working out this variable.

Would it be possible to get a full deployment log?

Paul