I would like my deployment process and/or runbook to install some patches on my server. These patches require a server restart. However, when I restart my server the deployment fails because Octopus Deploy loses its connection to the server.
If the restart takes longer than the server timeout, the deployment will fail. The problem is that the process invokes the restart command directly on the server; once that command is issued, Octopus waits for it to finish.
To get around this, we recommend leveraging the hypervisor's restart functionality. All cloud platforms, as well as tools such as Hyper-V or VMware, provide a CLI to restart a VM.
What your process would look like is:
- Install patches on specific targets.
- From the hypervisor (or from a cloud CLI in a run-a-script step), restart the VM(s).
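A minimal sketch of the "restart from outside the VM" idea, assuming the Azure CLI is available on the worker; the resource group and VM names are hypothetical:

```python
import subprocess

def build_restart_command(resource_group: str, vm_name: str) -> list:
    """Build the Azure CLI command that restarts a VM."""
    return ["az", "vm", "restart",
            "--resource-group", resource_group,
            "--name", vm_name]

def restart_vm(resource_group: str, vm_name: str) -> None:
    # This runs on an Octopus worker, not on the target VM, so the
    # deployment does not lose its own connection during the reboot.
    subprocess.run(build_restart_command(resource_group, vm_name), check=True)
```

Because the step runs on a worker, Octopus is only waiting on the CLI call, not on a shell session inside the rebooting server.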
You can see an entire process on our samples instance. In it, the process spins up a VM and, while it is spinning up, installs .NET Core and the Azure CLI on it. In step 7, the process tells Azure to restart the VM.
This also provides a bit more control over the process. If a server doesn't come back online within a specific time period, a notification could be sent out, or a failure could be triggered.
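The notify-or-fail-on-timeout idea could be sketched like this; the `is_online` probe and `notify` hook are assumptions supplied by the caller (e.g. a TCP check of the Tentacle port and a Slack webhook):

```python
import time

def wait_for_server(is_online, timeout_seconds=600, poll_interval=5, notify=print):
    """Poll until the server responds; notify and fail if the timeout elapses.

    is_online: caller-supplied health check (hypothetical).
    notify:    caller-supplied alert hook (hypothetical).
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if is_online():
            return True
        time.sleep(poll_interval)
    notify("Server did not come back online within %ss" % timeout_seconds)
    return False
```

Returning `False` (or raising) lets the step fail the deployment explicitly instead of hanging until Octopus's own timeout.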
That looks fine if the Tentacle is configured in push mode, but it won't work in pull mode.
I struggled with this same issue: not being able to reboot targets during configuration, or for graceful maintenance orchestration, without losing the server. While it's not a perfect solution, I did manage to come up with something. I basically came up with a series of 4 reboot step templates that do the following.
Step #3 runs from a worker and is the key to holding focus while the server shuts down and reboots. I immediately start polling the Tentacle endpoint so I can catch the state change (shutdown), then keep polling until the state changes back to a positive response (startup). I give it a slight sleep before moving along to reboot validation.
That is very clever. I do the same thing when I am spinning up a new server; I didn't think about doing it for a restart.
It works pretty well, except on occasions where Windows patching takes too long to shut down and the poller times out. I've actually set up rolling reboot projects for each of my server roles that include all the graceful service handling, reboot steps, and post-startup validations. I then use an orchestration project that kicks them all off in parallel, and we reboot the entire production stack at the same time in a non-intrusive rolling fashion. It makes patching and rebooting several dozen servers much faster and easier.
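A rough sketch of the parallel-orchestration idea, with a placeholder for triggering each role's rolling-reboot project (in practice that would be a call to the Octopus REST API; the names here are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

def reboot_role(role):
    # Placeholder: kick off one role's rolling-reboot project and
    # return its result once the role's servers are validated.
    return "%s: rebooted" % role

def orchestrate(roles):
    """Run every role's rolling-reboot project in parallel."""
    with ThreadPoolExecutor(max_workers=len(roles)) as pool:
        return list(pool.map(reboot_role, roles))
```

Each role's project still reboots its own servers one at a time (rolling), so running the roles in parallel stays non-intrusive while shrinking the total patching window.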