We have commenced the migration of the mail server. All mail services are currently stopped and mail is being sent to our backup mail server. Access to Helm is now restricted until the migration is complete.
Mail Server Update 18:39 – Migration commenced
April 26th, 2010
Mail Server Update 18:26 – Upgrading software
April 26th, 2010
We have brought the migration forward slightly as it is necessary for us to upgrade the old SmarterMail server to the latest release prior to the move.
Mail services will be unavailable whilst we complete this upgrade.
Update 18:30 – This is now complete. The migration will begin in the next few minutes.
Windows Mail Server Upgrade
April 26th, 2010
This evening we shall be upgrading the primary Windows mail server and relocating all user mailboxes to a new dedicated server. We apologise for the short notice, but the work has become necessary because the current server is starting to show signs of instability. We would rather act now to resolve the issue quickly and avoid any unnecessary disruption to your service.
The work will involve moving all user accounts currently hosted on server IS-3712 to our new server IS-11329. No changes are required on your part, as we will be migrating the mailboxes and data as well as moving the current mail IP address to the new server. We have already synchronised the mailbox data to the new box to ensure the migration is as smooth as possible.
In addition, whilst the migration is being carried out, mail will be delivered to our backup mail server. Once the migration is complete, it will be delivered as normal to your mailbox, ensuring no mail is lost in the transition.
The anticipated timeline and updates are shown below. You can also follow updates via our twitter account http://twitter.com/openmindhosting
Please note that you will only be affected by this work if you have a shared or reseller Windows hosting plan. All dedicated mail server clients and Linux plans will NOT be affected.
Anticipated Schedule
Please note this is a planned time line for this work and may be subject to change. All actions will be carried out this evening, 26th April 2010:
- 19:00 – The mail server will be stopped and access to the Helm control panel/webmail access disabled for the duration of the move. We shall then begin a final synchronisation of the mailbox data. This synchronisation will take approximately 30 minutes to complete.
- 19:30 – We will carry out some final checks on the data and reconfigure the new mail server to use the existing mail IP address.
- 19:45 – Once the mail data has been synchronised we will begin to restore accounts on the new server. Each domain takes between 5 and 20 seconds and the domains are restored alphabetically. There are approximately 1,300 domains to restore on this particular mail server. As each domain is restored, mail will start flowing into the new mail server and will also start arriving from the backup mail server. We anticipate, and this is a very rough estimate, that the restoration of accounts will take about 4 hours to complete.
- c. 23:45 – The mail restoration should be complete. Helm will be restarted and access to webmail will be restored.
Please note that the above timings are subject to change but we will keep you updated throughout the move both here and on our twitter account at http://twitter.com/openmindhosting
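As a rough cross-check of the restoration estimate in the schedule above, the short Python sketch below multiplies out the figures quoted there (approximately 1,300 domains at 5 to 20 seconds each) and works out the average per-domain time implied by the 4 hour estimate. It assumes domains are restored strictly one after another; the function names are illustrative only and are not part of any tooling used for the migration.

```python
def restore_window_hours(domains, min_seconds, max_seconds):
    """Return (best, worst) total hours for restoring domains one after another."""
    best = domains * min_seconds / 3600
    worst = domains * max_seconds / 3600
    return best, worst


def implied_seconds_per_domain(domains, total_hours):
    """Average seconds per domain implied by a total-duration estimate."""
    return total_hours * 3600 / domains


# Figures taken from the schedule: ~1,300 domains, 5 to 20 seconds each.
best, worst = restore_window_hours(1300, 5, 20)
average = implied_seconds_per_domain(1300, 4)

print(f"best case:  {best:.1f} hours")
print(f"worst case: {worst:.1f} hours")
print(f"a 4 hour estimate implies about {average:.0f} seconds per domain")
```

On these figures the restoration could take anywhere from about 1.8 to 7.2 hours, and the 4 hour estimate corresponds to an average of roughly 11 seconds per domain, which is why it is described as very rough.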
Windows Cluster Updates
April 21st, 2010
We are currently installing new software on each of the Windows cluster servers. This process does involve stopping IIS for about 2 minutes whilst the software installs.
Currently we are installing on the primary mail server and this should be completed shortly.
Update 10:57 – Mail server installation completed.
Update 11:32 – MySQL4 database server installation in progress
Update 12:53 – This work is now complete.
Windows Mail Server
April 20th, 2010
We are aware of a delay in receiving email. This has been caused by a backlog in the mail server queue which we are resolving at present.
Update 14:29 – We have identified the issue and are working to resolve it now. It looks like emails are sitting in the spool ready to be delivered but not actually leaving the server.
Update 14:47 – Mail is starting to work its way through now but this may take some time to complete due to the previous backlog.
Update 15:12 – We have found the issue to be associated with the spam check we carry out on all email coming into the server. As a temporary measure to clear the backlog, we have removed the spam checks and will re-enable them once the queue has been cleared.
This will result in a higher level of spam being received for a short period.
Update 15:25 – The backlog has now been cleared. We will now start re-enabling the anti-spam services we have in place one by one whilst monitoring the delivery spool until we are satisfied all is OK.
All Systems Stable
April 20th, 2010
We are pleased to report that all systems are currently stable.
Restoration Error
April 16th, 2010
For reasons unknown at this point, the backups that were restored through the night are actually from February of this year. This is of course a major blow to the hard work the team put in through the night as well as causing further inconvenience for users.
We are presently downloading the offsite backups which are up to date and we shall then have to start the restoration of accounts from scratch.
As the offsite backups are already in tar format, the process shouldn’t take as long as it did last night.
Obviously this is a major setback but we would appreciate your patience and understanding whilst we try to resolve this as soon as possible.
Please note that to keep accounts in sync we have deliberately stopped all services on the Linux server whilst we restore from the recent backups.
Update 09:35 – Backups from 15th April are now being downloaded to the server. We expect this to be complete by approximately 10:45.
Update 10:11 – Approximately 5GB of data left to download before we can start restoring. This should start at around 10:30
Update 10:50 – Currently terminating accounts so that they can be recreated with the correct data.
Update 12:10 – The restoration of accounts has now begun in alphabetical order. Each account takes between 30 and 60 seconds to fully restore, so it is hard for us to give an ETA for completion. As each account is restored, however, it will immediately become active again for web traffic and email.
Update 12:18 – Restoration 15% complete
Update 12:31 – Restoration 25% complete
Update 12:45 – Restoration 30% complete
Update 13:11 – We’re just resolving a minor issue with cPanel and we will then be able to continue with the account restoration.
Update 13:27 – Earlier problems with cPanel have been resolved and we are continuing with the restoration albeit in blocks of 10 domains at a time to prevent server overloads.
Update 13:52 – We are now well over half way through the restoration process.
Update 14:22 – We are now restoring the last batch of accounts and this should be completed within the hour. Thank you for your patience during this very frustrating period.
Update 14:51 – There are currently around ten accounts left to restore…
We are aware of an issue where PHP files are not being parsed; instead, the browser is trying to download the file. This is being treated as a high priority and will be dealt with as soon as the restoration task completes.
Update 15:19 – All accounts have now been fully restored. We are currently recompiling Apache/PHP to resolve the above issue.
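The 13:27 update mentions restoring accounts in blocks of 10 domains at a time to prevent server overloads. The Python sketch below illustrates that general batching idea only. The account names are made up and `restore_account` is a hypothetical placeholder, not the actual cPanel restoration procedure used here.

```python
def in_blocks(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def restore_account(name):
    # Hypothetical placeholder: a real restore would invoke the hosting
    # control panel's own tooling for this account.
    return f"restored {name}"


# Made-up account names, sorted so they are processed alphabetically.
accounts = sorted(f"account{n:02d}" for n in range(25))

for block in in_blocks(accounts, 10):
    # Finish one block completely before starting the next, so that
    # no more than 10 restores are ever queued against the server.
    results = [restore_account(name) for name in block]
    print(f"finished a block of {len(results)} accounts")
```

With 25 sample accounts this processes two blocks of 10 followed by a final block of 5. A smaller block size trades total speed for a lower peak load on the server.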
MySQL Issue on Linux Server One
April 15th, 2010
We are aware of an issue with the MySQL server on Linux Server One and we are working to resolve this as quickly as possible.
Update 18:36 – The server is failing to respond due to excessively high CPU usage. We are attempting to reboot the server now.
Update 18:40 – It looks like the reboot has not resolved the issue. Technicians are on-site and are investigating further.
Update 19:29 – We have gained access to the server but it does look like a major issue with the boot disk. We are running a disk repair at the moment to try and resolve the boot issue.
Update 20:24 – Unfortunately it does look like the primary hard disk on the server is fatally corrupted. The procedure is now that we will have to replace the disk, reinstall the OS and cPanel and then start restoring accounts. Note that this process will take several hours to complete but we will give you constant updates as to the progress.
Update 21:05 – The faulty drive has now been replaced and the operating system re-installed. We are currently installing cPanel/WHM
Update 22:33 – cPanel/WHM has now been re-installed and we are in the process of securing it for the restoration of accounts.
Update 02:14 – We have now commenced restoring user accounts. This will be done alphabetically and we shall proceed as quickly as possible.
Update 02:38 – Accounts “0” to “A” restored
Update 02:58 – “B” accounts restored
Update 03:27 – “C” accounts restored
Update 03:42 – “D” accounts restored
Update 03:53 – “E” accounts restored
Update 04:34 – “F” accounts restored
Update 05:00 – “G” & “H” accounts restored
Update 05:10 – “I” accounts restored
Update 05:12 – “J” accounts restored
Update 05:19 – “K” accounts restored
Update 05:25 – “L” accounts created
Update 05:41 – “M” accounts restored
Update 05:47 – “N” accounts restored
Update 06:00 – “O” accounts restored
Update 06:12 – “P” accounts restored
Update 06:14 – “Q” accounts restored
Update 06:18 – “R” accounts restored
Update 06:47 – “S” accounts restored
Update 07:04 – “T” accounts restored
Update 07:06 – “U” accounts restored
Update 07:18 – “W” accounts restored
Update 07:23 – “Z” accounts restored
All user accounts have now been restored from the backups taken at around 2am yesterday morning. Would all users please carefully check their accounts and report any issues through the helpdesk: https://www.openmindhosting.co.uk/support/
An RFO (Reason For Outage) will be issued shortly.
Network Issue at LINX
March 17th, 2010
The London Internet Exchange (LINX) is currently suffering from a network issue that is affecting traffic. We immediately shut down our connections to LINX until they resolve this problem. Traffic has been rerouted over our other peering points until the situation is resolved.
Date: 16/03/2010
Time: 22:30
Effect on service: Increased latency for traffic that would normally reach us via LINX
Duration: Ongoing
The only information that we have from LINX at this time is that they have acknowledged an issue and are investigating the problem. This problem has only affected traffic that reaches our network via LINX, which due to our diverse peering points with many of the major ISPs has meant minimal impact on incoming and outgoing traffic. The Open Mind Hosting network has remained stable throughout.
Traffic that would normally reach us via LINX will now be rerouted and connect to us via our other peering points. This may result in a slight increase in latency for that traffic, if it involves additional hops. We are now waiting on LINX for updates as to the nature of the problem, likely resolution time and ultimately for an all clear notification.
If anyone is experiencing problems with their service, please do not hesitate to contact us.
MySQL4 Database Server
February 21st, 2010
It looks like we have a disk failure on this server resulting in MySQL4 databases becoming non-operational.
Technicians are currently working on the issue and we shall post an update as soon as we have one…
UPDATE 15:19 – Unfortunately it is confirmed that the drive has failed and we are currently waiting for technicians to replace it. Once this is done we shall be able to restore data from the backups.
UPDATE 16:18 – The disk has now been replaced and we have started to restore data from the backups. This should take no more than 2 hours to complete.
UPDATE 16:47 – Data has now been restored and we are in the process of reconfiguring the database server.
UPDATE 19:18 – Unfortunately our batch restore script is failing to restore databases correctly so to ensure zero data loss we are having to restore databases one by one. There are currently 181 databases on the server and each one takes 30-60 seconds to restore. Databases will be restored in alphabetical order so we appreciate your patience whilst we carry out this work.
UPDATE 20:44 – We are now restoring databases beginning with “H” and are more than 50% through the restoration work.
UPDATE 23:03 – All databases have now been restored. If you experience any further problems, please do not hesitate to get in touch with our support team. We shall be issuing a full RFO (Reason For Outage) tomorrow.