Archive for the ‘Linux Servers’ Category

Network Power Systems Maintenance

Thursday, September 16th, 2010

On 2nd October 2010, between midnight and 8am, the data centre housing our shared, reseller and dedicated server clients will be carrying out urgent and substantial maintenance on the primary power supply system. This is advance notification of the expected downtime, which will affect the majority of our client services. A full breakdown, schedule and required client action are given below:

Summary

* The data centre will be enacting our procedures for a controlled power down and power restore to RSH North. This will happen between the hours of 00:00 and 08:00 on Saturday 02/10/10.
* These procedures are being put into action to allow essential repairs to the main power systems in RSH North. For further details on these repairs please see ‘Technical explanation of cause’.
* As part of the procedures for restoring power, there will be additional staff on site. They will be checking that all services have power restored and pro-actively investigating any issues reported by our monitoring system.

Sequence of events

00:00

* Our monitoring will begin automatically executing a soft shutdown on affected servers.

01:00

* Mains power will be systematically shut down.
* Your service will not be available during this time.

05:00

* We aim to begin restoring power systematically across the site, in stages.
* Technical staff will begin pro-actively investigating and restoring service to any problematic servers based on the monitoring system.

08:00

* We aim to have all servers back online.
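
The schedule above implies a range of downtime per server, worked through in the short Python sketch below. The times are those given in the schedule. The 05:00 and 08:00 figures are targets rather than guarantees, and the sketch assumes mains power stays off for the whole 01:00 to 05:00 period, so the results are estimates only.

```python
from datetime import datetime

# Times from the schedule above, all on Saturday 02/10/2010.
DAY = "2010-10-02 "
soft_shutdown = datetime.fromisoformat(DAY + "00:00")
power_off = datetime.fromisoformat(DAY + "01:00")
power_restore_begins = datetime.fromisoformat(DAY + "05:00")  # target only
all_online_target = datetime.fromisoformat(DAY + "08:00")     # target only


def hours(start: datetime, end: datetime) -> float:
    """Length of an interval in hours."""
    return (end - start).total_seconds() / 3600


# Assumption: a server shut down first and restored last is offline
# for the whole maintenance window.
longest_outage = hours(soft_shutdown, all_online_target)
# Assumption: every affected server is off while mains power is isolated.
shortest_outage = hours(power_off, power_restore_begins)

print(f"Expected downtime per server: {shortest_outage:g} to {longest_outage:g} hours")
```

Holding pages and customer notices are therefore best planned around the full eight hours rather than the four-hour power-off period.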

Technical explanation of cause

* The data centre recently installed a fourth 500 kVA UPS to add resilience and capacity to our power systems.
* The data centre was designed with this in mind, so the appropriate connections exist on the Schneider UPS Output Panel (that feeds all servers from a common busbar) to connect this UPS.
* During the installation of the UPS, engineers discovered there was a fault with the panel which meant the additional UPS could not be connected.
* To repair this fault, we need to electrically isolate the panel, as engineers cannot work on the panel while it is live for safety reasons.
* The manufacturer of the panel, Schneider, is sending a team of engineers to repair the panel so it can accept the fourth 500 kVA UPS.

Scheduling an automated shutdown on a server

We strongly advise all unmanaged dedicated server clients to put in place one of the following methods, depending on the OS installed.

Shared/reseller hosting clients and managed dedicated server owners do NOT need to take any action as we will be powering the server down for you. VPS servers without a “VS” prefix will NOT be affected by this maintenance window and do NOT need to take any action.

Windows Server 2003

1. Click Start, All Programs, Accessories, System Tools, Scheduled Tasks
2. Double-click on Add Scheduled Task, and Click Next
3. Click Browse… to select the program to run
4. Enter C:\Windows\System32\Shutdown.exe, and click OK
5. Under “Perform this Task”, click “One Time Only”, and press Next
6. Enter the date and time to shut down your computer: 00:00 on 02/10/2010, and click Next (BE SURE TO ALTER THE TIME IF YOU’VE CHANGED YOUR TIME ZONE!)
7. Enter your password in both fields and click Next
8. Check “Open advanced properties for this task when I click Finish”, and click Finish
9. Add “/s /t 30” to the end of the Run box, so it looks like the following:
C:\WINDOWS\system32\shutdown.exe /s /t 30
10. Click OK, re-enter your password, and click OK

Windows Server 2008

1. Click Start, All Programs, Accessories, System Tools, Task Scheduler
2. Click “Create Basic Task” on the actions pane on the right-hand side
3. Enter “Shutdown” as the Name, and click Next
4. Under “When do you want the task to start”, select “One Time”, and click Next
5. Enter the date and time to shut down your computer: 00:00 on 02/10/2010, and click Next (BE SURE TO ALTER THE TIME IF YOU’VE CHANGED YOUR TIME ZONE!)
6. Select “Start a Program”, and click Next
7. Enter “shutdown” (without quotes) in the “Program/Script” box, enter “/s /t 30” (without quotes) in the “Add Arguments” box, and click Next
8. Check “Open the Properties dialog for this task when I click Finish”, and click Finish
9. Click “Run whether the user is logged on or not”, and click OK
10. Enter your password, and click OK

Linux

Ensure you have the “at” service installed, for running tasks on a schedule. This can be installed in the following way:
Debian 5: aptitude install at
Debian 4: apt-get install at
CentOS: yum install at
Fedora: yum install at

1. Log into your server using SSH
2. Enter: “at 00:00 Oct 02” (without quotes) and press Enter
3. After the at> prompt, enter “/sbin/shutdown -h now” (without quotes) and press Enter
4. Press CTRL-D
5. Enter “atq” and press Enter to check that the job has been queued
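
The at command interprets its time argument in the server's local time zone, so the time zone caveat given in the Windows steps applies on Linux as well. The Python sketch below shows one way to work out the adjusted time. It assumes the 00:00 start is quoted in UK local time, which on 2 October 2010 is British Summer Time (UTC+1). The function name at_timespec and the example offsets are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the maintenance window is quoted in UK local time, which is
# British Summer Time (UTC+1) on 2 October 2010.
UK_BST = timezone(timedelta(hours=1))
WINDOW_START = datetime(2010, 10, 2, 0, 0, tzinfo=UK_BST)


def at_timespec(server_utc_offset_hours: float) -> str:
    """Return the time argument for `at` on a server at the given UTC offset."""
    server_tz = timezone(timedelta(hours=server_utc_offset_hours))
    local = WINDOW_START.astimezone(server_tz)
    # `at` accepts "HH:MM Mon DD", the same form used in step 2 above.
    return local.strftime("%H:%M %b %d")


if __name__ == "__main__":
    examples = [("UK time (BST)", 1), ("UTC", 0), ("US Eastern (EDT)", -4)]
    for label, offset in examples:
        print(f"{label:18} at {at_timespec(offset)}")
```

A server whose clock is set to UTC would therefore need “at 23:00 Oct 01” rather than “at 00:00 Oct 02”. Running “date” on the server shows which time zone it is using.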

DNS services

* During the maintenance, the DNS name server services provided by us will cease to operate, with the exception of the ns2, ns5 and ns6.nixcontrolpanel.com name servers.
* It will not be possible to change your DNS records during the maintenance.
* Please ensure any DNS changes for holding pages are made prior to 00:00.
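
To see whether a domain will still resolve during the window, its name servers can be compared against the three that stay up. The Python sketch below is a rough illustration. It assumes “ns2” and “ns5” are short for hosts under the same nixcontrolpanel.com domain as ns6, and the example delegation (including ns1) is hypothetical.

```python
# Name servers the notice says will keep running. Assumption: "ns2" and "ns5"
# are short for hosts under the same nixcontrolpanel.com domain as ns6.
SURVIVING = {
    "ns2.nixcontrolpanel.com",
    "ns5.nixcontrolpanel.com",
    "ns6.nixcontrolpanel.com",
}


def normalise(host: str) -> str:
    """Lower-case a host name and drop any trailing dot."""
    return host.strip().rstrip(".").lower()


def split_by_availability(name_servers):
    """Split a domain's name servers into (stay up, go down) lists."""
    up, down = [], []
    for ns in name_servers:
        host = normalise(ns)
        (up if host in SURVIVING else down).append(host)
    return up, down


# Hypothetical delegation; substitute the NS records of your own domain.
up, down = split_by_availability(["ns1.nixcontrolpanel.com", "NS2.nixcontrolpanel.com."])

if __name__ == "__main__":
    print("Available during maintenance:", up)
    print("Unavailable during maintenance:", down)
    if not up:
        print("Warning: no name server for this domain will answer queries.")
```

A domain with at least one name server in the first list should continue to resolve, although its records cannot be edited until the maintenance is over.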

We realise that this is a large-scale maintenance window, and you will have received notification of how to claim a credit for your account, even though our network SLAs make no provision for scheduled maintenance such as this.

Linux Server IS-08532 Maintenance

Wednesday, June 30th, 2010

Users may experience a slight slowdown in performance and/or network connectivity issues for IS-08532 for the next 20 minutes.

This is because we are carrying out essential server maintenance that could not be scheduled in advance due to its importance.

This work should be completed shortly…

Planned Network Maintenance

Tuesday, June 29th, 2010

We will be carrying out maintenance which is applicable to some of your services with us.

Maintenance Type: Network
Expected effect on your service: Small Degradation in Performance
Expected downtime duration: 0 minutes
This will occur between 08:00 and 17:00 on 30/06/2010 (UK Time)

The following servers are relevant to this message:

IS-03093, IS-03384, IS-03587, IS-03711, IS-03712, IS-03713, IS-03714, IS-03715, IS-03716, IS-03717, IS-03718, IS-03719, IS-05965, IS-06110, IS-07420, IS-07759, IS-07931, IS-08077, IS-08313, IS-08532, IS-08817, IS-08859, IS-08985, IS-09830, IS-09906, IS-10952, IS-11329, IS-11743, IS-11959

This is an emergency maintenance required to change the route for one of our fibre uplinks. During the maintenance period there will be a small degradation in performance but there will be no connectivity interruption.

We apologise for any inconvenience this may cause. Please do not hesitate to contact us if you have any queries or questions regarding this maintenance window.

Scheduled UPS Maintenance

Tuesday, June 15th, 2010

Maintenance Type: Infrastructure
Expected effect on your service: No Effect
Expected downtime duration: 0 minutes
This will occur between 08:00 and 18:00 on 05/07/2010 (UK Time)

The following servers are relevant to this message:

IS-03093, IS-03384, IS-03587, IS-03711, IS-03712, IS-03713, IS-03714, IS-03715, IS-03716, IS-03717, IS-03718, IS-03719, IS-05965, IS-06110, IS-07420, IS-07759, IS-07931, IS-08077, IS-08313, IS-08532, IS-08817, IS-08859, IS-08985, IS-09830, IS-09906, IS-10952, IS-11329, IS-11743

On the 5th of July 2010 there will be a maintenance window to expand the capacity of the UPS systems. An additional UPS control unit and its associated batteries will be added to the existing cluster. This is to ensure the UPS system continues to provide fully redundant power backup as the site grows. This work creates an “at risk” period for your service. This maintenance is not service-affecting, but during the maintenance period services are at an increased risk.

During the maintenance period the critical load will be transferred seamlessly from the UPS to the raw mains. When the maintenance is complete it will be transferred back seamlessly.

We have confirmed with Southern Electric that they have no maintenance scheduled during this period.

Notification of Scheduled Maintenance

Monday, June 7th, 2010

Please note we will be carrying out a minor upgrade to our server network which will affect shared/reseller clients and certain dedicated server clients. The details are:

Maintenance Type: Network
Expected effect on your service: Connectivity Interruption
Expected downtime duration: 15 minutes
This will occur between 05:30 and 08:30 on 15/06/2010 (UK Time)

The following servers are relevant to this message:

IS-03093, IS-03384, IS-03587, IS-03711, IS-03712, IS-03713, IS-03714, IS-03715, IS-03716, IS-03717, IS-03718, IS-03719, IS-05965, IS-06110, IS-07420, IS-07759, IS-07931, IS-08077, IS-08313, IS-08532, IS-08817, IS-08859, IS-08985, IS-09830, IS-09906, IS-10952, IS-11329, IS-11743

This maintenance is to enable a service on the VSS-1440 cluster in order to increase the number of available VLANs.

This will restart one of the services on the VSS-1440 cluster, which will cause a short network interruption.

We apologise for any inconvenience this may cause. Please do not hesitate to contact us if you have any queries or questions regarding this maintenance window.

Unscheduled Network Outage

Tuesday, May 25th, 2010

The data centre that houses our dedicated servers suffered an unscheduled network outage last night. Their notice is reposted below:

Date: 25/05/2010
Time: 00:20
Outage Type: Connectivity Interruption
Duration: < 75 minutes
Affected Service(s): All dedicated servers

At approximately 00:20 a network issue was detected affecting connectivity to the North cluster in RS Spectrum House. The cause of this appears to have been a very large amount of malicious traffic directed at the RS network. This traffic has unfortunately caused a routing issue within the RSH.North cluster.

Our on-call engineers were contacted and the traffic has been removed from the network. Normal service was restored at approximately 01:35. This problem will have affected different clients in different ways, ranging from no effect, to an increase in latency and some packet loss, to a more substantial loss of connectivity.

We would like to express our sincere apologies for the inconvenience this will have caused. If you are experiencing any problems as a result of this please do not hesitate to contact our support team immediately and we will look into this for you.

At this time we do not have any further updates from the data centre but we will post them here once they arrive.

Migration of Open Mind Hosting

Wednesday, May 19th, 2010

We are currently migrating the primary Open Mind Hosting domain to a new server so you may experience minor issues accessing your account whilst the IP changes propagate.

This should be fully resolved within 24 hours.

All Systems Stable

Tuesday, April 20th, 2010

We are pleased to report that all systems are currently stable.

Restoration Error

Friday, April 16th, 2010

For reasons unknown at this point, the backups that were restored through the night are actually from February of this year.  This is of course a major blow to the hard work the team put in through the night as well as causing further inconvenience for users.

We are presently downloading the offsite backups which are up to date and we shall then have to start the restoration of accounts from scratch.

As the offsite backups are already in tar format, the process shouldn’t take as long as it did last night.

Obviously this is a major setback but we would appreciate your patience and understanding whilst we try and resolve this asap.

Please note that to keep accounts in sync we have deliberately stopped all services on the Linux server whilst we restore from the recent backups.

Update 09:35 – Backups from 15th April are now being downloaded to the server. We expect this to be complete by approximately 10:45.

Update 10:11 – Approximately 5GB of data left to download before we can start restoring. This should start at around 10:30

Update 10:50 – Currently terminating accounts so that they can be recreated with the correct data.

Update 12:10 – The restoration of accounts has now begun in alphabetical order.  Each account takes between 30-60 seconds to fully restore so it is hard for us to give an ETA of the completion. As each account is restored however, it will immediately become active again for web traffic and email.
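
The 30-60 second range is why only a window, rather than a single ETA, can be quoted: the worst case is twice the best case. The Python sketch below shows the arithmetic, assuming accounts are restored one at a time. The account count of 200 is hypothetical and is not the real figure for this server.

```python
def restore_eta_minutes(accounts_remaining: int, fastest_s: int = 30, slowest_s: int = 60):
    """Return (best case, worst case) restore time in minutes."""
    best = accounts_remaining * fastest_s / 60
    worst = accounts_remaining * slowest_s / 60
    return best, worst


# Hypothetical account count, for illustration only.
best, worst = restore_eta_minutes(200)
print(f"200 accounts: between {best:.0f} and {worst:.0f} minutes")
```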

Update 12:18 – Restoration 15% complete

Update 12:31 – Restoration 25% complete

Update 12:45 – Restoration 30% complete

Update 13:11 – We’re just resolving a minor issue with cPanel and we will then be able to continue with the account restoration.

Update 13:27 – Earlier problems with cPanel have been resolved and we are continuing with the restoration albeit in blocks of 10 domains at a time to prevent server overloads.

Update 13:52 – We are now well over half way through the restoration process.

Update 14:22 – We are now restoring the last batch of accounts and this should be completed within the hour.  Thank you for your patience during this very frustrating period.

Update 14:51 – There are currently around ten accounts left to restore…

We are aware of an issue where PHP files are not being parsed; instead, the browser tries to download the file. This is being treated as a high priority and will be dealt with as soon as the restoration task completes.

Update 15:19 – All accounts have now been fully restored. We are currently recompiling Apache/PHP to resolve the above issue.

MySQL Issue on Linux Server One

Thursday, April 15th, 2010

We are aware of an issue with the MySQL server on Linux Server One and we are working to resolve this as quickly as possible.

Update 18:36 – The server is failing to respond due to excessively high CPU usage. We are attempting to reboot the server now.

Update 18:40 – It looks like the reboot has not resolved the issue. Technicians are on-site and are investigating further.

Update 19:29 – We have gained access to the server but it does look like a major issue with the boot disk. We are running a disk repair at the moment to try and resolve the boot issue.

Update 20:24 – Unfortunately it does look like the primary hard disk on the server is fatally corrupted.  The procedure is now that we will have to replace the disk, reinstall the OS and cPanel and then start restoring accounts.  Note that this process will take several hours to complete but we will give you constant updates as to the progress.

Update 21:05 – The faulty drive has now been replaced and the operating system re-installed. We are currently installing cPanel/WHM.

Update 22:33 – cPanel/WHM has now been re-installed and we are in the process of securing it for the restoration of accounts.

Update 02:14 – We have now commenced restoring user accounts. This will be done alphabetically and we shall proceed as quickly as possible.

Update 02:38 – Accounts “0” to “A” restored

Update 02:58 – “B” accounts restored

Update 03:27 – “C” accounts restored

Update 03:42 – “D” accounts restored

Update 03:53 – “E” accounts restored

Update 04:34 – “F” accounts restored

Update 05:00 – “G” & “H” accounts restored

Update 05:10 – “I” accounts restored

Update 05:12 – “J” accounts restored

Update 05:19 – “K” accounts restored

Update 05:25 – “L” accounts created

Update 05:41 – “M” accounts restored

Update 05:47 – “N” accounts restored

Update 06:00 – “O” accounts restored

Update 06:12 – “P” accounts restored

Update 06:14 – “Q” accounts restored

Update 06:18 – “R” accounts restored

Update 06:47 – “S” accounts restored

Update 07:04 – “T” accounts restored

Update 07:06 – “U” accounts restored

Update 07:18 – “W” accounts restored

Update 07:23 – “Z” accounts restored

All user accounts have now been restored from the backups taken at around 2am yesterday morning. Would all users please carefully check their account and report any issues through the helpdesk: https://www.openmindhosting.co.uk/support/

An RFO (Reason For Outage) will be issued shortly.