DRP versus BCP
May 17th, 2012
Disaster recovery planning is one of the most important jobs of the IT professional. It includes working with upper management and winning the cooperation of all departments to make a working recovery plan. The two main parts are the Business Continuity Plan (BCP) and the Disaster Recovery Plan (DRP). These have to go hand-in-hand procedurally. The BCP focuses more on the schedule and timing of the DRP, so that in the event of a disaster the business can function normally. The three stages of a DRP are Prevent, Detect and Correct.
Disaster recovery is the response to a declared disaster or a regional disaster. It is the restoration or recovery of an entire agent computer. A disaster recovery plan describes how an organization is to deal with potential disasters.
Disaster Recovery budgets remain stable
April 29th, 2012
A report into business continuity and disaster recovery budgets finds:
- According to an IT budget survey, 32 percent of enterprises had planned to increase spending on business continuity and disaster recovery by at least 5 percent in 2011. The reality is that budgets have stayed constant rather than increased as anticipated.
- Business continuity and disaster recovery budgets in 2011 have been an average of six percent of IT operating and capital budgets.
- The likely culprit in stalled business continuity and disaster recovery spending is the continuing economic uncertainty. Even in the best of economic times, it's difficult to build the business case for an initiative such as business continuity that's primarily about cost avoidance rather than return on investment. In tough economic times, it's almost impossible.
Business Continuity Planning
April 13th, 2012
Horizon scanning is essential to avoid surprises in business continuity planning, but identifying the most likely thing to bite you next is tricky.
Looking beyond the imminent planning risks contained in everyday events, the top 3 worries are:
- Supply Chain - Will an economic or political crisis mean disruption to this as a result of protest and civil unrest or even secession from monetary union?
- Severe weather - Most enterprises are geared up for "average" weather. As we see extremes of drought, cold and storm will the strain on the infrastructure become a major cause of business interruptions?
- Social Media - Increasingly organizations believe that these are essential to their businesses, yet they are provided externally, funded through advertising and beyond the control of the organization. How can we provide resilience/continuity for these? Should we?
Social media as a disaster planning tool
April 2nd, 2012
Government agencies are turning to social media technology to manage disasters and improve public safety.
A growing number of agencies are tapping into Facebook and Twitter to monitor events and provide near real-time notifications. And some are now taking social media a step further by communicating internally or sharing information and comments across offices or agencies.
A September Congressional Research Service report, Social Media and Disasters: Current Uses, Future Options, and Policy Considerations, noted that social media already plays an important role in disasters and that the use of the technology for emergency management is growing.
In Fort Worth and Tarrant County in Texas, for instance, a joint emergency operations center has switched on social media tools that improve communication across dozens of agencies and departments throughout the state. Police, firefighters, healthcare providers and others use push-to-talk radio, cellular telephony, and text messaging (including text documents and file sharing) to interact with an IP telephony infrastructure located in a response center. This allows teams to coordinate immediate responses, regardless of the underlying communications technology.
- CIO IT Infrastructure Policy PDF (All of the policies below which come as individual MS Word files)
- Backup and Backup Retention Policy
- Blog and Personal Web Site Policy (Includes electronic Blog Compliance Agreement Form)
- BYOD Access and Use Policy (Includes electronic BYOD Access and Use Agreement Form)
- Incident Communication Plan Policy (Updated to include social networks as a communication path)
- Internet, e-Mail, Social Networking, Mobile Device, Electronic Communications, and Record Retention Policy (Includes 5 electronic forms to aid in the quick deployment of this policy)
- Mobile Device Access and Use Policy
- Patch Management Policy
- Outsourcing Policy
- Record Management, Retention, and Destruction Policy
- Sensitive Information Policy (HIPAA Compliant and includes electronic Sensitive Information Policy Compliance Agreement Form)
- Service Level Agreement (SLA) Policy Template with Metrics
- Social Networking Policy (includes electronic form)
- Telecommuting Policy (includes 3 electronic forms to help to effectively manage work at home staff)
- Travel and Off-Site Meeting Policy
- IT Infrastructure Forms
Disaster Recovery Business Continuity Basics
March 1st, 2012
The basics of a Disaster Recovery Business Continuity Plan are defined in the Janco Disaster Recovery Business Continuity Template. They are:
- Develop the contingency planning policy statement. A formal department or agency policy provides the authority and guidance necessary to develop an effective contingency plan.
- Conduct the business impact analysis (BIA). The BIA helps to identify and prioritize critical IT systems and components.
- Identify preventive controls. Measures taken to reduce the effects of system disruptions can increase system availability and reduce contingency life cycle costs.
- Develop recovery strategies. Thorough recovery strategies ensure that the system may be recovered quickly and effectively following a disruption.
- Develop an IT contingency plan. The contingency plan should contain detailed guidance and procedures for restoring a damaged system.
- Plan testing and training exercises. Testing the plan identifies planning gaps, whereas training prepares recovery personnel for plan activation; both activities improve plan effectiveness and overall agency preparedness.
- Plan maintenance. The plan should be a living document that is updated regularly to remain current with system enhancements.
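As a rough illustration of the business impact analysis step above, the sketch below ranks systems for recovery. The system names, loss figures, tolerable-downtime values and the ranking rule are all assumptions made for the example; they do not come from the Janco template.

```python
from dataclasses import dataclass


@dataclass
class SystemImpact:
    name: str
    hourly_loss: float          # assumed estimate of loss per hour of downtime
    max_tolerable_hours: float  # assumed maximum downtime the business can accept


def prioritize(systems):
    """Least tolerant of downtime first; ties broken by higher hourly loss."""
    return sorted(systems, key=lambda s: (s.max_tolerable_hours, -s.hourly_loss))


# Hypothetical inventory, for illustration only.
systems = [
    SystemImpact("payroll", 500.0, 72),
    SystemImpact("order entry", 12000.0, 4),
    SystemImpact("email", 2000.0, 24),
]

ranked = [s.name for s in prioritize(systems)]
print(ranked)
```

A real BIA would weigh more factors (regulatory exposure, dependencies, seasonal peaks), but even a simple ranking like this makes the recovery priorities explicit.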
Disaster Planning is Required for Virtual Applications
February 27th, 2012
A number of customers using the Microsoft-hosted Dynamics CRM Online and its Office 365 cloud service were reporting performance problems.
One CRM Online customer said problems began in the morning. The @MSCloudUS twitter account acknowledged the Office 365 problems, starting in the afternoon (EST).
The Disaster Planning Template addresses these issues. On the CRM Online front, "performance is slow for most users, to the point that some can't use CRM at all," one Microsoft CRM user said. His company is based in the U.S., he said, but international users of the system were affected, as well.
A Microsoft spokesperson said, "We were made aware of a few customers experiencing difficulty using their Microsoft Dynamics CRM Online service this morning. The customer impact was limited to some organizations in North America and has been resolved. Microsoft takes any downtime seriously, and customers will be reimbursed service charges per the terms of our SLA which guarantees 99.9% uptime."
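To put a 99.9% uptime guarantee in perspective, the following sketch converts an uptime percentage into the downtime it permits. The function name and the 30-day and 365-day periods are assumptions for illustration; the actual SLA terms may measure availability differently.

```python
def allowed_downtime_minutes(uptime_percent, period_hours):
    """Minutes of downtime permitted in a period at a given uptime percentage."""
    return period_hours * 60 * (1 - uptime_percent / 100)


# Assumed measurement periods: a 30-day month and a 365-day year.
monthly = allowed_downtime_minutes(99.9, 30 * 24)
yearly = allowed_downtime_minutes(99.9, 365 * 24)

print(f"99.9% uptime allows about {monthly:.1f} minutes per month")
print(f"99.9% uptime allows about {yearly:.1f} minutes per year")
```

In other words, under these assumptions an outage lasting most of a business day uses up many months of the downtime allowance at once.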
Disasters impact companies of all sizes
February 24th, 2012
The list of natural and manmade disasters with which businesses have had to contend early in the 21st century is long. Many organizations have felt the devastating effects of the September 11 terrorist attacks, acts of bioterrorism involving anthrax, and bombings in London, Madrid and Bali. The severe acute respiratory syndrome (SARS) outbreak, the South Asian tsunami and Hurricane Katrina also have had costly, far-reaching impacts on businesses.
Disruptions resulting from these and other disasters have rippled across supply chains, shaken entire industries and taken their toll on employee, customer and partner relations. Not surprisingly, organizations of all types and sizes are making crisis preparedness and response a key focus of their business continuity planning. Chances are, your organization is taking a proactive approach and continually looking at ways to minimize the impact that potential crises can have on your business processes and technology systems. Yet, even though your company's business continuity plan most likely serves to protect your company's physical assets, such as its data, network(s), core business applications and facilities, how well does it address the human side of disasters?
Weather and climate disasters impact the South East the most
February 15th, 2012
Disaster Recovery and Business Continuity plans need to consider weather and other natural events. The effects that natural events have on the environment, directly and indirectly, may be harmful to people. Forest fires and volcanoes harm air quality. Hurricanes and floods can contaminate water supplies and damage wastewater facilities. Any of these can spread contaminated materials into the environment.
2011 went into the books as a year of environmental disasters on an unprecedented global scale that have affected the lives and livelihoods of billions of people. The United States alone set a record with 12 separate billion-dollar weather/climate disasters in 2011, with an aggregate damage total of approximately $52 billion, according to the National Oceanic and Atmospheric Administration. Thus, data backup and recovery has become a hot topic among IT managers.
Major Disaster Recovery Failure with an Outsource Provider
February 11th, 2012
Virginia's Department of Motor Vehicles, along with 25 other state agencies, hasn't been able to process requests for licenses and ID cards. These systems are supposed to be up and running six days after the outages started to appear. Northrop Grumman manages Virginia's IT infrastructure under a $2.3 billion IT services contract.
The Virginia Information Technologies Agency (VITA) said in a statement that teams have been working throughout the weekend to restore data. In a nutshell, the IT infrastructure of the state of Virginia was reportedly crushed by an EMC storage area network failure. The Richmond Times-Dispatch reports that several systems are still down. The same paper said that Northrop Grumman will have to pay a fine for the failure. And the real kicker is that the state recently revised its contract with Northrop Grumman and extended the deal for three years. The state paid an additional $236 million for better service from Northrop Grumman.
Highlights of the Revised Contract - Operational Efficiencies
- Consolidates and strengthens Performance Level Standards with a 15% increase in penalties across the board if Northrop Grumman fails to perform on clearly identified and measured performance standards. - PAY-UP
- Improves Incident Response teams to determine technology failures and expedite repair - FAILED
- Institutes clear performance measurements for Northrop Grumman that agencies can easily track - FAILED
- Adds new services to contract such as improved disaster recovery and enhanced security features - FAILED
Among the key parts of the VITA statement:
Successful repair to the storage system hardware is complete, and all but three or possibly four agencies out of the 26 agency systems have been restored. Agencies continue to perform verification testing.
Progress continues, but work is not yet complete for the three or four agencies that have some of the largest and most complex databases. These databases make the restoration process extremely time consuming. The unfortunate result is the agencies will not be able to process some customer transactions until additional testing and validation are complete.
According to the manufacturer of the storage system (EMC), the events that led to the outage appear to be unprecedented. The manufacturer reports that the system and its underlying technology have an exemplary history of reliability, industry-leading data availability of more than 99.999% and no similar failure in one billion hours of run time.
The outage was blamed on the failure of two circuit boards installed and maintained by EMC. It is a bit disconcerting that two circuit boards can bring down a state's IT infrastructure for nearly a week.
Among the things that don't add up in the Virginia IT outage:
- Why wouldn't these boards be replaced quickly?
- Why was there a single point of failure?
- Service was restored for 16 agencies, but 10 require a lengthy restoration of data. Where was the disaster planning? After all, Northrop Grumman touted its disaster recovery for the state just two years ago.
- Where did the IT management fail?
Tools for Disaster Recovery planning
February 2nd, 2012
When it comes to disaster recovery, rapidly growing data volumes, distributed computing models, and new technologies all combine to present an ever-changing playing field. Safe recovery distances can also mean painfully slow replication and backup across the WAN in addition to the costs to accomplish this.
Janco's "Disaster Recovery and Business Continuity Template" leads the way to implementation of the latest disaster recovery technologies and cost savings strategies. Enterprises of all sizes can build a functional disaster recovery plan with this tool and make their own disaster recovery efforts more efficient.
Business Continuity Plan is more than just paper
January 20th, 2012
Business continuity planning is about more than the IT components. The CEO and executive staff must define which business processes need protection and what the appropriate response is.
Even so, IT has several innate characteristics that make it well suited to disaster planning and implementation:
- Project planning: IT is accustomed to implementing new technology in a controlled fashion, giving IT staff experience in understanding and planning for the impact of change for maximum success.
- People/process/technology relationship understanding: There are two areas in which understanding this relationship is key to success. The implementation of new technology often changes process. Changes in process change the ways people interact with information systems. From advanced computers and applications to systems that allow physical building access, IT understands the people/process/technology relationship better than any other team in the company. In addition, IT also has a deep understanding of how supporting systems are critical to the delivery of, and access to, primary information systems. From Active Directory and DHCP to routers and firewalls, IT understands the key systems and the order in which they must be restored to deliver a complete service. This understanding facilitates business continuity and restoration.
- Experienced in disaster management: In complex IT environments, something is usually broken or has a problem. IT has the experience to quickly identify the problem, understand the impact and respond appropriately to the issue. This experience is vital in the high stress and dynamic environment of managing a disaster event.
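The point above about restoring supporting systems in the right order can be sketched as a dependency sort. The dependency map below is hypothetical, invented for the example; a real environment would document its own. The sketch uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists what must be running before it.
depends_on = {
    "firewall": [],
    "router": ["firewall"],
    "dhcp": ["router"],
    "active_directory": ["router", "dhcp"],
    "file_server": ["active_directory"],
}

# static_order() yields each system only after everything it depends on.
restore_order = list(TopologicalSorter(depends_on).static_order())
print(restore_order)
```

If the map ever contains a circular dependency, `static_order()` raises `graphlib.CycleError`, which is itself a useful finding to surface before a real disaster does.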
Disaster Recovery and Business Continuity a critical part of enterprise operations
January 8th, 2012
Disaster recovery is becoming an increasingly important aspect of enterprise computing. As devices, systems, and networks become ever more complex, there are simply more things that can go wrong. As a consequence, recovery plans have also become more complex, according to Janco Associates (the author of the Disaster Recovery Business Continuity Template). For example, fifteen or twenty years ago if there was a threat to systems from a fire, a disaster recovery plan might consist of powering down the mainframe and other computers before the sprinkler system came on, disassembling components, and subsequently drying circuit boards in the parking lot with a hair dryer. Current enterprise systems tend to be too large and complicated for such simple and hands-on approaches, however, and interruption of service or loss of data can have serious financial impact, whether directly or through loss of customer confidence.
Appropriate plans vary from one enterprise to another, depending on variables such as the type of business, the processes involved, and the level of security needed. Disaster recovery planning may be developed within an organization or purchased as a software application or a service. It is not unusual for an enterprise to spend 25% of its information technology budget on disaster recovery.
Nevertheless, the consensus within the DR industry is that most enterprises are still ill-prepared for a disaster. According to the Janco Associates Disaster Recovery Business Continuity web site, despite the number of very public disasters since 9/11, still only about 50 percent of companies report having a disaster recovery plan. Of those that do, nearly half have never tested their plan, which is tantamount to not having one at all.
eCommerce mandates business continuity management
December 14th, 2011
There's little doubt that business continuity management (BCM) must be front and center for today's payment card issuers: the potential cost implications of an unmanaged catastrophic incident within the supply chain for payment card issuers can run into millions of euros and cause wide-ranging reputational issues that may impact customer growth.
Lost data is critical to users
November 10th, 2011
The general lack of preparedness for disasters and business interruptions is surprising in light of the fact that 40% of users feel like they would never be able to recover, recreate or repurchase all of their documents and files if their personal computer crashed. It's even more surprising considering the insights that the study uncovered regarding the significant value many assign to their digital content, including:
- It is More Valuable Than Vacation Time
- It is Even More Precious Than My Wedding Ring
- I Would Pay Dearly to Get My Data Back
- I Would Sacrifice Something I Love to Save My Data
Users Place Too Much Trust in Their Hard Drives
Users are surprisingly trusting of their computer hard drives, particularly taking into account that over half have lost all of their personal files in a computer crash at some point. According to the study, 82% of users keep electronic files only, and the majority of these files are nowhere else but on their computer hard drive. The most popular files people store digitally are photos (55%), music (46%), resumes (42%), addresses (28%), phone numbers (27%), and financial documents (22%). Notably, the average user surveyed has more than $400 of digital music and movies on their computer and, for one in four, the music and movies are worth more than the computer itself.
How does ISO 27031 impact your disaster plan?
October 18th, 2011
ISO 27031:2011, the information and communications technology (ICT) continuity management standard developed originally by the British Standards Institution (BSI), was accepted as an ISO standard in 2011. It represents a management systems-based implementation of an IT disaster recovery program. It has six key principles:
- Protecting the ICT environment from incidents, failures and disruptions;
- Detecting incidents at the earliest possible time;
- Reacting to incidents as efficiently as possible;
- Recovering by identifying and implementing appropriate recovery strategies;
- Operating in disaster recovery mode;
- Returning to normal operations.
While ISO 27031 is intended for use in the larger context of a business continuity program, organizations have successfully implemented this standard and later grown into business continuity.
Structured as a management systems-based standard, ISO 27031 has two main components: the management system and the process. The management system is intended to ensure that an organization has a documented process to execute ICT continuity management. It utilizes the plan-do-check-act (PDCA) cycle consistent with ISO and other management system based standards. The process details the necessary components to provide the recovery capability. While the management system described in ISO 27031 can be established solely for IT disaster recovery, there are elements of the process that assume the existence of an overall business continuity program. ICT requirements are established by business continuity requirements, which are typically determined during a business impact analysis.
The process of developing, maintaining, and improving an ICT capability is defined as five high-level components:
- Understanding the ICT requirements for business continuity - with the purpose of determining the ICT continuity services needed to support the business continuity requirements. The process requires understanding the components of critical services in production, their current continuity capability and the gap between current capabilities and business continuity requirements. The analysis should also focus on actions that can be taken to improve the resiliency of the production environment;
- Determining ICT continuity strategies - with the purpose of developing both an overall ICT continuity management strategy and strategies for each critical ICT service that closes gaps identified during the previous phase;
- Developing and implementing ICT strategies - with the purpose of implementing the chosen strategies, including establishing the necessary organizational structure, plans and procedures;
- Exercising and testing - with the purpose of ensuring that the strategies and plans work as intended;
- Maintenance, review and improvement - with the purpose of ensuring that ICT continuity strategy remains current and appropriate.
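The first component above, finding the gap between current capability and business continuity requirements, can be illustrated with a small sketch. The service names and recovery-time figures are hypothetical, and measuring the gap in hours of recovery time is an assumption made for the example; ISO 27031 does not prescribe this calculation.

```python
# Hypothetical figures: service -> (current recovery hours, required recovery hours).
services = {
    "order database": (48, 8),
    "email": (12, 24),
    "web storefront": (6, 4),
}


def continuity_gaps(services):
    """Return services whose current recovery time exceeds the requirement, largest gap first."""
    gaps = {
        name: current - required
        for name, (current, required) in services.items()
        if current > required
    }
    return dict(sorted(gaps.items(), key=lambda item: item[1], reverse=True))


gaps = continuity_gaps(services)
for name, hours in gaps.items():
    print(f"{name}: recovery is {hours} hours slower than required")
```

The services that appear in the result are the ones for which a continuity strategy has to be chosen in the next component.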
For those familiar with BS 25999-2:2007, the business continuity management standard, the structure above is consistent with sections four through six of that standard.
Given the similarities to BS 25999, ISO 27031 is the logical choice for implementing a disaster recovery capability in organizations that either utilize BS 25999 for business continuity or have other management systems-based programs. It also provides solid guidance for organizations that have no business continuity or other structure in place to serve as a basis for disaster recovery development. Establishing a management system as part of an ISO 27031 implementation will provide the necessary governance and provide a platform for the development of a more comprehensive business continuity program.
Disaster recovery done in place should use outside experts
October 16th, 2011
Many organizations simply do not have the luxury of being able to move to an alternative recovery site following a physical disruption. In these cases disaster recovery plans should include the support of a disaster recovery company that will aid the internal recovery and incident team to mitigate secondary damage, administer triage to the affected areas and expedite the correct equipment, methods and manpower to restore their facility as quickly as possible to a suitable working environment, so that service can be resumed.
Such disaster recovery responders will be on 24/7 standby to attend the client site. The responder will have conducted a survey of the site in advance of an incident, noting critical information so that any recovery and restoration objectives will be expedited without delay.
Speed of response is vital, both to reduce the level of disruption and physical secondary damage and to limit the time in which function is lost. Dealing with an incident within the first few hours may reduce the total time of the disruptive event by weeks.
Europe is more vulnerable to natural disasters
October 12th, 2011
The number of natural hazards taking place in Europe has increased significantly, according to the United Nations disaster risk reduction agency. The agency is warning that the region's governments need to implement prevention platforms to significantly reduce the danger they pose to their populations.
In 2010, Europe saw an 18.2 percent increase in disaster events compared to the decade's averages according to the chief of the UN International Strategy for Disaster Reduction (UNISDR).
In terms of economic damages, Europe accounted for 14.3 percent of reported global disaster losses in 2010, with most of the damages caused by climatological and hydro meteorological events.
Although this is cause for concern, there is evidence that European governments are slowly implementing adequate disaster risk reduction measures:
National reports demonstrate a gradual evolution from a mindset of crisis and response to one of proactive risk reduction and safety. Countries that have or are going to establish national platforms (NPs) for disaster reduction are reporting significant and ongoing success in addressing cross cutting risk reduction issues - more than double compared to those countries without NPs.
The agency also highlighted Europe's participation in the 2010-2011 World Disaster Reduction Campaign - Making Cities Resilient:
Europe is the most active region in embracing the campaign: 378 European cities have joined the campaign to improve their resilience and to exchange their experiences and challenges.
Developing a Disaster Recovery Testing Process
October 10th, 2011
Most real disasters are much less well-structured than a test - so if you can't make the test work when you can plan for it in advance and stage everything just right, what chance will you have if the big one hits?
One way to get a workable DR plan is to do some up-front scenario analysis after the BIA is done and build up a set of layered responses to incidents of increasing severity. For the least serious impacts you can engineer high availability solutions - essentially disaster avoidance strategies. For disasters you can't avoid, you can build routine operational processes (things like rolling cluster upgrades, managed application failover, deliberate load shifting) that let you practice for a real problem, so your people are familiar with most of the work they'll need to do in a disaster. That will also exercise most of the technologies you'll need and ensure they're working reliably - and that the disaster won't be their first use.
Wiring meltdown can be a disaster
October 1st, 2011
The design of data centers and large computer rooms always includes a cooling system. Yet many IT devices are located in distributed spaces outside of the computer room in closets, branch offices, and other locations that were never designed with provisions for cooling IT equipment. The power density of IT equipment has increased over time and the result is that distributed IT equipment such as VoIP routers, switches or servers often overheat or fail prematurely due to inadequate cooling.
To properly specify the appropriate cooling solution for a wiring closet, the temperature at which that closet should operate must first be specified. IT equipment vendors usually provide a maximum temperature under which their devices are designed to operate. For active IT equipment typically found in a wiring closet, this temperature is usually 104°F (40°C). This is the maximum temperature at which the vendor is able to guarantee performance and reliability for the stated warranty period. It is important to understand that although the maximum published operating temperature is acceptable per the manufacturer, operating at that temperature will not generally provide the same level of availability or longevity as operating at lower temperatures. Because of this, some IT equipment vendors also publish recommended operating temperatures for their equipment in addition to the maximum allowed. Typical recommended operating temperatures from IT equipment vendors are between 70°F (21°C) and 75°F (24°C).
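The thresholds quoted above lend themselves to a simple monitoring check. The sketch below is an assumption-laden illustration: the function names and status labels are invented for the example, and the limits are the typical figures from the text rather than any particular vendor's specification.

```python
# Typical figures from the text; a specific device's data sheet takes precedence.
RECOMMENDED_MAX_F = 75.0
VENDOR_MAX_F = 104.0


def f_to_c(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


def closet_status(temp_f):
    """Classify a wiring closet temperature reading (labels are illustrative)."""
    if temp_f > VENDOR_MAX_F:
        return "over maximum"
    if temp_f > RECOMMENDED_MAX_F:
        return "above recommended"
    return "ok"


for reading in (72.0, 90.0, 110.0):
    print(f"{reading:.0f}F ({f_to_c(reading):.0f}C): {closet_status(reading)}")
```

A reading in the "above recommended" band is still within the vendor's stated limit, but as the text notes, running there for long periods is likely to cost availability and equipment life.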
Hardware solutions for business continuity
September 16th, 2011
A recent survey of 500 US companies finds more than 217 million person-hours were lost to unplanned interruption events: equivalent to approximately 65,000 workers idle and non-productive for an entire year! The same survey found that IT outages are considered substantially damaging to companies' reputations, staff morale and customer loyalty.
The response of the technology industry has focused on high availability (HA) as a hedge against downtime. Common memes include:
- The substitution of hypervisor-based server virtualization with on-the-fly guest machine re-hosting for conventional server computing
- The introduction of complex physical machine clustering in platforms supporting mission critical applications
- The adoption of Wide Area Network-based disk-to-disk data replication rather than traditional tape backup