Dynamic Partitioning in Windows Longhorn

Santosh Jodh
Software Design Engineer
Windows Kernel Platform Group
santoshj@microsoft.com
Microsoft Corporation

Mike Tricker
Program Manager
Windows Kernel Platform Group
miketri@microsoft.com
Microsoft Corporation

Session Outline
  Introduction to Dynamic Partitioning (DP)
  Clarifying the terminology
    Reliability, Availability & Serviceability (RAS)
    Capacity on Demand (CoD)
    Resource Management (RM)
    Hot Add, Replace & Remove
  Goals and non-goals for DP on Windows codenamed “Longhorn”
  What we’re expecting others to do to support this

Session Goals
  Attendees should leave this session with a good understanding of the following:
    What Microsoft means by Dynamic Partitioning
    DP-related terminology and acronyms
    Microsoft’s goals and non-goals for DP in Windows Longhorn
    Where to find resources for DP

An Introduction to Dynamic Partitioning
  A hardware partitionable server has the ability to create one or more isolated hardware partitions comprising processors, memory and I/O, each supporting a single Windows instance
  A dynamically partitionable server has the ability to add, replace or remove hardware within a partition without needing to reboot the OS instance within the partition

Why is this interesting?
  Hardware partition support has been available on some large servers for a number of years
  Windows is supported on hardware partitionable systems today, but does not support dynamic hardware partitioning
  With the projected increase in processor performance Microsoft expects a number of these features to become available on mid-range systems
  Microsoft plans to add support for dynamically partitionable hardware in Windows Longhorn

Why You Should Care About DP
  Microsoft believes that the capabilities that have previously been limited to expensive high end systems are moving into the mainstream
    Together with the introduction of multi-core processors this will make relatively small and inexpensive systems as powerful and reliable as today’s high end systems
    This will push highly fault-tolerant enterprise-critical applications such as large databases and management information applications onto less expensive platforms
  This means that a range of hardware that has not previously had to consider some of the issues with dynamic hardware will now need to
    In the same way that RAID 5 changed the way in which we considered disks in the 1990s

What Do We Mean By a Partition?
  [Figure: three partitions, each with its own OS, all running on a single system]
    Cell 1 and Cell 2: one scale-up application, e.g., Database (SQL)
    Cell 3 and Cell 4: multiple applications running on one OS (SQL, Exchange, Resource Management)
    Cell 5 and Cell 6: multiple Virtual Machines running on one OS (VM1, VM2, VM3 on Virtual Server)

Reliability, Availability and Serviceability
  Minimizing unplanned downtime due to failing hardware
    E.g. if a processor starts to show signs of failing (increasing number of corrected errors or thermal events) swap it with one that’s on standby without needing to reboot the computer (similar to a hot spare disk in RAID 5)

Capacity on Demand
  The ability to enable processors that are physically present in the computer but not enabled by default
    E.g.
    buy a system with 8 processors, only 4 of which are initially paid for, enabled and used by the OS, and then when the workload grows pay to enable 2 or 4 more

Resource Management
  Sharing resources between two or more partitions
    E.g. if the load on partition 1 is increasing whilst the workload on partition 2 is decreasing, move processors and/or memory from partition 2 to partition 1 to better handle the increasing workload

More Terminology
  Socket
    A physical socket into which a processor and/or memory may be plugged mechanically
    Sockets may also be independently powered
  Partition Unit (PU)
    A collection of system resources that form the smallest building blocks that can be assigned to a partition
    E.g. processors, memory and I/O host bridges
    More than one PU may be required to boot a partition

Yet More Terminology
  Hot Add
    Adding a socket or cell to a running partition
  Hot Remove
    Removing a socket or cell from a running partition
  Hot Replace
    Replacing a socket or cell in a running partition with one that is already physically present in the system but offline before the operation is started
    Note that Hot Replace is NOT the same as Hot Remove followed by Hot Add

And Yet More Terminology
  Hot Swap
    Some vendors support a model that does not require the stand-by hardware to be physically present before the Replace operation is started, and thus IS equivalent to a Hot Remove followed by a Hot Add
  Hot Plug
    A term typically covering Hot Add and Hot Remove

Assumptions We’re Making About Hardware
  Future partitionable machines will contain PUs which comprise
    Processors and memory together
    Processors
    Memory
    I/O host bridges
  The ACPI tables in those systems will be updated to expose specific methods required to support changing the hardware configuration without needing to reboot
  The firmware will be able to assist the OS during Hot Add and Hot Replace operations

More Hardware Assumptions
  Systems will include a Service Processor (SP) or Baseboard Management Controller (BMC)
  PUs can
  be electrically isolated when not in use
  No hardware assigned to a specific PU can be shared with other partitions, ensuring that a single failure cannot affect more than one partition

Dynamic Hardware Partitioning
  Longhorn dynamic hardware partitioning features are focused on improving server RAS
  [Figure: Future Hardware Partitionable Server, showing groups of cores with cache and memory, four IO Bridges, PCI Express, a Service Processor and a Partition Manager]
  1. Partition Manager provides the UI for partition creation and management
  2. Service Processor controls the inter processor and IO connections
  3. Platforms partitionable to the socket level. Virtualization used for sub socket partitioning
  4. Support for dynamic partitioning and socket replacement

Goals For Windows Longhorn
  Support the Hot Add of:
    Processors
    Memory
    I/O host bridges
  Support the Hot Replace of:
    Processors
    Memory
  OS support
    x64 and Itanium only; no 32-bit support will be provided
    Server SKUs only, and only SKUs supporting 4 processors or more

Non-Goals For Windows Longhorn
  Hot Remove
    Windows Longhorn will not support the Hot Remove of processors or memory
    However, tools will be supplied to allow both device driver and application developers to validate that they behave correctly in the case of a Hot Remove operation for either processors or memory
  Partition Manager
    Today’s Partition Managers are proprietary to each major OEM’s platform, and Microsoft will not be providing equivalent functionality in Windows Longhorn
    Microsoft will work with the system vendors to enable Windows DP support via their partition management tools
  SP & BMC “drivers”
    SPs and BMCs are devices that can be accessed from Windows via a device driver
    Windows Longhorn will include an IPMI driver which can communicate with SPs and BMCs via a standard interface, but will not provide specific drivers for any vendor’s SP or BMC

Supporting Technologies
  Windows Hardware Error Architecture (WHEA)
    Error infrastructure designed to support (amongst other things) DP, especially Hot Replace operations
    Making hardware error information more easily available for management applications to analyze and make failure predictions
    Extends the Machine Check Architecture available with the Intel Itanium platform
  Multi-level rebalance
    Windows Longhorn offers more sophisticated and extensive rebalance operations when hardware is added or removed
    This is not specific to DP, but will be leveraged by DP to make these operations as efficient as possible
  PCI Express and specifically Advanced Error Reporting (AER)
    The PCI bus is unable to report many errors, and most end up as NMIs
    PCI Express introduces AER and supports error correction, which will be exposed by WHEA for error prediction by management applications

Status of the Various Components
  Hot Add of memory is already supported by Windows Server 2003
    x86 support shipped in Windows Server 2003 RTM
    x64 & Itanium support was added in Windows Server 2003 Service Pack 1
  Hot Add of I/O
    Various device classes supporting Hot Plug are already available
    With Windows Longhorn the extended support for PCI Express devices makes this a very compelling feature
  Hot Add Processor support is now in test
    On x64 and Itanium
  Hot Replace for processors and memory is under development

What DP Implies to an Application Developer
  Add: applications can register for plug & play notifications of new hardware arriving
    Application developers with hard dependencies on memory or number of threads should watch for these notifications and update their behavior accordingly
    Resource management software, such as Microsoft’s WSRM, can abstract these changes such that the majority of applications do not need to explicitly handle these notifications
  Replace: applications will be unaffected and will see no change in the system
    Application developers need do nothing
  Remove: applications cannot make hard assumptions about
  memory or thread affinity
    Application developers cannot make assumptions about memory being fixed that they may do today, and should not rely upon thread affinity or the size of thread pools being static

What DP Implies to a Driver Developer
  Add: drivers can register for plug & play notifications of new hardware arriving
    Driver developers have fewer memory size limitations than application developers, and pool sizes will not change even if overall memory grows
    The addition of processors and the related interrupt routing changes should also be invisible to drivers
    So in the Add case most drivers will not do anything new
  Replace: drivers will be unaffected and will see no change in the system
    There are implications around device timeouts, as it will be necessary to quiesce the system whilst the replace operation completes
  Remove: drivers cannot make hard assumptions about memory or thread affinity
    Drivers cannot make any assumptions around thread affinity, or even that the affinity mask will remain contiguous as it is today

Logo Requirements and Testing
  NOTE: DP is a Server-only feature, so there are no new Client requirements arising from this feature
  A number of new requirements are being proposed for the Microsoft logo program for Server to support DP
    Most apply to either platform firmware or device drivers
      Specific ACPI method support
      Device drivers must not assume that the processor affinity mask is contiguous
  We will also be providing test tools to ensure that you’re ready for Hot Remove support in a subsequent Windows release
    These will apply to both applications and device drivers

Other Implications of DP
  What about NUMA?
    What happens to the System Resource Affinity Table (SRAT) or System Locality Distance Information Table (SLIT) when new hardware gets added?
    Nothing happens to the SRAT as it’s a static table updated (and read by Windows) only at boot time
      So it will be updated the first time the system reboots after hardware is added
    For Windows Longhorn we’re not making use of the SLIT nor supporting the _SLI method to update locality information dynamically, so again nothing needs to be done here

Summary
  Windows Longhorn is planned to contain support for:
    Hot Add of processors, memory and I/O host bridges
    Hot Replace of processors and memory
  Windows Longhorn will not contain support for:
    Hot Remove of memory and processors
    An in-box Partition Manager
  There are things you’ll need to do to:
    Enable DP on your systems
    Make use of the benefits offered by DP if your application is hardware-aware, and avoid failing when hardware changes underneath you
    Ensure that your device drivers work correctly on DP-capable systems

Call to Action
  Application developers can benefit from DP if they make their application DP-aware
  Driver developers need to make their drivers DP-aware to work well on DP-capable systems
  Either may fail completely if it is badly behaved when hardware changes beneath it
  You may already be talking to us if you’re interested in DP
    If you’re interested and aren’t yet talking to us then please do!

Community Resources
  Windows Hardware & Driver Central (WHDC)
    www.microsoft.com/whdc/default.mspx
  Technical Communities
    www.microsoft.com/communities/products/default.mspx
  Non-Microsoft Community Sites
    www.microsoft.com/communities/related/default.mspx
  Microsoft Public Newsgroups
    www.microsoft.com/communities/newsgroups
  Technical Chats and Webcasts
    www.microsoft.com/communities/chats/default.mspx
    www.microsoft.com/webcasts
  Microsoft Blogs
    www.microsoft.com/communities/blogs

Additional Resources
  Email: dpfb@microsoft.com
  Related Sessions
    Windows Hardware Error Architecture
    Error Management Solutions Synergy with WHEA

© 2005 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.
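
Illustrative sketch
  The session states that drivers and applications must not assume the processor affinity mask remains contiguous once hardware can be added or removed. The short example below is one possible way to picture that point. It is only a sketch under stated assumptions: the mask is modeled as a plain integer in which bit n set means logical processor n is present, and the helper names (processors_in_mask, is_contiguous, naive_processor_count) are hypothetical and are not Windows APIs.

```python
from typing import List


def processors_in_mask(mask: int) -> List[int]:
    """Return the indices of all set bits in the mask, lowest first.

    Hypothetical helper: walks every bit instead of assuming the
    processors occupy bits 0..N-1 without gaps.
    """
    if mask < 0:
        raise ValueError("affinity mask must be non-negative")
    indices = []
    index = 0
    while mask:
        if mask & 1:
            indices.append(index)
        mask >>= 1
        index += 1
    return indices


def is_contiguous(mask: int) -> bool:
    """True only if the set bits form one unbroken run starting at bit 0."""
    # A value of the form 0b0...01...1 plus one is a power of two,
    # so it shares no bits with the original value.
    return mask != 0 and (mask & (mask + 1)) == 0


def naive_processor_count(mask: int) -> int:
    """The fragile assumption: highest set bit + 1 equals the processor count."""
    return mask.bit_length()


def processor_count(mask: int) -> int:
    """Count processors by counting set bits, which tolerates gaps."""
    return len(processors_in_mask(mask))
```

  With a mask of 0b1011 (processors 0, 1 and 3 present, processor 2 absent), the naive count reports 4 while only 3 processors are actually present. Code that sizes per-processor structures or thread pools from the highest set bit would therefore misbehave on such a mask, whereas code that enumerates the set bits would not.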