Non-uniform memory access

By Wikipedia

Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory faster than non-local memory (memory local to another processor or memory shared between processors). The benefits of NUMA are limited to particular workloads, notably on servers where the data are often associated strongly with certain tasks or users.[1]

NUMA architectures logically follow in scaling from symmetric multiprocessing (SMP) architectures. They were developed commercially during the 1990s by Burroughs (later Unisys), Convex Computer (later Hewlett-Packard), Honeywell Information Systems Italy (HISI) (later Groupe Bull), Silicon Graphics (later Silicon Graphics International), Sequent Computer Systems (later IBM), Data General (later EMC), and Digital (later Compaq, now HP). Techniques developed by these companies later featured in a variety of Unix-like operating systems, and to an extent in Windows NT.

The first commercial implementation of a NUMA-based Unix system was the Symmetrical Multi Processing XPS-100 family of servers, designed by Dan Gielan of VAST Corporation for Honeywell Information Systems Italy.

Basic concept

One possible architecture of a NUMA system. The processors connect to the bus or crossbar by connections of varying thickness/number. This shows that different CPUs have different access priorities to memory based on their relative location.

Modern CPUs operate considerably faster than the main memory they use. In the early days of computing and data processing, the CPU generally ran slower than its own memory. The performance lines of processors and memory crossed in the 1960s with the advent of the first supercomputers. Since then, CPUs have increasingly found themselves "starved for data", stalling while they wait for data to arrive from memory. Many supercomputer designs of the 1980s and 1990s focused on providing high-speed memory access as opposed to faster processors, allowing the computers to work on large data sets at speeds other systems could not approach.

Limiting the number of memory accesses provided the key to extracting high performance from a modern computer. For commodity processors, this meant installing an ever-increasing amount of high-speed cache memory and using increasingly sophisticated algorithms to avoid cache misses. But the dramatic increase in size of the operating systems and of the applications run on them has generally overwhelmed these cache-processing improvements. Multi-processor systems without NUMA make the problem considerably worse. Now a system can starve several processors at the same time, notably because only one processor can access the computer's memory at a time.[2]

NUMA attempts to address this problem by providing separate memory for each processor, avoiding the performance hit when several processors attempt to address the same memory. For problems involving spread data (common for servers and similar applications), NUMA can improve the performance over a single shared memory by a factor of roughly the number of processors (or separate memory banks).[3] Another approach to addressing this problem, used mainly by non-NUMA systems, is the multi-channel memory architecture, in which multiple memory channels increase the number of simultaneous memory accesses.[4]
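The scaling claim above can be illustrated with a toy contention model. This is only a sketch under simplifying assumptions that are not from the cited sources: each memory bank serves at most one request per cycle, each processor issues its requests strictly in order, and the function name `cycles_to_finish` is hypothetical.

```python
from collections import deque


def cycles_to_finish(requests_per_cpu):
    """Return the cycles needed to serve every request in a toy model.

    requests_per_cpu[i] is the ordered list of bank ids that CPU i accesses.
    Assumption: a bank serves at most one request per cycle, and a CPU
    waits (stalls) until its current request has been served.
    """
    queues = [deque(reqs) for reqs in requests_per_cpu]
    cycles = 0
    while any(queues):
        busy_banks = set()
        for queue in queues:
            # Serve the CPU's next request only if its bank is still free.
            if queue and queue[0] not in busy_banks:
                busy_banks.add(queue.popleft())
        cycles += 1
    return cycles


CPUS, ACCESSES = 4, 1000

# Single shared memory: every CPU competes for bank 0.
shared = cycles_to_finish([[0] * ACCESSES for _ in range(CPUS)])

# NUMA with perfectly spread data: CPU i only touches its own bank i.
numa = cycles_to_finish([[cpu] * ACCESSES for cpu in range(CPUS)])

print(f"shared memory: {shared} cycles, NUMA: {numa} cycles, "
      f"ratio: {shared / numa:.1f}")
```

In this model the ratio equals the number of processors only because every access is local; real systems fall short of that bound as soon as data is shared between processors.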

Of course, not all data ends up confined to a single task, which means that more than one processor may require the same data. To handle these cases, NUMA systems include additional hardware or software to move data between memory banks. This operation slows the processors attached to those banks, so the overall speed increase due to NUMA depends heavily on the nature of the running tasks.[3]
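The dependence on the workload can be made concrete with a weighted-average latency estimate. The sketch below is a simplification, not a measurement: the function name `average_latency_ns` and the latency figures are hypothetical, and it ignores the slowdown that data movement imposes on the processors attached to the affected banks.

```python
def average_latency_ns(local_ns, remote_ns, remote_fraction):
    """Weighted average of local and remote access latency.

    remote_fraction is the share of accesses (0.0 to 1.0) that must be
    served from another processor's memory.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


# Hypothetical latencies chosen only for illustration.
LOCAL_NS = 100.0
REMOTE_NS = 160.0

for fraction in (0.0, 0.25, 0.5, 1.0):
    latency = average_latency_ns(LOCAL_NS, REMOTE_NS, fraction)
    print(f"remote fraction {fraction:.2f}: {latency:.1f} ns on average")
```

The more of a task's data lives in another processor's memory, the closer its average access time moves toward the remote figure, which is why the benefit of NUMA varies so much between workloads.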

Intel announced NUMA compatibility for its x86 and Itanium servers in late 2007 with its Nehalem and Tukwila CPUs.[5] Both CPU families share a common chipset; the interconnection is called Intel QuickPath Interconnect (QPI).[6] AMD implemented NUMA with its Opteron processor (2003), using HyperTransport. Freescale's NUMA for PowerPC is called CoreNet.

Cache coherent NUMA (ccNUMA)

Topology of a ccNUMA Bulldozer server.

Nearly all CPU architectures use a small amount of very fast non-shared memory known as cache to exploit locality of reference in memory accesses. With NUMA, maintaining cache coherence across shared memory has a significant overhead. Although simpler to design and build, non-cache-coherent NUMA systems become prohibitively complex to program in the standard von Neumann architecture programming model.[7]

Typically, ccNUMA uses inter-processor communication between cache controllers to keep a consistent memory image when more than one cache stores the same memory location. For this reason, ccNUMA may perform poorly when multiple processors attempt to access the same memory area in rapid succession. Support for NUMA in operating systems attempts to reduce the frequency of this kind of access by allocating processors and memory in NUMA-friendly ways and by avoiding scheduling and locking algorithms that make NUMA-unfriendly accesses necessary.[8]
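To see why repeated access to the same memory area is costly, consider a toy write-invalidate model. It is a sketch only and does not reproduce any real coherence protocol: the class `ToyCoherence` is hypothetical, it tracks nothing but which processors hold a copy of a line, and it counts one invalidation message per remote copy that a write must discard.

```python
class ToyCoherence:
    """Minimal write-invalidate bookkeeping (illustrative assumption)."""

    def __init__(self):
        self.holders = {}        # line address -> set of CPU ids holding a copy
        self.invalidations = 0   # messages sent to discard remote copies

    def read(self, cpu, line):
        # A read adds this CPU to the set of holders of the line.
        self.holders.setdefault(line, set()).add(cpu)

    def write(self, cpu, line):
        # A write must invalidate every copy held by another CPU.
        others = self.holders.get(line, set()) - {cpu}
        self.invalidations += len(others)
        self.holders[line] = {cpu}


WRITES = 100

# Two CPUs alternately write the same line: copies bounce back and forth.
contended = ToyCoherence()
for i in range(WRITES):
    contended.write(cpu=i % 2, line=0)

# Each CPU writes only its own line: no remote copies ever exist.
private = ToyCoherence()
for i in range(WRITES):
    private.write(cpu=i % 2, line=i % 2)

print("contended line:", contended.invalidations, "invalidations")
print("private lines: ", private.invalidations, "invalidations")
```

Every write after the first in the contended case triggers a message between cache controllers, while the private case generates none. This is the access pattern that NUMA-aware allocation and scheduling try to avoid.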

Alternatively, cache coherency protocols such as the MESIF protocol attempt to reduce the communication required to maintain cache coherency. Scalable Coherent Interface (SCI) is an IEEE standard defining a directory-based cache coherency protocol to avoid scalability limitations found in earlier multiprocessor systems. For example, SCI is used as the basis for the NumaConnect technology.[9][10]

As of 2011, ccNUMA systems are multiprocessor systems based on the AMD Opteron processor, which can be implemented without external logic, and the Intel Itanium processor, which requires the chipset to support NUMA. Examples of ccNUMA-enabled chipsets are the SGI Shub (Super hub), the Intel E8870, the HP sx2000 (used in the Integrity and Superdome servers), and those found in NEC Itanium-based systems. Earlier ccNUMA systems such as those from Silicon Graphics were based on MIPS processors and the DEC Alpha 21364 (EV7) processor.

NUMA vs. cluster computing

One can view NUMA as a tightly coupled form of cluster computing. The addition of virtual memory paging to a cluster architecture can allow the implementation of NUMA entirely in software. However, the inter-node latency of software-based NUMA remains several orders of magnitude greater (slower) than that of hardware-based NUMA.[1]

Software support

Because NUMA strongly influences memory access performance, certain software optimizations are needed so that threads and processes can be scheduled close to their data.

  • Microsoft Windows 7 and Windows Server 2008 R2 added NUMA support for systems with more than 64 logical cores.[11]
  • Java 7 added support for a NUMA-aware memory allocator and garbage collector.[12]
  • The Linux kernel 2.5 already had basic support built in,[13] which was further extended in subsequent releases. Linux kernel version 3.8 brought a new NUMA foundation that allowed more efficient NUMA policies to be built in later kernel releases.[14][15] Linux kernel version 3.13 brought numerous policies that attempt to put a process near its memory, together with handling of cases such as pages shared between processes and transparent huge pages; new sysctl settings allow NUMA balancing to be enabled or disabled and various NUMA memory balancing parameters to be configured.[16][17][18]
  • OpenSolaris models NUMA architecture with lgroups.
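On Linux, the NUMA layout and the balancing switch mentioned above can be inspected from user space. The sketch below assumes the conventional locations `/sys/devices/system/node/node*/cpulist` and `/proc/sys/kernel/numa_balancing`; these files may be missing on kernels built without NUMA support or on other operating systems, so the helpers (whose names are hypothetical) return empty results instead of failing.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel CPU list such as '0-3,8-11' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list describes a node without CPUs
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Map NUMA node id -> CPU ids; empty if the directory is absent."""
    nodes = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return nodes
    for entry in root.glob("node[0-9]*"):
        try:
            nodes[int(entry.name[4:])] = parse_cpulist(
                (entry / "cpulist").read_text())
        except (OSError, ValueError):
            continue  # skip entries that cannot be read or parsed
    return nodes


def numa_balancing_setting(path="/proc/sys/kernel/numa_balancing"):
    """Return the raw sysctl value as a string, or None if unavailable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


print("NUMA nodes:", numa_nodes() or "not exposed on this system")
print("numa_balancing:", numa_balancing_setting())
```

With the node-to-CPU map in hand, a process could be restricted to the CPUs of one node, for example with `os.sched_setaffinity` on Linux, so that its threads stay close to memory allocated on that node.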

References

  1. ^ a b Nakul Manchanda; Karan Anand (2010-05-04). "Non-Uniform Memory Access (NUMA)". New York University. Retrieved 2014-01-27. 
  2. ^ Sergey Blagodurov; Sergey Zhuravlev; Mohammad Dashti; Alexandra Fedorov (2011-05-02). "A Case for NUMA-aware Contention Management on Multicore Systems" (PDF). Simon Fraser University. Retrieved 2014-01-27. 
  3. ^ a b Zoltan Majo; Thomas R. Gross (2011). "Memory System Performance in a NUMA Multicore Multiprocessor" (PDF). ACM. Retrieved 2014-01-27. 
  4. ^ "Intel Dual-Channel DDR Memory Architecture White Paper" (PDF, 1021 KB) (Rev. 1.0 ed.). Infineon Technologies North America and Kingston Technology. September 2003. Archived from the original on 2011-09-29. Retrieved 2007-09-06. 
  5. ^ Intel Corp. (2008). Intel QuickPath Architecture [White paper]. Retrieved from http://www.intel.com/pressroom/archive/reference/whitepaper_QuickPath.pdf
  6. ^ Intel Corporation. (September 18th, 2007). Gelsinger Speaks To Intel And High-Tech Industry's Rapid Technology Caden [Press release]. Retrieved from http://www.intel.com/pressroom/archive/releases/2007/20070918corp_b.htm
  7. ^ "ccNUMA: Cache Coherent Non-Uniform Memory Access". slideshare.net. 2014. Retrieved 2014-01-27. 
  8. ^ Per Stenström; Truman Joe; Anoop Gupta (2002). "Comparative Performance Evaluation of Cache-Coherent NUMA and COMA Architectures" (PDF). ACM. Retrieved 2014-01-27. 
  9. ^ David B. Gustavson (September 1991). "The Scalable Coherent Interface and Related Standards Projects". SLAC Publication 5656. Stanford Linear Accelerator Center. Retrieved January 27, 2014. 
  10. ^ "The NumaChip enables cache coherent low cost shared memory". Numascale.com. Retrieved 2014-01-27. 
  11. ^ NUMA Support (MSDN)
  12. ^ Java HotSpot™ Virtual Machine Performance Enhancements
  13. ^ "Linux Scalability Effort: NUMA Group Homepage". sourceforge.net. 2002-11-20. Retrieved 2014-02-06. 
  14. ^ "1.8. Automatic NUMA balancing". Linux 3.8. kernelnewbies.org. 2013-02-08. Retrieved 2014-02-06. 
  15. ^ Jonathan Corbet (2012-11-14). "NUMA in a hurry". LWN.net. Retrieved 2014-02-06. 
  16. ^ "1.6. Improved performance in NUMA systems". Linux 3.13. kernelnewbies.org. 2014-01-19. Retrieved 2014-02-06. 
  17. ^ "Documentation/sysctl/kernel.txt". Linux kernel documentation. kernel.org. Retrieved 2014-02-06. 
  18. ^ Jonathan Corbet (2013-10-01). "NUMA scheduling progress". LWN.net. Retrieved 2014-02-06. 

This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.
