"Good" hardware is integral for a data warehouse and its software to function efficiently, and the architect of the warehouse must be "hardware aware". As each hardware and software technology advances, so do data warehouses with the advent of, for example, new nonvolatile memory (NVM) and high-speed networks for base support. This article focuses on the need to develop and adopt new management and analysis methods.
Research Status
The new hardware will change traditional computing, storage, and network systems and have a direct impact on the architecture and design of data management and analysis systems. It will also pose challenges to core functionalities and related key technologies, including indexing, analysis, and transaction processing. In the following, we review the current state of relevant domestic and international research.
System Architecture and Design Schemes in Platforms with New Hardware
The advent of high-performance processors and new accelerators has driven the shift from single-CPU architectures to heterogeneous, hybrid processing architectures. Data processing strategies and optimization techniques have evolved from standardization to customization and from software optimization to hardware optimization. From the classical iterative pipeline processing model to the column processing model and finally to the vector processing model, which combines the two, the traditional software-based data processing model has reached a mature stage. The advent of just-in-time (JIT) compilation techniques and optimizations combined with vector processing models still provides new optimization space at the register level. However, as research deepens, software-based optimization techniques are gradually reaching their "ceiling". Academia and industry are therefore beginning to explore new ways to accelerate data processing through software/hardware co-design. Instruction-level optimization, coprocessor query optimization, hardware customization, workload migration to hardware, increased hardware-level parallelism, and hardware-level operators are used to provide performance optimization at the hardware level. However, the differences between new processors and x86 processors fundamentally change the assumptions that traditional database software design makes about hardware. Databases face more complicated architectural issues on heterogeneous computing platforms. In the future, researchers will need to break away from the software-centric design philosophy that served traditional database systems well.
The design of traditional databases must trade off among a number of important factors, such as latency, storage capacity, cost effectiveness, and the choice between volatile and nonvolatile storage devices. The unique properties of NVM bring opportunities for the development of data management systems, but they also introduce new constraints. The literature has conducted forward-looking research and exploration in this area. The results show that neither disk-oriented nor memory-oriented systems are ideally suited for an NVM-based storage hierarchy, especially when workload skew is high. The authors also found that storage I/O is no longer the main performance bottleneck in the NVM storage environment; instead, the considerable cost of organizing, updating, and replacing data becomes the new bottleneck. For an NVM-based storage hierarchy, write-ahead logging (WAL) and the logical logs common in traditional databases also perform a large number of unnecessary operations. These issues indicate that diversified NVM-based storage hierarchies will impose new requirements on cache replacement, data distribution, data migration, metadata management, query execution planning, fault recovery, and other aspects, calling for corresponding design strategies that better fit the new environment. Therefore, NVM-specific or NVM-oriented architectures designed to exploit the nonvolatile property of NVM are necessary. Research in this area is just beginning; CMU's N-Store presents exploratory work on building a prototype database system on an NVM-based storage hierarchy.
RDMA-enabled networks are changing the assumption, built into traditional distributed data management systems, that network I/O is the primary performance bottleneck. Some systems have introduced RDMA, but they merely bolt RDMA optimizations onto an architecture designed for slower networks, so they cannot take full advantage of the opportunities RDMA presents. It has been demonstrated that simply migrating a legacy system to an RDMA-enabled network cannot fully exploit the benefits of the high-performance network; neither the shared-nothing architecture nor the distributed shared-memory architecture brings out the full potential of RDMA. For a shared-nothing architecture, the optimization goal is to maximize data locality, but neither static nor dynamic partitioning fundamentally resolves the problem of frequent network communication. Even with IPoIB (IP over InfiniBand) support, it is difficult for a shared-nothing architecture to gain the most improvement in the data-flow and control-flow models simultaneously. Similarly, distributed shared-memory architectures have no built-in support for cache coherency; accessing the cache via RDMA can have significant performance side effects if mismanaged by clients, and garbage collection and cache management may be affected as well. In RDMA networks, a uniform abstraction over remote and local memory has proven inefficient. Research shows that emerging RDMA-enabled high-performance network technologies necessitate a fundamental redesign of the way we build distributed data management systems.
Storage and Indexing Techniques in Platforms with New Hardware
Since NVM can serve as both internal and external memory, the boundary between the two is blurred, making the composition of the underlying NVM storage diverse. Because different NVMs have their own characteristics in access latency, durability, and so on, they can in theory replace traditional storage media without changing the storage hierarchy, or be mixed with them. In addition, NVMs can enrich the storage hierarchy as a last-level cache (LLC) or as a new cache layer between RAM and HDD, further reducing read/write latency across storage tiers. This variety in the storage environment also adds considerable complexity to the implementation of data management and analysis techniques.
From the data management perspective, how to integrate NVM into the I/O stack is an important research topic. There are two typical ways to abstract NVM: as a persistent heap or as a file system. Because NVM can exchange data directly with the processor over the memory bus or a dedicated bus, memory objects that need not be serialized to disk can be created directly on the heap. Typical research works include NV-heaps, Mnemosyne, and HEAPO. The NVM memory environment also brings a series of new problems to solve, such as ordering, atomic operations, and consistency guarantees. In contrast, a file-based NVM abstraction can take advantage of the semantics of existing file systems for namespaces, access control, read-write protection, and so on. In such a design, however, in addition to fully exploiting the fine-grained addressing and in-place update capabilities of NVM, the impact of frequent writes on NVM lifetime must be considered. Moreover, the file abstraction has a long data access path, which implies unnecessary software overhead. PRAMFS, BPFS, and SIMFS are typical NVM-oriented file systems. Either abstraction, persistent memory heap or file system, can become the basic building block for upper-layer data processing technology. However, they provide only low-level guarantees of atomicity; higher-level features (such as transaction semantics and nonvolatile data structures) still require corresponding changes and improvements in the upper data management system.
Different performance characteristics of NVMs will affect the data access and processing strategies of heterogeneous processor architectures. Conversely, the computing characteristics of the processors also affect the storage policy. For example, under a coupled CPU/GPU architecture, data distribution and exchange should be designed around low-latency CPU data access and large-granularity GPU data processing. In addition, if NVM can add a large-capacity, low-cost, high-performance storage tier under traditional DRAM, hardware accelerators such as GPUs can access NVM storage directly through a dedicated data channel, bypassing main memory and reducing the data transfer cost of traditional storage hierarchies. We need to recognize that hybrid storage and heterogeneous computing architectures will coexist for a long time. The ideal technical route is to divide data processing into stages according to workload type and to concentrate computation on smaller data sets, accelerating critical workloads with hardware accelerators. Concentrating 80% of the computational load on 20% of the data is the ideal state, as it simplifies data and computing distribution strategies across hybrid storage and heterogeneous computing platforms.
In addition to improving data management and analytics support at the storage level, indexes are key technologies for organizing data efficiently and accelerating upper-level analytics. Traditional local indexing and optimization techniques based on B+-trees, R-trees, or KD-trees are designed for block-level storage. Because NVM differs significantly from disk, the effectiveness of existing indexes can be severely affected in NVM storage environments. For NVM, the large number of writes caused by index updates not only shortens device lifespan but also degrades performance. To reduce frequent small writes, merge updates or late updates, which are commonly used in flash indexes, are typical approaches. Future indexing technologies for NVM should control more effectively the read/write paths and affected regions that index updates produce, and should enable layered indexing across the NVM storage hierarchy. At the same time, concurrency control on indexes such as the B+-tree shows obvious scalability bottlenecks in highly concurrent heterogeneous computing architectures. The coarse-grained range locks of traditional index structures and the critical sections guarded by latches are the main limits on concurrency. Optimization techniques such as multi-granularity locking and latch avoidance increase the concurrency of index accesses and updates, but they unavoidably introduce consistency-verification overhead, higher transaction failure rates, and additional costs. Future indexing technology for highly concurrent heterogeneous computing infrastructure needs more lightweight and flexible lock protocols that balance consistency, concurrency, and protocol simplicity.
Query Processing and Optimization in Platforms with New Hardware
The basic assumptions that traditional query algorithms and data structures make about the underlying storage environment do not hold for NVM. Therefore, traditional algorithms and data structures have difficulty achieving their intended performance in NVM storage environments.
Reducing NVM writes is a major strategy in previous studies. A number of techniques, including avoiding unnecessary writes, write cancellation and write pausing, dead-write prediction, cache-coherence-enabled refresh policies, and PCM-aware swap algorithms, have been used to optimize NVM writes. Algorithms benefit directly from these low-level optimizations in drivers, the FTL, and the memory controller, but they can also be optimized at a higher level. At that level, there are two ways to control or reduce NVM writes: one exploits extra DRAM caches to absorb NVM write requests; the other uses low-overhead NVM reads and on-the-fly recomputation to avoid costly NVM writes. To reduce writes further, even some constraints on data structures or algorithms can be appropriately relaxed. How to design, organize, and operate write-limited algorithms and data structures is thus an urgent question. It is important to note that with NVM's asymmetric read/write costs, the major performance determinant has shifted from the ratio of sequential to random disk I/O to the ratio of NVM reads to writes. As a result, previous cost models inevitably fail to characterize NVM access patterns accurately. Heterogeneous computing architectures and hybrid storage hierarchies further complicate cost estimation in new hardware environments, so ensuring the validity and correctness of the cost model is also a challenging issue. In the NVM storage environment, the basic design principle for NVM-oriented algorithms and data structures is to reduce NVM write operations as much as possible; in other words, write-limited (NVM-aware, NVM-friendly) algorithms and data structures are the promising strategy.
From the perspective of processor development, query optimization technologies have gone through several stages as hardware evolved. The optimization goals differed significantly across these stages, from mitigating disk I/O, to designing cache-conscious data structures and access methods, to developing efficient parallel algorithms. Today, processor technology is moving from multi-core to many-core, which differs greatly from multi-core processors in core integration, number of threads, cache structure, and memory access. The optimization targets have shifted to SIMD, GPUs, APUs, Xeon Phi coprocessors, and FPGAs, and query optimization is becoming ever more dependent on the underlying hardware. But current query optimization techniques for new hardware are in an awkward position: lacking a holistic view of evolving hardware, algorithms require constant changes to accommodate different hardware features. From the perspective of overall architecture, query optimization becomes even more difficult under new hardware architectures.
In join optimization, a hot research topic in recent years has been whether hardware-conscious or hardware-oblivious algorithm designs are the better choice for new hardware environments. Hardware-conscious algorithms pursue the highest performance; their guiding principle is to exploit hardware-specific characteristics fully when optimizing join algorithms. Hardware-oblivious algorithms instead pursue generality and simplicity; their guiding principle is to design join algorithms around characteristics common to all hardware. Competition between the two routes has intensified in recent years, moving from basic CPU platforms to NUMA platforms, and it will certainly extend to new processor platforms. The underlying reason is that optimization techniques are difficult to quantify: in the field of in-memory database technology, despite numerous hash structures and join algorithms, the simple question of which in-memory hash join algorithm is best still has no settled answer. As new hardware environments grow more complicated, performance should not be the only indicator for evaluating algorithms; more attention should be paid to their adaptability and scalability on heterogeneous platforms.
Transaction Processing in Platforms with New Hardware
Recovery and concurrency control are the core functions of transaction processing in a DBMS, and both are closely related to the underlying storage and computing environment. Distributed transaction processing is also closely tied to the network environment.
WAL-based recovery methods are significantly affected in NVM environments. First, because data written to NVM is persistent, transactions need not be forced to disk at commit, and the flush-before-commit rule of WAL design is broken. Moreover, because of NVM's high-speed random read/write capability, the advantages of caching turn into disadvantages; in extreme cases, a transaction's updates may be redundantly stored in several locations (log buffers, swap areas, disks). The NVM environment not only undermines the assumptions and strategies of WAL but also raises new issues, the most fundamental being how to guarantee atomic NVM writes. Hardware-level primitives and processor cache optimizations can partially address this problem. In addition, because modern processors execute out of order, writes to NVM must be serialized to preserve the order of log records. Memory barriers are the main solution, but the read/write latency they introduce in turn degrades WAL-based transaction throughput. Thus, the introduction of NVM changes the assumptions behind log design, which inevitably introduces new technical issues.
Log technology is tightly coupled to the NVM environment. When NVM simply replaces external storage, the goal of log optimization is to reduce the software side effects of ARIES-style logging. In the NVM memory environment, the setting can be further subdivided into hybrid DRAM/NVM and NVM-only. There are currently many optimization technologies for NVM-oriented logging, including two-layer logging architectures, log redundancy elimination, cost-effective approaches, and decentralized logging. In some sense, the existing logging technologies for NVM are stop-gap solutions. For future NVM-based systems, the ultimate goal is logging for a pure-NVM environment in which the entire storage system, including the processor's multi-level cache, consists of NVM. Current research generally pays little attention to the limited write endurance of NVM; finding a better trade-off among high-speed I/O, nonvolatility, and poor write tolerance is therefore the focus of future NVM logging research.
Concurrency control protects the isolation property of transactions. Viewed from the execution logic of the upper layers, concurrency control protocols appear isolated from and transparent to the underlying storage. In essence, however, the implementation of concurrency control and its share of system overhead are closely related to the storage environment. Under the traditional two-tier storage hierarchy, the in-memory cost of the lock manager is almost negligible because disk I/O bottlenecks dominate. In an NVM storage environment, as I/O costs fall, the memory overhead of the lock manager becomes a new bottleneck. Moreover, on multi-core processors, the tension between the high parallelism offered by abundant hardware contexts and the need to maintain data consistency further aggravates the complexity of lock-based concurrency control. Traditional blocking strategies, such as blocking synchronization and busy-waiting, are difficult to apply. To reduce concurrency control overhead, contention for the lock manager's resources must be controlled. Research focuses mainly on reducing the number of locks, with three main approaches: latch-free data structures, lightweight concurrency primitives, and distributed lock managers. In addition, under MVCC there is tight physical coupling between the index and the multi-version records, which causes serious performance degradation when the index is updated. In a hybrid NVM environment, an intermediate layer built from low-latency NVM can decouple the physical representation of multi-version records from their logical representation. This approach of building a new storage layer to ease read/write bottlenecks is worth learning from.
How to improve the scalability of distributed transactions has always been the central question in building distributed transactional systems. On the basis of many previous studies, researchers have reached a basic consensus: it is difficult to guarantee scalability when a system runs a large number of distributed transactions. Research therefore focuses on avoiding distributed transactions and on controlling and optimizing their proportion. Most of these techniques are not transparent to application developers and must be carefully controlled or preprocessed at the application layer. On the other hand, many studies explore relaxing strict transaction semantics; against this background, the data management paradigm has shifted from SQL to NoSQL to NewSQL. This development shows once again that, for many critical applications, the transaction mechanism cannot be forgone even when scalability is required. These requirements are hard to meet in traditional networks, where limited bandwidth, high latency, and high overhead make distributed transactions unscalable. With RDMA-enabled high-performance networks, the previously unmanageable hardware limitations and software costs are expected to be largely mitigated: RDMA addresses the two most important difficulties in scaling traditional distributed transactions, limited bandwidth and high CPU overhead in data transfer. Some RDMA-aware data management systems have emerged, but they are primarily concerned with direct RDMA support at the storage tier, and transaction processing is not their focus. Other RDMA-aware systems do focus on transaction processing, but they still adopt centralized managers that limit the scalability of distributed transactions.
In addition, although some data management systems that fully adopt RDMA support distributed transactions, they offer only limited isolation levels, such as serializability and snapshot isolation. Relevant research shows that the underlying architecture of data management should be redesigned, for example by separating storage from computing. Only then is it possible to exploit all the advantages of RDMA-enabled networks and achieve fully scalable distributed transactions.
The new hardware and environments have many encouraging and superior features, but it is impossible to enjoy a "free lunch" simply by migrating existing technologies onto the new platforms. Traditional data management and analysis techniques are built on the x86 architecture, a two-tier storage hierarchy, and TCP/IP over Ethernet. The huge differences introduced by heterogeneous computing architectures, nonvolatile hybrid storage environments, and high-performance networks mean that traditional design principles and rules of thumb no longer apply. In addition, in the new hardware environment, inefficient or useless components and technologies in traditional data management and analysis systems also severely limit hardware efficiency. Meanwhile, as hardware environments diversify, corresponding architectures and technologies are lacking. Future research must address both the overall system and its core functions. Given the coexistence of traditional and new hardware, and the common problem of extracting and abstracting differentiated hardware, future work should study appropriate architectures, components, strategies, and technologies based on perception, customization, integration, adaptation, and even reconstruction, in order to release the computing and storage capacity that the new hardware brings.