Formalizing Adaptive Sampling Strategies in Database Management System Performance Modeling Using Transfer Learning
DOI: https://doi.org/10.63125/kv173d65
Keywords: Transfer Learning, DBMS Tuning, Adaptive Sampling, Bhattacharyya Distance, Parameter Classification, Performance Modelling, Active Learning, Configuration Optimisation
Abstract
Database Management Systems (DBMSs) sit at the heart of modern data-driven applications, exposing hundreds of tunable parameters whose interactions are notoriously difficult to model. Machine-learning-based performance prediction has emerged as a scalable alternative to manual tuning, but most published pipelines silently assume that the environment in which a model is trained matches the environment in which it is deployed. In production, that assumption almost never holds: developers train on virtualised testbeds, then deploy onto hardware with different core counts, memory bandwidth, storage characteristics and concurrency profiles. The resulting domain shift quietly degrades predictive accuracy and motivates transfer learning as a remedy. This paper proposes ChimeraTL, a formalised, feedback-driven adaptive sampling pipeline for DBMS performance modelling. Rather than treating all configuration parameters uniformly, ChimeraTL partitions them into two behaviourally distinct sets using the Bhattacharyya distance between their per-parameter source and target performance distributions. Parameters with low cross-domain divergence are designated linear and reused through an affine projection of source data; parameters with high divergence are designated non-linear and are explored fresh in the target domain under an active-sampling regime that consumes online error feedback. A confidence-driven gate triggers the transition from the linear phase to the non-linear phase, so computation is allocated proportionally to the difficulty of each parameter subset. An evaluation on PostgreSQL and MySQL using TPC-C and TPC-H workloads, across a four-core virtual machine (source) and a twelve-core SSD-backed server (target), shows that ChimeraTL reduces prediction mean-squared error by up to 54% and the number of target queries needed to reach 90% of final accuracy by approximately 63% relative to a no-transfer baseline. ChimeraTL also outperforms linear projection, hybrid sampling and Bayesian optimisation across all measured dimensions. The architecture is modular, containerisable and integrates with standard DBMS telemetry, making it a practical foundation for next-generation autonomous tuning frameworks.
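To make the partitioning step described in the abstract concrete, the following is a minimal sketch in Python, not the ChimeraTL implementation. It assumes that each parameter's source and target performance distributions are available as one-dimensional samples and estimates the Bhattacharyya distance from shared-bin histograms. The function names, the bin count and the divergence threshold of 0.1 are hypothetical choices made for illustration only, and the demonstration data are synthetic.

```python
import numpy as np


def bhattacharyya_distance(source, target, bins=20):
    """Histogram estimate of the Bhattacharyya distance between two 1-D samples.

    Assumption: a shared-bin histogram is used as the density estimate;
    the paper's abstract does not specify the estimator.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    lo = min(source.min(), target.min())
    hi = max(source.max(), target.max())
    if hi == lo:
        # Both samples are the same constant, so the distributions coincide.
        return 0.0
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(source, bins=edges)
    q, _ = np.histogram(target, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    # Bhattacharyya coefficient, clamped to (0, 1] to keep the log finite.
    bc = float(np.sum(np.sqrt(p * q)))
    bc = min(max(bc, 1e-12), 1.0)
    return float(-np.log(bc))


def partition_parameters(source_perf, target_perf, threshold=0.1, bins=20):
    """Split parameter names into (linear, non_linear) by divergence.

    source_perf and target_perf map a parameter name to performance samples.
    The threshold is a hypothetical value, not one taken from the paper.
    """
    linear, non_linear = [], []
    for name in source_perf:
        d = bhattacharyya_distance(source_perf[name], target_perf[name], bins)
        if d <= threshold:
            linear.append(name)
        else:
            non_linear.append(name)
    return linear, non_linear


if __name__ == "__main__":
    # Synthetic data for illustration only; the parameter names are examples.
    rng = np.random.default_rng(0)
    source_perf = {
        "shared_buffers": rng.normal(100.0, 5.0, 500),
        "max_connections": rng.normal(100.0, 5.0, 500),
    }
    target_perf = {
        "shared_buffers": rng.normal(100.0, 5.0, 500),
        "max_connections": rng.normal(160.0, 5.0, 500),
    }
    linear, non_linear = partition_parameters(source_perf, target_perf)
    print("linear:", linear)
    print("non-linear:", non_linear)
```

In this sketch, parameters placed in the first list would be candidates for reuse through an affine projection of source data, while those in the second would be sampled afresh in the target domain. A histogram estimator is sensitive to the bin count and sample size, so any threshold would need to be calibrated for the workload at hand.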


