
Novel Variations and Applications of Multi-Armed Bandit Learning

Sawwan, Abdalaziz
Genre
Thesis/Dissertation
Date
2025-08
Department
Computer and Information Science
DOI
https://doi.org/10.34944/pm5k-ns15
Abstract
The past two decades have seen the Multi-Armed Bandit (MAB) paradigm rise from a stylised thought experiment to a central tool for data-driven decision making in areas as diverse as online advertising, edge computing, crowdsensing, and personalised recommendations. Yet real systems routinely violate the assumptions of the classical i.i.d. model: rewards may arrive after stochastic delays, actions often carry monetary costs or structural constraints, and complex side information, from spatial coordinates to graph embeddings, must be exploited efficiently. This dissertation synthesises several pieces of original research into a single narrative that extends the MAB framework along five orthogonal axes and demonstrates the utility of these extensions in large-scale, real-world scenarios.

The study begins by identifying the fundamental limitations of existing delayed-feedback and combinatorial bandit models. It then develops a sequence of increasingly expressive frameworks that (i) treat instantaneous and delayed returns as jointly observable random variables, (ii) embed budget constraints and hard deadlines directly into the learning loop, (iii) encourage diversity in crowdsensing allocations through adaptive reward shaping, (iv) capture Markovian state transitions and network-like dependencies, and (v) integrate graph neural network representations with contextual bandits for next-generation recommenders. Each chapter pairs rigorous theory, including tight upper bounds on cumulative regret, with domain-specific case studies to verify practical impact. The main contributions are:

- Unified dual-return model: a stochastic framework in which every arm yields both a short-term and a long-term reward; three near-optimal UCB-based algorithms are proved to achieve $\tilde{O}(\sqrt{T})$ regret while allowing infinite delays (a minimal illustrative sketch follows this list).
- Random transformation factors: extending the above, the multiplicative link between the two returns is allowed to be a random variable with known mean, generalising fixed-scaling models while preserving regret optimality.
- Budget-constrained and deadline-driven bandits: the first delayed MAB formulation that couples stochastic costs, finite budgets, and a hard termination round; the BD-UCB algorithm balances exploration against the probability that delayed rewards surface before expiry.
- Diversity-aware crowdsensing: two CMAB variants that dynamically down-weight repeatedly covered tasks, or equivalently up-weight novel ones, yielding provably lower regret and up to 23% higher weighted coverage on real mobility traces.
- Markovian dependencies: an index policy that attains logarithmic regret when each arm evolves as a hidden three-state Markov chain, outperforming classical UCB by roughly 15% in cumulative regret on synthetic networks.
- Graph-aware recommendation: LightGCN embeddings feed a LinUCB decision module, and stochastic-reward calibration boosts hit rate and NDCG over pure GNN and pure bandit baselines on the Amazon-Book and Gowalla datasets (a second sketch appears at the end of the abstract).
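To make the dual-return idea concrete, the following is a minimal sketch of a UCB1-style index that pools an arm's instantaneous reward with its delayed component once the latter arrives. This is an illustration rather than the dissertation's algorithm: the class name, the single pooled estimate, and the demo's reward and delay distributions are all assumptions made for the example.

```python
# Minimal sketch (assumed design, not the dissertation's algorithm): a UCB1-style
# index over pooled instantaneous and delayed rewards.
import math
import random
from collections import defaultdict

class DualReturnUCB:
    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.pulls = [0] * n_arms          # times each arm was played
        self.reward_sum = [0.0] * n_arms   # instant rewards plus delayed rewards that arrived
        self.pending = defaultdict(list)   # arrival round -> [(arm, delayed_reward), ...]
        self.t = 0                         # current round

    def select(self):
        self.t += 1
        # Credit any delayed rewards scheduled to arrive this round.
        for arm, r in self.pending.pop(self.t, []):
            self.reward_sum[arm] += r
        # Play every arm once before trusting the index.
        for arm in range(self.n_arms):
            if self.pulls[arm] == 0:
                return arm
        # UCB1-style index over the pooled reward estimate.
        return max(
            range(self.n_arms),
            key=lambda a: self.reward_sum[a] / self.pulls[a]
                          + math.sqrt(2.0 * math.log(self.t) / self.pulls[a]),
        )

    def update(self, arm, instant_reward, delayed_reward, delay):
        self.pulls[arm] += 1
        self.reward_sum[arm] += instant_reward
        # The delayed component is only observed `delay` rounds later.
        self.pending[self.t + delay].append((arm, delayed_reward))

# Toy run: Bernoulli instant rewards, a proportional delayed component,
# and uniformly random delays of 1-20 rounds (all hypothetical choices).
random.seed(0)
means = [0.3, 0.5, 0.7]
agent = DualReturnUCB(len(means))
for _ in range(5000):
    arm = agent.select()
    inst = 1.0 if random.random() < means[arm] else 0.0
    agent.update(arm, inst, delayed_reward=0.5 * inst, delay=random.randint(1, 20))
print("pulls per arm:", agent.pulls)  # the 0.7 arm should dominate
```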
Extensive experiments, spanning synthetic benchmarks, cellular traffic videos, city-scale population data, and public recommendation corpora, validate the theoretical guarantees: the proposed algorithms consistently reduce average regret by 10–40% and remain robust under non-stationary reward drift. Together, these contributions advance bandit learning from isolated, myopic choices to holistic decision pipelines that honour economic, temporal, and structural constraints. Beyond the new regret bounds and algorithms, the dissertation bridges theory with demanding applications such as mobile crowdsensing, network security, and adaptive recommendation, laying a foundation for future work on safe, context-rich MABs.
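As a second illustration, the sketch below shows how a LinUCB decision module can sit on top of fixed item embeddings, in the spirit of the graph-aware recommender described above. It is a sketch under stated assumptions: the random vectors stand in for LightGCN embeddings, `alpha` is a hypothetical exploration weight, and the dissertation's stochastic-reward calibration step is not reproduced.

```python
# Minimal sketch: a LinUCB head over fixed item embeddings. The random vectors
# below are stand-ins for LightGCN output; alpha is a hypothetical parameter.
import numpy as np

class LinUCB:
    def __init__(self, dim, alpha=0.5):
        self.alpha = alpha
        self.A = np.eye(dim)      # ridge-regression design matrix
        self.b = np.zeros(dim)    # reward-weighted feature sums

    def select(self, items):
        """Pick the item whose upper confidence bound is highest."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b    # current linear weight estimate
        # Exploration bonus: per-item confidence width x^T A^{-1} x.
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", items, A_inv, items))
        return int(np.argmax(items @ theta + self.alpha * bonus))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

rng = np.random.default_rng(0)
items = rng.normal(size=(100, 16))   # stand-in for LightGCN item embeddings
policy = LinUCB(dim=16)
chosen = policy.select(items)
policy.update(items[chosen], reward=1.0)
```

Recomputing the matrix inverse on every call is the simple choice for a demo; at scale, the usual refinement is an incremental Sherman-Morrison update of A_inv.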