Multi-Armed Bandit for Link Configuration in Millimeter-Wave Networks: An Approach for Solving Sequential Decision-Making Problems
Zhang, Y., & Heath Jr., R. W. (2023, February 7). IEEE Vehicular Technology Magazine.
Establishing and maintaining millimeter-wave (mm-wave) links is challenging because the environment changes and mm-wave signals are highly sensitive to user mobility and channel conditions. mm-Wave link configuration problems often involve searching, under environmental uncertainty, for the optimal system parameter(s) within a finite set of alternatives supported by the system hardware and protocol. For example, beam sweeping aims to identify the optimal beam(s) for data transmission from a discrete codebook. Selecting parameters such as the beam sweeping period and the beamwidth is crucial to achieving high overall system throughput. In this article, we motivate the use of the multi-armed bandit (MAB) framework to intelligently search for the optimal configuration when establishing mm-wave links. MAB is a reinforcement learning framework that guides a decision maker to sequentially select one action from a set of actions. As an example, we show that within the MAB framework, the optimal beam sweeping period, beamwidth, and beam directions can be learned dynamically with bandit algorithms that are efficient in both samples and computation. We conclude by highlighting some future research directions for enhancing mm-wave link configuration design with MAB.
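To make the sequential-selection idea concrete, the following is a minimal sketch of beam selection from a discrete codebook cast as a bandit problem. It uses UCB1, a standard bandit algorithm chosen here purely for illustration; the abstract does not say which algorithms the article uses. The function name `ucb1_beam_search`, the Bernoulli reward model (1 for a successful transmission on the chosen beam, 0 otherwise), and the per-beam success probabilities are all hypothetical assumptions.

```python
import math
import random


def ucb1_beam_search(success_probs, rounds, seed=0):
    """Run UCB1 over a codebook of beams and return the pull count per beam.

    success_probs: assumed (hypothetical) probability that a transmission
                   on each beam succeeds; unknown to the learner.
    rounds:        number of sequential beam selections.
    """
    rng = random.Random(seed)
    n_beams = len(success_probs)
    counts = [0] * n_beams   # times each beam has been selected
    totals = [0.0] * n_beams  # cumulative reward observed per beam

    for t in range(1, rounds + 1):
        if t <= n_beams:
            # Initialization: try every beam once.
            beam = t - 1
        else:
            # Pick the beam with the largest upper confidence bound:
            # empirical mean plus an exploration bonus that shrinks
            # as the beam is sampled more often.
            beam = max(
                range(n_beams),
                key=lambda b: totals[b] / counts[b]
                + math.sqrt(2.0 * math.log(t) / counts[b]),
            )
        # Simulated feedback (assumption): Bernoulli reward.
        reward = 1.0 if rng.random() < success_probs[beam] else 0.0
        counts[beam] += 1
        totals[beam] += reward

    return counts


if __name__ == "__main__":
    # Four hypothetical beams; the third one is best.
    pulls = ucb1_beam_search([0.2, 0.3, 0.9, 0.4], rounds=2000)
    print("pulls per beam:", pulls)
```

In this toy setting the learner concentrates its selections on the best beam while still sampling the others occasionally. The same loop could in principle treat candidate beam sweeping periods or beamwidths as the arms, with a throughput-based reward in place of the Bernoulli one.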