2023 journal article

Learning-Based Data Gathering for Information Freshness in UAV-Assisted IoT Networks

IEEE INTERNET OF THINGS JOURNAL, 10(3), 2557–2573.

co-author countries: China 🇨🇳 United States of America 🇺🇸
author keywords: Data collection; Internet of Things; Trajectory; Energy efficiency; Autonomous aerial vehicles; Trajectory planning; Wireless sensor networks; Age of Information (AoI); deep reinforcement learning (DRL); Internet of Things (IoT) network; sampling mode; unmanned aerial vehicle (UAV)
Source: Web of Science
Added: March 13, 2023

Unmanned aerial vehicles (UAVs) have been widely deployed for efficient data collection in Internet of Things (IoT) networks. Information freshness in data collection can be characterized by the Age of Information (AoI). Scheduling multiple energy-constrained UAVs to improve information freshness is highly challenging, especially when the generation instants of sensing samples are unpredictable. To address this issue, we leverage state-of-the-art reinforcement learning (RL) methods to design UAV flight trajectories without knowing the sampling mode each sensor node (SN) adopts. Each SN can sample the environment at periodic or random intervals. Multiple energy-constrained UAVs are dispatched to collect update packets from the SNs as they fly over them. The UAV trajectory planning problem for AoI minimization is formulated as a Markov decision process (MDP). The objective is to minimize the average AoI of the SNs under the constraints of energy capacity and collision avoidance for the UAVs. We then propose two learning algorithms, based on Sarsa and the value-decomposition network (VDN), respectively, which allow the UAVs to fulfill the data collection tasks requested by the SNs. By learning directly from the environment, the Sarsa-based algorithm can approach the optimal policy asymptotically when certain conditions are satisfied. Built on one of the most popular multiagent deep RL methods, the VDN-based algorithm enables each UAV to make its own flight and data collection decisions independently, based on partially observed network information. Simulation results validate the effectiveness of the two proposed learning-based algorithms compared with baseline policies.
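
The abstract names three building blocks without giving formulas: per-SN AoI, a Sarsa update, and VDN's value decomposition. The Python sketch below illustrates the textbook form of each and is not the authors' implementation. Everything specific in it is an assumption made for illustration: the slot-based AoI reset rule, the use of negative average AoI as the reward, the state and action encodings, the step size and discount values, and all function names (advance_aoi, sarsa_update, vdn_joint_value).

```python
from collections import defaultdict


def advance_aoi(ages, delivered):
    """Advance every SN's AoI by one time slot (assumed slotted model).

    ages: current AoI of each SN, as a list of ints.
    delivered: dict mapping SN index -> age of the packet just collected,
               i.e. slots elapsed since that sample was generated.
    Assumed convention: an SN with a delivery drops to the packet's age
    plus one; every other SN simply ages by one slot.
    """
    return [delivered[i] + 1 if i in delivered else a + 1
            for i, a in enumerate(ages)]


def sarsa_update(q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    """Standard on-policy Sarsa step:
    Q(s,a) += alpha * (reward + gamma * Q(s',a') - Q(s,a)).
    """
    target = reward + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


def vdn_joint_value(per_uav_values):
    """VDN approximates the joint action value as the sum of per-agent values."""
    return sum(per_uav_values)


# Toy step: one UAV collects a freshly generated packet from SN 1.
ages = advance_aoi([4, 7, 2], delivered={1: 0})   # ages become [5, 1, 3]
reward = -sum(ages) / len(ages)                   # assumed reward: negative average AoI

q = defaultdict(float)                            # tabular Q, zero-initialised
sarsa_update(q, s=(4, 7, 2), a="visit_sn1", reward=reward,
             s_next=tuple(ages), a_next="visit_sn0")
```

Using the negative average AoI as the reward makes reward maximization match the stated objective of minimizing average AoI. The energy and collision-avoidance constraints mentioned in the abstract are left out of this sketch.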