Optimal bandwidth allocation for web crawler systems with time constraints


ORIGINAL RESEARCH

Weiping Zhu1 · Yaodong Li1 · Shu Li2 · Yi Xu3 · Xiaohui Cui4

Received: 2 November 2019 / Accepted: 21 July 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract A web crawler is an important tool for obtaining information from the Internet in a timely manner. In a typical web crawler system with limited bandwidth, many websites are crawled under different time constraints. Existing studies of web crawler systems do not consider bandwidth allocation in such a complex environment; hence, the time constraints may not be satisfied. In this study, we investigate bandwidth allocation approaches for such a web crawler system. The approaches are designed for two scenarios: the number of websites either exceeds or does not exceed the maximum number of web crawlers that the system can execute simultaneously. When the number of websites does not exceed this maximum, we propose approaches that control the bandwidth of the web crawlers to minimize either the maximum completion time or the sum of the execution times of all web crawlers, under assumptions of both sufficient and insufficient bandwidth. When the number of websites exceeds this maximum, we propose a round-based reallocation approach that schedules both the execution sequence and the bandwidth allocation of the web crawlers. Extensive simulations are conducted to validate the proposed approaches, and the results show that our approaches satisfy the time constraints well and achieve desirable execution performance in various scenarios.

Keywords Bandwidth allocation · Web crawler · Time constraint · Optimization
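To make the min-max objective concrete, the following is a minimal sketch under a deliberately simplified model that is not taken from the paper: each crawler i is assumed to download a fixed amount of data at a constant rate, so its completion time is its data size divided by its bandwidth. Under that assumption, splitting the total bandwidth in proportion to data size makes all crawlers finish at the same time, which minimizes the maximum completion time. The function names, the sample sizes, and the deadline check are hypothetical illustrations, not the authors' approaches.

```python
from typing import List


def minmax_allocation(sizes: List[float], total_bandwidth: float) -> List[float]:
    """Split total_bandwidth in proportion to data size (assumed model).

    With completion time size / bandwidth, a proportional split makes every
    crawler finish at sum(sizes) / total_bandwidth, the smallest possible
    maximum completion time under this simplified model.
    """
    if total_bandwidth <= 0 or any(s <= 0 for s in sizes):
        raise ValueError("sizes and total_bandwidth must be positive")
    total_size = sum(sizes)
    return [total_bandwidth * s / total_size for s in sizes]


def completion_times(sizes: List[float], bandwidths: List[float]) -> List[float]:
    """Completion time of each crawler under the constant-rate assumption."""
    return [s / b for s, b in zip(sizes, bandwidths)]


def deadlines_feasible(sizes: List[float], deadlines: List[float],
                       total_bandwidth: float) -> bool:
    """Check whether every deadline can be met at once (assumed model).

    Crawler i needs at least sizes[i] / deadlines[i] bandwidth, so all
    deadlines are satisfiable only if these minimum rates fit in the total.
    """
    needed = sum(s / d for s, d in zip(sizes, deadlines))
    return needed <= total_bandwidth


# Hypothetical example: three sites of 100, 300, and 600 MB, 50 MB/s in total.
sizes = [100.0, 300.0, 600.0]
alloc = minmax_allocation(sizes, 50.0)
times = completion_times(sizes, alloc)
```

In this toy setting the sufficient and insufficient bandwidth cases named in the abstract correspond to `deadlines_feasible` returning `True` or `False`; how the paper handles each case is not specified in this excerpt.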

* Xiaohui Cui [email protected]
Weiping Zhu [email protected]
Yaodong Li [email protected]
Shu Li whu‑[email protected]
Yi Xu [email protected]

1 School of Computer Science, Wuhan University, Wuhan, People’s Republic of China
2 School of Mathematics and Statistics, Wuhan University, Wuhan, People’s Republic of China
3 Department of Mathematics, Southeast University, Nanjing, People’s Republic of China
4 School of Cyber Science and Engineering, Wuhan University, Wuhan, People’s Republic of China

1 Introduction

In the last decade, the amount of data on the Internet has grown significantly (Ding and Wang 2018). These data contain a large amount of useful information; however, it is difficult to obtain that information in a timely manner (Kumar et al. 2017). For example, graduates seeking jobs often browse dozens of websites several times a day to obtain job-related information. As a result, they may read redundant or useless content and still have difficulty finding the latest information. In another example, people browsing the Internet may overlook food safety information spread across multiple data sources, which may affect their health. A web crawler is a program that automatically downloads web pages from the Internet and extracts the required information from them (Wang et al. 2018b; Thelwall 2001). A web crawler starts from one or several initial web pages, a
