Cassarà P., Colucci M., Gotta A., Tonellotto N. Joint modeling of arrival process and length distribution of queries in Web search engines. Technical report, 2016.

Abstract (English)
This paper proposes a novel fitting procedure, based on non-parametric kernel-based models, for the probability mass function of a discrete arrival process derived from real traffic traces of queries to a Web search engine. Most estimation techniques for probability mass functions rely on estimating the parameters of a given family of probability distribution functions. Conversely, the proposed procedure, together with a kernel-based model of the probability distribution function, does not need any assumptions about membership in a family of distributions or about parameters. The fitting procedure, based on the Generalized Cross-Entropy, solves a Quadratic Programming problem. Furthermore, the estimated probability mass function can be expressed in closed form as a weighted sum of kernel functions. We also examine the performance of the proposed procedure via numerical experiments and present an example of traffic analysis with real traffic data. Results show that our estimate of the probability mass function closely matches the empirical probability mass function. Specifically, the procedure approximates well both temporal and statistical characteristics, such as autocorrelation, long-range dependence, and skewness.

Subject: Web Search Engine; Batch Arrival Process; Kernel-Based Probability Distribution Models; Generalized Cross Entropy; C.2 COMPUTER-COMMUNICATION NETWORKS

