How many bins should be put in a regular histogram

Author: Lucien Birgé, Yves Rozenholc

Publisher: EDP Sciences

E-ISSN: 1262-3318

ISSN: 1292-8100

Source: ESAIM: Probability and Statistics, Vol. 10, 2006-01, pp. 24-45



Abstract

Given an n-sample from some unknown density f on [0,1], it is easy to construct an histogram of the data based on some given partition of [0,1], but not so much is known about an optimal choice of the partition, especially when the data set is not large, even if one restricts to partitions into intervals of equal length. Existing methods are either rules of thumbs or based on asymptotic considerations and often involve some smoothness properties of f. Our purpose in this paper is to give an automatic, easy to program and efficient method to choose the number of bins of the partition from the data. It is based on bounds on the risk of penalized maximum likelihood estimators due to Castellan and heavy simulations which allowed us to optimize the form of the penalty function. These simulations show that the method works quite well for sample sizes as small as 25.
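
As a rough illustration of what choosing the number of bins by penalized maximum likelihood can look like, here is a minimal Python sketch. The abstract does not state the penalty or the search range, so both are assumptions here: the penalty is taken to be pen(D) = D - 1 + (log D)^2.5 and D is searched over 1 ≤ D ≤ n / log n, which is the form commonly cited from the full paper. The function names `penalized_loglik` and `choose_bins` are hypothetical.

```python
import math
import random


def penalized_loglik(data, d):
    """Penalized log-likelihood of the regular d-bin histogram on [0, 1]."""
    n = len(data)
    counts = [0] * d
    for x in data:
        # min() puts the endpoint x == 1.0 into the last bin.
        counts[min(int(x * d), d - 1)] += 1
    # Log-likelihood of the histogram density (height d * c / n on each bin);
    # empty bins contribute nothing.
    loglik = sum(c * math.log(d * c / n) for c in counts if c > 0)
    # ASSUMPTION: penalty form not given in the abstract.
    penalty = d - 1 + math.log(d) ** 2.5
    return loglik - penalty


def choose_bins(data):
    """Return the number of bins maximizing the penalized log-likelihood."""
    n = len(data)
    if n < 2:
        return 1
    # ASSUMPTION: search range 1 <= D <= n / log(n).
    d_max = max(1, int(n / math.log(n)))
    return max(range(1, d_max + 1), key=lambda d: penalized_loglik(data, d))


if __name__ == "__main__":
    random.seed(0)
    # Sample of size 25 from a Beta(2, 5) density, which lives on [0, 1].
    sample = [random.betavariate(2, 5) for _ in range(25)]
    print("chosen number of bins:", choose_bins(sample))
```

The sketch assumes the data already lie in [0,1]; data on another interval would first need to be rescaled.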