A Gaussian Copula Regression Model for Movie Box-office Revenue Prediction with Social Media
Abstract. Previous work has explored many kinds of features for the task of movie box-office prediction. However, little prior work has investigated the dependency relationships among these features. In this paper, we propose a novel Gaussian Copula regression model to study the correlation among predictive features. In particular, we first extract structured movie metadata and user activities on social media as features. We then apply a Gaussian kernel to smooth the data and learn the covariance matrix among the marginal distributions by maximum likelihood. We propose to approximately infer the movie box-office revenue by exploiting the covariance matrix. Experimental results show that our proposed method outperforms the baseline methods on the first-week revenue prediction task and achieves performance comparable to a state-of-the-art baseline on the gross revenue prediction task. Our model is robust under various experimental settings.
Keywords: Copula regression · Movie revenue · Social media
1 Introduction
Predicting the revenues of upcoming movies is of clear interest to investors, movie-related producers and movie theaters. Traditional approaches exploit a movie's structured metadata, such as its genre, MPAA rating, and number of screens, to predict its future market performance, and they show that movie box-office revenues are predictable. However, this line of work suffers from some limitations. On the one hand, it assumes that the features used in the prediction model are independent. Even though such restrictions make models easily scalable, they limit the expressiveness of models in some scenarios. On the other hand, although traditional models are good at capturing linear and non-linear relationships, they still cannot deal with arbitrary marginal distributions. Statistical analysis of historical movie market data shows that movie-relevant features follow different distributions [8]. For example, movie revenues follow a Pareto distribution, while the number of theaters in which a movie is shown follows a bimodal distribution.
© Springer Science+Business Media Singapore 2015. X. Zhang et al. (Eds.): SMP 2015, CCIS 568, pp. 28–37, 2015. DOI: 10.1007/978-981-10-0080-5_3
To address these problems, we investigate a Gaussian Copula regression model that learns dependency relationships among features to predict movie box-office revenues. Given a movie, we automatically predict its first-week and gross revenue by using both movie metadata and user activities on social media as features. The Copula we use in this paper is a family of distribution functions that is commonly used in statistics and economics. Even though the Copula was first introduced in 1959 [7], it is a rather new topic in the natural language processing and machine learning domains. It is capable of modeling a multivariate distribution by decoupling it into its marginal distributions and a correlation matrix. To the best of our knowledge, we are among the first to investi
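The decoupling of a multivariate distribution into marginals and a correlation matrix can be illustrated with a minimal sketch. It is not the paper's implementation: it swaps the Gaussian-kernel smoothing mentioned in the abstract for a simpler rank-based empirical CDF, and the data, the column names `revenue` and `screens`, and the helper names `to_normal_scores` and `copula_correlation` are all hypothetical.

```python
import numpy as np
from scipy.stats import norm, rankdata


def to_normal_scores(X):
    """Map each column to standard-normal scores through its empirical CDF."""
    n = X.shape[0]
    # Column-wise ranks scaled into (0, 1); dividing by n + 1 avoids 0 and 1,
    # where the normal quantile function would be infinite.
    U = rankdata(X, axis=0) / (n + 1)
    return norm.ppf(U)


def copula_correlation(X):
    """Estimate the Gaussian copula correlation matrix from normal scores."""
    Z = to_normal_scores(X)
    return np.corrcoef(Z, rowvar=False)


# Synthetic data (an assumption for illustration only): two features that
# share a latent correlation of 0.7 but have very different marginals.
rng = np.random.default_rng(0)
latent = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=2000)
revenue = np.exp(2.0 * latent[:, 0])      # heavy-tailed marginal
screens = norm.cdf(latent[:, 1]) ** 3     # skewed marginal on (0, 1)
X = np.column_stack([revenue, screens])

R = copula_correlation(X)
print(R)
```

Because the transformation of each column depends only on ranks, the estimated correlation is unaffected by the shape of the marginals, which is the property the Copula construction relies on.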