Predicting Table Tennis Tournaments: A comparison of statistical modelling techniques
Keywords:
tournament analysis, random forest, statistical learning, table tennis, LASSO regression
Abstract
Every year, at least one of four major recurring table tennis tournaments takes place in which the top players compete: the World Table Tennis Championships, the Table Tennis World Cup, the Olympic Games and the ITTF World Tour. In other sports, it is common to analyse major tournaments and to predict future ones (see, e.g., Groll et al., 2018, for football). This work brings this kind of analysis to table tennis by examining recent editions of the Men’s World Cup and the Grand Finals of the Men’s ITTF World Tour. It has two main goals: 1) to compare different modelling techniques on historic tournaments in order to find the model with the best predictive performance, and 2) to understand which factors are important for good predictions. The results show that statistical machine learning methods can indeed be applied to predict table tennis tournaments, reaching a correct classification rate of around 75% with a random forest and 74% with a penalized generalized linear logit model. Although both models drew their predictive power mainly from the official table tennis rankings and points, variables such as age, playing hand and individual strength were also important factors.
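The abstract compares models by their correct classification rate, and the reference list also cites Brier (1950) for scoring probabilistic forecasts. As a hedged illustration only (not the authors' implementation), the following Python sketch shows how both measures could be computed for match-winner predictions. It assumes each match is coded as a binary outcome `y = 1` if the first-listed player wins, and that a model outputs `p`, the predicted probability of that event; the function names and the toy numbers are hypothetical.

```python
"""Sketch: scoring match-winner probabilities (hypothetical helpers, not the paper's code).

Assumption: y = 1 if the first-listed player wins, and p is a model's predicted
probability of that event.
"""
from typing import Sequence


def correct_classification_rate(y: Sequence[int], p: Sequence[float], threshold: float = 0.5) -> float:
    """Share of matches whose predicted winner (p > threshold) matches the actual winner.

    Note: p exactly equal to the threshold is counted as predicting y = 0.
    """
    if len(y) != len(p) or not y:
        raise ValueError("y and p must be non-empty and of equal length")
    hits = sum(1 for yi, pi in zip(y, p) if (pi > threshold) == (yi == 1))
    return hits / len(y)


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed outcome (lower is better)."""
    if len(y) != len(p) or not y:
        raise ValueError("y and p must be non-empty and of equal length")
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


# Hypothetical toy data: outcomes of five matches and two models' probabilities.
outcomes = [1, 0, 1, 1, 0]
model_a = [0.8, 0.3, 0.6, 0.4, 0.2]  # illustrative numbers only
model_b = [0.7, 0.6, 0.9, 0.7, 0.4]  # illustrative numbers only

for name, probs in [("model_a", model_a), ("model_b", model_b)]:
    ccr = correct_classification_rate(outcomes, probs)
    bs = brier_score(outcomes, probs)
    print(f"{name}: correct classification rate = {ccr:.2f}, Brier score = {bs:.3f}")
```

In this toy example both models classify four of five matches correctly (rate 0.80), yet their Brier scores differ (0.138 versus 0.142), which is why a probability-based score can usefully complement the classification rate when comparing models.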
References
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and regression trees. Boca Raton, Florida: CRC Press.
Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78(1), 1-3.
Brunner, S., & Groll, A. (2018). Modellierung und Vorhersage von Tennisspielen bei Grand Slam Turnieren [Modelling and prediction of tennis matches at Grand Slam tournaments]. Dortmund.
Ceriani, L., & Verme, P. (2012). The origins of the Gini index: extracts from Variabilità e Mutabilità (1912) by Corrado Gini. The Journal of Economic Inequality, 10(3), 421-443. https://doi.org/10.1007/s10888-011-9188-x
Ekstrøm, C. T., Van Eetvelde, H., Ley, C., & Brefeld, U. (2021). Evaluating one-shot tournament predictions. Journal of Sports Analytics, 7(1), 37-46. https://doi.org/10.3233/JSA-200454
Fahrmeir, L., & Tutz, G. (2001). Multivariate Statistical Modelling Based on Generalized Linear Models (2nd ed.). New York: Springer.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861-874. https://doi.org/10.1016/j.patrec.2005.10.010
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2929880/
Grand View Research. (2021). Sports Betting Market Size, Share & Trends Analysis Report By Platform (Online, Offline), By Type (Fixed Odds Wagering, eSports Betting), By Sports Type (Football, Basketball), By Region, And Segment Forecasts, 2021 - 2028. Grand View Research. https://www.grandviewresearch.com/industry-analysis/sports-betting-market-report
Groll, A., Heiner, J., Schauberger, G., & Uhrmeister, J. (2020). Prediction of the 2019 IHF World Men’s Handball Championship – A sparse Gaussian approximation model. Journal of Sports Analytics, 6(3), 187-197. https://doi.org/10.3233/JSA-200384
Groll, A., Ley, C., Schauberger, G., & Van Eetvelde, H. (2019a). A hybrid random forest to predict soccer matches in international tournaments. Journal of Quantitative Analysis in Sports, 15(4), 271-287. https://doi.org/10.1515/jqas-2018-0060
Groll, A., Ley, C., Schauberger, G., Van Eetvelde, H., & Zeileis, A. (2019b). Hybrid Machine Learning Forecasts for the FIFA Women's World Cup 2019. arXiv preprint arXiv:1906.01131. https://doi.org/10.48550/arXiv.1906.01131
Groll, A., Schauberger, G., & Tutz, G. (2015). Prediction of major international soccer tournaments based on team-specific regularized Poisson regression: An application to the FIFA World Cup 2014. Journal of Quantitative Analysis in Sports, 11(2), 97-115. https://doi.org/10.1515/jqas-2014-0051
Gu, W., & Saaty, T. (2019). Predicting the Outcome of a Tennis Tournament: Based on Both Data and Judgments. Journal of Systems Science and Systems Engineering, 28, 317-343. https://doi.org/10.1007/s11518-018-5395-3
ITTF Archive. (2019). Retrieved from https://results.ittf.link/index.php?option=com_content&view=featured&Itemid=101
Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18-22. https://cogns.northwestern.edu/cbmg/LiawAndWiener2002.pdf
McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). New York: Chapman & Hall.
Peters, M., & Murphey, K. (1992). Cluster analysis reveals at least three, and possibly five distinct handedness groups. Neuropsychologia, 30(4), 373-380. https://doi.org/10.1016/0028-3932(92)90110-8
R Core Team. (2019). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/
Robin, X. (2021). pROC: Display and Analyze ROC Curves (R package). Expasy. http://expasy.org/tools/pROC/
Schauberger, G., & Groll, A. (2018). Predicting matches in international football tournaments with random forests. Statistical Modelling, 18(5-6), 460-482. https://doi.org/10.1177/1471082X18799934
Theodoridis, S. (2015). Machine Learning: A Bayesian and Optimization Perspective. Amsterdam: Elsevier Ltd.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
World Cup Playing System. (2019). Retrieved from https://ittf.cdnomega.com/eu/2019/02/