I'M BETTOR | FREE Soccer and Football Predictions analyzed by unique system and team of experienced tipsters. Football tips and Match Previews for 150+ Leagues and Cups!

The problem with data mining in sports betting

Published: 06.01.2018 02:00


Using data as part of a betting strategy is common practice. However, as impressive as some results may appear, the process of producing such results is the important part. What are the problems with data mining in sports betting? Read on to find out.

Over the past few months, I’ve come across a sizeable number of websites, blogs and forum posts which claim to have uncovered profitable betting systems simply from retrospectively applying a few seemingly arbitrary selection criteria to a large data set of historical results and betting odds.

In this article I investigate the pitfalls of searching for a profitable advantage via data mining: for the sports bettor, correlation without causation spells trouble.

Data mining and dredging

Data mining involves the process of analysing large sets of data to uncover patterns and information. More specifically, the task of data dredging is the use of data mining to uncover patterns in that data which can be presented as statistically significant.

Sports betting lends itself easily to data mining and dredging. Various websites make large volumes of historical football results and betting odds available for the purposes of retrospectively searching for and testing profitable betting systems.

The major limitation of this approach to data analysis, however, is that a priori hypotheses explaining why those patterns might have occurred are typically not proposed.

Correlation without causation 

I have previously discussed the pitfalls of confusing correlation with causation, and precision with accuracy and validity. For a betting system to be valid, that is, to really do what it is supposed to do, we must have some idea about what causes its success in the first place.

Unless you establish the causation behind the correlation, you will have no idea what might cause your correlation to break down. Correlation without causation is meaningless.

Hidden value in English League Two soccer?

On my Twitter feed a few weeks ago, my attention was drawn to the outstanding returns one could have realised by blindly betting on all away wins in English League 2 from 2012/13 to 2016/17 inclusive. Over that period, approaching 3,000 wagers, the yield amounted to 4.3% from closing Pinnacle odds and nearly 10% from best market prices.

Only one of those five seasons witnessed a loss to Pinnacle’s closing prices, and that was small. The profit chart looks like this.

The suggestion was that the market was undervaluing away teams in this division, that is to say they were overpriced. This is not some short-term aberration, however; rather it would appear to be a consistent and systematic error in the way bettors had underestimated the likelihood of away wins in English League 2, far beyond the boundaries of the bookmaker’s profit margin. But can we really believe there is anything causal in what we have found here?

Backing the Draw: It sounds so simple

Another strategy I’ve recently seen published is called Backing the Draw. It claims to have returned close to a 16% profit over turnover from over 2,500 wagers when tested retrospectively over soccer results and Pinnacle match betting odds back to 2012.

The selection criteria are simple: neither team should have drawn in the previous three games, and the odds should be in the range 3.20 to 3.56. Testing the statistical significance of this profit, we find that the record is indeed exceptional. Assuming the pattern to be nothing but random, we could expect such a level of profitability from these odds to occur perhaps just once in a million times or less.

One might well ask why these particular criteria have been chosen. Why not the previous four, five or six games? Why not odds 3.07 to 3.41, or 3.13 to 3.72? Of course, these criteria were almost certainly not chosen before the data were mined; they were simply found to have produced the profitable outcome they did. And we can’t retrofit an explanation on the back of an outcome since this is turning causality on its head.
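To illustrate how arbitrary criteria can manufacture an apparent edge, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the bets are simulated at fair odds (so no criterion can have a genuine advantage), and the odds ranges tested, the sample size and the seed are hypothetical rather than taken from the Backing the Draw data.

```python
import random

def simulate_fair_bets(n_bets, rng):
    """Simulate draw bets priced at fair odds, so no criterion has a real edge."""
    bets = []
    for _ in range(n_bets):
        odds = rng.uniform(3.0, 4.0)
        won = rng.random() < 1.0 / odds  # win probability implied by the odds
        bets.append((odds, won))
    return bets

def yield_in_range(bets, low, high, min_bets=100):
    """Profit per unit staked for bets with low <= odds < high, or None if too few."""
    profits = [(odds - 1.0) if won else -1.0
               for odds, won in bets if low <= odds < high]
    if len(profits) < min_bets:
        return None
    return sum(profits) / len(profits)

rng = random.Random(2018)  # fixed seed so the run is repeatable
bets = simulate_fair_bets(5000, rng)

# Dredge: try many arbitrary odds ranges and keep whichever looks best.
results = {}
for low_cents in range(300, 400, 2):
    for width_cents in range(10, 60, 2):
        low = low_cents / 100
        high = (low_cents + width_cents) / 100
        y = yield_in_range(bets, low, high)
        if y is not None:
            results[(low, high)] = y

best_range = max(results, key=results.get)
print(f"criteria tested: {len(results)}")
print(f"best range: {best_range}, yield: {results[best_range]:.1%}")
```

Because the simulated odds are fair, any yield reported for the best range is luck by construction. The more ranges we allow ourselves to try, the more impressive the luckiest one tends to look.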

In defence of this strategy you might now also say: “one-in-a-million: surely that must mean this isn’t random, right?” Yes, true. However, if we have a million strategies to test, and we find one of them as statistically significant as this, what is that telling us? As Nassim Taleb writes in Fooled by Randomness, on the fantasy of monkeys attempting to recreate the poetry of Homer on a typewriter:

“If there are five monkeys in the game, I would be rather impressed with the Iliad writer, to the point of suspecting him to be a reincarnation of the ancient poet. If there are a billion to the power one billion monkeys I would be less impressed...”

As Taleb points out, not many people bother to count all the monkeys, and if they did, barely any of them would make interesting patterns worth talking about. Survivorship bias ensures we only get to see the winners.

Why bettors need to “count the monkeys”

If we won’t propose a priori hypotheses before dredging our data in search of profitable patterns, then instead we should test a large number of betting systems to see how often we find statistical significance. As I replied to this discussion on my Twitter feed, “let's plot the distribution of yields from 10,000 samples of blind bets selected according to 10,000 different criteria and see what it looks like.”

Well, I couldn’t find 10,000 samples of blind bets of suitable size (that would involve a lot of data), but I did find 1,686 of them with 100 wagers or more. Each sample represented blind betting on one particular result (home, draw or away) in a single soccer league over a single season.

Having first removed Pinnacle’s profit margin to calculate the ‘true’ prices for each outcome, I then calculated the theoretical returns for each sample and their t-statistic, my preferred measure of how unlikely it is that such returns could arise by chance. These are plotted in the distribution below. Positive t-scores represent profitable samples, negative scores loss-making ones; the larger the absolute value, the less likely the result is to have arisen by chance.
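The two calculations described above can be sketched as follows. The article does not say how the margin was removed, so the proportional normalisation used here is an assumption, as are the function names and the example odds; the t-statistic is the usual one-sample form, the mean profit per unit stake divided by its standard error.

```python
import math
from statistics import mean, stdev

def remove_margin(home, draw, away):
    """Return 'true' odds by scaling implied probabilities to sum to 1.

    Proportional normalisation is an assumed method, one of several in use.
    """
    overround = 1 / home + 1 / draw + 1 / away  # above 1 when a margin is present
    return [odds * overround for odds in (home, draw, away)]

def t_statistic(odds, won):
    """t-score of level-stakes returns: mean profit divided by its standard error."""
    profits = [(o - 1.0) if w else -1.0 for o, w in zip(odds, won)]
    return mean(profits) / (stdev(profits) / math.sqrt(len(profits)))

true_odds = remove_margin(2.0, 3.5, 4.0)
print([round(o, 3) for o in true_odds])

# Four hypothetical even-money bets, three of them winners.
print(t_statistic([2.0, 2.0, 2.0, 2.0], [True, True, True, False]))
```

A real sample would contain 100 wagers or more, as in the article; the four-bet example is only there to make the arithmetic easy to follow.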

Those of you familiar with the normal distribution (bell-shaped curve) will recognise it as evidence of randomness. That is to say, the performance of these samples of blind bets conforms closely to what we would expect to happen if everything was subject to chance only.
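To see what "chance only" looks like, here is a minimal simulation sketch. It is not the author's data: it assumes 1,686 samples of 300 independent bets, all at fair odds of 3.0, and those numbers are chosen purely for illustration.

```python
import math
import random

def sample_t_score(n_bets, fair_odds, rng):
    """t-score of one sample of blind bets that win with probability 1 / fair_odds."""
    win_prob = 1.0 / fair_odds
    profits = [(fair_odds - 1.0) if rng.random() < win_prob else -1.0
               for _ in range(n_bets)]
    avg = sum(profits) / n_bets
    variance = sum((p - avg) ** 2 for p in profits) / (n_bets - 1)
    return avg / math.sqrt(variance / n_bets)

rng = random.Random(1686)  # fixed seed so the run is repeatable
t_scores = [sample_t_score(300, 3.0, rng) for _ in range(1686)]

avg_t = sum(t_scores) / len(t_scores)
spread = math.sqrt(sum((t - avg_t) ** 2 for t in t_scores) / (len(t_scores) - 1))
share_profitable = sum(t > 0 for t in t_scores) / len(t_scores)

print(f"mean t-score: {avg_t:.2f}")
print(f"standard deviation of t-scores: {spread:.2f}")
print(f"share of profitable samples: {share_profitable:.1%}")
```

With no edge anywhere, the t-scores should centre near zero with a standard deviation near one, and roughly half of the samples should show a profit. That is the bell-shaped pattern described above.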

Taken as a whole, there is evidently little or nothing systematic happening at all. Those profitable seasons in English League 2 were most probably just lucky performances uncovered by messing around with data and stumbling upon something that looked like a profitable pattern caused by systematically irrational bettor or bookmaker behaviour.

The ‘true’ odds returns for the five seasons taken together would have a t-score of +2.4, implying about a 1-in-100 probability (p-value) that it would happen by chance. Statistically, that is significant, and if we were publishing an academic paper about it in isolation we would be motivated to call it something real. But from analysing the bigger picture we know that it almost certainly isn’t; it’s just blind luck.
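As a rough check on that figure, a t-score can be converted into a p-value with the standard library alone. This sketch assumes a one-sided test and a normal approximation, which is reasonable for a sample approaching 3,000 wagers; the article does not state which convention was used.

```python
import math

def one_sided_p_value(t_score):
    """Upper-tail probability of a standard normal variable exceeding t_score."""
    return 0.5 * math.erfc(t_score / math.sqrt(2))

p = one_sided_p_value(2.4)
print(f"p = {p:.4f}, roughly 1-in-{1 / p:.0f}")
```

Under these assumptions the result lands in the same region as the "about 1-in-100" quoted above.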

In fact, a sample from the 2007/08 season for English League 2 performed even better. The 242 matches for which I have data, from December through to May, showed a theoretical profit of over 29% (or 35% from ‘true’ odds with the margin removed). Such a performance could be expected by chance about 1-in-1000 times. It was the best performance out of the 1,686 samples.

In total, 837 of them, or about half, were profitable to ‘true’ odds, just as expected. In such a sample of samples, we would naturally expect the best one to show a p-value of around 1-in-1686. We’d expect about 16 of the samples (or about 1%) to have p-values of less than 1-in-100. Similarly, we’d expect about 168 samples (or about 10%) to have p-values of less than 1-in-10. Anything different and we might rightly wonder if any of them were being influenced by anything other than luck.
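These expectations are simple arithmetic, shown here as a short sketch. It assumes the 1,686 samples are independent and governed by chance alone, which is a simplification: seasons, leagues and outcomes within the same match are not strictly independent.

```python
n_samples = 1686  # number of season samples analysed in the article

# Expected number of samples beating each one-sided p-value threshold by luck.
expected_1_in_100 = n_samples * 0.01
expected_1_in_10 = n_samples * 0.10

# Probability that at least one sample beats a 1-in-1000 threshold by luck.
p_at_least_one = 1 - (1 - 0.001) ** n_samples

print(f"expected below 1-in-100: {expected_1_in_100:.1f}")
print(f"expected below 1-in-10: {expected_1_in_10:.1f}")
print(f"chance of at least one 1-in-1000 result: {p_at_least_one:.1%}")
```

The expected counts are close to the rounded figures quoted above, and the last line suggests that, among this many samples, seeing at least one 1-in-1000 performance is more likely than not.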

In fact there were 15 (0.9%) and 158 (9.4%) respectively, pretty close to expectation. The chart below compares theoretical expectation of the percentage of profitable samples with p-values below a particular threshold (1-in-10 = 10%, 1-in-5 = 20% and so on) with the actual percentage occurring. The almost perfect equivalence is striking.

Essentially, the chart is another way of saying that almost everything we are looking at has arisen because of chance and chance only. Yes, a 1-in-1000 profitability is impressive, but if we have over 1,000 samples to choose from, it’s not unexpected, and hence it’s not strong evidence of anything causal. 

What can bettors learn about data mining and dredging?

It’s perhaps unsurprising that the distribution of profitability by seasonal soccer division is random. It’s hardly the most sophisticated means of devising a betting system. But the significant point is this: if we set about devising a betting system via data dredging until we find criteria that are profitable, we risk failing to establish causal explanations for what we find.

Unless we have a reason for why that profit happened, it might just be complete rubbish. Correlation without causation simply regresses to the mean. For a sports bettor that means losing money over the long term.

One might argue there’s nothing wrong in taking advantage of luck to make a profit; after all, that’s what betting is about. When we do that, however, we shouldn’t deceive ourselves by assuming that our success is a consequence of anything else.

Copyright © 2008 - SPORTMEDIA24