OZmium Sports Betting and Horse Racing Forums

Ratings aficionado's

Hi all,

Figured I'd spark another thread on here and get some conversation happening. In this thread I want to discuss ratings and, more specifically, weighting the individual factors that are taken into account.

I'm not really interested in discussing what factors to consider or which ones people think are more/less important. What I'd like to discuss is how people do/would approach the weighting of each factor.

For example, you might start out with a ratings approach where every single factor is weighted to 10 points. So the horse with the best place % gets 10 points, the horse with the best API gets 10 points, etc., and it scales down from there for each horse in the race. What I'm asking is how people approach this, or what they think is the best way, from a mathematical point of view, to weight these factors so as to get the best out of them in a final rating.

The approach I'm taking at the moment is something close to the following description, but I'd like to know if others have some input and/or ideas:

First I take the strike rate for each factor based on 1st, 2nd, 3rd or 4th. I then multiply the 1st, 2nd, 3rd and 4th strike rates by 4, 3, 2 and 1 respectively and add them up. For example, the top Sky rater has the following positional strike rates:

1st - 22.0%
2nd - 19.2%
3rd - 16.4%
4th - 8.9%

(0.220 * 4) + (0.192 * 3) + (0.164 * 2) + (0.089 * 1) = 1.87

I do this for all the factors, then divide each score by the highest score so they are all on a 0 to 1 scale (i.e. 0% to 100%).
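As a rough sketch of the calculation above in Python: only the Sky rating row comes from the post, while the other factor names and their strike rates are made up purely for illustration.

```python
# Positional strike rates (1st, 2nd, 3rd, 4th) for each factor.
# Only "sky_rating" is taken from the post; the other rows are hypothetical.
strike_rates = {
    "sky_rating": [0.220, 0.192, 0.164, 0.089],
    "best_place_pct": [0.180, 0.170, 0.150, 0.100],  # hypothetical
    "best_api": [0.150, 0.140, 0.130, 0.110],        # hypothetical
}
POSITION_WEIGHTS = [4, 3, 2, 1]


def positional_score(rates):
    """Weighted sum of the 1st..4th strike rates."""
    return sum(rate * weight for rate, weight in zip(rates, POSITION_WEIGHTS))


raw_scores = {name: positional_score(r) for name, r in strike_rates.items()}

# Divide by the highest score so the best factor sits at 1.0 (100%).
top_score = max(raw_scores.values())
sr_scaled = {name: score / top_score for name, score in raw_scores.items()}

for name in strike_rates:
    print(f"{name}: raw={raw_scores[name]:.3f} scaled={sr_scaled[name]:.1%}")
```

Dividing by the highest score puts the best factor at 100%, but the weakest factor does not drop to 0% on this scale; it only keeps its ratio to the best one.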

Next I look at the POT of each factor. The majority of factors are negative in their own right, so what I do is take the lowest POT and boost all the POTs by that amount, i.e. if the lowest is -31%, this would be re-calculated to 0%, a -25% POT would be re-calculated to 6%, and so on. Once these are all re-calculated, I do the same as for the strike rate and adjust so they are on a 0% to 100% scale.
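A minimal sketch of the POT step, assuming three factors. The -31% and -25% figures are the ones quoted in the post; the factor names and the third figure are hypothetical.

```python
# POT (profit on turnover) per factor, as fractions.
# -0.31 and -0.25 are from the post; names and -0.10 are hypothetical.
pot = {"factor_a": -0.31, "factor_b": -0.25, "factor_c": -0.10}


def shift_to_zero(values):
    """Boost every value so that the lowest one becomes 0."""
    lowest = min(values.values())
    return {name: value - lowest for name, value in values.items()}


def scale_to_unit(values):
    """Divide by the highest value so the best one becomes 1.0 (100%)."""
    top = max(values.values())
    if top == 0:  # every factor had the same value
        return {name: 0.0 for name in values}
    return {name: value / top for name, value in values.items()}


pot_shifted = shift_to_zero(pot)        # -31% -> 0%, -25% -> 6%, -10% -> 21%
pot_scaled = scale_to_unit(pot_shifted)

for name in pot:
    print(f"{name}: shifted={pot_shifted[name]:.0%} scaled={pot_scaled[name]:.1%}")
```

One side effect worth knowing about: on this scale the worst factor always ends up at exactly 0%, however close it was to the others.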

Finally, I like to look at what the profit divided by the highest dividend for that factor comes out to, as it shows the consistency of that factor. However, as the majority of the single-factor profits are negative, I 'boost' them in the same way as the POT calc, using the lowest factor's profit. Once this is done and I have divided the new profit figure by the highest dividend, I adjust to the 0% to 100% scale as well.
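The Profit/MaxDiv step could be sketched the same way. The order of operations below (boost the profit, divide by that factor's highest dividend, then rescale) is my reading of the description, and every number and name is made up for illustration.

```python
# Hypothetical per-factor profit (in units) and highest dividend collected.
profit = {"factor_a": -120.0, "factor_b": -40.0, "factor_c": 15.0}
highest_dividend = {"factor_a": 25.0, "factor_b": 8.0, "factor_c": 40.0}


def profit_over_max_div(profit, highest_dividend):
    """Boost profits so the lowest is 0, divide each by its factor's
    highest dividend, then rescale so the best ratio is 1.0 (100%)."""
    lowest = min(profit.values())
    ratio = {
        name: (value - lowest) / highest_dividend[name]
        for name, value in profit.items()
    }
    top = max(ratio.values())
    if top == 0:  # all factors had identical profit
        return {name: 0.0 for name in ratio}
    return {name: value / top for name, value in ratio.items()}


pmd_scaled = profit_over_max_div(profit, highest_dividend)

for name, value in pmd_scaled.items():
    print(f"{name}: {value:.1%}")
```

In this made-up data factor_c has the best raw profit but relies on one big dividend, so factor_b comes out on top of the consistency scale.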

Once these 3 scales are completed for SR, POT and Profit/MaxDiv, I then simply weight these scores for a total figure such as:

(3 x POT outcome) + (2 x P/MaxDiv outcome) + SR outcome

This gives a final score and you simply weight the factors based on this score. I hope the above isn't too hard to understand (let me know if you have questions). This is just the way I approach it at the moment when looking at how to adjust the factors to get a final rating that is more significant than rating all factors the same.
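Putting the three scales together might look like the sketch below. The 3/2/1 multipliers are the ones given above; the scaled inputs are hypothetical, and turning the scores into shares that sum to 1 is my own assumption about how "weight the factors based on this score" could be applied.

```python
# Multipliers from the post: 3 x POT, 2 x Profit/MaxDiv, 1 x SR.
WEIGHTS = {"pot": 3, "pmd": 2, "sr": 1}


def factor_score(sr, pot, pmd):
    """Combine the three 0..1 scales into one score per factor."""
    return WEIGHTS["pot"] * pot + WEIGHTS["pmd"] * pmd + WEIGHTS["sr"] * sr


# Hypothetical scaled values (each already on the 0..1 scale).
scaled = {
    "factor_a": {"sr": 1.00, "pot": 0.00, "pmd": 0.00},
    "factor_b": {"sr": 0.87, "pot": 0.29, "pmd": 1.00},
    "factor_c": {"sr": 0.74, "pot": 1.00, "pmd": 0.34},
}

scores = {name: factor_score(**values) for name, values in scaled.items()}

# Assumption: express each score as a share of the total, so the
# factor weights used in the final rating sum to 1.
total = sum(scores.values())
shares = {name: score / total for name, score in scores.items()}

for name in scaled:
    print(f"{name}: score={scores[name]:.2f} share={shares[name]:.1%}")
```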

My approach doesn't really have any mathematical significance besides me weighting things by what I believe is more important. Hopefully someone can offer a more structured approach that's more mathematically sound? If not, I'm happy to continue with this method but just wondered what ideas other people had.

Cheers

There are three major parts to each race. The venue, the runners and money. Most confuse them. Each should be treated individually first and then combined.

Care to elaborate, beton? You don't have to go into specifics with regards to your own approach if you'd rather not. I'm just trying to wrap my head around your statement, as I'm not entirely sure how those 3 major parts apply to factor weightings for ratings.

I.e. do you mean that you might weight factors differently for a race at Caulfield compared to a race at Ascot? Or maybe weight differently for a Class 3 race compared to a Group 2 race?

Would also be interested in any input mattio may have (if he's still around) given his personally created ratings.

Appreciate the reply beton :)

A 1600 race at Ascot is different to a 1600 race at Randwick. Each track and distance has its own characteristics. The barrier may be the key factor, the leader at the turn may be the key factor. The jockey in this particular race may be the key factor. What happens on a good surface may not occur on a heavy surface. So you need to get the factors for that distance at that track on that surface for that field size.

Then you need to compare the runners, including the little bloke on the back. Just because it is classed as an A-lister party does not mean it is an A-lister party if only the contestants of Australian Idol turn up. After you have assessed the class of the race against the class of the field and rated them, you must then see how they will suit the race (the track, the distance, the barrier, the field size at that track, etc.).

Only then can you look at the money and that is 2 parts. Your estimate and what the market says.

You could use either Multiple (Linear) Regression or (Binomial) Logistic Regression, depending on what sort of output you wanted.

You would use Multiple Regression if you wanted the combination of all your independent variables (Sky Rating, track odds, barrier number, career runs, etc) - multiplied by individual coefficients - to sum to some total (a dependent variable). For example, looking at your historical data, you could decide you want the dependent variable to be 100 minus 4 * the lengths behind in this race. If a horse won by 2 lengths you'd be determining what combination of coefficients, multiplied by your variables and summed together, equates to 108, i.e. 100 - 4*(-2). If a horse lost by 15 lengths you'd be determining how your variables equate to 40. You could then look at upcoming races and find horses whose past results indicate a score greater than 100 - indicating that conditions are good for a win - or a score that is a certain margin greater than those of the other horses in the race.
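A bare-bones sketch of that idea using only the Python standard library. The target is the "100 minus 4 * lengths behind" figure described above; the two input variables, the sample history and all function names are made up for illustration, and a real dataset would need to be far larger.

```python
def target_score(lengths_behind):
    """100 minus 4 * lengths behind; a winner's margin is entered as negative."""
    return 100 - 4 * lengths_behind


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X'X) beta = X'y.
    Returns [intercept, coef_1, coef_2, ...]."""
    xs = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    return sum(c * v for c, v in zip(coefs, [1.0, *row]))


# Hypothetical history: ((sky_rating, barrier), lengths_behind).
history = [
    ((95, 3), -2.0), ((80, 8), 5.0), ((88, 1), 0.5),
    ((70, 10), 15.0), ((85, 5), 3.0),
]
rows = [features for features, _ in history]
targets = [target_score(lengths) for _, lengths in history]

coefs = fit_linear(rows, targets)
print("coefficients:", [round(c, 3) for c in coefs])
print("predicted score:", round(predict(coefs, (90, 4)), 1))
```

In practice a spreadsheet's regression tool or a statistics library would do the fitting and also report the significance of each coefficient; the hand-rolled solver here is only to keep the example self-contained.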

You would use Binomial Logistic Regression if you wanted the combination of your independent variables - multiplied by individual coefficients, etc - to determine the probability of an either/or event occurring. In this case, the probability of a horse winning (or not winning).
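For a feel of what that involves, here is a minimal logistic regression fitted by plain gradient descent, again standard library only. The two inputs (a rating and a barrier figure, both pre-scaled to 0..1), the ten sample runners, and the learning rate and epoch count are all assumptions for the sake of the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, wins, lr=0.5, epochs=5000):
    """Fit [intercept, coef_1, ...] by gradient descent on the log-loss."""
    n = len(rows)
    coefs = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grads = [0.0] * len(coefs)
        for row, won in zip(rows, wins):
            x = [1.0, *row]  # leading 1.0 is the intercept term
            error = sigmoid(sum(c * v for c, v in zip(coefs, x))) - won
            for j, v in enumerate(x):
                grads[j] += error * v
        coefs = [c - lr * g / n for c, g in zip(coefs, grads)]
    return coefs


def win_probability(coefs, row):
    return sigmoid(sum(c * v for c, v in zip(coefs, [1.0, *row])))


# Hypothetical runners: (rating scaled 0..1, barrier scaled 0..1), won (1/0).
rows = [
    (0.9, 0.2), (0.8, 0.1), (0.7, 0.5), (0.4, 0.3), (0.6, 0.2),
    (0.5, 0.8), (0.3, 0.6), (0.2, 0.9), (0.3, 0.4), (0.1, 0.7),
]
wins = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

coefs = fit_logistic(rows, wins)
p_good = win_probability(coefs, (0.9, 0.2))
p_poor = win_probability(coefs, (0.1, 0.7))
print(f"high rating, inside barrier: {p_good:.2f}")
print(f"low rating, wide barrier:    {p_poor:.2f}")
```

Note that this treats each runner in isolation, so the probabilities for one race will not necessarily add up to 100%; they would need normalising within each race before being compared with market prices.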

By applying either method to a large enough dataset you would determine the statistical significance of individual variables contributing to the final result. You may find some variables just don't matter.

The problem with both methods is that they work best when the input variables are independent of one another, and in racing they often are not. For example, Betting Odds would depend somewhat on barrier draw and jockey, much like Sky Ratings do. While you can still use the methods, they'll be more useful as a rule of thumb than a firm guide.
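One quick way to see how far two inputs overlap is to check the correlation between them before feeding both into a regression. The sketch below computes a Pearson correlation by hand; the odds and rating figures are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical runners in one race: starting price and Sky rating.
odds = [2.5, 4.0, 6.0, 9.0, 15.0, 31.0]
sky_rating = [100, 96, 93, 90, 85, 78]

r = pearson(odds, sky_rating)
print(f"correlation between odds and rating: {r:.2f}")
```

A value close to +1 or -1 suggests the two variables are largely telling the model the same thing, so their individual coefficients should be read with caution.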

Beton's approach of looking at specific field sizes on specific tracks over specific distances under specific track conditions from specific barriers would likely lead to more accurate models, but you may find yourself in turn limited by the size of your resulting dataset.

There are a bunch more reasons why any model won't be accurate (perhaps you're missing important variables - like blood counts, or training performance; or outliers are skewing your data; or the relationship between the variables isn't linear at all, maybe it's logarithmic/polynomial/exponential; etc). Still, I certainly feel it's worthwhile applying some mathematical rigour to your processes. You can do it in Excel and there are a heap of how-to's available.

Depending how big your database is, you could use the approach I used to determine the importance of variables.

I admit it's a rough and ready method but once you have the results it's easy to refine.

The method is simple but takes time. Determine your handicapping factors/ideas; I came up with about 30.

For each factor, rank the runners in descending or ascending order, according to whether the maximum or minimum value is best.
Run the factors through your database one by one and print out the final results (win % and place %) for as many selections as you deem important.
I used the top four only.
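The steps above can be sketched roughly as follows. The data layout (a list of races, each a list of runner dicts), the factor name and the rule that a "place" means finishing in the first three are all assumptions for illustration.

```python
from collections import defaultdict


def strike_rates_by_rank(races, factor, higher_is_better=True, top_n=4):
    """For one factor, return {rank: (win_rate, place_rate)} for the
    top_n ranked runners across all races."""
    wins = defaultdict(int)
    places = defaultdict(int)
    for race in races:
        ranked = sorted(race, key=lambda r: r[factor], reverse=higher_is_better)
        for rank, runner in enumerate(ranked[:top_n], start=1):
            if runner["finish"] == 1:
                wins[rank] += 1
            if runner["finish"] <= 3:  # assumption: place = first three home
                places[rank] += 1
    n = len(races)
    if n == 0:
        return {}
    return {rank: (wins[rank] / n, places[rank] / n) for rank in range(1, top_n + 1)}


# Two hypothetical races; "career_win_pct" is a made-up factor name.
races = [
    [
        {"career_win_pct": 30, "finish": 1},
        {"career_win_pct": 20, "finish": 3},
        {"career_win_pct": 10, "finish": 2},
        {"career_win_pct": 5, "finish": 4},
    ],
    [
        {"career_win_pct": 25, "finish": 2},
        {"career_win_pct": 15, "finish": 1},
        {"career_win_pct": 12, "finish": 5},
        {"career_win_pct": 8, "finish": 3},
        {"career_win_pct": 2, "finish": 4},
    ],
]

result = strike_rates_by_rank(races, "career_win_pct")
for rank, (win_rate, place_rate) in result.items():
    print(f"rank {rank}: win {win_rate:.0%}, place {place_rate:.0%}")
```

Running the same function over each factor in turn gives a table that can be compared factor by factor.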

The nuts and bolts depend on what program you use; I use C in .NET.

The outcome is quite interesting. Some factors almost break even outright but for the TAB takeout, give or take a few %, while some are mere punters' myth.

It takes time but it's worth it; you could do worse.

walkermac, firstly, I appreciate the reply. Along with beton's post, it's the kind of detail and differing view that I'm after.

At this point in time the size of the dataset I'm using wouldn't be sufficient to draw any significant conclusions with regards to how to weight specific factors based on distance, track, conditions etc. However, that's not to say I'm discarding this train of thought, as it's something I'll utilize in the future; it's just not something I can implement now.

It sounds like Binomial Logistic Regression is the path I need to investigate. As you say, it would work best with truly independent variables, but I think it would be useful as a guide or even for comparison's sake against my current approach. The issue with horse racing is you can get into debates about what is truly independent, i.e. is number of wins a good independent stat to utilize, or is that dependent on number of career starts, which then means % strike rate is a better factor, and so on and so forth.

I'll do some googling and mess around with the numbers tonight and see how I go. Having a more solid mathematical approach and background to the factor weightings rather than my current gut feel / POT & SR approach is certainly what I'm keen on.

How are you going with this approach, evajb001?

I found what looks to be a useful tool online with all the bells and whistles, along with some explanation: http://www.jeffreysmorrison.net/default.aspx

Hi walkermac,

Given the way I record the data for each race, unfortunately this path wasn't one I could go down, which meant I stuck with my original approach.

I've taken a look at that website you posted. It's unbelievable how much effort the creator would have needed to put in to code that up for anyone to publicly utilize. Once again, with how I record my data it isn't of much use to me, but nonetheless I had a look through and was impressed.

On a side note, I've continued following your NRL and AFL threads. I haven't had a chance to play around with the Massey ratings you linked me to, but it's on my eventual to-do list. The ratings approach I currently use has had a much improved fortnight compared to earlier in the season.

Logistic regression. Now you're talking.
From the moment that I learned that all gambling is about applied maths, horse racing included, I have never looked back. The best tool of all, year in and year out, was the SP market, now superseded by the Betfair market.