OZmium Sports Betting and Horse Racing Forums

walkermac 21st February 2018 03:07 PM

Is Dosage bunkum?
Dosage is an attempt to determine a horse's potential through its genealogy.

In the early 20th Century a Col. Vuillier identified stallions who frequently appeared in the pedigree of the winners of major European races. He called them 'chefs de race', conjecturing that they were more disposed to pass on advantageous genes to their descendants.

In the 1970s a Dr. Varola published some work that further divided these chefs into categories based on the abilities they passed on: 'Brilliant' chefs passed on genes for pure speed, for example, whereas 'Intermediate', 'Classic', 'Solid' and 'Professional' chefs passed along genes that were advantageous for progressively longer races.

At the end of the line was a Dr. Roman who implemented and formalised the system now in place, where only the last 4 generations are looked at and the results are weighted to pay more heed to parents than great-grandparents, for example.

Dosage was quite popular for a time in the 1980s, particularly as it consistently ruled out otherwise likely-looking candidates from winning the Kentucky Derby. The 1990s saw the first of the runners to win contrary to the Dosage system and it has met with declining interest from then on. It still has its adherents though and occasionally gets trotted out (I mentioned it quite a bit in the Melbourne Cup thread, for example).

Some of the problems with Dosage are: new chefs can only be determined in hindsight, by looking at years of race records and how the winners' Dosage varies from what is expected; there is no guarantee as to which particular genes are passed on to offspring; little account is made of half of the genes passed on (those from females); and no account is made of those passed on by sires who aren't chefs.

Consequently only the most rudimentary analysis can be performed and it is entirely superseded by an individual horse's actual race record. It's possible that DNA analysis will be able to better achieve some of the aims of Dosage in the future, but it's unlikely that that information will be in the public realm - unlike a horse's pedigree.

So how rubbish is it?

First: some terms. A 'Dosage Profile' is the series of digits that track the number of points for each attribute a horse receives from its ancestors. The Dosage Profile for Rekindling is (3-0-10-4-1). It has inherited 3 Brilliant Points, 0 Intermediate Points, 10 Classic Points, 4 Solid Points, and 1 Professional Point.
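A rough sketch of how a profile like that gets built up. Roman's scheme is usually described as giving a chef 16 points in the first generation, 8 in the second, 4 in the third and 2 in the fourth, with a chef listed in two categories splitting its points evenly. Those weights aren't stated anywhere in this thread, so treat them (and the made-up pedigree below) as assumptions.

```python
# Sketch only: the 16-8-4-2 weights are the commonly quoted ones,
# not something taken from this thread.
CATEGORIES = ("B", "I", "C", "S", "P")  # Brilliant ... Professional
POINTS_BY_GENERATION = {1: 16, 2: 8, 3: 4, 4: 2}


def dosage_profile(chefs_in_pedigree):
    """Build a (B, I, C, S, P) profile.

    chefs_in_pedigree: list of (generation, categories) pairs, one for
    each chef-de-race found among the sires in the first four generations.
    categories is a string such as "C" or "BI" (a dual-category chef).
    """
    profile = dict.fromkeys(CATEGORIES, 0.0)
    for generation, categories in chefs_in_pedigree:
        # A dual-category chef splits its points evenly.
        share = POINTS_BY_GENERATION[generation] / len(categories)
        for category in categories:
            profile[category] += share
    return tuple(profile[c] for c in CATEGORIES)


# Hypothetical pedigree, not a real horse.
example = [(1, "C"), (2, "BI"), (3, "C"), (4, "S"), (4, "P")]
print(dosage_profile(example))  # (4.0, 4.0, 20.0, 2.0, 2.0)
```

Sires that aren't on the chef list simply contribute nothing, which is why two horses with equally deep pedigrees can end up with very different totals.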

The Dosage Total (the sum of all those points attained) could indicate the 'classiness' of a horse. On the whole, horses whose parentage included some of the great stallions would be expected to be better performing than the milkman's horse. The former type would have a higher Dosage Total than the latter.

As stated earlier the attributes are on a scale of distance. So if all of the points in the Dosage Profile were towards the left, a horse would be considered likelier to be a sprinter. In Rekindling's case, the points are more grouped to the right, which would indicate a predilection for longer races. The Centre of Distribution (CD) is the "balancing point" in a horse's Dosage Profile. For uniformity's sake we can use the CD as an indicator of a horse's best distance. The way CD is calculated means that it can equal a figure somewhere between -2 and +2.
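For anyone wanting to check the figures by hand, here is a minimal sketch of the Dosage Total and CD. The CD formula used is the one usually published for Roman's system, (2B + I - S - 2P) / total; it isn't spelled out in this thread, so take it as an assumption. It does give the -2 to +2 range mentioned above.

```python
def dosage_total(profile):
    """Sum of all the points in a (B, I, C, S, P) Dosage Profile."""
    return sum(profile)


def centre_of_distribution(profile):
    """Balancing point of the profile, between -2 (all Professional)
    and +2 (all Brilliant). Formula assumed, see the note above."""
    b, i, c, s, p = profile
    total = b + i + c + s + p
    if total == 0:
        raise ValueError("no dosage points, so CD is undefined")
    return (2 * b + i - s - 2 * p) / total


rekindling = (3, 0, 10, 4, 1)
print(dosage_total(rekindling))            # 18
print(centre_of_distribution(rekindling))  # 0.0
```

Classic points only count towards the total, so a profile dominated by Classic chefs gets pulled towards the middle of the scale whatever else is in it.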

In reading about Dosage you'll likely encounter something akin to the following:

(from http://www.teamvalor.com/dosage/dosage.htm)

Let's test if this scale has any bearing on reality.

I have a couple of years' worth of data from Aus, NZ and HK. I only want to look at races on Good tracks, as the ability to run on soft surfaces may trump that of a horse's ability to run well over a specific distance. I also chose to restrict my analysis to races with prizemoney >= $50,000. If I look at the victor of each race and - where available - their CD, I get the following chart (only distances with 20 or more qualifying races contested over the period are included).

The bars show the range where, given the size of the sample races at each distance, we can have 95% confidence that the mean value lies within. The bigger ranges show where there is less data to work with.
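A minimal sketch of how bars like that can be calculated, assuming the usual normal approximation (mean plus or minus 1.96 standard errors). The CD values below are invented purely to show the effect of sample size.

```python
import math
import statistics


def mean_confidence_interval(values, z=1.96):
    """Approximate 95% confidence interval for the mean of values."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return mean - z * standard_error, mean + z * standard_error


# Made-up winners' CDs at one distance.
few_races = [0.2, 0.4, 0.6, 0.8]
many_races = few_races * 4  # same spread, four times the races

print(mean_confidence_interval(few_races))
print(mean_confidence_interval(many_races))
```

The second interval is narrower than the first even though the values have the same spread, which is all the "bigger ranges mean less data" comment amounts to.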

The good news is, as the race distance increases, the CD gets lower - just as we expected! The bad news is, it's nowhere near the scale of change we had hoped for (from +2 to -2).

It should be pointed out that the data I scraped was based on the 'official' chefs de race. The list of chefs was based on North American and European stakes races. Many of the sires we see in this part of the world don't feature on the list. There is a separate Aus/NZ chefs list (http://dosageprofile.com/anz-chefs) but it would be a substantial process to calculate the figures for so many horses and - given the very moderate trend shown in the graph above - I would be amazed if it would be substantially changed.

It might all be a moot point anyway. Dr Roman has stopped maintaining his list of chefs so it's likely to become even less relevant each season, as old sires pass and new sires rise.

Just a bit of fun. There might be a few more graphs to come.

UselessBettor 21st February 2018 06:00 PM

Interesting post.

One thing which struck me from an analysis point of view was how do I know your results are not biased based on the entrants in the field.

This is easily checked though by plotting the average dosage value of the field at each distance vs the average winning dosage value for each distance.

But thinking about it further... are the results skewed because of the dosage figures themselves. If I knew my horse had a dosage figure suggesting it should be a long distance runner then that would influence me to enter it in long distances rather than short distances.

Nevertheless, an interesting post and hope to see more to come.

walkermac 22nd February 2018 12:27 AM

Originally Posted by UselessBettor
One thing which struck me from an analysis point of view was how do I know your results are not biased based on the entrants in the field.

This is easily checked though by plotting the average dosage value of the field at each distance vs the average winning dosage value for each distance.

Here's the same average winner's CD (and error bars) along with the average CD of the whole field.

It looks most advantageous to have a higher CD at a 900m race (it's actually an 851m-950m race, forgot to mention that). The average runner's CD is actually just outside of the 95% confidence interval for the actual average winning CD. The slim advantage appears to be maintained 'til around 1200m. But while the leftmost points look notable in comparison, it's still only around 0.15 of difference between the CD of the average winner and the average competitor (under 4% of the total scale, which runs from -2 to +2).
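The comparison itself is simple enough to sketch. This assumes each starter is a (distance in metres, CD, won) tuple, which is a layout made up for the example, and that "the field" means every runner including the winner.

```python
from collections import defaultdict
from statistics import mean


def distance_bucket(metres):
    """Round to the nearest 100m so that 851m-950m counts as 900m."""
    return ((metres + 49) // 100) * 100


def winner_vs_field(runners):
    """Return {bucket: (average winner CD, average field CD)}.

    runners: iterable of (distance_m, cd, won) tuples, one per starter.
    """
    field = defaultdict(list)
    winners = defaultdict(list)
    for metres, cd, won in runners:
        bucket = distance_bucket(metres)
        field[bucket].append(cd)
        if won:
            winners[bucket].append(cd)
    return {
        bucket: (mean(winners[bucket]), mean(field[bucket]))
        for bucket in sorted(winners)
    }


# Invented starters, just to show the shape of the output.
runners = [
    (900, 1.0, True), (900, 0.6, False), (940, 0.8, False),
    (1200, 0.5, True), (1210, 0.3, False),
]
for bucket, (winner_cd, field_cd) in winner_vs_field(runners).items():
    print(bucket, round(winner_cd, 2), round(field_cd, 2))
```

If the winners' average sits inside the range you'd expect from the field's average, the distance trend says more about which horses get entered than about which ones win.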

Originally Posted by UselessBettor
But thinking about it further... are the results skewed because of the dosage figures themselves. If I knew my horse had a dosage figure suggesting it should be a long distance runner then that would influence me to enter it in long distances rather than short distances.
I think a trainer would defer to entering it in distances that the horse's training performance indicated it would be best suited to. I agree that there is a correlation with dosage figures, just that it's a pretty weak one. I'd be interested to see whether something like the sire's progeny's average winning distance for the preceding x years gave a better indication. Still, I just don't think any system of that type would take into enough account an individual horse's characteristics to be of much use in handicapping.

walkermac 22nd February 2018 11:01 PM

The next graph is the average of each winner's aptitude totals at differing distances.

If you recall, Brilliant chefs are considered to predominately pass on pure speed genes. Intermediate, Classic, Solid and Professional chefs-de-race pass on genes suited for progressively longer races. A Dosage Profile for a horse - e.g. (3-0-10-4-1) - notes the totals in each aptitudinal category.

The average field totals are - on the whole - so similar that they're not worth cluttering up the graph. (Where they differ: the average Intermediate totals for Winners are slightly greater than that of the field at distances 2000m+, which doesn't make much logical sense were the model to be correct; and the average Professional totals for Winners are slightly lower than that of the field at distances 1100m-).

As can be seen, each of the aptitudes mostly stay in distinct bands. CD, which has historically been used to indicate preferred racing distance, measures the balancing point between speed-stamina. The speed-based figures (Brilliant and Intermediate) are always far greater than the stamina-based ones however, so it's unlikely we'll ever see the relationship between CD and distance as per the scale here:

Part of that is due to the disparity between the number of Brilliant and Intermediate chefs (60 and 66, respectively) and Solid and Professional ones (38 and 39, respectively). The Classic total is way up there on account of there being 103 Classic chefs-de-race. The greater number of chefs in some categories means there is a greater chance of more of those stallions in a horse's ancestry. (Chef numbers are from this list circa a couple of years ago: https://web.archive.org/web/2016121...fs_by_group.htm)

The situation is even more dire for stamina totals. The average year of birth for the chefs in each of the B-I-C categories is 1955. For Solid and Professional it's 1948 and 1935, respectively. Not only are there fewer chefs passing on stamina genes, but they're likelier to be further back in a horse's pedigree, contributing less to its totals (or nothing at all if they're beyond 4 generations).

I also would have hoped to have seen some movement in the lines. The Brilliant line "should" be highest at the shortest distances, then becoming significantly less important as the race went on - though never to 0 as speed is always useful. Instead it's pretty much constant.

Intermediate and Solid show a small amount of variation as race distance increases, but you would have hoped that the former's totals would have begun to trend downwards beyond a certain 'ideal' distance.

The Classic total on its own seems to give a better indication of distance preference than CD does. Not that it would help us find a winner, as the average runner has pretty much the same figures. It might help in the saleyard slightly if you were particularly keen on finding a certain type of horse however.

Chrome Prince 23rd February 2018 04:55 PM

Brilliant thread walkermac, right up my alley.
The great sires like Sir Tristram, Danehill, Redoute's Choice, Bel Esprit, Fastnet Rock, Northern Dancer etc. do pass on their attributes, but the whole dosage algorithm does not take into account the Dam, and that's where it falls to its knees.
A leading trainer once told me "all our fillies and mares go straight to the breeding barn at the end of their career". I was gobsmacked!
If a mare or filly doesn't make the grade, never races, is an average, or poor performer, they all go straight to the breeding barn.
This is where the house of cards influences the dosage results for a Sire.

And still Sunline's full brother Flaring Sun could only manage one Maiden win at Seymour. Same Dam, same Sire. Ironically yet again at his first start the bookies opened him up at $2.20 and although he drifted to $3.70, he finished 7th of 10.

It all depends on the Dam, and what genes the progeny inherit.

They get it so wrong at the sales these bloodstock agents and trainers, the one they got right was paying the price they did for Black Caviar, but that was all Peter Moody who now is a very good bloodstock agent.

I'm a huge believer in interactive nicking.

walkermac 24th February 2018 12:55 AM

Originally Posted by walkermac
The Dosage Total (the sum of all those points attained) could indicate the 'classiness' of a horse. On the whole, horses whose parentage included some of the great stallions would be expected to be better performing than the milkman's horse. The former type would have a higher Dosage Total than the latter.
The 'well-bred' type would definitely have a higher Dosage Total but - as it turns out - they wouldn't necessarily be better performing than the milkman's horse.

I took "better performing" to mean: they win more expensive races. The graph shows the winner's Dosage Total (there is little discernible difference from that of the average of the field). The winning Dosage Total does go up very slightly as the prizemoney increases, but the effect is quite small and whether it's due to the horse being better bred is questionable.

One would think that offspring from certain sires are more likely to have a higher dosage total on account of their starred heritage. They often cost more, are bought by the big stables and would be consequently steered towards more expensive races.

walkermac 26th February 2018 12:49 AM

In Australia at least, dosage certainly does appear to be bunkum. Even if it weren't, it quickly becomes impractical to use without additions of new chefs-de-race as time passes. The nature of how chefs are determined is to see if the data fits better if they were a chef. To gather a sufficient amount of data for a new sire is a matter of years.

The big mark against dosage, however, comes from the subsequent studies of horse genes and their heritability. Dosage was looking in the wrong place.

In 2015 the paper "Potential role of maternal lineage in the thoroughbred breeding strategy" was published. Per the abstract (can't find a free source for the whole work) they examined the race records of nearly 700 thoroughbreds (prizemoney per start). The horses were placed in four categories depending on whether their dams and sires were elite or poor performers (i.e. Elite-Elite, Elite-Poor, Poor-Elite, Poor-Poor). The results showed, with statistical certainty, that the heritability of race performance between dams and foals was greater than that between sires and foals.

It's still a huge crapshoot though. Per the graph, 13.6% of the variation in performance among horses could be attributed to the genetic influence of the dam, but only 3.8% to the sire. This still leaves over 80% to be attributed to other factors: environment, training, nutrition, stress, psychology, ad infinitum.

This paper was preceded by many that looked at genes. For example, in "Mitochondrial DNA: An important female contribution to thoroughbred racehorse performance" they investigate whether mitochondrial DNA, which had been implicated in influencing fitness and performance characteristics in other mammalian species (humans), does the same job in horses.

Mitochondria convert energy from food into a form that cells can use, and mitochondrial DNA (mtDNA) carries genes for part of that machinery. It's only inherited from the mother. mtDNA also changes very little over long periods; it can be traced back hundreds of generations. As racing over varying distances has different energy requirements, different groups of genes could be more advantageous at some distances than others. The authors identified 17 (haplo)types that thoroughbreds from their large sample could belong to (these are not the same as the "female families" which were derived from stud books and have been shown to be filled with errors). Group size estimates varied from <0.01% of the total population to 19.2%.

Several types were shown to have statistically significant variation in their performance over distance (races studied: every winner of a UK 3yo race between 1954 and 2003). These groups accounted for just over half of the race population.

("Race Index" is the percentage of victories by each type divided by the percentage of that type in the total population of horses).
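The Race Index is easy enough to reproduce if you have the counts. A small sketch, using invented haplotype labels and numbers rather than anything from the paper:

```python
def race_index(wins_by_type, population_by_type):
    """Share of wins for each type divided by its share of the population.

    An index above 1 means the type wins more often than its numbers
    alone would suggest; below 1 means less often.
    """
    total_wins = sum(wins_by_type.values())
    total_population = sum(population_by_type.values())
    index = {}
    for haplotype, population in population_by_type.items():
        win_share = wins_by_type.get(haplotype, 0) / total_wins
        population_share = population / total_population
        index[haplotype] = win_share / population_share
    return index


# Hypothetical counts for two made-up types at one distance.
wins = {"type_a": 30, "type_b": 70}
population = {"type_a": 200, "type_b": 800}
print(race_index(wins, population))
```

Here type_a has 30% of the wins from 20% of the horses, so its index comes out at about 1.5, while type_b lands a little under 0.9.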

Their paper's conclusion: "These observations lend support to there being an important female component contributing to stamina optima which should be taken into account when planning thoroughbred breeding strategies".

Note that their conclusion is talking about genes passed on via Mitochondrial DNA, a process that occurs irrespective of the success or otherwise of a dam's racing career. Given the results of the first paper though, it would appear best to breed from an accomplished mare, of the right type....then cross your fingers that you hit the gene jackpot.

Next posts will be on the "female dosage".

walkermac 28th February 2018 02:01 AM

In light of these genetic discoveries, a layman by the name of Bill Lathrop began an analysis of group race winners with respect to their female heritage. Knowing that there were queries regarding the female families in the stud book, he chose to limit his study to mares born after 1900, by which time he figured the record-keepers would have had their act together.

He collected the results of more than 16,000 group races over 40 years from around the world and placed each of the winners into categories based on the race distance. He re-used the same categories from Dosage: Brilliant, Intermediate, Classic, Solid and Professional. He then did some data crunching to determine which category - or categories - the "conduit mares" of each horse would lie. These mares are the ones handing down their mitochondrial DNA to each subsequent generation.

Each horse has 16 conduit mares: trace each sire in the first four generations back via its line of dams to the first one born after 1900, to get 15 of them. The 16th is the tail-female, its dam's dam's dam's dam, etc. This could be considered the most important as that's where the horse is getting its own copy of mitochondrial DNA.

Unlike dosage, where only the chefs-de-race in its pedigree pass on aptitude, every female line is accounted for when using conduit mares. Hence, there's no equivalent for something like Dosage Total showing how well-bred or 'classy' a horse is; it's only about the balance of influences.

In the previous post I wrote that about 50% of horses inherited mitochondrial DNA that seemed to make a difference to their ability over different distances. Of course this means that the other half may not. Conduit Mares can cater for this by being "Transcendent", which is a poetic way to say that their descendants aren't restricted to any one category. If they span all categories then they don't affect the balance of the profile.

A Conduit Mare Profile looks much like a Dosage one. Rekindling's is (6-9-1-6-11). The ratio between speed and stamina in the profile is called the Conduit Mare Index and here it equals 0.76.

Differing from dosage, you might also see references to Speed (the sum of the two leftmost categories), Stamina (the sum of the two rightmost categories) and Triads. The last would be (16-16-18) for Rekindling; the first figure is the sum of the Brilliant, Intermediate and Classic categories; the second is the sum of the Intermediate, Classic and Solid categories; and the third is the sum of the Classic, Solid and Professional categories.
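Those sums are quick to check in code. A minimal sketch using Rekindling's profile from above; the function name is just something made up for the example, and the Conduit Mare Index is left out because its exact formula isn't given here.

```python
def conduit_mare_summary(profile):
    """Speed, Stamina and Triads from a (B, I, C, S, P) Conduit Mare Profile."""
    b, i, c, s, p = profile
    return {
        "speed": b + i,      # two leftmost categories
        "stamina": s + p,    # two rightmost categories
        "triads": (b + i + c, i + c + s, c + s + p),
    }


rekindling = (6, 9, 1, 6, 11)
print(conduit_mare_summary(rekindling))
# {'speed': 15, 'stamina': 17, 'triads': (16, 16, 18)}
```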

There is a belief that both the Conduit Mare Index and/or the Triads can indicate a distance preference:

Sprints (5-6F)
The higher the Index the better
The triad should be in the form: High Number - ANY Number - Low Number
Mile (7-9F)
Over 1, the higher the better
Slightly Higher - ANY - Slightly Lower
Medium (10-12F)
Under 1, the lower the better
Slightly Lower - ANY - Slightly Higher
Staying (13F+)
Closer to 0 the better
Low - ANY - High

For Rekindling, his Conduit Mare Index (0.76) best matches the "Medium" distance range, indicating that 2000m-2400m is ideal. The Triads (16-16-18) indicate the same, as (the first) 16 is only slightly lower than 18.

Let's see if it actually works. For the same races as in the previous posts:

In races of 1400m or less, the mean for winners is higher than that of the losers BUT more data would be required to make those error bars smaller. Much like the CD graph, we do see the Conduit Mare Index getting lower as the distance gets longer, but not to the extent hoped.

Regarding the Triad method of determining distance preference, I subtract the Third Triad from the First one. If there's a large positive difference then it must be a High Number minus a Low Number, etc. The graph looks like this:

It's always a negative result as the First Triad is usually smallest (perhaps because Brilliant and Intermediate Conduit Mares number the least among the categories).

There appears to be a clear relationship there, but what I've omitted from the chart are the error bars: they're big, because the standard deviation is around 3 for each reading. ...but they still aren't big enough to squeeze Rekindling, for example, where we'd like. 1300m is the longest distance that has -2 (Rekindling's First Triad - his Third) within its 95% confidence interval.
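For reference, the First-minus-Third Triad figure is a one-liner. A small sketch with Rekindling's Conduit Mare Profile:

```python
def triad_difference(profile):
    """First Triad minus Third Triad for a (B, I, C, S, P) profile."""
    b, i, c, s, p = profile
    first_triad = b + i + c
    third_triad = c + s + p
    return first_triad - third_triad


print(triad_difference((6, 9, 1, 6, 11)))  # -2
```

Classic appears in both triads and cancels out, so this difference is the same number as Speed minus Stamina (15 - 17 for Rekindling).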

walkermac 2nd March 2018 02:54 AM

Here is how each category changes over distance:

Professional seems to be the only one that varies over distance and in a way we would expect. (Classic varies also but it's behaving more like Brilliant should).

I wouldn't expect a lot of accuracy in the middle categories. A "transcendent" conduit mare could be listed as Brilliant/Professional, for example, and that seems to imply it's as likely to pass on Brilliant genes, as Intermediate as Classic as Solid as Professional. Points don't get added for each category though, just half for Brilliant and half for Professional. That's fine for determining the balance point (i.e. Conduit Index), but looking at specific category totals would then be incorrect. ...ignoring what the horse actually inherits - if anything useful.

Speed and Stamina, the two leftmost and rightmost numbers in the Conduit Mare Profile respectively:

It looks like the longer the race the higher the Stamina figure you'd like. Speed becomes very slightly less important.

So where do you get this info? You can do a pedigree search at www.pedigreequery.com. The Dosage Profile and associated figures are free on the 5-generation charts. If you want the Conduit Mare information easily you need to buy a membership.

You can calculate both Dosage figures (including Australia/NZ sires) and Conduit Mare figures yourself but it takes some doing. The majority of the Conduit Mares and their attributes are listed here: http://dimarres.wixsite.com/modern-conduit-mares (though there seem to be some missing and a few that differ, but it'll be ballpark).

And how would you use it? Well, a lot of it seems bunkum but there's a few areas where it appeared helpful. I picked Summer Sham last weekend but talked myself out of having her in the tipping comp 'cause she was untested over 1400m. It's been bugging me ever since... Maybe I should have looked at this bunkum! :)

Her Dosage Profile is (4-2-8-0-0) and the Classic number (8) is spot on for 1400m winners, per the earlier graph (also the second highest - of 11 - in the race after the fifth-finishing favourite; the runner with the 3rd highest total came second).

The Dosage Total (14) was the 4th highest in the race; of the three above it, one was a runner at $51 and the other two were our favourite and second placegetter again. That looks about right for a $200,000 race.

The horses with the two least suitable CDs in the race (outside 1.645 standard deviations from the sample mean for winners) came last and second last.
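The "outside 1.645 standard deviations" test is simple to sketch. On a normal distribution, 1.645 standard deviations either side of the mean covers roughly 90% of values. The mean, standard deviation and runner names below are placeholders, not the figures from my data.

```python
def outside_band(cd, winners_mean, winners_sd, z=1.645):
    """True if cd is more than z standard deviations from the winners' mean."""
    return abs(cd - winners_mean) > z * winners_sd


# Hypothetical sample figures for winners at one distance.
WINNERS_MEAN = 0.6
WINNERS_SD = 0.4

# Hypothetical field.
field = {"Runner A": 0.5, "Runner B": 1.5, "Runner C": -0.2}
for name, cd in field.items():
    flag = "unsuitable" if outside_band(cd, WINNERS_MEAN, WINNERS_SD) else "ok"
    print(name, cd, flag)
```

With those numbers the acceptable band runs from about -0.06 to 1.26, so Runner B and Runner C get flagged and Runner A doesn't.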

It was the right kind of race to look at as well; handicaps would complicate things. ...though I'm sure there's just as many races where it would get it completely wrong. Perhaps it was just a fluke it worked for this one race that I was interested in.

I don't have a membership with pedigreequery currently, so can't check the Conduit Mare figures.

walkermac 3rd March 2018 03:34 AM

Originally Posted by walkermac
And how would you use it? Well, a lot of it seems bunkum but there's a few areas where it appeared helpful. I picked Summer Sham last weekend but talked myself out of having her in the tipping comp 'cause she was untested over 1400m. It's been bugging me ever since... Maybe I should have looked at this bunkum! :)
It looks like it worked a treat in the above instance, after the fact, but I thought I should give it a chance to fail miserably ahead of time as well - lest it appear a better approach than I actually think it is.

Wangaratta Race 1 is a set weight maiden over 1590m. Only half of the field have run the distance before and less than half of those have placed at it.

The current favourite is What Lizahead @ $3.10. It has the lowest Classic (and total) points in the race and its CD is the only one outside of the 95% confidence interval for the sample mean of winners. On the flipside, it's also finished 0.1L back in second in each of its two starts (both over 1400m). Even if the pedigree stuff implies that it's not suited to the race distance, it doesn't necessarily mean it won't win if the opposition are rubbish.

There are two contenders with the highest Classic points total (12). One is outside the 90% confidence interval for the sample mean of winners' CD (The Kroc). The other is Salina Bay, also having the highest total points.

The remaining runners with 8 or more Classic points: Capatas and Big Blows with 10 Classic points each and Red Calypso has 8. 8 was the average winner's Classic total at 1600m. There's nothing saying it can't be less, but surely there are races where no candidate has that many points. It's consequently appealing here to err on the high side of the average.

If I had a pedigreequery membership, or the patience to do 5-10 minutes of work for each horse and the luck to have the designations for all of the necessary conduit mares, I'd perhaps check to see: the Stamina total was around 14 (and definitely over 9), the Professional total was around 6 (and perhaps over 2), and that the Third Triad was around 4 greater than the First. But I don't.

Finally a bit of regular handicapping to put the qualifiers in order. I'll go with: Big Blows, Salina Bay, Capatas, Red Calypso.

All times are GMT +10.