Reliability of Approved Instruments in Field Over Time is a Hypothesis: Where is Empirical Study?
Updated: Aug 9, 2022
To obtain the following admissions from the Crown's expert:
Calibration of an approved instrument out in the field does not come from anything that the qualified technician does on the day of the subject test
Calibration of an approved instrument out in the field comes from what the manufacturer did years before
Calibration of an approved instrument out in the field comes from the calibration curve created years before
Out in the field we are counting on the calibration curve, specifically the relationship between the response at the end of the detector and the indication; we are counting on that calibration curve not having changed
We [in Ontario] verify that calibration at only one data point on the day of the subject test
If we don’t have disclosure of the Certificate of Calibration that came with the instrument from the manufacturer many years ago, we don’t have any information that permits us to conclude that there has been a reliable measurement result at, for example, 150 or 160 mg/100 mls, if the cal. check is only at 100 mg/100 mls at time of use
The CFS scientist will then rely upon the fact that, in general, the breath testing instruments used in Canada demonstrate that they have a linear calibration curve, so any change in calibration should be uniform across that curve, permitting a cal. check at a single point, 100 mg/100 mls
But: The meat of the real issue is that the calibration curve becomes linear as a result of the calibration, empirically, as the particular instrument learns at the factory [the software and procedure are sometimes called "the linearizer"]
It is a hypothesis relied upon by the CFS scientist that, generally speaking, the relationship between the response at the end of the detector and what comes off of the indication is a linear response
That hypothesis needs to be tested empirically
Where is the empirical evidence that tests that hypothesis?
The CFS scientist will then rely on the fact that during evaluation, the original two Intoxilyzer 8000s (sic) evaluated by the Alcohol Test Committee demonstrated a linear response
This particular CFS scientist will then rely upon the fact that during his own evaluation of the Intoxilyzer 5000EN for the Alcohol Test Committee, that original instrument demonstrated a linear response
But: Those were all new instruments
No one at the Centre of Forensic Sciences, no one in the Alcohol Test Committee, no member of the Canadian Society of Forensic Sciences has ever empirically tested the hypothesis in an aging instrument, such as one that is seven years old.
Acknowledgement of Hodgson's definition of reliability "over time"
Reliability relates to significant drift in accuracy and precision over time
If reliability relates to drift in accuracy and precision over time, one cannot reach any conclusion whatsoever from a one data point calibration check, without any knowledge of the original date and certification of calibration, the date of any re-calibration of the instrument, and without any information as to verification of calibration at any data points other than 100
One cannot assume that the response is always linear over time
Reliability increases with frequency of calibration, that is, with a short calibration interval
No published studies have tested the length of time that an instrument keeps its calibration. No empirical studies in Canada, no empirical studies in the United States on that subject.
CFS hypothesis: any change in calibration should be uniform right across the measuring interval scale. Challenge the Crown/witness to produce studies that empirically support that hypothesis; there are none that the witness is aware of.
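The difference between a uniform change and a non-uniform change in calibration can be sketched numerically. What follows is only an illustration under stated assumptions: the two drift functions, the 15 percent gain, the curvature value and the 10 mg/100 mL acceptance window are all hypothetical and are not taken from any instrument, manufacturer, or Alcohol Test Committee document. It shows that a check at a single data point can catch a uniform drift, yet be satisfied by a drift that bends the curve around that point.

```python
# Hypothetical illustration only. None of these curves or numbers come from
# a real instrument, a manufacturer, or the Alcohol Test Committee.

TARGET = 100.0      # cal. check concentration, mg/100 mL
TOLERANCE = 10.0    # assumed acceptance window around the target, mg/100 mL


def uniform_drift(true_value, gain=1.15):
    """Drift that stays linear: every indication is off by the same proportion."""
    return gain * true_value


def nonuniform_drift(true_value, curvature=0.001):
    """Drift that bends the curve but leaves the indication at TARGET untouched."""
    return true_value + curvature * true_value * (true_value - TARGET)


def passes_cal_check(response):
    """Single-point check: is the indication at TARGET within TOLERANCE?"""
    return abs(response(TARGET) - TARGET) <= TOLERANCE


if __name__ == "__main__":
    for name, response in [("uniform drift", uniform_drift),
                           ("non-uniform drift", nonuniform_drift)]:
        print(name)
        print("  passes cal. check at 100:", passes_cal_check(response))
        print("  indication for a true 160:", round(response(160.0), 1))
```

On these assumed numbers, the uniformly drifting instrument fails the check at 100, while the non-uniformly drifting instrument passes it and still indicates about 169.6 for a true 160. Whether real instruments ever drift in the second way as they age is exactly the empirical question; the sketch does not answer it.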
Cross-examination of a CFS scientist:
Q. And so that’s why you have specific recommended
standards of the Alcohol Test Committee. Let me put it this way,
if an Intoxilyzer 8000C is being used out in the field by a
particular police officer, its reliability, its calibration
doesn’t come from something that the qualified technician does...
Q. ...On the day of the subject test. Its
calibration comes from what the manufacturer did at the date of
the calibration back a number of years before.
A. That’s correct.
Q. We are counting on the calibration curve, in
other words the relationship between the response at the end of
the detector and the indication; we are counting on that
calibration curve not having changed, right?
A. But we are also verifying the calibration,
albeit only at that...
Q. Only at one data point, namely 100 milligrams
per 100 mills.
A. That’s correct.
Q. So my question to you is, if an instrument has
not been calibrated properly in the first place, if we don’t ever
see the Certificate of Calibration from the manufacturer from
five years ago or six years ago or seven years ago when it came
from the manufacturer, we don’t have any of that information, how
can we ever make a determination that there has been a reliable
- that the indication on the instrument is reliable if the test
result is 150 or 160 if the only checking that we’ve done is at
100 milligrams per 100 mills?
A. Well in general breath-testing instruments that
have been used in Canada demonstrate that they have a linear
calibration curve and so any change in calibration of the
instrument should be uniform across that calibration curve. That
gives us the ability to put together a procedure that only uses
that single calibration checkpoint.
Q. All right, so now we get to the meat of the
real issue of all the questions that I’m asking you. You’re
saying that any change should be uniform because the calibration
curve essentially becomes linear...
Q. ...As a result of the calibration empirically.
I’m asking you as a scientist. You know what I mean by
Q. You’ve put forward a hypothesis to suggest that
by and large, evidentiary breath test equipment that’s used in
Canada – the response, the relationship between response at the
end of the detector and what comes off of the indication is a
linear response. That’s the hypothesis that you’ve just put
Q. Right? All right, let’s test that empirically.
I know, and I want to suggest to you the same Terry Martin that
we just talked about, when she did her evaluation for the
Intoxilyzer 8000C wrote a paper right after that evaluation where
she tested the instrument and made – reached the conclusion that
the response was linear, right?
Q. That was a new instrument with a new
calibration certificate just like the one – like the 5000EN that
you received for evaluation, right?
Q. No one at the Centre of Forensic Sciences, no
one in the Alcohol Test Committee, no member of the Canadian
Society of Forensic Sciences has ever empirically tested the
hypothesis that you’ve just put forward in an aging instrument,
in an instrument that’s five years, six years, seven years old,
A. Not that I’m aware of or no one’s – I’m not
aware of a study that’s been published showing that but it
certainly has been tested and is part of our training and it’s
also been tested by manufacturers, which is why their internal
test only measures the calibration at one point, again at 100
milligrams of alcohol in 100 millilitres of blood unless by
statute a jurisdiction is going to use another point.
Q. Okay, now before we talk about the internal
test procedure, you said that you have this hypothesis that any
change over time - and let’s just go back to a paper by Brian
Hodgson, you know who he is?
Q. You know that he wrote a paper that the Supreme
Court of Canada relied upon in a case called St. Onge Lamoureux?
Q. And you’re very familiar with the paper in
which he defined what accuracy is, what precision is and he
defined what reliability is. He also defined specificity, right?
Q. And he referred to reliability as referring to
significant drift – or I’m sorry, significant drift in accuracy
and precision over time, right? Have I got that roughly right?
A. Sounds like a good...
Q. Sounds like a good...
Q. ...Definition. You’d agree with that
definition with what reliability is?
Q. So here’s the question, if reliability relates
to drift in accuracy and precision over time, how can a court
reach any conclusion whatsoever with respect to one data point
calibration check without any knowledge of the original date and
certification of calibration, the date of any re-calibration of
the instrument and without any information as to if anybody has
verified that calibration at any other data points other than
100? Why on earth would a court assume that the response is
A. Well, without having any evidence I suppose
that the court couldn’t. They would need the evidence of an
Q. The expert who...
THE COURT: Maybe we can frame this – I don’t think
we should be framing this in terms of what legal
conclusions the court might reach.
MR. BISS: All right.
THE COURT: I mean, if you want to ask him how, you
know, how he would understand it to be, that’s one
thing but I think we have to be careful about how
this is framed.
MR. BISS: All right. I’ll do better then, Your
Q. You’d agree with me that reliability increases
with any kind of measuring instrument with frequency of
calibration or short calibration interval? That’s a general
concept across science.
A. Well, obviously yes but then relative to how
long that actual estimate keeps its calibration...
Q. And again, you’d agree with me no published
studies, certainly none that you’re aware of, that has tested
length of time that an instrument keeps its calibration? No
empirical studies in Canada, no empirical studies in the United
States on that subject. You’re not aware of any that are
A. You know, none comes to mind but I haven’t
turned to my mind to that for 20 years.
Q. Right. So here’s the problem, you say – you
draw the inference from the hypothesis that you’ve proposed, that
any change should be uniform right across the measuring interval
scale so therefore you put together a technical program, a
technical set of recommendations that says let’s have – and this
is what the Alcohol Test Committee has done, let’s have all of
the police services across Canada run at least one or more
control tests at one data point when they’re running an
evidentiary breath test. That’s the reason for that procedure?
A. See, I don’t know that it was – that the basis
was just the hypothesis. Obviously there must have been testing
of instruments reliability over time and the change in
Q. I wanna suggest to you that there are no such
empirical studies and I mean I’m challenging you, I’m challenging
the Crown to produce them but I wanna suggest to you there are no
such empirical studies. It’s an assumption that’s been made and
as a result of that, I think you said earlier, as a result of
that that they put together a program, a package, a set of norms,
a set of technical norms for qualified technicians to follow,
A. Yes, I’d agree with that.
Q. It’s because of that assumption.
A. But I also can’t – I mean I can’t say that it’s
just an assumption. I can’t think of any published studies right
now, but even if there were no published studies I can’t discount
that – our understanding of the linearity of the instruments
wasn’t – hasn’t been, in fact, studied in any number of forensic
[after a break]
Q. And a control test at one data point is not a
check of the calibration of the instrument; it’s only a technical
procedure to try and help police officers to screen out
instruments that should be taken out of service?
A. I would disagree. Just as stated here, when
the Intoxilyzer 8000C is calibrated, a number of calibrators are
A. ...So that fulfills that section. In my
opinion, the use of just one calibration check at a concentration
of 100 milligrams of alcohol in 100 millilitres of blood is
sufficient to determine if the instrument remains in calibration.
Q. I want to suggest to you that’s a technical
opinion based on a norm of your employer. It’s not a scientific
opinion and you don’t have an empirical...
Q. ...Research to substantiate that.
A. Actually do, which I’ve completely forgotten
about because it actually relates to comparison analysis and I
know of several papers that – comparing blood tests to breath
testing and in those, while there is always going to be
difference between the breath test and the blood test...
A. ...Based on the time that the tests occurred,
as well as the fact that breath testing in North America produces
results that are systemically low between 10 to 12 percent...
A. ...So in correcting those two factors, these
comparison studies have shown that there is good correlation
between blood and breath results from the same individual from
the same incident and moreover – more importantly is that there
was the difference between the two did not vary by concentration.
In other words, there was no evidence that breath-testing
instruments got worse as the blood concentration changed over the
occurrence and most if not all of these tests were done with
instruments that had been in the field for some time. Some could
have been just re-calibrated; others certainly would have been in
use for a variety of times...
Q. Do you have a copy of that study?
A. Not with me, no. It’s one by Jim Wigmore and I
forget who the other author was. Then there’s a paper by Hodgson
who also looked at the relationship and there is one by Cowan and
I don’t know the age of the instrument he used but he performed
simultaneous analyses, blood and breath, using the Intoxilyzer
8000C and showed that in 100 percent of cases the breath result
was lower than the blood result and it – then he wasn’t doing a
linearity analysis in that particular case.
Q. All right, let’s talk about Wigmore and
Hodgson. Do you have knowledge of whether the instruments in
that particular case, it may have been older instruments, but
when had they last been recalibrated?
A. That’s information – in the Centre for Forensic
Science study wasn’t available. I don’t know Mr. Hodgson’s study
makes any statement about when the instrument he was using was
Q. So those studies don’t support the hypothesis
that it doesn’t matter the length of time between calibration and
the breath testing result doesn’t matter in terms of reliability
of the instrument. It does not support that hypothesis.
A. I would disagree.
[Next time I would have these studies at my fingertips to use them in cross-examination. My client in this case did not want an adjournment for that purpose.]
Q. Well, except that we have no information.
You’re saying that these studies support that hypothesis but we
have no information about how recently before the testing had
been done, the simultaneous testing had been done, of the length
of time before that instrument had been recalibrated. I mean,
instruments that are out in the field – I’m sure that your
instruments at the Centre of Forensic Sciences from time to time
go back for recalibration?
A. Very rarely, yes.
Q. But on occasion they do?
Q. But the point is that nobody’s done an
empirical study to determine whether that linear relationship
lasts over a long period of time?
A. I can’t say with certainty whether there has
been a specific study of that or not.
Q. All right.
A. And I don’t have access here to my Alcohol Test
Committee files to see if in fact early on at some point if the
Committee didn’t actually perform that or other labs from which
those individuals came had in fact done such studies.