American Journal of Law & Medicine

Neuroscience-based lie detection: the urgent need for regulation (Brain Imaging and the Law)

I. INTRODUCTION

"Illustration" or "map" are among the most frequently used words for translating the Chinese character tu, a graphic representation of any phenomenon that can be pictured in life and society, whether in traditional China or elsewhere. (1) Investigations of the early role of tu in Chinese culture first set out to answer questions about who produced tu, the background of its originator, and the originator's purpose. How were pictures conceptualized? Interpreted? In examining tu, Chinese scholars stressed the relational aspect of tu and shu (writing) to answer both these questions, as well as to the importance of not robbing an image of its overall beauty and life with too much graphic detail. In the West, specific concepts of technical or scientific illustrations did not exist before the Renaissance. With the coming of that age, technical illustration became a specific branch of knowledge and activity, with its own specific goals and ends. Although these developments did not proceed in any linear manner in either China or the West, they mirrored the growing importance of science and technology in both societies. However, the desire to understand the function of the brain through observation of human behavior and deficits in patients was marked especially in the West. Ideas about cerebral localization paved the way to developments for mapping brain function--a path that has seen at least eight different technological approaches since the first successful measurements of brain electrical activity in 1920 by Hans Berger. (2) Each technological approach has different potential, limitations, and degrees of invasiveness, yet both share the notion that they enable, at least to some extent, mind reading.

As we enter more fully into the era of mapping and understanding the brain, society will face an increasing number of important ethical, legal, and social issues raised by these new technologies. One set of issues that is already upon us involves the use of neuroscience technologies for the purpose of lie detection. Companies are already selling "lie detection services." If these become widely used, the legal issues alone are enormous, implicating at least the First, Fourth, Fifth, Sixth, Seventh, and Fourteenth Amendments to the U.S. Constitution. (3) At the same time, the potential benefits to society of such a technology, if used well, could be at least equally large. This article argues that non-research use of these technologies is premature at this time. It then focuses on one of many issues and urges that we adopt a regulatory system that will assure the efficacy and safety of these lie-detection technologies.

This article begins by describing the history and functioning of brain-imaging technologies, particularly functional magnetic resonance imaging (fMRI). It next discusses, and then critically analyzes, the peer-reviewed literature on the use of fMRI for lie detection. It ends by arguing for federal regulation of neuroscience-based lie detection in general and fMRI-based lie detection in particular.

II. AN INTRODUCTION TO BRAIN IMAGING WITH AN EMPHASIS ON fMRI (4)

A. AN EVOLUTION OF "CEREBROSCOPY" (5)

As Illes and colleagues reviewed in 2006, (6) the modern evolution of regional and whole-brain techniques for visualizing brain structure and function has yielded ever-clearer bridges between molecules and mind. These advances have been possible in part because of new investigative paradigms and methods. While the earliest reliable noninvasive method, electroencephalography (EEG), used electrical signals to localize and quantify brain activity, measures of metabolic activity using positron emission tomography (PET) and single photon emission computed tomography (SPECT) followed a few decades later, in the 1960s. (7) EEG studies have had exceptional benefits for revealing cognitive processing at the subsecond level, localizing epileptogenic foci, and monitoring patients with epilepsy. Penfield's electrocorticography work enabled even more accurate measurements, recorded directly from the cortex during neurosurgery. (8) PET and SPECT have been used widely in basic research studies of neurotransmission and protein synthesis, further advancing our knowledge of neurodegenerative disorders, affective disorders, and ischemic states. (9)

In the early 1970s, improved detection of weak magnetic fields produced by ion currents within the body enabled the recording of brain signals in the form of extracranial electromagnetic activity for the first time using a technique termed magnetoencephalography (MEG). (10) While not as popular or available as EEG, PET, or SPECT, MEG has still yielded fundamental knowledge about human language and cognition, in addition to important information about epilepsy and various psychiatric diseases. (11)

In the early 1990s, academic medicine witnessed the discovery of a much more powerful technique for measuring brain activity using magnetic resonance imaging (MRI) principles. Using functional MRI (fMRI), researchers can assess brain function in a rapid, non-invasive manner with a high degree of both spatial and temporal accuracy. Today even newer imaging techniques, such as near-infrared spectroscopy (NIRS), are on the horizon, with a growing body of promising results from visual, auditory, and somatosensory cortex, speech and language, and psychiatry. (12)

fMRI is a good model for our discussion here, given its excellent spatial and temporal resolution, adaptability to experimental paradigms, and, most important for our purposes, the increasing rate and range of studies for which it is used. Illes and colleagues examined these increases, in fact, in a comprehensive study of the peer-reviewed literature of fMRI alone or in combination with other imaging modalities. (13) They showed an increase in the number of papers--from a handful in 1991, one year after initial proof of concept, to 865 in 2001. An updated database at the end of 2005 showed that in the four years since the original study, another 5300 papers had been published. (14) All told, that makes about 8700 papers since 1991.

B. MAPPING, MIND AND MEANING

1. From Signals to Meaning

For functional imaging to be possible, there must be measurable physiological markers of neural activity. Techniques like fMRI rely on metabolic correlates of neural activity, not on the activity itself. Images are then constructed based on blood-oxygenation-level dependent (BOLD) contrast.

BOLD contrast is an indirect measure of a series of processes. It begins with a human subject performing a behavioral task, for example, repetitive finger tapping. Neuronal networks in several brain regions are activated to initiate, coordinate, and sustain this behavior. These ensembles of neurons require large amounts of energy, in the form of adenosine triphosphate (ATP), to sustain their metabolic activity. Because the brain does not store its own energy, it must make ATP from the oxidation of glucose. Increased blood flow is required to deliver the necessary glucose and oxygen (bound to hemoglobin) to meet this metabolic demand.

Because the magnetic properties of oxyhemoglobin and deoxyhemoglobin are different, they show different signal intensities on an MRI scan. When a brain region is more metabolically active, more oxygenated hemoglobin is recruited to that area, displacing a certain percentage of deoxyhemoglobin. This displacement results in a local increase in MR signal, or BOLD contrast. Although the direct neuronal response to a stimulus occurs on the order of milliseconds, the increase in blood flow--the hemodynamic response to this increased neural activity--lags by one to two seconds. This hemodynamic response function is important in determining the temporal resolution of fMRI.
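
The chain of events just described is commonly modeled with a canonical hemodynamic response function. The following is a minimal Python sketch, not drawn from this article, using the widely cited double-gamma parameterization; the parameter values (a delayed rise peaking around five to six seconds, followed by a slow undershoot) are conventional defaults assumed here for illustration.

```python
# A minimal sketch (assumed parameter values, not from this article) of the
# canonical "double-gamma" hemodynamic response function used to model how
# the BOLD signal rises and falls after a brief burst of neural activity.
import numpy as np
from scipy.stats import gamma

def canonical_hrf(t, peak_shape=6.0, undershoot_shape=16.0, undershoot_ratio=1.0 / 6.0):
    """Evaluate a double-gamma HRF at times t (in seconds)."""
    rise = gamma.pdf(t, peak_shape)              # delayed rise, peaking ~5-6 s
    undershoot = gamma.pdf(t, undershoot_shape)  # slow post-stimulus undershoot
    hrf = rise - undershoot_ratio * undershoot
    return hrf / hrf.max()                       # normalize the peak to 1

t = np.arange(0.0, 30.0, 0.1)
h = canonical_hrf(t)
print(f"Modeled BOLD response peaks ~{t[h.argmax()]:.1f} s after the stimulus")
```

The sluggish, delayed shape of this function, rather than the millisecond-scale neural events themselves, is what fMRI actually measures.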

Blood flow, as with many other physiological processes in the human body, is influenced by many factors, including the properties of the red blood cells, the integrity of the blood vessels, and the strength of the heart muscle, in addition to the age, health, and fitness level of the individual. Fluctuations in any of these variables could affect the signal measured and the interpretation of that signal. For example, the velocity of cerebral blood flow decreases with age, (15) and similar differences may be induced by pathologic conditions. (16) By comparison, women using hormone replacement therapy may have enhanced cerebrovascular reactivity, thereby increasing the speed and size of their blood flow responses. (17) Some medications, such as anti-hypertensives, can also modify blood flow. (18) Because complete medical examinations are not routinely given to subjects recruited for fMRI studies as healthy controls, it is important to take this potential variability into account when interpreting and comparing fMRI data. Variability in blood flow is especially relevant when evaluating fMRI data taken from a single subject, as might be the case for diagnosing or monitoring psychiatric disease, or if and when fMRI is used eventually by the legal system as a lie detector.

2. Spatial and Temporal Resolution of fMRI

High spatial and temporal resolution is necessary for making accurate interpretations of fMRI data. Advances in MRI hardware over the past decade have greatly increased both, but the ultimate limitation is the correlation of the BOLD signal with underlying neuronal activity. The unit of spatial resolution for an MRI scan of the brain is the voxel, or three-dimensional pixel. While modern MR scanners can acquire images at a resolution of less than one cubic millimeter, the task-dependent changes in the BOLD signal might not be large enough to detect in such a small volume of brain tissue. The measured signal from a voxel is directly proportional to its size: the smaller the voxel, the weaker the signal. In regions like the primary visual or motor cortex, where a visual stimulus or a finger-tap will produce a robust BOLD response, small voxels are adequate to detect such changes. More complex cognitive functions--such as, most importantly for our purposes, moral reasoning or decision-making--use neural networks in several brain regions, and therefore the changes in BOLD signal in any specific region (e.g., the frontal lobes) might not be detectable with small voxels. Larger voxel sizes are thus needed to capture these small changes in neuronal activity, at the cost of decreased spatial resolution. fMRI is typically used to image brain regions on the order of a few millimeters to centimeters.

As the voxel size increases, the probability of introducing heterogeneity of brain tissue into the voxel also increases (these are referred to as partial volume effects). Instead of a voxel containing only neuronal cell bodies, it might also contain white matter tracts, blood vessels, or cerebrospinal fluid. These additional elements artificially reduce the signal intensity from a given voxel.
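
A toy calculation, with made-up voxel dimensions, illustrates the two competing effects described above: raw signal grows with voxel volume, while larger voxels are more likely to mix in non-neuronal tissue.

```python
# Toy numbers only: the tradeoff between voxel size, signal strength, and
# partial volume effects described in the text.
small_voxel = 1.0 ** 3      # 1 mm isotropic voxel, volume in cubic millimeters
large_voxel = 3.0 ** 3      # 3 mm isotropic voxel

signal_gain = large_voxel / small_voxel
print(f"A 3 mm voxel collects roughly {signal_gain:.0f}x the signal of a 1 mm voxel")

# But if, say, 30% of the larger voxel is white matter, blood vessel, or CSF
# rather than gray matter, the BOLD-weighted signal is diluted accordingly.
gray_matter_fraction = 0.7
print(f"Effective gray-matter signal: roughly {signal_gain * gray_matter_fraction:.0f}x")
```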

Several other processing steps in the analysis of fMRI data also have implications for spatial resolution. fMRI data are typically smoothed using a Gaussian filter, which improves the reliability of statistical comparisons but, in turn, decreases the image's spatial resolution. Spatial resolution is also sacrificed when fMRI data are transformed into a common stereotactic space (a step referred to as normalization) for the purposes of averaging or comparing across subjects. Lastly, smaller voxel sizes require increased scanning time, which is often a limiting factor when dealing with certain behavioral paradigms or special subject populations (e.g., children, the elderly, or people in a mentally compromised state).
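
As a concrete illustration of the smoothing step, the sketch below applies a Gaussian filter to a synthetic volume. The 8 mm kernel width and 3 mm voxel size are assumptions chosen for the example, not values reported in the studies discussed here.

```python
# A minimal sketch, on synthetic data, of Gaussian spatial smoothing.
# Kernels are usually specified by their full width at half maximum (FWHM);
# scipy's filter expects a standard deviation, so the FWHM is converted.
import numpy as np
from scipy.ndimage import gaussian_filter

fwhm_mm = 8.0        # assumed kernel width
voxel_mm = 3.0       # assumed voxel size
sigma_voxels = (fwhm_mm / voxel_mm) / (2.0 * np.sqrt(2.0 * np.log(2.0)))

volume = np.random.rand(64, 64, 40)               # stand-in for one fMRI volume
smoothed = gaussian_filter(volume, sigma=sigma_voxels)
print(f"Smoothed with sigma = {sigma_voxels:.2f} voxels (FWHM = {fwhm_mm} mm)")
```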

The temporal resolution of fMRI depends on the hemodynamic response of the brain and on how frequently that response is sampled. The measured hemodynamic response in the brain rises and falls over a period of about ten seconds. The more frequently we sample this response, the better the estimate we can make of the underlying neural activity. On average, fMRI has a temporal resolution on the order of seconds; however, latency differences as small as a few hundred milliseconds can be measured. These differences do not reflect the absolute timing of the neural activity, but rather a relative difference between different types of stimuli or different brain regions. The BOLD response is also non-linear for multiple stimuli activating the same brain region: if the same region is activated in rapid succession, the BOLD response to the later stimuli is reduced compared to the response to the initial stimulus.
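
The point about relative latencies can be made concrete with a small simulation. Assuming a typical sampling interval of two seconds and a model hemodynamic response, a 300-millisecond shift in response latency still changes the sampled values, which is what allows such relative differences to be estimated; the numbers below are illustrative only.

```python
# Toy demonstration (assumed values) that a few-hundred-millisecond latency
# shift changes the BOLD time course sampled every 2 seconds, even though the
# sampling itself is far coarser than the shift.
import numpy as np
from scipy.stats import gamma

def model_hrf(t):
    return gamma.pdf(t, 6.0) - gamma.pdf(t, 16.0) / 6.0

tr = 2.0                                   # repetition time in seconds
sample_times = np.arange(0.0, 20.0, tr)
on_time = model_hrf(sample_times)
delayed = model_hrf(sample_times - 0.3)    # same response, delayed by 300 ms
print(np.round(on_time - delayed, 4))      # small but systematic differences
```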

3. Correlating Structure and Function

Because functional neuroimaging relies on task-dependent activations under highly constrained conditions, (19) correlating structure and function is somewhat analogous to correlating genes and function, and it is fraught with challenges wherever variability can affect BOLD signal changes. (20) On the most basic level, intrinsic properties of the MR scanner increase variability in the recorded signal from the brain. Technical issues stem from differences between individual MRI sites and from scanner drifts, resulting in artifacts and errors within an individual site. Manufacturer upgrades, however welcome, may also introduce changes in image quality and other features of a study, and they too require ongoing consideration.

Variables outside the MRI device also reduce the accuracy of these readings. Subjects swallow, their hearts pump blood through large vessels, and they shift their limbs to stay comfortable. All of this produces motion artifacts and physiological noise, introducing intra-subject variability into the data. Keeping subjects motionless in the scanner is an especially great challenge when subjects from vulnerable populations are needed: patients suffering from executive function problems or severe memory impairments may find the long period of immobilization taxing.

Inter-subject variability is also a consideration, especially when understanding single-subject data is the goal. Aguirre and colleagues showed that the shape of the hemodynamic response is highly variable across subjects. (21) Thus, if two subjects perform the same task, their levels of BOLD signal may differ, and consequently their activation maps may differ as well. It is also possible that two subjects will show different patterns of activation even though their behavioral performances are comparable: although they perform the same behavioral task, they might employ different strategies, thereby recruiting different neural networks and producing different patterns of activation. In interpreting fMRI activation maps, one must remember that changes in the BOLD signal are indirect inferences of neuronal activity, so not all areas of significant activation are necessarily task-specific. A false positive activation may lead to the erroneous conclusion that a brain area is associated with a function when in fact it is not. Therefore, fMRI results do not definitively demonstrate that a brain region is involved in a task, only that it is activated during the task. (22)

4. Designing an fMRI Experiment and Analyzing its Results

Part of the art of fMRI imaging is designing an experimental task that is simple and specific, so that behavioral responses can be attributed to an isolated mental process and not confounded by other functions (a concept known as functional decomposition). For this reason, fMRI relies on a subtraction technique: differences in BOLD signal between a control task and an experimental task support conclusions about the neuronal activity underlying the areas of increased activation. It is therefore essential to design the control and experimental tasks so that the variable of interest is the only variable that differs between the two. The choice of control condition is crucial in its own right, as it can affect the downstream interpretation of the images.
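
A minimal sketch of the subtraction logic, with simulated signal values rather than real data, is shown below: the mean BOLD signal during a hypothetical control task is subtracted from the signal during an experimental task, and the difference is tested for significance.

```python
# Simulated numbers only: the subtraction technique described in the text,
# reduced to a paired comparison of experimental versus control signal for
# one voxel across a small group of subjects.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects = 12
control = rng.normal(100.0, 2.0, n_subjects)                 # control-task signal
experimental = control + rng.normal(0.8, 1.0, n_subjects)    # small added task effect

t_stat, p_value = stats.ttest_rel(experimental, control)
print(f"Mean signal difference: {np.mean(experimental - control):.2f}")
print(f"Paired t = {t_stat:.2f}, p = {p_value:.4f}")
```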

Among the most challenging issues in neuroimaging is the selection of the statistical treatment of the data. Several stages of processing are performed on the fMRI data. First, pre-processing prepares the data for statistical analysis. Second, a general linear model regression determines, for each subject, the degree to which changes in signal intensity for each voxel can be predicted by a reference waveform representing the timing of the behavioral task. The third and final stage is population inference: the regression results for each subject are entered into a random effects model to make inferences about the population, or about differences between populations. With this method, activation differences between experimental conditions or groups can be assessed over all brain regions, and results can be reported in a standardized coordinate system. Commercial and freely available software packages for data analysis are widely used, but they differ in their specific implementations of the statistics.
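
The second and third stages can be sketched as follows, using simulated data and a simple on/off block design; the design, noise levels, and effect sizes are assumptions for illustration, not a reproduction of any particular software package's pipeline.

```python
# A minimal sketch, on simulated data, of a per-subject general linear model
# (GLM) fit for a single voxel followed by a random-effects group test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_scans, n_subjects = 100, 12

# Reference waveform: alternating 20-scan rest/task blocks (assumed design).
block = np.r_[np.zeros(20), np.ones(20)]
task = np.tile(block, n_scans // block.size + 1)[:n_scans]
design = np.column_stack([task, np.ones(n_scans)])   # task regressor + intercept

betas = []
for _ in range(n_subjects):
    voxel = 0.5 * task + rng.normal(0.0, 1.0, n_scans)   # simulated voxel time series
    coefs, *_ = np.linalg.lstsq(design, voxel, rcond=None)
    betas.append(coefs[0])                               # per-subject task effect

# Random-effects inference: is the mean task effect nonzero in the population?
t_stat, p_value = stats.ttest_1samp(betas, 0.0)
print(f"Group-level t = {t_stat:.2f}, p = {p_value:.4f}")
```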

When the results from the regression and the random effects model are combined, the result is a statistical parameter map of brain activity. This activation map is typically color-coded according to the probability value for each voxel. The interpretability of fMRI activation maps then depends on how the data are displayed. The color-coded statistical maps are usually overlaid onto high-resolution anatomical MR images to highlight the brain anatomy. There are several ways to display these composite images. The most rigorous is to overlay the functional data onto single anatomical slices in any imaging plane. Alternatively, the activation maps can be presented on a brain rendered in three dimensions. While this technique gives good visualization of the prominent external brain structures, internal regions like the hippocampus or basal ganglia are not well characterized on these models. Researchers often use both of these techniques to examine data, but ultimately choose the one for presentation that best highlights the main results of the study.

5. Ethical Considerations

Ethical considerations in imaging the function of the brain can be examined in the context of two themes: the conditions of the test that enable acquisition of the data, and the conditions governing the use of the data themselves.

a. Test Conditions

For any of the methods described above, requirements for the protection of human subjects and for the disclosure of risks and benefits, if any, must be followed. Of paramount importance is safety. Contraindications to EEG include, for example, dermatologic allergies to the electrolytic glue used to ensure good conductance of the scalp electrodes. For MRI, subject claustrophobia, metal implants, and metal objects in the environment that can rapidly become projectiles in the presence of the strong magnetic field are foremost considerations. For all modalities, the accidental discovery of an anomaly that might have clinical significance is an important consideration, and the procedures for follow-up, if any, must be handled in a forthright manner when obtaining consent. (23) Long-term, even life-long, negative reactions to repeated scanning or stimuli--particularly unpleasant or frightening ones--are possible but extremely rare.

b. External Conditions

Ethical considerations about how the data will be used outside the laboratory setting bring us back to the original questions that scholars asked about tu.

First, conceptualization: Given the substantial complexity of designing any imaging experiment, what acquisition and statistical protocols were applied? This question bears on what Illes and Racine have called design or paradigmatic bias. (24) Are the stimuli age-, gender-, and culturally appropriate? Will they generate results that are generalizable to populations not tested? Are there regions of the brain with significant activity that go unnoticed because they fall outside the brain structures or statistical ranges chosen for analysis?

Second, biases of interpretation: What biases might interpreters bring to the maps of data? Investigator bias is inevitable given the nature of the imaging experiment and the necessity for human interpretation of images. Unlike results from a topographic map, for example, the meaning of an activation shown on an image is far from black and white.

Third, how will the map data be used? This third question raises perhaps the toughest ethical challenge of all: privacy, profiling, and predicting future behavior.

III. LIE DETECTION AND fMRI

Despite the long and dubious history of polygraphy and other methods for extracting information from humans, pursuing a method for detecting lying and deception has been an enduring focus of scientific endeavor in the neurobehavioral sciences. (25) Whether this interest is driven by human curiosity, by an urge to find the telltale signature of falsehoods, or by the perceived practical usefulness of such studies to society is unclear. (26) It is certainly not driven by ease of use, given the many layers of difficulty in capturing data with good real-world or "ecological" validity. (27) It is clear, however, that as new neurotechnologies emerge, their application to this domain is likely to be rapid.

Such improved methods of lie detection are currently the subject of a great deal of interest and research. Several different methods with a more or less scientific basis are being used or developed to detect deception. This section will scan the landscape of lie detection before focusing on efforts to use fMRI for this end. It will then review and critique the scientific literature on fMRI for lie detection.

A. NON-fMRI LIE DETECTION

Human efforts at lie detection must date to near the origin of our species; and, as some evidence points to the existence of intentional deception by other animals, (28) lie detection may well predate us. As soon as humans began to lie, other humans would need to assess whether they were being told the truth. All of us, often without consciously thinking about it, frequently assess the credibility of information, looking, among other things, for evidence that we are being lied to. We look and listen for signs of deception or nervousness. In some cases, we seek truth by coercion, either by legal compulsion ("I swear to tell the truth, the whole truth, and nothing but the truth") or, in some times and places, by physical compulsion, including torture.

Empirical efforts to measure phenomena that could be associated with lying date back at least ninety years. (29) Although disputes exist about who deserves priority for the modern polygraph machine, many trace the concept to William Moulton Marston's early research on the link between deception and blood pressure. (30) Marston, who received his bachelor's degree, law degree, and Ph.D. in psychology from Harvard, began working on systolic blood pressure as a marker of deception in 1915, while still a psychology graduate student. (31) From then until his death in 1947, Marston continued to improve his lie detection devices and to promote their widespread use. The Frye case, (32) which for many years set the standard for the admissibility of scientific evidence in federal court (and continues to play that role in some state courts), revolved around whether Marston's expert testimony about a polygraph examination that he claimed cleared a murder defendant should have been admitted in evidence. (33) The court held that the testimony was properly excluded because the technology lacked general acceptance in the scientific community, and it affirmed the conviction. (34)

Polygraphs measure several physiological features associated with nervousness or stress, such as systolic blood pressure (the first and more rapidly variable number in the familiar blood pressure measurement of, for example, 125/75), heart rate, breathing rate, and skin sweatiness (measured as the electrical conductivity of the skin, known as the galvanic skin response). (35) The polygraph has been widely used in the United States for various purposes; a National Research Council (NRC) committee estimated that several hundred thousand polygraph examinations are conducted each year. American courts, however, have generally not considered it sufficiently reliable for its results to be admitted into evidence. (36) In the wake of the Wen Ho Lee case at Los Alamos National Laboratory, in which polygraph examinations played a role, the NRC was asked to report on the value of polygraph evidence. The NRC committee produced a careful report, concluding that the polygraph was not sufficiently valid to be used regularly in national security screening:

 
   Polygraph testing yields an unacceptable choice for DOE [Department of Energy] employee security screening between too many loyal employees falsely judged deceptive and too many major security threats left undetected. Its accuracy in distinguishing actual or potential security violators from innocent test takers is insufficient to justify reliance on its use in employee security screening in federal agencies. (37)

The federal government has ignored this recommendation and continues to use the polygraph widely in employee security screening.

Polygraphy is the best-established method of lie detection, but other methods exist or are under development. Some, such as voice stress detectors, have little or no scientific support, (38) although they appear to be widely sold and used. Other methods under investigation may be more promising, although it is too early to know how reliable any of them will prove. Five deserve attention: fMRI, EEG, near-infrared spectroscopy (NIRS), facial microexpressions, and periorbital thermography. This article focuses on fMRI because it is the most explored and, apparently, the most advanced, but the other four methods will be briefly described.

a. Electroencephalography

As discussed earlier, EEGs measure electric currents generated by the brain. One particular kind of EEG measurement claimed to be useful in detecting lies is the "P300"--a wave of electrical signal, measured at the scalp, that occurs approximately 300 milliseconds after a subject receives a stimulus. The timing and shape of this waveform do carry genuine information, but the credibility of its usefulness for lie detection is undercut by the hype given it by its leading proponent, Lawrence Farwell. (39)
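
For readers unfamiliar with event-related potentials, the sketch below shows, with entirely synthetic numbers, how a stimulus-locked wave such as the P300 is typically extracted: many short EEG segments time-locked to the stimulus are averaged so that the consistent wave emerges from background noise. Nothing here is taken from Farwell's methods.

```python
# Synthetic illustration of event-related potential (ERP) averaging: a wave
# peaking ~300 ms after the stimulus is buried in noisy single trials and
# recovered by averaging across trials.
import numpy as np

rng = np.random.default_rng(2)
fs = 250                                    # sampling rate in Hz (assumed)
t = np.arange(0.0, 0.8, 1.0 / fs)           # 800 ms epoch after each stimulus
p300_wave = 5.0 * np.exp(-((t - 0.3) ** 2) / (2 * 0.05 ** 2))   # peak at 300 ms

trials = p300_wave + rng.normal(0.0, 10.0, (100, t.size))  # 100 noisy single trials
erp = trials.mean(axis=0)                                  # averaging reveals the wave
print(f"Averaged waveform peaks at about {t[erp.argmax()] * 1000:.0f} ms")
```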

Farwell is an electrophysiologist who has argued for over fifteen years that the human P300 wave can be used as a "guilty knowledge" test--to determine, for example, whether a suspect has ever seen the site of a crime. (40) Farwell refers to this process as "brain fingerprinting" and has been selling it for several years through Brain Fingerprinting Laboratories, a privately held company. (41) The company's website claims that in more than 175 tests, the method has produced inconclusive results six times and has been accurate every other time. (42) Farwell's work, however, has not been substantially vetted in the peer-reviewed literature. (43) Apparently, the only article he has published on his technology in a peer-reviewed journal is a 2001 online article in the Journal of Forensic Sciences, in which he and a co-author reported a successful trial of his method with six subjects. (44) He has not released any further evidence to support his claims of high accuracy, protecting the underlying data as a trade secret. He is an inventor on four patents relevant to this work. (45)

Farwell's claims are widely discounted in the relevant scientific community, and his credibility is not helped by his inflated claims of judicial acceptance of his technique. The company's website states that "Iowa Supreme Court overturns the 24 year old conviction of Terry Harrington, Brain Fingerprinting test aids in the appeals." (46) In fact, the Iowa district court (not a federal district court, as the website claims), in an unpublished opinion, rejected Harrington's petition for post-conviction relief on several grounds in spite of that testimony. It did admit the brain fingerprinting evidence, though the lower court judge may have admitted the testimony merely to deprive Harrington of one ground for appeal. Harrington appealed, and the Iowa Supreme Court reversed for reasons unrelated to the brain fingerprinting test. (47) As to the brain fingerprinting evidence, the Iowa Supreme Court specifically said: "Because the scientific testing evidence is not necessary to a resolution of this appeal, we give it no further consideration." (48) The company's website reports, accurately but with a misleading implication, that "The Iowa Supreme Court left undisturbed the law of the case establishing the admissibility of the Brain Fingerprinting evidence." (49)

b. Near-infrared Spectroscopy ("NIRS")

NIRS provides a way to measure changes in blood flow--the same goal as fMRI--in some parts of the brain without the complex apparatus of an MRI machine. The basis of the technology is the measurement of how near-infrared light is scattered or absorbed by various materials. Its application in neuroscience stems from its ability to measure blood flow changes in parts of the brain (when used for brain studies, the technique is sometimes called Optical Topography). …
