MEDICAL IMAGE ANALYSIS METHODS
THE ELECTRICAL ENGINEERING AND APPLIED SIGNAL PROCESSING SERIES
Edited by Alexander Poularikas

The Advanced Signal Processing Handbook: Theory and Implementation for Radar, Sonar, and Medical Imaging Real-Time Systems, Stergios Stergiopoulos
The Transform and Data Compression Handbook, K.R. Rao and P.C. Yip
Handbook of Multisensor Data Fusion, David Hall and James Llinas
Handbook of Neural Network Signal Processing, Yu Hen Hu and Jenq-Neng Hwang
Handbook of Antennas in Wireless Communications, Lal Chand Godara
Noise Reduction in Speech Applications, Gillian M. Davis
Signal Processing Noise, Vyacheslav P. Tuzlukov
Digital Signal Processing with Examples in MATLAB®, Samuel Stearns
Applications in Time-Frequency Signal Processing, Antonia Papandreou-Suppappola
The Digital Color Imaging Handbook, Gaurav Sharma
Pattern Recognition in Speech and Language Processing, Wu Chou and Biing-Hwang Juang
Propagation Handbook for Wireless Communication System Design, Robert K. Crane
Nonlinear Signal and Image Processing: Theory, Methods, and Applications, Kenneth E. Barner and Gonzalo R. Arce
Smart Antennas, Lal Chand Godara
Mobile Internet: Enabling Technologies and Services, Apostolis K. Salkintzis and Alexander Poularikas
Soft Computing with MATLAB®, Ali Zilouchian
Wireless Internet: Technologies and Applications, Apostolis K. Salkintzis and Alexander Poularikas
Signal and Image Processing in Navigational Systems, Vyacheslav P. Tuzlukov
Medical Image Analysis Methods, Lena Costaridou
MEDICAL IMAGE ANALYSIS METHODS
Edited by Lena Costaridou
Boca Raton London New York Singapore
A CRC title, part of the Taylor & Francis imprint, a member of the Taylor & Francis Group, the academic division of T&F Informa plc. Published in 2005 by CRC Press, Taylor & Francis Group, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487–2742. © 2005 by Taylor & Francis Group, LLC. CRC Press is an imprint of Taylor & Francis Group. This edition published in the Taylor & Francis e-Library, 2005. “To purchase your own copy of this or any of Taylor & Francis or Routledge’s collection of thousands of eBooks please go to http://www.ebookstore.tandf.co.uk/.”
No claim to original U.S. Government works
10 9 8 7 6 5 4 3 2 1
ISBN 0-203-50045-8 (Master e-book ISBN)
ISBN 0-203-61563-8 (Adobe eReader Format)
International Standard Book Number-10: 0-8493-2089-5 (Print Edition) (Hardcover) International Standard Book Number-13: 978-0-8493-2089-7 (Print Edition) (Hardcover) This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use. No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Library of Congress Cataloging-in-Publication Data Catalog record is available from the Library of Congress
Taylor & Francis Group is the Academic Division of T&F Informa plc. Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com
Preface

A multitude of medical-imaging modalities are used to probe the human body. The richness of information provided by these techniques, combined with the availability of computational resources, has provided the basis for the development of precise and quantitative image-processing and -analysis methods that aim to provide valuable tools for diagnostic medical-image interpretation. Such diagnostic tools can be differentiated into two categories: image-processing methods that enhance visual interpretation of digital images, and image-analysis methods that provide automated quantitative tissue detection, delineation, measurement, and characterization. This book is intended as a reference tool for medical physicists, biomedical engineers, computer scientists, electrical engineers, and radiologists involved in health-care delivery and research. It consists of 12 chapters. Chapters 1 to 5 present algorithms, or aspects of algorithms, that analyze images generated by a certain modality to provide detection or diagnostic decisions, termed computer-aided diagnosis (CAD). CAD represents one of the most successful paradigms of medical-image analysis, incorporating most of the significant developments that have occurred in enhancement and segmentation of candidate features, in feature extraction and classification, and in reduction or characterization of false positives. Chapter 6 discusses a wavelet method for image enhancement. Chapters 7 and 8 focus on segmentation methods. These methods—aimed at partitioning images into meaningful segments with respect to a certain task of identifying tissue structure, function, or pathology—are initial steps of automated methods. They have also become essential in imaging modalities that provide volumetric data.
Analysis involving multiple images, such as volumetric or serial imaging, requires derivation of spatial transformations to provide correspondence between homologous image points, with emphasis on data-driven optimized methods. A registration method is presented in Chapter 9. Paradigms of the analysis methods used in bioinformatics and the neurosciences are provided in Chapters 10 and 11, respectively. Chapter 12 reviews the methodologies used to evaluate medical-image processing and analysis methods, an issue of critical importance for their optimization, selection, and clinical acceptance. I wish to thank Dr. Alexander Poularikas, professor of electrical and computer engineering, University of Alabama in Huntsville, Alabama, for offering me the opportunity to edit this book, and Michael Slaughter, CRC Press editor, for his guidance and patience, as well as all members of our project editing team at CRC Press. I also wish to thank Dr. Athanassios Skodras, professor of digital systems, Hellenic Open University, Greece, and Dr. Metin Akay, associate professor of engineering, Dartmouth College, Hanover, New Hampshire, for supporting my efforts at early and late phases of this project.
My deepest appreciation is extended to the chapter authors for contributing their expertise, as well as for their enthusiasm, patience, and cooperation during initial manuscript preparation over the Internet. Special acknowledgment is due to my home Department of Medical Physics, School of Medicine, University of Patras, Greece, and especially to the members of the medical-imaging team, headed by Professor George Panayiotakis, who has encouraged me at all phases of this work. Among colleagues, Dr. Spyros Skiadopoulos has offered constant support for the extensive communication and information exchange required. Finally, thanks to my mother Melpomeni and my daughter Melenia for their loving support.

Lena Costaridou
Patras, Greece, 2005
The Editor

Lena Costaridou received a diploma in physics from the Department of Physics of the University of Patras, Greece; an M.Sc. degree in medical engineering from the Department of Electrical Engineering and Applied Sciences of the George Washington University, Washington, DC; and a Ph.D. degree in medical physics from the University of Patras, Greece. She is an assistant professor in the Department of Medical Physics, School of Medicine, University of Patras. Her research interests include medical-image processing and analysis, especially mammographic image analysis, and the evaluation of medical-imaging systems and techniques. She is the author or coauthor of 30 articles in international peer-reviewed journals and more than 60 international conference papers.
Contributors

Carlos Alberola-López, Laboratorio de Procesado de Imagen (LPI), ETSI Telecomunicación, Universidad de Valladolid, Spain
Laura Astolfi, Dipartimento di Informatica e Sistemistica, University of “La Sapienza,” Rome, Italy; Fondazione Santa Lucia IRCCS, Rome, Italy
Fabio Babiloni, Fondazione Santa Lucia IRCCS, Rome, Italy; Dipartimento di Fisiologia Umana e Farmacologia, University of “La Sapienza,” Rome, Italy
Heang-Ping Chan, Department of Radiology, University of Michigan, Ann Arbor, MI, U.S.
Christodoulos I. Christodoulou, Department of Computer Science, University of Cyprus, Nicosia, Cyprus; Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
Febo Cincotti, Fondazione Santa Lucia IRCCS, Rome, Italy
Jan Cornelis, Vrije Universiteit Brussel, Faculty of Applied Sciences, Department of Electronics and Information Processing, Brussels, Belgium
Luciano da Fontoura Costa, Cybernetic and Vision Research Group, Institute of Physics of São Carlos, University of São Paulo, São Paulo, Brazil
Lena Costaridou, Department of Medical Physics, School of Medicine, University of Patras, Patras, Greece
Dimitrios I. Fotiadis, Department of Computer Science, University of Ioannina, Unit of Medical Technology and Intelligent Information Systems; Biomedical Research Institute FORTH, Ioannina, Greece
Lubomir Hadjiiski, Department of Radiology, University of Michigan, Ann Arbor, MI, U.S.
Maria Kallergi, Department of Radiology, H. Lee Moffitt Cancer Center & Research Institute, University of South Florida, Tampa, FL, U.S.
Antonis Katartzis, Vrije Universiteit Brussel, Faculty of Applied Sciences, Department of Electronics and Information Processing, Brussels, Belgium
Efthyvoulos Kyriacou, Department of Computer Science, University of Cyprus, Nicosia, Cyprus; Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
Sarah Lee, Communications and Signal Processing Group, Department of Electrical and Electronic Engineering, Imperial College London, London, U.K.
Donatella Mattia, Fondazione Santa Lucia IRCCS, Rome, Italy
Slawomir J. Nasuto, Department of Cybernetics, University of Reading, Reading, U.K.
Andrew Nicolaides, Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
George Panayiotakis, Department of Medical Physics, School of Medicine, University of Patras, Patras, Greece
Marios Pantziaris, Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
Athanassios N. Papadopoulos, Department of Medical Physics, Medical School, University of Ioannina, Unit of Medical Technology and Intelligent Information Systems; Biomedical Research Institute FORTH, Ioannina, Greece
Sophie Paquerault, Center for Devices and Radiological Health, U.S. Food and Drug Administration, Rockville, MD, U.S.
Constantinos S. Pattichis, Department of Computer Science, University of Cyprus, Nicosia, Cyprus
Marios S. Pattichis, Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM, U.S.
Nicholas Petrick, Center for Devices and Radiological Health, U.S. Food and Drug Administration, Rockville, MD, U.S.
Marina E. Plissiti, Department of Computer Science, University of Ioannina, Unit of Medical Technology and Intelligent Information Systems; Biomedical Research Institute FORTH, Ioannina, Greece
Ioannis Pratikakis, Computational Intelligence Laboratory, Institute of Informatics and Telecommunications, National Center for Scientific Research “Demokritos,” Athens, Greece
Virginie F. Ruiz, Department of Cybernetics, University of Reading, Reading, U.K.
Juan Ruiz-Alzola, Medical Technology Center, University of Las Palmas de Gran Canaria, Spain
Berkman Sahiner, Department of Radiology, University of Michigan, Ann Arbor, MI, U.S.
Hichem Sahli, Vrije Universiteit Brussel, Faculty of Applied Sciences, Department of Electronics and Information Processing, Brussels, Belgium
Philipos Sakellaropoulos, Department of Medical Physics, School of Medicine, University of Patras, Patras, Greece
Serenella Salinari, Dipartimento di Informatica e Sistemistica, University of “La Sapienza,” Rome, Italy
Spyros Skiadopoulos, Department of Medical Physics, School of Medicine, University of Patras, Patras, Greece
Eduardo Suarez-Santana, Department of Signals and Communications, University of Las Palmas, Gran Canaria, Spain
Tania Stathaki, Communications and Signal Processing Group, Department of Electrical and Electronic Engineering, Imperial College London, London, U.K.
Carl-Fredrik Westin, Department of Radiology, Harvard Medical School and Brigham & Women’s Hospital, Boston, MA, U.S.
Contents

Chapter 1  Computer-Aided Diagnosis of Breast Cancer  1
  Heang-Ping Chan, Berkman Sahiner, Nicholas Petrick, Lubomir Hadjiiski, and Sophie Paquerault

Chapter 2  Medical-Image Processing and Analysis for CAD Systems  52
  Athanassios N. Papadopoulos, Marina E. Plissiti, and Dimitrios I. Fotiadis

Chapter 3  Texture and Morphological Analysis of Ultrasound Images of the Carotid Plaque for the Assessment of Stroke  87
  Christodoulos I. Christodoulou, Constantinos S. Pattichis, Efthyvoulos Kyriacou, Marios S. Pattichis, Marios Pantziaris, and Andrew Nicolaides

Chapter 4  Biomedical-Image Classification Methods and Techniques  137
  Virginie F. Ruiz and Slawomir J. Nasuto

Chapter 5  Texture Characterization Using Autoregressive Models with Application to Medical Imaging  186
  Sarah Lee and Tania Stathaki

Chapter 6  Locally Adaptive Wavelet Contrast Enhancement  221
  Lena Costaridou, Philipos Sakellaropoulos, Spyros Skiadopoulos, and George Panayiotakis

Chapter 7  Three-Dimensional Multiscale Watershed Segmentation of MR Images  266
  Ioannis Pratikakis, Hichem Sahli, and Jan Cornelis

Chapter 8  A MRF-Based Approach for the Measurement of Skin Thickness in Mammography  345
  Antonis Katartzis, Hichem Sahli, Jan Cornelis, Lena Costaridou, and George Panayiotakis

Chapter 9  Landmark-Based Registration of Medical-Image Data  371
  J. Ruiz-Alzola, E. Suarez-Santana, C. Alberola-López, and Carl-Fredrik Westin

Chapter 10  Graph-Based Analysis of Amino Acid Sequences  392
  Luciano da Fontoura Costa

Chapter 11  Estimation of Human Cortical Connectivity with Multimodal Integration of fMRI and High-Resolution EEG  426
  Laura Astolfi, Febo Cincotti, Donatella Mattia, Serenella Salinari, and Fabio Babiloni

Chapter 12  Evaluation Strategies for Medical-Image Analysis and Processing Methodologies  466
  Maria Kallergi

Index  506
1 Computer-Aided Diagnosis of Breast Cancer

Heang-Ping Chan, Berkman Sahiner, Nicholas Petrick, Lubomir Hadjiiski, and Sophie Paquerault

1.1 INTRODUCTION

Mammography is currently the only proven and cost-effective method to detect early breast cancer. A mammographic examination generally contains four images, two views of each breast: a craniocaudal (CC) view and a mediolateral oblique (MLO) view. These two views are designed to include most of the breast tissue within the X-ray images. Mammographic interpretation can be considered a two-step process. A radiologist first screens the mammograms for abnormalities. If a suspicious abnormality is detected, further diagnostic workup is then performed to estimate the likelihood that the abnormality is malignant. Diagnostic workup might include mammograms of additional views such as lateromedial (LM) or exaggerated craniocaudal (XCC) views, magnification views, and spot views, as well as ultrasound scanning of the suspicious area. The main mammographic signs of breast cancer are clustered microcalcifications and masses. Microcalcifications are calcium deposits in the breast tissue, manifested as clusters of white specks of sizes from about 0.05 mm to 0.5 mm in diameter. Masses have X-ray absorption similar to that of fibroglandular tissue and are manifested as focal low-optical-density regions on mammograms. Some benign breast diseases also cause the formation of clustered microcalcifications and masses in the breast. The mammographic features of malignant microcalcifications or masses are nonspecific and overlap substantially with those of benign diseases. Because of the nonspecific features of malignant lesions, mammographic interpretation is a very challenging task for radiologists. Studies indicate that the sensitivity of breast cancer detection on mammograms is only about 70 to 90% [1–6].
In a study that retrospectively reviewed prior mammograms taken of breast cancer patients before the exam in which the cancer was detected, it was found that 67% (286/427) of the cancers were visible on the prior mammograms and about 26% (112/427) were considered actionable by radiologists [7]. Missed cancers can be caused by detection errors or characterization errors. Detection errors can be attributed to factors such as oversight or camouflaging of the lesions by overlapping tissues. Even if a lesion is detected, the radiologist may underestimate the likelihood of malignancy of the lesion so that no action is taken. This corresponds to a characterization error. On the other hand, the radiologist may overestimate the likelihood of malignancy and recommend benign lesions for biopsy. It has been reported that of the
lesions that radiologists recommended for biopsy, only about 15 to 30% are actually malignant [8]. The large number of benign biopsies not only causes patient anxiety, but also increases health-care costs. In addition, the scar tissue resulting from biopsy often makes it more difficult to interpret the patient’s mammograms in the future. The sensitivity and specificity of mammography for detecting a lesion and differentiating the lesion as malignant or benign will need to be improved. It can be expected that early diagnosis and treatment will further improve the chance of survival for breast cancer patients [9–12]. Various methods are being developed to improve the sensitivity and specificity of breast cancer detection [13]. Double reading can reduce the miss rate of radiographic reading [14, 15]. However, double reading by radiologists is costly. Computer-aided detection (CAD) is considered to be one of the promising approaches that may improve the efficacy of mammography [16, 17]. Computer-aided lesion detection can be used during screening to reduce oversight of suspicious lesions that warrant further diagnostic workup. Computer-aided lesion characterization can also be used during workup to provide additional information for making biopsy recommendations. It has been shown that CAD can improve radiologists’ detection accuracy significantly [18–23]. Receiver operating characteristic (ROC) studies [24, 25] showed that computer-aided characterization of lesions can improve radiologists’ ability to differentiate malignant and benign masses or microcalcifications. CAD is thus a viable, cost-effective alternative to double reading by radiologists. The promise of CAD has stimulated research efforts in this area. Many computer vision techniques have been developed in various areas of CAD for mammography.
Examples of work include: detection of microcalcifications [18, 26–38], characterization of microcalcifications [39–49], detection of masses [19, 40, 50–73], and characterization of masses [24, 74–78]. Computerized classification of mammographic lesions using radiologist-extracted features has also been reported by a number of investigators [79–84]. There are similarities and differences among the computer vision techniques used by researchers. However, it is difficult to compare the performance of different detection programs because the performance strongly depends on the data set used for testing. These studies generally indicate that an effective CAD system can be developed using properly designed computer vision techniques. Efforts to evaluate the usefulness of CAD in reducing missed cancers are ongoing. Results of a prospective study by Nishikawa et al. [85] indicated that their CAD algorithms could detect 54% (9/16) of breast cancers in the prior year, with four false positives (FPs) per image, when the mammograms had been called negative but the cancer was visible in retrospect. In our recent study of detection on independent prior films [86], we found that 74% (20/27) of the malignant masses and 57% (4/7) of the malignant microcalcifications were detected, with 2.2 mass marks/image and 0.8 cluster marks/image, by our computer programs. A commercial system also reported a sensitivity of 77% (88/115) in one study [7] and 61% (14/23) in another study [87] for detection of the cancers in the prior years that were considered actionable in retrospect by expert mammographers. A prospective study of 12,860 patients in a community breast cancer center, using a commercial CAD system that produced about one mark per image, reported a cancer detection rate of 81.6% (40/49), with eight of the cancers initially detected by computer only. This corresponded to a 20% increase in the number of cancers detected
(41 vs. 49) when radiologists used CAD. A similar gain in cancer detection has been observed in a premarket retrospective study of another commercial system [23]. These results demonstrate that, even if a CAD system does not detect all cancers present and has some FPs, it can still reduce the missed-cancer rate when used as a second opinion by radiologists. This is consistent with the first laboratory ROC study in CAD, reported by us in 1990 [18], which demonstrated that a CAD program with a sensitivity of 87% and an FP rate of 0.5 to 4 per image could significantly improve radiologists’ accuracy in detecting subtle microcalcifications. In a recent prospective pilot clinical trial [88] of a CAD system developed by our group, a total of 11 cancers were detected in a screening cohort of about 2600 patients. The radiologists detected 10 of the 11 cancers without our CAD system. The CAD system also detected 10 of the 11 cancers. However, one of the computer-detected cancers was different from those detected by the radiologists, and this additional cancer was diagnosed when the radiologist was alerted to the site by the CAD system. In a 1-year follow-up of the cases, it was found that five more cancers were diagnosed in the patient cohort. Our computer system marked two of the five cancers, although all five cancers were deemed not actionable in the year of the pilot study when the mammograms were reviewed retrospectively by an experienced radiologist. For classification of malignant and benign masses, our ROC study [24] indicated that a classifier with an area under the ROC curve, Az, of 0.92 could significantly improve radiologists’ classification accuracy, with a predicted increase in the positive predictive value of biopsy. Jiang et al. [25] also found in an ROC study that their classifier, with an Az of 0.80, could significantly improve radiologists’ characterization of malignant and benign microcalcifications, with a predicted reduction in biopsies.
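The Az values quoted above are areas under ROC curves fitted to observer ratings (typically with a binormal model). As a rough, nonparametric illustration of what such an area measures, the empirical area under an ROC curve can be computed directly from a set of classifier scores as a Mann-Whitney statistic. The function below is a generic sketch for illustration, not the analysis software used in the cited studies.

```python
def empirical_auc(malignant_scores, benign_scores):
    """Empirical area under the ROC curve (Mann-Whitney statistic):
    the fraction of malignant/benign pairs in which the malignant case
    receives the higher score, with ties counted as one-half."""
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_scores) * len(benign_scores))
```

Read this way, an Az of 0.92 means that a randomly chosen malignant case outscores a randomly chosen benign case about 92% of the time; a perfectly separating classifier yields 1.0 and chance performance about 0.5.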
Recently, Hadjiiski et al. [89, 90] performed an ROC study to evaluate the effects of a classifier based on interval-change analysis on radiologists’ classification accuracy for masses in serial mammograms. They found that when the radiologists took into account the rating of the computer classifier, they reduced biopsy recommendations for the benign masses in the data set while slightly increasing biopsy recommendations for the malignant masses. This result indicated that CAD improved radiologists’ accuracy in classifying malignant and benign masses based on serial mammograms and has the potential of reducing unnecessary biopsy. In the last few years, full-field digital mammography (FFDM) technology has advanced rapidly because of the potential of digital imaging to improve breast cancer detection. Four manufacturers have obtained clearance from the Food and Drug Administration (FDA) for clinical use. It is expected that digital mammography detectors will provide higher signal-to-noise ratio (SNR) and detective quantum efficiency (DQE), wider dynamic range, and higher contrast sensitivity than digitized film mammograms. Because of the higher SNR and linear response of digital detectors, there is a strong potential that more effective feature-extraction techniques can be designed to optimally extract signal content from the direct digital images and improve the accuracy of CAD. The potential of improving CAD accuracy by exploiting the imaging properties of digital mammography is a subject of ongoing research. In mammographic screening, it has been reported that taking two views of each breast, a CC and an MLO view, provides higher sensitivity and specificity than one view for breast cancer detection [2, 91–93]. Radiologists use the two views to confirm true
positives (TPs) and to reduce FPs. Current CAD algorithms detect lesions only on a single mammographic view. New CAD algorithms that utilize the correlation of computer-detected lesions between the two views are being developed [69, 94–99]. Our studies demonstrated that the correlated lesion information from two views could be used to reduce FPs and improve detection [100, 101]. Although the development is still at an early stage, and continued effort is needed to further improve the two-view correlation techniques, this promising development will be summarized here in the hope that it will stimulate research interest. Another important technique that radiologists use in mammographic interpretation is to compare the current and prior mammograms and to evaluate the interval changes. Interval-change analysis can be used to detect newly developed abnormalities or to evaluate the growth of existing lesions. Hadjiiski et al. [97, 98] developed a regional-registration technique to automatically identify the location of a corresponding lesion on the same view of a prior mammogram. Feature-extraction and classification techniques could then be developed to differentiate malignant and benign lesions using interval-change information. Interval-change features were found to be useful in improving the classification accuracy. In this chapter, we will concentrate on lesion detection rather than characterization. Computer vision methods for classification of malignant and benign lesions, including interval-change analysis, can be found in the literature [89, 90, 97, 98].

1.2 COMPUTERIZED DETECTION OF MICROCALCIFICATIONS

Clustered microcalcifications are seen on mammograms in 30 to 50% of breast cancers [102–106]. Because of the small sizes of microcalcifications and the relatively noisy mammographic background, subtle microcalcifications can be missed by radiologists. Computerized methods for detection of microcalcifications have been developed by a number of investigators.
Chan et al. [18, 26, 27] designed a difference-image technique to detect microcalcifications on digitized mammograms and extracted features to distinguish true and false microcalcifications. A convolution neural network was developed to further recognize true and false patterns [28]. Wu et al. [107] used the difference-image technique [26] for prescreening of microcalcification sites, and then classified their power-spectra features with an artificial neural network to differentiate true and false microcalcifications. Zhang et al. [36] further modified the detection system by using a shift-invariant neural network to reduce false-positive microcalcifications. Fam et al. [108] and Davies et al. [29] detected microcalcifications using conventional image-processing techniques. Qian et al. [30] developed a tree-structure filter and wavelet transform for enhancement of microcalcifications. Other investigators trained classifiers to distinguish microcalcifications from false detections based on morphological features such as contrast, size, shape, and edge gradient [31–35, 109–112]. Zheng et al. [37] used a difference-of-Gaussian band-pass filter to enhance the microcalcifications and then used multilayer feature analysis to identify true and false microcalcifications. Although the details of the various microcalcification-detection algorithms differ, many have similar major steps.
In the first step, the image is processed to enhance the signal-to-noise ratio (SNR) of the microcalcifications. Second, microcalcification candidates are segmented from the image background. In the third step, features of the candidate signals are extracted, and a feature classifier is trained or some rule-based methods are designed to distinguish true signals from false signals. In the last step, a criterion is applied to the remaining signals to search for microcalcification clusters. The computer vision methods used in our microcalcification-detection program are discussed in the following subsections as an example.

1.2.1 METHODS

1.2.1.1 Preprocessing Technique

Microcalcifications on mammograms are surrounded by breast tissues of varied densities. The background gray levels thus vary over a wide range. A preprocessing technique that can suppress the background and enhance the signals will facilitate segmentation of the microcalcifications from the image. Chan et al. [18, 26–28, 113] first demonstrated that a difference-image technique can effectively enhance microcalcifications on digitized mammograms. In the difference-image technique, a signal-enhancement filter enhances the microcalcifications, and a signal-suppression filter suppresses the microcalcifications and smoothes the noise. By taking the difference of the two filtered images, an SNR-enhanced image is obtained in which the low-frequency structured background is removed and the high-frequency noise is suppressed. When both the signal-enhancement filter and the signal-suppression filter are linear, the difference-image technique is equivalent to band-pass filtering with a frequency band adjusted to amplify that of the microcalcifications. Nonlinear filters can also be designed for enhancement or suppression of the microcalcifications. An example of a signal-suppression filter is a median filter, the kernel size of which can be chosen to remove microcalcifications and noise from the mammograms [26].
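The difference-image idea can be illustrated in a few lines. In this sketch a small mean filter stands in for the signal-enhancement filter and a median filter for the signal-suppression filter; the kernel sizes are illustrative placeholders, not the published values.

```python
import numpy as np

def difference_image(img, enhance_size=3, suppress_size=7):
    """Difference-image SNR enhancement: subtract a signal-suppressed image
    (median filter, which removes small bright specks) from a
    signal-enhanced image (small mean filter, which preserves them)."""

    def windows(a, k):
        # all k*k shifted copies of `a`, edge-padded to keep the shape
        pad = k // 2
        p = np.pad(a, pad, mode='edge')
        return np.stack([p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
                         for dy in range(k) for dx in range(k)])

    enhanced = windows(img, enhance_size).mean(axis=0)           # keeps specks
    suppressed = np.median(windows(img, suppress_size), axis=0)  # removes specks
    return enhanced - suppressed  # structured background cancels out
```

On a smoothly varying background the two filtered images nearly cancel, while a speck only a few pixels wide survives the mean filter but not the median filter, so it stands out in the difference image.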
Other investigators used preprocessing techniques such as wavelet filtering [30] and difference-of-Gaussian filters [36] in the initial step of their microcalcification-detection programs. These techniques can be considered variations of the difference-image technique.

1.2.1.2 Microcalcification Segmentation

After the SNR enhancement, the background gray level of the mammograms is relatively constant. This facilitates the segmentation of the individual microcalcifications from the background. Our approach is to first employ a gray-level thresholding technique to locate potential signal sites above a global threshold. The global threshold is adapted to a given mammogram by an iterative procedure that automatically changes the threshold until the number of sites obtained falls within the chosen input maximum and minimum numbers. At each potential site, a locally adaptive gray-level thresholding technique in combination with region growing is then performed to extract the connected pixels above a local threshold, which is calculated as the product of the local root-mean-square (RMS) noise and an input SNR threshold. The features of the extracted signals—such as
Medical image analysis method
6
the size, maximum contrast, SNR, and location—are also extracted during segmentation.

1.2.1.3 Rule-Based False-Positive Reduction

In the false-positive reduction step, we combine rule-based classification with an artificial neural network to distinguish true microcalcifications from noise or artifacts. The rule-based classification includes three rules: maximum and minimum number of pixels in a calcification, and contrast. The two rules on the size exclude signals below a certain size, which are likely to be noise, and signals greater than a certain size, which are likely to be large benign calcifications. The contrast rule sets an upper bound to exclude potential signals that have a contrast greater than an input number of standard deviations above the average contrast of all potential signals found with local thresholding. This rule excludes the very-high-contrast signals that are likely to be image artifacts and large benign calcifications. After rule-based classification, a convolution neural network (CNN) [28] was trained to further reduce false signals, as detailed in the next subsection.

1.2.1.4 False-Positive Reduction Using Convolution Neural Network Classifier

The CNN is based on the neocognitron structure [114] designed to simulate the human visual system. It has been used for detection of lung nodules on chest radiographs, detection of microcalcifications on mammograms, and classification of mass and normal breast tissue on mammograms [28, 115, 116]. The general architecture of the CNN used in this study is shown in Figure 1.1. The input to the CNN is a region-of-interest (ROI) image, extracted for each of the potential signal sites. The nodes in the hidden layers are arranged in groups, as are the weights associated with each node; each weight group functions like a filter kernel. The CNN is trained to classify the input ROI as containing a true microcalcification (TP) or a false signal (FP).
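Returning to the rule-based step of Section 1.2.1.3, the size and contrast rules can be sketched as follows. The signal representation and the threshold values here are hypothetical, not the values used in the published program:

```python
def rule_based_reduction(signals, min_pixels=2, max_pixels=100, max_contrast_sd=3.0):
    """Apply the three rules (sketch): minimum size, maximum size, and a
    contrast upper bound. `signals` is a list of dicts with 'n_pixels'
    and 'contrast'; all threshold values are illustrative."""
    contrasts = [s['contrast'] for s in signals]
    mean_c = sum(contrasts) / len(contrasts)
    sd_c = (sum((c - mean_c) ** 2 for c in contrasts) / len(contrasts)) ** 0.5
    # contrast upper bound: mean + (input number of SDs) * SD of all signals
    upper = mean_c + max_contrast_sd * sd_c
    kept = [s for s in signals
            if min_pixels <= s['n_pixels'] <= max_pixels  # two size rules
            and s['contrast'] <= upper]                   # contrast rule
    return kept
```

Note that the contrast rule is adaptive: its bound is recomputed from the population of candidates on each image rather than being a fixed gray-level value.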
In the implementation used in this study, the CNN had one input node, two hidden layers, and one output node. All node groups in the two hidden layers were fully connected. Training was performed with an error backpropagation delta-bar-delta rule. There were N1 node groups in the first hidden layer and N2 node groups in the second hidden layer. The kernel sizes of the first group of filters, between the input node and the first hidden layer, were K1×K1, and those of the second group of filters, between the first and second hidden layers, were K2×K2. For a CNN, learning is constrained such that forward signal propagation is similar to a spatially invariant convolution operation; the signals from the nodes in the lower layer are convolved with the weight kernel, and the resultant value of the convolution is collected into the corresponding node in the upper layer. This value is further processed by the node through a sigmoidal activation function and produces an output signal that will, in turn, be forward propagated to the subsequent layer in a similar manner. The convolution kernel incorporates the neighborhood information in the input image pattern and transfers the information to the receiving layers, thus providing the pattern-recognition capability of the CNN.
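The forward propagation just described can be sketched for one fully connected layer: each output node group is the sigmoid of the summed convolutions of all input groups with that output group's weight kernels. This is a minimal sketch, not the published implementation; kernel shapes and connectivity are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_valid(a, k):
    """Plain 'valid' sliding-window correlation (sketch)."""
    kh, kw = k.shape
    oh, ow = a.shape[0] - kh + 1, a.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(a[i:i + kh, j:j + kw] * k)
    return out

def cnn_layer_forward(groups_in, kernels, bias=0.0):
    """Forward-propagate one fully connected CNN layer (sketch):
    kernels[g_out][g_in] is a K x K weight kernel; the output of each
    node group is the sigmoidal activation of the summed convolutions."""
    out = []
    for kset in kernels:                  # one kernel set per output node group
        acc = sum(conv2d_valid(a, k) for a, k in zip(groups_in, kset))
        out.append(sigmoid(acc + bias))   # sigmoidal activation function
    return out
```

Stacking such layers with trainable kernels yields the spatially invariant, neighborhood-aware mapping the text describes.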
Computer-aided diagnosis of breast cancer
7
The neural-network architecture used in many studies was selected using a manual optimization technique [28]. We evaluated the use of automated optimization methods for selecting an optimal CNN architecture [117].

FIGURE 1.1 Schematic diagram of the architecture of a convolution neural network. The input to the CNN is a region-of-interest (ROI) image extracted for each of the detected signals. The output is a scalar that is the relative rating by the CNN representing the likelihood that the input ROI contains a true microcalcification or a false-positive signal.

Briefly, three automated methods, the steepest descent (SD), the simulated annealing (SA), and the genetic algorithm (GA), were compared. Four main parameters of the CNN architecture, N1, N2, K1, and K2, were considered for optimization. The area under the ROC curve, Az [118], was used to design a cost function. The SA experiments were conducted with four different annealing schedules. Three different parent selection methods were compared for the GA experiments. The CNN was optimized with a set of ROI images extracted
from 108 mammograms. The suspected microcalcifications were detected after the initial steps of the microcalcification-detection program [28]. The detected signals were labeled as TP or FP automatically based on the ground truth of the data set. A 16×16-pixel ROI centered at the signal site was extracted for each of the detected locations, and these ROI images were used for training and testing the CNN. The microcalcification-detection program detected more FP ROIs than TP ROIs at the prescreening stage. For classifier training, it is more efficient to have approximately equal numbers of TP and FP ROIs. Therefore, only a randomly selected subset of FP ROI images was used. The selected ROIs were divided into two separate groups, one for training and the other for monitoring the classification accuracy of the trained CNN. Each group contained more than 1000 ROIs. Another data set of 152 mammograms, which was different from the set of 108 mammograms employed for optimization of the CNN, was used for validation of the detection program in combination with the CNN classifier. The optimal architecture (N1-N2-K1-K2) was determined to be 14–10–5–7 using the training and validation sets. This optimal CNN architecture was then compared with the CNN architecture of 12–8–5–3 determined by a manual search technique [28]. For comparison of the performance of the CNNs of different architectures, an independent data set of 472 digitized mammograms was used. This test data set was selected from the University of South Florida (USF) digitized mammogram database, which is publicly available over the Internet [119]. From the available cases in this database, only malignant cases that were digitized with the Lumisys 200 laser scanner were selected (volumes: cancer_01, cancer_02, cancer_05, cancer_09, and cancer_15). The data set contained 272 biopsy-proven microcalcification clusters, of which 253 were malignant and 19 were benign.
There were 184 mammograms free of microcalcifications [119]. All mammograms in the training, validation, and test sets were digitized at a pixel resolution of 0.05×0.05 mm with 4096 gray levels. The images were converted to 0.1×0.1-mm resolution by averaging adjacent 2×2 pixels and subsampling. The detection was carried out on the 0.1×0.1-mm resolution images.

1.2.1.5 False-Positive Reduction Using Clustering

A final step to reduce false positives is clustering. This approach is devised based on the clinical experience that the likelihood of malignancy for clustered microcalcifications is generally much greater than that for sparsely scattered microcalcifications [102–106]. Chan et al. [28, 113] designed a dynamic clustering procedure to identify clustered microcalcifications. The image is initially partitioned into regions, and the number of potential signals in each region is determined. A region with a higher concentration of potential signals is given a higher priority as a starting region to grow a cluster. The cluster grows by searching for new members in its neighborhood one at a time. A signal is included as a new member if it is within a threshold distance from the centroid of the current cluster. The cluster centroid location is updated after each new member is added. The cluster can grow across region boundaries without constraints. Clustering stops when no more new members can be found to satisfy the inclusion criteria. A cluster is considered to be true if the number of members in the cluster is greater than a preselected threshold. The signals that are not found to be in the neighborhood of any clusters will be
considered isolated noise points or insignificant calcifications and excluded. The specific parameters or thresholds used in the various steps depend on the spatial and gray-level resolutions of the digitized or digital mammograms [28, 113]. It was found that having four detected signals within a clustering diameter of 1 cm provided a high sensitivity for cluster detection.

1.2.2 FROC ANALYSIS OF DETECTION ACCURACY

The performance of a computer-aided detection system is generally evaluated by free-response receiver operating characteristic (FROC) analysis [120]. An FROC curve shows the sensitivity of lesion detection as a function of the number of FPs per image. In this study, it was generated by varying the input SNR threshold over a range of values so that the detection criterion varied from lenient (low threshold) to stringent (high threshold). After passing the size and contrast criteria, screening by the trained CNN, and passing the regional-clustering criterion, the detected individual microcalcifications and clusters are compared with the “truth” file of the input image. The number of TP and FP microcalcifications and the number of TP and FP clusters are scored. The scoring method varies among researchers. In our study, the detected signal was scored as a TP microcalcification if it was within 0.5 mm of a true microcalcification in the “truth” file. A detected cluster was scored as a TP if its centroid coordinate was within a cluster radius (5 mm) of the centroid of a true cluster and at least two of its member microcalcifications were scored as TP. Once a true microcalcification or cluster was matched to a detected microcalcification or cluster, it was eliminated from further matching. Any detected microcalcifications or clusters that did not match a true microcalcification or cluster were scored as FPs. The trade-off between the TP and FP detection rates by the computer program was analyzed as an FROC curve.
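The cluster-scoring rule just described can be sketched as follows. The data representation (`centroid` coordinates in millimeters and a precomputed `n_tp_members` count) is hypothetical:

```python
def score_clusters(detected, truth, match_radius=5.0, min_tp_members=2):
    """Score detected clusters against truth (sketch of the rule above):
    a detected cluster is a TP if its centroid lies within `match_radius`
    (mm) of an unmatched true cluster's centroid and at least
    `min_tp_members` of its member signals were scored TP. A matched
    true cluster is removed from further matching (one-to-one)."""
    unmatched = list(truth)
    tp, fp = 0, 0
    for d in detected:
        match = None
        for t in unmatched:
            dx = d['centroid'][0] - t['centroid'][0]
            dy = d['centroid'][1] - t['centroid'][1]
            if (dx * dx + dy * dy) ** 0.5 <= match_radius and \
                    d['n_tp_members'] >= min_tp_members:
                match = t
                break
        if match is not None:
            unmatched.remove(match)   # eliminate from further matching
            tp += 1
        else:
            fp += 1
    return tp, fp
```

Sweeping the SNR threshold and re-scoring at each setting would trace out one FROC curve: TP fraction of true clusters versus FPs per image.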
A low SNR threshold corresponded to a lax criterion with high sensitivity and a large number of FP clusters. A high SNR threshold corresponded to a stringent criterion with a small number of FP clusters and a loss in TP clusters. The detection accuracy of the computer program with and without the CNN classifier could then be assessed by comparison of the FROC curves. To test the performance of the selected optimal architecture, the detection program was run at seven SNR threshold values varying between 2.6 and 3.2 at increments of 0.1. Figure 1.2a shows the FROC curves of the microcalcification-detection program using both the manually optimized and automatically optimized CNN architectures. The FP rate was estimated from the computer marks on the 184 normal mammograms that were free of microcalcifications in the USF data set. The automatically optimized architecture outperformed the manually optimized architecture. At an FP rate of 0.7 cluster per image, the film-based sensitivity is 84.6% with the optimized CNN, in comparison with 77.2% for the manually selected CNN. Figure 1.2b shows the FROC curves for the microcalcification-detection programs if clusters having images in both CC and MLO views are analyzed and a cluster is considered to be detected when it is detected in one or both views. This “case-based” scoring has been adopted for the evaluation of some CAD systems [20]. The rationale is that if the CAD system can bring the radiologist’s attention to the lesion on one of the views, it is unlikely that the radiologist will miss the lesion. For case-based scoring, the sensitivity at 0.7 FPs/image is 93.3% for the
automatically optimized CNN and 87.0% for the manually selected CNN. This study demonstrates that classification of true and false signals is an important step in the microcalcification-detection program and that an optimized CNN can effectively reduce FPs and improve the detection accuracy of the CAD system. An automated optimization algorithm such as simulated annealing can find the optimum more efficiently [117, 121–123] than a manual search, which may find only a local optimum because it is difficult to explore adequately a high-dimensional parameter space. The optimization described here is applied to one stage of the detection program: FP reduction with the CNN. The cost function was based on the Az of the CNN classifier for its performance in differentiating the TP and FP signals. Ideally, one would prefer to optimize all parameters in the detection program together. In such a case, optimizing the performance in terms of the FROC curve will be necessary. The principle of optimizing the entire detection system is similar to that of optimizing the TP-FP classifier, except that a proper cost function has to be designed to guide the optimization. It may be noted that we discuss here a three-stage (training-validation-test) methodology for development and evaluation of CAD system performance. This methodology requires separate data sets for each stage. The training data set is used to select the sets of parameters for the neural network architecture and neural network weights. The validation set is used to evaluate the performance of the selected architectures and identify the architecture with the best performance. Once the architecture is selected using the validation set, the parameters of the detection program are fixed, and no further changes should be made. The performance of the program is then evaluated with an independent test set. The images in this set were used only to assess the performance of the fully specified optimal architecture.
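A simulated-annealing search over the four architecture parameters might look like the following sketch. Here `evaluate_az` stands in for the expensive train-and-validate step that returns a validation Az, the cost is 1 − Az, and the geometric cooling schedule and move set are illustrative rather than the published annealing schedules:

```python
import math
import random

def anneal_architecture(evaluate_az, space, n_iter=200, t0=0.2, alpha=0.98, seed=0):
    """Simulated-annealing search over CNN architecture parameters
    (N1, N2, K1, K2) -- a sketch. `space` maps each parameter name to
    its list of allowed values; `evaluate_az` is a stand-in for training
    a CNN and measuring Az on the validation set."""
    rng = random.Random(seed)
    names = list(space)
    current = {n: rng.choice(space[n]) for n in names}
    cost = 1.0 - evaluate_az(current)
    best, best_cost, t = dict(current), cost, t0
    for _ in range(n_iter):
        cand = dict(current)
        n = rng.choice(names)                 # perturb one parameter at a time
        cand[n] = rng.choice(space[n])
        c = 1.0 - evaluate_az(cand)
        # accept downhill moves always; uphill moves with prob exp(-dc/t)
        if c < cost or rng.random() < math.exp((cost - c) / t):
            current, cost = cand, c
        if cost < best_cost:
            best, best_cost = dict(current), cost
        t *= alpha                            # annealing (cooling) schedule
    return best, 1.0 - best_cost
```

The early high-temperature phase accepts occasional uphill moves, which is what lets the search escape the local optima that trap a manual or greedy search.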
If only a small training set and an “independent” test set are used, and the detection performance on the test set is used as a guide to adjust the parameters of the detection program, there is always a bias due to fine-tuning the CAD system to this particular “test” data set that is essentially a validation set. The results achieved with that test set may not be generalizable to other data sets. This is an important consideration for CAD system development. Before a CAD system can be considered for clinical implementation, it is advisable to follow this three-stage methodology and to evaluate the system with an independent random test set that contains a large number of cases with a wide spectrum of characteristics. Otherwise, the test results may not reflect the actual performance of the CAD program in the unknown patient population.
FIGURE 1.2 Comparison of test FROC curves for detection of clustered microcalcifications with manually optimized CNN architecture (12–8–5–3) and automatically optimized CNN
architecture (14–10–5–7): (a) film-based (single-view) scoring and (b) case-based (CC and MLO views) scoring. The evaluation was performed using a test data set with 472 images.

1.2.3 EFFECTS OF COMPUTER-AIDED DETECTION ON RADIOLOGISTS' PERFORMANCE

One of the important steps in the development of a CAD system is to evaluate whether the computer’s opinion has any impact on radiologists’ performance. ROC methodology is a well-known approach to comparing two diagnostic modalities. The important issues involved in the design of ROC experiments can be found in the literature [118]. We will describe as an example an observer ROC study to evaluate radiologists’ accuracy in the detection of microcalcifications with and without a computer aid [18]. In the ROC study, a set of 60 mammograms was used, half of which were normal and the other half of which contained very subtle microcalcifications. The accuracy of the microcalcification-detection program at the time of the study was 87% at 4 FPs/image for this data set. A simulated detection accuracy of 87% at 0.5 FPs/image was also included in the ROC experiment to evaluate the effect of FPs on radiologists’ detection. Seven attending radiologists and eight radiology residents participated as observers. They read the mammograms under three different conditions: the first without CAD, the second with CAD having an accuracy of 87% at 4 FPs/image, and the third with CAD having an accuracy of 87% at 0.5 FPs/image. The reading for each observer was divided into three sessions, and the reading order of the three conditions was counterbalanced so that no one condition would be read by the observers in a given order more often than the other two conditions. The observers were asked to use a five-point confidence rating scale to rate their confidence in detecting a microcalcification cluster in an image. The confidence ratings were analyzed by ROC methodology.
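As a minimal illustration of turning confidence ratings into an accuracy index, the nonparametric area under the ROC curve can be computed as a Wilcoxon statistic. Note that the study itself fitted binormal ROC curves; this sketch computes only the simpler empirical Az:

```python
def empirical_az(ratings_pos, ratings_neg):
    """Nonparametric area under the empirical ROC curve (Wilcoxon
    statistic): the probability that a randomly chosen positive image
    receives a higher confidence rating than a randomly chosen negative
    image, counting ties as 1/2. A sketch; the published analysis used
    fitted binormal ROC curves instead."""
    wins = 0.0
    for p in ratings_pos:
        for n in ratings_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(ratings_pos) * len(ratings_neg))
```

An Az of 0.5 corresponds to guessing; perfect separation of the ratings gives 1.0, matching the interpretation of Az as average sensitivity over the entire specificity range.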
The ROC curves obtained from the observer experiment are shown in Figure 1.3. The average sensitivity over the entire range of specificity is represented by the area under the ROC curve, Az. It was found that the Az improved significantly (p < 0.001) when the radiologists read the mammograms with the computer aid, either at 0.5 FPs/image or at 4 FPs/image, compared with when they read the mammograms without the computer aid. Although the Az of the CAD reading with 0.5 FPs/image was slightly higher than that with 4 FPs/image, the difference did not achieve statistical significance, indicating that the observers were able to discard FPs detected by the computer. This ROC study was the first experiment to demonstrate that CAD has the potential to improve breast cancer detection, thus establishing the significance of CAD research in mammography.
FIGURE 1.3 Comparison of the average ROC curves for detection of microcalcifications with and without CAD. L1 is the computer performance level of 87% sensitivity at 4 FPs per image, and L2 is the simulated computer performance level of 87% sensitivity at 0.5 FPs per image. The average ROC curves were obtained by averaging the slope and intercept parameters of the individual ROC curves from the 15 observers. The improvement in the detection accuracy, Az, was statistically significant at p < 0.001 for both CAD conditions.

1.3 COMPUTERIZED DETECTION OF MASSES

A mass is another major sign of breast cancer. Masses are imaged as focal densities on mammograms. In mammograms of fatty breasts, a dense mass—a low-optical-density (white) region surrounded by a darker gray background—can easily be detected by
radiologists. However, in most breasts there is fibroglandular tissue that also appears as dense white regions on mammograms, and this camouflaging effect makes it difficult for radiologists to detect the masses. There are several major types of masses, as described by the characteristics of their borders, including well-circumscribed, ill-defined, and spiculated. Masses with well-circumscribed margins are more likely to be benign cysts or fibroadenomas, whereas masses with ill-defined or spiculated borders have a high likelihood of being malignant. Some CAD researchers designed their mass-detection programs to make use of the border characteristics of spiculated masses [19, 52, 55, 64, 65, 68]. Karssemeijer et al. employed statistical analysis to develop a multiscale map of pixel orientations. Two operators sensitive to radial patterns of straight lines were constructed from the pixel-orientation map. The operators were then used by a classifier to detect stellate patterns in the mammogram [64]. Kobatake et al. used line skeletons and a modified Hough transform to detect the spicules, which are radiating line structures extending from the mass [65, 68]. Finally, Ng et al. used a spine-oriented approach to detect the microstructure of mass spicules [55]. Since a substantial fraction of nonspiculated masses are malignant, detection of nonspiculated masses is as important as detection of spiculated masses. A number of mass-detection algorithms were developed to detect masses without focusing on specific border characteristics [52, 54, 56–63, 66, 67, 69–71]. Most of the mass-detection programs were applied to a single-view mammogram. The mammogram is first preprocessed with a filter or nonlinear technique to enhance the suspicious regions. The potential signals are segmented from the background based on morphological and gray-scale information. Feature descriptors are extracted from the segmented signals.
Rule-based classifiers or other linear, nonlinear, or neural-network classifiers are then trained to classify the signal candidates as true masses or false positives. Laine et al. applied multiscale wavelet analysis to enhance the contrast of a mammogram [58, 60]. Petrick et al. used adaptive enhancement, region growing, and feature classification to detect suspicious mass regions in a mammogram [63, 70, 124]. Li et al. employed a modified Markov random field model and adaptive thresholding to segment regions in an image [59]. A fuzzy binary-decision-tree classifier then classified the regions as suspicious or normal. Zheng et al. used Gaussian band-pass filtering to detect suspicious regions and rule-based multilayer topographic-feature analysis to classify the regions [61]. Guliato et al. proposed a fuzzy region-growing method for mass detection [66]. Radiologists often use the approximate symmetry in the distribution of dense tissue in the left and right breasts of a patient to detect abnormal growth. Yin et al. developed a mass-detection method based on this information. Their technique, bilateral subtraction, subtracted corresponding left and right mammograms after the two images were aligned. Morphological and texture features were then extracted from the detected regions to decrease the number of FP detections [54, 56]. Another important technique used by radiologists in mammographic interpretation is to compare current and prior mammograms to detect new densities or changes in existing densities. Computer vision techniques for comparing current with prior mammograms have been proposed. Brzakovic et al. registered the current and prior mammograms using a principal-axis method. The mammograms were then partitioned using hierarchical region growing and compared using region statistics [57]. Sanjay-Gopal et al. [96] developed a regional-registration technique in which the mammograms were aligned based on maximizing mutual information between the breast regions on the two images. Polar coordinate systems, based on the nipple and breast centroid locations, were established for both images. The center of the lesion on the current image was then transformed to the prior image. A fan-shaped region, based on the polar coordinate system and centered at the centroid of the lesion, was defined and searched to obtain a final estimate of the mass location in the prior image. Hadjiiski et al. [125, 126] further improved the accuracy of the regional-registration technique by incorporating a local search method to refine the lesion location. Local search was guided by simplex optimization and a correlation similarity measure. Radiologists routinely use two-view (CC and MLO views) mammograms for lesion detection. Paquerault et al. [100] developed a mass-detection method that fuses the detections on the CC and MLO views to reduce false positives. They demonstrated that the two-view fusion method can improve the detection accuracy for masses on mammograms. In this section, we will discuss our approach as an example of an automated technique for detection of masses using one-view information. A two-view information-fusion technique is discussed in the next section.

1.3.1 METHODS

We have developed a mass-detection program for single-view mammograms. The method is based on the information that masses manifest as densities on mammograms. It does not presuppose certain shape, size, or border properties for a mass and thus is designed to detect any type of mass. The block diagram for our mass-detection scheme is shown in Figure 1.4. This scheme combines adaptive enhancement with local object-based region-growing and feature-classification techniques for segmentation and detection. We developed a density-weighted contrast enhancement (DWCE) filter as a preprocessing step.
The DWCE filter enhances the contrast between the breast structures and the background based on the local breast density. Suspicious structures on the enhanced breast image are identified. Each of the identified structures is then used as the seed point for object-based region growing. The region-growing technique uses gray-scale information to segment the object borders and to reduce merging between adjacent or overlapping structures. Morphological and texture features are extracted from the grown objects. Rule-based classification and a classifier using linear discriminant analysis (LDA) are used to distinguish breast masses from normal structures based on the extracted features. In order to reduce the large number of initial structures, a first-stage rule-based classifier, based on morphological features, is used to eliminate regions whose shapes are significantly different from breast masses. A second-stage classifier was trained to select useful features and merge them to form a linear discriminant that makes a final decision to distinguish between true masses and normal structures.
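The two-step DWCE filtering, detailed in Section 1.3.1.1, can be sketched as follows. The local-mean density image, the weighting function W_D, and the tanh nonlinearity below are illustrative stand-ins for the published functions, which can be found in the cited literature [62]:

```python
import numpy as np

def dwce_sketch(img, k=9):
    """Density-weighted contrast enhancement (sketch). The density image
    I_D is approximated by a local mean, the contrast image I_C by the
    residual from it; W_D and W are illustrative stand-ins for the
    published weighting and nonlinear transformation functions."""
    pad = k // 2
    p = np.pad(img.astype(float), pad, mode='edge')
    density = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            density += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    density /= k * k                          # I_D: local density (mean)
    contrast = img - density                  # I_C: local contrast
    w_d = density / (density.max() + 1e-12)   # W_D: weight grows with local density
    weighted = w_d * contrast                 # I_W = W_D(I_D) * I_C
    return np.tanh(weighted)                  # I_E = W(I_W): nonlinear transform
```

The density weighting is the key design choice: contrast in dense fibroglandular regions, where masses are camouflaged, is amplified more than contrast in fatty regions.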
FIGURE 1.4 Block diagram for the mass-detection scheme.

1.3.1.1 Preprocessing and Segmentation

We designed an adaptive filter to enhance the dense structures on digital mammograms. Because most mass lesions have blurred borders, and because commonly used edge-enhancement methods cannot sharpen the mass margins very well, the low-contrast dense breast structures are first enhanced by a nonlinear filter using an enhancement factor that is weighted by the local density [62]. A Laplacian-Gaussian (LG) edge detector is then applied to the enhanced structures to extract the object boundaries. The adaptive filter is an expansion of the adaptive contrast and mean filter of Peli and Lim [127]. The block diagram for the enhancement filter is shown in Figure 1.5. The mammogram is first filtered to derive a contrast image and a density image, IC(x, y) and ID(x, y), respectively. The contrast image is weighted by a multiplication factor that depends on the local value of the density image. Finally, the weighted contrast image undergoes a nonlinear pixelwise transformation to generate the final “enhanced” image. The two-step DWCE filtering is described as

IW(x, y) = WD(ID(x, y)) · IC(x, y)  (1.1)

IE(x, y) = W(IW(x, y))  (1.2)

The multiplication factor and the nonlinear transformation function used in this application, WD(·) and W(·), can be found in the literature [62]. The DWCE filter suppresses very low-contrast regions, emphasizes low- to medium-contrast regions, and slightly suppresses the high-contrast regions.

FIGURE 1.5 Block diagram for the DWCE filter.

The suppression of very low-contrast regions reduces bridging between adjacent breast structures. The enhancement of low- to medium-contrast regions accentuates the subtle structures that contain most of the mammographic masses. The slight suppression of the high-contrast regions results in a more uniform intensity distribution of the breast structures. After DWCE filtering, the mammogram should have a relatively uniform background superimposed with enhanced breast structures that can be segmented with Laplacian-Gaussian edge detection [128, 129]. The regions enclosed by the detected edges are considered to be mass candidates.

1.3.1.2 Object Refinement

Although the DWCE filtering with LG edge detection can extract breast structures, including most of the masses, the borders of the objects are not close to the true object borders. The detected object borders are generally within the true object borders because of our attempt to minimize merging between structures. However, many adjacent objects are still found to merge together. The next stage of the mass-detection program is designed to refine the object borders and to separate the merged objects. The object-refinement stage is needed before extraction of morphological and texture features to
distinguish true masses from normal breast structures. The purpose of the local refinement stage is to improve the accuracy of the object borders found by the DWCE segmentation. For refinement of the objects, seed locations are first identified by finding the local maxima within each object detected in the DWCE stage. The local maxima are determined using the ultimate-erosion technique [130]. These local maxima are then grown into seed objects by using Gaussian smoothing (σ = 0.4 mm). Each seed object is further grown by selecting all connected pixels with gray values in the range Mi ± 0.01Mi, where Mi is the gray level of the ith local maximum. K-means clustering is then applied to a 25×25-mm background-corrected ROI [116] centered on each seed object to refine the initial object border [131]. The background-correction method described by Sahiner et al. was used to estimate the low-frequency background of the ROI [116]. The pixel value of a given pixel on the background image is estimated as the weighted sum of the four pixel values along the edges of the ROI intersecting with a horizontal line and a vertical line passing through the given pixel. The weight for an edge pixel is inversely proportional to the distance from the given pixel to the edge pixel. The estimated background image is subtracted from the ROI to reduce the background variation before K-means clustering. For the K-means clustering, each pixel in the ROI is represented by a feature vector Fi in a multidimensional feature space. In this application, the feature vector is composed of two components: the gray level and a median-filtered value (median filter kernel = 1×1 mm) of the pixel. The clustering algorithm [132, 133] assigns the class membership of the feature vector Fi of each pixel in an iterative process. The algorithm first chooses the initial cluster center vectors, Co and Cb, for the object and the background, respectively.
For each feature vector Fi, the Euclidean distance do(i) between Fi and Co and the Euclidean distance db(i) between Fi and Cb are calculated. If the ratio db(i)/do(i) is larger than a predetermined threshold R, then the vector is temporarily assigned to the group of object pixels; otherwise, it is temporarily assigned to the group of background pixels. Using the new pixel assignments, a new object-cluster center vector and a new background-cluster center vector are computed as the mean of the vectors temporarily assigned to the group of object pixels and to the group of background pixels, respectively. This completes one iteration of the clustering algorithm. The iterations continue until the new and old cluster center vectors are the same or the changes are less than a chosen value, which means that the class assignment for each pixel has converged to a stable value. The clustering process does not guarantee connectivity of the pixels assigned to the same class. Therefore, several disconnected objects may be generated in an ROI after clustering, and the objects may have holes. The holes within the objects are filled, and the largest connected object among all detected objects in the ROI is selected as the object of interest. Figure 1.6 shows an example of a mammogram demonstrating the DWCE-extracted regions and the detected objects before and after clustering is applied.
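The iterative two-class assignment just described can be sketched as follows, with the ratio rule written as db(i) > R·do(i) to avoid division. The initial centers and the threshold R are caller-supplied assumptions:

```python
import numpy as np

def kmeans_object_background(features, c_obj, c_bkg, ratio_thresh=1.0, max_iter=50):
    """Two-class K-means with the ratio rule described above (sketch):
    a pixel's feature vector F_i joins the object class when
    d_b(i)/d_o(i) > R; cluster centers are then recomputed as class
    means, iterating until the assignments stabilize.
    `features` is an (n_pixels, n_features) array."""
    features = np.asarray(features, dtype=float)
    c_obj = np.asarray(c_obj, dtype=float)
    c_bkg = np.asarray(c_bkg, dtype=float)
    prev = None
    for _ in range(max_iter):
        d_o = np.linalg.norm(features - c_obj, axis=1)  # distance to object center
        d_b = np.linalg.norm(features - c_bkg, axis=1)  # distance to background center
        labels = d_b > ratio_thresh * d_o               # object iff d_b/d_o > R
        if prev is not None and np.array_equal(labels, prev):
            break                                       # assignments converged
        prev = labels
        if labels.any():
            c_obj = features[labels].mean(axis=0)       # recompute centers
        if (~labels).any():
            c_bkg = features[~labels].mean(axis=0)
    return labels
```

Setting R above 1 biases the partition toward the background class, which is one way to keep the grown object conservative; the published work would follow this with hole filling and selection of the largest connected component.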
FIGURE 1.6 Example of local object refinement and detection: (a) objects initially detected by DWCE at 800-µm resolution, (b) original mammogram with two of the ROIs; the upper one is normal breast tissue, the lower one is a true mass, (c) the DWCE-segmented objects in each ROI, and (d) the final objects after clustering and filling. The true mass and one FP are the detected objects at the output of the system.

1.3.1.3 Feature Extraction and Classification

The initial objects from the prescreening DWCE stage include a large number of normal breast structures (false positives). In order to overcome the problems associated with the large number of objects, we perform the feature classification in two stages. Eleven morphological features are initially used with a threshold and a linear classifier to remove detected normal structures that are significantly different from breast masses. Texture-based classification then follows this morphological-reduction stage. Fifteen global and local multiresolution texture features, based on the spatial gray-level dependence (SGLD) matrices, are used as inputs to an LDA classifier, which merges the input features into a single discriminant score for each detected object. Decision thresholds based on this score and on the maximum number of marks allowed per image are then used to identify potential breast masses. These feature-extraction and classification steps are described briefly below. Further details can be found in the literature [62, 70, 73, 86, 134]. We extracted a number of morphological features from the segmented objects. Eleven of these features are selected for the initial differentiation of the detected structures [63, 70]. Ten of these features are based solely on the binary-object shape extracted by the segmentation. Five of the ten are based on the normalized radial length (NRL). NRL is defined as the Euclidean distance from the centroid of an object to each of its edge pixels
and normalized relative to the maximum radial length for the object [74]. The NRL features include the mean, standard deviation, entropy, area ratio, and zero-crossing count. The six other morphological features are: number of perimeter pixels, area, perimeter-to-area ratio, circularity, rectangularity, and contrast [70]. The morphological features are used as input variables to a rule-based classifier followed by an LDA classifier. The rule-based classification sets a maximum and a minimum value for each morphological feature based on the maximum and minimum feature values found for the breast masses in the training set. The objects remaining after rule-based classification are input to a trained LDA classifier that merges the feature values into a discriminant score. A threshold chosen during training is then applied to the output score to distinguish true masses from normal breast structures.

After classification with morphological features, another classifier based on texture features is applied [63, 70, 135, 136]. First, a set of multiresolution texture features is extracted from the 100-µm-resolution mammograms. The ROIs have a fixed size of 256×256 pixels, and the center of each ROI corresponds to the centroid location of a detected object. If the object is located near the border of the breast and a complete 256×256-pixel ROI cannot be defined, the ROI is shifted until it is entirely inside the breast area and the appropriate edge coincides with the border of the original mammogram. For a given ROI, background correction is first performed to reduce the low-frequency gray-level variation due to the density of the overlapping breast tissue and the X-ray exposure conditions, as described previously for the K-means clustering. A more detailed description of this background-correction method can be found in the literature [116, 137]. The estimated background image is subtracted from the original ROI to obtain a background-corrected image.
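The NRL features and the background correction described above might be computed along the following lines. This is an illustrative sketch, not the authors' implementation: the histogram binning for the NRL entropy, the exact area-ratio variant, and the planar background fit (the published method estimates the background differently) are all assumptions.

```python
import numpy as np

def nrl_features(boundary, centroid, n_bins=10):
    """Sketch of the five NRL features: mean, standard deviation,
    entropy, area ratio, and zero-crossing count.  `boundary` is an
    (N, 2) array of edge-pixel coordinates in boundary order."""
    d = np.linalg.norm(boundary - centroid, axis=1)  # radial lengths
    r = d / d.max()                                  # normalized radial length
    mean, std = r.mean(), r.std()
    # entropy of the NRL histogram (binning is an assumed detail)
    p, _ = np.histogram(r, bins=n_bins, range=(0.0, 1.0))
    p = p / p.sum()
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    # area ratio: normalized excess of radial lengths above the mean
    # (one common variant of this feature)
    area_ratio = np.sum(r[r > mean] - mean) / (mean * len(r))
    # zero-crossing count of r about its mean along the boundary
    s = np.sign(r - mean)
    zero_crossings = int(np.sum(s[:-1] * s[1:] < 0))
    return mean, std, entropy, area_ratio, zero_crossings

def background_correct(roi):
    """Sketch of background correction: fit a least-squares plane to
    the ROI gray levels as a crude low-frequency background estimate
    and subtract it.  The planar fit is a simplifying assumption."""
    h, w = roi.shape
    y, x = np.mgrid[0:h, 0:w]
    A = np.column_stack([x.ravel(), y.ravel(), np.ones(h * w)])
    coef, *_ = np.linalg.lstsq(A, roi.ravel().astype(float), rcond=None)
    return roi - (A @ coef).reshape(h, w)
```

For a perfectly circular object the NRL is constant, so its standard deviation, entropy, area ratio, and zero-crossing count all vanish; irregular or spiculated shapes drive these values up.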
Global and local multiresolution texture features derived from the SGLD matrices of the background-corrected ROI are used in texture analysis. The SGLD matrix element, pθ,d(i, j), is the joint probability of the occurrence of gray levels i and j for pixel pairs that are separated by a distance d at a direction θ [138]. In a previous study, we did not observe a significant dependence of the discriminatory power of the texture features on the direction of the pixel pairs for mammographic textures [137]. However, since the actual distance between the pixel pairs in the diagonal direction was a factor of √2 greater than that in the axial direction, the feature values in the axial directions (0° and 90°) and in the diagonal directions (45° and 135°) were grouped separately for each texture feature derived from the SGLD matrix at a given pixel-pair distance.

Thirteen texture measures are derived from each SGLD matrix: correlation, entropy, energy (angular second moment), inertia, inverse difference moment, sum average, sum entropy, sum variance, difference average, difference entropy, difference variance, information measure of correlation 1, and information measure of correlation 2. The formulations of these texture measures can be found in the literature [43, 138]. To extract texture features, individual ROIs are first decomposed into different scales by using the wavelet transform with a four-coefficient Daubechies kernel. For the global texture features, 4 wavelet scales, 14 interpixel distances d, and 2 directions (axial and diagonal) are used to produce 28 different SGLD matrices. A total of 364 global multiresolution texture features are thus calculated for each ROI. To further describe the information specific to the mass and its surrounding normal tissue, a set of local texture features is derived from subregions of each ROI [63, 136, 139]. Five subregions,
including an object region with the detected object in the center and four peripheral regions at the corners, are segmented from each ROI. A total of 104 local texture features are calculated from the eight SGLD matrices (4 interpixel distances × 2 angles) of the object region, with 13 texture features per matrix. Another 104 local texture features are derived from the eight SGLD matrices of the peripheral regions. The final set of local texture features includes the 104 features from the object region and an additional 104 features derived as the differences between the corresponding features of the object and peripheral regions. The total number of global and local texture features is 572.

Because the generalizability of classifiers usually degrades with increased dimensionality of the feature space, a stepwise feature-selection procedure is applied to select a small subset of features that are effective for the classification task. Stepwise LDA is a commonly used method for selecting useful feature variables from a large feature space. Details on the application of stepwise feature selection can be found in the literature [135, 137, 140]. Briefly, stepwise LDA uses a forward-selection and backward-removal strategy. When a feature is entered into or removed from the model, its effect on the separation of the two classes can be analyzed by one of several criteria. We use the Wilks's lambda criterion, which minimizes the ratio of the within-group sum of squares to the total sum of squares of the two class distributions. The significance of the change in the Wilks's lambda is estimated by F statistics. In the forward-selection step, the features are entered one at a time. The feature variable that causes the most significant change in the Wilks's lambda is included in the feature set if its F value is greater than the F-to-enter (Fin) threshold. In the feature-removal step, the features already in the model are eliminated one at a time.
The feature variable that causes the least significant change in the Wilks's lambda is excluded from the feature set if its F value is below the F-to-remove (Fout) threshold. The stepwise procedure terminates when the F values for all features not in the model are smaller than the Fin threshold and the F values for all features in the model are greater than the Fout threshold. The number of selected features decreases if either the Fin threshold or the Fout threshold is increased. Therefore, the number of features to be selected can be adjusted by varying the Fin and Fout values. The selected texture features are used as input predictor variables to formulate an LDA classifier. A threshold on the discriminant score is used to differentiate between true masses and false positives. In this implementation, all scores in an individual image are scaled before thresholding so that the minimum score in the image is 0 and the maximum score is 1. This scaling minimizes the nonuniformity seen between mass structures in different images. It also results in at least one structure being detected in each image.

1.3.2 FROC ANALYSIS OF DETECTION ACCURACY

1.3.2.1 Data Sets

A database of mammograms with known truth is needed for training and testing of CAD algorithms. The ground truth of each case used in the following study was based on biopsy results, and the true mass location was identified by radiologists experienced in mammographic interpretation.
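The core of the stepwise selection described in the preceding section can be sketched as follows. This is a simplified illustration: it implements only the greedy forward step driven by Wilks' lambda and omits the F-to-enter/F-to-remove significance tests and the backward-removal pass.

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks' lambda for feature matrix X and class labels y: the
    ratio of the within-group scatter to the total scatter (via
    determinants, so it generalizes to multiple features)."""
    Xc = X - X.mean(axis=0)
    T = Xc.T @ Xc                       # total scatter matrix
    W = np.zeros_like(T)
    for c in np.unique(y):
        Xg = X[y == c] - X[y == c].mean(axis=0)
        W += Xg.T @ Xg                  # within-group scatter
    return np.linalg.det(W) / np.linalg.det(T)

def forward_select(X, y, n_features):
    """Greedy forward step of stepwise selection: repeatedly add the
    feature whose inclusion yields the smallest Wilks' lambda."""
    selected = []
    while len(selected) < n_features:
        remaining = [k for k in range(X.shape[1]) if k not in selected]
        lams = [wilks_lambda(X[:, selected + [k]], y) for k in remaining]
        selected.append(remaining[int(np.argmin(lams))])
    return selected
```

A feature that separates the two classes perfectly drives the within-group scatter, and hence lambda, toward zero, so it is chosen first.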
1.3.2.1.1 Training Set

The clinical mammograms used for training the algorithm parameters, referred to as the training cases, were selected from the files of patients who had a mammographic evaluation and biopsy at our institution. In our clinical practice, a multiple-reading paradigm, with a resident or fellow previewing each case followed by an official interpretation by an attending radiologist, was typically followed during the initial evaluation of each case. The mammograms were acquired with Kodak MinR/MinR or MinR/MRE screen/film systems using dedicated processing. Series of consecutive malignant and consecutive benign mass cases were collected using a computerized biopsy registry. The selection criterion was that a biopsy-proven mass existed on the mammogram. No case-selection bias was introduced except for the exclusion of microcalcification cases without a visible mass, architectural-distortion cases, and cases containing masses larger than 2.5 cm. The data set consisted of 253 mammograms from 102 patients examined between 1981 and 1989. The training set included 128 malignant and 125 benign masses. Sixty-three of the malignant and six of the benign masses were judged to be spiculated by a radiologist qualified under the Mammography Quality Standards Act (MQSA). The mammograms were digitized with a Lumisys DIS1000 laser film scanner at a pixel size of 100 µm and 12-bit gray-level resolution. The gray levels were linearly proportional to optical density in the 0.1- to 2.8-optical-density (O.D.) range and gradually fell off in the 2.8- to 3.5-O.D. range.

1.3.2.1.2 Independent Test Set

The performance of a trained CAD algorithm has to be evaluated with independent cases not used for training. Cases were collected from two different institutions and were not used in the training process.
Series of consecutive malignant and consecutive benign mass cases were collected using a biopsy registry from each institution, in a manner similar to the training-case collection process. The first set of preoperative cases, referred to as Group 1, was selected from the files of 127 patients who had a mammographic evaluation and biopsy at our institution between 1990 and 1999. The Group 1 cases came from the same institution as the training cases, and each contained at least one proven breast mass visible on mammography. Again, a resident or fellow typically previewed each Group 1 case, followed by an official interpretation by an attending (prior to MQSA in 1994) or an MQSA radiologist during the initial evaluation of these cases. Each case consisted of a single CC view and either an MLO or a lateral view of the breast containing the mass. For simplicity, we will refer to all views other than the CC view as the MLO view in the following discussions, with the understanding that this also includes some lateral views. If both breasts of a patient had a mass, each breast was considered an independent case. Using this breast-based definition, a total of 138 cases (276 mammograms) were available. The mammograms were acquired with Kodak MinR/MRE screen/film systems using dedicated processing in the years prior to 1997 (154 mammograms) and with a Kodak MinR 2000 screen/film system from 1997 on (122 mammograms). Each case contained one or more preoperative breast masses that were identified prospectively during initial clinical evaluation or mammographic interpretation. The independent Group 1 mammograms were digitized with a Lumisys LS 85 laser film scanner at a 50-µm pixel size and 12-bit gray-level resolution. The
gray levels were calibrated to be linearly proportional to optical density in the 0.1- to 4.0-O.D. range. The images were reduced to a 100-µm pixel size by averaging 2×2-pixel neighborhoods before mass detection was performed.

Clinical cases from the public database available from the University of South Florida (USF) were also analyzed [119]. We evaluated 142 CC/MLO pairs from 136 patients collected by USF between 1992 and 1998. Each USF case contained at least one proven breast mass visible on mammography. Additional information on the USF database can be found in the literature [119]. For compatibility with the Group 1 database, we selected only USF cases digitized with a Lumisys 200 laser film scanner. This scanner also digitized the images at a 50-µm pixel size and 12-bit gray-level resolution, but the gray levels were calibrated to be linearly proportional to optical density in the 0.1- to 3.6-O.D. range. In the following discussions, these 142 USF cases, which came from a different institution than the training cases, are referred to as the Group 2 cases.

Lesion-free mammograms of the breast contralateral to a breast containing an abnormality were used to estimate the CAD marker rate of the algorithm. These mammograms are referred to as normal cases in this study. A mammogram was regarded as normal if it did not contain a visible mass at the time of the mammographic exam and upon second review by an MQSA radiologist during data collection. A total of 251 mammograms from the 127 Group 1 patients and 252 mammograms from the 136 Group 2 patients were included as normal cases. There were fewer normal than abnormal mammograms because not all of the contralateral mammograms were digitized, and 7 of the 263 combined Group 1 and Group 2 patients had visible lesions in both the right and left breasts. Table 1.1 summarizes the Group 1 and Group 2 test cases used to evaluate the mass-detection algorithm.
It includes the numbers of malignant and benign masses separated by whether they were visible in both views or only in a single view. The mammographic size of the Group 1 masses was measured by the radiologist during initial case evaluation. The malignant Group 1 masses had a mean size, standard deviation, and median size of 15.4 mm, 12.0 mm, and 12.0 mm, respectively. The benign Group 1 masses had a mean size, standard deviation, and median size of 13.4 mm, 11.8 mm, and 10.0 mm, respectively. Radiologist-measured mass sizes were not available for the Group 2 cases because the mass boundaries hand-drawn by the reviewing radiologists were found to be much larger than the actual mammographic lesion size. Therefore, mass-size information is not reported for the Group 2 cases.

1.3.2.2 True Positive and False Positive

One important consideration in the evaluation of the performance of a CAD algorithm is the definition of the TPs and FPs. Even if the algorithm is fixed, the reported detection sensitivity and specificity have been found to depend on these definitions. For the Group 1 cases, the smallest bounding box containing the entire mass identified by a radiologist was used as the truth. For Group 2, we used a bounding box around the radiologist-outlined mass region provided with each image. Our definition of a TP was based on the percentage of overlap between the bounding box of an identified structure and the bounding box of the true mass. Based on the training set, we chose an overlap threshold of 25%. This value corresponds to the minimum overlap between the bounding
box of a detected object and the bounding box of a true mass for the object to be considered a TP detection. The 25% threshold was selected because it was found to match well with TPs identified visually. The detected objects were first labeled automatically by the computer using this criterion. All of the TPs were then visually reviewed to make sure that the program highlighted the true lesion and not a neighboring structure. Marks that were found to match neighboring structures were considered FPs. The number of FP marks produced by the algorithm was determined by counting the markings produced in normal cases. We used a total of 251 normal mammograms from Group 1 and 252 normal mammograms from Group 2 to estimate the marker rate. The true-positive fraction (TPF), or sensitivity, calculated from the abnormal cases, and the average number of marks per image, calculated from the normal cases, were determined for a fixed set of thresholds at the final texture-classification stage. The TPF and the average number of marks per mammogram as the decision threshold varied were then used to plot the FROC performance curves for malignant and benign masses in the different data sets.

1.3.2.3 Training and Testing

The computer program was trained using the entire training data set of 253 mammograms. This included adjusting the filters, the clustering, the selected features, and the classification thresholds. Once training was completed, the parameters and all thresholds
TABLE 1.1
Summary of Cases, Patients, and Masses in Group 1 and Group 2 Databases

                           Abnormal                                              Normal
                                         Malignant           Benign
                                       One-View Two-View  One-View Two-View
Database    Mammograms  Patients       Masses   Masses    Masses   Masses    Mammograms  Patients

Individual Masses
Group 1         276        127            2        72        3        78         251         93
Group 2         284        136            5        96        6        63         252        128

Grouped Masses
Group 1         128         64            —        64        —         —         251         93
Group 2         184         92            —        92        —         —         252        128
Note: One-view masses correspond to masses visible in only one mammographic view in the pair; two-view masses correspond to masses visible in both mammographic views in the pair. The individual-masses category considers each mass in a mammogram or case as a TP during scoring; the grouped-masses category considers all malignant masses for a mammogram or case together as one TP during scoring.
were fixed for testing. The training data set was then resubstituted into the algorithm and was found to have an image-based (i.e., each mass on each mammogram was considered an independent sample) training sensitivity of 81% (85% for malignant masses), with an average of 2.9 marks per mammogram at this sensitivity level. It is important to note that the detection classifiers considered only classification between breast masses and normal tissue, not between malignant and benign masses. Therefore, no distinction was made between malignant and benign masses in the training process.

1.3.2.4 Performance of Mass Detection Algorithm

The detection performance of a CAD algorithm for mammography can be analyzed on a per-mammogram or per-case basis. In the former, the CC and MLO views are considered independently, so that a lesion visible in the CC view is one TP, and the same lesion in the MLO view is a different TP. In the latter, a mass is considered detected if it is detected on the CC view, the MLO view, or both. The per-case evaluation takes into consideration that, in clinical practice, once the computer alerts the radiologist to a cancer in one view, it is unlikely that the radiologist will miss the cancer. The per-case approach is often used by researchers in reporting their CAD performance [20, 141, 142]. Results are also presented for two different TP scoring methods. The individual scoring method considers each mass in a mammogram or case as a different TP. The grouped scoring method considers all malignant masses in a mammogram or case as a single TP [20]. The rationale for grouped scoring is that a radiologist might not need to be alerted to all malignant lesions in a mammogram or case before taking action. Therefore, multiple detections in a mammogram or case might not significantly enhance the power of CAD.
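The bounding-box overlap criterion and the two TP scoring methods might be implemented along these lines. This is a hedged sketch: the text does not specify the denominator of the overlap percentage, so using the detected object's box area is an assumption.

```python
def bbox_overlap_fraction(det, truth):
    """Overlap of the detected object's bounding box with the
    true-mass bounding box, as a fraction of the detected box's
    area (the choice of denominator is an assumption).
    Boxes are (x0, y0, x1, y1)."""
    w = min(det[2], truth[2]) - max(det[0], truth[0])
    h = min(det[3], truth[3]) - max(det[1], truth[1])
    if w <= 0 or h <= 0:
        return 0.0
    return (w * h) / ((det[2] - det[0]) * (det[3] - det[1]))

def score_image(detections, truths, threshold=0.25, grouped=False):
    """Return (n_tp, n_fp) for one mammogram.  Individual scoring
    counts each detected true mass; grouped scoring counts at most
    one TP per mammogram."""
    hit = [any(bbox_overlap_fraction(d, t) >= threshold for t in truths)
           for d in detections]
    n_fp = hit.count(False)
    if grouped:
        return (1 if any(hit) else 0), n_fp
    # individual scoring: count each true mass matched by some detection
    n_tp = sum(any(bbox_overlap_fraction(d, t) >= threshold
                   for d in detections) for t in truths)
    return n_tp, n_fp
```

With the 25% threshold from the text, a detected box whose area overlaps the truth box by one quarter or more is scored as a TP; everything else counts toward the marker rate.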
These different definitions of computer detection are included here to illustrate the dependence of performance on the scoring method. It is therefore important to clearly define the scoring method when reporting or comparing the performance of CAD algorithms.

FROC performance curves based on individual-mass scoring for the Group 1 cases are shown in Figure 1.7. Similar data are presented for the Group 2 cases in Figure 1.8. These figures include per-case and per-mammogram performance curves for the detection of both malignant and benign masses and show the TPF achievable over a large range of marker rates. It can be seen that the performance for the Group 2 benign cases is much lower than that for the Group 1 benign cases. However, the difference in performance between the Group 1 and Group 2 malignant masses is small. The per-case and per-mammogram FROC performance curves for malignant masses based on grouped-mass scoring are shown in Figure 1.9. These curves show how the TPF varies as a function of the marker rate under group scoring, which is expected to be the most clinically relevant measure of algorithm performance. It is evident from the curves that the algorithm provides consistent malignant-mass detection performance for both independent test sets over a wide range of marker rates.
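An FROC curve of this kind is traced by sweeping the decision threshold and recording the TPF from the abnormal images against the average number of marks per image from the normal set. A minimal sketch follows, including the per-image min-max score scaling described earlier; assuming at most one TP mark per mass is a simplification.

```python
import numpy as np

def scale_scores(scores):
    """Per-image min-max scaling of discriminant scores to [0, 1],
    as described in the text; it guarantees at least one mark per
    image at any threshold below 1."""
    s = np.asarray(scores, dtype=float)
    rng = s.max() - s.min()
    return (s - s.min()) / rng if rng > 0 else np.ones_like(s)

def froc_points(abnormal_marks, normal_scores, thresholds):
    """FROC operating points.  `abnormal_marks` is a list of
    (scaled_score, is_tp) for every mark in the abnormal images;
    `normal_scores` is a list of per-image arrays of scaled scores
    from the normal set.  Returns (TPF, marks-per-image) pairs."""
    n_masses = sum(1 for _, is_tp in abnormal_marks if is_tp)
    n_normal = len(normal_scores)
    points = []
    for t in thresholds:
        tpf = sum(1 for s, is_tp in abnormal_marks
                  if is_tp and s >= t) / n_masses
        marks = sum(int((s >= t).sum()) for s in normal_scores) / n_normal
        points.append((tpf, marks))
    return points
```

Lowering the threshold moves the operating point toward higher sensitivity at the cost of more marks per normal image, which is exactly the trade-off the figures display.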
FIGURE 1.7 Group 1 FROC performance curves based on individual mass scoring. The figure includes per-case and per-mammogram performance curves for the detection of both the malignant and benign masses. The curves show the TPF achievable for a large range of mass marker rates.

FIGURE 1.8 Group 2 FROC performance curves based on individual mass scoring. The figure includes per-case and per-mammogram performance curves for the detection of both the malignant and benign masses. The curves show the TPF achievable for a large range of mass marker rates.

FIGURE 1.9 The Group 1 and Group 2 FROC performance curves based on grouped-mass scoring. The figure only includes per-case and per-mammogram performance curves for the detection of malignant masses. These curves show how TPF varies as a function of the marker rate for group scoring, which is expected to be our best clinically relevant measure of algorithm performance.

In the Group 1 database, 34% (49/146) of the malignant and 5% (8/159) of the benign masses were spiculated. There were 33% (65/197) and 0% (0/132) spiculated masses in the Group 2 malignant and benign cases, respectively. A comparison between spiculated and nonspiculated mass performances is shown in Figure 1.10. The curve for spiculated benign masses is not included because of the small number of lesions in this category. The resulting curves indicate that the detection algorithm is better suited to detecting spiculated masses.

Finally, the sensitivities achieved by the mass-detection algorithm at three fixed normal marker rates were analyzed. These marker rates were selected because they represent potential operating points for clinical implementation of a CAD algorithm based on previously published studies [142, 143]. The results at these fixed marker levels are summarized in Table 1.2. The best estimates for the clinical performance of this mass-detection program are found in the columns for combined grouped malignant masses in the table, where 87% (135/156), 83% (130/156), and 77% (120/156) of the
malignant cases were detected for marker rates of 1.5, 1.0, and 0.5 marks per mammogram, respectively.

1.4 MASS DETECTION WITH TWO-VIEW INFORMATION

As described previously, a CC and an MLO view are routinely taken of each breast during mammographic screening. The two views not only allow most of the breast tissue to be imaged, but also improve the chance that a lesion will be seen in at least one of the views. Radiologists analyze the different mammographic views to detect calcifications and masses that might be a sign of breast cancer and to decide whether
FIGURE 1.10 The combined Group 1 and 2 FROC performance curves for spiculated and nonspiculated masses based on individual mass scoring. The benign spiculated mass curve is not shown because of the small number of cases in this category.
TABLE 1.2
Summary of Per-Case Malignant Mass Detection Performance at Three Marker Rates per Image

                       True-Positive Fraction (Sensitivity)
Data Set        0.5 Marks         1.0 Marks         1.5 Marks

Individual Malignant
Group 1        55/74 (74%)       59/74 (80%)       63/74 (85%)
Group 2        76/101 (75%)      83/101 (82%)      84/101 (83%)
Combined       131/175 (75%)     142/175 (81%)     147/175 (84%)

Grouped Malignant
Group 1        49/64 (77%)       53/64 (83%)       55/64 (86%)
Group 2        71/92 (77%)       77/92 (84%)       80/92 (87%)
Combined       120/156 (77%)     130/156 (83%)     135/156 (87%)
Note: Each individual mass in a mammogram or case is considered to be a positive for the individual-malignant categories. All malignant masses for a mammogram or case are considered together to be one positive for the grouped-malignant category.
to call the patient back for further diagnostic evaluations. They also use the two views to reduce FPs, such as overlapping dense tissue that mimics a mass in one view. Their interpretation integrates complex criteria of human vision and intelligence, including the morphology, texture, and geometric location of any suspicious structures in the imaged breast, combining information from different views, checking differences between the two breasts, and looking for changes between the prior and current mammograms, when available. Clinical studies indicate that lesion detection with two-view mammograms is more accurate than when only one view is available [2, 14, 93, 144]. CAD algorithms reported in the literature so far use single-view information for detection of lesions, even though the accuracy can be scored and reported using two views. Yin et al. [54] used bilateral subtraction in a prescreening step of a mass-detection program to locate mass candidates, but the subsequent image analysis was performed on a single view only. Recently, Hadjiiski et al. [126, 145] developed an interval-change analysis of masses on current and prior mammograms and found that the classification accuracy for malignant and benign masses can be improved significantly in comparison with single-image classification. These studies demonstrated the potential of using multiple-image information for CAD. However, current CAD algorithms have not utilized one of the most important pieces of information available in a mammographic examination: the correlation of computer-detected lesions between the two standard views. This is a very difficult problem for computer vision because the breast is elastic and deformable. The overlapping tissue and the relative positions of the breast structures are generally different, even when the breast is compressed in the same view two different times. The change in geometry for an elastic object and the lack of invariant
“landmarks” make it difficult, if not impossible, to correctly register two breast images in the same view by any established image-warping technique or to use an analytic model to predict corresponding object locations in the different views of the same breast. Few studies have been conducted on how to find the relationship between structures in different mammographic views. Highnam et al. [146] proposed a breast-deformation model for compressed breasts. Kita et al. [147] used the model to find corresponding points in two different views. They demonstrated with a data set of 26 cases (a total of 37 lesions) that this method allowed prediction of the location in a second view within a band of ±26 mm about an epipolar line. However, assumptions about the parameters and the deformation of a compressed breast had to be made, and the robustness of the model has yet to be validated. More practical approaches that do not depend on a large number of assumptions may be preferable. Good et al. and Chang et al. reported preliminary attempts at matching computer-detected objects in two views [69, 148]. They demonstrated the feasibility of identifying corresponding objects (Az = 0.82) in the two views by exhaustive pairing of the detected objects and feature classification. None of these studies attempted to use the two-view correspondence information to improve lesion detection or classification. During mammographic interpretation, if a suspicious breast mass is found in one view, the radiologist attempts to find the same object in the other available views to identify it as a true or a false mass. Radiologists commonly note the distance from the nipple to the center of the suspicious lesion in one view and then search for the corresponding object in the second view in an annular region at about the same radial distance from the nipple.
Based on this approach, we previously developed a regional-registration technique to identify corresponding lesion locations on current and prior mammograms of the same view [97, 126, 149]. Automated matching of lesions on current and prior mammograms can facilitate interval-change analysis for classification of malignant and benign masses [150]. We have generalized this geometric model to localize corresponding lesions within a search region when two-view or three-view mammograms of the same breast are available for lesion detection [99]. The object of interest can be matched with possible corresponding objects in the search region using the similarity of feature measures. We have found that the correlated lesion information from two views can be used to reduce FP detections [100, 101, 151]. In the following section, we discuss the use of the regional-registration technique as a basis for correlating lesions in two-view mammograms. The correspondence information is used to reduce false detections produced by a single-view CAD algorithm. The detection accuracy of the two-view scheme was evaluated and compared with that of the single-view CAD scheme using FROC analysis.

1.4.1 METHODS

To merge information from corresponding segmented structures in the two standard views of the same breast, we first assume that a true mass has a higher chance of being detected in both views. Likewise, we assume that the objects corresponding to the same mass detected in the two different views (a TP-TP pair) will be more similar in their feature measures than a mass object paired with normal tissue (a TP-FP pair) or two false positives (an FP-FP pair). Object matching is performed in two stages. First, all
possible pairings of the detected objects in the two views are determined, taking into account geometric constraints. Second, features are extracted from each object, similarity measures for the feature pairs are derived, and a classifier is trained to distinguish true pairs (TP-TP pairs) from false pairs (TP-FP, FP-TP, or FP-FP pairs) using the similarity measures. The two stages are described in the following sections.

1.4.1.1 Geometrical Modeling

The geometric models for predicting the location of an object in the MLO view from that in the CC view, or vice versa, are described here. For the purpose of studying the geometric relationship between the locations of an object imaged in the two mammographic views, any identifiable objects can be used. We therefore chose two-view mammograms that contained masses, microcalcification clusters, and large benign calcifications identifiable in both views. This data set was different from that used for mass detection, as described later. The locations of the corresponding objects in the two views and the nipple locations were identified on the mammograms by an MQSA radiologist. For a large object such as a mass or a microcalcification cluster,
FIGURE 1.11 Example of the coordinate system used to localize an object in a mammographic view. An automatic boundary-tracking process is used to segment the breast, and the nipple location was identified by an MQSA-approved radiologist. The distance of the object from the nipple and the angle of the object from the breast midline define the polar coordinates used to localize the object.
the manually identified “centroid” was taken as its location. A breast-boundary tracking program was used to segment the breast area from the mammogram [152, 153]. Using the nipple location as the origin, concentric circles were drawn, each of which intersected the breast boundary at two points and defined an arc. The locus of the midpoints of these arcs was considered to be the breast midline. The breast length was defined as the distance from the nipple to the point where the midline intersected the chest wall. From these parameters, the polar coordinates (Rx, θx), with x = C (CC view) or x = M (MLO view), were defined as shown in Figure 1.11, where Rx is the distance from the nipple to the object center, and θx is the angle between Rx and the line from the nipple to the midpoint of the arc intersecting the object. The relationship between the coordinates of the object in one view and those in the other view was investigated in this coordinate system.

Scatter plots of the radial distance and the angle of the radiologist-identified objects in the two views are shown in Figure 1.12 and Figure 1.13, respectively. It can be seen that there is a high correlation (correlation coefficient = 0.94) between the radial distances of the corresponding objects in the two views. However, the angular coordinates in the two views are much more weakly correlated (correlation coefficient = 0.42). A linear model with parameters ar and br was therefore chosen to predict the radial distance of an object in the second view from that in the first view:

RM = ar · RC + br    (1.3)
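The fit of Eq. 1.3 and the derivation of the search-band half-width ∆R from training-set localization errors could be sketched as follows. Using the maximum absolute training residual for ∆R is an assumption; the text only says ∆R was estimated from the localization errors observed in the training set.

```python
import numpy as np

def fit_radial_model(r_first, r_second):
    """Least-squares fit of Eq. 1.3, RM = ar*RC + br, on paired
    radial distances from training cases.  Returns (ar, br, delta_r),
    where delta_r is taken as the maximum absolute training residual
    (the actual width criterion is an assumption)."""
    r1 = np.asarray(r_first, dtype=float)
    r2 = np.asarray(r_second, dtype=float)
    A = np.column_stack([r1, np.ones_like(r1)])
    (ar, br), *_ = np.linalg.lstsq(A, r2, rcond=None)
    delta_r = np.abs(r2 - (ar * r1 + br)).max()
    return ar, br, delta_r

def in_search_region(r_candidate, r_first, ar, br, delta_r):
    """True if a candidate object's radial distance lies in the
    annular search region predicted from the other view."""
    return abs(r_candidate - (ar * r_first + br)) <= delta_r
```

Only the radial coordinate is constrained, so the search region is a full annulus of width 2∆R; the weakly correlated angular coordinate is deliberately left free.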
Medical Image Analysis Methods
FIGURE 1.12 CC view vs. MLO view of the radial distances of the identified objects from the nipple location.
FIGURE 1.13 CC view vs. MLO view of the angular coordinates of the identified objects from the breast midline.
Because of the variability of the breast tissue caused by compression, the predicted location for an individual case could deviate from its “true” location, as determined by the radiologist, by a wide range. Therefore, a global model was estimated using a set of training cases with radiologist-identified object locations on both views. The model coefficients were obtained by minimizing the mean square error between the true and the predicted coordinates in the second view. The error in this estimation was then used to define an annular search region, which had a center at a radial distance Rx from the nipple as predicted by the model, and a width of ±∆R as estimated from the localization errors observed in the training set. This search region avoids using the entire area of the breast and eliminates many inappropriate pairings between detected objects on the CC view and the MLO view in the second stage, as discussed below. The model was trained and tested by a cross-validation scheme. The available data set was randomly divided into a training set and a test set in a 3:1 ratio. The training set was used to estimate the model coefficients and the search region width. The test set was used to evaluate the prediction accuracy of the model. Four nonoverlapping partitions that separated the database into training and test sets were considered. The model performance was obtained by combining the results of the four test sets. The geometrical
analysis was then used to pair objects detected on the two views of the same breast in the prescreening stage of our mass-detection program as detailed in the following sections.
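The annular search region described above can be sketched as follows. The text does not state the exact rule used to derive ∆R from the training errors, so the coverage-quantile rule below is an assumption (the 0.83 default mirrors the ~83% coverage reported later for a 40-pixel half-width):

```python
import numpy as np

def search_half_width(r_true, r_pred, coverage=0.83):
    """Choose the annulus half-width dR so that `coverage` of the
    training objects satisfy |r_true - r_pred| <= dR.
    The quantile rule is an assumed stand-in for the paper's
    estimate of dR from the training localization errors.
    """
    errors = np.abs(np.asarray(r_true, float) - np.asarray(r_pred, float))
    return float(np.quantile(errors, coverage))

def in_search_region(r_centroid, r_pred, d_r):
    """True if an object's radial distance lies within [r_pred - dR, r_pred + dR]."""
    return abs(r_centroid - r_pred) <= d_r
```

Only objects whose centroids pass `in_search_region` in the other view are paired, which is what limits the number of candidate pairs in the two-view stage.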
1.4.1.2 One-View Analysis
The single-view approach was used to identify potential breast masses among the suspicious objects. The single-view prescreening used in this study was similar to that discussed in the literature [62, 63, 70] and in the previous sections. The only difference was that the FP reduction step was modified such that a slightly different object-overlap criterion was employed. The block diagram for the single-view mass-detection scheme is shown in Figure 1.4. In this study, rule-based classification using morphological features reduced the average number of objects from 37 to about 29 per image and lowered the TP detection sensitivity from 91.1% to 87.9% at this stage. The texture features were then used as the input variables for an LDA classifier. A texture score for each object was obtained from the classifier. Overlap reduction was then applied using these texture scores, as discussed below. During object segmentation, the border of an object is obtained by K-means clustering in a fixed-size region centered on a “seed” object. If the seeds from two objects are close to each other, the two segmented objects can overlap. This occurs when the two detected objects are neighboring structures that overlap in the mammographic view, or they may be part of a large single structure that was initially detected in multiple pieces. An overlap criterion based on the texture scores is imposed to select one of the two overlapping objects as a mass candidate. We used the shape of the segmented objects to estimate the overlapping area between the two neighboring objects on the mammogram. An overlap fraction was defined (Equation 1.4) in terms of O1 and O2, the segmented areas of the overlapping objects. A threshold on the overlap fraction was chosen such that if the overlap fraction of two objects exceeded the threshold, the object with the higher texture score (i.e., more likely to be a mass candidate) was kept, and the other was discarded as an FP.
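The texture scores used in the overlap reduction come from an LDA classifier, as described above. A minimal Fisher-discriminant sketch (the study's actual stepwise feature selection and training protocol are more involved than this):

```python
import numpy as np

def lda_scores(X_pos, X_neg, X_test):
    """Fisher linear discriminant: project feature vectors onto
    w = S_w^{-1} (m_pos - m_neg) and return scalar scores.
    Higher scores indicate the positive (mass) class.
    """
    m_pos, m_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    # Pooled within-class scatter (sum of the two class covariances).
    S_w = np.cov(X_pos, rowvar=False) + np.cov(X_neg, rowvar=False)
    w = np.linalg.solve(S_w, m_pos - m_neg)
    return X_test @ w
```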
The sensitivity and the specificity of differentiating true and false masses depend on the selection of the overlap threshold. An overlap threshold of 15% was chosen, which led to an average of 15 objects per image at a detection sensitivity of about 85%. As shown below, the overall detection accuracy was relatively independent of the FP rate in this intermediate stage, so the selection of the 15% overlap threshold was not a critical factor.
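The overlap-reduction step can be sketched as below. Equation 1.4 itself is not reproduced in the text, so the overlap fraction used here (intersection area over the smaller object's area) is an assumed stand-in; the keep-the-higher-texture-score rule follows the description above:

```python
def reduce_overlaps(objects, threshold=0.15):
    """Discard the lower-scoring member of each overlapping object pair.

    objects: list of (pixel_set, texture_score) tuples, where pixel_set
    is the set of (row, col) pixels of a segmented object.
    The overlap fraction below -- intersection area over the smaller
    object's area -- is an assumption standing in for Equation 1.4.
    """
    # Visit objects from highest to lowest texture score, so that the
    # more mass-like object of an overlapping pair is always kept.
    survivors = []
    for pixels, score in sorted(objects, key=lambda o: o[1], reverse=True):
        overlapping = False
        for kept_pixels, _ in survivors:
            frac = len(pixels & kept_pixels) / min(len(pixels), len(kept_pixels))
            if frac > threshold:
                overlapping = True  # already covered by a higher-scoring object
                break
        if not overlapping:
            survivors.append((pixels, score))
    return survivors
```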
FIGURE 1.14 Schematic diagram for the proposed two-view fusion scheme.
After overlap reduction, our current single-view algorithm employed a final stage of FP reduction based on the texture features, as illustrated in the left branch of the block diagram in Figure 1.14. A threshold was applied to the texture scores to limit the maximum number of objects on an image. A maximum of three objects per image was used in the single-view detection scheme. However, when the detection algorithm was used as a prescreening stage in our two-view fusion approach, this threshold was relaxed to increase sensitivity while retaining a larger number of FPs. The remaining objects after this threshold will be referred to as the prescreening objects in the following discussions. To investigate the dependence of the overall detection accuracy of our two-view detection scheme on the initial number of prescreening objects, three different decision thresholds were selected to obtain a maximum of either 5, 10, or 15 objects per image. To further perform the two-view information-fusion analysis, additional morphological features were extracted from each prescreening object. These morphological features included the 11 morphological features used in the single-view FP reduction, 13 new contrast measures [72], and 7 new shape features. To evaluate this new method, we randomly divided the available cases into training and test sets using a 3:1 training/test ratio. The training set was used to select a subset of useful morphological features using stepwise feature selection and to estimate the coefficients of an LDA classifier. To reduce biases in the classifier, 50 random 3:1 partitions of the cases were employed. A
morphological score was obtained for each individual object by averaging the object’s test scores obtained from the different partitions. The morphological score was then combined with the single-view texture score by averaging the two scores. A single combined score thus characterized each prescreening object. This one-view score was further fused with the discriminant score obtained by the two-view scheme, as described in the next subsection.
1.4.1.3 Two-View Analysis
The block diagram in Figure 1.14 illustrates our two-view mass-detection scheme and its relationship to the one-view approach. The prescreening objects were further analyzed by the two-view method shown in the right branch of the diagram. All possible pairings between the prescreening objects in the two views of the same breast were determined using the distance from the nipple to the centroid of each object and the previously described geometrical model. Because the location of a given object detected in one view cannot be uniquely identified in the other view, as described in Section 1.4.1.1, an object was initially paired with all objects with centroids located within its defined annular region in the other view. The geometric constraints reduced the number of object pairs that needed to be classified as true or false correspondences in the subsequent steps. A true pair (TP-TP) was defined as the correspondence between the same true masses on the two mammographic views, and a false pair was defined as any other object pairing (TP-FP, FP-TP, and FP-FP). For each object pair, the set of 15 texture and 31 morphological features (described previously) was used to form similarity measures. In this study, two simple measures, the absolute difference and the mean, were used. A total of 30 texture measures and 62 morphological measures were thus obtained for each object pair.
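Forming the similarity measures for an object pair can be sketched as follows (a 15-feature texture vector yields 30 measures and a 31-feature morphological vector yields 62, matching the counts above):

```python
import numpy as np

def pair_similarity_features(feat_view1, feat_view2):
    """Build similarity measures for one object pair.

    feat_view1, feat_view2: 1-D arrays of equal length holding an
    object's features on each view (15 texture or 31 morphological
    features in the study). Returns the concatenated absolute
    differences and means, i.e. twice as many measures as features.
    """
    f1 = np.asarray(feat_view1, dtype=float)
    f2 = np.asarray(feat_view2, dtype=float)
    return np.concatenate([np.abs(f1 - f2), (f1 + f2) / 2.0])
```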
The absolute difference between the nipple-to-object distances in the CC and MLO views was also included as a feature for differentiating true from false object pairs. Two separate LDA classifiers with stepwise feature selection were trained to classify the true and false pairs using the similarity features in the morphological- and texture-feature spaces. The classifiers were trained by randomly dividing the data set into a training set and a test set, again using a 3:1 training/test ratio. Fifty random 3:1 partitions of the cases were used to reduce bias. Individual morphological and texture scores were obtained for each object by averaging the object’s test scores obtained from the different partitions. The two classification scores were then averaged to obtain one “correspondence” score for each object pair. This score, along with the single-view prescreening score, was used in the fusion step described in the next subsection.
1.4.1.4 Fusion Analysis
The fusion of the single-view prescreening scores with the two-view correspondence scores was the final step in the two-view detection scheme. All prescreening object scores were first ranked within a given film from the largest to the smallest. The correspondence scores were ranked in a similar way. These two new rank scores were then merged into a single score for each object in each view. Because an object could have more than one correspondence score, its two-view correspondence score was taken to be the maximum
correspondence score among all object pairs in which this object was a member. There can be many variations for the fusion step. In this work, the final discriminant score for an object was obtained by averaging its two-view correspondence-score rank with its one-view prescreening-score rank. The accuracy of the single-view detection scheme and the two-view approach are compared in the following subsection based on their FROC performance curves. To demonstrate the effects of the number of prescreening objects on the overall detection accuracy of the two-view scheme, the FROC curves obtained with 5, 10, and 15 prescreening objects per image are compared.
1.4.2 RESULTS
1.4.2.1 Geometrical Modeling
For the geometric modeling of object location on two views, the database consisted of 116 cases with masses, large benign calcifications, or clustered microcalcifications identifiable on both views of the same breast. The mammograms were digitized with a Lumisys 85 film scanner with a pixel size of 50 µm and 12-bit gray levels. Since the geometric modeling was not expected to have accuracy within 1 mm, high-resolution processing was not needed. To reduce processing time, the images were reduced to a pixel resolution of 800×800 µm by averaging 16×16 neighboring pixels and downsampling. For each case, the two standard mammographic views were available. A total of 177 objects were manually selected and marked by an expert radiologist on each of these two views. The nipple location was also identified for each breast image. In the geometrical analysis, we first estimated a prediction model of the radial distance of an object in a second view from its radial distance in the first view using the training set. The model was then used to predict object location from one view to the other for the independent test cases.
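The 16×16 pixel averaging that reduces the 50-µm digitized images to an 800-µm pixel size can be sketched as:

```python
import numpy as np

def block_average(image, block=16):
    """Reduce resolution by averaging non-overlapping block x block
    neighborhoods (50-um pixels -> 800-um pixels for block=16).
    Rows/columns that do not fill a complete block are trimmed.
    """
    h, w = image.shape
    h, w = h - h % block, w - w % block
    trimmed = image[:h, :w].astype(float)
    # Reshape so that each block occupies two axes, then average them.
    return trimmed.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
```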
Because the model did not provide an exact solution, a search region, Rx ± ∆R, where Rx was the predicted radial distance and ∆R the half-width of an annular region, was defined. The percentage of the true object centroids enclosed within the search region was measured as a function of the size of 2∆R. Figure 1.15 shows the prediction accuracy as a function of 2∆R for estimating the object radial distance in the MLO view from that in the CC view. The results for predicting the object radial distance in the CC view from that in the MLO view are very similar and are not shown. The training and test curves almost overlap in each case. The difference in the accuracy between searching the object centers in the CC or MLO views is small. About 83% of the object centers are within the search region when the radial width of the search region is ≈40 pixels (32 mm) for either the CC view or the MLO view. The search region, although large, is much smaller than the entire area of the breast. The limited size of the search region reduces the number of object pairs to be analyzed in the two-view detection scheme. To avoid missing any pairs of true masses in the two-view scheme, we chose to set
the radial width of the annular search region to about 80 pixels. This led to a larger number of false pairs, but it was substantially less than what would be detected if the entire breast area was considered.
FIGURE 1.15 Prediction of the center of an object in the MLO view from its location in the CC view. Training and test performances are given as a function of the radial width of the annular search region.
1.4.2.2 Comparison of One-View and Two-View Analysis
For the comparison of the one-view and two-view mass-detection schemes, a data set of 169 pairs of mammograms containing masses on both the CC and MLO views was used. The mammograms were obtained from 117 patients; 128 pairs were current mammograms (defined as mammograms from the exam before biopsy), and 41 pairs were from exams 1 to 4 years prior to biopsy. A malignant mass was observed in 58 of the 128 current and 26 of the 41 prior image pairs. The 338 mammograms were also digitized with the Lumisys 85 film scanner. The true mass locations on both views were identified by an MQSA radiologist. Three different decision thresholds that retained a maximum of 5, 10, and 15 objects per image after the one-view prescreening stage were used to select mass candidates as inputs to the two-view detection scheme. The FROC curves for the detection of malignant and benign masses on each image, using the two-view fusion technique, are similar for the three thresholds of 5, 10, and 15 prescreening objects per image. This
similarity also holds for the FROC curves for detection of malignant masses, as illustrated in Figure 1.16. The improvement in detection by the two-view fusion method therefore seems to be independent of the operating threshold when the maximum number of objects retained per image in the prescreening stage is between 5 and 15. We therefore chose the condition of 10 prescreening objects per image for the following discussion.
FIGURE 1.16 Film-based performances of the two-view detection scheme applied to the current malignant masses. Three initial conditions, depending on the maximum number of retained objects per image (5, 10, and 15 objects per image) at the prescreening stage, were evaluated.
The performances of the single-view mass-detection algorithm and the two-view fusion-detection algorithm are compared. The image-based FROC curves for the detection of malignant masses in the data set are shown in Figure 1.17. The corresponding case-based FROC curves are shown in Figure 1.18. The FROC curves for detection of the malignant masses on the current and prior mammograms are plotted separately for comparison. It is apparent that the two-view fusion method can improve the detection sensitivity by 10 to 15% in the range of 0.5 to 1.5 FPs/image for the malignant masses on current mammograms. For example, at 1 FP/image, the two-view algorithm achieved a case-based detection sensitivity of 91%, whereas the current single-view scheme had a 77% sensitivity at the same number of FPs per image in this data set. For the case-based
comparison, the detection of prior masses could be improved by more than 5% within the range of 0.5 to 1.2 FPs/image. Alternatively, the two-view fusion can be used as an FP reduction technique. The results indicate that the two-view fusion method is more effective in reducing FPs in the subset of cases containing malignant masses on current mammograms. At a case-based detection sensitivity of 75% for all masses, the number of FPs per image was reduced from 1.5 FPs/image using the single-view detection technique to 1.13 FPs/image using the two-view fusion technique. At a case-based sensitivity of 85% for malignant masses on current mammograms, the number of FPs per image was reduced from 1.5 FPs/image to 0.5 FPs/image (Figure 1.18). This study demonstrates that including correspondence information from two mammographic views is a promising approach to improving detection accuracy in a CAD system for detection of breast cancer.
FIGURE 1.17 Comparison of the image-based performance of the one-view and two-view detection methods for the detection of malignant masses on current mammograms and prior mammograms.
FIGURE 1.18 Comparison of the case-based performance of the one-view and two-view detection methods for the detection of malignant masses on current mammograms and prior mammograms.
1.5 SUMMARY
In this chapter, we discussed some of the computer vision techniques used for computer-aided detection (CAD) of breast cancer. We used our studies in this area as examples to illustrate the various methods that may be useful for the development of CAD algorithms in mammography. These examples are by no means exhaustive, and many variations of the methods used in the different stages of the automated detection process can be found in the literature. Although several CAD systems are already commercially available for assisting radiologists in clinical practice, the performances of the CAD systems are not yet ideal. Further investigation is needed to improve the sensitivity and the specificity of the systems. One promising approach to improving the performance of computerized breast cancer detection systems is to incorporate multiple sources of image information, including two or three views of the same breast, comparison of current and prior mammograms, or comparison of bilateral mammograms, as has been practiced routinely by radiologists in mammographic interpretation. The adaptation of CAD systems to direct digital mammography may also improve lesion detectability.
We have focused our discussion on lesion detection. Computer-aided characterization of breast lesions is another important CAD application. CAD techniques for differentiation of malignant and benign lesions have been published in the literature. ROC studies have also been performed to demonstrate the potential of CAD in reducing unnecessary biopsy. For both detection and characterization of breast lesions, a promising direction of research is to combine information from multiple breast-imaging modalities. Ultrasound imaging has been routinely used for diagnostic workup of suspicious masses. Contrast-enhanced magnetic-resonance breast imaging is a new approach to differentiating malignant and benign breast lesions and detecting multifocal lesions. A number of new breast-imaging techniques are under development, including three-dimensional ultrasound imaging, digital tomosynthesis, breast computed tomography, and single-energy or dual-energy contrast-enhanced digital-subtraction mammography. These new techniques hold the promise of improving breast cancer detection and diagnosis. However, they can also drastically increase the amount of information that radiologists have to interpret for each case. A CAD system that can analyze the multimodality images and merge the information will not only improve the accuracy of the computer system, but also provide radiologists with a useful second opinion that could improve the efficiency and effectiveness of breast cancer detection and management.
ACKNOWLEDGMENT
This work is supported by USPHS Grants CA 48129 and CA 95153 and by U.S. Army Medical Research and Materiel Command (USAMRMC) grants DAMD17-96-1-6254 and DAMD17-02-1-0214. Berkman Sahiner is also supported by USAMRMC grant DAMD17-01-1-0328. Lubomir Hadjiiski is also supported by USAMRMC grant DAMD17-02-1-0489. Nicholas Petrick and Sophie Paquerault were at the University of Michigan when the work was performed.
REFERENCES
1. Hillman, B.J., Fajardo, L.L., Hunter, T.B. et al., Mammogram interpretation by physician assistants, Am. J. Roentgenol., 149, 907, 1987.
2. Bassett, L.W., Bunnell, D.H., Jahanshahi, R. et al., Breast cancer detection: one vs. two views, Radiology, 165, 95, 1987.
3. Wallis, M.G., Walsh, M.T., and Lee, J.R., A review of false negative mammography in a symptomatic population, Clinical Radiol., 44, 13, 1991.
4. Harvey, J.A., Fajardo, L.L., and Innis, C.A., Previous mammograms in patients with impalpable breast carcinomas: retrospective vs. blinded interpretation, Am. J. Roentgenol., 161, 1167, 1993.
5. Bird, R.E., Wallace, T.W., and Yankaskas, B.C., Analysis of cancers missed at screening mammography, Radiology, 184, 613, 1992.
6. Beam, C.A., Layde, P.M., and Sullivan, D.C., Variability in the interpretation of screening mammograms by U.S. radiologists: findings from a national sample, Arch. Intern. Med., 156, 209, 1996.
7. Birdwell, R.L., Ikeda, D.M., O’Shaughnessy, K.F. et al., Mammographic characteristics of 115 missed cancers later detected with screening mammography and the potential utility of computer-aided detection, Radiology, 219, 192, 2001.
8. Kopans, D.B., The positive predictive value of mammography, Am. J. Roentgenol., 158, 521, 1992.
9. Smart, C.R., Hendrick, R.E., Rutledge, J.H. et al., Benefit of mammography screening in women ages 40 to 49 years: current evidence from randomized controlled trials, Cancer, 75, 1619, 1995.
10. Byrne, C., Smart, C.R., Cherk, C. et al., Survival advantage differences by age: evaluation of the extended follow-up of the breast cancer detection demonstration project, Cancer, 74, 301, 1994.
11. Feig, S.A. and Hendrick, R.E., Risk, benefit, and controversies in mammographic screening, in Syllabus: A Categorical Course in Physics Technical Aspects of Breast Imaging, Haus, A.G. and Yaffe, M.J., Eds., Radiological Society of North America, Oak Brook, IL, 1993, p. 119.
12. Seidman, H., Gelb, S.K., Silverberg, E. et al., Survival experience in the breast cancer detection demonstration project, CA Cancer J. Clin., 37, 258, 1987.
13. Sabel, M. and Aichinger, H., Recent developments in breast imaging, Phys. Med. Biol., 41, 315, 1996.
14. Thurfjell, E.L., Lernevall, K.A., and Taube, A.A.S., Benefit of independent double reading in a population-based mammography screening program, Radiology, 191, 241, 1994.
15. Anderson, E.D.C., Muir, B.B., Walsh, J.S. et al., The efficacy of double reading mammograms in breast screening, Clinical Radiol., 49, 248, 1994.
16. Shtern, F., Stelling, C., Goldberg, B. et al., Novel Technologies in Breast Imaging: National Cancer Institute Perspective, presented at Society of Breast Imaging Conference, Orlando, FL, 1995, p. 153.
17. Vyborny, C.J., Can computers help radiologists read mammograms? Radiology, 191, 315, 1994.
18. Chan, H.P., Doi, K., Vyborny, C.J. et al., Improvement in radiologists’ detection of clustered microcalcifications on mammograms: the potential of computer-aided diagnosis, Invest. Radiol., 25, 1102, 1990.
19. Kegelmeyer, W.P., Pruneda, J.M., Bourland, P.D. et al., Computer-aided mammographic screening for spiculated lesions, Radiology, 191, 331, 1994.
20. Warren Burhenne, L.J., Wood, S.A., D’Orsi, C.J. et al., Potential contribution of computer-aided detection to the sensitivity of screening mammography, Radiology, 215, 554, 2000.
21. Freer, T.W. and Ulissey, M.J., Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast center, Radiology, 220, 781, 2001.
22. Zheng, B., Ganott, M.A., Britton, C.A. et al., Soft-copy mammographic readings with different computer-assisted detection cuing environments: preliminary findings, Radiology, 221, 633, 2001.
23. Brem, R.F., Baum, J.K., Lechner, M. et al., Improvement in sensitivity of screening mammography with computer-aided detection: a multi-institutional trial, Am. J. Roentgenol., 181, 687, 2003.
24. Chan, H.P., Sahiner, B., Helvie, M.A. et al., Improvement of radiologists’ characterization of mammographic masses by computer-aided diagnosis: an ROC study, Radiology, 212, 817, 1999.
25. Jiang, Y., Nishikawa, R.M., Schmidt, R.A. et al., Improving breast cancer diagnosis with computer-aided diagnosis, Acad. Radiol., 6, 22, 1999.
26. Chan, H.P., Doi, K., Galhotra, S. et al., Image feature analysis and computer-aided diagnosis in digital radiography: 1, Automated detection of microcalcifications in mammography, Med. Phys., 14, 538, 1987.
27. Chan, H.P., Doi, K., Vyborny, C.J. et al., Computer-aided detection of microcalcifications in mammograms: methodology and preliminary clinical study, Invest. Radiol., 23, 664, 1988.
28. Chan, H.P., Lo, S.C.B., Sahiner, B. et al., Computer-aided detection of mammographic microcalcifications: pattern recognition with an artificial neural network, Med. Phys., 22, 1555, 1995.
29. Davies, D.H. and Dance, D.R., Automatic computer detection of clustered calcifications in digital mammograms, Phys. Med. Biol., 35, 1111, 1990.
30. Qian, W., Clarke, L.P., Kallergi, M. et al., Tree-structured nonlinear filter and wavelet transform for microcalcification segmentation in mammography, Proc. SPIE Medical Imaging, 1905, 509, 1993.
31. Nishikawa, R.M., Giger, M.L., Doi, K. et al., Computer-aided detection and diagnosis of masses and clustered microcalcifications from digital mammograms, Proc. SPIE Medical Imaging, 1905, 422, 1993.
32. Astley, S., Hutt, I., Adamson, S. et al., Automation in mammography: computer vision and human perception, Proc. SPIE Medical Imaging, 1905, 716, 1993.
33. Bankman, I.N., Christens-Barry, W.A., Kim, D.W. et al., Automated recognition of microcalcification clusters in mammograms, Proc. SPIE Biomed. Image Processing Biomed. Visualization, 1905, 731, 1993.
34. Karssemeijer, N., Recognition of clustered microcalcifications using a random field model, Proc. SPIE Medical Imaging, 1905, 776, 1993.
35. Shen, L., Rangayyan, R.M., and Desautels, J.E.L., Automatic detection and classification system for calcifications in mammograms, Proc. SPIE Medical Imaging, 1905, 799, 1993.
36. Zhang, W., Doi, K., Giger, M.L. et al., Computerized detection of clustered microcalcifications in digital mammograms using a shift-invariant artificial neural network, Med. Phys., 21, 517, 1994.
37. Zheng, B., Chang, Y.S., Staiger, M. et al., Computer-aided detection of clustered microcalcifications in digitized mammograms, Acad. Radiol., 2, 655, 1995.
38. Gavrielides, M.A., Lo, J.Y., Vargas-Voracek, R. et al., Segmentation of suspicious clustered microcalcifications in mammograms, Med. Phys., 27, 13, 2000.
39. Ackerman, L.V. and Gose, E.E., Breast lesion classification by computer and xeroradiograph, Cancer, 30, 1025, 1972.
40. Kimme, C., O’Laughlin, B.J., and Sklansky, J., Automatic Detection of Suspicious Abnormalities in Breast Radiographs, Academic Press, New York, 1975.
41. Chan, H.P., Niklason, L.T., Ikeda, D.M. et al., Computer-aided diagnosis in mammography: detection and characterization of microcalcifications, Med. Phys., 19, 831, 1992.
42. Chan, H.P., Sahiner, B., Lam, K.L. et al., Classification of malignant and benign microcalcifications on mammograms using an artificial neural network, Proc. World Congress Neural Networks, II, 889, 1995.
43. Chan, H.P., Sahiner, B., Petrick, N. et al., Computerized classification of malignant and benign microcalcifications on mammograms: texture analysis using an artificial neural network, Phys. Med. Biol., 42, 549, 1997.
44. Chan, H.P., Sahiner, B., Lam, K.L. et al., Computerized analysis of mammographic microcalcifications in morphological and texture feature space, Med. Phys., 25, 2007, 1998.
45. Shen, L., Rangayyan, R.M., and Desautels, J.E.L., Application of shape analysis to mammographic calcifications, IEEE Trans. Medical Imaging, 13, 263, 1994.
46. Wu, Y., Freedman, M.T., Hasegawa, A. et al., Classification of microcalcifications in radiographs of pathologic specimens for the diagnosis of breast cancer, Acad. Radiol., 2, 199, 1995.
47. Jiang, Y., Nishikawa, R.M., Wolverton, D.E. et al., Malignant and benign clustered microcalcifications: automated feature analysis and classification, Radiology, 198, 671, 1996.
48. Thiele, D.L., Kimme-Smith, C., Johnson, T.D. et al., Using tissue texture surrounding calcification clusters to predict benign vs. malignant outcomes, Med. Phys., 23, 549, 1996.
49. Dhawan, A.P., Chitre, Y., Kaiser-Bonasso, C. et al., Analysis of mammographic microcalcifications using gray-level image structure features, IEEE Trans. Med. Imaging, 15, 246, 1996.
50. Winsberg, F., Elkin, M., Macy, J. et al., Detection of radiographic abnormalities in mammograms by means of optical scanning and computer analysis, Radiology, 89, 211, 1967.
51. Semmlow, J.L., Shadagopappan, A., Ackerman, L.V. et al., A fully automated system for screening mammograms, Comput. Biomed. Res., 13, 350, 1980.
52. Lai, S.M., Li, X., and Bischof, W.F., On techniques for detecting circumscribed masses in mammograms, IEEE Trans. Medical Imaging, 8, 377, 1989.
53. Lau, T.K. and Bischof, W.F., Automated detection of breast tumors using the asymmetry approach, Comput. Biomed. Res., 24, 273, 1991.
54. Yin, F.F., Giger, M.L., Doi, K. et al., Computerized detection of masses in digital mammograms: analysis of bilateral subtraction images, Med. Phys., 18, 955, 1991.
55. Ng, S.L. and Bischof, W.F., Automated detection and classification of breast tumors, Comput. Biomed. Res., 25, 218, 1992.
56. Yin, F.F., Giger, M.L., Vyborny, C.J. et al., Comparison of bilateral subtraction and single-image processing techniques in the computerized detection of mammographic masses, Invest. Radiol., 28, 473, 1993.
57. Brzakovic, D., Vujovic, N., Neskovic, M. et al., Mammogram analysis by comparison with previous screenings, in Digital Mammography, Gale, A.G., Astley, S.M., Dance, D.R. et al., Eds., Elsevier, Amsterdam, 1994, p. 131.
58. Laine, A.F., Schuler, S., Fan, J. et al., Mammographic feature enhancement by multiscale analysis, IEEE Trans. Medical Imaging, 13, 725, 1994.
59. Li, H.D., Kallergi, M., Clarke, L.P. et al., Markov random field for tumor detection in digital mammograms, IEEE Trans. Medical Imaging, 14, 565, 1995.
60. Laine, A.F., Huda, W., Steinbach, E.G. et al., Mammographic image processing using wavelet processing techniques, Eur. Radiol., 5, 518, 1995.
61. Zheng, B., Chang, Y.H., and Gur, D., Computerized detection of masses in digitized mammograms using single-image segmentation and a multilayer topographic feature analysis, Acad. Radiol., 2, 959, 1995.
62. Petrick, N., Chan, H.P., Sahiner, B. et al., An adaptive density-weighted contrast enhancement filter for mammographic breast mass detection, IEEE Trans. Medical Imaging, 15, 59, 1996.
63. Petrick, N., Chan, H.P., Wei, D. et al., Automated detection of breast masses on mammograms using adaptive contrast enhancement and texture classification, Med. Phys., 23, 1685, 1996.
64. Karssemeijer, N. and te Brake, G., Detection of stellate distortions in mammograms, IEEE Trans. Medical Imaging, 15, 611, 1996.
65. Kobatake, H. and Yoshinaga, Y., Detection of spicules on mammogram based on skeleton analysis, IEEE Trans. Medical Imaging, 15, 235, 1996.
66. Guliato, D., Rangayyan, R.M., Carnielli, W.A. et al., Segmentation of breast tumors in mammograms by fuzzy region growing, in Proc. 20th Annual International Conference of IEEE Engineering in Medicine and Biology Society, Hong Kong, 1998, p. 1002.
67. Kupinski, M.A. and Giger, M.L., Automated seeded lesion segmentation on digital mammograms, IEEE Trans. Medical Imaging, 17, 510, 1998.
68. Kobatake, H., Murakami, M., Takeo, H. et al., Computer detection of malignant tumors on digital mammograms, IEEE Trans. Medical Imaging, 18, 369, 1999.
69. Good, W.F., Zheng, B., Chang, Y.H. et al., Multi-image CAD employing features derived from ipsilateral mammographic views, Proc. SPIE, 3661, 474, 1999.
70. Petrick, N., Chan, H.P., Sahiner, B. et al., Combined adaptive enhancement and region-growing segmentation of breast masses on digitized mammograms, Med. Phys., 26, 1642, 1999.
71. te Brake, G.M. and Karssemeijer, N., Single and multiscale detection of masses in digital mammograms, IEEE Trans. Medical Imaging, 18, 628, 1999.
Computer-aided diagnosis of breast cancer
47
72. te Brake, G.M., Karssemeijer, N., and Hendriks, J.H.C.L., An automatic method to discriminate malignant masses from normal tissue in digital mammograms, Phys. Med. Biol, 45, 2843, 2000. 73. Petrick, N., Chan, H.P., Sahiner, B. et al., Breast cancer detection: evaluation of a mass detection algorithm for computer-aided diagnosis: experience in 263 patients, Radiology, 224, 217, 2002. 74. Kilday, J., Palmieri, R, and Fox, M.D., Classifying mammographic lesions using computeraided image analysis, IEEE Trans. Medical Imaging, 12, 664, 1993. 75. Pohlman, S., Powell, K.A., Obuchowshi, N.A. et al., Quantitative classification of breast tumors in digitized mammograms, Med. Phys., 23, 1337, 1996. 76. Huo, Z.M., Giger, M.L., Vyborny, C.J. et al., Automated computerized classification of malignant and benign masses on digitized mammograms, Acad. Radiol., 5, 155, 1998. 77. Sahiner, B., Chan, H.P., Petrick, N. et al., Design of a high-sensitivity classifier based on a genetic algorithm: application to computer-aided diagnosis, Phys. Med. Biol., 43, 2853, 1998. 78. Sahiner, B., Chan, H.P., Petrick, N. et al., Computerized characterization of masses on mammograms: the rubber band straightening transform and texture analysis, Med. Phys., 25, 516, 1998. 79. Ackerman, L.V., Mucciardi, A.N., Gose, E.E. et al., Classification of benign and malignant breast tumors on the basis of 36 radiographic properties, Cancer, 31, 342, 1973. 80. Getty, D.J., Pickett, R.M., D’Orsi, C.J. et al., Enhanced interpretation of diagnostic images, Invest. Radiol., 23, 240, 1988. 81. D’Orsi, C.J., Getty, D.J., Swets, J.A. et al., Reading and decision aids for improved accuracy and standardization of mammographic diagnosis, Radiology, 184,619,1992. 82. Wu, Y., Giger, M.L., Doi, K. et al., Artificial neural networks in mammography: application to decision making in the diagnosis of breast cancer, Radiology, 187, 81, 1993. 83. Baker, J.A., Kornguth, P.J., Lo, J.Y. 
et al., Breast cancer: prediction with artificial neural network based on bi-rads standardized lexicon, Radiology, 196, 817, 1995. 84. Lo, J.Y., Markey, M.K., Baker, J.A. et al., Cross-institutional evaluation of bi-rads predictive model for mammographic diagnosis of breast cancer, Am. J. RoentgenoL, 178, 457, 2002. 85. Nishikawa, R.M., Giger, M.L., Wolverton, D.E. et al., Prospective testing of a clinical CAD workstation for the detection of breast lesions on mammograms, in Proc. First International Workshop on Computer-Aided Diagnosis, Chicago, IL, 1999, p. 209. 86. Petrick, N., Chan, H.-R, Sahiner, B. et al., Evaluation of an automated computeraided diagnosis system for the detection of masses on prior mammograms, in Proc. SPIE Medical Imaging, San Diego, 2000, p. 967. 87. Kass, D.A., Gabbay, R., and Siedler, D.E., Results of computer-aided detection (CAD) on the prior screening mammograms of patients with interval breast cancers, Radiology, 217(P), 400, 2000. 88. Helvie, M.A., Hadjiiski, L.M., Makariou, E. et al., Sensitivity of noncommercial computeraided detection system for mammographic breast cancer detection: a pilot clinical trial, Radiology, 231, 208, 2004. 89. Hadjiiski, L.M., Chan, H.R, Sahiner, B. et al., A CAD system for characterization of malignant and benign breast masses in temporal pairs of mammograms and its effects on radiologists’ performance: an ROC study, Radiology, 225(P), 683, 2002. 90. Hadjiiski, L.M., Chan, H.P., Sahiner, B. et al., ROC study: effects of computer-aided diagnosis on radiologists’ characterization of malignant and benign breast masses in temporal pairs of mammograms, Proc. SPIE Medical Imaging, 5032, 94, 2003. 91. Hackshaw, A.K., Wald, N.J., Michell, M.J. et al., An investigation into why two-view mammography is better than one-view in breast cancer screening, Clinical Radiol., 55, 454, 2000. 92. 
Blanks, R.G., Given-Wilson, R.M., and Moss, S.M., Efficiency of cancer detection during routine repeat (incident) mammographic screening: two- vs. one-view mammography, J. Medical Screening, 5, 141, 1998.
Medical image analysis method
48
93. Blanks, R.G., Wallis, M.G., and Given-Wilson, R.M., Observer variability in cancer detection during routine repeat (incident) mammographic screening in a study of twovs. one-view mammography, J. Medical Screening, 6, 152, 1999. 94. Kita, Y, Highnam, R.P., and Brady, J.M., Correspondence between different view breast X-rays using curved epipolar lines, Computer Vision Image Understanding, 83, 38, 2001. 95. Gopal, S.S., Chan, H.-P, Petrick, N. et al., A regional registration technique for automated analysis of interval changes of breast lesions, Proc. SPIE, 3338,118,1998. 96. Sanjay-Gopal, S., Chan, H.P., Wilson, T. et al., A regional registration technique for automated interval change analysis of breast lesions on mammograms, Med. Phys., 26, 2669, 1999. 97. Hadjiiski, L.M., Chan, H.P., Sahiner, B. et al., Automated identification of breast lesions in temporal pairs of mammograms for interval-change analysis, Radiology, 213(P), 229, 1999. 98. Hadjiiski, L.M., Chan, H.P., Sahiner, B. et al., Interval-change analysis in temporal pairs of mammograms using a local affine transformation, Proc. SPIE, 3979, 847, 2000. 99. Paquerault, S., Sahiner, B., Petrick, N. et al., Prediction of object location in different views using geometrical models, in Proc. 5th International Workshop on Digital Mammography, Toronto, 2001, p. 748. 100. Paquerault, S., Petrick, N., Chan, H.P. et al., Improvement of computerized mass detection on mammograms: fusion of two-view information, Med. Phys., 29, 238, 2002. 101. Sahiner, B., Petrick, N., Chan, H.P. et al., Recognition of lesion correspondence on two mammographic views: a new method of false-positive reduction for computerized mass detection, Proc. SPIE, 4322, 649, 2001. 102. Tabar, L. and Dean, P.B., Teaching Atlas of Mammography, Thieme, New York, 1985. 103. Wolfe, J.N., Analysis of 462 breast carcinomas, AJR, 121, 846, 1974. 104. Murphy, W.A. 
and DeSchryver-Kecskemeti, K., Isolated clustered microcalcification in the breast: radiologic-pathologic correlation, Radiology, 127, 335, 1978. 105. Millis, R.R., Davis, R., and Stacey, A.J., The detection and significance of calcifica- tions in the breast: a radiological and pathological study, Br. J. Radiol., 49, 12, 1976. 106. Sickles, E.A., Mammographic features of 300 consecutive nonpalpable breast cancers, Am. J. Roentgenol., 146, 661, 1986. 107. Wu, Y., Doi, K., Giger, M.L. et al., Computerized detection of clustered microcalcifications in digital mammograms: applications of artificial neural network, Med. Phys., 19, 555, 1992. 108. Fam, B.W., Olson, S.L., Winter, P.F. et al., Algorithm for the detection of fine clustered calcifications on film mammograms, Radiology, 169, 333, 1988. 109. Mascio, L.N., Hernandez, J.M., and Logan, C.M., Automated analysis for microcalcifications in high-resolution digital mammograms, Proc. SPIE Medical Imaging, 1898, 472, 1993. 110. Brzakovic, D., Brzakovic, P., and Neskovic, M., Approach to automated screening of mammograms, Proc. SPIE, 1905, 690, 1993. 111. Dhawan, A.P., Chitre, Y.S., and Moskowitz, M., Artificial-neural-network-based classification of mammographic microcalcifications using image structure features, Proc. SPIE Medical Imaging, 1905, 820, 1993. 112. Woods, K.S., Solka, J.L., Priebe, C.E. et al., Comparative evaluation of pattern recognition techniques for detection of microcalcifications, Proc. SPIE Medical Imag- ing, 1905, 841, 1993. 113. Chan, H.P, Niklason, L.T., Ikeda, D.M. et al., Digitization requirements in mammography: effects on computer-aided detection of microcalcifications, Med. Phys., 21, 1203, 1994. 114. Fukushima, K., Miyake, S., and Ito, T, Neocognitron: a neural network model for a mechanism of visual pattern recognition, IEEE Trans. Systems Man. Cybernetics, SMC-13, 826, 1983. 115. Lo, S.C.B., Chan, H.P., Lin, J.S. 
et al., Artificial convolution neural network for medicalimage pattern recognition, Neural Networks, 8, 1201, 1995. 116. Sahiner, B., Chan, H.P., Petrick, N. et al., Classification of mass and normal breast tissue: a convolution neural network classifier with spatial domain and texture images, IEEE Trans. Medical Imaging, 15, 598, 1996.
Computer-aided diagnosis of breast cancer
49
117. Gurcan, M.N., Sahiner, B., Chan, H.R et al., Selection of an optimal neural network architecture for computer-aided detection of microcalcifications: comparison of auto- mated optimization techniques, Med. Phys., 28, 1937, 2001. 118. Metz, C.E., Some practical issues of experimental design and data analysis in radio- logical ROC studies, Invest. Radiol, 24, 234, 1989. 119. Heath, M., Bowyer, K., Kopans, D. et al., Current status of the digital database for screening mammography, in Digital Mammography, Karssemeijer, N., Thijssen, M., Hendriks, J. et al., Eds., Kluwer Academic, Dordrecht, Netherlands, 1998, p. 457. 120. Bunch, P.C., Hamilton, J.F., Sanderson, G.K. et al., A free response approach to the measurement and characterization of radiographic observer performance, Proc. SPIE, 127, 124, 1977. 121. Gurcan, M.N., Sahiner, B., Chan, H.R et al., Optimal selection of neural network archi- tecture for CAD using simulated annealing, in Proc. 22nd Annual International Confer- ence of IEEE Engineering in Medicine and Biology Society, Chicago, 2000, p. 3052. 122. Gurcan, M.N., Sahiner, B., Chan, H.R et al., Selection of an optimal neural network architecture for computer-aided diagnosis: comparison of automated optimization techniques, Radiology, 217(P), 436, 2000. 123. Gurcan, M.N., Chan, H.R, Sahiner, B. et al., Improvement of Computerized Detection of Microcalcifications Using a Convolution Neural Network Architecture Selected by an Automated Optimization Algorithm, Presented at Medical Image Perception Conference IX, Airlie Conference Center, Warrenton, VA, 2001. 124. Petrick, N., Chan, H.R, Sahiner, B. et al., Automated detection of breast masses on digital mammograms using adaptive density-weighted contrast-enhancement filtering, Proc SPIE Medical Imaging, 2434, 590, 1995. 125. Hadjiiski, L.M., Sahiner, B., Chan, H.R et al., Analysis of temporal change of mammographic features: computer-aided classification of malignant and benign breast masses, Med. 
Phys., 28, 2309, 2001. 126. Hadjiiski, L.M., Chan, H.R, Sahiner, B. et al., Automated registration of breast lesions in temporal pairs of mammograms for interval change analysis: local affine transfor- mation for improved localization, Med. Phys., 28, 1070, 2001. 127. Peli, T. and Lim, J.S., Adaptive filtering for image enhancement, Optical Eng., 21, 108, 1982. 128. Lunscher, W.H.H.J. and Beddoes, M.R, Optimal edge detection: parameter selection and noise effects, IEEE Trans. Pattern Anal Machine Intelligence, 8, 154, 1986. 129. Marr, D. and Hildreth, E., Theory of edge detection, Proc. Royal Soc. London, Series B, Biological Sci., 207, 187, 1980. 130. Russ, J.C., The Image Processing Handbook, CRC Press, Boca Raton, FL, 1992. 131. Chan, H.-R, Petrick, N., and Sahiner, B., Computer-aided breast cancer diagnosis, chap. 6 in Artificial Intelligence Techniques in Breast Cancer Diagnosis and Prog- nosis, Jain, A., Jain, A., Jain, S. et al., Eds., World Scientific, River Edge, New Jersey, 2000, p. 179. 132. Sahiner, B., Chan, H.R, Petrick, N. et al., Classification of mass and normal breast tissue: an artificial neural network with morphological features, Proc. World Congress Neural Networks, II, 876, 1995. 133. Sahiner, B., Chan, H.R, Petrick, N. et al., Image feature selection by a genetic algorithm: application to classification of mass and normal breast tissue on mammo- grams, Med. Phys., 23, 1671, 1996. 134. Petrick, N., Sahiner, B., Chan, H.R et al., Preclinical evaluation of a CAD algorithm for early detection of breast cancer, in Proc. 5th International Workshop on Digital Mammography, Toronto, 2001, p. 328. 135. Wei, D., Chan, H.P., Helvie, M.A. et al., Classification of mass and normal breast tissue on digital mammograms: multiresolution texture analysis, Med. Phys., 22, 1501, 1995.
Medical image analysis method
50
136. Wei, D., Chan, H.P., Petrick, N. et al., False-positive reduction technique for detection of masses on digital mammograms: global and local multiresolution texture analysis, Med. Phys., 24, 903, 1997. 137. Chan, H.P., Wei, D., Helvie, M.A. et al., Computer-aided classification of mammographic masses and normal tissue: linear discriminant analysis in texture feature space, Phys. Med. Biol, 40, 857, 1995. 138. Haralick, R.M., Shanmugam, K., and Dinstein, I., Texture features for image classi- fication, IEEE Trans. Systems, Man, Cybernetics, SMC-3, 610, 1973. 139. Wei, D., Chan, H.P., Helvie, M.A. et al., Multiresolution texture analysis for classi- fication of mass and normal breast tissue on digital mammograms, Proc. SPIE Medical Imaging, 2434, 606, 1995. 140. Norusis, M.J., SPSS for Windows, release 6, professional statistics software, SPSS Inc., Chicago, 1993. 141. Birdwell, R.L., Ikeda, D.M., O’Shaughnessy, K.F. et al., Mammographic character- ization of 111 missed cancers later detected by screening mammography, Radiology, 213(P), 240, 1999. 142. Brem, R.F., Schoonjans, J.M., Hoffmeister, J. et al., Evaluation of breast cancer with a computer-aided detection system by mammographic appearance, histology and lesion size, Radiology, 217(P), 400, 2000. 143. Castellino, R.A., Roehrig, J., and Zhang, W, Improved computer-aided detection (CAD) algorithm for screening mammography, Radiology, 217(P), 400, 2000. 144. Thurfjell, E., Mammography screening: one vs. two views and independent double reading, Acta Radiologica, 35, 345, 1994. 145. Hadjiiski, L.M., Sahiner, B., Chan, H.P. et al., Analysis of temporal change of mammographic features for computer-aided characterization of malignant and benign masses, Proc. SPIE, 4322, 661, 2001. 146. Highnam, R.P., Kita, Y., Brady, J.M. et al., Determining correspondence between views, in Proc. 4th International Workshop on Digital Mammography, Nijmegen, Netherlands, 1998. 147. 
Kita, Y., Highnam, R.P., and Brady, J.M., Correspondence between two different views of Xray mammograms using simulation of breast deformation, in Proc. IEEE Conference on Computer Vision and Pattern Recognition, Santa Barbara, CA, 1998, p. 700. 148. Chang, Y.H., Good, W.F., Sumkin, J.H. et al., Computerized localization of breast lesions from two views: an experimental comparison of two methods, Acad. Radiol., 34, 585, 1999. 149. Gopal, S.S., Chan, H.-P, Sahiner, B. et al., Evaluation of interval change in mam- mographic features for computerized classification of malignant and benign masses, Radiology, 205(P), 216, 1997. 150. Hadjiiski, L.M., Sahiner, B., Chan, H.P. et al., Computer-aided classification of malignant and benign breast masses by analysis of interval change of features in temporal pairs of mammograms, Radiology, 217(P), 435, 2000. 151. Sahiner, B., Gurcan, M.N., Chan, H.P. et al., The use of joint two-view information for computerized lesion detection on mammograms: improvement of microcalcification detection accuracy, Proc. SPIE Medical Imaging, 4684, 754, 2002. 152. Morton, A.R., Chan, H.P., and Goodsitt, M.M., Automated model-guided breast segmentation algorithm, Med. Phys., 23, 1107, 1996. 153. Zhou, C., Chan, H.P., Petrick, N. et al., Computerized image analysis: breast seg- mentation and nipple identification on mammograms, in Proc. Chicago 2000: World Congress on Medical Physics and Biomedical Engineering, Chicago, paper no. TH, 2000.
2 Medical-Image Processing and Analysis for CAD Systems

Athanassios N. Papadopoulos, Marina E. Plissiti, and Dimitrios I. Fotiadis

2.1 INTRODUCTION

Over the last 15 years, several research groups have focused on the development of computerized systems that can analyze different types of medical images and extract useful information for the medical professional. Most of the proposed methods use images acquired during a diagnostic procedure. Such images are acquired using a variety of techniques and devices, including conventional radiography, computerized tomography, magnetic resonance imaging, ultrasound, and nuclear medicine. Computerized schemes have been widely used in the analysis of one-dimensional medical signals such as the electrocardiogram (ECG), electromyogram (EMG), and electroencephalogram (EEG). However, most medical data are two-dimensional image representations. Computerized systems designed for the automated detection and characterization of abnormalities in these images can provide medical experts with useful information. Such systems are commonly referred to as computer-aided detection/diagnosis (CAD) systems. A computer-aided detection procedure does not provide a medical diagnosis. Rather, the computerized system is developed to detect signs of pathology in medical images by extracting features that are highly correlated with the type and characteristics of the abnormality or disease under investigation. If a specific area in a radiological image meets the requirements, the computerized scheme identifies it, and the radiologist can review it to improve the accuracy of the detection procedure. Computer-aided diagnosis schemes, on the other hand, characterize the identified region according to its pathology, based on the same or additional features.
A CAD system is defined as a combination of image-processing techniques and intelligent methods that enhance the medical-interpretation process, leading to a more efficient diagnosis. The computer output assists radiologists in image analysis and diagnostic decision making. In addition, a CAD system can direct the radiologist's attention to the regions where the probability of an indication of disease is greatest. A CAD system provides reproducible and consistent outcomes. In this chapter, we review two of the most common procedures in CAD systems. The first concerns microcalcification detection and classification in mammograms: features of microcalcifications are extracted, and intelligent methods are then used to classify them. The second procedure is based on the fusion of
intravascular ultrasound and biplane angiography, aiming at the three-dimensional (3-D) reconstruction of an artery.

2.2 BASICS OF A CAD SYSTEM

Most automated CAD approaches include feature-extraction procedures. However, several semi-automated approaches have also been reported, in which radiologists manually perform feature mining using various feature-extraction modules [1, 2]. CAD systems can be classified into two categories according to their objectives: (a) those used to detect regions of pathology and (b) those used to classify the findings, based on features that indicate their histological nature. The role of these computerized systems is to improve the sensitivity of the diagnostic process, not to make decisions about the health status of the patient. However, the "D" in CAD should stand for "diagnosis" [3], although several reports in the literature use the word "detection" [4], which is undoubtedly an essential part of the diagnostic procedure. For the design and development of an automated CAD system, several issues must be considered, including the quality of the digitized images, the sequence of the processing steps, and the evaluation methodology. Most studies use film-screen images digitized with high-performance film digitizers; more recent studies employ high-quality medical images obtained directly in digital format from advanced (filmless) imaging systems. The characteristics of the film digitizer significantly influence the quality of the image. With film-screen technology, the maximum optical density of the film is a critical parameter for the quality of the final digitized image: when the upper limit of the optical density is low, noise may be introduced during digitization, especially in the background (air) area of the image.
Using film-screen systems with higher optical densities might reduce such digitization noise.

2.2.1 COMPUTER-AIDED METHODOLOGIES IN MAMMOGRAPHY

Mammography is one of the radiological fields where CAD systems have been widely applied, because the demand for accurate and efficient diagnosis is so high. The presence of abnormalities with a specific appearance can indicate cancer, and their early detection improves the prognosis of the disease, thus contributing to mortality reduction [5]. However, the diagnostic process is complicated by superimposed anatomical structures, the multiple-tissue background, the low signal-to-noise ratio, and variations in the patterns of pathology. The analysis of medical images is thus a complicated procedure, and it is not unusual for indications of pathology, such as small or low-contrast microcalcifications, to be missed or misinterpreted by radiologists. At the same time, clinical applications require real-time processing and diagnostic accuracy. To meet these high standards in diagnostic interpretation, numerous intelligent systems
have been developed to provide reliable automated CAD systems that can be very helpful, offering the radiologist a valuable "second opinion" [6, 7].

2.2.2 HISTORICAL OVERVIEW

Computerized analysis of radiological images first appeared in the early 1960s [8, 9]. One of the first studies employing computers in mammography was published by Winsberg et al. in 1967 [10]. In this approach, the right- and left-breast shapes were compared to detect asymmetries: local image characteristics were computed at corresponding locations, and high variations between them indicated the presence of disease. Ackerman et al. [11] defined four computer-extracted features for categorizing mammographic lesions as benign or malignant. Another study by the same research group [12] proposed a computational procedure for processing a set of 30 characteristics, obtained by radiologists, for classifying lesions according to their malignancy. At the same time, several other works targeting the detection and characterization of microcalcification clusters appeared in the literature. Wee et al. [13] classified microcalcification clusters as benign or malignant using the approximate horizontal length, the average internal gray level, and the contrast of individual microcalcifications. The cluster pattern, together with features such as the size, density, and morphological characteristics of the cluster, was also used for microcalcification characterization [14]. In the late 1970s, Spiesberger [15] was the first to propose an automated system for the detection of microcalcifications. By the end of the 1980s, the literature had been enriched by studies reporting image-processing algorithms and computational processes that provided satisfactory descriptions and efficient procedures for the detection of microcalcifications [16–18]. In 1990, Chan et al.
reported that, under controlled circumstances, a CAD system can significantly improve radiologists' accuracy in detecting clustered microcalcifications [19].

2.2.3 CAD ARCHITECTURE

The CAD systems proposed in the literature are based on techniques from the fields of computer vision, image processing, and artificial intelligence. The main stages of a typical CAD scheme are preprocessing, segmentation, feature analysis (extraction, selection, and validation), and classification, the last being used either to reduce false positives (FPs) or to characterize abnormalities (Figure 2.1). A description of the methods employed in each stage is given in the following sections.

2.2.4 PREPROCESSING

In this stage, the subtle features of interest are enhanced and the unwanted characteristics of the image are de-emphasized. The enhancement procedure yields a better description of the objects of interest, improving the sensitivity of the detection system and leading to better characterization in the case of diagnosis. Contrast enhancement of the regions of interest, sharpening of the abnormalities' boundaries, and suppression of noise are performed in this stage. Several
methodologies have been reported in the literature, based on conventional image-processing techniques, region-based algorithms, and enhancement through transformation of the original image into another feature space. Processing can be global, or local enhancement parameters can be adjusted to accommodate the particularities of different image areas. Morphological, edge-detection, and band-pass filters have all been utilized, and an enhanced representation can be obtained through subtraction procedures on the processed image [18]. One of the earliest contrast-enhancement methodologies was the modification of the image histogram [20] and its equalization [21]. The resulting image contains brightness levels distributed equally over the gray-level scale. Because a mammogram contains areas of very different intensity, however, a global modification performs poorly.
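The global histogram equalization just described can be sketched in a few lines; this is generic NumPy code illustrating the cumulative-histogram mapping, not an implementation from the chapter:

```python
import numpy as np

def equalize_histogram(img, levels=256):
    """Globally equalize an 8-bit grayscale image via its cumulative histogram."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = np.cumsum(hist).astype(float)
    cmin = cdf[cdf > 0].min()                     # count at the first occupied level
    denom = max(cdf[-1] - cmin, 1.0)              # guard against a constant image
    lut = np.round(np.clip((cdf - cmin) / denom, 0.0, 1.0) * (levels - 1))
    return lut.astype(np.uint8)[img]              # gray-level lookup table

# A dark, low-contrast synthetic patch (gray levels confined to 40..79):
rng = np.random.default_rng(1)
patch = rng.integers(40, 80, size=(64, 64)).astype(np.uint8)
eq = equalize_histogram(patch)
print(eq.min(), eq.max())   # brightness now spans the full gray-level scale
```

The limitation noted in the text is visible here: the mapping is driven by the histogram of the whole image, so regions with locally different statistics all receive the same transformation.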
FIGURE 2.1 CAD architecture.

Performance can be improved by adjusting the processing parameters locally (adaptive histogram equalization) [22]. Another technique restricts the method to certain contrast values so as to increase the effective range of contrast in specific areas (contrast-limited adaptive histogram equalization) [23]. Unsharp masking is a routinely used procedure for enhancing fine-detail structures: a high-spatial-frequency component, multiplied by a weight factor, is added to the original image. In linear unsharp filtering, these parameters are constant throughout the entire image; in nonlinear methodologies, the weighting factor depends on the intensity of the examined region (background/foreground), or it can be applied differently at different resolution levels in multiscale approaches [24]. Contrast stretching is a rescaling of the image gray levels based on a linear or nonlinear transformation. In linear transformations, the difference between the background and foreground areas is increased to improve the contrast of both areas. Introducing a
nonlinear transformation, the contrast of different parts of the image is modified, selectively enhancing the desired gray levels. In most medical images, the objects of interest have nonstandard intensities, so the selection of a proper "intensity window" is not sufficient for contrast enhancement. The adaptive neighborhood contrast-enhancement method improves the contrast of objects or structures by modifying the gray levels of the neighborhood (contextual region) of each pixel of which the object is composed. After homogeneous areas have been identified (using, for example, a region-growing technique), several conditions are imposed to suppress unwanted high-contrast areas or low-level noise and to enhance regions surrounded by a variable background [25]. Techniques that enhance regions of interest by estimating their difference from their background are called region-based enhancement techniques. Typical region-growing techniques, which employ contrast and statistical conditions, define the extent and shape of the objects [26]. Multiresolution methods, based mainly on wavelet analysis, are also used to enhance the features of mammographic images [27]. A multiscale analysis of the original mammogram into several subband images offers the advantage of studying each subband independently using scale characteristics. Each subband provides information at a different scale, so that high- and low-frequency elements are represented in separate images. Thus, noise and similar components of the image are described at high resolution (small scale), while subtle objects of limited extent and large masses are described at medium- and low-resolution levels (medium and coarse scales), respectively. Hence, significant image features can be selectively enhanced or suppressed at different resolution levels [28].
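The selective per-scale reweighting described above can be sketched without a full wavelet transform by using differences of Gaussians as stand-in subbands. The smoothing scales and band gains below are illustrative assumptions, not values from the chapter:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multiscale_enhance(img, gains=(1.0, 2.0, 1.0)):
    """Split an image into fine/medium/coarse bands and reweight each band."""
    img = img.astype(float)
    s1 = gaussian_filter(img, 1.0)   # removes only the finest detail
    s2 = gaussian_filter(img, 4.0)   # removes medium-scale structures as well
    fine, medium, coarse = img - s1, s1 - s2, s2
    g_f, g_m, g_c = gains
    return g_f * fine + g_m * medium + g_c * coarse

# A coarse background ramp plus one medium-scale bright blob:
y, x = np.mgrid[0:64, 0:64]
img = 0.5 * x
img = img + 20.0 * np.exp(-((x - 32) ** 2 + (y - 32) ** 2) / (2 * 3.0 ** 2))
out = multiscale_enhance(img, gains=(1.0, 2.0, 1.0))
# The blob's contrast against the background grows, the ramp is preserved:
print(out[32, 32] - out[32, 5] > img[32, 32] - img[32, 5])
```

A wavelet decomposition plays the same role in the cited work [28]; the point illustrated here is only that amplifying one band enhances structures of the matching size while leaving the other scales untouched.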
Furthermore, adaptive wavelet-enhancement approaches that avoid reliance on global parameters have been reported [29]. Fuzzy-logic techniques are also used for contrast enhancement of microcalcifications [30]: global information (brightness) is employed to transform the image into a fuzzified version using a membership function, while local information (geometrical statistics) is employed to compute the nonuniformity. Methods based on deterministic fractal geometry have been used to enhance mammograms as well [31–33]. A fractal image model was developed to describe mammographic parenchymal and ductal patterns using a set of affine-transformation parameters, and microcalcification areas were enhanced by taking the difference between the original image and the modeled image.

2.2.5 SEGMENTATION

In this stage, the original mammographic image is divided into separate parts, each of which has similar properties. The image background, the tissue area, and the muscle and other areas can be separated because they are characterized by generic features. Moreover, beyond this generic classification of image regions, a CAD segmentation procedure can identify regions containing small bright spots that appear in groups and correspond to probable microcalcifications and their clusters. The complexity of a segmentation procedure depends on the nature of the original image and on the characteristics of the objects to be identified. A mammographic image contains
several regions with different attenuation coefficients and optical densities, resulting in intensity variations. In addition, because a mammogram is a two-dimensional (2-D) representation of a 3-D object, the overlying areas form a complex mosaic of bright regions that may or may not correspond to real objects. Thus, a global single threshold, or a set of fixed thresholds defining intensity ranges, is not an efficient segmentation procedure. Moreover, a global intensity threshold usually increases the number or the size of the selected regions, introducing noise and making the procedure inefficient, because noise removal requires further treatment. In any case, after a first partitioning has been achieved, region-growing techniques following specific homogeneity and differentiation criteria can be used to define the real extent and the exact borders of each segmented region. To overcome the limitations of global thresholding, local thresholding criteria must be used from the start; defining parameters that satisfy the demands of the segmentation algorithm increases the efficiency of the technique. The corresponding measures are calculated over a specific window size. Some of the local thresholding criteria are:

- the local mean intensity plus or minus a number of standard deviations (SD) of the intensity [16]
- the difference between the intensity of a seed pixel and the maximum and minimum intensities of the pixels in a specific neighborhood around that seed pixel [34]
- a contrast measure equal to the difference in intensity between the object and the background region, an object being selected only if its feature value belongs to the highest 2% of the values obtained [35]

In a similar but more flexible way, adaptive filtering methodologies have been proposed, with parameters or measures adjusted to each specific area.
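The first criterion above (local mean plus a multiple of the local SD [16]) can be sketched with a sliding window; the window size and the factor k below are illustrative choices, not values from the cited study:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_sd_threshold(img, size=15, k=3.0):
    """Flag pixels brighter than the local mean plus k local standard deviations."""
    img = img.astype(float)
    mean = uniform_filter(img, size)                     # local mean
    sqmean = uniform_filter(img * img, size)             # local mean of squares
    sd = np.sqrt(np.maximum(sqmean - mean * mean, 0.0))  # local standard deviation
    return img > mean + k * sd

rng = np.random.default_rng(2)
img = 100.0 + 2.0 * rng.standard_normal((64, 64))   # noisy tissue background
img[30, 30] += 60.0                                  # one bright candidate object
mask = local_sd_threshold(img)
print(bool(mask[30, 30]), int(mask.sum()))   # spot detected, few false alarms
```

Because the threshold adapts to each neighborhood, the same k works across regions of different brightness, which is exactly what a single global threshold fails to do on a mammogram.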
A feature called the prediction error (PE) has been defined as the difference between the actual pixel value and a weighted sum of its eight nearest-neighbor pixels [36]. If the PE follows a Gaussian distribution, calcifications are not present; functions of the first, second, and third moments of the PE are used to generate a threshold value that reveals the presence of microcalcifications. In another study [37], given a local maximum at pixel (x0, y0), an edge pixel is defined as the pixel (x, y) that maximizes the difference in value between (x, y) and (x0, y0) divided by the distance between the two pixels. Mathematical-morphology filtering has also been used to segment microcalcifications: classical erosion and dilation transformations, as well as combinations of them such as the opening, closing, and top-hat transformations, are employed [38]. In statistical approaches, histogram-based analyses and Markov-random-field models are used [39, 40]; Markov random fields have been used to classify pixels into background, calcification, line/edge, and film-emulsion errors [41]. Multiscale analysis based on wavelet transformations allows the segmentation to be performed at different scale levels [42, 43]. Furthermore, as in the preprocessing stage, techniques exploiting fractal [44] and fuzzy-logic [45] methodologies have been applied.
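Among the morphological operations mentioned above, the white top-hat (the image minus its grayscale opening) is particularly suited to microcalcifications, because it keeps bright structures smaller than the structuring element while suppressing the slowly varying background. A sketch with an illustrative 7x7 structuring element (scipy.ndimage.white_tophat computes the same result directly):

```python
import numpy as np
from scipy.ndimage import grey_opening

def white_top_hat(img, size=7):
    """White top-hat: the image minus its grayscale opening."""
    img = img.astype(float)
    return img - grey_opening(img, size=size)   # opening removes small bright spots

x = np.linspace(0.0, 50.0, 64)
img = np.tile(x, (64, 1))    # slowly varying background ramp
img[10, 10] += 30.0          # two small bright "microcalcifications"
img[40, 50] += 25.0
residue = white_top_hat(img)
print(residue[10, 10], residue[40, 50])   # spots survive; background is near 0
```

Thresholding the residue then yields candidate microcalcification pixels, independently of the local background level.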
2.2.6 FEATURE ANALYSIS (EXTRACTION, SELECTION, AND VALIDATION)

In this stage, several features of the probable microcalcification candidates are extracted to reduce false positives. In any segmentation approach, a considerable number of normal objects are recognized as pathological, which reduces the efficiency of the detection system. To improve the performance of the scheme, several image features are calculated in an effort to describe the specific properties or characteristics of each object. The most descriptive of these features are processed by a classification system to make an initial characterization of the segmented samples.
TABLE 2.1
Features for the Detection and Characterization of Microcalcifications and Their Clusters

Microcalcification (MC) Cluster            Radiologists' Characterization
Classification Features                    Features

Number of MCs in cluster                   Cluster elements (separable/countable)
Cluster area                               Cluster size
Mean MC area                               MC size
SD of MC area                              Shape of elements within cluster
Mean MC compactness                        Shape of elements within cluster
Mean MC elongation                         Shape of elements within cluster
SD of MC elongation                        Shape of elements within cluster
SD of MC intensity                         Density of calcifications
Mean MC background intensity               Density of calcifications
Mean contrast                              Contrast of calcifications
Cluster eccentricity                       Shape of cluster
Mean distance from cluster centroid        Calcification distribution
Neighboring with a larger cluster          Cluster distribution
Cluster entropy                            Calcification distribution
Spreading of MCs in cluster                Calcification distribution
Cluster elongation                         Cluster shape
Mean local MC background                   Density of calcifications
Mean MC intensity                          Density of calcifications
SD of MC compactness                       Shape of elements within cluster
SD of distances from cluster centroid      Calcification distribution
Area of the cluster convex hull            Shape of cluster
Length of the cluster convex hull          Shape of cluster
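Many of the morphological descriptors in Table 2.1 can be computed directly from a binary segmentation mask. A minimal sketch using common textbook definitions of compactness and elongation (the cited studies may define these features differently):

```python
import numpy as np

def shape_features(mask):
    """Area, compactness, and elongation of a single binary object.

    Compactness is perimeter^2 / (4*pi*area), about 1 for a disk;
    elongation is the ratio of the principal axes derived from the
    eigenvalues of the pixel-coordinate covariance matrix. These are
    common textbook definitions, assumed here for illustration.
    """
    ys, xs = np.nonzero(mask)
    area = len(ys)
    # 4-connected perimeter: each object pixel contributes one unit per
    # exposed (background-facing) side.
    p = np.pad(mask, 1)
    exposed = (4 * p[1:-1, 1:-1]
               - p[:-2, 1:-1] - p[2:, 1:-1]
               - p[1:-1, :-2] - p[1:-1, 2:])
    perimeter = exposed[mask > 0].sum()
    compactness = perimeter ** 2 / (4 * np.pi * area)
    cov = np.cov(np.vstack([xs, ys]).astype(float))
    evals = np.sort(np.linalg.eigvalsh(cov))
    elongation = float(np.sqrt(evals[1] / max(evals[0], 1e-12)))
    return area, float(compactness), elongation

# A 4x4 square object: elongation ~1, compactness slightly above 1.
sq = np.zeros((10, 10), dtype=bool)
sq[3:7, 3:7] = True
area, compactness, elongation = shape_features(sq)
```

Per-microcalcification values like these are then aggregated over the cluster (mean, SD) to obtain the cluster-level features of the left-hand column of Table 2.1.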
Although the number of calculated features derived from different feature spaces is quite large, it is difficult to identify the specific discriminative power of each one. Thus, a primary problem is the selection of an effective feature set that can provide a satisfactory description of the segmented regions. Early studies utilized features similar to those that radiologists employ during their diagnosis. However, as mentioned previously, additional features not employed by the doctors also have high discriminative power. Table 2.1 provides a list of typical morphological features of individual microcalcifications and their clusters. Specific features can also be extracted, such as the surrounding region dependence matrix (SRDM), gray-level run length (GLRL), and gray-level difference (GLD) [46]. Laplacian or Gaussian filtering can be used in the validation of features [47]. Using wavelet analysis, features such as energy, entropy, and norms of differences among local orientations can be extracted [48]. The use of a large number of features does not necessarily improve the classification performance. Indeed, the use of features without discriminative power increases the complexity of the characterization process. In addition, the probability of misclassification increases with the number of features, the prediction variability is larger, and the classifier becomes sensitive to outliers. Finally, the more features included in a given classifier, the greater the dimension of the training set needed for the same degree of reliability [49]. The selection of the optimal feature subset is a laborious problem. Only an exhaustive search over all subsets of features can guarantee a truly reliable subset. Usually, the criterion for selecting an efficient subset of features is the minimization of the misclassification probability (classification error).
However, to test a subset, a classifier must be chosen, and it is important to consider that different classifiers and different methods for the estimation of the error rate could lead to the selection of a different feature subset. One of the most important issues in a mammographic CAD system is the selection of a standard feature set and of the classification method used to extract regions of pathological interest while minimizing false-positive findings. The selection of the appropriate features can be based on "weighting factors" proposed by radiologists [50–53] or on algorithmic procedures that identify the most discriminant features. The feature space can be a transformed space of lower dimension than the original, although its discriminating power can be higher. To achieve this, principal component analysis (PCA), which is based on the elimination of components that contribute less, can be used [54, 55]. Alternatively, the most discriminative features can be selected, reducing in this way the size of the feature set. Several methods have been proposed, such as:

Stepwise discriminant analysis [56]

Sequential forward selection (SFS) and sequential backward selection (SBS) [57]

Genetic algorithms [58]
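Of the methods just listed, sequential forward selection is the simplest to sketch. The error estimator below is a hypothetical stand-in for a classifier's estimated misclassification rate; in practice it would come from cross-validation of the chosen classifier:

```python
def sequential_forward_selection(features, error_fn, k):
    """Greedy bottom-up SFS: starting from the empty set, repeatedly add
    the single remaining feature that minimizes the classification error
    estimated by error_fn(subset), until k features are chosen."""
    selected = []
    remaining = list(features)
    while len(selected) < k and remaining:
        best = min(remaining, key=lambda f: error_fn(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical error estimator: pretend 'contrast' and 'area' are the
# only informative features (the error drops when they are included).
informative = {"contrast": 0.20, "area": 0.10}
def toy_error(subset):
    return 0.5 - sum(informative.get(f, 0.0) for f in subset)

chosen = sequential_forward_selection(
    ["area", "contrast", "entropy", "elongation"], toy_error, k=2)
```

The greedy search evaluates only a handful of subsets instead of all 2^N, which is exactly the trade-off against the exhaustive search mentioned above: much cheaper, but not guaranteed to find the globally best subset.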
Stepwise discriminant analysis is based on the sequential trial of different feature subsets. The subset that results in the smallest error rate is chosen as the most convenient [59–61]. Sequential forward selection is a bottom-up search procedure in which one feature at a time is added to the feature set. At each stage, the feature to be included in the feature set is selected from among the remaining features [57, 62, 63]. Genetic algorithms have been used to select features that could enhance the performance of a classifier (for distinguishing malignant and benign masses). In the same way, genetic algorithms have been used to optimize the feature set for the characterization of microcalcifications [64, 65].

2.2.7 CLASSIFICATION SYSTEM (REDUCTION OF FALSE POSITIVES OR CHARACTERIZATION OF LESIONS)

Diagnosis is an integrated medical procedure that is defined as the art or act of recognizing the presence of a disease from its signs or symptoms. During the entire process, especially in the case of differential diagnosis, several dilemmas arise concerning the rejection or acceptance of probable diseases. Thus, a classification system is an essential part of a CAD system. Classification schemes range from techniques that classify lesions according to their different types (stellate, circumscribed masses, or calcifications) [66] to techniques that produce a binary diagnosis, characterizing the findings as malignant or benign. The classifiers utilized in the detection of mammographic microcalcifications are those employed in most medical image-analysis procedures. They can be categorized into the following classes:

Conventional classifiers

Artificial neural networks

Fuzzy-logic systems

Support-vector machines

2.2.7.1 Conventional Classifiers

2.2.7.1.1 Rule-Based Systems (Decision Trees)

The decision tree is one of the most widely used techniques for the extraction of inductive inference.
As a learning method, it aims at the definition of an approximating discrete-valued target function in which the acquired knowledge is represented as a decision tree. The architecture of the classifier comprises a set of "if-then" rules. A decision-tree scheme includes a main root node, from which the classification procedure starts, and several leaf nodes, where the classification of the instance is given. Each node in the tree specifies a test of an attribute of the instance, and each branch descending from that node corresponds to one of the possible values of this attribute. An instance is categorized beginning from the root node and, by testing the attribute specified by this node, moving down the tree branch that corresponds to the value of this attribute. The same procedure is then repeated for the subtree rooted at the new node.
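A toy illustration of such an "if-then" rule structure follows; all feature names and thresholds are invented for illustration and carry no clinical meaning:

```python
def classify_cluster(features):
    """A toy rule-based ("if-then") characterization of a
    microcalcification cluster. The thresholds and rules are invented
    for illustration; a real system derives them from radiologists'
    criteria or learns them from training data."""
    # Root node: very few microcalcifications rarely form a suspicious cluster.
    if features["n_mc"] < 3:
        return "benign"
    # Internal node: many small, irregular calcifications raise suspicion.
    if features["mean_area"] < 0.5 and features["mean_compactness"] > 2.0:
        return "suspicious"
    # Internal node: dense clusters are also flagged.
    if features["density"] > 10.0:
        return "suspicious"
    return "benign"  # leaf: default characterization

label = classify_cluster(
    {"n_mc": 8, "mean_area": 0.3, "mean_compactness": 2.5, "density": 4.0})
```

Each `if` corresponds to a node testing one attribute, and each `return` is a leaf; an instance traverses the rules top to bottom exactly as described above.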
From the early studies of microcalcification detection and characterization in mammography, rule-based systems have provided remarkable assistance in simulating the diagnostic process carried out by a radiologist [67, 68]. Although the conversion of medical rules to "if-then" rules is a feasible task, the development of a high-performance system has not been achieved. This is due to the absence of attribute-value pair representations in medical data and the lack of disjunctive descriptions or of large data sets for system training that include all the specific disease cases.

2.2.7.1.2 Bayesian Quadratic and Linear Classifiers (Statistical)

A Bayesian classifier is based on the approximation of the class-conditional probability density functions (PDFs). Each PDF expresses the frequency of occurrence of each sample in the feature space. Typically, an unknown sample is classified to the class with the highest value of its PDF. The problem is that the PDFs must be approximated precisely [62]. Quadratic and linear classifiers are statistical (parametric) methods that assume Gaussian distributions for the PDFs. The mean vector and the covariance matrix are estimated from the training set of each class. In the case of a Bayesian quadratic classifier (BQC), the classification boundary forms a quadratic curve. In the case of a Bayesian linear classifier (BLC), instead of using different covariance matrices for the individual classes, one unified covariance matrix is used for all classes, and the classification border is a straight line.

2.2.7.1.3 Nonparametric Classifiers

When the underlying distributions of the samples are quite complex, additional techniques can be employed to approximate the PDFs. The K-nearest-neighbor and the Parzen-estimate methods belong to this category. In the K-nearest-neighbor technique, the classification boundary is directly constructed instead of calculating the PDFs [69].
For an unknown sample, distances to the individual training samples are calculated, and the majority class among the nearest K samples is selected. The Parzen-estimate method is used when the distribution is complex and its modeling is quite difficult. Numerous kernel functions describing the individual training samples are summed to calculate the complex PDF [70].

2.2.7.2 Artificial Neural Networks (ANNs)

A neural network is a structure that can be adjusted to produce a mapping of relationships among the data from a given set of features. For a given set of data, the unknown function, y = f(x), is estimated using numerical algorithms. The main steps in using ANNs are:

First, a neural-network structure is chosen in a way that is considered suitable for the type of the specific data and the underlying process to be modeled.

The neural network is trained using a training algorithm and a sufficiently representative set of data (the training data set).

Finally, the trained network is evaluated with different data (the test data set), from the same or related sources, to validate that the acquired mapping is of acceptable quality.
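Once a structure is chosen and trained, classification reduces to a forward pass through the network. A minimal sketch with sigmoid neurons and randomly initialized (untrained) weights, using the 3-5-3-1 architecture of Figure 2.2:

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Forward pass of a fully connected feedforward network: each layer
    applies a weighted sum plus bias, followed by a nonlinear (sigmoid)
    transformation."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# A 3-5-3-1 architecture: three inputs, hidden layers of five and three
# neurons, one output neuron. Weights are random for illustration only;
# training (e.g., backpropagation) would adjust them.
sizes = [3, 5, 3, 1]
weights = [rng.normal(0, 1, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(0, 1, m) for m in sizes[1:]]
x = np.array([0.2, 0.7, 0.1])  # a hypothetical feature vector
y = forward(x, weights, biases)
```

The sigmoid output lies in (0, 1) and can be read as a degree of suspicion; training algorithms such as those listed below adjust `weights` and `biases` to minimize the error on the training data set.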
Several types of neural networks have been reported, such as feedforward [12, 20, 36, 43, 48, 55, 57], radial basis function [71], Hopfield [72], vector quantization, and unsupervised types such as self-organizing maps [73]. A review of the role of neural networks in image analysis is given by Egmont et al. [74]. Because feedforward backpropagation and radial-basis-function neural networks are the most common, a brief description of these network architectures is worthwhile. Typically, a neural network is a structure involving weighted interconnections among neurons (nodes), which are typically nonlinear scalar transformations. Figure 2.2 shows an example of a two-hidden-layer neural network with three inputs, x = {x1, x2, x3}, that feed each of the five neurons composing the first hidden layer. The five outputs from this layer feed each of the three neurons that compose the second hidden layer, whose outputs, in a similar way, are fed into the single output-layer neuron, yielding the scalar output. The layers of neurons are called hidden because their outputs are not directly seen in the data. The inputs to the neural network are feature vectors with dimension equal to the number of the most significant features. Several training algorithms are implemented before selecting the one that is "most suitable" for the network training. Gradient descent, resilient backpropagation, conjugate gradient, quasi-Newton, and Levenberg-Marquardt are some of the most common training methods [75].

2.2.7.3 Fuzzy-Logic Systems

Classification reduces the nonstatistical uncertainty. Statistical uncertainty can be handled in several ways, so the nonstatistical uncertainty must be decreased to develop more reliable classification approaches. Fuzzy set theory is an approach to resolving this problem. Initially, fuzzy sets were integrated into rule-based expert systems to improve the performance of decision-support systems. Fuzzy procedures can also be used to automatically generate and tune the membership functions that define the different classes. Image-processing techniques employing different feature sets defined in a fuzzy way have been reported, and intelligent methodologies and pattern-recognition techniques have been used to introduce fuzzy clustering and fuzzy neural-network approaches [76]. However, fuzzy sets can be utilized in more than one stage of a classifier design. Fuzzy inputs can also be used, wherein the original input values are converted to a more "blurry" version. For instance, instead of using the exact values of the feature vector, a new vector consisting of feature values expressing the degree of membership of the specific value in the fuzzy sets (e.g., small, medium, large) can be used. Fuzzy reasoning can be utilized in classification processes in which the inferences are not strictly defined. The categories in a medical classification procedure are normally exclusive; thus, every sample belongs to a specific category. However, in some cases, an unknown sample belongs to more than one class, but with different degrees of membership. In such cases, the classification scheme is based on the utilization of fuzzy classes.
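The fuzzification of an input feature into sets such as small/medium/large, as mentioned above, can be sketched with triangular membership functions; the breakpoints below are illustrative, not taken from the cited studies:

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 outside [a, c], peaking at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify_size(area):
    """Degrees of membership of a (hypothetical) microcalcification area
    in the fuzzy sets 'small', 'medium', and 'large'."""
    return {
        "small": triangular(area, -0.1, 0.0, 0.6),
        "medium": triangular(area, 0.3, 0.7, 1.1),
        "large": triangular(area, 0.8, 1.5, 2.2),
    }

m = fuzzify_size(0.5)
```

An area of 0.5 belongs partly to "small" and partly to "medium", with different degrees of membership, exactly the situation described above in which a sample is not forced into a single exclusive category.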
FIGURE 2.2 A feedforward neural network with three inputs and two hidden layers composed of five and three neurons, respectively, and one output neuron.

2.2.7.4 Support-Vector Machines

Another category of classification methods that has recently received considerable attention is the support-vector machine (SVM) [77–80]. SVMs are based on the definition of an optimal hyperplane that separates the training data to achieve a minimum expected risk. In contrast to other classification schemes, an SVM aims to minimize the empirical risk Remp while maximizing the distances (geometric margin) of the data points from the corresponding linear decision boundary (Figure 2.3). Remp is defined as

Remp = (1/2l) Σi=1..l |yi − f(xi)|     (2.1)

where xi is a training vector belonging to one of two classes, l is the number of training points, yi indicates the class of xi, and f is the decision function.

FIGURE 2.3 A nonlinear SVM maps the data from the feature space D to the high-dimensional feature space F using a nonlinear function.

The training points in the space RN are mapped nonlinearly into a higher-dimensional space F by the (a priori selected) function Φ: RN→F. It is in this space (the feature space) that the decision hyperplane is computed. The training algorithm uses only the dot products Φ(xi)·Φ(xj) in F. If there exists a "kernel function" K such that K(xi, xj) = Φ(xi)·Φ(xj), then only the knowledge of K is required by the training algorithm. The decision function is defined as

f(x) = sign(Σi ai yi K(xi, x) + b)     (2.2)

where the ai represent the weighting factors and b denotes the bias. After training, the condition ai > 0 holds for only a few examples, while for the others ai = 0. Thus, the final discriminant function depends only on a small subset of the training vectors, which are called support vectors. Several types of kernels have been reported in the literature, such as the polynomial kernel of degree p,

K(xi, x) = (xi·x + 1)^p     (2.3)

and the Gaussian kernel

K(xi, x) = exp(−||xi − x||² / (2σ²))     (2.4)

where σ is the kernel width.

2.2.8 EVALUATION METHODOLOGIES
The evaluation of a classification system is one of the major issues in measuring the system's performance. From the beginning, researchers have utilized several performance indexes to estimate a diagnostic system's ability to accurately assign samples to their classes. The true-positive (TP) rate and false-positive (FP) rate are indexes that partially indicate the classification performance of a system. The TP rate represents the percentage of "diseased" samples that are correctly classified as "diseased," and the FP rate represents the percentage of normal samples that are incorrectly classified as "diseased." However, in most statistical classification systems, the adjustment of certain algorithmic parameters can modify the operating point, resulting in different pairs of TP and FP rates. Such behavior raises questions about the selection of the appropriate training parameters of the system and results in difficulties in evaluating the system's actual performance for different degrees of confidence. The receiver operating characteristic (ROC) methodology is the most widely used scheme for evaluating the performance of a CAD system. ROC analysis overcomes the problem of a fixed selection of the classification parameters. A 2-D graphical representation of all the corresponding single points, each expressing a pair of TP and FP rates, gives the overall performance of the system. It is generated by plotting the true-positive rate (sensitivity) against the false-positive rate (1 − specificity) for various threshold values (Figure 2.4). The ROC curve represents the trade-off between the TP/FP values as the criterion for positivity changes [81]. The area under the curve (AUC, or Az) is a measure of the diagnostic performance of the classifier. The Az value is the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance.
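This probabilistic interpretation of Az can be computed directly from classifier scores; a minimal sketch, with ties counted one-half:

```python
def auc_probability(scores_pos, scores_neg):
    """Az as the probability that a randomly chosen positive ("diseased")
    sample is ranked above a randomly chosen negative ("normal") one,
    with ties counted one-half; this equals the area under the empirical
    ROC curve."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical classifier outputs for diseased and normal samples.
az = auc_probability([0.9, 0.8, 0.6], [0.7, 0.4, 0.3])
```

Because Az aggregates all operating points, it summarizes the whole ROC curve in a single number, independent of any one threshold choice.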
It is possible for a classifier with a lower Az value to have higher classification ability, at a specific operating point, than another having a higher Az value. Nevertheless, the Az value is an efficient measure of classification performance. Alternative evaluation methodologies are the free-response ROC (FROC) [82] and the location-specific ROC (LROC) [83]. In the FROC technique, the detection outcome of a CAD system, for each image, contains normal or abnormal objects that are characterized as TP or FP findings according to whether they coincide with real or spurious detections, respectively. The FROC curve is created by plotting the TP rate vs. the number of false-positive findings per image. In the LROC methodology, only one object is contained in each image or, in the case of a normal exam, none. The annotation of the database is performed by radiologists, who localize the abnormalities on each image. A simpler version of the FROC method is the alternative FROC (AFROC) technique [84]. ROC methodologies face limitations in their application to different medical diagnostic systems, such as limited data sets, lack of independence of samples, the lack of categorical rating in the characterization, and the absence of indexes that can characterize the detection difficulty of a specific sample [85, 86]. A unified ROC methodology that can be used efficiently for all CAD systems does not exist.
FIGURE 2.4 ROC curves indicating the performance of three different classification systems.

2.2.9 INTEGRATED CAD SYSTEMS

The research efforts pursued for more than 15 years in the area of computer-aided detection in mammography have been integrated into efficient clinical devices that can provide useful information to radiologists. To date, three CAD systems have been approved by the U.S. Food and Drug Administration as clinical devices valuable in the detection of pathological areas/objects in mammography. These systems are the ImageChecker (R2 Technology) [87], the Second Look Digital/AD (CADx Medical Systems) [88], and the MammoReader (Intelligent Systems Software) [89]. However, other systems have also been developed and are being clinically evaluated. Some of these systems are: the Mammex TR (Scanis Inc.) [90], the Promam (DBA Systems Inc.) [91], and the MedDetect (LMA & RBDC) [92]. The performances of the clinically approved systems have been evaluated by several research groups and organizations [93–96].
2.3 COMPUTER-AIDED METHODOLOGIES FOR THREE-DIMENSIONAL RECONSTRUCTION OF AN ARTERY

The modules of a CAD system for the detection and characterization of abnormalities in mammography were described in Section 2.2. Those systems take advantage of the specific appearance of breast tissue depicted using X-rays. However, similar image-analysis and artificial-intelligence techniques can be applied to medical images obtained by different imaging modalities. One such case is intravascular ultrasound (IVUS) imaging, which uses ultrasonic signals to depict the inner structure of arteries. Detection of the actual borders of the lumen and plaque in vessels is crucial in defining the severity of arterial disease. Diagnostic ultrasound has become the most common imaging modality, and the number of clinical applications for ultrasound continues to grow. Coronary artery disease is the most common type of heart disease and the leading cause of death in both men and women in Europe and the U.S. The main cause of coronary artery disease is atherosclerosis, which results in hardening and thickening of the inner lining of arteries. Deposits of fatty substances, cholesterol, cellular waste products, calcium, and other substances build up in the arterial wall, resulting in the development of atheromatic plaque. As a consequence, partial or total obstruction of blood flow in the artery can occur, which can lead to heart attack. Early diagnosis and accurate assessment of plaque position and volume are essential for the selection of the appropriate treatment. Biplane coronary angiography has been used as the "gold standard" for diagnosing coronary narrowings and guiding coronary interventions. On the other hand, IVUS is an interventional technique that produces tomographic images of arterial segments.
These techniques are considered complementary, because the first provides information about the lumen width and the vessel topology, while the second permits direct visualization of the arterial wall morphology. Today, IVUS is used extensively as a routine clinical examination that assists in selecting and evaluating therapeutic interventions such as angioplasty, atherectomy, and stent placement. The aim of IVUS and angiographic image processing is the extraction of valuable diagnostic information about the nature of alterations of the arterial lining and the three-dimensional vessel morphology. Quantitative estimates of plaque thickness, volume, and position in the arterial wall are obtained from the processing of the acquired images. Sophisticated modeling techniques combining images from both modalities allow the three-dimensional (3-D) reconstruction of the arterial segment and provide useful geometrical and positional information about the shape of the lumen in 3-D space. The following sections describe several automated methods for quantitative analysis of IVUS images and techniques for the extraction of three-dimensional vessel models by fusion of IVUS and angiographic data.
FIGURE 2.5 (a) Cross-sectional pattern appearance of IVUS images; (b) borders of interest in an IVUS image.

2.3.1 IVUS IMAGE INTERPRETATION

IVUS is an invasive catheter-based imaging technique that provides 360° radial images in a plane orthogonal to the long axis of the catheter. IVUS image sequences consist of cross-sectional images of the arterial segment and are acquired with the insertion of a catheter in the vessel. The reflection of the ultrasound beam as it passes through the different layers and the scattering by the material give rise to a typical image pattern that can be used to identify different regions in IVUS images. Figure 2.5 shows a schematic diagram of the cross-sectional anatomy of an artery as well as its original depiction in IVUS images. There are two key landmarks in IVUS images that assist in the correct interpretation of arterial structure: the lumen/intima border and the media/adventitia border. Each one is recognized in IVUS images by its location and its characteristic appearance. As seen in Figure 2.5(b), the first bright interface beyond the catheter itself is the lumen/intima border. Moreover, the media is usually a discrete thin layer that is generally darker than the intima and adventitia. The appearance of intima, media, and adventitia follows a double-echo pattern showing a circumferentially oriented parallel bright-dark-bright echo pattern that is referred to as the "typical" three-layered appearance. In IVUS images of normal arteries, the three-layered appearance may not be visible, because the intima may be too thin, or there may be sufficient collagen and elastin in the media of some arterial segments for it to blend with the surrounding layers. In addition, in highly diseased vessels, the media may be too thin to register as a separate layer on ultrasound images. It is more likely that the media is clearly defined over only a part of the vessel circumference.
In such cases or in noisy images, the identification of the media/adventitia border is obtained by the transition in “texture” of regions corresponding to plaque and adventitia. In sequential IVUS frames, plaque can be distinguished from blood flowing in
the lumen, because plaque echoes exhibit a constant pattern, while blood has a highly speckled pattern that changes over time. Besides information about the amount and distribution of plaque, IVUS images provide a detailed description of plaque composition. The ultrasonic appearance of atherosclerotic plaque depends on its composition, and several components of plaque can be identified in IVUS images. During clinical imaging, several practical methods are used to enhance the appearance of the different parts of the vessel. Saline injections help real-time visualization of the luminal border [97]. Injection of echo-contrast agent is another useful technique for the detection of vessel borders [98]. Although these injections assist in better visualization of the arterial segment, they can also interrupt continuous recording or even increase intracoronary pressure, which can result in erroneous geometric measurements of the vessel components.

2.3.2 AUTOMATED METHODS FOR IVUS ROI DETECTION

The vast amount of data obtained in a single IVUS sequence renders manual processing a tedious and time-consuming procedure. Furthermore, manually derived data are difficult to reproduce, because interobserver and intraobserver variability can reach up to 20% [99]. Accurate automated methods for the detection of the regions of interest in IVUS images improve the reproducibility and the reliability of quantitative measures of coronary artery disease. These methodologies usually take advantage of the characteristic appearance of the arterial anatomy in two-dimensional IVUS images and of the connectivity of frames in the entire IVUS sequence.

2.3.2.1 IVUS Image Preprocessing

IVUS frames contain noise, and the actual boundaries of the regions of interest (ROIs) are difficult to identify in many cases. A preprocessing step is essential in removing speckles and artifacts that can interfere with the detection of the desired boundaries.
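A minimal sketch of one such speckle-suppression step, median filtering (the window size is an illustrative choice):

```python
import numpy as np

def median_filter(image, window=3):
    """Median filtering, a simple speckle-suppression step: each pixel
    is replaced by the median of its local window. A brute-force
    implementation for clarity."""
    h, w = image.shape
    r = window // 2
    out = np.empty((h, w), dtype=float)
    # Edge padding keeps the window defined at the image borders.
    padded = np.pad(image.astype(float), r, mode="edge")
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + window, j:j + window])
    return out

frame = np.full((16, 16), 60.0)
frame[4, 4] = 255.0  # an isolated speckle/artifact pixel
smoothed = median_filter(frame, window=3)
```

Unlike linear smoothing, the median removes isolated outliers (speckles, bright marks) without blurring step edges, which is why it is a common choice before border detection.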
Usually, calibration marks are included in IVUS images for quantitative measurements, because they provide useful information about the real dimensions of the vessel. To remove the bright pixels constituting the calibration markers, their gray-level values are substituted by the average or the median value evaluated in the neighborhood of each pixel [100, 101]. This operation may be preceded by automated identification of the mark locations, based on their expected positions, and isolation of the corresponding pixels using thresholding techniques [100]. Furthermore, the detection of regions of interest in IVUS images is hindered by the existence of weak edges, and image enhancement is required. To enhance image features, common image-processing techniques are used: median filtering [99, 101–103], Gaussian smoothing [101, 102], and nonlinear diffusion filtering based on Euclidean shortening [102]. Repeated application of these filtering techniques is acceptable for noise reduction. For contrast enhancement, local columnwise histogram stretching can also be used [99]. A technique for blood-noise reduction (BNR) in the imaged vessel lumen has also been proposed [104]. This technique results in the edge enhancement of high-frequency
IVUS images using a combination of spatial and temporal filtering, before an automated algorithm for border detection is applied. The ratio between the high- and low-frequency components is calculated using a fast Fourier transform, and pixels are assigned as blood speckle or as tissue by thresholding this ratio. Different filtering techniques are then applied to blood and tissue. A limitation of the BNR algorithm arises from the hypothesis that tissue tends to be more consistent over time and space than blood noise. However, pulsating motion of the arterial wall during the cardiac cycle may disguise temporal or spatial fluctuations in the signals from the vessels and thus affect the performance of the method. Many techniques include a coordinate transformation [99, 100, 104, 108] to restore the original polar format of the image data from the Cartesian values. This results in the "straightening" of the borders of the regions of interest in IVUS images. The coordinate transformation allows rectangular kernel sizes and linear convolution (kernel-motion) paths, and it assists in the construction of searching graphs for the extraction of the desired region borders.

2.3.2.2 IVUS Image Segmentation

Segmentation of IVUS images is a difficult task because of their complexity. The efficiency of segmentation methods that include a combination of thresholding techniques, region growing, or dynamic contour models has been examined in several studies [99, 102]. In addition, more sophisticated techniques that exploit the expected pattern of the regions of interest in IVUS data have been proposed [100, 101, 103, 104, 106, 108, 109, 111, 129]. Some of the earlier work on the segmentation of IVUS images was based on heuristic graph-searching algorithms that identify an optimal path in a two-dimensional graph corresponding to the desired border [100, 104, 105].
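A simplified dynamic-programming version of such a minimum-cost path search on a polar-unwrapped cost image can be sketched as follows; the connectivity constraint and the cost values are illustrative, not those of the cited methods:

```python
import numpy as np

def min_cost_path(cost):
    """Dynamic-programming search for the minimum-cost path through a
    polar-unwrapped cost image: one border point per column (angle),
    each step moving at most one row (radius) to keep the border
    connected. A simplified stand-in for the cited graph-searching
    schemes."""
    rows, cols = cost.shape
    acc = cost.astype(float)          # accumulated costs
    back = np.zeros((rows, cols), dtype=int)  # backtracking pointers
    for j in range(1, cols):
        for i in range(rows):
            lo, hi = max(0, i - 1), min(rows, i + 2)
            k = lo + int(np.argmin(acc[lo:hi, j - 1]))
            acc[i, j] += acc[k, j - 1]
            back[i, j] = k
    # Backtrack from the cheapest endpoint in the last column.
    path = [int(np.argmin(acc[:, -1]))]
    for j in range(cols - 1, 0, -1):
        path.append(int(back[path[-1], j]))
    return path[::-1]

# Toy cost image: a low-cost "edge" along row 3 should attract the path.
c = np.ones((8, 10))
c[3, :] = 0.0
border = min_cost_path(c)
```

In the cited methods, the node costs encode edge strength and a priori shape knowledge, and the recovered path is mapped back from polar to Cartesian coordinates to form the closed vessel border.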
For the accurate identification of borders using graph searching, an appropriate cost function associated with the graph is necessary. Sonka et al. [100] have developed a method for detecting the internal and external elastic lamina and plaque-lumen interface. First, the searching space in the image, which includes the vessel except for the inner area of the lumen, is determined. After the application of two different edge operators, the resulting edge subimages are resampled in directions perpendicular to the outer or inner boundary of the ROI. Those images are used to construct the laminae-border detection graph and the plaque-border detection graph. Different cost functions are used for the detection of each border. A compromise between the edge information of the image and the a priori knowledge obtained from the shape of the ROI is achieved in the cost function. After assigning the appropriate cost in all nodes of each graph, the optimal path forming a closed boundary is defined as the path with the minimum sum of costs of all nodes of the path. The previously described BNR algorithm was combined with a graph-searching method for the detection of external elastic membrane (EEM) and lumen borders [104]. Gray images are converted into edge ones in rectangular format. A searching graph is constructed, with costs associated to the larger dynamic change of gray level, the direction of change, and the likelihood of intensity in a given ROI. A different searching strategy is performed for the detection of each border, and the path with the minimum
Medical-image processing and analysis for CAD systems
71
cumulative cost is generated, considering the continuity in connecting nodes. Finally, the searched paths are mapped back to the original image to form the desired border. A texture-based approach has also been proposed [111] for the segmentation of IVUS images. Textural operators were used to separate different tissue regions, and morphological processing was used to refine the extracted contours. The first step of the method is the extraction of texture features and the association of a feature vector with every image point. A neighborhood of 15×15 pixels was used for the extraction of the fraction of image in runs (FOUR) measure and the mean gray-level (MGL) measure. A histogram for each measure was constructed, and a threshold t for both histograms was automatically selected to maximize the interclass variance between the regions separated by the threshold t. Thus, since the lumen area is characterized by the absence of textural properties, all pixels with measure FOUR(x, y) below the threshold tFOUR are classified into the lumen region. Accordingly, all pixels with texture measures MGL(x, y) above the threshold tMGL are grouped into the adventitia region. Afterward, contour refinement was performed to remove errors due to noise or distortions. A priori knowledge about the size and shape of the blood vessel is used for the removal of inadequately shaped objects and the selection of appropriate structuring elements for the morphological processing that follows, which results in improvement of the detected contours. Methods that are based on the expected similarity of the regions of interest in adjacent IVUS frames and that take into account the fact that the sequence of frames constitutes a three-dimensional object have also been proposed [106–110]. Li et al. [106] used a combination of transversal and longitudinal contour-detection techniques on the entire IVUS image sequence.
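The automatic threshold selection described above, choosing t so as to maximize the interclass variance of a measure histogram, is essentially Otsu's method. A hedged sketch (the binning and the input measure are illustrative, not taken from [111]):

```python
import numpy as np

def otsu_threshold(values, n_bins=64):
    """Select the threshold t that maximizes the between-class
    (interclass) variance of a measure histogram, as used to separate
    lumen pixels from wall/adventitia pixels."""
    hist, edges = np.histogram(values, bins=n_bins)
    p = hist / hist.sum()                     # probability per bin
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)                         # weight of class below t
    w1 = 1.0 - w0                             # weight of class above t
    mu0 = np.cumsum(p * centers)              # cumulative first moment
    mu_t = mu0[-1]                            # global mean
    valid = (w0 > 0) & (w1 > 0)
    sigma_b = np.zeros(n_bins)
    sigma_b[valid] = (mu_t * w0[valid] - mu0[valid]) ** 2 \
                     / (w0[valid] * w1[valid])
    return centers[int(np.argmax(sigma_b))]   # threshold maximizing variance
```

The same routine would be applied once to the FOUR histogram and once to the MGL histogram to obtain tFOUR and tMGL.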
The first step in this technique is the reconstruction of longitudinal views of the vessel, using two perpendicular planes parallel to the longitudinal axis of the vessel. In these planes, the contours corresponding to the vessel and lumen borders are automatically detected using a minimum-cost algorithm. The longitudinal contours intersecting the planes of the transverse images are represented as edge points, guiding the final automated contour detection in the cross-sectional IVUS images by defining the positions through which the border line should pass. A cost matrix is constructed for each IVUS image, with very low values assigned to the four predefined points. With the application of the minimum-cost algorithm on the cost matrix, an optimal closed curve passing through these points is obtained, which forms the border of the region of interest. The same strategy is adopted in several studies on IVUS images [107, 109, 110]. A similar method that also includes stent detection in transversal images has been proposed [108]. The stent-contour detection is performed only in the transversal images because the appearance of the stent struts is much less regular in longitudinal planes. First, the image is polar-transformed using the catheter as the center. A cost matrix is used, whose element values depend on the intensity of the corresponding pixel and its distance toward the catheter, and weight factors are also determined. An initial model is created by applying the minimum-cost algorithm on the matrix, and a second detection of the strut locations is performed, resulting in a more refined stent contour. Longitudinal vessel detection is also performed, and the vessel contours are detected simultaneously. In particular, both sides of the vessel are searched for the selection of a strong transition at one side and a transition at the other side that best matches the morphologic continuity and the geometric characteristics of the vessel. Stent restrictions
Medical image analysis method
72
are also used, forcing the vessel contour to lie outside the already detected stent. In this way, limitations on contour detection that arise from the presence of calcified plaque or side branches in the images are overcome. The lumen contours are detected in the same longitudinal images using information about the location of the catheter, the previously defined vessel contour, and the gradient of the image. The contour detection in transversal images is guided by the attraction points extracted from the longitudinal contours. Segmentation methods based on active contour models have also been proposed for the processing of IVUS images [101, 103, 112]. The main advantage of active contour models ("snakes"), compared with traditional edge-detection approaches, is that they incorporate spatial and image information for the extraction of smooth borders of the regions of interest. An initial estimate of the desired border must be given, and the curve then deforms to obtain the final optimal shape. Thus, isolated artifacts are ignored when they interfere with the smoothness of the curve. A snake deforms under the influence of internal and external forces [113]. The position of the snake can be represented by the curve v(s) = [x(s), y(s)], where
s ∈ [0, 1] is the arc length, and x, y are the Cartesian coordinates of each point of the curve. The energy of the snake is given as

Esnake = ∫[0,1] [Eint(v(s)) + Eimage(v(s))] ds   (2.5)

where Eint represents the internal energy of the snake due to bending, and Eimage is derived from the image data. The use of active-contour principles is suitable for border detection in IVUS images because the desired borders are overall piecewise smooth with a low-variance curvature. Algorithms that are based on active-contour models have to overcome one major limitation arising from the classical snake properties. In particular, they must ensure that the initial contour is placed close enough to the desired solution to avoid unwanted convergence to a wrong (local) minimal solution. A method based on active-contour models is described in the literature [103]. The initial estimate of the ROI border is given by the observer at the first frame of the IVUS sequence, near the desired boundaries. The image force is appropriately modified to force the snake to rest at points that separate large homogeneous regions (placed on the boundaries of lumen/media and media/adventitia). The minimization of the energy function is performed by a Hopfield neural network [114]. The method has been further modified to detect the outer vessel boundary when calcium is present [129]. Under the perspective that the sequence of IVUS frames constitutes a three-dimensional object, active-contour principles in 3-D space can be used to extract the desired lumen and media/adventitia borders. An algorithm based on active-contour models in 2-D and its extension to 3-D is described in the literature [112]. The initial contour is placed around the IVUS catheter, and it can be represented by r = r(θ). The contour evolves under the influence of three forces: the internal force, the image force, and the balloon force. Thus
Ftotal(r) = Fint(r) + Fimage(r) + Fbal(r)   (2.6)

The "balloon" force is added to the energy of the active-contour model and causes the contour to inflate until the desired borders are detected. The application of the 2-D algorithm results in a set of contours, which are then combined to form a 3-D surface and used as the initial guess for the 3-D algorithm, in which appropriate modifications to the forces and the representation of the contour are introduced. A three-dimensional segmentation technique has been developed [101] for the detection of luminal and adventitial borders in IVUS sequences. The method is based on the deformation of a template by the features present in the 3-D image. This algorithm is a 3-D extension of the digital dynamic contour (DDC) model reported by Lobregt and Viergever [115]. The model comprises vertices (which are associated with net force, acceleration, and velocity) connected by edges. While the vertices of the model move, the initial contour deforms under the influence of internal and external forces and a third, damping force that helps to bring the model to rest. The contour obtains its final shape when the velocity and the acceleration of the vertices become zero. Expanding the DDC algorithm to three dimensions, a cylindrical shape is adopted as the initial surface model, and it is allowed to deform under the influence of the same three forces. The model is composed of vertices, determined in individual contours, and connections between them are then defined. The internal force applied to this model depends on transverse and longitudinal curvature vectors. Its transverse and longitudinal components are given by Equations 2.7 and 2.8, respectively, in which a unit radial vector at vertex Vi,j appears, and the magnitudes of the transverse and longitudinal internal forces are properly defined. The external force is the gradient of a 3-D potential field that results from the preprocessing of IVUS images, and it can be decomposed into two tangential and one radial component.
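The vertex dynamics of such a dynamic contour can be sketched in a few lines. This is a simplified 2-D illustration of a Lobregt-Viergever-style update, not the published 3-D implementation; the force weights, the unit vertex mass, and the curvature approximation are all assumptions:

```python
import numpy as np

def ddc_step(P, V, f_ext, w_int=0.5, w_ext=1.0, damp=0.6, dt=1.0):
    """One update of a discrete dynamic contour: each vertex carries a
    velocity; the net force is the weighted sum of an internal
    (curvature) force, an external (image) force, and a damping force
    opposing the velocity. P, V, f_ext: (n, 2) arrays."""
    # internal force: neighbors' midpoint minus the vertex (discrete curvature)
    f_int = 0.5 * (np.roll(P, 1, axis=0) + np.roll(P, -1, axis=0)) - P
    F = w_int * f_int + w_ext * f_ext - damp * V   # damping decelerates
    V = V + dt * F                                  # unit vertex mass
    P = P + dt * V
    return P, V

def relax(P, f_ext_fn, tol=1e-3, max_iter=500):
    """Iterate until vertex velocities (i.e., contour motion) die out."""
    V = np.zeros_like(P)
    for _ in range(max_iter):
        P, V = ddc_step(P, V, f_ext_fn(P))
        if np.abs(V).max() < tol:                  # model has come to rest
            break
    return P
```

With the external force set to zero, a closed contour simply shrinks under its own curvature force until the damping brings it to rest; in the real method the external (image) force halts the contour at the detected border.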
The damping force is a decelerating force acting at vertex Vi,j; it is proportional to, and directed opposite to, the vertex velocity vi,j.

2.3.3 LIMITATIONS IN QUANTITATIVE IVUS IMAGE ANALYSIS

Many restrictions in automated segmentation of IVUS images derive from the quality of the image, such as the lack of homogeneity of the regions of interest and the shadowed regions produced by the presence of calcium. The complicated structure of human vessels and the different components in each part result in an image with high-intensity variations, even in regions corresponding to the same tissue. In addition,
calcified, hard-plaque regions are typically identified by high-amplitude echo signals with complete distal shadowing. Consequently, it is not possible to identify the morphology of the outer layers of the arterial segment, and in the absence of contextual information from image frames adjacent in space and time, single-frame IVUS images are difficult to analyze, even for the most experienced observers. It should be noted that systolic-diastolic image artifacts frequently limit the clinical applicability of automated analysis systems. A method of limiting cyclic artifacts in IVUS images is based on electrocardiogram-gated (ECG-gated) image acquisition, which is extensively used to overcome the problem of vessel distensibility and cardiac movement. The principle of ECG-gated image acquisition is described by von Birgelen et al. [109]. A workstation is used for the reception of a video input from the IVUS machine and the ECG signal from the patient. Upper and lower limits for acceptable RR intervals, i.e., the time duration between two consecutive QRS complexes, are defined (mean value ±50 msec) before image acquisition begins. Images are acquired 40 msec after the peak of the R wave, digitized, and stored in the computer. If an RR interval is too long or too short, images are rejected, and the transducer does not move until the image can be acquired during a heart cycle with the appropriate RR interval length. After an image is acquired, the IVUS transducer is withdrawn in axial 0.2-mm increments through the stationary imaging sheath to acquire the next image at that site. In general, ECG-gated image acquisition, when combined with an automated boundary-detection method, results in much smoother vessel boundaries. In many cases, images of an IVUS sequence are excluded from further analysis because of the problems they exhibit.
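The acceptance logic of the ECG-gated protocol follows directly from the description above. The timings (±50 msec tolerance, 40 msec post-R trigger, 0.2-mm steps) are those reported for the von Birgelen et al. protocol [109]; the function itself and its return format are illustrative:

```python
def ecg_gate(rr_intervals, mean_rr, tolerance=50):
    """Simulate ECG-gated IVUS acquisition: a frame is accepted only when
    the current RR interval lies within mean RR +/- tolerance (msec).
    Imaging is triggered 40 msec after the R peak, and the transducer is
    withdrawn 0.2 mm only after a frame has been accepted."""
    position_mm = 0.0
    acquired = []
    for rr in rr_intervals:
        if abs(rr - mean_rr) <= tolerance:
            acquired.append((position_mm, 40))   # (site, msec after R wave)
            position_mm += 0.2                   # step to the next site
        # otherwise: frame rejected, transducer stays at the same site
    return acquired
```

A rejected beat therefore costs time but not spatial coverage: the transducer simply waits at the same site for the next acceptable cardiac cycle.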
Common problems in IVUS sequences are poor image quality, side-branch attachments in the vessel under examination, extensive nonuniform rotational distortion, extensive calcification of the vessel wall, and excessive shadows caused by stent struts. The accuracy of the proposed segmentation algorithms would ideally be determined by comparing the automatically extracted borders with the real borders of the regions of interest. However, it is difficult to assess the accuracy and reliability of the suggested methods because the precise size and shape of the arterial segment are unknown in vivo. For that reason, manual tracing is used as the "gold standard," and the information that is often used is the location of these borders as given by experienced observers, whose opinions generally differ.

2.3.4 PLAQUE CHARACTERIZATION IN IVUS IMAGES

Plaque composition has been shown to correlate with clinical variables in atherosclerotic coronary artery disease [116, 117]. The composition of the plaque can be identified in IVUS images, as demonstrated in several studies in comparison with histology [118, 119]. The classification of plaque into regions of soft (cellular), hard (fibrocalcific), and calcified plaque is based on the characteristic appearance of each in IVUS images. The components of soft plaque (highly cellular areas of intimal hyperplasia, cholesterol, thrombus, and loose connective tissue) appear in IVUS images as regions of low contrast and homogeneous texture. On the other hand, regions of hard plaque, which may also contain calcium, are characterized by bright echoes (similar to the adventitia) and heterogeneous texture, and they are often trailed by shadowed areas.
An automated method for assessing plaque composition in IVUS images has been proposed by Zhang et al. [105]. The method proposed by Sonka et al. [100] was used to detect the borders of the lumen and media/adventitia in the entire IVUS sequence. To assess plaque composition, narrow wedges, called elementary regions, were defined in the plaque regions, and a classification label describing soft or hard plaque was assigned to each. To classify elementary regions, several texture-feature measurements were computed. Gray-level-based texture descriptors, such as histogram contrast, skewness, kurtosis, dispersion, variance, and the radial-profile property, were calculated for each elementary region. Co-occurrence matrices were used, and features such as energy, entropy, maximum probability, contrast, and inverse difference moment were computed. Two run-length features (short-primitives emphasis and long-primitives emphasis) as well as the Brownian fractal dimension were also calculated. After these features had been calculated, correlated ones were removed; among all features, the radial profile, long-run emphasis, and the fractal dimension were identified as the best features for classifying soft and hard plaques in IVUS images. These features were used for the training of a classifier with piecewise-linear discrimination functions. Afterward, each elementary region was classified as containing soft or hard plaque. For the hard-plaque regions, a further classification into hard-plaque and shadow subregions was performed. When the classification had been applied to the entire IVUS sequence, the plaque type of each pixel was determined as the majority type among the pixels at the same spatial location in a total of seven consecutive frames. In the study of Vince et al. [120], the efficacy of texture-analysis methods in identifying plaque components was assessed in vitro.
IVUS images were captured, and regions of interest were identified by microscopic examination of the histological sections. Three plaque classes were considered: calcified, fibrous (dense collagenous tissue), and necrotic core (lipidic pool with evident necrosis). Texture-analysis procedures were applied to the regions of interest, and the following statistical techniques were evaluated: first-order statistics, Haralick's method, Laws's texture energies, neighborhood gray-tone difference matrices (NGTDM), and the texture-spectrum method. The selection of these methods was based on their ability to differentiate soft-tissue and textural patterns in two-dimensional gray-scale images. After the implementation of these approaches, classification of the texture features was performed, and the clustering ability of each of the examined texture-analysis techniques was assessed. Haralick's method demonstrated tight clustering of calcified, fibrous, and necrotic regions with no overlap.
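The co-occurrence (Haralick-type) features named in both studies are derived from a gray-level co-occurrence matrix. A minimal sketch, assuming a single displacement vector and a small number of gray levels (real analyses quantize the image and average several displacements and directions):

```python
import numpy as np

def cooccurrence_features(img, dx=1, dy=0, levels=8):
    """Gray-level co-occurrence matrix for displacement (dx, dy) and the
    features named in the text: energy, entropy, maximum probability,
    contrast, and inverse difference moment.
    img: 2-D array of integers in [0, levels)."""
    glcm = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            glcm[img[y, x], img[y + dy, x + dx]] += 1
    p = glcm / glcm.sum()                 # joint probability matrix
    i, j = np.indices((levels, levels))
    nz = p > 0                            # avoid log(0) in the entropy
    return {
        "energy": float((p ** 2).sum()),
        "entropy": float(-(p[nz] * np.log2(p[nz])).sum()),
        "max_probability": float(p.max()),
        "contrast": float(((i - j) ** 2 * p).sum()),
        "idm": float((p / (1.0 + (i - j) ** 2)).sum()),
    }
```

A perfectly homogeneous region yields energy 1 and contrast 0, which is why such features separate the smooth, low-contrast soft plaque from the heterogeneous hard plaque.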
FIGURE 2.6 (a) Estimation of the three-dimensional trajectory path from
the biplane angiographical data; (b) mapping of IVUS frames along the pullback path in three-dimensional space.

2.3.5 THREE-DIMENSIONAL RECONSTRUCTION

Three-dimensional reconstruction of the vessel based on IVUS yields more information than two-dimensional IVUS imaging alone in the visualization and assessment of coronary artery disease and the choice of intervention. To produce three-dimensional renderings of vessel geometry, approaches that rely exclusively on IVUS data perform a straight stacking of adjacent frames [107, 121, 122]. However, these approaches do not account for the real spatial geometry of the coronary artery, completely neglecting the influence of vessel curvature, which induces an error in quantitative measurements of the vessel [123]. In general, the determination of the position in 3-D space of an object whose shape and size are unknown requires more than one view. For that reason, techniques have recently been developed [124–127] to reconstruct the true spatial geometry by combining IVUS and biplane angiography. These two modalities are complementary and suitable for fusion, since biplane angiography provides longitudinal projections of the vessel lumen, while IVUS provides transversal cross sections of the lumen and the wall. The main concept of these approaches is illustrated in Figure 2.6. From the angiographical data, a reconstruction of the catheter path during its pullback in 3-D space (i.e., the pullback path) is obtained, and IVUS images are placed appropriately along this path. The steps of this procedure are depicted in Figure 2.7. Several sources of error can affect the accuracy of the 3-D vessel model. Apart from the problems associated with each modality, problems closely related to the fusion of the two imaging modalities, such as the determination of the pullback path, the estimation of the catheter twist, and the absolute orientation of the IVUS frame sequence, need to be resolved.
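The geometric core of the fusion, two projections that share the axial coordinate, reduces to a few lines under idealized assumptions (parallel, exactly orthogonal projections and perfect calibration, none of which hold for a real gantry, where perspective geometry must be modeled):

```python
def fuse_biplane_points(proj_a, proj_b):
    """Recover 3-D path points from two ideal, mutually orthogonal
    parallel projections: proj_a supplies (x, z) pairs, proj_b supplies
    (y, z) pairs for the same points along the catheter.
    This is only the geometric core of the fusion idea."""
    points = []
    for (x, za), (y, zb) in zip(proj_a, proj_b):
        # both views must agree on the shared axial position
        assert abs(za - zb) < 1e-6
        points.append((x, y, 0.5 * (za + zb)))
    return points
```

In practice the two views are related by a calibrated perspective transform, so corresponding points are triangulated rather than simply concatenated, but the principle, one missing coordinate per view, is the same.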
The accurate estimation of the pullback path in 3-D space is important for the correct positioning and orientation of the IVUS images in 3-D space. The pullback path in the biplane angiograms can be approximated either by the vessel centerline or by the location of the ultrasound transducer in the vessel. In the first case, problems of overshadowed catheters are overcome, but an angular error occurs whenever the catheter centerline is off the lumen centerline [125]. In the second case, however, a sequence of biplane angiograms needs to be recorded over the entire IVUS catheter pullback length. Longitudinal catheter twist is an interframe distortion that affects the rotational orientation of the IVUS frames along the pullback path. Consequently, the reconstructed plaque may be located incorrectly at the inner side of a vessel bend while it is actually located at the outer side. Finally, it is essential to determine the correct absolute axial orientation of the resulting IVUS frame set. The problem is comparable to fitting a sock on a leg [126]. While the leg is stable (catheter path), the sock (axial orientation of the IVUS frame set) can be rotated freely around the leg, and it fits optimally in only one axial orientation.
FIGURE 2.7 Basic steps of the fusion procedure of IVUS and angiographical data.

One of the earliest studies of three-dimensional reconstruction of vessel morphology from X-ray projections and IVUS data was proposed by Pellot et al. [124]. A well-defined acquisition protocol was used, and pairs of X-ray control projections and IVUS images were acquired for each position of the transducer as it was manually withdrawn by small distances in the vessel. For the extraction of the IVUS transversal contours, a fuzzy classification technique was performed, followed by mathematical morphology operators. A dynamic tracking algorithm was applied to the angiographical images to extract the vessel's longitudinal contours. A geometric model was adopted for the representation of the acquisitions in a unique reference frame. The registered IVUS contours were linearly interpolated to extract a regularly sampled 3-D surface with the same resolution as the angiography. This 3-D surface constitutes an approximate geometric reconstruction of the vessel using IVUS and X-ray images. The 3-D registered data are then combined with the X-ray densitometric information to refine the preliminary contours at each slice. For that purpose, the researchers used a probabilistic reconstruction process based on Markovian modeling associated with a simulated-annealing optimization algorithm. Prause et al. [125] focused on the estimation of IVUS catheter twist during pullback. They report an algorithm for the calculation of tortuosity-induced catheter twist that is based on sequential triangulation of the three-dimensional pullback path. In brief, the method is described as follows. Each frame is described by its location in the entire IVUS sequence. The consecutive IVUS frames i and i+1 are located halfway between three sequential points Pi, Pi+1, Pi+2 of the pullback path, at points Si = (Pi + Pi+1)/2 and Si+1 = (Pi+1 + Pi+2)/2. The images are
perpendicular to the tangent vectors PiPi+1 and Pi+1Pi+2, respectively. To determine the orientation of IVUS image i+1, the already known orientation of image i is used. Thus, the orientation of image i+1 is determined by rotating image i around the normal vector at the center of the circumscribed circle of the triangle (Pi, Pi+1, Pi+2). Then, the center of image i+1 is shifted to point Si+1. If the points Pi, Pi+1, Pi+2 are collinear, the calculation of image i+1 reduces to a simple translation along the pullback path. An important advantage of this approach is that if there are single images in the pullback sequence that are rotationally adjusted by anatomic landmarks, the orientation of the remaining frames is fixed or can be interpolated. Another method for three-dimensional reconstruction of the vessel based on the fusion of IVUS and angiographical images has been proposed by Wahle et al. [126]. The angiographical images were processed to estimate the geometry, extract the catheter path, and reconstruct the three-dimensional trajectory. The geometry is initially extracted from the parameters as read from the device and refined afterward from a set of given reference points. For the extraction of the catheter path in the biplane angiograms, the most distal location of the transducer and the location at or proximal to the end of the pullback are interactively marked. The path of the catheter as well as the two edges of the vessel-lumen outline can be extracted with the use of dynamic programming. The three-dimensional reconstruction of the trajectory is obtained using a well-established and validated three-dimensional reconstruction approach [128]. IVUS image processing includes the extraction of regions of interest using the previously described algorithm [100]. The fusion process starts with the localization of the IVUS frames on the 3-D path, assuming constant pullback speed and a fixed number of images per millimeter of pullback.
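The sequential-triangulation step described above can be sketched as follows: an orientation vector is carried from frame to frame by rotating it about the normal of the plane through (Pi, Pi+1, Pi+2) by the angle between consecutive tangents, with collinear triples reducing to a pure translation. This simplified version propagates a single in-plane vector and omits the shift of the frame center to Si+1; it is an illustration of the idea, not the published algorithm:

```python
import numpy as np

def rotate(v, axis, angle):
    """Rodrigues' rotation of vector v about the unit vector axis."""
    return (v * np.cos(angle)
            + np.cross(axis, v) * np.sin(angle)
            + axis * np.dot(axis, v) * (1 - np.cos(angle)))

def propagate_orientation(P, u0):
    """Carry an orientation vector u0 along pullback-path points P by
    sequential triangulation over consecutive triples (Pi, Pi+1, Pi+2)."""
    us = [u0]
    for i in range(len(P) - 2):
        t1 = P[i + 1] - P[i]                 # tangent of segment i
        t2 = P[i + 2] - P[i + 1]             # tangent of segment i+1
        n = np.cross(t1, t2)                 # normal of the triangle plane
        if np.linalg.norm(n) < 1e-12:        # collinear: translation only
            us.append(us[-1])
            continue
        n = n / np.linalg.norm(n)
        cosang = np.dot(t1, t2) / (np.linalg.norm(t1) * np.linalg.norm(t2))
        ang = np.arccos(np.clip(cosang, -1.0, 1.0))
        us.append(rotate(us[-1], n, ang))    # rotate by the bend angle
    return us
```

On a straight path the orientation is simply carried along unchanged, which is the collinear special case noted in the text; on a bent path the vector stays perpendicular to the local tangent without any artificial axial twist.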
The local behavior of the pullback path in 3-D can be described using the Serret-Frenet formulas, and based on this theory, an analytical model of the catheter is obtained. The relative twist is estimated using the method proposed by Prause et al. [125], and the amount of twisting, i.e., the presumed error if the torsion is not considered during the reconstruction, is quantified using a reference plane. For the estimation of the absolute orientation in 3-D space, the bending behavior of the catheter is used as a reference. The IVUS catheter tends to take a position of minimum bending energy inside a tortuous vessel, resulting in an out-of-center position of the catheter relative to the inner lumen. Three-dimensional out-of-center vectors are generated from the contour center to the catheter position. A single correction angle φcorr is determined and applied to all IVUS frames relative to the initial contour. After the 3-D mapping of the IVUS data, a surface model of the vessel can be displayed. The validation of the method included computer simulation, in which the method showed excellent results, as well as phantom and in vitro studies, which uncovered the influence of several sources of distortion caused mainly by mechanical components of the setup. A method for 3-D reconstruction of complex blood-vessel geometries from IVUS images has been proposed by Subramanian et al. [127]. This technique uses biplane fluoroscopy to image the catheter tip at a few important points along the length of the vessel in order to estimate the curvature of the vessel. A reference direction is determined and maintained throughout the acquisition. Afterward, the 3-D coordinates of the catheter tip are determined from the X-ray images, the path is estimated, and the IVUS images are located along the path. The catheter tip is located manually within each X-ray image. A
coordinate system (x, y, z) is used, where the vessel's longitudinal axis lies approximately along the z-axis and the two X-ray projection images determine the (x, z) and (y, z) coordinates, respectively. After the 3-D points representing the locations of the catheter tip have been recovered, they are converted to physical units (mm) and normalized so that the first point becomes the origin. The catheter path is estimated by fitting an interpolating cubic spline (Kochanek-Bartels spline) through the points. The location of each IVUS frame along the catheter path is determined by uniformly sampling the spline at a number of points equal to the number of IVUS images to be used. Each IVUS image is positioned so that the catheter tip is on the spline and the image is orthogonal to the tangent vector at this point. The orientation of an IVUS image on the catheter path is estimated using two unit vectors orthogonal to the catheter path. These vectors are calculated from cross products with the tangent vectors at each point (Equations 2.9 and 2.10, for i = 1, …, n−1, where × indicates the vector cross product). The initial vector can be chosen arbitrarily, provided that it does not coincide with the first tangent vector. Each of the images is rotated by an amount depending on the path of the catheter tip. Finally, the 3-D volume is determined by associating an echo intensity with all lattice points of the volume. In vitro validation of the method gave very promising results.

2.4 CONCLUSIONS

Medical imaging modalities provide useful information about the internal structure and function of the human body. Specific modalities are utilized to depict different tissues. Identification and characterization of pathological findings require considerable effort and skill on the part of the radiologist. The complexity of the examined images in many cases requires a second opinion or further analysis to avoid misinterpretations. CAD systems can provide additional data that can increase the efficiency of interpretations. Extensive tests and additional research projects aimed at improving CAD performance are under evaluation in an effort to increase doctors' confidence in CAD systems. CAD systems in mammography, and especially in microcalcification detection and diagnosis, could provide remarkable support as a "second opinion" tool, improving the
effectiveness of the decision-making procedure. However, further study is needed to eliminate falsely detected objects. Improvement of the segmentation and classification algorithms in CAD systems could increase the performance of these schemes in the classification and characterization of pathological findings as malignant or benign. Such progress would increase the benefits of these systems by eliminating or minimizing unnecessary biopsies. Further testing is needed using the large databases available to researchers as well as the original mammograms obtained from clinical routine and from screening-population projects. The contribution of CAD systems is also important in the interpretation of medical data obtained by other imaging modalities. In the interpretation of intravascular ultrasound images, CAD systems are beneficial because they can efficiently identify possible abnormalities that might not be recognized by the expert observer. Real-time depiction of the arterial wall, determination of plaque composition, and quantitative measurements obtained during clinical routine are considered to be critical components of a CAD system. Sophisticated methods for automatically extracting useful information from IVUS images are still under development, and 3-D reconstruction of the vessel has become available. The methods described in this chapter provide a more comprehensive understanding and a more detailed characterization of coronary artery disease, which could result in better and less-invasive patient treatment. Today, medical-image-processing techniques are used in several CAD systems. The processing of images from different modalities must achieve high performance if it is to be utilized in clinical environments. The use of CAD systems in medical applications addresses a well-recognized clinical weakness of the diagnostic process and also complements the radiologists' perceptive abilities.
However, the increased interest and striking expansion of research in the field of CAD systems provide fertile conditions for further development. More sophisticated and productive approaches might lead to high-efficiency CAD systems that will be essential components of modern diagnostic practice. Those systems will be based on resourceful image-processing techniques followed by intelligent analysis methods. Further evaluation of the diagnostic performance of the proposed systems is an important task that should be conducted under clinical conditions.

REFERENCES

1. Baker, J.A. et al., Breast cancer: prediction with artificial neural network based on BI-RADS standardized lexicon, Radiology, 196, 817, 1995.
2. Wu, Y. et al., Artificial neural networks in mammography: application to decision making in the diagnosis of breast cancer, Radiology, 187, 81, 1993.
3. Sahiner, B. et al., Feature selection and classifier performance in computer-aided diagnosis: the effect of finite sample size, Med. Phys., 27, 1509, 2000.
4. Roehrig, J. and Castellino, R.A., The promise of computer-aided detection in digital mammography, Eur. J. Radiol., 31, 35, 1999.
5. Smith, R.A., Epidemiology of breast cancer, in A Categorical Course in Physics: Technical Aspects of Breast Imaging, 2nd ed., RSNA Publications, Oak Brook, IL, 1993, p. 21.
6. Nishikawa, R.M. et al., Performance of a prototype clinical intelligent mammography workstation, in Digital Mammography '96, Doi, K., Giger, M.L., Nishikawa, R.M., and Schmidt, R.A., Eds., Elsevier, Amsterdam, 1996, p. 93.
7. Roehrig, J. et al., Clinical results with the R2 ImageChecker system, in Digital Mammography, Karssemeijer, N., Thijssen, M., Hendriks, J., and van Erning, L., Eds., Kluwer Academic Publishers, Dordrecht, Netherlands, 1998, p. 395.
8. Becker, H. et al., Digital computer determination of a medical diagnostic index directly from chest X-ray images, IEEE Trans. Biomed. Eng., BME-11, 67, 1964.
9. Meyers, P. et al., Automated computer analysis of radiographic images, Radiology, 83, 1029, 1964.
10. Winsberg, F. et al., Detection of radiographic abnormalities in mammograms by means of optical scanning and computer analysis, Radiology, 89, 211, 1967.
11. Ackerman, L.V. and Gose, E.E., Breast lesion classification by computer and xerography, Cancer, 30, 1025, 1972.
12. Ackerman, L.V. et al., Classification of benign and malignant breast tumors on the basis of 36 radiographic properties, Cancer, 31, 342, 1973.
13. Wee, W.G. et al., Evaluation of mammographic calcifications using a computer program, Radiology, 116, 717, 1975.
14. Fox, S.H. et al., A computer analysis of mammographic microcalcifications: global approach, in Proc. IEEE 5th International Conference on Pattern Recognition, IEEE, New York, 1980, p. 624.
15. Spiesberger, W., Mammogram inspection by computer, IEEE Trans. Biomed. Eng., 26, 213, 1979.
16. Chan, H.P. et al., Image feature analysis and computer-aided diagnosis in digital radiography: automated detection of microcalcifications in mammography, Med. Phys., 14, 538, 1987.
17. Fam, B.W. et al., Algorithm for the detection of fine clustered calcifications on film mammograms, Radiology, 169, 333, 1988.
18. Chan, H.P. et al., Computer-aided detection of microcalcifications in mammograms: methodology and preliminary clinical study, Invest. Radiol., 23, 664, 1988.
19. Chan, H.P. et al., Improvement in radiologists' detection of clustered microcalcifications on mammograms: the potential of computer-aided diagnosis, Invest. Radiol., 25, 1102, 1990.
20. Pisano, E.D. et al., Image processing algorithms for digital mammography: a pictorial essay, Radiographics, 20, 1479, 2000.
21. Bick, U. et al., A new single-image method for computer-aided detection of small mammographic masses, in Proc. CAR: Computer Assisted Radiology, Lemke, H.U., Inamura, K., Jaffe, C.C., and Vannier, M.W., Eds., Springer, Berlin, 1995, p. 357.
22. Pizer, S.M., Zimmerman, J.B., and Staab, E.V., Adaptive gray-level assignment in CT scan display, J. Comput. Assist. Tomogr., 8, 300, 1984.
23. Pizer, S.M. et al., Adaptive histogram equalization and its variations, Comput. Vision, Graphics, Image Processing, 35, 355, 1987.
24. Vuylsteke, P. and Schoeters, E., Multiscale image contrast amplification (MUSICA), Proc. SPIE, 2167, 551, 1994.
25. Rangayyan, R.M. et al., Improvement of sensitivity of breast cancer diagnosis with adaptive neighborhood contrast enhancement of mammograms, IEEE Trans. Inf. Technol. Biomed., 1, 161, 1997.
26. Morrow, W.M. et al., Region-based contrast enhancement of mammograms, IEEE Trans. Medical Imaging, 11, 392, 1992.
27. Mallat, S., A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. Pattern Analysis Machine Intelligence, 11, 674, 1989.
28. Laine, A.F. et al., Mammographic feature enhancement by multiscale analysis, IEEE Trans. Medical Imaging, 13, 725, 1994.
29. Sakellaropoulos, P., Costaridou, L., and Panayiotakis, G., A wavelet-based spatially adaptive method for mammographic contrast enhancement, Phys. Med. Biol., 48, 787, 2003.
30. Cheng, H.D., Yui, M.L., and Freimanis, R.I., A novel approach to microcalcification detection using fuzzy-logic technique, IEEE Trans. Medical Imaging, 17, 3, 1998.
Medical image analysis method
82
31. Li, H. et al., Fractal modeling of mammogram and enhancement of microcalcifications, lEEE Nucl Sci. Symp. Medical Imaging Conf., 3, 1850, 1996. 32. Li, H., Liu, K.J.R., and Lo, S.C.B., Fractal modelling and segmentation for the enhancement of microcalcifications in digital mammograms, IEEE Trans. Medical Imaging, 16, 785, 1997. 33. Li, H., Liu, K.J.R., and Lo, S.B., Fractal modeling of mammograms and enhancement of microcalcifications, in Proc. IEEE Medical Imaging Conference, Anaheim, 1996, p. 1850. 34. Shen, L., Rangayyan, R.M., and Desautels, J.E.L., Detection and classification of mammographic calcifications, Int. J. Pattern Recognition Artif. Intelligence, 1, 1403, 1993. 35. Zheng, B. et al., Computer-aided detection of clustered microcalcifications in digi- tized mammograms, Acad. Radiol, 2, 655, 1995. 36. Gurcan, M.N., Yardimci, Y., and Cetin, A.E., Digital Mammography, Kluwer Aca- demic Publishers, Nijmegen, Netherlands, 1998, p. 157. 37. Bankman, I.N. et al., Segmentation algorithms for detecting microcalcifications in mammograms, IEEE Trans. Inf. Technol. Biomed., 1, 141, 1997. 38. Betal, D., Roberts, N., and Whitehiuse, G.H., Segmentation and numerical analysis of microcalcifications on mammograms using mathematical morphology, Br. J. Radiol., 70, 903, 1997. 39. Karssemeijer, N., Adaptive noise equalisation and recognition of microcalcification clusters in mammograms, Int. J. Pattern Recognition Artif. Intelligence, 7,1357,1993. 40. Chen, C.H. and Lee, G.G., On digital mammogram segmentation and microcalcifi- cation detection using multiresolution wavelet analysis, Graph. Mod. Image Proc., 59, 349, 1997. 41. Veldkamp, W.J.H. and Karssemeijer, N., in Digital Mammography Nijmegen 98, Karssemeijer, N., Thijssen, M., Hendriks, J., and van Erning, L., Eds., Kluwer Aca- demic Publications, Amsterdam, 1998, p. 160. 42. Strickland, R.N. and Hahn, H.I., Wavelet transform methods for object detection and recovery, IEEE Trans. Image Process., 6, 724, 1997. 43. 
Netsch, T. and Peitgen, H.O., Scale-space signatures for the detection of clustered microcalcifications in digital mammograms, IEEE Trans. Medical Imaging, 18, 774, 1999. 44. Lefebvre, F. et al., A fractal approach to the segmentation of microcalcifications in digital mammograms, Med. Phys., 22, 381, 1995. 45. Cheng, H.D. and Xu, H., A novel fuzzy-logic approach to mammogram contrast enhancement, Inf. Sci., 148, 167, 2002. 46. Kim, J.K. et al., Detection of clustered microcalcifications on mammograms using surrounding region dependence method and artificial neural network, J. VLSI Signal Process., 18, 251, 1998. 47. Te Brake, G.M. and Karssemeijer, N., Single and multiscale detection of masses in digital mammograms, IEEE Trans. Medical Imaging, 18, 628, 1999. 48. Yu, S. and Guan, L., A CAD system for the automatic detection of clustered microcalcifications in digitized mammogram films, IEEE Trans. Medical Imaging, 19, 115, 2000. 49. Kupinski, M.A. and Giger, M.L., Feature selection with limited datasets, Med. Phys., 26, 2176, 1999. 50. Karssemeijer, N., Adaptive noise equalisation and recognition of microcalcification clusters in mammograms, Int. J. Pattern Recognition Artif. Intelligence, 7,1357,1993. 51. Chan, H.P. et al., Improvement in radiologists’ detection of clustered microcalcifica- tions on mammograms: the potential of computer-aided diagnosis, Invest. Radiol., 25, 1102, 1990. 52. Strickland, R.N. and Hahn, H.I., Wavelet transforms for detecting microcalcifications in mammograms, IEEE Trans. Medical Imaging, 15, 218, 1996. 53. Netsch, T., A scale-space approach for the detection of clustered microcalcifications in digital mammograms, in Digital Mammography ‘96, Proc. 3rd Int. Workshop Digital Mammography, Univ. of Chicago, Chicago, 1996, p. 301. 54. Haykin, S., Neural Networks: A Comprehensive Foundation, 2nd ed., Macmillan College Publishing, New York, 1998.
Medical-image processing and analysis for CAD systems
83
55. Papadopoulos, A., Fotiadis, D.I., and Likas, A., An automatic microcalcifications detection system based on a hybrid neural network classifier, Artif. Int. Med., 25, 149, 2002. 56. Sahiner, B. et al., Design of a high-sensitivity classifier based on genetic algorithm: application to computer-aided diagnosis, Phys. Med. Biol, 43, 2853, 1998. 57. Yu, S. and Guan, L., A CAD system for the automatic detection of clustered microcalcifications in digitized mammogram films, IEEE Trans. Medical Imaging, 19, 115, 2000. 58. Sahiner, B. et al., Image feature selection by a genetic algorithm: application to classification of mass and normal breast tissue, Med. Phys., 23, 1671, 1996. 59. Sahiner, B. et al., Effects of sample size on feature selection in computer-aided diagnosis, in Proc. SPIE Medical Imaging, 3661, 499, 1999. 60. Gavrielides, M.A., Lo, J.Y., and Floyd, C.E., Parameter optimization of a computeraided diagnosis scheme for the segmentation of microcalcification clusters in mam- mograms, Med. Phys., 29, 475, 2002. 61. Chan, H.P. et al., Improvement of radiologists’ characterization of mammographic masses by using computer-aided diagnosis: an ROC study, Radiology, 212, 817,1999. 62. Woods, K.S. et al., Comparative evaluation of pattern recognition techniques for detection of microcalcifications in mammography, Int. J. Pattern Recognition Artif. Intelligence, 7, 1417, 1993. 63. Dhawan, A.P., Analysis of mammographic microcalcifications using gray-level image structure features, IEEE Trans. Medical Imaging, 15, 246, 1996. 64. Leichter, I. et al., Optimizing parameters for computer-aided diagnosis of microcal- cifications at mammography, Acad. Radiol., 7, 406, 2000. 65. Chan, H.P. et al., Computerized analysis of mammographic microcalcifications in morphological and texture feature spaces, Med. Phys., 25, 2007, 1998. 66. Qi, H. and Snyder, W.E., Lesion detection and characterization in digital mammography by Bezier histograms, J. Digital Imaging, 12, 81, 1998. 67. 
Chan, H.P. et al., Image feature analysis and computer-aided diagnosis in digital radiography: 1, automated detection of microcalcifications in mammography, Med. Phys., 14, 538, 1987. 68. Davies, D.H. and Dance, D.R., Automated computer detection of clustered calcifica- tions in digital mammograms, Phys. Med. Biol, 35, 1111, 1990. 69. Bhagale, T., Desai, U.B., and Sharma, U., An unsupervised scheme for detection of microcalcifications on mammograms, IEEE Int. Conf. Image Processing, 2000, p. 184. 70. Fukunaga, K., Introduction to Statistical Pattern Recognition, 2nd ed., Academic Press, New York, 1990. 71. Tsujii, O., Freedman, M.T., and Mun, S.M., Classification of microcalcifications in digital mammograms using trend-oriented radial basis function neural network, Pat- tern Recognition, 32, 891, 1999. 72. Raghu, P.P. and Yegnanarayana, B., Multispectral image classification using Gabor filters and stochastic relaxation neural network, Neural Networks, 10, 561, 1997. 73. Markey, M.K. et al., Self-organizing map for cluster analysis of a breast cancer database, Artif. Intel. Med., 27, 113, 2003. 74. Egmont-Petersen, M., de Ridder, D., and Handels, H., Image processing with neural networks: a review, Pattern Recognition, 35, 2279, 2002. 75. Bishop, C.M., Neural Networks for Pattern Recognition, Oxford University Press, Oxford, U.K., 1996. 76. Verma, B. and Zakos, J., A computer-aided diagnosis system for digital mammograms based on fuzzy-neural and feature-extraction techniques, IEEE Trans. Inf. Technol. Biomed., 5, 46, 2001. 77. Burges, C.J.C., A tutorial on support vector machines for pattern recognition, Knowl- edge Discovery Data Mining, 2, 1, 1998. 78. Cristianini, N. and Shawe-Taylor, J., An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, London, 2000.
Medical image analysis method
84
79. Bazzani, A. et al., Automated detection of clustered microcalcifications in digital mammograms using an SVM classifier, in Proc. 8th Eur. Symp. Artif. Neural Net- works, Bruges, Belgium, 2000, p. 195. 80. El-Naqa, I. et al., Support vector machine learning for the detection of microcalcifi- cations in mammograms, IEEE Trans. Medical Imaging, 21, 1552, 2002. 81. Metz, C.E., ROC methodology in radiologic imaging, Invest. Radiol, 21, 720, 1986. 82. Chakraborty, D., Maximum likelihood analysis of free-response receiver operating characteristic (FROC) data, Med. Phys., 16, 561, 1989. 83. Swensson, R.G., Unified measurement of observer performance in detecting and localizing target objects on images, Med. Phys., 23, 1709, 1996. 84. Chakraborty, D.P. and Winter, L., Free-response methodology: alternative analysis and a new observer-performance experiment, Med. Phys., 174, 873, 1990. 85. Chakraborty, D., Statistical power in observer-performance studies: comparison of the receiver operating characteristic and free-response methods in tasks involving localization, Acad. Radiol., 9, 147, 2002. 86. Metz, C.E., Evaluation of CAD methods, in Computer-Aided Diagnosis in Medical Imaging, Doi, K., MacMahon, H., Giger, M.L., and Hoffmann, K.R., Eds., Excerpta Medica International Congress Series, Vol. 1182, Elsevier Science, Amsterdam, 1999, p. 543. 87. R2 Technology Co., ImageChecker; available on-line at http://www.r2tech.com/. (March 7, 2005). 88. CADx Medical Systems, Second Look Digital/AD; available on-line at http://www.%20cadxmed.com/. (March 7, 2005). 89. Intelligent Systems Software, MammoReader; available on-line at http://www.icadmed.com/. (March 7, 2005). 90. Scanis, Mammex MammoCAD; available on-line at http://www.scanis.com/. (March 7, 2005). 91. Williams, L., Prescott, R., and Hartswood, M., Computer-aided cancer detection and the UK national breast screening programme, in Proc. 4th Int. 
Workshop on Digital Mammography, Karssemejer, N., Thijssen, M., Hendriks, J., and van Erning, L., Eds., Kluwer Academic Publications, Amsterdam, 1998. 92. MedDetect; available on-line at http://www.meddetectids.com/. (March 7, 2005). 93. National Health Service Breast Screening Programme (NHSBSP), Computer-Aided Detection In Mammography, Working Party of the Radiologists, Quality Assurance Coordinating Group, NHSBSP publication no. 48, NHSBSP, 2001. 94. Malich, A. et al., Reproducibility: an important factor determining the quality of computeraided detection (CAD) systems, Eur. J. Radiol., 36, 170, 2000. 95. Burhenne, L.J. et al., Potential contribution of computer-aided detection to the sen- sitivity of screening mammography, Radiology, 215, 554, 2000. 96. Malich, A. et al., Tumor detection rate of a new commercially available computeraided detection system, Eur. Radiol., 11, 2454, 2001. 97. Yock, P.G. and Fitzgerald, P.J., Intravascular ultrasound imaging, in Cardiac Catheterization, Angiography and Intervention, Bairn, D.S. and Grossman, W., Eds., Wil- liams & Wilkins, Baltimore, 1996, chap. 22. 98. Cachard, C. et al., Ultrasound contrast agent in intravascular echography: an in vitro study, Ultrasound Med. Biol., 23, 705, 1997. 99. Meier, D.S. et al., Automated morphometry of coronary arteries with digital image analysis of intravascular ultrasound, Am. Heart J., 133, 681, 1997. 100. Sonka, M. et al., Segmentation of intravascular ultrasound images: a knowledgebased approach, IEEE Trans. Medical Imaging, 14, 719, 1995. 101. Shekhar, R. et al., Three-dimensional segmentation of luminal and adventitial borders in serial intravascular ultrasound images, Comput. Med. Imag. Grap., 23, 299, 1999. 102. Bouma, C.J. et al., Automated lumen definition from 30 MHz intravascular ultrasound images, Med. Image Anal., 1, 263, 1997.
Medical-image processing and analysis for CAD systems
85
103. Plissiti, M.E., Fotiadis, D.I., and Michalis, L.K., 3-D reconstruction of stenotic cor- onary arterial segments using intravascular ultrasound and angiographic images, in XVIIIth ISB Congr. Int. Soc. Biomechanics, ETH, Zurich, 2001, p. 224. 104. Takagi, A. et al., Automated contour detection for high-frequency intravascular ultra- sound imaging: a technique with blood noise reduction for edge enhancement, Ultra- sound Med. Biol, 26, 1033, 2000. 105. Zhang, X., McKay, C.R., and Sonka, M., Tissue characterization in intravascular ultrasound images, IEEE Trans. Medical Imaging, 17, 889, 1998. 106. Li, W. et al., Semi-automatic contour detection for volumetric quantification of intracoronary ultrasound, in Proc. Comput. Cardiol. 1994, IEEE Computer Society Press, Washington, DC, 1994, p. 277. 107. von Birgelen, C. et al., Computerized assessment of coronary lumen and atheroscle- rotic plaque dimensions in three-dimensional intravascular ultrasound correlated with histomorphometry, Am. J. Cardiol, 78, 1202, 1996. 108. Dijkstra, J. et al., Automatic border detection in intravascular ultrasound images for quantitative measurements of the vessel, lumen and stent parameters, Int. Congr. Ser., 1230, 916, 2001. 109. von Birgelen, C. et al., Electrocardiogram-gated intravascular ultrasound image acqui- sition after coronary stent deployment facilitates on-line three-dimensional recon- struction and automated lumen quantification, JACC, 30, 436, 1997. 110. Hagenaars, T. et al., Reproducibility of volumetric quantification in intravascular ultrasound images, Ultrasound Med. Biol, 26, 367, 2000. 111. Mojsilovic, A. et al., Automatic segmentation of intravascular ultrasound images: a texturebased approach, Ann. Biomed. Eng., 25, 1059, 1997. 112. Kovalski, G. et al., Three-dimensional automatic quantitative analysis of intravascular ultrasound images, Ultrasound Med. Biol, 26, 527, 2000. 113. Kass, M., Witkin, A., and Terzopoulos, D., Snakes: active contour models, Int. J. 
Comput. Vision, 1, 321, 1987. 114. Zhu, Y. and Yan, H., Computerized tumor boundary detection using a Hopfield neural network, IEEE Trans. Medical Imaging, 16, 55, 1997. 115. Lobregt, S. and Viergever, M.A., Discrete dynamic contour model, IEEE Trans. Medical Imaging, 14, 12, 1995. 116. Mintz, G.S. et al., Determinants and correlates of target lesion calcium in coronary artery disease: a clinical, angiographic and intravascular ultrasound study, JACC, 29, 268, 1997. 117. Rasheed, Q. et al., Correlation of intracoronary ultrasound plaque characteristics in atherosclerotic coronary artery disease patients with clinical variables, Am. J. Cardiol., 73, 753, 1994. 118. Rasheed, Q. et al., Intracoronary ultrasound-defined plaque composition: computeraided plaque characterization and correlation with histologic samples obtained during directional coronary atherectomy, Am. Heart J., 129, 631, 1995. 119. De Feyter, P.J., Mario, C.D., and Serruys, P.W., Quantitative Coronary Imaging, Barjesteh Meeuwes & Co. and Thoraxcentre, Erasmus University, Rotterdam, Neth- erlands, 1995. 120. Vince, D.G. et al., Comparison of texture analysis methods for the characterization of coronary plaques in intravascular ultrasound images, Comput. Med. Imag. Grap., 24, 221, 2000. 121. Shiran, A. et al., Serial volumetric intravascular ultrasound assessment of arterial remodeling in left main coronary artery disease, Am. J. Cardiol., 83, 1427, 1999. 122. Weissman, N.J. et al., Three-dimensional intravascular ultrasound assessment of plaque after successful atherectomy, Am. Heart J., 130, 413, 1995. 123. Schuurbiers, J.C.H. et al., On the IVUS plaque volume error in coronary arteries when neglecting curvature, Ultrasound Med. Biol, 26, 1403, 2000. 124. Pellot, C. et al., An attempt to 3-D reconstruction vessel morphology from X-ray projections and intravascular ultrasounds modeling and fusion, Comput. Med. Imag. Grap., 20, 141, 1996.
Medical image analysis method
86
125. Prause, G.P.M. et al., Towards a geometrically correct 3-D reconstruction of tortuous coronary arteries based on biplane angiography and intravascular ultrasound, Int. J. Cardiac Imaging, 13, 451, 1997. 126. Wahle, A. et al., Geometrically correct 3-D reconstruction of intravascular ultrasound images by fusion with biplane angiography: methods and validation, IEEE Trans. Medical Imaging, 18, 686, 1999. 127. Subramanian, K.R. et al., Accurate 3-D reconstruction of complex blood vessel geometries from intravascular ultrasound images: in vitro study, J. Med. Eng. Technol., 24, 131, 2000. 128. Wahle, A. et al., Assessment of diffuse coronary artery disease by quantitative analysis of coronary morphology based upon 3-D reconstruction from biplane angiograms, IEEE Trans. Medical Imaging, 14, 230, 1995. 129. Plissiti, M.E. et al., An automated method for lumen and media/adventitia border detection in a sequence of IVUS frames, IEEE Trans. Inf. Technol. Biomed., 8, 131, 2004.
3 Texture and Morphological Analysis of Ultrasound Images of the Carotid Plaque for the Assessment of Stroke

Christodoulos I. Christodoulou, Constantinos S. Pattichis, Efthyvoulos Kyriacou, Marios S. Pattichis, Marios Pantziaris, and Andrew Nicolaides

3.1 INTRODUCTION

There is evidence that carotid endarterectomy in patients with asymptomatic carotid stenosis will reduce the incidence of stroke [1]. The current practice is to operate on patients based on a degree of internal carotid artery stenosis of 70 to 99% as shown in X-ray angiography [2]. However, a large number of patients may be operated on unnecessarily. It is therefore necessary to identify patients at high risk, who will be considered for carotid endarterectomy, and patients at low risk, who will be spared an unnecessary, expensive, and often dangerous operation. There are indications that the morphology of atherosclerotic carotid plaques, obtained by high-resolution ultrasound imaging, has prognostic implications. A smooth surface, echogenicity, and a homogeneous texture are characteristics of stable plaques, whereas an irregular surface, echolucency, and a heterogeneous texture are characteristics of potentially unstable plaques [3–6]. The objective of the work described in this chapter was to develop a computer-aided system, based on neural network and statistical pattern recognition techniques, that facilitates the automated characterization of atherosclerotic carotid plaques recorded from high-resolution ultrasound images (duplex scanning and color flow imaging), using texture and morphological features extracted from the plaque images. The developed system should be able to automatically classify a plaque as (a) symptomatic (because it is associated with ipsilateral hemispheric symptoms) or (b) asymptomatic (because it is not associated with ipsilateral hemispheric events).
As shown in this chapter, it is possible to identify a group of patients at risk of stroke based on texture features extracted from high-resolution ultrasound images of carotid plaques. The computer-aided classification of carotid plaques will contribute toward a more standardized and accurate assessment methodology. This will greatly enhance the significance of noninvasive cerebrovascular tests in the identification of patients at risk of stroke. The system is also expected to contribute to improved quality of life and more efficient health care. An introduction to ultrasound vascular imaging is presented in Subsection 3.1.1, followed by a brief survey of previous work on the characterization of carotid plaque. In
Section 3.2, the materials used to train and evaluate the system are described. In Section 3.3, the modules of the multifeature, multiclassifier carotid-plaque classification system are presented. Image acquisition and standardization are covered in Subsection 3.3.1, and the plaque identification and segmentation module is described in Subsection 3.3.2. Subsections 3.3.3 and 3.3.4 outline, respectively, the feature extraction and feature selection. The plaque-classification module with its associated confidence measures is presented in Subsection 3.3.5, and the classifier combiner is described in Subsection 3.3.6. In Sections 3.4 and 3.5, the results are presented and discussed, and the conclusions are given in Section 3.6. Finally, the appendix at the end of the chapter gives the implementation details of the algorithms used to extract texture features.

3.1.1 ULTRASOUND VASCULAR IMAGING

Ultrasound has become very popular in vascular imaging because of its ability to visualize body tissue and vessels noninvasively and harmlessly, and to visualize the arterial lumen and wall in real time, something that is not possible with any other imaging technique. B-mode ultrasound imaging can be used to image the arteries of the same subject repeatedly to monitor the development of atherosclerosis. Monitoring arterial characteristics such as the vessel lumen diameter, the intima-media thickness (IMT) of the near and far wall, and the morphology of atherosclerotic plaque is very important in assessing the severity of atherosclerosis and evaluating its progression [7]. The arterial wall changes that can easily be detected with ultrasound are the end result of all risk factors (exogenous, endogenous, and genetic), known and unknown, and are better predictors of risk than any combination of conventional risk factors.
Extracranial atherosclerotic disease, also known as atherosclerotic disease of the carotid bifurcation, has two main clinical manifestations: (a) asymptomatic bruits and (b) cerebrovascular syndromes such as amaurosis fugax, transient ischemic attacks (TIA), or stroke, which are often the result of plaque erosion or rupture, with subsequent thrombosis producing occlusion or embolization [8, 9]. Carotid plaque is defined as a localized thickening involving the intima and media in the bulb, internal carotid, external carotid, or common femoral arteries (Figure 3.1). Recent studies involving angiography, high-resolution ultrasound, thrombolytic therapy, plaque pathology, coagulation studies, and, more recently, molecular biology have implicated atherosclerotic plaque rupture as a key mechanism responsible for the development of cerebrovascular events [10–12]. Atherosclerotic plaque rupture is strongly related to the morphology of the plaque [13]. The development and continuing technical improvement of noninvasive, high-resolution vascular ultrasound enables the study of the presence of plaques, their rate of progression or regression, and, most importantly, their consistency. The ultrasonic characteristics of unstable (vulnerable) plaques have been determined [14, 15], and populations or individuals at increased risk for cardiovascular events can now be identified [16]. In addition, high-resolution ultrasound facilitates the identification of the different ultrasonic characteristics of unstable carotid plaques associated with amaurosis fugax,
FIGURE 3.1 (Color figure follows p. 274.) (a) An ultrasound B-scan image of the carotid artery bifurcation with the atherosclerotic plaque outlined; (b) the corresponding color image of blood flow through the carotid artery, which physicians use to identify the exact plaque region.
TIAs, stroke, and different patterns of computed tomography (CT) brain infarction [14, 15]. This information has provided new insight into the pathophysiology of the different clinical manifestations of extracranial atherosclerotic cerebrovascular disease using noninvasive methods. Different classifications have been proposed in the literature for the characterization of atherosclerotic plaque morphology, resulting in considerable confusion. For example, plaques containing medium- to high-level uniform echoes were classified as homogeneous by Reilly [17] and correspond closely to Johnson's [18] dense and calcified plaques, to Gray-Weale's [19] type 3 and 4 plaques, and to Widder's [20] type I and II plaques (i.e., echogenic or hyperechoic). A recent consensus on carotid plaque characterization has suggested that echodensity should reflect the overall brightness of the plaque, with the term "hypoechoic" referring to echolucent plaques [21]. The reference structure to which plaque echodensity should be compared is blood for hypoechoic plaques, the sternomastoid muscle for isoechoic plaques, and the bone of the adjacent cervical vertebrae for hyperechoic ones.

3.1.2 PREVIOUS WORK ON THE CHARACTERIZATION OF CAROTID PLAQUE

A number of studies have tried to associate the morphological characteristics of carotid plaques, as shown in ultrasound images, with cerebrovascular symptoms. A brief survey of these studies is given below. Salonen and Salonen [3], in an observational study of atherosclerotic progression, investigated the predictive value of ultrasound imaging. They associated ultrasound observations with clinical endpoints, risk factors for common carotid and femoral atherosclerosis, and predictors of progression of common carotid atherosclerosis. On the basis of their findings, the assessment of common carotid atherosclerosis using B-mode ultrasound imaging appears to be a feasible, reliable, valid, and cost-effective method. Geroulakos et al.
[2] tested the hypothesis that the ultrasonic characteristics of carotid artery plaques are closely related to symptoms and that plaque structure may be an important factor in producing stroke, perhaps more so than the degree of stenosis. In their work, they manually characterized carotid plaques into four ultrasonic types: echolucent, predominantly echolucent, predominantly echogenic, and echogenic. An association was found between echolucent plaques and symptoms and cerebral infarctions, providing further evidence that echolucent plaques are unstable and tend to form embolisms. El-Barghouty et al. [4], in a study with 94 plaques, reported an association between carotid plaque echolucency and the incidence of cerebral computed tomography (CT) brain infarctions. The gray-scale median (GSM) of the ultrasound plaque image was used to characterize plaques as echolucent (GSM ≤ 32) or echogenic (GSM > 32). Iannuzzi et al. [22] analyzed 242 stroke and 336 transient ischemic attack (TIA) patients and identified significant relationships between carotid artery ultrasound plaque characteristics and ischemic cerebrovascular events. The results suggested that the features more strongly associated with stroke were either occlusion of the ipsilateral carotid artery or wider lesions and a smaller minimum residual lumen diameter. The
features that were more consistently associated with TIAs included low echogenicity of carotid plaques, thicker plaques, and the presence of longitudinal motion. Wilhjelm et al. [23], in a study with 52 patients scheduled for endarterectomy, presented a quantitative comparison between subjective classification of the ultrasound images, first- and second-order statistical features, and a histological analysis of the surgically removed plaque. Some correlation was found among the three types of information, and the best-performing feature was found to be the contrast. Polak et al. [5] studied 4886 individuals who were followed up for an average of 3.3 years. They found that hypoechoic carotid plaques, as seen on ultrasound images of the carotid arteries, were associated with an increased risk of stroke. The plaques were manually categorized as hypoechoic, isoechoic, or hyperechoic by independent readers. Polak et al. also suggested that the subjective grading of plaque characteristics might be improved by the use of quantitative methods. Elatrozy et al. [24] examined 96 plaques (25 symptomatic and 71 asymptomatic) with more than 50% internal carotid artery stenosis. They reported that plaques with GSM < 40, or with a percentage of echolucent pixels greater than 50%, were good predictors of ipsilateral hemispheric symptoms related to carotid plaques. Echolucent pixels were defined as pixels with gray-level values below 40. Furthermore, Tegos et al. [25], in a study with 80 plaques, reported a relationship between microemboli detection and carotid plaques with dark morphological characteristics on ultrasound images (echolucent plaques). Plaques were characterized using first-order statistics and the gray-scale median of the ultrasound plaque image. AbuRahma et al. [6], in a study with 2460 carotid arteries, correlated ultrasonic carotid plaque morphology with the degree of carotid stenosis.
As reported, the higher the degree of carotid stenosis, the more likely it is to be associated with ultrasonically heterogeneous plaque and cerebrovascular symptoms. Plaque heterogeneity was more strongly correlated with symptoms than was any degree of stenosis. These findings suggest that plaque heterogeneity should be considered in selecting patients for carotid endarterectomy. Asvestas et al. [26], in a pilot study with 19 carotid plaques, indicated a significant difference in fractal dimension between the symptomatic and asymptomatic groups. Moreover, the phase of the cardiac cycle (systole/diastole) during which the fractal dimension was estimated had no systematic effect on the calculations. This study suggests that the fractal dimension, estimated by the proposed method, could be used as a single determinant for discriminating symptomatic from asymptomatic subjects. In most of these studies, the characteristics of the plaques were subjectively defined or defined using simple statistical measures, and the association with symptoms was established through simple statistical analysis. In the work described in this chapter, a large number of texture and morphological features were extracted from the plaque ultrasound image and analyzed using a multifeature, multiclassifier methodology.
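Several of the studies surveyed above rank plaques by the gray-scale median of the outlined region. A minimal sketch of that criterion follows, using the GSM ≤ 32 echolucency cutoff reported by El-Barghouty et al. [4]; the function name and interface are illustrative, not taken from the chapter.

```python
import numpy as np

def classify_by_gsm(plaque_pixels, threshold=32):
    """Label a plaque echolucent or echogenic by its gray-scale median (GSM).

    plaque_pixels: array of gray levels (0-255) inside the outlined plaque
    region. The GSM <= 32 cutoff follows El-Barghouty et al. [4]; names
    and interface here are illustrative.
    """
    gsm = float(np.median(plaque_pixels))
    return gsm, ("echolucent" if gsm <= threshold else "echogenic")

# A predominantly dark, synthetic plaque region
gsm, label = classify_by_gsm(np.array([10, 18, 20, 22, 25, 26, 28, 30, 35, 40]))
print(gsm, label)  # 25.5 echolucent
```

In practice the GSM is only meaningful after the gray levels of the whole image have been standardized, as described later in Subsection 3.3.1.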
3.2 MATERIALS

A database of digital ultrasound images of carotid arteries was created such that for each gray-tone image there was also a color image indicating the blood flow. The color images were necessary for the correct identification of the plaques as well as their outlines. Carotid plaques were labeled as symptomatic after one of the following three symptoms was identified: stroke, transient ischemic attack, or amaurosis fugax. Two independent studies were conducted. In the first study, with Data Set 1, a total of 230 cases (115 symptomatic and 115 asymptomatic) were selected. Two sets of data were formed at random: one for training the system and another for evaluating its performance. For training the system, 80 symptomatic and 80 asymptomatic plaques were used, whereas for evaluation, the remaining 35 symptomatic and 35 asymptomatic plaques were used. A bootstrapping procedure was used to verify the correctness of the classification results: the system was trained and evaluated on five different bootstrap sets, with each training set consisting of 160 randomly selected plaques and the remaining 70 plaques used for evaluation. In the second study, where the morphology features were investigated, a new Data Set 2 of 330 carotid plaque ultrasound images (194 asymptomatic and 136 symptomatic) was analyzed. For training the system, 90 asymptomatic and 90 symptomatic plaques were used; for evaluation, the remaining 104 asymptomatic and 46 symptomatic plaques were used.

3.3 THE CAROTID PLAQUE MULTIFEATURE, MULTICLASSIFIER SYSTEM

The carotid plaque classification system was developed following a multifeature, multiclassifier pattern-recognition approach. The modules of the system are described in the following subsections and are illustrated in Figure 3.2.
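The repeated train/evaluation partitioning described for Data Set 1 (five sets of 160 training and 70 evaluation plaques) can be sketched as follows. Strictly speaking, this is repeated random subsampling without replacement rather than a classical with-replacement bootstrap; all names below are illustrative.

```python
import random

def draw_splits(n_cases=230, n_train=160, n_sets=5, seed=0):
    """Repeatedly partition case indices into disjoint training and
    evaluation subsets, mirroring the five bootstrap sets of Data Set 1.
    Illustrative sketch only; not the chapter's actual code."""
    rng = random.Random(seed)
    indices = list(range(n_cases))
    splits = []
    for _ in range(n_sets):
        rng.shuffle(indices)
        splits.append((sorted(indices[:n_train]), sorted(indices[n_train:])))
    return splits

splits = draw_splits()
train, test = splits[0]
print(len(splits), len(train), len(test))  # 5 160 70
```

Averaging the evaluation performance over such repeated splits gives a more stable estimate than a single hold-out partition, which is the stated purpose of the bootstrapping step.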
In the first module, the carotid plaque ultrasound image was acquired using duplex scanning, and the gray level of the image was manually standardized using blood and adventitia as references. In the second module, the plaque region was identified and manually outlined by the expert physician. In the feature-extraction module, ten different texture and shape feature sets (a total of 61 features) were extracted from the segmented plaque images of Data Set 1 using the following algorithms: statistical features (SF), spatial gray-level-dependence matrices (SGLDM, contributing two feature sets: mean and range of values), gray-level difference statistics (GLDS), neighborhood gray-tone-difference matrix (NGTDM), statistical-feature matrix (SFM), Laws's texture energy measures (TEM), fractal dimension texture analysis (FDTA), Fourier power spectrum (FPS), and shape parameters.
Texture and morphological analysis
93
FIGURE 3.2 Flowchart of the carotid plaque multifeature, multiclassifier classification system. (From Christodoulou, C.I. et al., IEEE Trans. Medical Imaging, 22, 902–912, 2003. With permission.)
Following the feature extraction, several feature-selection techniques were used to select the features with the greatest discriminatory power. For the classification, a modular neural network using the unsupervised self-organizing feature map (SOM) classifier was implemented. The plaques were classified into two types: symptomatic or asymptomatic. For each feature set, an SOM classifier was trained, and ten different classification results were obtained. Finally, in the system combiner, the ten classification results were combined using: (a) majority voting and (b) weighted averaging of the ten classification results based on a confidence measure derived from the SOM. For the sake of comparison, the above-described modular system was also implemented using the KNN statistical classifier instead of the SOM.

3.3.1 IMAGE ACQUISITION AND STANDARDIZATION
The protocols suggested by the ACSRS (asymptomatic carotid stenosis at risk of stroke) project [1] were followed for the acquisition and quantification of the imaging data. The ultrasound images were collected at the Irvine Laboratory for Cardiovascular Investigation and Research, Saint Mary's Hospital, U.K., by two ultrasonographers using an ATL (model HDI3000, Advanced Technology Laboratories, Letchworth, U.K.) duplex scanner with a 4- to 7-MHz multifrequency probe. Longitudinal scans were performed using duplex scanning and color flow imaging [27]. B-mode scan settings were adjusted so that the maximum dynamic range was used with a linear postprocessing curve. The position of the probe was adjusted so that the ultrasonic beam was perpendicular to the artery wall.
The time gain compensation (TGC) curve was adjusted (gently sloping) to produce uniform intensity of echoes on the screen, but it was vertical in the lumen of the artery,
where attenuation in blood was minimal, so that the echogenicity of the far wall was the same as that of the near wall. The overall gain was set so that the appearance of the plaque was assessed to be optimal and noise appeared within the lumen. It was then decreased so that at least some areas in the lumen appeared to be free of noise (black). The resolution of the images was on the order of 700 × 500 pixels, and the average size and standard deviation of the segmented images was on the order of 350±100 × 100±30 pixels. The gray levels of the images were in the range from 0 to 255. The images were standardized manually by adjusting the image so that the median gray-level value of the blood was between 15 and 20 and the median gray-level value of the adventitia (artery wall) was between 180 and 200 [27]. The image was linearly adjusted between the two reference points, blood and adventitia. This standardization using blood and adventitia as reference points was necessary to extract comparable results when processing images obtained by different operators, different equipment, and different vascular imaging laboratories.
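The two-point linear standardization described above can be sketched as follows (illustrative only: the helper name and the exact target values chosen within the 15–20 and 180–200 ranges are our own assumptions):

```python
import numpy as np

def standardize(image, blood_median, adventitia_median,
                blood_target=17.5, adventitia_target=190.0):
    """Linearly remap gray levels so that the measured blood median
    lands near 15-20 and the adventitia median near 180-200, with the
    mapping linear between the two reference points, as in the text."""
    img = image.astype(float)
    scale = (adventitia_target - blood_target) / (adventitia_median - blood_median)
    out = blood_target + (img - blood_median) * scale
    return np.clip(out, 0, 255).astype(np.uint8)   # keep the 0-255 range

# Example: an image where the blood median is 30 and the adventitia
# median is 150 before standardization.
rng = np.random.default_rng(0)
img = rng.normal(90, 20, (100, 100))
standardized = standardize(img, blood_median=30, adventitia_median=150)
```

By construction, a pixel at the blood median maps to the blood target and one at the adventitia median maps to the adventitia target.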
The procedure for carrying out the segmentation process was established by a team of experts and was documented in the ACSRS project protocol [1]. The correctness of the work carried out by the single expert was monitored and verified by at least one other expert. However, the extracted texture features depend on the whole of the plaque area and are not significantly affected if a small portion of the plaque area is not included in the region of interest. Figure 3.1 illustrates an ultrasound image with the outline of the carotid plaque and the corresponding color blood flow image. Figure 3.3 illustrates a number of examples of symptomatic and asymptomatic plaques that were segmented by an expert physician.
FIGURE 3.3 Examples of segmented symptomatic and asymptomatic plaques. Selected texture values are given for the following features: median (2), entropy (14), and coarseness (36). (The numbers in parentheses denote the serial feature number as listed in Table 3.1.)

3.3.3 FEATURE EXTRACTION
Texture features, shape parameters, and morphological features were extracted from the manually segmented ultrasound plaque images to be used for the classification of the carotid plaques. Texture contains important information that is used by humans for the interpretation and the analysis of many types of images. It is especially useful for the analysis of natural scenes, since they mostly consist of textured surfaces. Texture refers to the spatial interrelationships and arrangement of the basic elements of an image [28]. Visually, these spatial interrelationships and arrangements of the image pixels are seen as variations in the intensity patterns or gray tones. Therefore, texture features have to be derived from the gray tones of the image. Although it is easy for humans to recognize
texture, it is quite a difficult task to define texture so that it can be interpreted by digital computers. In this work, ten different texture-feature sets were extracted from the plaque segments using the algorithms described in Appendix 3.1. Some of the extracted features capture complementary textural properties. However, features that were highly dependent on or similar to features in other feature sets were identified through statistical analysis and eliminated. The implementation details for the texture-feature-extraction algorithms can be found in Appendix 3.1 at the end of the chapter.

3.3.3.1 Statistical Features (SF)
The following statistical features were computed [29]: (1) mean value, (2) median value, (3) standard deviation, (4) skewness, and (5) kurtosis.

3.3.3.2 Spatial Gray-Level-Dependence Matrices (SGLDM)
The spatial gray-level-dependence matrices as proposed by Haralick et al. [30] are based on the estimation of the second-order joint conditional probability density functions that two pixels (k, l) and (m, n) with distance d in a direction specified by the angle θ have intensities of gray-level i and gray-level j. Based on these probability density functions, the following texture measures [30] were computed: (1) angular second moment, (2) contrast, (3) correlation, (4) sum of squares: variance, (5) inverse difference moment, (6) sum average, (7) sum variance, (8) sum entropy, (9) entropy, (10) difference variance, (11) difference entropy, and (12, 13) information measures of correlation. For a chosen distance d (in this work, d = 1 was used, i.e., 3 × 3 matrices) and for angles θ = 0°, 45°, 90°, and 135°, four values were computed for each of the 13 texture measures. The mean and the range of these four values were computed for each feature, and they were used as two different feature sets.
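A minimal sketch of the SGLDM computation for d = 1 and the four angles, here restricted to three of the 13 Haralick measures (the 32-level quantization and the function names are our simplifications, not the chapter's implementation):

```python
import numpy as np

def glcm(img, dx, dy, levels=32):
    """Symmetric co-occurrence matrix for displacement (dx, dy);
    gray levels are quantized to `levels` bins (our simplification)."""
    q = np.clip((img.astype(float) * levels / 256).astype(int), 0, levels - 1)
    h, w = q.shape
    # pixel pairs separated by the displacement
    a = q[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    b = q[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)   # count co-occurring pairs
    m += m.T                                  # make the matrix symmetric
    return m / m.sum()

def haralick_subset(img, levels=32):
    """Mean and range over the four d = 1 directions of three of the
    13 SGLDM measures: contrast, angular second moment, entropy."""
    i, j = np.indices((levels, levels))
    feats = []
    for dx, dy in [(1, 0), (1, 1), (0, 1), (-1, 1)]:  # 0, 45, 90, 135 deg
        p = glcm(img, dx, dy, levels)
        contrast = ((i - j) ** 2 * p).sum()
        asm = (p ** 2).sum()
        entropy = -(p[p > 0] * np.log2(p[p > 0])).sum()
        feats.append((contrast, asm, entropy))
    f = np.array(feats)
    return f.mean(axis=0), np.ptp(f, axis=0)  # the two SGLDM feature sets
```

The mean and range over the four directions correspond to the two SGLDM feature sets (features 6–18 and 19–31 of Table 3.1).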
3.3.3.3 Gray-Level Difference Statistics (GLDS)
The GLDS algorithm [31] uses first-order statistics of local property values based on absolute differences between pairs of gray levels or of average gray levels to extract the following texture measures: (1) contrast, (2) angular second moment, (3) entropy, and (4) mean. These features were calculated for displacements δ = (0, 1), (1, 1), (1, 0), (1, −1), where δ ≡ (∆x, ∆y), and their mean values were taken.

3.3.3.4 Neighborhood Gray-Tone-Difference Matrix (NGTDM)
Amadasun and King [28] proposed the neighborhood gray-tone-difference matrix to extract textural features that correspond to visual properties of texture. The following features were extracted for a neighborhood size of 3×3: (1) coarseness, (2) contrast, (3) busyness, (4) complexity, and (5) strength.
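The GLDS measures of Subsection 3.3.3.3 can be sketched in the same style (again a simplification; the displacement handling and names are ours):

```python
import numpy as np

def glds(img, displacements=((0, 1), (1, 1), (1, 0), (1, -1))):
    """Gray-level difference statistics: contrast, angular second
    moment, entropy, and mean of the absolute-difference histogram,
    averaged over the four displacements as in the text."""
    img = img.astype(int)
    h, w = img.shape
    feats = []
    for dx, dy in displacements:
        a = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
        b = img[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
        diff = np.abs(a - b).ravel()
        p = np.bincount(diff, minlength=256) / diff.size  # P(|difference|)
        i = np.arange(p.size)
        contrast = (i ** 2 * p).sum()
        asm = (p ** 2).sum()
        entropy = -(p[p > 0] * np.log2(p[p > 0])).sum()
        feats.append((contrast, asm, entropy, (i * p).sum()))
    return np.array(feats).mean(axis=0)   # (contrast, ASM, entropy, mean)
```

On a perfectly flat region the absolute differences are all zero, so contrast, entropy, and mean vanish while the angular second moment is maximal.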
3.3.3.5 Statistical-Feature Matrix (SFM)
The statistical-feature matrix [32] measures the statistical properties of pixel pairs at several distances within an image, which are used for statistical analysis. Based on the SFM, the following texture features were computed: (1) coarseness, (2) contrast, (3) periodicity, and (4) roughness. The constants Lr and Lc, which determine the maximum intersample spacing distance, were set in this work to Lr = Lc = 4.

3.3.3.6 Laws's Texture Energy Measures (TEM)
For Laws's TEM extraction [33, 34], vectors of length l = 7, L = (1, 6, 15, 20, 15, 6, 1), E = (−1, −4, −5, 0, 5, 4, 1), and S = (−1, −2, 1, 4, 1, −2, −1) were used, where L performs local averaging, E acts as an edge detector, and S acts as a spot detector. If we multiply the column vectors of length l by row vectors of the same length, we obtain Laws's l×l masks. In order to extract texture features from an image, these masks are convolved with the image, and the statistics (e.g., energy) of the resulting image are used to describe texture. The following texture features were extracted: (1) LL, texture energy from the LL kernel; (2) EE, texture energy from the EE kernel; (3) SS, texture energy from the SS kernel; (4) LE, average texture energy from the LE and EL kernels; (5) ES, average texture energy from the ES and SE kernels; and (6) LS, average texture energy from the LS and SL kernels.

3.3.3.7 Fractal Dimension Texture Analysis (FDTA)
Mandelbrot [35] developed the fractional Brownian motion model to describe the roughness of natural surfaces. The Hurst coefficient H(k) [34] was computed for image resolutions k = 1, 2, 3, 4. A smooth surface is described by a large value of the parameter H, whereas the reverse applies for a rough surface.

3.3.3.8 Fourier Power Spectrum (FPS)
The radial sum and the angular sum of the discrete Fourier transform [31] were computed to describe texture.
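The Laws TEM extraction of Subsection 3.3.3.6 can be sketched as follows, assuming SciPy is available (we use the mean absolute response as the energy statistic, one common choice; the text only says "e.g., energy"):

```python
import numpy as np
from scipy.signal import convolve2d

L = np.array([1, 6, 15, 20, 15, 6, 1], dtype=float)   # local averaging
E = np.array([-1, -4, -5, 0, 5, 4, 1], dtype=float)   # edge detector
S = np.array([-1, -2, 1, 4, 1, -2, -1], dtype=float)  # spot detector

def tem_features(img):
    """Convolve the image with the 7x7 masks formed by outer products
    of L, E, S and summarize each response by its mean absolute value,
    returning the six TEM features listed in the text."""
    img = img.astype(float)
    vecs = {'L': L, 'E': E, 'S': S}
    energy = {}
    for a in 'LES':
        for b in 'LES':
            mask = np.outer(vecs[a], vecs[b])          # e.g., LL, LE, ...
            resp = convolve2d(img, mask, mode='valid')
            energy[a + b] = np.abs(resp).mean()
    return [energy['LL'], energy['EE'], energy['SS'],
            (energy['LE'] + energy['EL']) / 2,
            (energy['ES'] + energy['SE']) / 2,
            (energy['LS'] + energy['SL']) / 2]
```

Because the E and S vectors are zero-sum, every kernel containing them responds only to intensity variation, while the LL kernel responds to local brightness.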
3.3.3.9 Shape Parameters
The following shape parameters were calculated from the segmented plaque image: (1) X-coordinate maximum length, (2) Y-coordinate maximum length, (3) area, (4) perimeter, and (5) perimeter²/area.

3.3.3.10 Morphological Features
Morphological image processing allows the detection of the presence of specific patterns, called structural elements, at different scales. The simplest structural element for near-isotropic detection is the cross '+' consisting of five image pixels. Using the cross '+' as a structural element, pattern spectra were computed for each plaque image as defined in the literature [36–38]. After computation, each pattern spectrum was normalized.
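A sketch of a pattern-spectrum computation with the cross '+' structuring element, using the standard opening-based definition (the chapter's exact formulation is Equation 3.60 in Appendix 3.1, which may differ in normalization):

```python
import numpy as np
from scipy import ndimage

def pattern_spectrum(img, n_scales=10):
    """Normalized pattern spectrum using the cross '+' structuring
    element: component k is the image 'mass' removed when going from
    the opening at scale k-1 to the opening at scale k."""
    img = img.astype(float)
    cross = ndimage.generate_binary_structure(2, 1)   # 3x3 '+', 5 pixels
    total = img.sum()
    spectrum = []
    opened_prev = img
    for k in range(1, n_scales + 1):
        # scale-k structuring element: the cross dilated with itself
        footprint = ndimage.iterate_structure(cross, k)
        opened = ndimage.grey_opening(img, footprint=footprint)
        spectrum.append((opened_prev.sum() - opened.sum()) / total)
        opened_prev = opened
    return np.array(spectrum)
```

Because openings are anti-extensive and decreasing in scale, the spectrum is nonnegative, and it sums to 1 once the largest opening removes the whole object.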
All features of the ten feature sets were normalized before use by subtracting their mean values and dividing by their standard deviations.

3.3.4 FEATURE SELECTION
The selection of features with the highest discriminatory power can reduce the dimensionality of the input data and improve the classification performance. A simple way to identify potentially good features is to compute the distance between the two classes for each feature as

dis = |m1 − m2| / √(σ1² + σ2²)  (3.1)
where m1 and m2 are the mean values, and σ1 and σ2 are the standard deviations of the two classes [39]. The best features are considered to be the ones with the greatest distance. The mean and standard deviation for all the plaques, as well as for the symptomatic and asymptomatic groups, were computed, and the distance between the two classes for each feature was calculated as described in Equation 3.1. The features were ordered according to their interclass distance, and the features with the greatest distance were selected to be used for the classification. Another way to select features and reduce dimensionality is through principal component analysis (PCA) [40]. In PCA, the data set is represented by a reduced number of uncorrelated features while retaining most of its information content. This is carried out by eliminating correlated components that contribute only a small amount to the total variance in the data set. In this study, the 61-feature vector was reduced to nine transformed parameters by retaining only those components that contributed more than 2% to the variance in the data set. A new feature set comprising the nine PCA parameters was used as input to the SOM and the KNN classifiers. 3.3.5 PLAQUE CLASSIFICATION Following the computer-aided feature extraction and selection, feature classification was implemented based on multifeature, multiclassifier analysis. The SOM classifier and the KNN classifier were used to classify the carotid plaques into one of the following two types: 1.Symptomatic because of ipsilateral hemispheric symptoms 2.Asymptomatic because they were not connected with ipsilateral hemispheric events The different features sets described in Subsection 3.3.3 were used as input to the classifier. 3.3.5.1 Classification with the SOM Classifier The SOM was chosen because it is an unsupervised learning algorithm where the input patterns are freely distributed over the output-node matrix [41]. 
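Equation 3.1 and the distance-based ranking can be sketched directly (the numeric check against Table 3.1 and the synthetic data are ours):

```python
import numpy as np

def interclass_distance(X_sympt, X_asympt):
    """Equation 3.1 per feature: dis = |m1 - m2| / sqrt(s1^2 + s2^2).
    Returns the distances and the feature ordering (best first)."""
    m1, s1 = X_sympt.mean(axis=0), X_sympt.std(axis=0)
    m2, s2 = X_asympt.mean(axis=0), X_asympt.std(axis=0)
    dis = np.abs(m1 - m2) / np.sqrt(s1 ** 2 + s2 ** 2)
    return dis, np.argsort(-dis)

# Check against Table 3.1, feature 36 (NGTDM coarseness): the tabulated
# class statistics reproduce the tabulated distance of about 0.710.
d36 = abs(9.265 - 21.354) / np.sqrt(8.236 ** 2 + 14.909 ** 2)
```

A distance below 1 means the class means differ by less than the combined spread, i.e., the two distributions overlap.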
The weights are adapted without supervision in such a way that the density distribution of the input data is
preserved and represented on the output nodes. This mapping of similar input patterns to output nodes that are close to each other represents a discretization of the input space, allowing a visualization of the distribution of the input data. The output nodes are usually ordered in a two-dimensional grid, and at the end of the training phase, the output nodes are labeled with the class of the majority of the input patterns of the training set assigned to each node. In the evaluation phase, an input pattern is assigned to the output node with the weight vector closest to the input vector, and it is given the class label of the winning output node to which it has been assigned. Beyond the classification result, a confidence measure was derived from the SOM classifier characterizing how reliable the classification result was. The confidence measure was calculated based on the classes of the nearest neighbors on the self-organizing map. For this purpose, the output nodes in a neighborhood window centered at the winning node were considered. The confidence measure was computed for five different window sizes: 1×1, 3×3, 5×5, 7×7, and 9×9. For each one of the ten feature sets, a different SOM classifier was trained. The implementation steps for calculating the confidence measure were as follows:

Step 1: Train the classifier. An SOM classifier is trained with the training set, using as input one of the ten feature sets.

Step 2: Label the nodes on the SOM. Feed the training set to the SOM classifier again and label each output node on the SOM with the number of the symptomatic or asymptomatic training input patterns assigned to it.

Step 3: Apply the evaluation set. In the evaluation phase, a new input pattern is assigned to a winning output node. The number of symptomatic and asymptomatic training input patterns assigned to each node in the given neighborhood window (e.g., 1×1, …, 9×9) around the winning node are counted.
Step 4: Compute the confidence measure and classify the plaque. Calculate the confidence measure as the percentage of the majority of the training input patterns to the total number of the training input patterns in the given neighborhood window. To set its range from 0 to 1 (0 = low confidence, 1 = high confidence), the confidence measure is calculated more specifically as

conf = 2 (max{SN1, SN2} / (SN1 + SN2)) − 1  (3.2)

where SNm is the number of the input patterns in the neighborhood window for the two classes m = {1, 2}:

SNm = Σ(i=1..L) Wi Nmi  (3.3)
where L is the number of output nodes in the R×R neighborhood window, with L = R² (e.g., L = 9 using a 3×3 window), and Nmi is the number of the training patterns of class m assigned to output node i. Wi = 1/(2di), with Wi = 1 for the winning node itself, is a weighting factor based on the distance di of output node i to the winning output node. Wi gives the output nodes close to the winning output node a greater weight than the ones farther away (e.g., in a 3×3 window, Wi = 1 for the winning node, Wi = 0.5 for the four nodes perpendicular to the winning node, and Wi = 0.3536 for the four nodes diagonally located around it). The evaluation input pattern was classified to the class m with the greatest SNm, as symptomatic or asymptomatic.

3.3.5.2 Classification with the KNN Classifier
For comparison purposes, the KNN classifier was also used for the carotid plaque classification. To classify a new pattern in the KNN algorithm, its k nearest neighbors from the training set are identified. The new pattern is classified to the most frequent class among its neighbors based on a similarity measure, usually the Euclidean distance. In this work, the KNN carotid plaque classification system was implemented for values of k = 1, 3, 5, 7, and 9, and it was tested using the ten different feature sets as input. In the case of the KNN, the confidence measure was computed as given in Equation 3.2 and Equation 3.3, with SNm being the number of the nearest neighbors per class m.

3.3.6 CLASSIFIER COMBINER
In the case of difficult pattern-recognition problems, the combination of the outputs of multiple classifiers, using for input multiple feature sets extracted from the raw data, can improve the overall classification performance [42]. In the case of noisy data or of a limited amount of data, different classifiers often provide different generalizations by realizing different decision boundaries. Also, different feature sets provide different representations of the input patterns containing different classification information.
Selecting the best classifier or the best feature set is not necessarily the ideal choice, because potentially valuable information contained in the less successful feature sets or classifiers may not be taken into account. Combining the results of the different features and the different classifiers increases the probability that the errors of the individual features or classifiers will be compensated by the correct results of the rest. Furthermore, according to Perrone [43], the performance of the combiner is never worse than the average of the individual classifiers, although it is not necessarily better than the best individual classifier. Also, the error variance of the final result is reduced, making the whole system more robust and reliable. The use of a confidence measure to establish the reliability of the classification result can further improve the overall performance by weighting the individual classification results before combining. In this work, the usefulness of combining neural-network classifiers was investigated in the development of a decision-support system for the classification of carotid plaques. Two multifeature modular networks, one using the SOM classifier and one using the KNN classifier, were implemented. The ten feature sets described in Subsection 3.3.3 were extracted from the plaque ultrasound images of Data Set 1 and were input to ten SOM or KNN classifiers. The ten classification results were combined using: (a) majority voting and (b) weighted averaging based on a confidence measure.
3.3.6.1 Majority Voting
In majority voting, the input plaque under evaluation was classified as symptomatic or asymptomatic by the ten classifiers using as input the ten different feature sets. The plaque was assigned to the majority of the symptomatic or asymptomatic votes of the ten classification results obtained at the end of step 4 of the algorithm described in Subsection 3.3.5. The diagnostic yield was computed for the five window sizes: 1×1, 3×3, 5×5, 7×7, and 9×9.

3.3.6.2 Weighted Averaging Based on a Confidence Measure
In combining with the use of a confidence measure, the confidence measure was computed from the ten SOM classifiers, as given in Equation 3.2. When combining, the confidence measure decided the contribution of each feature set to the final result. The idea is that some feature sets may be more successful for specific regions of the input population. The implementation steps for combining using weighted averaging were as follows:

Step 1: Assign negative confidence-measure values to the symptomatic plaques. If an input plaque pattern was classified as symptomatic, as given in step 4 of the algorithm described in Subsection 3.3.5, then its confidence measure is multiplied by −1, whereas the asymptomatic plaques retain their positive values.

Step 2: Calculate the average confidence. Calculate the average of the n confidence measures, which is the final output of the system combiner, as

conf = (1/n) Σ(j=1..n) conf_j  (3.4)

where conf_j is the signed confidence measure obtained with the jth feature set.
Step 3: Classify the plaque. If conf < 0, then the plaque is classified as symptomatic; else if conf > 0, then the plaque is classified as asymptomatic.
The final output of the system combiner is the average confidence, and its values range from −1 to 1. Values of conf close to zero indicate low confidence in the correctness of the final classification result, whereas values close to −1 or 1 indicate high confidence. In the case of the KNN classifier, the n classification results were combined in a similar way to that of the SOM classifier, i.e., (a) with majority voting and (b) by averaging the n confidence measures. The algorithmic steps described in the previous subsections for the SOM classifier apply to the KNN classifier as well. When averaging, the final diagnostic yield was the average of the n confidence measures obtained when using the n different feature sets.
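The two combination rules of Subsection 3.3.6 can be sketched as follows (a toy illustration with made-up votes and confidences; the ±1 label encoding is our convention for the sign trick of steps 1 to 3):

```python
import numpy as np

def combine(labels, confidences):
    """Combine n single-feature-set results: labels are +1
    (asymptomatic) or -1 (symptomatic), confidences lie in [0, 1].
    Returns the majority vote, the weighted-average decision, and the
    signed average confidence of Equation 3.4 (range -1 to 1)."""
    labels = np.asarray(labels, dtype=float)
    confidences = np.asarray(confidences, dtype=float)
    majority = 1.0 if labels.sum() > 0 else -1.0   # majority voting
    conf = (labels * confidences).mean()           # steps 1 and 2
    weighted = 1.0 if conf > 0 else -1.0           # step 3
    return majority, weighted, conf

# Seven weak "asymptomatic" votes against three confident "symptomatic"
# ones: the two combiners disagree.
maj, wtd, c = combine([1, 1, 1, 1, 1, 1, 1, -1, -1, -1],
                      [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9])
```

In this example majority voting returns the asymptomatic class, while the confidence-weighted average is −0.2 and classifies the plaque as symptomatic, illustrating how the confidence measure changes the outcome.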
3.4 RESULTS
3.4.1 FEATURE EXTRACTION AND SELECTION
In Data Set 1, a total of 230 (115 symptomatic and 115 asymptomatic) ultrasound images of carotid atherosclerotic plaques were examined. Ten different texture-feature sets and shape parameters (a total of 61 features) were extracted from the manually segmented carotid plaque images as described in Subsection 3.3.3 [39, 44]. The results obtained through the feature-selection techniques described in Subsection 3.3.4 and the selected features with the highest discriminatory power are given in Table 3.1 [39]. The mean and standard deviation for all the plaques, and for the symptomatic and asymptomatic groups, were computed for each individual feature. Furthermore, the distance between the two classes was computed with Equation 3.1 as described in Subsection 3.3.4, and the features were ordered according to their interclass distance. The best features were the ones with the greatest distance. As shown in Table 3.1, for all features the distance was less than 1, which means that the feature values of the two groups overlapped. The high degree of overlap in all features makes the classification task of the two groups difficult. The best texture features, as tabulated in Table 3.1, were found to be: the coarseness of NGTDM, with average and standard deviation values of 9.3±8.2 for the symptomatic plaques and 21.4±14.9 for the asymptomatic plaques; the range of values of the angular second moment of SGLDM, with 0.0095±0.0055 and 0.0050±0.0050 for the symptomatic and the asymptomatic plaques, respectively; and the range of values of entropy, also of SGLDM, with 0.28±0.11 and 0.36±0.11 for the symptomatic and the asymptomatic plaques, respectively.
Features from other feature sets that also performed well were: the median gray-level value (SF), with average values of 15.7±16.6 for the symptomatic plaques and 29.4±22.9 for the asymptomatic plaques; the fractal value H1, with 0.37±0.08 and 0.42±0.07 for the symptomatic and the asymptomatic plaques, respectively; the roughness of SFM, with 2.39±0.13 and 2.30±0.10 for the symptomatic and the asymptomatic plaques, respectively; and the periodicity, also of SFM, with 0.58±0.08 and 0.62±0.06 for the symptomatic and the asymptomatic plaques, respectively. In general, texture in symptomatic plaques tends to be darker, with higher contrast, greater roughness, less local uniformity in image density, and less periodicity. In asymptomatic plaques, texture tends to be brighter, with
TABLE 3.1 Statistical Analysis of 61 Texture and Shape Features Computed from 230 (115 Symptomatic and 115 Asymptomatic) Ultrasound Images of Carotid Atherosclerotic Plaques of Data Set 1

No. | Texture Feature | Symptomatic Mean m1 | Symptomatic Std.Dev. σ1 | Asymptomatic Mean m2 | Asymptomatic Std.Dev. σ2 | Distance | Rank Order

Statistical Features (SF)
1 | Mean | 28.61 | 16.78 | 41.16 | 22.38 | 0.449 | 17
2 | Median | 15.71 | 16.62 | 29.40 | 22.87 | 0.484 | 10
3 | Standard deviation | 36.39 | 11.80 | 40.04 | 11.30 | 0.224 | 45
4 | Skewness | 2.790 | 1.548 | 2.083 | 1.429 | 0.335 | 34
5 | Kurtosis | 15.57 | 13.44 | 10.87 | 12.94 | 0.251 | 42

Spatial Gray-Level-Dependence Matrices (SGLDM): Mean Values
6 | Angular second moment | 0.1658 | 0.1866 | 0.0646 | 0.1201 | 0.456 | 11
7 | Contrast | 324.8 | 143.9 | 267.3 | 82.2 | 0.347 | 30
8 | Correlation | 0.812 | 0.138 | 0.876 | 0.104 | 0.372 | 26
9 | Sum of squares: variance | 1,315.2 | 1,081.3 | 1,621.8 | 957.5 | 0.212 | 48
10 | Inverse difference moment | 0.4856 | 0.1827 | 0.3545 | 0.1613 | 0.538 | 6
11 | Sum average | 57.091 | 33.671 | 82.675 | 44.953 | 0.456 | 13
12 | Sum variance | 4,936.2 | 4,288.7 | 6,219.8 | 3,803.3 | 0.224 | 44
13 | Sum entropy | 3.759 | 1.163 | 4.619 | 1.000 | 0.561 | 5
14 | Entropy | 4.730 | 1.619 | 5.972 | 1.456 | 0.570 | 4
15 | Difference variance | 280.5 | 119.8 | 219.7 | 65.8 | 0.445 | 12
16 | Difference entropy | 2.210 | 0.613 | 2.504 | 0.495 | 0.373 | 27
17 | Information measure of correlation 1 | −0.417 | 0.051 | −0.404 | 0.048 | 0.192 | 50
18 | Information measure of correlation 2 | 0.937 | 0.062 | 0.965 | 0.034 | 0.399 | 20

Spatial Gray-Level-Dependence Matrices (SGLDM): Range of Values
19 | Angular second moment | 0.0095 | 0.0055 | 0.0050 | 0.0050 | 0.611 | 2
20 | Contrast | 174.3 | 121.8 | 131.7 | 60.8 | 0.313 | 35
21 | Correlation | 0.108 | 0.105 | 0.066 | 0.070 | 0.331 | 33
22 | Sum of squares: variance | 42.06 | 30.97 | 29.77 | 15.68 | 0.354 | 28
23 | Inverse difference moment | 0.090 | 0.029 | 0.098 | 0.025 | 0.196 | 49
24 | Sum average | 0.955 | 0.683 | 0.657 | 0.287 | 0.402 | 19
25 | Sum variance | 324.5 | 231.4 | 233.4 | 108.4 | 0.357 | 24
26 | Sum entropy | 0.0656 | 0.0283 | 0.0505 | 0.0302 | 0.365 | 29
27 | Entropy | 0.277 | 0.109 | 0.365 | 0.106 | 0.571 | 3
28 | Difference variance | 148.64 | 108.30 | 103.52 | 48.93 | 0.380 | 22
29 | Difference entropy | 0.394 | 0.113 | 0.440 | 0.097 | 0.310 | 39
30 | Information measure of correlation 1 | 0.103 | 0.019 | 0.102 | 0.018 | 0.048 | 58
31 | Information measure of correlation 2 | 0.0314 | 0.0189 | 0.0214 | 0.0120 | 0.448 | 14

Gray-Level Difference Statistics (GLDS)
32 | Contrast | 324.26 | 143.26 | 267.01 | 82.03 | 0.347 | 31
33 | Angular second moment (energy) | 0.259 | 0.181 | 0.161 | 0.125 | 0.446 | 16
34 | Entropy | 2.228 | 0.619 | 2.526 | 0.501 | 0.374 | 25
35 | Mean | 6.107 | 2.427 | 6.451 | 2.168 | 0.106 | 54

Neighborhood Gray-Tone-Difference Matrix (NGTDM)
36 | Coarseness | 9.265 | 8.236 | 21.354 | 14.909 | 0.710 | 1
37 | Contrast | 0.902 | 1.564 | 0.656 | 1.512 | 0.113 | 53
38 | Busyness | 0.00060 | 0.00207 | 0.00011 | 0.00034 | 0.235 | 40
39 | Complexity | 22,446 | 16,005 | 27,120 | 14,346 | 0.217 | 47
40 | Strength | 772,828 | 703,980 | 1,118,719 | 783,246 | 0.328 | 36

Statistical-Feature Matrix (SFM)
41 | Coarseness | 10.424 | 5.406 | 8.730 | 4.476 | 0.241 | 43
42 | Contrast | 24.999 | 4.971 | 22.863 | 3.459 | 0.353 | 32
43 | Periodicity | 0.578 | 0.081 | 0.625 | 0.064 | 0.452 | 15
44 | Roughness | 2.386 | 0.127 | 2.301 | 0.100 | 0.527 | 8

Laws's Texture Energy Measures (TEM)
45 | LL: texture energy from LL kernel | 113,786 | 57,837 | 139,232 | 53,432 | 0.323 | 37
46 | EE: texture energy from EE kernel | 1,045.3 | 534.0 | 1,090.4 | 489.9 | 0.062 | 57
47 | SS: texture energy from SS kernel | 131.82 | 64.53 | 110.14 | 53.64 | 0.258 | 41
48 | LE: average texture energy from LE and EL kernels | 8,369.1 | 3,754.8 | 9,514.1 | 3,639.9 | 0.219 | 46
49 | ES: average texture energy from ES and SE kernels | 335.64 | 174.69 | 312.85 | 149.85 | 0.099 | 55
50 | LS: average texture energy from LS and SL kernels | 1,963.5 | 1,008.5 | 2,054.6 | 907.2 | 0.067 | 56

Fractal Dimension Texture Analysis (FDTA)
51 | H1 | 0.367 | 0.081 | 0.423 | 0.068 | 0.531 | 7
52 | H2 | 0.291 | 0.063 | 0.336 | 0.059 | 0.521 | 9
53 | H3 | 0.244 | 0.045 | 0.270 | 0.045 | 0.400 | 23
54 | H4 | 0.207 | 0.050 | 0.216 | 0.034 | 0.148 | 51

Fourier Power Spectrum (FPS)
55 | Radial sum | 3,073.7 | 1,546.0 | 4,219.7 | 2,047.5 | 0.447 | 18
56 | Angular sum | 2,462.3 | 1,362.5 | 3,301.7 | 1,500.8 | 0.414 | 21

Shape Parameters
57 | X-coord. max. length | 349.24 | 110.89 | 354.27 | 95.92 | 0.034 | 60
58 | Y-coord. max. length | 100.95 | 36.42 | 99.39 | 27.19 | 0.034 | 59
59 | Area | 18,797 | 11,744 | 21,092 | 10,761 | 0.144 | 52
60 | Perimeter | 927.71 | 291.21 | 939.84 | 261.28 | 0.031 | 61
61 | Perimeter²/area | 51.266 | 15.608 | 45.089 | 11.834 | 0.315 | 38
TABLE 3.2 Verbal Interpretation of Arithmetic Values of Some Features from Table 3.1 for Symptomatic vs. Asymptomatic Plaques

Texture Feature | Symptomatic Value | Symptomatic Interpretation | Asymptomatic Value | Asymptomatic Interpretation
Median gray scale | Low | Darker | High | Brighter
Contrast | High | More local variations present in the image | Low | Fewer local variations present in the image
Entropy | Low | Less local uniformity in image density | High | Image intensity in neighboring pixels is more equal
Roughness | High | More rough | Low | More smooth
Periodicity | Low | Less periodical | High | More periodical
Coarseness | Low | Less local uniformity in intensity | High | Large areas with small gray-tone variations
Fractals H1, H2 | Low | Rough texture surface | High | Smooth texture surface
less contrast, greater smoothness, and with large areas with small gray-tone variations and being more periodical. These results are in agreement with the original assumption that smooth surface, echogenicity, and a homogeneous texture are characteristics of stable plaques, whereas irregular surface, echolucency, and a heterogeneous texture are characteristics of potentially unstable plaques. Table 3.2 gives a verbal interpretation of the arithmetical values of some of the features from Table 3.1 for the symptomatic vs. the asymptomatic plaques [39]. Figure 3.4 illustrates several box plots of some of the best features as selected with Equation 3.1. Principal component analysis (PCA) was also used as a method for feature selection and dimensionality reduction [40]. The 61-feature vector was reduced to nine transformed parameters by retaining only those components that contributed more than 2% to the variance in the data set. The nine PCA parameters were used as a new feature set for classification. In Data Set 2, where the usefulness of the morphological features was investigated, a total of 330 ultrasound images of carotid atherosclerotic plaques were analyzed [45]. The morphological algorithm extracted 98 features from the plaque images. Using the entire pattern spectra for classification yielded poor results. Using Equation 3.1, the number of features used was reduced to only five, which proved to yield satisfactory classification results. The selected features represent the most significant normalized pattern spectra components. We determined that small features due to:
and P−5,‘+’
Texture and morphological analysis
107
(see Equation 3.60 in Appendix 3.1) yield the best results. These components describe small image structures, which may make them susceptible to noise; on the other hand, they are also the features most sensitive to turbulent flow effects around the carotid plaques. Table 3.3 tabulates the statistics of the five selected morphological features for the two classes and their interclass distance as computed with Equation 3.1. Additionally, for Data Set 2, the SF, the SGLDM, and the GLDS texture-feature sets were computed and compared with the morphological features [45].
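The PCA-based reduction described above (retaining only components that contribute more than 2% of the variance) can be sketched as follows; this is an illustrative sketch on random stand-in data, not the actual 61 plaque features:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in for the 61-feature plaque vectors (the real data is not shown)
X = rng.normal(size=(230, 61))

# Centre the data, then eigen-decompose the covariance matrix
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)          # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep only components explaining more than 2% of the total variance,
# mirroring the criterion used for the nine PCA parameters in the chapter
explained = eigvals / eigvals.sum()
keep = explained > 0.02
Z = Xc @ eigvecs[:, keep]                        # reduced feature set
print(Z.shape)
```

On the real data this step reduced the 61 features to nine transformed parameters; on random stand-in data the number of retained components will differ.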
FIGURE 3.4 Box plots of the features gray-scale median (2), entropy (14), and coarseness (36) for the symptomatic and asymptomatic plaques. (The numbers in parentheses denote the serial feature number as listed in Table 3.1.) The notched box shows the median, lower and upper quartiles, and confidence interval around the median for each feature. The dotted line connects the nearest observations within 1.5 times the interquartile range (IQR) of the lower
Medical image analysis method
108
and upper quartiles. Crosses (+) indicate possible outliers with values beyond the ends of the 1.5×IQR.

3.4.2 CLASSIFICATION RESULTS OF THE SOM CLASSIFIERS

For the classification task, the unsupervised SOM classifier was implemented with a 10×10 output-node architecture, and it was trained for 5000 learning epochs. For training the classifier, 80 symptomatic and 80 asymptomatic plaques were used, whereas for evaluation of the system, the remaining 35 symptomatic and 35 asymptomatic plaques were used. To estimate more reliably the correctness of the classification results, a bootstrapping procedure was followed. The system was trained and evaluated using five different bootstrap sets where, in each set, 160 different plaques were selected at random for training, and the remaining 70 plaques were used for evaluation. The SOM classifier yielded a confidence measure (see Subsection 3.3.5) on how reliable the classification result was, based on the number of the nearest neighbors on the self-organizing map. Five different neighborhood windows were tested: 1×1, 3×3, 5×5, 7×7, and 9×9. The confidence measure was calculated using a weighting mask giving the output nodes nearest to the winning output node a greater weight than the ones farther away.
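The neighborhood-weighted confidence measure described above can be sketched as follows; the inverse-distance weighting and the toy node counts are assumptions for illustration, not the exact mask used in the study:

```python
# Sketch of the SOM confidence measure: each output node carries the counts of
# (symptomatic, asymptomatic) training patterns assigned to it, and nodes
# closer to the winning node receive a greater weight.

def som_confidence(grid, winner, R=3):
    """grid[r][c] = (n_symptomatic, n_asymptomatic) training patterns per node.
    Returns (label, confidence in [0, 1]) from the R x R window around winner."""
    rows, cols = len(grid), len(grid[0])
    wr, wc = winner
    half = R // 2
    sym = asym = 0.0
    for r in range(max(0, wr - half), min(rows, wr + half + 1)):
        for c in range(max(0, wc - half), min(cols, wc + half + 1)):
            # Chebyshev distance to the winner; nearer nodes weigh more
            d = max(abs(r - wr), abs(c - wc))
            w = 1.0 / (1 + d)            # assumed weighting scheme
            n_s, n_a = grid[r][c]
            sym += w * n_s
            asym += w * n_a
    label = "symptomatic" if sym > asym else "asymptomatic"
    conf = abs(sym - asym) / (sym + asym) if (sym + asym) else 0.0
    return label, conf

# Toy 3x3 labeled grid: mostly symptomatic patterns near the winner (1, 1)
grid = [[(3, 0), (2, 1), (0, 0)],
        [(4, 1), (5, 0), (1, 1)],
        [(0, 2), (1, 1), (0, 3)]]
label, conf = som_confidence(grid, (1, 1), R=3)
print(label, round(conf, 3))   # -> symptomatic 0.4
```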
TABLE 3.3
Statistical Analysis of the Five Best Morphological Features Computed from 330 (194 Asymptomatic and 136 Symptomatic) Ultrasound Images of Carotid Plaques of Data Set 2

                 Symptomatic Plaques           Asymptomatic Plaques
Feature          Mean, m1    Std. Dev., σ1     Mean, m2    Std. Dev., σ2    Distance
P1,'+'           0.0433      0.0407            0.0249      0.0229           0.393
P3,'+'           0.1922      0.1218            0.1355      0.0870           0.379
P2,'+'           0.1102      0.0888            0.0713      0.0520           0.378
P−4,'+'          0.0080      0.0061            0.0119      0.0084           0.370
P−5,'+'          0.0108      0.0079            0.0158      0.0109           0.367

Note: For each feature, the mean and standard deviation were computed for the asymptomatic group and for the symptomatic group. The distance between the symptomatic and the asymptomatic groups was computed as described in Equation 3.1.
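The interclass distances in the last column of Table 3.3 can be checked numerically. Assuming the distance criterion has the common two-class form |m1 − m2| / √(σ1² + σ2²), the tabulated values are reproduced from the listed means and standard deviations, to within rounding of the inputs:

```python
import math

def interclass_distance(m1, s1, m2, s2):
    # Assumed form of the distance criterion (Equation 3.1 in the chapter):
    # absolute mean difference normalized by the pooled standard deviations.
    return abs(m1 - m2) / math.sqrt(s1 ** 2 + s2 ** 2)

# Rows P1,'+' and P3,'+' of Table 3.3 (symptomatic m1, s1; asymptomatic m2, s2)
d_p1 = interclass_distance(0.0433, 0.0407, 0.0249, 0.0229)
d_p3 = interclass_distance(0.1922, 0.1218, 0.1355, 0.0870)
print(round(d_p1, 3), round(d_p3, 3))
```

The P3,'+' row reproduces the tabulated 0.379 exactly; the P1,'+' row gives 0.394 against the tabulated 0.393, a discrepancy attributable to the four-decimal rounding of the printed statistics.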
Table 3.4 tabulates the diagnostic yield of the SOM classifiers for the evaluation set of Data Set 1 [39]. The best feature sets on average for all windows were the SGLDM
(range of values) with 65.3%, the TEM with 63.0%, followed by the NGTDM with 62.2%, the SGLDM (mean values) with 61.7%, and the GLDS with 61.5%. The worst feature set was the shape parameters, with an average diagnostic yield of only 49.2%. The best SOM window sizes on average were the large ones (5×5, 7×7, and 9×9), with an average DY of about 65%. The worst window size was the 1×1, with an average DY of only 43.3%. As given in Table 3.4, the best individual DY was 70%, and it was obtained by the SGLDM (range of values) using a 5×5 neighborhood window and by the NGTDM with a 9×9 window size. Figure 3.5 illustrates the distribution of 160 carotid plaques of the training set (80 symptomatic and 80 asymptomatic) on a 10×10 SOM using as input all 61 features (* = symptomatic, o = asymptomatic). Similar plaques are assigned to neighboring SOM matrix nodes. The figure demonstrates the overlap between the two classes and the difficulty of the problem. For comparison purposes, the diagnostic yield was computed using as a separate feature set the first 15 best features that were selected through univariate selection, as described in Subsection 3.3.4 using Equation 3.1. Using the first 15 best features yielded an average DY for the five windows of 63.0%, with the highest DY of 68.5% obtained with the 7×7 window size. This was better than the average success rate of the individual feature sets but lower than the diagnostic yield of the best feature set, and it was much worse than the overall success rate of the combiner. Furthermore, the 15 best features selected through multivariate selection were also used for classification. The average diagnostic yield was poor (about 50%), and it was much lower than the diagnostic yield obtained by the univariate selection. These results show the high degree of overlap between the two classes, demonstrating the difficulty of using the search algorithms to identify feature combinations with
TABLE 3.4
Average Diagnostic Yield (DY) of the Self-Organizing Map (SOM) Classifier System for the Evaluation Set of Data Set 1 (35 Symptomatic and 35 Asymptomatic Plaques) of the Modular Neural Network Diagnostic System after Bootstrapping the Available Data for Five Different Sets of Plaques

                                                Diagnostic Yield (%) per Window Size
Feature Set                                     1×1    3×3    5×5    7×7    9×9    Average
1  SF                                           46.0   59.7   65.7   66.6   66.6   60.9
2  SGLDM (mean)                                 45.7   65.7   66.3   65.7   65.1   61.7
3  SGLDM (range)                                49.4   69.4   70.0   68.3   69.4   65.3
4  GLDS                                         40.3   66.0   66.9   66.6   67.7   61.5
5  NGTDM                                        39.1   64.0   68.3   69.4   70.0   62.2
6  SFM                                          39.4   59.7   65.1   65.1   65.1   58.9
7  TEM                                          46.9   65.4   67.4   67.4   67.7   63.0
8  FDTA                                         40.9   62.0   63.7   64.9   65.1   59.3
9  FPS                                          44.6   61.4   62.6   64.3   63.4   59.3
10 Shape parameters                             40.3   52.9   52.9   50.6   49.4   49.2
Average                                         43.3   62.6   64.9   64.9   65.0   60.1
Combine the ten feature sets, majority voting   64.0   67.7   66.0   66.6   66.9   66.2
Combine by averaging the confidence measures    68.9   71.1   73.1   72.6   72.0   71.5
15 best features                                45.6   65.0   68.0   68.5   68.0   63.0

Note: DY is given for the ten feature sets, their average, and when combined using (a) majority voting and (b) averaging of the ten confidence measures. DY is also given for the first 15 best features as selected using Equation 3.1. DY was computed for five different SOM neighborhood windows: 1×1, 3×3, 5×5, 7×7, and 9×9.
good class separability. The nine parameters obtained through principal component analysis (PCA) were also used as input to the SOM classifier. The average diagnostic yield was about 64%, which was slightly better than the average DY of the best 15 features obtained by the univariate feature selection but still much lower than the diagnostic yield obtained by combining the ten feature sets. In the second data set, where the usefulness of the morphological features was investigated, 90 asymptomatic and 90 symptomatic plaques were used for training the classifier, whereas for evaluation of the system the remaining 104 asymptomatic and 46 symptomatic plaques were used [45]. Table 3.5 tabulates the diagnostic yield for the SOM classifier for the different feature sets and for different neighborhood window sizes on the self-organizing map. The highest diagnostic yield was 69.6%, and it was obtained with a 9×9 window size, using as input the GLDS feature set. On average, the highest diagnostic yield was obtained by the GLDS feature set with 64.6%, followed by the morphological feature set with 62.9%, the SGLDM with 62.2%, and the SF with 59.9%.
FIGURE 3.5 Distribution of 160 carotid plaques of the training set (80 symptomatic and 80 asymptomatic) on a 10×10 SOM using as input all 61 features from Table 3.1 (* = symptomatic, o = asymptomatic). Similar plaques are assigned to neighboring output matrix nodes. A new plaque is assigned to one winning output node and is classified based on the labels of the neighboring nodes in an R×R neighborhood window. The output nodes near the winning node are given a greater weight than the ones farther away.

3.4.3 CLASSIFICATION RESULTS OF THE KNN CLASSIFIERS

The KNN classifier was also used for the carotid plaque classification. The KNN algorithm was implemented for values of k = 1, 3, 5, 7, and 9, and the results are tabulated
for Data Set 1 in Table 3.6. The highest diagnostic yields were achieved with k=7 and k=9, which shows the need to consider a large number of neighbors because of the overlap of the two classes. The best feature set, on average for all k, was the SGLDM (range of values) with a DY of 66.9%, which was also the best feature set for the SOM classifier. The best individual classification results were obtained with the SGLDM (range of values) with a DY of 70.9% and with the SGLDM (mean values) with 66.9%; in both cases k=9 was used. Table 3.7 tabulates the results of the KNN classifier for the second data set. The highest diagnostic yield was 68.7%, and it was obtained with k=3, using as input
TABLE 3.5
Diagnostic Yield (DY) of the Self-Organizing Map (SOM) Classifier System for the Evaluation Set of Data Set 2 (46 Symptomatic and 104 Asymptomatic Plaques) for the SF, SGLDM Mean, GLDS, and Morphological Feature Sets

                    Diagnostic Yield (%) per Window Size
Feature Set         1×1    3×3    5×5    7×7    9×9    Average
1 SF                40.5   61.6   65.8   66.2   65.5   59.9
2 SGLDM (mean)      44.2   66.0   64.4   67.8   68.7   62.2
3 GLDS              50.0   66.0   68.0   69.3   69.6   64.6
4 Morphological     52.4   66.7   64.7   64.7   65.8   62.9
Average             46.8   65.1   65.7   67.0   67.4   62.4

Note: DY was computed for five different SOM neighborhood windows: 1×1, 3×3, 5×5, 7×7, and 9×9.
the morphological features. On average, the highest diagnostic yield was obtained by the morphological feature set with 66.3%, followed by the GLDS feature set with a diagnostic yield of 65.6%.

3.4.4 RESULTS OF THE CLASSIFIER COMBINER

To enhance the classification success rate, the ten classification results of the SOM or KNN classifiers inputted with the ten feature sets of Data Set 1 were combined (a) using majority voting and (b) by averaging the ten confidence measures. The results of the system combiner are tabulated in Table 3.4 for the SOM classifier and in Table 3.6 for the KNN classifier. In the SOM modular system, as shown in Table 3.4, the combination of the classification results significantly improved the average success rate for the ten feature sets, for all five window sizes, and for both combining methods. The best combining method proved to be the averaging of the confidence measures, followed by majority voting. The average diagnostic yield for the ten
feature sets was 60.1%, improved to 66.2% when combined with majority voting, and to 71.5% when combined with the confidence measure. When combining by averaging the confidence measure, of the five different neighborhood windows tested, the best result was obtained by the 5×5 window size and was 73.1%. However, the other window sizes also yielded comparably good results. This result was better than the best individual diagnostic yield obtained by the SGLDM (range of values) using the same window size, which was 70.0%. Figure 3.6 shows histograms of the distribution of the combined average confidence measure for the five bootstraps for the symptomatic and asymptomatic cases. For the symptomatic cases, negative values indicate the correctly classified plaques, whereas positive values indicate the misclassified plaques. The reverse applies for the asymptomatic cases. The value of the average confidence measure indicates the degree of confidence of the final classification result. Values close to −1 or close to 1 mean high confidence, whereas values close to 0 mean low confidence.
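The two combining rules compared above can be sketched as follows; the ten votes and signed confidences are hypothetical values for a single plaque, not data from the study:

```python
def combine_majority(labels):
    """Majority vote over the n classifier decisions (+1 asympt., -1 sympt.)."""
    s = sum(labels)
    return "asymptomatic" if s > 0 else "symptomatic"

def combine_confidence(confidences):
    """Average the signed confidence measures: negative values come from
    classifiers that voted symptomatic (confidence multiplied by -1)."""
    avg = sum(confidences) / len(confidences)
    return ("asymptomatic" if avg > 0 else "symptomatic"), avg

# Ten hypothetical classifier outputs for one plaque (one per feature set)
votes = [+1, -1, -1, +1, -1, -1, +1, -1, -1, -1]
confs = [0.2, -0.8, -0.6, 0.1, -0.9, -0.4, 0.3, -0.7, -0.5, -0.6]

label_mv = combine_majority(votes)
label_cm, avg = combine_confidence(confs)
print(label_mv, label_cm, round(avg, 2))   # -> symptomatic symptomatic -0.39
```

The averaged confidence keeps a graded output (here −0.39, a moderately confident symptomatic decision), whereas majority voting discards the per-classifier confidence information.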
TABLE 3.6
Average Diagnostic Yield (DY) of the k-Nearest Neighbor (KNN) Classifier System for the Evaluation Set of Data Set 1 (35 Symptomatic and 35 Asymptomatic Plaques) of the Modular Network Diagnostic System after Bootstrapping the Available Data for Five Different Sets of Plaques

                                                Diagnostic Yield (%) per Value of k
Feature Set                                     k=1    k=3    k=5    k=7    k=9    Average
1  SF                                           60.6   60.3   61.4   61.7   64.3   61.7
2  SGLDM (mean)                                 59.7   64.6   63.1   66.6   66.9   64.2
3  SGLDM (range)                                61.1   68.3   66.9   67.1   70.9   66.9
4  GLDS                                         53.1   58.9   63.4   64.0   65.4   61.0
5  NGTDM                                        58.0   59.1   62.0   62.9   63.7   61.1
6  SFM                                          56.0   59.7   59.4   59.7   62.3   59.4
7  TEM                                          59.4   59.7   58.9   63.4   62.6   60.8
8  FDTA                                         58.6   59.7   61.7   64.9   64.6   61.9
9  FPS                                          55.1   54.9   57.7   59.7   62.0   57.9
10 Shape parameters                             51.4   54.6   54.0   55.7   55.7   54.3
Average                                         57.3   60.0   60.9   62.6   63.8   60.9
Combine the ten feature sets, majority voting   57.4   58.3   63.1   65.1   65.4   61.9
Combine by averaging the confidence measures    57.4   63.4   63.7   68.9   68.0   64.3
15 best features                                66.9   64.9   64.9   66.9   66.9   66.1

Note: DY is given for the ten feature sets, their average, and when combined using (a) majority voting and (b) averaging of the ten confidence measures. DY is also given for the first 15 best features as selected using Equation 3.1. DY was computed for five different values of k.
In the case of the KNN modular system (see Table 3.6), combining the classification results for the ten different feature sets was also of benefit. The average diagnostic yield was improved from 60.9% to 61.9% when combined with simple majority voting, and to 64.3% when combined by averaging the confidence measure. In general, the average success rate obtained for the KNN classifier when combined was lower than the success rate obtained by the SOM classifier (71.5%). The best diagnostic yield (68.9%) was achieved for k=7 when combined with the use of a confidence measure. This was lower than the best individual result of 70.9% achieved with the SGLDM (range of values) with k=9.

3.4.5 THE PROPOSED SYSTEM

As noted in the previous subsection, the best results were obtained in the case of combining by averaging the confidence measures, using the SOM classifier. Based on these results, the generic steps for constructing an automated carotid plaque classification system are described in the following subsections.
TABLE 3.7
Diagnostic Yield (DY) of the k-Nearest Neighbor (KNN) Classifier System for the Evaluation Set of Data Set 2 (46 Symptomatic and 104 Asymptomatic Plaques) for the SF, SGLDM Mean, GLDS, and Morphological Feature Sets

                    Diagnostic Yield (%) per Value of k
Feature Set         k=1    k=3    k=5    k=7    k=9    Average
1 SF                60.0   63.3   63.3   65.3   62.0   63.9
2 SGLDM (mean)      62.7   58.7   67.3   62.7   64.7   63.8
3 GLDS              65.3   64.0   62.7   68.0   67.3   65.6
4 Morphological     62.7   68.7   66.7   68.0   64.0   66.3
Average             62.7   63.7   65.0   66.0   64.5   64.9

Note: DY was computed for five different values of k.
FIGURE 3.6 Histograms of the distribution of the average confidence measure for the five bootstraps, for the symptomatic and asymptomatic cases, and for the SOM system using the SGLDM (range) feature set as input. For the symptomatic cases, negative values indicate the correctly classified plaques, whereas the positive values indicate the misclassified plaques. The reverse applies for the asymptomatic cases. Values close to zero mean low confidence. (From Christodoulou, C.I. et al., IEEE Trans. Medical Imaging, 22, 902–912, 2003. With permission.)

3.4.5.1 Training of the System

Step 1: Image acquisition and preprocessing. Acquire ultrasound images of symptomatic and asymptomatic carotid plaques to compose the system training set. Standardize images using blood and adventitia as reference points and manually segment the plaque region.

Step 2: Feature extraction. Extract from the segmented plaque images of the training set the n different texture feature sets described in Subsection 3.3.3.

Step 3: Training of the SOM classifiers. Train an SOM classifier for each one of the feature sets of the training set, and label each output node on the SOM classifiers with the number of the symptomatic or asymptomatic training input patterns assigned to it.
3.4.5.2 Classification of a New Plaque

Step 4: Feature extraction for a new plaque. To classify a new carotid plaque image, repeat steps 1 and 2 and calculate the different feature sets for the new plaque image.

Step 5: Input the feature sets to the trained classifiers and compute the confidence measures. Input each one of the feature sets to its corresponding previously trained SOM classifier and classify the plaque as symptomatic or asymptomatic as described in Subsection 3.3.5. For the n classification results, compute the n confidence measures as given in Equation 3.2, using a neighborhood window size of R×R, with R≥3. Multiply the confidence measures by −1 when the plaque is classified as symptomatic.

Step 6: Combine by averaging the confidence measures. Combine the n classification results by averaging the n confidence measures as described in Subsection 3.3.6. The final output of the system is a value ranging from −1 to 1. If it is negative, the plaque is classified as symptomatic; if it is positive, the plaque is classified as asymptomatic. The value of the average confidence measure indicates the degree of confidence of the final classification result. Values close to −1 or close to 1 mean high confidence, whereas values close to 0 mean low confidence.
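Steps 4 to 6 can be sketched end to end as follows; the feature extractors and trained classifiers are stubs standing in for the real SOM modules:

```python
# Hypothetical end-to-end sketch of Steps 4-6; the extractors and classifiers
# below are toy stand-ins, not the feature algorithms or SOMs of the chapter.

def classify_plaque(image, extractors, classifiers):
    """extractors: list of n feature-extraction functions (Step 4).
    classifiers: list of n trained classifiers, each returning
    ('symptomatic'|'asymptomatic', confidence in [0, 1]) (Step 5)."""
    signed = []
    for extract, clf in zip(extractors, classifiers):
        features = extract(image)
        label, conf = clf(features)
        # Step 5: multiply the confidence by -1 for symptomatic decisions
        signed.append(-conf if label == "symptomatic" else conf)
    avg = sum(signed) / len(signed)      # Step 6: average the n measures
    return ("symptomatic" if avg < 0 else "asymptomatic"), avg

# Toy stand-ins: two "feature sets" and two already-trained classifiers
extractors = [lambda img: sum(img), lambda img: max(img) - min(img)]
classifiers = [lambda f: ("symptomatic", 0.8), lambda f: ("asymptomatic", 0.2)]
label, avg = classify_plaque([10, 30, 20], extractors, classifiers)
print(label, round(avg, 2))   # average of -0.8 and +0.2 is -0.3 -> symptomatic
```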
3.5 DISCUSSION

In this work, a multifeature, multiclassifier modular system is proposed for the classification of carotid plaques recorded from high-resolution ultrasound images. Such a system will help in enhancing the significance of noninvasive cerebrovascular tests for the identification of patients with asymptomatic carotid stenosis at risk of stroke.

3.5.1 FEATURE EXTRACTION AND SELECTION

A total of 61 texture features and shape parameters were extracted from the 230 carotid plaque images of Data Set 1 [39, 44]. The statistics for all the texture features extracted, as tabulated in Table 3.1, indicate a high degree of overlap between the symptomatic and asymptomatic groups. The best texture features on an individual basis, using their statistics as tabulated in Table 3.1, were found to be the coarseness of NGTDM, the entropy, the mean values of angular second moment and inverse difference moment of SGLDM, the median gray-level value, the fractal values H1 and H2, and the roughness and periodicity of SFM. A relationship between plaque morphology and risk of stroke has been reported in previous works [2, 4, 5, 22, 24]. El-Barghouty et al. [4] reported an association between carotid plaque echolucency and the incidence of cerebral computed tomography (CT) brain infarctions using the gray-scale median (GSM) of the ultrasound plaque image for the characterization of plaques as echolucent (GSM≤32) and echogenic (GSM>32). Elatrozy et al. [24] also reported that plaques with GSM<40 are good predictors of
ipsilateral hemispheric symptoms related to carotid plaques. In this work, the cutoff GSM value was found to be about 23 (GSM≤23 for symptomatic plaques and GSM>23 for asymptomatic plaques). This difference in the computed GSM value from those obtained in the previous studies can be explained. In the case of El-Barghouty et al. [4], the difference in GSM values can be attributed to the fact that the gray level of the plaque images was not standardized using blood and adventitia as reference. In the case of Elatrozy et al. [24], a standardization procedure was followed, as used in this work, but with different values for the reference points. Furthermore, the use in this work of a color image indicating the blood flow facilitated identifying the correct outline of the plaque region. This was especially useful in the case of the highly echolucent (dark) plaques, where the plaque boundaries were not visible and therefore dark areas of the plaque were not considered. This can explain why the GSM value of 23 reported in this work is lower than the GSM values reported in the other two studies. In most of the above studies, the characteristics of the plaques were usually subjectively defined or characterized using simple statistical measures, and the association with symptoms was established through simple statistical analysis. In this work, a large number of texture and morphological features were extracted directly from the plaque ultrasound images, and they were used to develop an automated system that can classify carotid plaques as symptomatic or asymptomatic based on a multifeature, multiclassifier modular network paradigm.

3.5.2 PLAQUE CLASSIFICATION

The neural SOM classifier was used for the classification of the carotid plaques.
The SOM was chosen because it is an unsupervised learning algorithm where the input patterns are freely distributed over the output-node matrix, allowing an efficient mapping of the input data with no need to create exact classification boundaries. The supervised back-propagation (BP) and radial-basis-function (RBF) classifiers were tested and failed to converge because of the high degree of overlap between the two classes. Ten different texture and shape feature sets were extracted from the plaque images of Data Set 1 and used to train multiple SOM classifiers [39]. The best feature sets on average were the SGLDM, followed by the TEM, the NGTDM, the GLDS, and the SF, whereas the worst feature set was the shape parameters. The classification results of the different feature sets are correlated with the rank order of the individual features as tabulated in Table 3.1. Usually, successful feature sets contained features that individually were ranked high. The statistical KNN classifier was also implemented for the classification of carotid plaques. This classifier also performed well, yielding results comparable in most cases with the results obtained by the SOM classifier. The best individual result for the KNN classifier was also achieved with the SGLDM (range) feature set, with a diagnostic yield of 67.1%. The KNN was computationally much faster compared with the SOM classifier. In the second study with 330 plaques [45], the usefulness of morphological features was investigated as a means of characterizing carotid plaques for the identification of individuals with asymptomatic carotid stenosis at risk of stroke. As shown in the first study, texture features can successfully be used to classify carotid plaques. In this study, it was shown that morphological features compare well with the most successful texture
feature sets and provide an additional tool for the identification of individuals at risk of stroke. In future work, the extracted morphological features will be extended to investigate the classification performance of larger components and linear, directional structural elements.

3.5.3 CLASSIFIER COMBINER

Combining techniques were used to enhance the classification success rate. Combining the classification results of the ten different feature sets of Data Set 1 with the use of the confidence measure significantly improved the classification results obtained by the individual feature sets, reaching a diagnostic yield of about 73.1% for the SOM modular system. The benefits of combining are more obvious in the case where no dominant best feature set or best classifier is available [43], as was the case with the features extracted from the carotid plaque images. The idea behind combining is that although one of the classifiers will eventually yield the best performance, the sets of patterns misclassified by the different classifiers, using the different feature sets as input, will not necessarily overlap. This suggests that different classifier designs potentially offer complementary information that could be harnessed to improve the overall classifier performance [42]. The confidence measure computed as given in Equation 3.2 is a qualitative measure of the degree to which a plaque belongs to the assigned class of symptomatic or asymptomatic. The range of the confidence measure is from 0 to 1, where values close to 0 mean low confidence and values close to 1 mean high confidence. By multiplying the confidence measure of the plaques classified as symptomatic by −1, we obtain as output of the modular diagnostic system an average confidence measure ranging from −1 (symptomatic) to 1 (asymptomatic). Figure 3.6 illustrates the histograms of the confidence-measure distribution of the plaques examined in this work.
The in-between values indicate the degree to which a plaque can be characterized as symptomatic or asymptomatic. In a prospective study where the plaques will be followed up and assessed over a period of time, it will be interesting to follow up on how this qualitative measure changes in response to medication or other treatment.

3.6 CONCLUSIONS AND FUTURE WORK

The results presented in this chapter show that it is possible to identify a group of patients at risk of stroke based on texture features extracted from high-resolution ultrasound images of carotid plaques. This group of patients at high risk will be considered for surgery (carotid endarterectomy), while patients at low risk will be spared from an unnecessary and expensive surgery that also carries a risk. In future work, the proposed multifeature, multiclassifier system will be expanded to incorporate both textural and morphological features. The new system will be applied on an extended carotid plaque imaging data set recorded for the ACSRS project. Moreover, future work will also investigate the three-dimensional (3-D) reconstruction of the carotid plaque [46], which may lead to a better extraction of the textural information, resulting in a higher diagnostic yield. Three-dimensional imaging attempts to provide the ultrasonographer with a more realistic reconstruction and visualization of the 3-D
structure under investigation. In addition, 3-D imaging can provide quantitative measurements of volume and surface distance in vascular anatomy, especially in pathological cases. Furthermore, advances in information technology and telecommunications, and more specifically wireless and mobile communications, and their convergence (telematics) are leading to the emergence of a new type of information infrastructure that has the potential of supporting an array of advanced services for health care [47]. Telemedicine can be defined as the delivery of health care and sharing of medical knowledge over a distance using telecommunication technology. The aim is to provide expert-based medical care to any place that health care is needed. In ultrasound imaging, there is likely a need for a second expert or a panel of experts to assess the vascular images/video, thus making the capture and transmission of digital ultrasound a necessity. In the context of the ACSRS project, an integrated database system was developed taking into consideration important stroke-related clinical risk factors as well as noninvasive (paraclinical) parameters, i.e., high-resolution ultrasound images of the carotid and CT brain scans. This integration of information and its rapid accessibility through telemedicine facilitates the data-mining analysis to assess the risk of stroke. It is anticipated that the extraction of quantitative criteria to identify high- and low-risk subgroups of patients will be a decisive factor in the selection of the therapy, either medical or surgical.

APPENDIX 3.1 TEXTURE-FEATURE-EXTRACTION ALGORITHMS

A3.1.1 STATISTICAL FEATURES

The following statistical features were computed [29].

A3.1.1.1 Mean Value

The mean of the gray-level values I1, …, IN of the pixels of the segmented plaque:

\bar{I} = \frac{1}{N} \sum_{i=1}^{N} I_i   (3.5)

A3.1.1.2 Median Value

The median Imed of the distribution of the gray-level values I1, …, IN is the value of the middle item of the distribution.
A3.1.1.3 Standard Deviation

\sigma = \left[ \frac{1}{N} \sum_{i=1}^{N} (I_i - \bar{I})^2 \right]^{1/2}   (3.6)

A3.1.1.4 Skewness

The skewness characterizes the degree of asymmetry of a distribution around its mean:

\text{skewness} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{I_i - \bar{I}}{\sigma} \right)^3   (3.7)

A3.1.1.5 Kurtosis

Kurtosis measures the peakedness or flatness of a distribution in relation to a normal distribution:

\text{kurtosis} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{I_i - \bar{I}}{\sigma} \right)^4 - 3   (3.8)
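The first-order statistics above can be computed directly from the pixel gray levels; population (1/N) normalization is assumed here, since the chapter does not spell out the estimator:

```python
import math

def statistical_features(pixels):
    """First-order statistics (Equations 3.5-3.8, plus the median) computed
    over the gray levels of a segmented plaque region."""
    n = len(pixels)
    mean = sum(pixels) / n
    ordered = sorted(pixels)
    median = ordered[n // 2] if n % 2 else (ordered[n//2 - 1] + ordered[n//2]) / 2
    var = sum((x - mean) ** 2 for x in pixels) / n
    std = math.sqrt(var)
    skew = sum(((x - mean) / std) ** 3 for x in pixels) / n
    kurt = sum(((x - mean) / std) ** 4 for x in pixels) / n - 3  # 0 for a normal
    return {"mean": mean, "median": median, "std": std,
            "skewness": skew, "kurtosis": kurt}

feats = statistical_features([10, 12, 12, 13, 40])   # one bright outlier
print(round(feats["mean"], 2), feats["median"], round(feats["skewness"], 2))
```

The bright outlier pulls the mean above the median and produces a clearly positive skewness, illustrating why the median gray level is the more robust echogenicity measure.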
A3.1.2 SPATIAL GRAY-LEVEL-DEPENDENCE MATRICES (SGLDM)

The spatial gray-level-dependence matrices as proposed by Haralick et al. [30] are based on the estimation of the second-order joint conditional probability density functions f(i, j; d, θ). The f(i, j; d, θ) is the probability that two pixels (k, l) and (m, n), with distance d in the direction specified by the angle θ, have intensities of gray level i and gray level j [34]. The estimated values for these probability density functions will be denoted by P(i, j; d, θ). In an Nx × Ny image, let Lx = {1, 2, …, Nx} be the horizontal spatial domain, Ly = {1, 2, …, Ny} be the vertical spatial domain, and I(x, y) be the image intensity at pixel (x, y). Formally, for angles quantized at 45° intervals, the unnormalized probability density functions are defined by

P(i, j; d, 0°) = \#\{((k,l),(m,n)) : k - m = 0, |l - n| = d, I(k,l) = i, I(m,n) = j\}   (3.9)

P(i, j; d, 45°) = \#\{((k,l),(m,n)) : (k - m = d, l - n = -d) \text{ or } (k - m = -d, l - n = d), I(k,l) = i, I(m,n) = j\}   (3.10)

P(i, j; d, 90°) = \#\{((k,l),(m,n)) : |k - m| = d, l - n = 0, I(k,l) = i, I(m,n) = j\}   (3.11)
P(i, j; d, 135°) = \#\{((k,l),(m,n)) : (k - m = d, l - n = d) \text{ or } (k - m = -d, l - n = -d), I(k,l) = i, I(m,n) = j\}   (3.12)
where # denotes the number of elements in the set. Haralick et al. [30] proposed the following texture measures that can be extracted from the spatial gray-level-dependence matrices.

A3.1.2.1 Notation

p(i, j) is the (i, j)th entry in the normalized spatial gray-level-dependence matrix, p(i, j) = P(i, j)/R, where R is a normalizing constant. p_x(i) is the ith entry in the marginal probability matrix obtained by summing the rows of p(i, j), and p_y(j) is the jth entry obtained by summing its columns. The sum and difference distributions are p_{x+y}(k) = \sum_{i+j=k} p(i, j), k = 2, 3, …, 2N_g, and p_{x-y}(k) = \sum_{|i-j|=k} p(i, j), k = 0, 1, …, N_g - 1.
Ng is the number of distinct gray levels in the quantized image.
A3.1.2.2 Texture Measures

A3.1.2.2.1 Angular Second Moment

The angular second moment is a measure of the homogeneity of the image:

f_1 = \sum_i \sum_j [p(i, j)]^2   (3.13)
A3.1.2.2.2 Contrast

The contrast is a measure of the amount of local variations present in the image:

f_2 = \sum_{n=0}^{N_g - 1} n^2 \left\{ \sum_{|i-j|=n} p(i, j) \right\}   (3.14)
A3.1.2.2.3 Correlation

Correlation is a measure of gray-tone linear dependencies:

f_3 = \frac{\sum_i \sum_j (ij)\, p(i, j) - \mu_x \mu_y}{\sigma_x \sigma_y}   (3.15)

where µx, µy and σx, σy are the mean and standard deviation values of px and py.

A3.1.2.2.4 Sum of Squares: Variance

f_4 = \sum_i \sum_j (i - \mu)^2 \, p(i, j)   (3.16)

A3.1.2.2.5 Inverse Difference Moment

f_5 = \sum_i \sum_j \frac{1}{1 + (i - j)^2} \, p(i, j)   (3.17)

A3.1.2.2.6 Sum Average

f_6 = \sum_{i=2}^{2N_g} i \, p_{x+y}(i)   (3.18)

A3.1.2.2.7 Sum Variance

f_7 = \sum_{i=2}^{2N_g} (i - f_8)^2 \, p_{x+y}(i)   (3.19)

A3.1.2.2.8 Sum Entropy

f_8 = -\sum_{i=2}^{2N_g} p_{x+y}(i) \log\{p_{x+y}(i)\}   (3.20)
A3.1.2.2.9 Entropy

f_9 = -\sum_i \sum_j p(i, j) \log\{p(i, j)\}   (3.21)

A3.1.2.2.10 Difference Variance

f_{10} = \text{variance of } p_{x-y}   (3.22)

A3.1.2.2.11 Difference Entropy

f_{11} = -\sum_{i=0}^{N_g - 1} p_{x-y}(i) \log\{p_{x-y}(i)\}   (3.23)

A3.1.2.2.12/13 Information Measures of Correlation

f_{12} = \frac{HXY - HXY1}{\max\{HX, HY\}}   (3.24)

f_{13} = (1 - \exp[-2.0(HXY2 - HXY)])^{1/2}   (3.25)

HXY = -\sum_i \sum_j p(i, j) \log\{p(i, j)\}   (3.26)

where HX and HY are entropies of px and py, and

HXY1 = -\sum_i \sum_j p(i, j) \log\{p_x(i) p_y(j)\}   (3.27)

HXY2 = -\sum_i \sum_j p_x(i) p_y(j) \log\{p_x(i) p_y(j)\}   (3.28)
A3.1.2.3 Extracted SGLDM Features

For a chosen distance d (in this work d = 1 was used), we have four angular gray-level-dependence matrices, i.e., we obtain four values for each of the above 13 texture measures. The mean and the range of the four values for each of the 13 texture measures compose a set of 26 texture features that can be used for classification. Some of the 26 features are strongly correlated with each other, and a feature-selection procedure can be applied to select a subset or linear combinations of them. In this work, the mean values and the range of values were computed for each feature for d = 1, and they were used as two different feature sets.
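A minimal sketch of the SGLDM computation for d = 1 and θ = 0°, together with two of the Haralick measures (angular second moment and contrast), assuming the gray levels are already quantized to a small range:

```python
# Co-occurrence counts P(i, j; d, 0 degrees) and two Haralick measures.
# Pairs are counted in both directions, giving a symmetric matrix.

def cooccurrence_0deg(image, levels, d=1):
    P = [[0] * levels for _ in range(levels)]
    for row in image:
        for x in range(len(row) - d):
            i, j = row[x], row[x + d]
            P[i][j] += 1
            P[j][i] += 1
    return P

def asm_and_contrast(P):
    R = sum(sum(r) for r in P)                  # normalizing constant
    p = [[v / R for v in r] for r in P]
    asm = sum(v * v for r in p for v in r)                      # Eq. 3.13
    con = sum((i - j) ** 2 * p[i][j]                            # Eq. 3.14
              for i in range(len(p)) for j in range(len(p)))
    return asm, con

img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
P = cooccurrence_0deg(img, levels=4)
asm, con = asm_and_contrast(P)
print(round(asm, 4), round(con, 4))
```

A full implementation would repeat this for the 45°, 90°, and 135° offsets and take the mean and range of each measure over the four matrices, as described above.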
A3.1.3 GRAY-LEVEL-DIFFERENCE STATISTICS (GLDS)

The gray-level-difference-statistics algorithm [31] uses first-order statistics of local property values based on absolute differences between pairs of gray levels or of average gray levels to extract texture measures. Let I(x, y) be the image intensity function, and for any given displacement δ ≡ (∆x, ∆y), let Iδ(x, y) = |I(x, y) − I(x + ∆x, y + ∆y)|. Let pδ be the probability density of Iδ(x, y). If there are m gray levels, this has the form of an m-dimensional vector whose ith component is the probability that Iδ(x, y) will have value i. The probability density pδ can be easily computed by counting the number of times each value of Iδ(x, y) occurs, where ∆x and ∆y are integers. In a coarse texture, if δ is small, Iδ(x, y) will be small, i.e., the values of pδ should be concentrated near i = 0. Conversely, in a fine texture, the values of pδ should be more spread out. Thus, a good way to analyze texture coarseness would be to compute, for various magnitudes of δ, some measure of the spread of values in pδ away from the origin. Four such measures are as follows.

A3.1.3.1 Contrast

Contrast is the second moment of pδ, i.e., its moment of inertia about the origin:

f_{CON} = \sum_i i^2 p_\delta(i)   (3.29)

A3.1.3.2 Angular Second Moment

ASM is defined as

f_{ASM} = \sum_i [p_\delta(i)]^2   (3.30)

ASM is small when the pδ(i) values are very close and large when some values are high and others low.

A3.1.3.3 Entropy

Entropy is defined as

f_{ENT} = -\sum_i p_\delta(i) \log\{p_\delta(i)\}   (3.31)

This is largest for equal pδ(i) values and small when they are very unequal.

A3.1.3.4 Mean

Mean is defined as

f_{MEAN} = \frac{1}{m} \sum_i i \, p_\delta(i)   (3.32)
Texture and morphological analysis
125
This is small when the pδ(i) are concentrated near the origin and large when they are far from the origin. The above features were calculated for δ=(0, 1), (1, 1), (1, 0), (1, −1), and their mean values were taken. A3.1.4 NEIGHBORHOOD GRAY-TONE DIFFERENCE MATRIX (NGTDM) Amadasun and King [28] proposed the neighborhood gray-tone-difference matrix to extract textural features that correspond to visual properties of texture. Let f(k, l) be the gray tone of a pixel at (k, l) having gray-tone value i. Then we can find the average gray tone over a neighborhood centered at, but excluding, (k, l): Āi = [1/(W−1)] Σm Σn f(k+m, l+n), m, n = −d, …, d (3.33)
where (m, n)≠(0, 0), d specifies the neighborhood size, and W=(2d+1)². The neighborhood size d=1 was used in this work. Then the ith entry in the NGTDM is s(i) = Σ|i−Āi| over {Ni} if Ni≠0, and s(i)=0 otherwise (3.34)
where {Ni} is the set of all pixels having gray tone i. The textural features are defined in the following subsections. A3.1.4.1 Coarseness Coarseness is defined as fcos = [ε + Σi pi s(i)]⁻¹, i = 0, 1, …, Gh (3.35)
where Gh is the highest gray-tone value present in the image, and ε is a small number to prevent fcos from becoming infinite. For an N×N image, pi is the probability of occurrence of gray-tone value i and is given by pi = Ni/n² (3.36) where n=N−2d. Amadasun and King [28] define an image as coarse when the primitives composing the texture are large and the texture tends to possess a high degree of local uniformity in intensity over fairly large areas. Large values of fcos represent areas where gray-tone differences are small.
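Equations 3.33 to 3.36 translate almost directly into code. The sketch below is a minimal illustration, assuming a square image whose gray tones are small nonnegative integers; the function name is invented.

```python
import numpy as np

def ngtdm_coarseness(img, d=1, eps=1e-6):
    """NGTDM coarseness sketch (Equations 3.33 to 3.36) for a square
    integer-valued image; border pixels are excluded, as in the text."""
    N = img.shape[0]
    n = N - 2 * d                       # n = N - 2d interior pixels per side
    W = (2 * d + 1) ** 2
    levels = int(img.max()) + 1
    s = np.zeros(levels)                # s(i): ith NGTDM entry (Equation 3.34)
    Ni = np.zeros(levels)               # number of interior pixels with gray tone i
    for k in range(d, N - d):
        for l in range(d, N - d):
            i = img[k, l]
            nbhd = img[k - d:k + d + 1, l - d:l + d + 1]
            A = (nbhd.sum() - i) / (W - 1)   # average gray tone, center excluded (3.33)
            s[i] += abs(i - A)
            Ni[i] += 1
    p = Ni / n ** 2                     # p_i = N_i / n^2 (Equation 3.36)
    return 1.0 / (eps + np.sum(p * s))  # f_cos (Equation 3.35)
```

A perfectly uniform image has s(i)=0 for every gray tone, so the coarseness approaches the large value 1/ε, matching the interpretation above.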
A3.1.4.2 Contrast Contrast is defined as fcon = [1/(Ng(Ng−1)) Σi Σj pi pj (i−j)²][(1/n²) Σi s(i)] (3.37)
where Ng is the total number of different gray levels present in the image. High contrast means that the intensity difference between neighboring regions is large. A3.1.4.3 Busyness Busyness is defined as fbus = [Σi pi s(i)] / [Σi Σj |i pi − j pj|], pi≠0, pj≠0 (3.38)
A busy texture is one in which there are rapid changes of intensity from one pixel to its neighbor. A3.1.4.4 Complexity Complexity is defined as fcom = Σi Σj {|i−j| / [n²(pi+pj)]}[pi s(i) + pj s(j)], pi≠0, pj≠0 (3.39)
A texture is considered complex when the information content is high, i.e., when there are many primitives in the texture, and more so when the primitives have different average intensities. A3.1.4.5 Strength Strength is defined as fstr = [Σi Σj (pi+pj)(i−j)²] / [ε + Σi s(i)], pi≠0, pj≠0 (3.40)
A texture is generally referred to as strong when the primitives that compose it are easily definable and clearly visible.
A3.1.5 STATISTICAL-FEATURE MATRIX (SFM) The statistical-feature matrix [32] measures the statistical properties of pixel pairs at several distances within an image that are used for statistical analysis. Let I(x, y) be the intensity at point (x, y), and let δ=(∆x, ∆y) represent the intersample-spacing distance vector, where ∆x and ∆y are integers. The δ contrast, δ covariance, and δ dissimilarity are defined as CON(δ)≡E{[I(x, y)−I(x+∆x, y+∆y)]²} (3.41) COV(δ)≡E{[I(x, y)−η][I(x+∆x, y+∆y)−η]} (3.42) DSS(δ)≡E{|I(x, y)−I(x+∆x, y+∆y)|} (3.43) where E{} denotes the expectation operation, and η is the average gray level of the image. A statistical-feature matrix (SFM), Msf, is an (Lr+1)×(2Lc+1) matrix whose (i, j) element is the δ statistical feature of the image, where δ=(j−Lc, i) is an intersample-spacing distance vector for i=0, 1,…, Lr, j=0, 1,…, 2Lc, and where Lr, Lc are the constants that determine the maximum intersample-spacing distance. In a similar way, the contrast matrix (Mcon), covariance matrix (Mcov), and dissimilarity matrix (Mdss) can be defined as the matrices whose (i, j) elements are the δ contrast, δ covariance, and δ dissimilarity, respectively. Based on the SFM, the following texture features can be computed: coarseness, contrast, periodicity, and roughness. A3.1.5.1 Coarseness Coarseness is defined as (3.44) where c is a normalizing factor, Nr is the set of displacement vectors defined as Nr={(i, j): |i|, |j|≤r}, and n is the number of elements in the set. A pattern is coarser than another when the two differ only in scale, with the magnified one being the coarser and having a larger FCRS value. The definition of coarseness given here is different from the definition given for NGTDM in Equation 3.35. A3.1.5.2 Contrast Contrast is defined as (3.45)
Contrast measures the degree of sharpness of the edges in an image. A3.1.5.3 Periodicity Periodicity is defined as (3.46)
where M̄dss is the mean of all elements in Mdss, and Mdss(valley) is the deepest valley in the matrix. Periodicity measures the appearance of periodically repeated patterns in the image. A3.1.5.4 Roughness Roughness is defined as (3.47) where Df is the fractal dimension in the horizontal and vertical directions. Df=3−H, and E{|∆I|}=k(δ)^H, where H can be estimated from the dissimilarity matrix because the (i, j+Lc) element of the matrix is E{|∆I|}, with δ=(j, i). The larger the Df, the rougher is the image. In this study, an intersample-spacing distance vector δ=(4, 4) was used. A3.1.6 LAWS'S TEXTURE ENERGY MEASURES (TEM) Laws's texture energy measures [33, 34] are derived from three simple vectors of length 3: L3=(1, 2, 1), E3=(−1, 0, 1), and S3=(−1, 2, −1). These three vectors represent, respectively, the one-dimensional operations of center-weighted local averaging, symmetric first differencing for edge detection, and second differencing for spot detection. If these vectors are convolved with the averaging vector L3, we obtain new vectors of length 5: L5=(1, 4, 6, 4, 1), E5=(−1, −2, 0, 2, 1), and S5=(−1, 0, 2, 0, −1). By further convolution with L3, we obtain new vectors of length 7: L7=(1, 6, 15, 20, 15, 6, 1), E7=(−1, −4, −5, 0, 5, 4, 1), and S7=(−1, −2, 1, 4, 1, −2, −1), where L7 again performs local averaging, E7 acts as edge detector, and S7 acts as spot detector. If we multiply the column vectors of length l by row vectors of the same length, we obtain Laws's l×l masks. In this work, the following combinations were used to obtain 7×7 masks: LL=L7t L7 LE=L7t E7 LS=L7t S7 EL=E7t L7 EE=E7t E7 ES=E7t S7 SL=S7t L7
SE=S7t E7 SS=S7t S7 In order to extract texture features from an image, these masks are convolved with the image, and statistics (e.g., energy) of the resulting image are used to describe texture. The following texture features were extracted: LL: texture energy from LL kernel; EE: texture energy from EE kernel; SS: texture energy from SS kernel; LE: average texture energy from LE and EL kernels; ES: average texture energy from ES and SE kernels;
LS: average texture energy from LS and SL kernels. The averaging of matched pairs of energy measures gives rotational invariance. A3.1.7 FRACTAL DIMENSION TEXTURE ANALYSIS (FDTA) Mandelbrot [35] developed the fractional Brownian motion model to describe the roughness of natural surfaces. It considers naturally occurring surfaces as the end result of random walks. Such random walks are basic physical processes in our universe [34]. An important parameter for characterizing such a surface is the fractal dimension Df, estimated theoretically by Equation 3.48 [34] E(|∆I|) = c(∆r)^(3−Df) (3.48) where E( ) denotes the expectation operator, ∆I is the intensity difference between two pixels, c is a constant, and ∆r is the distance between two pixels. A simpler method is to estimate the H parameter (Hurst coefficient) from Equation 3.49 E(|∆I|)=k(∆r)^H (3.49) where k is the value of E(|∆I|) at ∆r=1. By applying the log function we obtain log E(|∆I|)=log k+H log(∆r) (3.50) From Equation 3.50, the H parameter can be estimated, and the fractal dimension Df can be computed from the relationship Df=3−H (3.51)
A smooth surface is described by a small value of the fractal dimension Df (large value of the parameter H), and the reverse applies for a rough surface. Given an M×M image, the intensity difference vector is defined as IDV ≡[id(1), id(2),…, id(s)] (3.52) where s is the maximum possible scale, and id(k) is the average of the absolute intensity difference of all pixel pairs with vertical or horizontal distance k. The value of the parameter H can be obtained by using least-squares linear regression to estimate the slope of the curve of id(k) vs. k in log−log scales. If the image is seen under different resolutions, then the multiresolution fractal (MF) feature vector is defined as (3.53) where M=2^m is the size of the original image, H(k) is the H parameter estimated from image I(k), and n is the number of resolutions chosen. The multiresolution fractal (MF) feature vector also describes the lacunarity of the image. It can be used for the separation of textures with the same fractal dimension Df by considering all but the first components of the MF vectors. In this work, H was computed for four different resolutions. A3.1.8 FOURIER POWER SPECTRUM (FPS) The discrete Fourier transform [31, 34] of an N×N picture is defined by F(u, v) = Σx Σy I(x, y) exp[−2πj(ux+vy)/N], x, y = 0, 1, …, N−1 (3.54)
where 0≤u, v≤N−1. The sample Fourier power spectrum is defined by Φ(u, v)≡F(u, v) F*(u, v)=|F(u, v)|² (3.55) where Φ is the sample power spectrum, and * denotes the complex conjugate. Coarse texture will have high values of |F|² concentrated near the origin, whereas in fine texture, the values will be more spread out. The standard set of texture features used are ring- and wedge-shaped samples of the discrete FPS. A3.1.8.1 Radial Sum The radial sum is defined as the sum of |F(u, v)|² over the ring r1² ≤ u²+v² < r2² (3.56)
for various values of the inner and outer radii r1 and r2. A3.1.8.2 Angular Sum The angular sum is defined as the sum of |F(u, v)|² over the wedge θ1 ≤ tan⁻¹(v/u) < θ2 (3.57) for various angles θ1 and θ2. A3.1.9 SHAPE PARAMETERS The following shape parameters were derived: X-coord. max. length: the length of the X-coordinate of the rectangular window where the plaque segment is enclosed; Y-coord. max. length: the length of the Y-coordinate of the rectangular window where the plaque segment is enclosed; Area: the number of pixels of the plaque segment; Perimeter: the number of pixels that define the outline of the plaque segment; Perimeter²/Area: parameter calculated to characterize areas with irregular outline. A3.1.10 MORPHOLOGICAL FEATURES Morphological image processing makes it possible to detect the presence of specified patterns at different scales. We consider the detection of isotropic features that show no preference to particular directions. The simplest structural element for near-isotropic detection is the cross ‘+’ consisting of five image pixels. Thus, we considered pattern spectra based on a flat ‘+’ structural element B. Formally, the pattern spectrum is defined in terms of the discrete-size transform (DST). We define the DST using Equation 3.58 [36–38] f→(…, d−k(f; B),…, d−1(f; B), d0(f; B), d1(f; B),…, dk(f; B),…) (3.58) where (3.59)
where ∘ denotes the opening operation, and • denotes the closing operation. The gray-scale DST is a multiresolution image-decomposition scheme that decomposes an image f into residual images f∘kB−f∘(k+1)B, for k>0, and
f•|k|B−f•(|k|+1)B for k<0. The pattern spectrum of a gray-scale image f, in terms of a structuring element B, is given by (3.60)
where (3.61) We note that in the limit, as k→∞, the resulting image f•kB−f•(k+1)B converges to the zero image. Also, we note that with increasing values of k, f•kB is a subset of the original image. For k≥0, we can thus normalize the pattern spectrum by dividing by the norm of the original image ||f||. Similarly, as k→∞, ||f•kB|| converges to NM max f(x, y), where it is assumed that the image is of size N by M, and that images are extended using a constant extension by replicating boundary values. Hence, for k<0, we can normalize the pattern spectrum by dividing by NM max f(x, y)−||f||. Thus, to eliminate undesired variations, all the pattern spectra were normalized. ACKNOWLEDGMENT The material for this study was collected in the context of a European Union project (Biomed 2 Program, PL 950629) carried out in centers all over Europe and coordinated by St. Mary's Hospital, London, U.K. The aim of the project was to evaluate the value of noninvasive investigations in the identification of individuals with asymptomatic carotid stenosis at risk of stroke (ACSRS). The texture and morphological analysis carried out in this work was partly funded through the project Integrated System for the Support of the Diagnosis for the Risk of Stroke (IASIS) (supported by the 5th Annual Program for the Financing of Research and by the Research Promotion Foundation of Cyprus) as well as through the project Integrated System for the Evaluation of Ultrasound Imaging of the Carotid Artery (TALOS) (supported by the Program for Research and Technological Development 2003–2005 and by the Research Promotion Foundation of Cyprus). Partial funding was also obtained from the Research Committee of the University of Cyprus for the research activity support of C.S. Pattichis for the years 2003 and 2004.
REFERENCES 1. Nicolaides, A., Asymptomatic carotid stenosis and the risk of stroke (the ACSRS study): identification of a high-risk group, chap. 38 in Cerebrovascular Ischaemia Investigation and Management, Med-Orion Publishing, London, 1996, pp. 435–441. 2. Geroulakos, G., Domjan, J., Nicolaides, A., Stevens, J., Labropoulos, N., Ramaswami, G., Belcaro, G., and Belcaro, G., Ultrasonic carotid artery plaque structure and the risk of cerebral infarction on computed tomography, J. Vasc. Surg., 20, 263–266, 1994. 3. Salonen, J.T. and Salonen, R., Ultrasound B-mode imaging in observational studies of atherosclerotic progression, Suppl. II Circ., 87, 56–65, 1993. 4. El-Barghouty, N., Geroulakos, G., Nicolaides, A., Androulakis, A., and Bahal, V., Computer-assisted carotid plaque characterisation, Eur. J. Vasc. Endovasc. Surg., 9, 548–557, 1995. 5. Polak, J., Shemanski, L., O'Leary, D., Lefkowitz, D., Price, T., Savage, P., Brand, W., and Reid, C., Hypoechoic plaque at US of the carotid artery: an independent risk factor for incident stroke in adults aged 65 years or older, Radiology, 208, 649–654, 1998. 6. AbuRahma, A., Wulu, J., and Crotty, B., Carotid plaque ultrasonic heterogeneity and severity of stenosis, Stroke, 33, 1772–1776, 2002. 7. Sonka, M. and Fitzpatrick, J.M., Handbook of Medical Imaging, Vol. 2, SPIE Press, Bellingham, WA, 2000, pp. 809–914. 8. Gutstein, D.E. and Fuster, V., Pathophysiology and clinical significance of atherosclerotic plaque rupture, Cardiovasc. Res., 41, 323–333, 1999. 9. Zukowski, A.J., Nicolaides, A.N., Lewis, R.T., Mansfield, A.O., Williams, M.A., Helmis, E., Malouf, G.M., Thomas, D., Al-Kutoubi, A., Kyprianou, P. et al., The correlation between carotid plaque ulceration and cerebral infarction seen on CT scan, J. Vasc. Surg., 1, 782–786, 1984. 10. Libby, P., Molecular basis of acute coronary syndromes, Circulation, 91, 2844–2850, 1995. 11.
Clinton, S., Underwood, R., Hayes, L., Sherman, M.L., Kufe, D.W., and Libby, V., Macrophage-colony stimulating factor gene expression in vascular cells and human atherosclerosis, Am. J. Pathol., 140, 301–316, 1992. 12. Davies, M.J., Richardson, P.D., Woolf, N., Katz, D.R., and Mann, J., Risk of thrombosis in human atherosclerotic plaques: role of extracellular lipid, macrophage and smooth muscle cell content, Br. Heart J., 69, 377–381, 1993. 13. Nicolaides, A.N., Kakkos, S., Griffin, M., Geroulakos, G., and Bashardi, E., Ultrasound plaque characterisation, genetic markers and risks, Pathophysiol. Haemost. Thromb., 32 (suppl. 1), 1–4, 2002. 14. Sabetai, M.M., Tegos, T.J., Nicolaides, A.N., El-Atrozy, T.S., Dhanjil, S., Griffin, M., Belgaro, G., and Geroulakos, G., Hemispheric symptoms and carotid plaque echomorphology, J. Vasc. Surg., 31, 39–49, 2000. 15. Tegos, T.J., Sabetai, M.M., Nicolaides, A.N., El-Atrozy, T.S., Dhanjil, S., and Stevens, J.M., Patterns of brain computed tomography infarction and carotid plaque echogenicity, J. Vasc. Surg., 33, 334–339, 2001. 16. Belgaro, G., Nicolaides, A.N., Laurora, G., Cesarone, M.R., De Sanctis, M., Incandela, L., and Barsoti, A., Ultrasound morphology classification of the arterial wall and cardiovascular events in a 6-year follow-up study, Atheroscler. Thromb. Vasc. Biol., 16, 851–856, 1996. 17. Reilly, L.M., Lusby, R.J., Hughes, L., Ferell, L.D., Stoney, R.J., and Ehrenfeld, W.K., Carotid plaque histology using real-time ultrasonography: clinical and therapeutic implications, Am. J. Surg., 146, 188–193, 1983. 18. Johnson, J.M., Kennelly, M.M., Decesare, D., Morgan, S., and Sparrow, A., Natural history of asymptomatic carotid plaque, Arch. Surg., 120, 1010–1012, 1985.
19. Gray-Weale, A.C., Graham, J.C., Burnett, J.R., Burne, K., and Lusby, R.J., Carotid artery atheroma: comparison of preoperative B-mode ultrasound appearance with carotid endarterectomy specimen pathology, J. Cardiovasc. Surg., 29, 676–681, 1988. 20. Widder, B., Paulat, K., Hachspacher, J., Hamann, H., Hutschenreiter, S., Kreutzer, C., Ott, F., and Vollmar, J., Morphological characterisation of carotid artery stenosis by ultrasound duplex scanning, Ultrasound Med. Biol., 16, 349–354, 1990. 21. DeBray, J.M., Baud, J.M., and Dauzat, M., For the consensus conference: consensus on the morphology of carotid plaques, Cerebrovasc. Dis., 1, 289–296, 1997. 22. Iannuzzi, A., Wilcosky, T., Mercury, M., Rubba, P., Bryan, F., and Bond, G., Ultrasonographic correlates of carotid atherosclerosis in transient ischemic attack and stroke, Stroke, 26, 614–619, 1995. 23. Wilhjelm, J.E., Gronholdt, L.M., Wiebe, B., Jespersen, S.K., Hansen, L.K., and Sillesen, H., Quantitative analysis of ultrasound B-mode images of carotid atherosclerotic plaque: correlation with visual classification and histological examination, IEEE Trans. Medical Imaging, 17, 910–922, 1998. 24. Elatrozy, T., Nicolaides, A., Tegos, T., and Griffin, M., The objective characterisation of ultrasonic carotid plaque features, Eur. J. Vasc. Endovasc. Surg., 16, 223–230, 1998. 25. Tegos, T., Kalodiki, E., Nicolaides, A., Sabetai, M., Dhanjil, S., El-Atrozy, T., Ramaswami, G., Daskalopoulos, M., Robless, P., Pare, G., Byrd, S., and Kalomiris, K., Correlation of microemboli detected in the middle cerebral artery on transcranial Doppler with the echomorphology of the carotid atherosclerotic plaque, in Proc. VIII Mediterranean Conference on Medical and Biological Engineering and Computing, MEDICON '98, Lemesos, Cyprus, 1998. 26.
Asvestas, P., Golemati, S., Matsopoulos, G., Nikita, K., and Nicolaides, A., Fractal dimension estimation of carotid atherosclerotic plaques from B-mode ultrasound: a pilot study, Ultrasound Med. Biol., 28, 1129–1136, 2002. 27. Elatrozy, T., Nicolaides, A., Tegos, T., Zarka, A., Griffin, M., and Sabetai, M., The effect of B-mode ultrasonic image standardisation on the echogenicity of symptomatic and asymptomatic carotid bifurcation plaques, Int. Angiol., 17, 179–186, 1998. 28. Amadasun, M. and King, R., Textural features corresponding to textural properties, IEEE Transactions on Systems, Man, and Cybernetics, 19, 1264–1274, 1989. 29. Press, W.H., Flannery, B.P., Teukolsky, S.A., and Vetterling, W.T., Numerical Recipes: the Art of Scientific Computing, Cambridge University Press, Cambridge, U.K., 1987. 30. Haralick, R.M., Shanmugam, K., and Dinstein, I., Texture features for image classification, IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-3, pp. 610–621, Nov. 1973. 31. Weszka, J.S., Dyer, C.R., and Rosenfeld, A., A comparative study of texture measures for terrain classification, IEEE Transactions on Systems, Man, and Cybernetics, 6, 269–285, 1976. 32. Wu, C.-M. and Chen, Y.-C., Statistical feature matrix for texture analysis, CVGIP: Graphical Models Image Process., 54, 407–419, 1992. 33. Laws, K.I., Rapid texture identification, SPIE, 238, 376–380, 1980. 34. Wu, C.-M., Chen, Y.-C., and Hsieh, K.-S., Texture features for classification of ultrasonic liver images, IEEE Trans. Medical Imaging, 11, 141–152, 1992. 35. Mandelbrot, B.B., The Fractal Geometry of Nature, Freeman, San Francisco, 1982. 36. Dougherty, E.R., An Introduction to Morphological Image Processing, SPIE Optical Engineering Press, Bellingham, WA, 1992. 37. Dougherty, E.R. and Astola, J., An Introduction to Nonlinear Image Processing, SPIE Optical Engineering Press, Bellingham, WA, 1994. 38. Maragos, P., Pattern spectrum and multiscale shape representation, IEEE Trans.
Pattern Anal. Machine Intelligence, 11, 701–715, 1989. 39. Christodoulou, C.I., Pattichis, C.S., Pantziaris, M., and Nicolaides, A., Texture-based classification of atherosclerotic carotid plaques, IEEE Trans. Medical Imaging, 22, 902–912, 2003.
40. Haykin, S., Neural Networks: a Comprehensive Foundation, Macmillan College Publishing, New York, 1994. 41. Kohonen, T., The self-organizing map, Proc. IEEE, 78, 1464–1480, 1990. 42. Kittler, J., Hatef, M., Duin, R., and Matas, J., On combining classifiers, IEEE Trans. Pattern Anal. Machine Intelligence, 20, 226–239, 1998. 43. Perrone, M.P., Averaging/modular techniques for neural networks, in The Handbook of Brain Theory and Neural Networks, Arbib, M.A., Ed., MIT Press, Cambridge, MA, 1995, pp. 126–129. 44. Christodoulou, C.I., Pattichis, C.S., Pantziaris, M., Tegos, T., Nicolaides, A., Elatrozy, T., Sabetai, M., and Dhanjil, S., Multifeature texture analysis for the classification of carotid plaques, in Proc. Int. Joint Conf. Neural Networks, IJCNN '99, Washington, DC, 1999. 45. Christodoulou, C.I., Kyriacou, E., Pattichis, M.S., Pattichis, C.S., and Nicolaides, A., A comparative study of morphological and other texture features for the characterization of atherosclerotic carotid plaques, Computer Analysis of Images and Patterns, 10th International Conference CAIP'2003, Springer-Verlag, Groningen, Netherlands, 2003, pp. 503–511. 46. Kovalev, V. and Petrou, M., Texture analysis in three dimensions as a cue to medical diagnosis, in Handbook of Medical Imaging: Processing and Analysis, Bankman, I.N., Ed., Academic Press, New York, 2000, pp. 231–247. 47. Pattichis, C.S., Kyriacou, E., Christodoulou, C.I., Pattichis, M.S., Loizou, C., Pantziaris, M., and Nicolaides, A., Cardiovascular: ultrasonic imaging in vascular cases, in Wiley Encyclopaedia of Biomedical Engineering, Wiley, New York, to be published in 2005.
4 Biomedical-Image Classification Methods and Techniques Virginie F. Ruiz and Slawomir J. Nasuto 4.1 INTRODUCTION Biomedical-image analysis has been an attractive area for application of advanced classification techniques. Sonka and Fitzpatrick [1] provide a thorough review of classification methods ranging from computer vision through statistical approaches to machine learning. Most general classification methods have been applied in various contexts such as object recognition, registration, segmentation, and feature extraction, resulting in further advances to answer the demands and requirements of the clinical application domain. In this chapter, we first review some of the most general classification methods that have found application in biomedical-image processing and analysis over the past few decades. These include unsupervised clustering algorithms and basic supervised classifiers such as the Bayesian classifier, k-nearest neighbor, and neural networks. The last decade has seen major developments in machine-learning-based classification. In this chapter we review some of these advances that may prove useful for biomedical-image analysis, including the support vector machine, kernel principal component analysis, independent component analysis, bagging and boosting techniques in ensembles of classifiers, and the particle filter in response to nonlinear, non-Gaussian inference problems. We present the essential fundamental principles of these techniques and, whenever possible, illustrate their application to biomedical images with examples of application found in the literature. 4.2 GENERAL IMAGE CLASSIFICATION We usually categorize classification algorithms as unsupervised or supervised. Unsupervised classification has its foundation in pattern recognition [2–4]. Based on clustering analysis, unsupervised classification extracts the structure of the data from the data itself.
In contrast, supervised classification uses a priori knowledge or information to determine the data structure. This section briefly reviews the most common classification algorithms used in recent years. For an exhaustive survey of the numerous techniques developed in various application domains and their associated descriptions, the reader can refer to the literature [1–11].
4.2.1 UNSUPERVISED CLUSTERING ALGORITHMS Unsupervised classification based on clustering algorithms seeks to partition n objects into k groups. The partitioning is often nontrivial and relies on the optimization of a criterion function designed for the specific problem. Most popular approaches define an initial partition from the n available objects. From the initial partition, an initial estimate of the k groups' means can be evaluated. The algorithm most commonly applied in this context is the k-means algorithm. 4.2.1.1 k-Means Clustering The k-means algorithm partitions the data into k clusters. A popular criterion function associated with the k-means algorithm is the sum of squared error. Let k be the number of clusters and n the number of data in the sample x1,…, xn. We define the cluster centroid mi as mi = Σj ωij xj / Σj ωij, j = 1, …, n (4.1)
with the membership function ωij indicating whether the data point xj belongs to a cluster ωi. The membership values vary according to the type of k-means algorithm. The standard k-means uses an all-or-nothing procedure, that is, ωij=1 if the data sample xj belongs to cluster ωi, else ωij=0. The membership function must also satisfy the following constraints: 0 ≤ ωij ≤ 1 and Σi ωij = 1 for all j, i = 1, …, k (4.2)
The k-means algorithm uses a criterion function based on the measure of similarity or distance. For example, using the Euclidean distance, which favors hyperspherical clusters, a criterion function to minimize is defined by J = Σi Σj ωij ||xj − mi||² (4.3)
which, considering the all-or-nothing membership function, simplifies to
Biomedical-Image Classification Methods and Techniques
139
J = Σi Σxj∈ωi ||xj − mi||² (4.4)
Other measures of similarity or distance will favor other cluster geometries. The reader may refer to Dawant and Zidenbos [2] for a summary and Webb [3] for details. The k-means clustering algorithm is an iterative algorithm that minimizes the criterion function J. A brief outline of the procedure is shown in Figure 4.1. The initial choice of cluster and measure of similarity or distance affects the way in which the algorithm behaves. This type of algorithm tends to converge to local minima close to the cluster centroids set initially. This reinforces the importance of
FIGURE 4.1 K-means outline procedure. initial consideration with regard to the choice of cluster, keeping in mind that such an algorithm does not converge to the global minimum. The k-means clustering algorithms have been applied in medical imaging for segmentation/classification in various areas [12–17]. However, their applications are limited when compared with the performance achieved with other more advanced classification techniques and methods. Some references to these can be found in subsequent sections. Other specific applications of k-means clustering algorithms include segmentation of chest cavity in CT/MR images [12] and another example of tissue classification in the brain following noise reduction in the restoration process [13]. Sing et al. [14] also apply the k-means algorithm in the segmentation context to help in the analysis of brain functional MRI. The reader with interest in SPET imaging and
Alzheimer's disease can refer to Pagani et al. [15], where the k-means is used to classify image sets into categories and extract the initial feature vector. It is also worth mentioning the application in cell image analysis [16], where the authors proposed a region-growing segmentation of textured cells that integrates a k-means clustering. In the following subsections, we outline two important variants of the k-means clustering that have been applied in the context of medical-image analysis, namely the ISODATA and the fuzzy c-means clustering algorithms. 4.2.1.2 ISODATA The previously discussed k-means clustering algorithms work with a fixed preset number of clusters. The ISODATA algorithm, in contrast, allows the number of clusters to vary. In this case, it is not unusual that a problem-dependent condition needs to be met for the clusters to be merged or split. As with the k-means, these conditions can be based on a measure of distance or similarity. Also, depending on the problem addressed, it is possible to define these conditions based on within-cluster and between-cluster measures of similarity. The body of literature over the past decade has seen the ISODATA used in a medical context for comparison with other more advanced classification methods. Some illustrations of these can be found in the literature [18, 19]. 4.2.1.3 Fuzzy c-Means The fuzzy c-means differs from the k-means methods in that it introduces the notion of partial membership. The membership function is no longer an all-or-nothing function but a continuous function. The membership ωij can take any real value between 0 and 1. Thus, the fuzzy c-means method allows any data sample xj to belong to all clusters with a certain degree of membership. The criterion function that needs to be minimized subject to the constraints of Equation 4.2 becomes J = Σi Σj (ωij)^α ||xj − mi||² (4.5)
where the exponent α is a scalar quantity between 1 and infinity that defines the degree of fuzziness of the membership. Subsequently, the cluster centroids are evaluated using the following expression [2] mi = Σj (ωij)^α xj / Σj (ωij)^α (4.6)
A value α=1 corresponds to the k-means presented earlier. The fuzzy c-means algorithm follows a similar iterative algorithm, as shown in Figure 4.1, and a more detailed procedure can be found in the literature [5].
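The fuzzy c-means iteration can be sketched as below. The centroid step follows Equation 4.6; the membership-update step is the standard fuzzy c-means rule, which the text does not spell out (it requires α>1); and the initialization and data are invented for illustration.

```python
import numpy as np

def fuzzy_c_means(X, k, alpha=2.0, iters=300, seed=0):
    """Fuzzy c-means sketch: random fuzzy memberships, then alternate the
    standard membership update and the centroid update of Equation 4.6."""
    rng = np.random.default_rng(seed)
    w = rng.random((k, len(X)))
    w /= w.sum(axis=0)                               # constraints of Equation 4.2
    for _ in range(iters):
        wa = w ** alpha
        m = wa @ X / wa.sum(axis=1, keepdims=True)   # centroids (Equation 4.6)
        d = np.linalg.norm(X[None, :, :] - m[:, None, :], axis=2) + 1e-12
        w = d ** (-2.0 / (alpha - 1.0))              # standard update; needs alpha > 1
        w /= w.sum(axis=0)
    return m, w

# two synthetic, well-separated blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(10, 0.1, (20, 2))])
centroids, memberships = fuzzy_c_means(X, k=2)
```

With well-separated clusters the memberships become nearly crisp, so the result approaches the k-means partition, consistent with the α→1 remark above.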
The application of the fuzzy c-means algorithms in medical-image analysis is varied [20–28]. In many cases, the method is applied in conjunction with other approaches to segmentation, classification, and feature extraction from the medical images. The most common application is in magnetic resonance imaging. In particular, in images of the brain, Boodraa et al. [20] and Leigh et al. [21] are concerned with the segmentation of multiple sclerosis lesions. Ahmed et al. [22] proposed an algorithm formulated by modifying the objective function of the standard fuzzy c-means algorithm to compensate for inhomogeneities. It should be noted that fuzzy c-means nowadays usually applies an adaptive scheme that allows the cluster number to change iteratively as the data are processed. The first to introduce such an adaptive fuzzy c-means scheme in medical imaging was Pham [23]. Other modified fuzzy c-means applications are illustrated in Zhu [24] and Yoon [25]. A few examples of application in PET and SPET image analysis can also be of interest to the reader, particularly Zaidi et al. [26] for PET and Acton et al. [27] for SPET. An example of the use of fuzzy c-means in dermatology is presented by Schmid [28]. The fuzzy-clustering approach, whether adaptive or not, tends to be used as an initial clustering analysis to provide appropriate inputs to some neural networks (NN). References to these can be found in Section 4.2.3. 4.2.2 BASIC SUPERVISED CLASSIFIERS Among the most basic supervised classifiers applied to medical-image classification are the minimum-distance classifier, the Bayesian classifier, k-nearest neighbor, Parzen window, and decision-tree classifiers. 4.2.2.1 Minimum-Distance Classifier The minimum-distance classifier is similar to a supervised version of a single iteration of the k-means algorithm and is also referred to in the literature as the minimum distance to the means.
A pattern vector x belongs to a class ωi if the Euclidean distance to the class mean vector mi is minimum, that is ||x−mi||=min {||x−mj||} (4.7) The minimum-distance classifier is a supervised classifier, as in its initial stage the user specifies the sample set from which the class mean vectors are evaluated. An example of direct application in the late 1980s can be found in the literature [29]. In practice, the minimum distance is rarely used on its own in a medical-image classification context but is used mainly for comparison purposes. It has long been recognized that its performance is easily surpassed by other classification approaches. Nevertheless, it remains an interesting tool for validation purposes. 4.2.2.2 Bayesian Classifier The Bayesian classifier belongs to the class of supervised statistical classifiers. It is often used when one can assume that the probability density functions (PDF) involved are Gaussian.
Medical image analysis method
142
Let x be a pattern vector. The conditional probability density function of x given a class ωi, i = 1, 2, …, N, is denoted p(x|ωi). The a priori probability of class ωi is denoted P(ωi). The vector x belongs to the class ωi if

p(x|ωi)P(ωi) = max_j {p(x|ωj)P(ωj)}, j = 1, 2, …, N (4.8)
The decision function is usually defined by taking the logarithm of the above expression, resulting in

Ji(x) = ln P(ωi) + ln p(x|ωi) (4.9)

Thus, the classifier assigns the feature vector x to the class ωi that maximizes the decision function (Equation 4.9). Under the assumption of Gaussian probability density functions, it only requires calculation of the mean vectors and covariance matrices. If the conditional PDF of x is Gaussian, then the decision function becomes

Ji(x) = ln P(ωi) − (1/2) ln |Σi| − (1/2)(x − mi)T Σi−1 (x − mi) (4.10)

where mi is the mean vector of class ωi, Σi is its covariance matrix, and T denotes the matrix-transpose operator. If q is the number of training samples of class ωi, then the statistical characteristics of the class are estimated by

mi = (1/q) Σk xk,   Σi = (1/q) Σk (xk − mi)(xk − mi)T (4.11)

where the sums run over the q training samples xk of class ωi.
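The Gaussian decision function of Equations 4.9 to 4.11 can be sketched as follows; the priors, means, and covariances are hypothetical. Note that with equal priors and identity covariances the rule reduces to the minimum-distance classifier of Section 4.2.2.1.

```python
import numpy as np

def gaussian_decision(x, prior, mean, cov):
    """J_i(x) = ln P(w_i) - (1/2) ln|S_i| - (1/2)(x - m_i)^T S_i^(-1) (x - m_i)."""
    d = x - mean
    return (np.log(prior)
            - 0.5 * np.log(np.linalg.det(cov))
            - 0.5 * d @ np.linalg.inv(cov) @ d)

def bayes_classify(x, priors, means, covs):
    """Assign x to the class maximizing the decision function (Equation 4.9)."""
    scores = [gaussian_decision(x, p, m, S) for p, m, S in zip(priors, means, covs)]
    return int(np.argmax(scores))

# Hypothetical two-class problem: equal priors and identity covariances,
# in which case the rule reduces to the minimum-distance classifier
priors = [0.5, 0.5]
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]
print(bayes_classify(np.array([0.5, 0.2]), priors, means, covs))  # -> 0
```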
Iterative procedures can be developed based on Bayesian learning [2]. A simple example arises when some a priori statistical information is known about the class mean vector and covariance matrix. In that case, their estimates can be refined based on the training set. The application of Bayesian classifiers in the context of medical images is varied, but they are mainly used as a comparison tool or as part of more elaborate approaches [30–33].

4.2.2.3 k-Nearest Neighbor

This supervised classification approach has in the past decade also been used in medical imaging [34–36]. As opposed to the Bayesian classifier, which considers a parametric density function, this method estimates nonparametric density functions from the available data [6–8]. Thus, the a posteriori probability P(ωi|x) can be estimated from the data samples. In this case, the k-nearest neighbor classifier assigns x the class ωi if the
Biomedical-image classification method and techniques
143
majority of the k nearest neighbors of x, measured usually in terms of Euclidean distance, are also assigned the class ωi [6]. In a similar but more complex manner, the method called the Parzen window labels x with the class ωi if the majority of samples in a volume centered about x have been labeled with the same class. Further considerations related to the choice of volumes are available in the literature [6]. Though these two nonparametric methods may seem easier to implement because they do not require a priori knowledge of the family of probability density functions to consider, one needs to remember that the performance of the classifier is strongly dependent on the number of data samples made available.
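The k-nearest-neighbor vote described above can be sketched as follows; the feature vectors and labels are hypothetical toy values.

```python
import numpy as np
from collections import Counter

def knn_classify(x, samples, labels, k=3):
    """Assign x the class held by the majority of its k nearest training samples."""
    distances = np.linalg.norm(samples - x, axis=1)   # Euclidean distances to x
    nearest = np.argsort(distances)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical labeled feature vectors for two classes
samples = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
labels = [0, 0, 0, 1, 1, 1]
print(knn_classify(np.array([0.5, 0.5]), samples, labels))  # -> 0
print(knn_classify(np.array([5.5, 5.5]), samples, labels))  # -> 1
```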
FIGURE 4.2 A simple example of a medical decision tree. (From Kamber, M. et al., IEEE Trans. Medical Imaging, 14, 442–453, 1995. With permission © [2004 IEEE].)

4.2.2.4 Decision Tree

The decision tree is yet another type of supervised classifier that has found application in medical-image analysis [37, 38]. A node in a tree represents a test on a particular feature, and each branch from that node represents a possible outcome of the test. A path in the tree, from the root to an end leaf, specifies the classification, with the end leaf representing an object class. There is a large variety of decision trees, but they usually use symbolic rules. One of the most common decision trees, which forms the basis of many other algorithms, is the Iterative Dichotomiser 3 (ID3). The ID3 algorithm uses a familiar top-down "divide and conquer" approach [39, 40]. One of its applications to medical imagery is described by Kamber et al. [37]. This work
provides a simple example of the tree structure, as seen in Figure 4.2, where the square boxes represent the features tested. In this example, each patient is to be classified as having an allergy, a cold, a flu, or as being healthy (bottom-end leaves of the tree). Using the top-down recursive approach mentioned, the algorithm starts with a single node containing all training samples and iteratively adds branches until all samples contained in each node belong to the same class. At a node where the samples cannot be assigned to a single class, the algorithm uses an entropy function [37] of the form

E = Σi wi Ei (4.12)

where the weight of branch i is defined [37] as the fraction of the node's n samples that follow that branch,

wi = ni / n (4.13)

and the entropy of branch i, with pij the proportion of its samples belonging to class j, is

Ei = −Σj pij log2 pij (4.14)
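The entropy computation of Equations 4.12 to 4.14 can be sketched as follows; the class labels are hypothetical, and ID3 would retain the candidate feature whose split yields the smallest weighted entropy.

```python
import math
from collections import Counter

def branch_entropy(labels):
    """E_i = -sum_j p_ij log2 p_ij over the classes j reaching branch i (Eq. 4.14)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_entropy(branches):
    """Weighted entropy sum_i w_i E_i of a candidate split, w_i = n_i / n (Eqs. 4.12, 4.13)."""
    n = sum(len(b) for b in branches)
    return sum((len(b) / n) * branch_entropy(b) for b in branches)

# A feature that separates the classes perfectly yields zero entropy,
# while a completely uninformative split leaves one bit of uncertainty
pure = split_entropy([["flu", "flu"], ["cold", "cold"]])
mixed = split_entropy([["flu", "cold"], ["flu", "cold"]])
print(pure, mixed)  # -> 0.0 1.0
```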
The feature that minimizes the entropy is retained as the feature test for the current node, and branches are then created for each possible outcome. In the work described by Kamber et al. [37], the authors conducted a comparative study between statistical classifiers, such as the minimum-distance and Bayesian classifiers, and the decision-tree-based classifiers. Model-based classifiers surpassed the purely data-driven type of classifiers (see Kamber et al. [37] for details). Though the basic classifiers presented in the previous sections are of interest, they have over the past decade been rapidly outperformed by the development of a large variety of applications of neural networks in image processing and analysis in general and biomedical-image classification in particular.

4.2.3 NEURAL NETWORKS

With neural networks (NN), we reach a category of sufficiently advanced algorithms that have been applied quite successfully in the field of medical-image processing and analysis and have shown potential to answer some of these clinical demands. Over the past decade, the application of neural networks to medical images has received the most attention from the scientific community. As opposed to the segmentation/classification techniques presented earlier, there is a large body of published work related to this application domain. Some early examples of applications in medical images are available in the literature [41–52]. An artificial neural network architecture is an abstraction that gives a representation of the nervous system. A neural network can be implemented as supervised or unsupervised. The most common supervised neural networks are the feed-forward
multilayer perceptron (MLP), the back propagation (BP) network, and an extension of the latter called cascade correlation. These are categorized as supervised because the neural network is said to be trained, or to learn, by example. Additional examples of their application in a medical context are presented in the literature [53–67]. On the other hand, the most common unsupervised neural network architectures used are the Kohonen and the Hopfield architectures. Specific examples of their applications can be found in the literature [68–84]. We briefly discuss these different types of architecture in the following subsections.

4.2.3.1 Supervised Neural Networks

A typical representation of the nervous system through artificial neural networks uses nodes connected to each other by weighted links. A feed-forward multilayer perceptron is specified by its number of inputs, outputs, and hidden layers. The dimension of the feature space usually specifies the number of input neurons or nodes in the input layer, while the number of object classes defines the number of
FIGURE 4.3 Feed-forward multilayer structure.

output nodes. The number of nodes in the hidden layers can vary and is most commonly determined empirically. This number depends on the specific application and the complexity of the discriminant function used in the MLP neural network. Figure 4.3 gives a representation of an MLP structure. The objective of the algorithm is to determine the values of the weights so that an input vector x that belongs to a class ωi results in a higher value for the corresponding
output. Consider an MLP with m layers as shown in Figure 4.3. The input layer (indexed zero) represents the input vector

o0 = [xT −1]T = [x1 … xn −1]T (4.15)

where the xi represent the input values, and the dummy node value −1 is a bias value that may be needed when solving the system for the weights (more detail can be found in the literature [93]). The outputs for layer l can be represented by the column vector ol = [o1l … onl l]T, where nl is the number of nodes in layer l. The connecting weights from one layer (l−1) to another layer (l) can be written as a matrix W(l) in which each row vector wi(l)T represents the linking weights associated with a node i, i = 1, …, nl, in layer l:

W(l) = [w1(l) … wnl(l)]T (4.16)

Finally, each node output value is a weighted sum of the feeding output values from the previous layer ol−1 that is passed through a nonlinear activation function. Thus,

oi(l) = f(wi(l)T ol−1) (4.17)

or, more compactly [2],

ol = f(W(l) ol−1) (4.18)

An activation function that is very often used is the sigmoidal activation function

f(v) = 1 / (1 + e−v) (4.19)

The neural network can be trained using the generalized delta rule, propagating back an error vector to adjust the weights. In this case, the neural network is usually called a back propagation network. The generalized rule uses a gradient-descent technique that modifies the values of the weights so as to minimize the square error between the output vector om and a desired output vector d that represents the truth, or known output values, of a class for a given training pattern x. Still considering the sigmoidal activation function (Equation 4.19), the output error vector can be defined as

δm+1 = d − om (4.20)

Thus, the error vector is propagated back, and subsequently the weights are adjusted in the following manner [2], with a learning rate denoted α:
ΔW(l) = α δl (ol−1)T (4.21)

with the corresponding error vector for layer l, obtained by propagating the output error (Equation 4.20) backward through the weights:

δl = f′ ⊙ (W(l+1)T δl+1) (4.22)

where f′ is the derivative of the activation function, evaluated elementwise at the activations of layer l, and ⊙ denotes the element-wise product. The convergence of the neural network is usually affected by the way in which the weights are adjusted, as well as by practical considerations related to the application domain. Further technical considerations regarding convergence can be found in the literature [85]. A modification of back propagation called cascade correlation has also been encountered in the medical context. Cascade correlation has a dynamic structure in the sense that the training process starts with a structure that has no hidden layer. If the error vector at the output nodes is not small enough, then a hidden node is created. That new node is connected to all other existing nodes, and the process is repeated until a sufficiently small error vector is achieved. Hall et al. [67] compared the performance of a feed-forward cascade correlation neural network with an unsupervised fuzzy c-means clustering while segmenting magnetic-resonance brain-section images.

4.2.3.2 Unsupervised Neural Networks

In this class of unsupervised neural networks, the two most common structures applied in medical imaging are the Kohonen and Hopfield neural networks [68–84]. Both are single-layered neural networks. It should be noted that the neurons are often arranged as a two-dimensional map (so that the network can produce a two-dimensional feature map). The Kohonen NN is a feed-forward neural network that is trained using the winner-takes-all rule. This is fundamentally an unsupervised clustering in the feature space. The self-organizing behavior allows the network to find significant features of the input data without any supervisory input. The training starts by normalizing all the weight vectors.
Then, according to the learning rule applied, the weight vector that minimizes the Euclidean distance to the input feature vector x, and thus produces the highest output value, is updated in the direction of the weight gradient, taking into account the learning rate. The process is summarized by Dawant and Zijdenbos [2], and complete technical details are described by Zurada [85]. On the other hand, the Hopfield neural network, also single layered, uses a feedback structure. As such, the Hopfield neural network behaves like a dynamic system and is often used as an optimization tool. The reader interested in a comprehensive review of the subject of neural networks in image processing and analysis can refer to Egmont-Petersen et al. [86], who reviewed more than 200 applications of neural networks in image processing. The various applications are reviewed in terms of the tasks performed by the algorithms, including preprocessing, data reduction or feature extraction, segmentation, object recognition, and image understanding or optimization. The applications are also evaluated in terms of input data type, for instance pixel, local feature, structure, object and object set, and scene. The tasks performed and the data types considered set very specific constraints on the application of neural networks.
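The forward pass and generalized delta rule of Section 4.2.3.1 (Equations 4.15 to 4.22) can be sketched for a single training pattern as follows. This is a minimal illustration only: bias nodes are omitted, and the layer sizes, learning rate, and training pattern are hypothetical.

```python
import numpy as np

def sigmoid(v):
    """Sigmoidal activation function (Equation 4.19)."""
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(3, 2))   # weights, input (2) -> hidden (3)
W2 = rng.normal(scale=0.5, size=(1, 3))   # weights, hidden (3) -> output (1)

x = np.array([0.2, 0.8])   # training pattern (hypothetical)
d = np.array([1.0])        # desired output
alpha = 0.5                # learning rate

for _ in range(500):
    o1 = sigmoid(W1 @ x)                        # hidden outputs (Equation 4.18)
    o2 = sigmoid(W2 @ o1)                       # network output
    delta2 = (d - o2) * o2 * (1 - o2)           # output error term (Equations 4.20, 4.22)
    delta1 = (W2.T @ delta2) * o1 * (1 - o1)    # error propagated back to the hidden layer
    W2 += alpha * np.outer(delta2, o1)          # generalized delta rule (Equation 4.21)
    W1 += alpha * np.outer(delta1, x)

final = sigmoid(W2 @ sigmoid(W1 @ x))[0]
print(final)   # the output approaches the target 1.0
```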
4.2.4 CONTEXTUAL CLASSIFIER

It is also worth mentioning the contextual classifier which, though not reviewed here, is yet another important category. Indeed, one of the problems often encountered in image classification is that it introduces what is called classification noise. This is the case for the techniques previously discussed, as they do not take into consideration possible spatial relationships between neighboring pixels (or voxels). One way of reducing the classification noise is to introduce some spatial information. Contextual classifiers correspond to the case where morphological filtering operations can be integrated into the classification, and they are usually formulated as an optimization procedure [87–99]. Contextual classifiers have also been applied in medical-image segmentation/classification. A relaxation-labeling-based approach [90, 91] and stochastic relaxation [92, 94] are examples of the application of such classifiers to MR and ultrasonic images.

4.3 MODERN ADVANCES IN CLASSIFICATION OF BIOMEDICAL IMAGES

Though we have seen many successful applications of general classification techniques to medical images, they all have shown their own limitations when confronted with strict validation and clinical constraints. Over the last decade, there have been exciting developments in machine-learning-based classification. We will review some major advances in this area that may prove very useful in the context of biomedical-image analysis.

4.3.1 KERNEL-BASED METHODS

There has been a lot of interest in the theory and application of kernel-based classification methods. Kernel-based methods deal with high-dimensional or nonlinear problems by utilizing the kernel trick: representing the inner product in the feature space by a kernel function. Thus, the kernel function expresses the value of an inner product of the images of the data in the feature space

K(xi, xj) = φ(xi)T φ(xj) (4.23)

where xi, xj are data points and φ(xi), φ(xj) are their corresponding feature vectors.
However, it is important to recognize that the kernel is calculated in the data space. It thus dispenses with calculations in the feature space and, moreover, with explicit knowledge of the mapping from the data space to the feature space. Thus, using kernels implicitly maps the data into a potentially high-dimensional feature space, where the problem to be solved may be simplified. The very attractive feature of the kernel trick is that it manages to avoid the computational burden associated with the increased dimensionality of the feature space. Any algorithm that can be expressed using inner products can be transformed into a kernel method. This, combined with the success of the support-vector machines (discussed in the next subsection), motivates the interest in kernel-based algorithms.
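The kernel trick of Equation 4.23 can be illustrated with the homogeneous polynomial kernel of degree two, whose feature map is known explicitly for two-dimensional inputs; the data points below are hypothetical.

```python
import numpy as np

def phi(x):
    """Explicit feature map of the degree-2 homogeneous polynomial kernel (2-D input)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def kernel(x, y):
    """K(x, y) = (x^T y)^2, computed entirely in the data space."""
    return (x @ y) ** 2

x, y = np.array([1.0, 2.0]), np.array([3.0, 0.5])
# The kernel value equals the inner product of the mapped vectors (Equation 4.23),
# yet phi is never needed to evaluate K
print(kernel(x, y), phi(x) @ phi(y))  # both equal 16 (up to rounding)
```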
4.3.1.1 Support Vector Machines

Support-vector machines (SVM) belong to the supervised classification and regression methods [95, 96]. Support-vector machines achieve generalization of regularities found in the data in a principled way. They use two basic principles in their operation: a maximal margin of separation and the kernel trick. An SVM operates by mapping a classification problem from a data space to a feature space, where the problem may be easier to solve. The kernel function implicitly accounts for this mapping, thus avoiding the necessity of performing costly computations in a usually high-dimensional feature space. A solution to a classification problem is a decision boundary that separates the classes. The SVM finds the decision boundary that allows for correct classification and is the most removed from the data. Finding such a decision boundary amounts to solving a constrained convex optimization problem. The support-vector machines subsume in a unified framework several classification methods, including multilayer perceptrons (MLPs) and radial basis functions (RBFs). Indeed, by formulating the classification as a (constrained) optimization problem, the SVM's operation does not suffer from multiple local minima, which often impede the performance of nonlinear classification methods such as MLPs. An additional benefit of such a formulation is that, in contrast to the MLPs, the SVM's architecture is constructed in a principled way, avoiding an arbitrary choice of the number of hidden units or basis functions. The SVM's operation is based on an implicit mapping of data into the feature space using the kernel trick and the subsequent construction of a linear optimal classifier in the feature space. Therefore, we begin the discussion of SVM principles with the case of a linearly separable classification problem. Assume data of the form

{xi, di}, i = 1, …, N (4.24)

where xi are the input data and di ∈ {−1, +1} are the corresponding labels.
Linear classification in the feature space means that we are seeking a linear decision boundary, resulting in a classifier of the form

y(x) = sign(wTφ(x) + b) (4.25)

where w and b are, respectively, the weights and the bias of the classifier, and φ is the (nonlinear) mapping from the data space to the (possibly high-dimensional) feature space. In the linearly separable case, the decision function on the training data should yield

wTφ(xi) + b ≥ 1, if di = +1
wTφ(xi) + b ≤ −1, if di = −1 (4.26)

or, more compactly,

di(wTφ(xi) + b) ≥ 1, i = 1, …, N (4.27)
The SVM constructs a decision boundary that correctly classifies the training data and is removed the farthest from the data points. The distance from the decision boundary to the nearest training points, ρ, is called the margin of separation. It can be shown that in the linearly separable case, the margin of separation is equal to the inverse of the length of the vector w containing the decision-boundary parameters, ρ = 1/||w||. Thus, the problem of finding an optimal decision boundary maximizing the margin of separation is formulated in SVM as a constrained convex optimization problem

min_{w,b} J(w) subject to di(wTφ(xi) + b) ≥ 1, i = 1, …, N (4.28)

where J(w) = (1/2)wTw. This can be solved by locating a saddle point

max_α min_{w,b} L(w, b, α) (4.29)

of the corresponding Lagrangian function containing the convex cost function J and a linear mixture of the constraints

L(w, b, α) = (1/2)wTw − Σi αi [di(wTφ(xi) + b) − 1] (4.30)

where the αi ≥ 0 in the above expression are called Lagrange multipliers. The primal problem is to locate the extremum of L with respect to w and b by equating the corresponding derivatives to zero:

∂L/∂w = 0 ⇒ w = Σi αi di φ(xi),   ∂L/∂b = 0 ⇒ Σi αi di = 0 (4.31)
From the Kuhn-Tucker conditions for constrained optimization, it follows that

αi [di(wTφ(xi) + b) − 1] = 0, i = 1, …, N (4.32)

and the only multipliers αi that are nonzero are those for which the constraints are met with equality. The data points for which the constraints are fulfilled in this way are called support vectors. Thus, the primal problem results in

w = Σ_{i=1}^{sv} αi di φ(xi) (4.33)
where sv is the number of support vectors. The vector of parameters w is expressed as a linear combination of the feature vectors of the training data. In order to find the Lagrange multipliers, one needs to solve the dual problem, which is the maximization of

Q(α) = Σi αi − (1/2) Σi Σj αi αj di dj φ(xi)Tφ(xj) (4.34)
subject to the constraint

Σi αi di = 0, αi ≥ 0, i = 1, …, N (4.35)

The solution to the above problem allows for finding a linear (in the feature space) decision boundary ensuring maximal separation of the classes. The Lagrangian Q is expressed using inner products of the feature vectors, which allows for an application of the kernel trick

Q(α) = Σi αi − (1/2) Σi Σj αi αj di dj K(xi, xj) (4.36)
subject to the same constraints (Equation 4.35).
Thus, the resulting classifier is of the form

y(x) = sign(Σ_{i=1}^{sv} αi di K(x, xi) + b) (4.37)
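Evaluating the dual-form classifier of Equation 4.37 requires only the support vectors, their multipliers and labels, and the bias. In the sketch below these quantities are hypothetical (as if produced by a solved dual problem), and a Gaussian RBF kernel is used.

```python
import numpy as np

def rbf_kernel(x, xi, sigma=1.0):
    """Gaussian RBF kernel."""
    return np.exp(-np.linalg.norm(x - xi) ** 2 / (2 * sigma ** 2))

def svm_decision(x, support_vectors, alphas, labels, b, kernel):
    """y(x) = sign(sum_i alpha_i d_i K(x, x_i) + b)  (Equation 4.37)."""
    s = sum(a * d * kernel(x, xi)
            for a, d, xi in zip(alphas, labels, support_vectors))
    return int(np.sign(s + b))

# Hypothetical support vectors, multipliers, and bias, for illustration only
svs = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
alphas, labels, b = [1.0, 1.0], [-1, +1], 0.0
print(svm_decision(np.array([1.8, 1.9]), svs, alphas, labels, b, rbf_kernel))  # -> 1
print(svm_decision(np.array([0.1, 0.0]), svs, alphas, labels, b, rbf_kernel))  # -> -1
```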
It is worth noting that although the linear decision boundary is constructed in the feature space, the SVM does not explicitly involve calculations in this space, nor does it require knowledge of the nonlinear mapping φ (Figure 4.4). Moreover, only the support vectors are effectively used in the construction of the classifier, so sparseness of the solution is ensured. In more difficult cases, the problem may not be linearly separable in the feature space. Nevertheless, a modification of the above classifier construction is possible by introducing slack variables, ξi, measuring the deviation of data points from the
FIGURE 4.4 A nonlinear mapping, φ, from data to feature space transforms the nonlinear classification problem into a linearly separable case.

decision boundary. Thus, the optimal linear decision boundary in this case is obtained from the following constrained-optimization problem

min_{w,b,ξ} (1/2)wTw + C Σi ξi (4.38)
subject to the constraints

di(wTφ(xi) + b) ≥ 1 − ξi, ξi ≥ 0, i = 1, …, N (4.39)
The above constraints take into account misclassification of the points, and the function to be optimized contains a term leading to minimization of this misclassification. The above problem can again be solved using the Lagrange method by formulating primal and dual problems. The main difference from the separable case discussed earlier is the regularization constant C, which controls the trade-off between complexity and the number of points that are not separated. This constant introduces a more stringent condition on the Lagrange multipliers (0 ≤ αi ≤ C), with the general form of the classifier remaining unchanged. Different SVMs can be obtained by using different kernels. Popular choices are listed in Table 4.1. Moreover, it is possible to construct new kernels from other ones. Investigation of methods of constructing kernels that reflect data properties constitutes a very active area of research. This contributes to the modularity of SVM architectures (Figure 4.5). The optimizing engine is independent of any particular
TABLE 4.1 Popular Kernels Used in the Construction of SVMs

SVM Type               Kernel
Polynomial             (xTxi + 1)p
RBF                    exp(−||x − xi||2 / (2σ2))
Two-layer perceptron   tanh(β0 xTxi + β1)
FIGURE 4.5 Architecture of a support-vector machine.

application, whereas it is possible to tune the architecture to the problem at hand by an appropriate choice of the kernel function. It is also possible to extend the construction of the SVM to deal with regression problems. The interested reader is referred to the literature [96] for details. The support-vector machine has already attracted interest from a number of scientists in the field of medical-image analysis [97–102]. We can find applications of SVM in digital mammography, ultrasound imaging, and CT image modalities. El-Naqa et al. [97] applied SVM to the detection of microcalcification clusters (see Figure 4.6) on digital mammograms, in association with a "successive enhancement learning" scheme to improve performance. The microcalcification-detection problem is formulated as a supervised learning problem. The detection uses SVM to classify each location according to whether a microcalcification cluster (MC) is present or absent. To overcome the problem of an impractically large training set for the MC-absent class, the authors introduce a successive enhancement learning (SEL) scheme. The SEL is an iterative procedure that selects the most representative examples of absent microcalcifications among the available training images while keeping the training set to a reasonable
FIGURE 4.6 (left) Mammogram in craniocaudal view, (right) Expanded view showing MCs. (From El-Naqa, I. et al., IEEE Trans. Medical Imaging, 21, 1552–1563, 2002. With permission © [2004 IEEE].)

size. El-Naqa et al. used a database of 76 mammograms containing 1120 microcalcifications. As reported by the authors, individual MCs are sufficiently well localized to use a small window centered about the location of interest. For their application, the window chosen was a 9×9-pixel window. Pixels extracted from these windows are used to define the input pattern vectors to the SVM classifier. First, the image background is removed by applying a sharp high-pass filter (for details, see El-Naqa et al. [97]). In this way, some of the intraclass variation is restricted. Thus, the input vectors defined are the filtered windowed pixel values at each location where an MC is to be detected. The procedure applied by El-Naqa et al. can be summarized as follows:

1. For each MC location in a training mammogram, extract the vector xi from the M×M window and use this vector as an input pattern vector for the MC-present class (di = +1).

2. Randomly select MC-absent locations in the training mammograms, form the input vectors from the associated windows, and use them as input patterns for the MC-absent class (di = −1).

For that particular application and for comparison purposes, the kernel functions used are the polynomial and the Gaussian RBF kernels (see Table 4.1). To specify the SVM classifier, the kernel function parameter and the regularization constant C (Equation 4.38) are optimized. El-Naqa et al. apply an m-fold cross validation to the training
mammogram set. The procedure is summarized below, although the interested reader can refer to El-Naqa et al. [97] and the references therein for further details:
FIGURE 4.7 Generalization error rate against regularization parameter C. (From El-Naqa, I. et al., IEEE Trans. Medical Imaging, 21, 1552–1563, 2002. With permission © [2004 IEEE].)

1. Randomly divide the training examples into N subsets.

2. For each parameter setting, train the SVM classifier N times with N−1 subsets, leaving a different subset out in turn; test the trained SVM against the held-out subset and record the classification errors.

3. Estimate the generalization error by averaging the classification errors.

4. Retain the SVM classifier with the smallest generalization error.

Using the above procedure, El-Naqa et al. present in Figure 4.7 the generalization error rate against the regularization parameter C. Results were obtained with SVM classifiers trained using a polynomial kernel of order two and three (shown on the left) and using a Gaussian RBF kernel with width σ = 2.5, 5, 10 (shown on the right). From these, the authors observed that the SVM classifier was not very sensitive to the parameter settings. It was also noted that the polynomial kernel and the RBF kernel achieved similar performance. As previously mentioned, the SVM attempts to minimize the error made on test data taken outside the training set. As such, the classifier focuses on the examples that are most difficult to classify, resulting in support vectors that correspond to the borderline cases. El-Naqa et al. illustrate this behavior with Figure 4.8, where we can see that the support vectors for the MC-present class (top right) could potentially be mistaken for MC-absent regions (bottom left). Conversely, the support vectors for the MC-absent class could be mistaken for MC-present image regions. El-Naqa et al. [97] also conducted an interesting comparative study. The SVM classifier was tested against four existing detection methods: the image-difference technique (IDT), a method based on the difference of Gaussians (DoG), a
wavelet decomposition-based method (WD), and finally a two-stage multilayer neural network (TMNN). These approaches will not be discussed here, but details are presented in the cited work [97]. The performance evaluation reported by El-Naqa et al. [97] uses free-response receiver operating characteristic (FROC) curves. The authors
FIGURE 4.8 Examples of 9×9 image windows (top and bottom left) and support vectors (top and bottom right). (From El-Naqa, I. et al., IEEE Trans. Medical Imaging, 21, 1552–1563, 2002. With permission © [2004 IEEE].)

found that the SVM classifier gives the best detection, and it was noted that the way in which the clusters were specified affects the ROC curves, which may render comparison with previously reported results difficult. Nevertheless, consistency in the overall tests
conducted, best illustrated in Figure 4.9 by El-Naqa et al., gives an indication of the potential usefulness of SVM applied to medical-image analysis. The best performance was obtained by a successive-learning SVM classifier, which achieves around a 94% detection rate at a cost of one false-positive (FP) cluster per image. The paper by El-Naqa et al. is not the only successful application of SVM to the medical-image classification problem. Still in the context of mammography, Chang et al. [98] applied SVM for the diagnosis of breast tumors on ultrasound images. In this application, the SVM is formulated so as to classify breast tumors as benign or malignant. The authors used data from 250 cases of pathologically proven breast tumors, of which 140 were benign and 110 malignant. Based on their results, they commented that the SVM proved helpful. The classification ability of the SVM was found to be nearly equal to that of the neural network model, while the SVM has a much shorter training time (1 vs. 189 sec). Given the increasing size and complexity of data sets, Chang et al. [98] concluded that SVM is preferable for computer-aided
FIGURE 4.9 FROC curves of the methods tested. (From El-Naqa, I. et al., IEEE Trans. Medical Imaging, 21, 1552–1563, 2002. With permission © [2004 IEEE].)
FIGURE 4.10 (a-c) Examples of polyps, (d, e) Examples of healthy tissue that have similar shapes. (From Gokturk, S.B. et al., IEEE Trans. Medical Imaging, 20, 1251–1260, 2001. With permission © [2004 IEEE].)

diagnosis. Another example of the application of SVM to ultrasound images can be found in the work of Prakash et al. [99], who conducted a study to evaluate the feasibility of analyzing the maturity of the fetal lung with ultrasound images. Gokturk et al. [100] proposed a computer-aided detection of polyps in the colon that uses a linear SVM classifier. In this application, the SVM classifier is implemented on CT colonography, a noninvasive technique still at an early stage. It combines spiral CT data acquisition of the air-filled and cleansed colon with three-dimensional imaging to create virtual endoscopic images of the colon surface. As in many medical applications, the problem of detecting colonic polyps is difficult, as there is a large variety of polyp types, which usually differ in size and shape. Figure 4.10 illustrates that difficulty. Gokturk et al. [100] implemented a standard linear SVM classifier. The interested reader can refer to the literature for further details that will not be discussed here. Chan et al. [101] compared traditional classifiers and machine-learning algorithms. They also found that machine-learning-type classifiers such as SVM give improved performance. Another important, yet still young, area of application is genomic profiling. Segal et al. [102] proposed a genome-based classification method for clear-cell sarcoma. In this study, the authors implemented a hierarchical cluster analysis, an SVM classifier, and a principal-component analysis (see next subsection) that used genomic correlation within the data to classify clear-cell sarcoma (melanoma of soft parts). The authors reported that the supervised analysis showed a clear distinction between clear-cell sarcoma and other soft-tissue sarcomas.
This result was corroborated by the SVM approach that was used. The results achieved led Segal et al. [102] to conclude that the analysis of gene profiles using SVM may be an important diagnostic tool.
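The m-fold cross-validation procedure used above for model selection is generic and can be sketched as follows, with a simple nearest-mean classifier standing in for the SVM; `cross_validation_error`, `nearest_mean`, and the well-separated two-class data are hypothetical names and values introduced here for illustration, not taken from the cited work.

```python
import numpy as np

def cross_validation_error(samples, labels, train_and_test, n_folds=5, seed=0):
    """N-fold cross-validation: train on N-1 folds, test on the held-out fold,
    and average the resulting error rates."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(samples))            # randomly divide into subsets
    folds = np.array_split(order, n_folds)
    errors = []
    for i in range(n_folds):                         # leave a different subset out in turn
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        predicted = train_and_test(samples[train], labels[train], samples[test])
        errors.append(np.mean(predicted != labels[test]))
    return float(np.mean(errors))                    # average the classification errors

def nearest_mean(train_x, train_y, test_x):
    """Stand-in classifier: assign each test vector to the nearest class mean."""
    classes = sorted(set(train_y.tolist()))
    centers = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(test_x[:, None, :] - centers[None, :, :], axis=2)
    return np.array(classes)[np.argmin(dists, axis=1)]

# Hypothetical well-separated two-class data: the estimated error is zero
X = np.vstack([np.zeros((20, 2)), np.full((20, 2), 10.0)])
y = np.array([0] * 20 + [1] * 20)
print(cross_validation_error(X, y, nearest_mean))  # -> 0.0
```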
FIGURE 4.11 A PCA finds the orthogonal directions corresponding to the greatest variance in the data.

4.3.1.2 Kernel Principal-Component Analysis

SVM is an example of a supervised kernel method. However, the kernel approach has also been used successfully in unsupervised learning. The example discussed in this section is kernel principal-component analysis (kernel PCA) [103]. Traditionally, PCA has been used for structure discovery, feature extraction, dimensionality reduction, and denoising. These operations are appealing from the object-recognition perspective and have found wide use in the image-analysis literature. PCA operates by rotating the coordinate system so that the maximal variation of the data is observed along the coordinate axes (Figure 4.11). One way of performing PCA involves the eigen-decomposition of the data covariance matrix

C = (1/n) Σi xi xiT (4.40)
where xi, i = 1, …, n, represents the centered observed data (column vectors), and C is their sample covariance. C is a positive semidefinite matrix, so its eigenvalues are nonnegative. The eigenvectors, v, obtained from the decomposition of C denote the principal (orthogonal) directions, and the corresponding eigenvalues, λ, are equal to the variances along these directions. One of the very interesting PCA applications discussed in the literature [1] is the creation of deformable models. To capture the variability of a shape, one can normalize and align sample shape contours using well-defined correspondences. PCA performed on the sample contour points then provides the directions of the most significant shape variations. Thus, the variability of the shape can be represented using the average contour plus a
Medical image analysis method
160
linear combination of the most significant components. Readers with an interest in deformable models can find more information in the literature [104, 105, 109]. The application of PCA as a linear transformation has been seen in a variety of fields, including magnetic resonance imaging [106–108]. Dehmeshki et al. [106] proposed an approach to characterize changes in multiple sclerosis (MS) based on the analysis of magnetization-transfer ratio (MTR) histograms. The authors conducted their study in two stages. The first stage involved a linear discriminant analysis (LDA) to classify the MTR histograms into control and MS subgroups. The use of LDA reduces the space of the MTR histogram to an optimal discriminant space for a nearest-mean classifier, as it focuses on the between-class variations. Then PCA further reduces the complexity of the analysis and is shown by the authors to be useful for analysis of correlation with current disability, as PCA focuses on variation within the MS subgroups. Dehmeshki et al. [106] used multiple regression to estimate the multiple correlation of the principal components with the degree of disability in MS. The combination of classification and correlation techniques as presented by Dehmeshki et al. was shown to outperform conventional histogram feature-based approaches.

As mentioned previously, PCA can be interpreted as a method of extracting linear features. Kernel PCA is an extension allowing for nonlinear feature extraction. Its principles are similar to those of SVM. The use of an appropriate kernel implicitly transforms the data into a high-dimensional (possibly infinite-dimensional) feature space, where standard PCA is performed. To formulate kernel PCA, it is necessary to note that the eigenvectors, v, of C lie in the span of the data points, xi, i = 1, …, n, i.e.,

v = Σi αi xi (4.41)

Considering the projection Φ of the data into the feature space, and assuming that the feature vectors Φ(xi) are already centered, it is possible to write

λ Σi αi Φ(xi) = (1/n) Σj Φ(xj) Φ(xj)^T Σi αi Φ(xi) (4.42)

Taking scalar products of both sides with each Φ(xk), Equation 4.42 can be rewritten (absorbing the factor 1/n into λ) in the form

Kα = λα (4.43)
FIGURE 4.12 A kernel PCA performs linear PCA in the feature space.

where K is the kernel matrix whose ij-th element is the scalar-product kernel K(xi, xj), and α is the vector composed of the coefficients αi. Thus, using an appropriate kernel function allows a formulation of the problem as a standard linear PCA in the feature space, without the need to explicitly perform expensive computations in this high-dimensional space. Kernel PCA can be used for nonlinear feature extraction (Figure 4.12) and may be useful in deformable models when the shapes of interest undergo more complex deformations. However, care needs to be taken when kernel PCA is applied, as in some cases it may not be possible to find a preimage of the principal direction found in the feature space.

4.3.2 INDEPENDENT COMPONENT ANALYSIS

SVMs and PCA, discussed in the previous sections, can discover structure in the data either by identifying distinct classes (SVM) or by appropriately transforming the data (both standard and kernel PCA). However, there is an alternative approach to both classification and structure discovery that assumes a particular model of data generation. The machine learning community has expressed great interest in such methods, many of which enable a principled development of image-analysis methods. In this section, we discuss independent component analysis (ICA), a method used to separate mixtures of signals, with interesting applications in computer vision. Independent component analysis [110] is closely related to PCA. It was originally proposed to deal with the blind-source separation problem, in which the goal is to recover the original sources from their linear instantaneous mixtures

x = As (4.44)
FIGURE 4.13 ICA does not impose the orthogonality restriction, so it is able to discover structure in the data not detected by PCA.

where s is the vector of sources and A is a mixing matrix. Thus, in ICA it is assumed that the sources are linearly and instantaneously mixed to produce the observed mixtures. The problem is ill posed, as there are infinitely many combinations of sources and mixing matrices that could give rise to the same mixtures. However, the additional assumptions of source independence and unit variance transform the problem into a well-posed one. Thus, ICA is usually formulated in terms of finding maximally independent sources, transforming it into an optimization problem for an appropriately formulated contrast function measuring the degree of independence. ICA can be considered as a method of transformation of the coordinate system, similar to the interpretation of PCA. However, whereas PCA will look only for orthogonal axes, the independence assumption in ICA does not impose such a restriction. Thus, ICA can find nonorthogonal structure in data, which is impossible to achieve with PCA (see Figure 4.13). An additional requirement is that at most one of the sources can be Gaussian for ICA to obtain a good separation. In fact, the contrast functions often measure the degree of non-Gaussianity of the sources. This can be motivated by invoking the central-limit theorem, according to which a (normalized) sum of independent random variables with some arbitrary (non-Gaussian) probability distribution has a distribution that approaches the normal distribution as the number of summands increases. Consider the following linear combination of the mixtures:

y = w^Tx = w^TAs = z^Ts (4.45)

Hence, y is more Gaussian than the sources s and is least Gaussian when it is equal to one of the sources. This motivates a view of ICA that can be summarized by the following "equation":

ICA = cost function + optimization
where the cost function J measures the departure of the random variables from Gaussianity, and the optimization often takes the form of gradient ascent on the cost function

w ← w + η ∂J(w^Tx)/∂w (4.46)
with renormalization of the unmixing vector w at each step. There are various cost functions that can reflect the amount of non-Gaussianity. One of the most direct measures is kurtosis, the fourth-order cumulant of a centered random variable

kurt(y) = E{y^4} − 3(E{y^2})^2 (4.47)
Kurtosis is zero for a Gaussian random variable, positive for super-Gaussian, and negative for sub-Gaussian random variables. Thus, kurtosis-based ICA takes the form of gradient ascent on the absolute value of the kurtosis of the (prewhitened) source estimates y = w^Tz

w ← w + η sign(kurt(w^Tz)) [E{z(w^Tz)^3} − 3w] (4.48)
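This kurtosis-based deflation can be sketched as follows on synthetic mixtures. The sources, mixing matrix, step size, and iteration count are illustrative; the −3w component of the kurtosis gradient is dropped in the sketch, since it is parallel to w and is removed anyway by the renormalization.

```python
import numpy as np

def whiten(X):
    """Center and whiten the mixtures so that their covariance is identity."""
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))
    return E @ np.diag(d ** -0.5) @ E.T @ X

def kurtosis_ica_one_unit(Z, steps=500, eta=0.1, seed=0):
    """Gradient ascent on |kurt(w^T z)| with renormalization of w each step."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(steps):
        y = w @ Z
        kurt = np.mean(y ** 4) - 3.0          # E{y^2} = 1 after whitening
        grad = np.mean(Z * y ** 3, axis=1)    # kurtosis gradient (radial part dropped)
        w = w + eta * np.sign(kurt) * grad
        w /= np.linalg.norm(w)                # renormalize the unmixing vector
    return w

# Two non-Gaussian sources, linearly and instantaneously mixed (x = As).
rng = np.random.default_rng(2)
t = np.linspace(0, 100, 5000)
S = np.vstack([np.sign(np.sin(3 * t)),        # square wave (sub-Gaussian)
               rng.laplace(size=t.size)])     # Laplacian noise (super-Gaussian)
X = np.array([[1.0, 0.6], [0.4, 1.0]]) @ S
Z = whiten(X)
w = kurtosis_ica_one_unit(Z)
y_est = w @ Z   # estimate of one source, up to sign and scale
```

The recovered component should correlate strongly with one of the two original sources; which one is found depends on the random initialization.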
The kurtosis-based ICA suffers from a lack of robustness to outliers, a problem alleviated by the use of negentropy as a cost function

J(y) = H(yGauss) − H(y) (4.49)

where H is the differential entropy

H(y) = −∫ p(y) ln p(y) dy (4.50)

and yGauss is a Gaussian random vector with covariance matrix equal to that of y. The differential entropy of a random variable measures the amount of uncertainty and, for a given variance, achieves its maximum for a Gaussian random variable. This implies that negentropy is nonnegative and vanishes only for Gaussian random variables. Negentropy is a very good statistical estimator of non-Gaussianity. However, it is difficult to compute, and usually some approximation is employed.
The above-mentioned ICA methods are deflational, i.e., they estimate one ICA component at a time, but they can be used in an iterative procedure that alternates applications of deflational ICA with orthonormalization of the estimates, which can be performed using the Gram-Schmidt procedure [111]. All components can also be estimated simultaneously in a maximum-likelihood formulation of ICA. The probability density function of the mixture vector x = As is

p(x) = |det B| Πi pi(bi^Tx) (4.51)

where B = A^−1 (with rows bi^T), and pi are the probability density functions of the sources. The log likelihood of a sample of observed mixtures x(1), …, x(T) can thus be formed

L(B) = Σt Σi ln pi(bi^Tx(t)) + T ln|det B| (4.52)
The gradient ascent on the log likelihood defines the Bell-Sejnowski ICA algorithm [112]

B ← B + η [(B^T)^−1 + E{g(Bx)x^T}] (4.53)
where g(y) = (g1(y1), …, gn(yn))^T is a componentwise nonlinearity with

gi(yi) = pi′(yi)/pi(yi) (4.54)
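The maximum-likelihood update is often implemented in its natural-gradient form, B ← B + η(I + E{g(y)y^T})B, which rescales the plain gradient by B^TB and avoids the matrix inversion. The following sketch uses the illustrative choice g(y) = −tanh(y), suitable for super-Gaussian sources; the mixing setup and parameters are synthetic.

```python
import numpy as np

def ml_ica(X, eta=0.05, epochs=1000, seed=0):
    """Natural-gradient ascent on the ICA log likelihood:
    B <- B + eta * (I + E{g(y) y^T}) B, with g(y) = -tanh(y)
    (an illustrative choice appropriate for super-Gaussian sources)."""
    rng = np.random.default_rng(seed)
    n, N = X.shape
    B = np.eye(n) + 0.1 * rng.normal(size=(n, n))  # start near identity
    I = np.eye(n)
    for _ in range(epochs):
        Y = B @ X
        B = B + eta * (I - np.tanh(Y) @ Y.T / N) @ B
    return B

# Two super-Gaussian sources, linearly mixed; ml_ica estimates B ~ A^-1.
rng = np.random.default_rng(3)
S = np.vstack([rng.laplace(size=5000), rng.standard_t(df=3, size=5000)])
S /= S.std(axis=1, keepdims=True)
A = np.array([[1.0, 0.5], [0.3, 1.0]])
X = A @ S
B = ml_ica(X)
Y = B @ X    # recovered sources, up to permutation, sign, and scale
```

At the stationary point E{g(y)y^T} = −I, the product BA approaches a scaled permutation of the identity, i.e., the sources are recovered up to the usual ICA indeterminacies.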
In recent years, as far as medical imaging is concerned, ICA has been applied mainly in the fields of magnetic resonance imaging (MRI) and gene analysis [113–116]. In particular, Jung et al. [113] and Beckmann and Smith [114] have conducted investigations with functional MRI (fMRI). Beckmann and Smith [114] present an extension of standard ICA, namely a probabilistic ICA for fMRI. In fMRI, the majority of techniques applied to date are hypothesis-driven and usually involve the application of simple regression or more elaborate general linear models (GLMs). Beckmann and Smith [114] model the standard ICA with Equation 4.44, where, for this particular application, x is a p×n matrix representing the fMRI data from n voxels measured at p different time points. The matrix S is optimized to represent the spatial areas of the brain; the rows of S contain the corresponding independent spatial maps. Note that in this case, the conversion of the component-map values into Z-scores can only use ad hoc thresholding techniques. Because standard ICA is a noise-free model, it suffers from the typical problem of overfitting a noise-free generative model to noisy observations. Beckmann and Smith illustrate the problem in Figure 4.14, which displays results achieved with a GLM and with standard ICA on visual-stimulus fMRI data.
FIGURE 4.14 GLM and standard ICA analysis of visual-stimulus fMRI data. (From Beckmann, C.F. and Smith, S.M., IEEE Trans. Medical Imaging, 23, 137–152, 2004. With permission © [2004 IEEE].)

The data used are from a visual stimulation of 30-sec on/off block design with checkerboard reversal at 8 Hz during the on stage. The GLM results displayed in Figure 4.14(i) were achieved with Gaussian-random-field-based inference with Z > 2.3, p < 0.01. The IC maps shown in Figure 4.14(ii–v) correspond to those for which the correlation of the extracted time course with the expected BOLD response is greater than 0.3. In all cases, the data were high-pass filtered and smoothed spatially with a Gaussian kernel. Following the transformation to Z-scores, Beckmann and Smith used a threshold Z > 2.3 to produce the above IC maps (see [114] and references therein for details). The authors highlight that, though one would expect similar activation maps from the different techniques, the ones generated by the GLM and ICA are quite different. The ICA map in Figure 4.14(ii) clearly identifies the primary visual areas. However, these are less extended than those of Figure 4.14(i) obtained with the GLM analysis. Figure 4.14(iii–v) display three other maps with well-localized activations in the visual cortical areas. However, these are difficult to interpret. To address these problems, Beckmann and Smith examined the blind-source separation problem in three stages:

1. Estimate the signal+noise subspace.
2. Estimate the independent components in the signal+noise subspace.
3. Assess the statistical significance of the sources.

In this framework, the probabilistic ICA model assumes that the p-dimensional vector xi of observations is generated from a set of q statistically independent non-Gaussian sources si through a linear mixing process corrupted by additive Gaussian noise ηi [114]:

xi = Asi + µ + ηi (4.55)

The index i denotes the voxel's location.
The dimensionality of the model set by Beckmann and Smith is such that there are fewer sources than observations in time, and µ is the mean of the observation vector xi. The problem is then to find a good
approximation of the true source signals by finding a linear transform matrix W such that [114]

ŝ = Wx (4.56)

We will not discuss further the technical details of Beckmann and Smith's integrated approach to probabilistic ICA (PICA) for fMRI. The approach can be summarized as follows:

1. Use probabilistic PCA to find the appropriate subspace containing the sources.
2. Estimate the model order based on an extension of the Bayesian approach that approximates the a posteriori distribution of the model order, taking into account the limited amount of data and the fMRI noise structure.
3. Estimate the source signals in the subspace with a fixed-point iteration procedure that maximizes the non-Gaussianity of the sources.
4. Extract the spatial maps and use the estimated standard error of the residual noise to convert the spatial maps into Z-statistic maps.
5. Assess the significantly modulated voxels using a Gaussian mixture model for the distribution of intensity values.

The interested reader can refer to the literature [114] for an in-depth and comprehensive presentation. We simply illustrate the procedure applied by the authors with the schematic of the analysis steps shown in Figure 4.15. Beckmann and Smith also tested their approach against artificial and real fMRI data for visual/audio-visual stimulation. Figure 4.16, Figure 4.17, and Figure 4.18 illustrate some of their results. Other examples of the application of ICA include Muraky et al. [115] and Martoglio et al. [116]. Muraky et al. [115] used ICA to assign colors to MRI volume data, and Martoglio et al. [116] applied a variational ICA approach to gene analysis.

4.3.3 ENSEMBLES OF CLASSIFIERS

Ensembles of classifiers constitute another area of intensive research in the field of machine learning. They encapsulate the intuitive principle of divide and conquer.
The general approach in ensembles of classifiers is to construct an ensemble of potentially weaker classifiers specialized on simpler subtasks and subsequently to reconstruct the solution to the original complex problem from the responses generated by the collective. The various architectures corresponding to this paradigm can be broadly classified into static and dynamic architectures. Static ensembles of classifiers use training data to learn the division of labor and the individual expert responses, but once the final architecture is found, the way its overall response is assembled from the individual expert outputs does not take the input signal into account. Dynamic architectures, on the other hand, also use the input to influence the mechanism that integrates the individual responses of the experts into an overall response. The overview of classifiers presented here concentrates on static architectures. Readers who are interested in dynamic architectures can consult the literature [117] for more details.
FIGURE 4.15 Schematic illustration of the analysis steps involved in estimating the PICA model. (From Beckmann, C.F. and Smith, S.M., IEEE Trans. Medical Imaging, 23, 137–152, 2004. With permission © [2004 IEEE].)

Static architectures can be further subdivided into bagging and boosting techniques [118–125]. Bagging (bootstrap aggregating) was introduced by Breiman in 1996 [118] and is based on creating several training sets from the original set by resampling with replacement (bootstrapping). The individual experts are created by
FIGURE 4.16 Analysis of visual stimulation data: (a) map from a fixed-effects analysis of the nonmotion-confounded 31 data sets for reference; (b) FEAT Z-statistic maps (Z > 3.0; p < 0.01) obtained from GLM fitting of the motion-confounded data (left) and after the inclusion of estimated motion parameters as additional regressors of no interest (right); (c) estimated motion parameters on this one data set that show a high absolute correlation with the stimulus; (d) spatial maps from PICA performed in a space spanned by the seven dominant eigenvectors; and (e) set of spatial maps from a standard ICA analysis where the data were projected into a 29-dimensional subspace (out of a possible 35) that retains >90% of the overall variability in the data. For ICA and PICA, all maps are shown where the associated time course has its peak power at the frequency of the stimulus. (From Beckmann, C.F. and Smith, S.M., IEEE Trans. Medical Imaging, 23,
137–152, 2004. With permission © [2004 IEEE].)

training the classifier on each of the bootstrap training sets. The ensemble output is obtained by averaging the outputs of the experts (regression) or by simple voting (classification). Gefen et al. [119] evaluated the usefulness of ultrasound tissue characterization for breast cancer diagnosis. The authors conducted a performance evaluation with receiver operating characteristic (ROC) curves. Their analysis used a combination of ultrasound features, patient age, and radiological findings. Their methodology applied ordinal dominance and bootstrap resampling. The bootstrap resampling, in this particular application, was used to evaluate the confidence interval of the ROC summary index, Az. Boosting iteratively trains a weak classifier by adaptively changing its training set. Boosting by filtering requires a large data pool. A classifier is trained on a subset of the data pool of a given size, N. The trained classifier then filters the remaining part of the pool to create a new training set of N data points of which approximately 50% are misclassified. (This can be achieved by including, with probability 0.5, either the next misclassified example or the next correctly classified example, and repeating the procedure until the number of data points in the new set is N.) Thus, the training set for the second classifier will be, by the nature of its construction, difficult for the first classifier (because the first classifier's classification rate on this set is about 50%). Subsequently, the second classifier is trained on the filtered data, and the filtering procedure is repeated with both classifiers to prepare the training set for the third classifier. The final output of the ensemble can be created by voting, and the resulting error rate is strictly smaller than the error rates of the individual classifiers. Boosting by resampling does not require a very large data set, as is required by boosting by filtering.
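The bagging procedure described above (bootstrap resamples plus simple voting) can be sketched as follows; decision stumps as base learners and the synthetic data are illustrative choices.

```python
import numpy as np

def fit_stump(X, y):
    """Exhaustively fit a one-feature threshold classifier (decision stump)."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sgn in (1, -1):
                pred = sgn * np.where(X[:, j] > thr, 1, -1)
                err = np.mean(pred != y)
                if best is None or err < best[0]:
                    best = (err, j, thr, sgn)
    return best[1:]

def stump_predict(stump, X):
    j, thr, sgn = stump
    return sgn * np.where(X[:, j] > thr, 1, -1)

def bagging(X, y, n_estimators=25, seed=0):
    """Train one stump per bootstrap resample of the training set."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stumps = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)       # sample with replacement
        stumps.append(fit_stump(X[idx], y[idx]))
    return stumps

def bagged_predict(stumps, X):
    votes = sum(stump_predict(s, X) for s in stumps)  # simple voting
    return np.where(votes >= 0, 1, -1)

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=200) > 0, 1, -1)  # noisy labels
stumps = bagging(X, y)
acc = np.mean(bagged_predict(stumps, X) == y)
```

Averaging over bootstrap replicates mainly reduces the variance of an unstable base learner; the same resampling idea underlies the bootstrap confidence interval for Az used by Gefen et al. [119].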
Boosting by resampling achieves a similar goal of concentrating the subsequent classifiers on the examples that the previous experts found hard to learn, by adaptively increasing the probability of sampling (with replacement) misclassified data for the next classifier. One of the most popular examples of this class of boosting methods is AdaBoost [120] (see Figure 4.19). Boosting by reweighting is similar to boosting by resampling, and it can be used with classifiers that are able to use weights associated with the data. An example of a classifier capable of utilizing such information is a multilayer perceptron. Instead of using the calculated probabilities for resampling, boosting by reweighting uses them as weights associated with each data point. Thus, the whole training set is used each time, but with the misclassified examples receiving gradually increasing weights. The underlying idea is similar to boosting by resampling: reweighting gradually shifts the emphasis toward more difficult examples not yet mastered by the classifiers constructed in previous cycles of operation. Empirical comparisons indicate that boosting quite consistently outperforms bagging. The analysis of ensemble algorithms constitutes an intensive area of research, with links to Bayesian classification [121, 122] and to SVMs [123] being investigated.
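AdaBoost's reweighting loop can be sketched as follows; the weighted decision stumps and the synthetic data (a diagonal class boundary that no single axis-aligned stump can capture well) are illustrative choices.

```python
import numpy as np

def fit_weighted_stump(X, y, w):
    """Fit a decision stump minimizing the weighted classification error."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sgn in (1, -1):
                pred = sgn * np.where(X[:, j] > thr, 1, -1)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, j, thr, sgn)
    return best

def stump_predict(j, thr, sgn, X):
    return sgn * np.where(X[:, j] > thr, 1, -1)

def adaboost(X, y, rounds=50):
    """Reweight the data each round toward the examples the previous
    stumps misclassified; combine the stumps with confidence weights."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        err, j, thr, sgn = fit_weighted_stump(X, y, w)
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # confidence of this stump
        pred = stump_predict(j, thr, sgn, X)
        w *= np.exp(-alpha * y * pred)          # up-weight misclassified points
        w /= w.sum()
        ensemble.append((alpha, j, thr, sgn))
    return ensemble

def boosted_predict(ensemble, X):
    score = sum(a * stump_predict(j, t, s, X) for a, j, t, s in ensemble)
    return np.where(score >= 0, 1, -1)

rng = np.random.default_rng(5)
X = rng.uniform(-1, 1, size=(300, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)      # diagonal boundary
ens = adaboost(X, y)
acc = np.mean(boosted_predict(ens, X) == y)
```

Each round the weight update concentrates the distribution on the not-yet-mastered examples, which is exactly the reweighting idea described above; the resampling variant draws a new training set from these weights instead.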
FIGURE 4.17 GLM vs. PICA on visual stimulation data: (a) FEAT results and regressor time course; (b) eigenspectrum of the data covariance matrix and estimate of the latent dimensionality; (c, d) spatial maps and associated time courses of PICA results, showing all maps with r > 0.3 between the estimated and expected time course. (From Beckmann, C.F. and Smith, S.M., IEEE Trans. Medical Imaging, 23, 137–152, 2004. With permission © [2004 IEEE].)
FIGURE 4.18 Additional PICA maps from the visual activation data: (a) head motion (translation in Z), (b) sensory motor activation, (c) signal fluctuations in areas close to the sinuses (possibly due to interaction of B field inhomogeneity with head motion), (d) high-frequency MR “ghost,” and (e) “resting-state” fluctuations/physiological noise. (From Beckmann, C.F. and Smith, S.M., IEEE Trans. Medical Imaging, 23, 137–152, 2004. With permission © [2004 IEEE].)
FIGURE 4.19 Pseudocode for the AdaBoost algorithm. 4.3.4 DISTRIBUTED METHODS Application of statistical-pattern recognition to biomedical-image analysis often results in nonlinear and non-Gaussian inference problems. Most often, such problems do not admit closed-form solutions, so efficient approximations need to be employed. This section discusses two main approaches based on sampling that have been developed over the last decade. The first approach, called particle filters, has been developed in the statistical community and is particularly suitable for Bayesian inference. The second, collectively known as model-based search methods, has been developed recently in the machine learning community, and it seems to be a promising approximation approach within the maximum-likelihood framework.
4.3.4.1 Particle Filters

Particle filters have become increasingly popular in computer vision in motion-tracking applications [126]. However, they have also been proposed in the context of Bayesian object localization [127]. The latter methodology is very general, but it requires explicit modeling of possible object deformations and the creation of explicit background and foreground noise models. Thus it may prove very promising in locating objects of interest in medical-image analysis applications. Tracking applications are also relevant for applications involving image stacks, a situation often encountered in biomedical-image analysis. More generally, in statistical-pattern recognition it is common to represent an object of interest with an appropriately parameterized statistical model. Such a model can capture the natural variability in the object's appearance in the scene, and it may also account for some transformations affecting it. Typical examples of such methodology include active-shape models, described in the medical-image analysis context in the literature [1]. Thus, the aim of the recognition procedure is to find the object parameterized with x from the observation data z. The posterior probability distribution p(x|z), encapsulating the information about the object available in the data, can be calculated from the Bayes theorem

p(x|z) = p(z|x)p(x)/p(z) (4.57)

where p(x) is a prior distribution, and p(z|x) is the data likelihood (the observation model). In images with a significant amount of clutter, or in cases with multiple models of the object (or of its transformations), the posterior is not Gaussian, and efficient methods for its calculation may not exist. In such situations the sampling methods are particularly attractive. Many different particle filters have been proposed; for an overview, see Arulampalam et al. [128] and Doucet et al. [129].
In this section, we describe a particle filter called CONDENSATION that was proposed specifically for image-processing applications by Isard and Blake [130]. CONDENSATION is also known as a sampling-importance-resampling particle filter [128]. The motivation for developing CONDENSATION was to be able to track curves in visual clutter. Hence, the algorithm requires a statistical model of the change (dynamics) of the shape and a statistical model of the observation process. The basic idea behind a single iteration of the algorithm is to approximate the posterior distribution by a weighted collection of appropriately sampled particles, where the weights are calculated from the likelihood of the observations. This procedure is known as factored sampling. First, a sample of particles {s1, …, sN} is generated from the prior density p(x). Then, each particle is assigned a weight wi, where

wi = p(z|x = si) / Σj p(z|x = sj) (4.58)
FIGURE 4.20 Pseudocode of the CONDENSATION algorithm.

Hence, the posterior is approximated from the weighted sample

p(x|z) ≈ Σi wi δ(x − si) (4.59)

where δ is the Dirac delta measure. CONDENSATION proceeds iteratively, at each step treating the posterior found at the previous iteration as the current prior for the next step of factored sampling. The process converges to the true posterior. Pseudocode for the algorithm is given in Figure 4.20, and a graphical illustration of the process is provided in Figure 4.21. As noted in the literature [128], this algorithm propagates the particles through the state space using only the model of the state dynamics, i.e., without direct consideration of the observations. This can prove to be inefficient, prone to outliers, and sensitive to inaccuracies of the assumed model of state dynamics. Nevertheless, because of its relative simplicity and ease of sample generation, it has become a popular method of motion tracking in computer vision, particularly in the nonlinear and non-Gaussian case (where, for example, the more classical approach via a Kalman filter fails). Subsequently, various extensions to tracking via particle filters have been proposed to deal with problems related to computational efficiency, robustness against nonrigidity of motion, or tracking problems in the case of many objects or occlusions [131–136]. The literature also
includes recent applications in the medical field [137] and, more specifically, in medical images [138].
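A single CONDENSATION iteration (resample by weight, propagate through the state dynamics, reweight with the observation likelihood) can be sketched on a toy one-dimensional tracking problem; the state dynamics, noise levels, and particle count are illustrative.

```python
import numpy as np

def condensation_step(particles, weights, observe_lik, dynamics, rng):
    """One CONDENSATION iteration: resample by importance weight,
    propagate via the dynamic model, then reweight with p(z|x)."""
    n = len(particles)
    idx = rng.choice(n, size=n, p=weights)       # factored-sampling resample
    particles = dynamics(particles[idx], rng)    # predict through the dynamics
    weights = observe_lik(particles)             # w_i proportional to p(z|x=s_i)
    return particles, weights / weights.sum()

# Toy 1-D tracking: the state drifts +0.5 per step, observations are noisy.
rng = np.random.default_rng(6)
true_x = 0.0
particles = rng.normal(0.0, 2.0, size=1000)      # sample from the prior p(x)
weights = np.full(1000, 1.0 / 1000)
for t in range(30):
    true_x += 0.5                                             # actual motion
    z = true_x + rng.normal(0.0, 0.3)                         # noisy observation
    lik = lambda s: np.exp(-0.5 * ((z - s) / 0.3) ** 2)       # Gaussian obs. model
    dyn = lambda s, r: s + 0.5 + r.normal(0.0, 0.2, size=s.shape)
    particles, weights = condensation_step(particles, weights, lik, dyn, rng)
estimate = np.sum(weights * particles)           # posterior mean, cf. Eq. 4.59
```

After 30 steps the weighted particle cloud concentrates around the true state (here 15.0), illustrating how the weighted sample approximates the evolving posterior.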
FIGURE 4.21 Tracking nonstationary probability distribution with CONDENSATION. Particles, si, are represented by ovals with area proportional to their importance weights, wi. 4.3.4.2 Model-Based Search Methods Particle filters have been designed to approximate an optimal Bayesian filter. Hence, they are particularly relevant for Bayesian inference. One can pose the equivalent problem of statistical pattern recognition (or object tracking) in the maximum-likelihood framework. In this context, one will be interested in
point estimates (and their tracking/update) of the object configuration x, corresponding to the highest likelihood of observations. In this context, a family of model-based search and optimization methods can be employed to perform an operation analogous to that of particle filters in Bayesian inference. The members of this family that are particularly relevant in the context of image analysis include the cross-entropy method [139] and estimation-of-distribution algorithms [140]. Model-based search methods approach the optimization problem by alternating between two steps:

1. Sampling of candidate solutions using some parametric probabilistic model over the solution space, representing the likelihood of finding the correct solution.
2. Updating the model based on the candidate solutions from step 1. This step is meant to bias the sampling in the next cycle of operation toward even better candidate solutions.

Within the above framework, the cross-entropy method inductively builds a series of proposal distributions that emphasize the promising regions of the state space according to an appropriate quality function, and it subsequently projects the proposal distributions onto a predefined parametric family of admissible distributions. The latter step is achieved by minimizing the cross entropy (Kullback-Leibler distance) between the candidate distribution and a member of the parametric family, using finite-sample approximations. Estimation-of-distribution algorithms (EDA) were developed in the evolutionary-computation community as an attempt to deal with the known difficulties of constructing appropriate genetic operators. The algorithms proceed by alternating between the following three general steps:

1. Generate a population of candidate solutions using the current probabilistic model (a probability distribution over the state space).
2. Select a subpopulation of the candidate solutions on the basis of their score.
3. Re-estimate the model parameters using the subpopulation from step 2 (most often using a maximum-likelihood approach).
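The alternating sample/update loop common to both families can be sketched with a cross-entropy-style search using a Gaussian model over the solution space; the objective function, elite fraction, and other parameters are illustrative.

```python
import numpy as np

def cross_entropy_search(score, dim, n_samples=200, n_elite=20,
                         iters=50, seed=0):
    """Model-based search: sample candidates from a Gaussian model,
    then refit the model to the elite (highest-scoring) candidates
    by maximum likelihood, and repeat."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.full(dim, 2.0)
    for _ in range(iters):
        # Step 1: sample candidate solutions from the current model.
        cand = mu + sigma * rng.normal(size=(n_samples, dim))
        # Step 2: select the elite and refit the model to them.
        elite = cand[np.argsort(score(cand))[-n_elite:]]
        mu = elite.mean(axis=0)
        sigma = elite.std(axis=0) + 1e-6   # keep a little exploration
    return mu

# Maximize a simple quadratic objective with optimum at (1, -2).
def score(x):
    return -np.sum((x - np.array([1.0, -2.0])) ** 2, axis=1)

best = cross_entropy_search(score, dim=2)
```

Fitting the Gaussian to the elite samples by maximum likelihood is the finite-sample projection step described above; EDA variants differ mainly in the probabilistic model used and in how the subpopulation is selected.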
There are many variants of EDA algorithms, and their discussion is beyond the scope of this review. For a general overview of model-based search methods, see Zlochin et al. [141] and the references therein.

4.4 CONCLUSION

In the wide sense, classification is a very broad and well-researched field. Over the past 40 years, the most general classification methods have been used to improve image processing and analysis in a wide spectrum of applications, including remote sensing via satellites and other spacecraft, image transmission, medical images, radar, sonar, acoustic image processing, robotics, automated inspection of industrial parts, etc. Technological advances at the end of the 20th century include the development of such medical-imaging systems as computed tomography (CT), magnetic resonance imaging (MRI), digital-subtraction angiography, Doppler ultrasound imaging, and other techniques based on nuclear emission, e.g., positron-emission tomography (PET) and single-photon-emission computed tomography (SPECT). Most applications of general
classification methods are problem-dependent and take into account the constraints imposed by the particular image modality considered, with the general methodology being modified or extended accordingly to enable applications in various contexts, such as object recognition, registration, segmentation, feature extraction, etc. Over the past decade, applications of some of these methods to biomedical-image analysis have shown some limitations when confronted with problems such as high dimensionality, nonlinearity, and non-Gaussianity. In this chapter, we have reviewed some of the most advanced techniques available that may prove useful in solving some of those problems. We have focused on presenting the fundamental principles of support-vector machines; kernel principal-component analysis; independent component analysis; bagging and boosting techniques in ensembles of classifiers; and particle filters, in response to nonlinear, non-Gaussian inference problems. Only recently have these techniques been applied to biomedical images. Over the past few years, the published results have shown evidence of their usefulness in the biomedical field, and even at this early stage, indicators of their performance already suggest the promise of these applications.

REFERENCES

1. Sonka, M. and Fitzpatrick, J.M., Eds., Handbook of Medical Imaging, Vol. 2, Medical Image Processing and Analysis, SPIE Press, Bellingham, WA, 2000.
2. Dawant, B.M. and Zijdenbos, A.P., Image segmentation, in Handbook of Medical Imaging, Vol. 2, Medical Image Processing and Analysis, Sonka, M. and Fitzpatrick, J.M., Eds., SPIE Press, Bellingham, WA, 2000.
3. Webb, A.R., Statistical Pattern Recognition, 2nd ed., John Wiley & Sons, New York, 2002.
4. Bezdek, J.C., Some nonstandard clustering algorithms, in Developments in Numerical Ecology, Legendre, P. and Legendre, L., Eds., Springer-Verlag, Berlin, 1987, pp. 225–287.
5.
Bezdek, J.C., Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
6. Duda, R.O. and Hart, P.E., Pattern Classification and Scene Analysis, John Wiley and Sons, New York, 1973.
7. Fukunaga, K., Introduction to Statistical Pattern Recognition, Academic Press, New York, 1972.
8. Jain, A.K. and Dubes, R.C., Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, NJ, 1988.
9. James, M., Classification Algorithms, John Wiley, New York, 1985.
10. Thomas, I.L., Benning, V.M., and Ching, N.P., Classification of Remotely Sensed Images, Adam Hilger, Bristol, 1987.
11. Young, T.Y. and Fu, K.-S., Eds., Handbook of Pattern Recognition and Image Processing, Academic Press, New York, 1986.
12. Dhawan, A.P. and Arata, L., Knowledge-based 3-D analysis from 2-D medical images, IEEE Eng. Medicine Biol. Mag., 10, 30–37, 1991.
13. Lundervold, A. and Storvik, G., Segmentation of brain parenchyma and cerebrospinal fluid in multispectral magnetic resonance images, IEEE Trans. Medical Imaging, 14, 339–349, 1995.
14. Singh, M., Patel, P., Khosla, D., and Kim, T., Segmentation of functional MRI by k-means clustering, IEEE Trans. Nucl. Sci., 43, 2030–2036, 1996.
15. Pagani, M., Kovalev, V.A., Lundqvist, R., Jacobsson, H., Larsson, S.A., and Thurfjell, L., A new approach for improving diagnostic accuracy in Alzheimer's disease and frontal lobe
Medical image analysis method
178
dementia utilising the intrinsic properties of the SPET dataset, Eur. J. Nucl. Med. Mol. Imaging, [Epub ahead of print], 2003. 16. Wu, H.-S., Barba, J., and Gil, J., Region growing segmentation of textured cell images, Electron. Lett., 32, 1084–1085, 1996. 17. Grus, F.H., Augustin, A.J., Evangelou, N.G., and Toth-Sagi, K., Analysis of tear-protein patterns as a diagnostic tool for the detection of dry eyes, Eur. J. Ophthalmol., 8, 90–97, 1998. 18. Herskovits, E., A hybrid classifier for automated radiologic diagnosis: preliminary results and clinical applications, Comput. Methods Programs Biomed., 32, 45–52, 1990. 19. Mitsias, P.D., Jacobs, M.A., Hammoud, R., Pasnoor, M., Santhakumar, S., Papamitsakis, N.I., Soltanian-Zadeh, H., Lu, M., Chopp, M., and Patel, S.C., Multiparametric MRI ISODATA ischemic lesion analysis: correlation with the clinical neurological deficit and single-parameter MRI techniques, Stroke, 33, 2839–2844, 2002. 20. Boudraa, A.O., Dehak, S.M., Zhu, Y.M., Pachai, C., Bao, Y.G., and Grimaud, J., Automated segmentation of multiple sclerosis lesions in multispectral MR imaging using fuzzy clustering, Comput. Biol. Med., 30, 23–40, 2000. 21. Leigh, R., Ostuni, J., Pham, D., Goldszal, A., Lewis, B.K., Howard, T., Richert, N., McFarland, H., and Frank, J.A., Estimating cerebral atrophy in multiple sclerosis patients from various MR pulse sequences, Multiple Sclerosis, 8, 420–429, 2002. 22. Ahmed, M.N., Yamany, S.M., Mohamed, N., Farag, A.A., and Moriarty, T., A modified fuzzy c-means algorithm for bias field estimation and segmentation of MRI data, IEEE Trans. Medical Imaging, 21, 193–199, 2002. 23. Pham, D.L. and Prince, J.L., Adaptive fuzzy segmentation of magnetic resonance images, IEEE Trans. Medical Imaging, 18, 737–752, 1999. 24. Zhu, C. and Jiang, T., Multicontext fuzzy clustering for separation of brain tissues in magnetic resonance images, Neuroimage, 18, 685–696, 2003. 25. 
Yoon, U., Lee, J.M., Kim, J.J., Lee, S.M., Kim, I.Y., Kwon, J.S., and Kim, S.I., Modified magnetic resonance image-based parcellation method for cerebral cortex using successive fuzzy clustering and boundary detection, Ann. Biomed. Eng., 31, 441–447, 2003. 26. Zaidi, H., Diaz-Gomez, M., Boudraa, A., and Slosman, D.O., Fuzzy clustering-based segmented attenuation correction in whole-body PET imaging, Phys. Med. Biol., 47, 1143–1160, 2002. 27. Acton, P.D., Pilowsky, L.S., Kung, H.F., and Ell, P.J., Automatic segmentation of dynamic neuroreceptor single-photon emission tomography images using fuzzy clustering, Eur. J. Nucl. Med., 26, 581–590, 1999. 28. Schmid, P., Segmentation of digitized dermatoscopic images by two-dimensional color clustering, IEEE Trans. Medical Imaging, 18, 164–171, 1999. 29. Vannier, M.W., Butterfield, R.L., Rickman, D.L., Jordan, D.M., Murphy, W.A., and Biondetti, P.R., Multispectral magnetic resonance image analysis, Crit. Rev. Biomed. Eng., 15, 117–144, 1987. 30. Wu, H.-S., Gil, J., and Barba, J., Optimal segmentation of cell images, Vision, Image Signal Process., IEEE Proc., 145, 50–56, 1998. 31. Spyridonos, P., Ravazoula, P., Cavouras, D., Berberidis, K., and Nikiforidis, G., Computer-based grading of haematoxylin-eosin stained tissue sections of urinary bladder carcinomas, Med. Inform. Internet Med., 26, 179–190, 2001. 32. Chabat, F., Yang, G.Z., and Hansell, D.M., Obstructive lung diseases: texture classification for differentiation at CT, Radiology, 228, 871–877, 2003. 33. Zhang, X., Broschat, S.L., and Flynn, P.J., A comparison of material classification techniques for ultrasound inverse imaging, J. Acoust. Soc. Am., 111, 457–467, 2002. 34. Christodoulou, C.I., Pattichis, C.S., Pantziaris, M., and Nicolaides, A., Texture-based classification of atherosclerotic carotid plaques, IEEE Trans. Medical Imaging, 22, 902–912, 2003.
Biomedical-image classification method and techniques
179
35. Jafari-Khouzani, K. and Soltanian-Zadeh, H., Multiwavelet grading of pathological images of prostate, IEEE Trans. Biomed. Eng., 50, 697–704, 2003. 36. Loukas, C.G., Wilson, G.D., Vojnovic, B., and Linney, A., Tumor hypoxia and blood vessel detection: an image-analysis technique for simultaneous tumor hypoxia grading and blood vessel detection in tissue sections, Ann. N.Y. Acad. Sci., 980, 125–138, 2002. 37. Kamber, M., Shinghal, R., Collins, D.L., Francis, G.S., and Evans, A.C., Model-based 3-D segmentation of multiple sclerosis lesions in magnetic resonance brain images, IEEE Trans. Medical Imaging, 14, 442–453, 1995. 38. Li, H.D., Kallergi, M., Clarke, L.P., Jain, V.K., and Clark, R.A., Markov random field for tumor detection in digital mammography, IEEE Trans. Medical Imaging, 14, 565–576, 1995. 39. Mirzai, A.R. et al., Eds., Artificial Intelligence: Concepts and Applications in Engineering, MIT Press, Cambridge, MA, 1990. 40. Quinlan, J.R., Induction of decision trees, Machine Learning, 1, 81–106, 1986. 41. Aleynikov, S. and Micheli-Tzanakou, E., Classification of retinal damage by neural network-based system, J. Medical Systems, 22, 129–136, 1998. 42. Amartur, S.C., Piraino, D., and Takefuji, Y., Optimization neural networks for the segmentation of magnetic resonance images, IEEE Trans. Medical Imaging, 11, 215–220, 1992. 43. Binder, M., Kittler, H., Seeber, A., Steiner, A., Pehamberger, H., and Wolff, K., Epiluminescence microscopy-based classification of pigmented skin lesions using computerized image analysis and artificial neural network, Melanoma Res., 8, 261–266, 1998. 44. Cagnoni, S., Coppini, G., Rucci, M., Caramella, D., and Valli, G., Neural network segmentation of magnetic resonance spin echo images of the brain, J. Biomed. Eng., 15, 355–362, 1993. 45. 
Gebbinck, M.S., Verhoeven, J.T., Thijssen, J.M., and Schouten, T.E., Application of neural networks for the classification of diffuse liver disease by quantitative echography, Ultrasonic Imaging, 15, 205–217, 1993. 46. Özkan, M., Dawant, B.M., and Maciunas, R.J., Neural network-based segmentation of multimodal medical images: a comparative and prospective study, IEEE Trans. Medical Imaging, 12, 534–544, 1993. 47. Pantazopoulos, D., Karakitsos, P., Iokim-Liossi, A., Pouliakis, A., Botsoli-Stergiou, E., and Dimopoulos, C., Back propagation neural network in the discrimination of benign from malignant lower urinary tract lesions, J. Urol., 159, 1619–1623, 1998. 48. Sujana, H., Swarnamani, S., and Suresh, S., Application of artificial neural networks for the classification of liver lesions by texture parameters, Ultrasound Medicine Biol., 22, 1177–1181, 1996. 49. Tourassi, G.D. and Floyd, C.E., Jr., Lesion size quantification in SPECT using an artificial neural network classification approach, Comput. Biomed. Res., 28, 257–270, 1995. 50. Tsujii, O., Freedman, M.T., and Mun, S.K., Automated segmentation of anatomic regions in chest radiographs using an adaptive-sized hybrid neural network, Medical Phys., 25, 998–1007, 1998. 51. Worth, A.J., Lehar, S., and Kennedy, D.M., A recurrent cooperative/competitive field for segmentation of magnetic resonance brain images, IEEE Trans. Knowledge Data Eng., 4, 156–161, 1992. 52. Zijdenbos, A.P., Dawant, B.M., Margolin, R.A., and Palmer, A.C., Morphometric analysis of white matter lesions in MR images: method and validation, IEEE Trans. Medical Imaging, 13, 716–724, 1994. 53. Albrecht, A., Hein, E., Steinhofel, K., Taupitz, M., and Wong, C.K., Bounded-depth threshold circuits for computer-assisted CT image classification, Artif. Intelligence Medicine, 24, 179–192, 2002. 54. 
Bowd, C., Chan, K., Zangwill, L.M., Goldbaum, M.H., Lee, T.W., Sejnowski, T.J., and Weinreb, R.N., Comparing neural networks and linear discriminant functions for glaucoma detection using confocal scanning laser ophthalmoscopy of the optic disc, Invest. Ophthalmol. Vis. Sci., 43, 3444–3454, 2002.
55. Chen, Y.-T., Cheng, K.-S., and Liu, J.-K., Improving cephalogram analysis through feature subimage extraction, Eng. Medicine Biol. Mag., IEEE, 18, 25–31, 1999. 56. Feleppa, E.J., Fair, W.R., Liu, T., Kalisz, A., Balaji, K.C., Porter, C.R., Tsai, H., Reuter, V., Gnadt, W., and Miltner, M.J., Three-dimensional ultrasound analyses of the prostate, Mol. Urol., 4, 133–139, 2000. 57. Handels, H., Ross, T., Kreusch, J., Wolff, H.H., and Poppl, S.J., Computer-supported diagnosis of melanoma in profilometry, Methods Inf. Med., 38, 43–49, 1999. 58. Polakowski, W.E., Cournoyer, D.A., Rogers, S.K., DeSimio, M.P., Ruck, D.W., Hoffmeister, J.W., and Raines, R.A., Computer-aided breast cancer detection and diagnosis of masses using difference of Gaussian and derivative-based feature saliency, IEEE Trans. Medical Imaging, 16, 811–819, 1997. 59. Yi, W.J., Park, K.S., and Paick, J.S., Morphological classification of sperm heads using artificial neural networks, Medinfo, 9, 1071–1074, 1998. 60. Oh, S.-K., Pedrycz, W., and Park, H.-S., Self-organising networks in modelling experimental data in software engineering, Comput. Digital Tech., IEEE Proc., 149, 61–78, 2002. 61. Dhawan, A.P., Chitre, Y., and Kaiser-Bonasso, C., Analysis of mammographic microcalcifications using gray-level image structure features, IEEE Trans. Medical Imaging, 15, 246–259, 1996. 62. Grossberg, S. and Williamson, J.R., A self-organizing neural system for learning to recognize textured scenes, Vision Res., 39, 1385–1406, 1999. 63. Nekovei, R. and Sun, Y., Back propagation network and its configuration for blood vessel detection in angiograms, IEEE Trans. Neural Networks, 6, 64–72, 1995. 64. Seker, H., Odetayo, M.O., Petrovic, D., and Naguib, R.N.G., A fuzzy logic-based method for prognostic decision making in breast and prostate cancers, IEEE Trans. Inf. Technol. Biomedicine, 7, 114–122, 2003. 65. Smith, M.R. and Hui, Y., A data extrapolation algorithm using a complex domain neural network, IEEE Trans. 
Circuits Syst. II: Analog Digital Signal Process., 44, 143–147, 1997. 66. Stanley, R.J. and Long, R., A radius of curvature-based approach to cervical spine vertebra image analysis, Biomed. Sci. Instrum., 37, 385–390, 2001. 67. Hall, L.O., Bensaid, A.M., Clarke, L.P., Velthuizen, R.P., Silbiger, M.S., and Bezdek, J.C., A comparison of neural network and fuzzy clustering techniques in segmenting magnetic resonance images of the brain, IEEE Trans. Neural Networks, 3, 672–682, 1992. 68. Axelson, D., Bakken, I.J., Susann Gribbestad, I., Ehrnholm, B., Nilsen, G., and Aasly, J., Applications of neural network analyses to in vivo 1H magnetic resonance spectroscopy of Parkinson disease patients, J. Magn. Resonance Imaging, 16, 13–20, 2002. 69. Chuang, K.-H., Chiu, M.-J., Lin, C.-C., and Chen, J.-H., Model-free functional MRI analysis using Kohonen clustering neural network and fuzzy c-means, IEEE Trans. Medical Imaging, 18, 1117–1128, 1999. 70. Comtat, C. and Morel, C., Approximate reconstruction of PET data with a self-organizing neural network, IEEE Trans. Neural Networks, 6, 783–789, 1995. 71. Coppini, G., Tamburini, E., L’Abbate, A., and Valli, G., Assessment of regions at risk from coronary X-ray imaging by Kohonen’s map, Comput. Cardiol., 757–760, 1995. 72. Hammond, P., Hutton, T.J., Nelson-Moon, Z.L., Hunt, N.P., and Madgwick, A.J., Classifying vertical facial deformity using supervised and unsupervised learning, Methods Inf. Med., 40, 365–372, 2001. 73. Manhaeghe, C., Lemahieu, I., Vogelaers, D., and Colardyn, F., Automatic initial estimation of the left ventricular myocardial midwall in emission tomograms using Kohonen maps, IEEE Trans. Pattern Anal. Machine Intelligence, 16, 259–266, 1994. 74. Pascual, A., Barcena, M., Merelo, J.J., and Carazo, J.M., Mapping and fuzzy classification of macromolecular images using self-organizing neural networks, Ultramicroscopy, 84, 85–99, 2000.
75. Reddick, W.E., Glass, J.O., Cook, E.N., Elkin, T.D., and Deaton, R.J., Automated segmentation and classification of multispectral magnetic resonance images of brain using artificial neural networks, IEEE Trans. Medical Imaging, 16, 911–918, 1997. 76. Cheng, K.-S., Lin, J.-S., and Mao, C.-W., The application of competitive Hopfield neural network to medical-image segmentation, IEEE Trans. Medical Imaging, 15, 560–567, 1996. 77. Gopal, S.S. and Hebert, T.J., Prereconstruction restoration of SPECT projection images by a neural network, IEEE Trans. Nuclear Sci., 41, 1620–1625, 1994. 78. Koss, J.E., Newman, F.D., Johnson, T.K., and Kirch, D.L., Abdominal organ segmentation using texture transforms and a Hopfield neural network, IEEE Trans. Medical Imaging, 18, 640–648, 1999. 79. Lin, J.S., Cheng, K.S., and Mao, C.W., Multispectral magnetic resonance images segmentation using fuzzy Hopfield neural network, Int. J. Biomed. Comput., 42, 205–214, 1996. 80. Lin, J.-S., Cheng, K.-S., and Mao, C.-W., A fuzzy Hopfield neural network for medical image segmentation, IEEE Trans. Nucl. Sci., 43, 2389–2398, 1996. 81. Sammouda, R., Niki, N., and Nishitani, H., A comparison of Hopfield neural network and Boltzmann machine in segmenting MR images of the brain, IEEE Trans. Nucl. Sci., 43, 3361–3369, 1996. 82. Tsai, C.-T., Sun, Y.-N., and Chung, P.-C., Minimising the energy of active contour model using a Hopfield network, Comput. Digital Tech., IEEE Proc., 140, 297–303, 1993. 83. Wang, Y. and Wahl, P.M., Multiobjective neural network for image reconstruction, Vision, Image Signal Process., IEEE Proc., 144, 233–236, 1997. 84. Zhu, Y. and Yan, Z., Computerized tumor boundary detection using a Hopfield neural network, IEEE Trans. Medical Imaging, 16, 55–67, 1997. 85. Zurada, J.M., Introduction to Artificial Neural Systems, West Publishing, St. Paul, MN, 1992. 86. 
Egmont-Petersen, M., de Ridder, D., and Handels, H., Image processing with neural networks: a review, Pattern Recognition, 35, 2279–2301, 2002. 87. Hummel, R.A. and Zucker, S.W., On the foundation of relaxation labeling processes, IEEE Trans. Pattern Anal. Machine Intelligence, 5, 259–288, 1983. 88. Peleg, S., A new probabilistic relaxation scheme, IEEE Trans. Pattern Anal. Machine Intelligence, 2, 362–369, 1980. 89. Geman, S. and Geman, D., Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Anal. Machine Intelligence, 6, 721–741, 1984. 90. Hansen, M.W. and Higgins, W.E., Relaxation methods for supervised image segmentation, IEEE Trans. Pattern Anal. Machine Intelligence, 19, 949–962, 1997. 91. Wang, Y., Adali, T., Xuan, J., and Szabo, Z., Magnetic resonance image analysis by information theoretic criteria and stochastic site models, IEEE Trans. Inf. Technol. Biomedicine, 5, 150–158, 2001. 92. Hokland, J.H. and Kelly, P.A., Markov models of specular and diffuse scattering in restoration of medical ultrasound images, IEEE Trans. Ultrasonics, Ferroelectrics Frequency Control, 43, 660–669, 1996. 93. Johnston, B., Atkins, M.S., Mackiewich, B., and Anderson, M., Segmentation of multiple sclerosis lesions in intensity corrected multispectral MRI, IEEE Trans. Medical Imaging, 15, 154–169, 1996. 94. Rueckert, D., Burger, P., Forbat, S.M., Mohiaddin, R.D., and Yang, G.Z., Automatic tracking of the aorta in cardiovascular MR images using deformable models, IEEE Trans. Medical Imaging, 16, 581–590, 1997. 95. Vapnik, V., Statistical Learning Theory, Wiley, New York, 1998. 96. Cristianini, N. and Shawe-Taylor, J., An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, 2000. 97. El-Naqa, I., Yang, Y., Wernick, M.N., Galatsanos, N.P., and Nishikawa, R.M., A support-vector machine approach for detection of microcalcifications, IEEE Trans. Medical Imaging, 21, 1552–1563, 2002.
98. Chang, R.F., Wu, W.J., Moon, W.K., Chou, Y.H., and Chen, D.R., Support-vector machines for diagnosis of breast tumors on US images, Acad. Radiol., 10, 189–197, 2003. 99. Bhanu Prakash, K.N., Ramakrishnan, A.G., Suresh, S., and Chow, T.W.P., Fetal lung maturity analysis using ultrasound image features, IEEE Trans. Inf. Technol. Biomed., 6, 38–45, 2002. 100. Gokturk, S.B., Tomasi, C., Acar, B., Beaulieu, C.F., Paik, D.S., Jeffrey, R.B.J., Yee, J., and Napel, S., A statistical 3-D pattern processing method for computer-aided detection of polyps in CT colonography, IEEE Trans. Medical Imaging, 20, 1251–1260, 2001. 101. Chan, K., Lee, T.W., Sample, P.A., Goldbaum, M.H., Weinreb, R.N., and Sejnowski, T.J., Comparison of machine learning and traditional classifiers in glaucoma diagnosis, IEEE Trans. Biomed. Eng., 49, 963–974, 2002. 102. Segal, N.H., Pavlidis, P., Noble, W.S., Antonescu, C.R., Viale, A., Wesley, U.V., Busam, K., Gallardo, H., DeSantis, D., Brennan, M.F., Cordon-Cardo, C., Wolchok, J.D., and Houghton, A.N., Classification of clear-cell sarcoma as a subtype of melanoma by genomic profiling, J. Clin. Oncol., 21, 1775–1781, 2003. 103. Schölkopf, B., Smola, A., and Müller, K.-R., Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, 10, 1299–1319, 1998. 104. Davatzikos, C., Tao, X., and Shen, D., Hierarchical active shape models, using the wavelet transform, IEEE Trans. Medical Imaging, 22, 414–423, 2003. 105. Kaus, M.R., Pekar, V., Lorenz, C., Truyen, R., Lobregt, S., and Weese, J., Automated 3-D PDM construction from segmented images using deformable models, IEEE Trans. Medical Imaging, 22, 1005–1013, 2003. 106. Dehmeshki, J., Barker, G.J., and Tofts, P.S., Classification of disease subgroup and correlation with disease severity using magnetic resonance imaging whole-brain histograms: application to magnetization transfer ratios and multiple sclerosis, IEEE Trans. Medical Imaging, 21, 320–331, 2002. 107. 
Nyui, Y., Ogawa, K., and Kunieda, E., Visualization of intracranial arteriovenous malformation with physiological information, IEEE Trans. Nucl. Sci., 48, 855–858, 2001. 108. Soltanian-Zadeh, H., Windham, J.P., Peck, D.J., and Yagle, A.E., A comparative analysis of several transformations for enhancement and segmentation of magnetic resonance image scene sequences, IEEE Trans. Medical Imaging, 11, 302–318, 1992. 109. Andresen, P.R., Bookstein, F.L., Conradsen, K., Ersboll, B.K., Marsh, J.L., and Kreiborg, S., Surface-bounded growth modeling applied to human mandibles, IEEE Trans. Medical Imaging, 19, 1053–1063, 2000. 110. Hyvärinen, A., Karhunen, J., and Oja, E., Independent Component Analysis, John Wiley & Sons, New York, 2001. 111. Horn, R.A. and Johnson, Ch.R., Matrix Analysis, Cambridge University Press, Cambridge, 1985. 112. Bell, A.J. and Sejnowski, T.J., An information-maximization approach to blind separation and blind deconvolution, Neural Computation, 7, 1129–1159, 1995. 113. Jung, T.-P., Makeig, S., McKeown, M.J., Bell, A.J., Lee, T.-W., and Sejnowski, T.J., Imaging brain dynamics using independent component analysis, Proc. IEEE, 89, 1107–1122, 2001. 114. Beckmann, C.F. and Smith, S.M., Probabilistic independent component analysis for functional magnetic resonance imaging, IEEE Trans. Medical Imaging, 23, 137–152, 2004. 115. Muraki, S., Nakai, T., Kita, Y., and Tsuda, K., An attempt for coloring multichannel MR imaging data, IEEE Trans. Visualization Comput. Graphics, 7, 265–274, 2001. 116. Martoglio, A.M., Miskin, J.W., Smith, S.K., and MacKay, D.J., A decomposition model to track gene expression signatures: preview on observer-independent classification of ovarian cancer, Bioinformatics, 18, 1617–1624, 2002. 117. Haykin, S., Neural Networks: a Comprehensive Foundation, Prentice Hall, New York, 1998. 118. Breiman, L., Bagging predictors, Machine Learning, 24, 123–140, 1996. 119. 
Gefen, S., Tretiak, O.J., Piccoli, C.W., Donohue, K.D., Petropulu, A.P., Shankar, P.M., Dumane, V.A., Huang, L., Kutay, M.A., Genis, V., Forsberg, F., Reid, J.M., and Goldberg,
B.B., ROC analysis of ultrasound tissue characterization classifiers for breast cancer diagnosis, IEEE Trans. Medical Imaging, 22, 170–177, 2003. 120. Freund, Y. and Schapire, R.E., A decision-theoretic generalization of online learning and an application to boosting, Computational Learning Theory: Second European Conference, EuroCOLT ’95, Springer-Verlag, Berlin, 1995, pp. 23–37. 121. Androutsopoulos, I., Koutsias, J., Chandrinos, K., and Spyropoulos, C., An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal email messages, in Proc. 23rd Ann. Int. ACM SIGIR Conf. Res. Dev. Inf. Retrieval, 2000, pp. 160–167. 122. Mehrubeoglu, M., Kehtarnavaz, N., Marquez, G., Duvic, M., and Wang, L.V., Skin lesion classification using oblique-incidence diffuse reflectance spectroscopic imaging, Appl. Opt., 41, 182–192, 2002. 123. Rätsch, G., Schölkopf, B., Mika, S., and Müller, K.R., SVM and Boosting: One Class, Technical Report 119, GMD FIRST, Berlin, 2000. 124. Hothorn, T. and Lausen, B., Bagging tree classifiers for laser scanning images: a data- and simulation-based strategy, Artif. Intelligence Medicine, 27, 65–79, 2003. 125. Kustra, R. and Strother, S., Penalized discriminant analysis of 15O-water PET brain images with prediction error selection of smoothness and regularization hyperparameters, IEEE Trans. Medical Imaging, 20, 376–387, 2001. 126. Blake, A. and Isard, M., Active Contours, Springer, London, 1998. 127. Sullivan, J., Blake, A., Isard, M., and MacCormick, J., Bayesian object localisation in images, Int. J. Comput. Vision, 44, 111–135, 2001. 128. Arulampalam, M.S., Maskell, S., Gordon, N., and Clapp, T., A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking, IEEE Trans. Signal Process., 50, 174–188, 2002. 129. Doucet, A., de Freitas, N., and Gordon, N., Eds., Sequential Monte Carlo Methods in Practice, Statistics for Engineering and Information Science, Springer-Verlag, New York, 2002. 130. Isard, M. 
and Blake, A., Condensation: conditional density propagation for visual tracking, Int. J. Comput. Vision, 29, 5–28, 1998. 131. Isard, M. and Blake, A., Condensation: unifying low-level and high-level tracking in a stochastic framework, in Proc. European Conference on Computer Vision, Vol. 1, 1998, pp. 893–908. 132. Isard, M. and MacCormick, J., Hand Tracking for Vision-Based Drawing, Technical Report, Visual Dynamics Research Group, University of Oxford, Oxford, 2000. 133. Nummiaro, K., Koller-Meier, E., and Van Gool, L., An adaptive color-based particle filter, Image Vision Computing, 21, 99–110, 2003. 134. Isard, M. and MacCormick, J., Bramble: a Bayesian multiple blob tracker, in IEEE Int. Conf. Comput. Vision, Vol. 2, IEEE Computer Society, Los Alamitos, CA, 2001, pp. 34–41. 135. Tao, H., Sawhney, H., and Kumar, R., A sampling algorithm for tracking multiple objects, in Proc. IEEE Workshop on Vision Algorithms, Corfu, Greece, 1999, pp. 53–68. 136. Tweed, D. and Calway, A., Tracking many objects using subordinated condensation, in Br. Machine Vision Conf. Proc., 2002, pp. 283–292. 137. DiMaio, S.P. and Salcudean, S.E., Needle insertion modeling and simulation, IEEE Trans. Robotics Automation, 19, 864–875, 2003. 138. Magee, D., Bulpitt, A., and Berry, E., 3-D automated segmentation and structural analysis of vascular trees using deformable models, Proc. IEEE Workshop on Variational and Level Set Methods in Computer Vision, 2001, pp. 119–126. 139. Rubinstein, R.Y., The cross-entropy method for combinatorial and continuous optimization, Methodology Comput. Appl. Probab., 1, 127–190, 1999. 140. Muhlenbein, H., Bendisch, J., and Voigt, H.M., From recombination of genes to estimation of distributions: 1, Binary parameters, in Proc. First Int. Conf. Parallel Problem Solving from Nature, Springer-Verlag, Berlin, 1996, pp. 178–187.
141. Zlochin, M., Birattari, M., Meuleau, N., and Dorigo, M., Model-Based Search for Combinatorial Optimization, TR/IRIDIA/2001-15, IRIDIA, Université Libre de Bruxelles, Belgium, 2001.
5 Texture Characterization Using Autoregressive Models with Application to Medical Imaging
Sarah Lee and Tania Stathaki
5.1 INTRODUCTION
In this chapter, we introduce texture characterization using autoregressive (AR) models and demonstrate its potential use in medical-image analysis. The one-dimensional AR modeling technique has been used extensively for one-dimensional biomedical signals, and some examples are given in Section 5.1.1. For two-dimensional biomedical signals, the idea of applying the two-dimensional AR modeling technique has barely been explored: only a couple of examples can be found in the literature, as shown in Section 5.1.2. In the following sections, we concentrate on a two-dimensional AR modeling technique whose results can be used to describe textured surfaces in images, under the assumption that every distinct texture can be represented by a different set of two-dimensional AR model coefficients. The conventional Yule-Walker system of equations is one of the most widely used methods for estimating AR model coefficients, and the variances of the coefficients estimated over a large number of realizations, i.e., simulations driven by the same set of AR model coefficients but with randomly generated driving input, are sufficiently low. However, the estimation fails when strong external noise is added to the system. If the noise is Gaussian, it is natural to work in the third-order statistical domain, where the third-order moments are employed and the external Gaussian noise is therefore eliminated [1, 2]. This method, however, leads to higher variances in the AR model coefficients estimated over a number of realizations. We propose three methods for estimation of two-dimensional AR model coefficients. 
The first method relates the extended Yule-Walker system of equations in the third-order statistical domain to the Yule-Walker system of equations in the second-order statistical domain through a constrained-optimization formulation with equality constraints. The second and third methods use inequality constraints instead. The textured areas of the images are thus characterized by sets of estimated AR model coefficients instead of the original intensities. An area with a distinct texture can be divided into a number of blocks, and a set of AR model coefficients is estimated for each block. A clustering technique is then applied to these sets, and a weighting scheme is used to obtain the final estimate. The proposed AR modeling method is also applied to mammography, comparing the AR model coefficients of the block covering a problematic area with the coefficients of its neighboring blocks. The structure of this chapter is as follows. In Section 5.2 the two-dimensional AR model is revisited, and Section 5.3 describes one of the conventional methods, the Yule-
Walker system of equations. Another conventional method, the extended Yule-Walker system of equations in the third-order statistical domain, is explained in Section 5.4. The proposed methods—the constrained-optimization formulation with equality constraints and the constrained-optimization formulations with inequality constraints—are covered in Sections 5.5 and 5.6, respectively. In Section 5.7, two clustering techniques—the minimum hierarchical clustering scheme and the k-means algorithm—are applied to a number of sets of AR model coefficients estimated from an image with a single texture. In Section 5.8, the two-dimensional AR modeling technique is applied to the texture characterization of mammograms. A relationship is established between the AR model coefficients obtained from the block containing a tumor and those of its neighboring blocks. The summary and conclusions can be found in Section 5.9.
5.1.1 ONE-DIMENSIONAL AUTOREGRESSIVE MODELING FOR BIOMEDICAL SIGNALS
The output x[m] of the one-dimensional autoregressive (AR) model can be written mathematically [3] as

x[m] = −a[1]x[m−1] − a[2]x[m−2] − … − a[p]x[m−p] + u[m] (5.1)

where a[i] is the AR model coefficient, p is the order of the model, and u[m] is the driving input. AR modeling is among a number of signal-processing techniques that have been applied to biomedical signals, including the fast Fourier transform (FFT) used for frequency analysis; linear, adaptive, and morphological filters; and others [3]. Some examples are given here. According to Bloem and Arzbaecher [4], the one-dimensional AR modeling technique is applied to discriminate atrial arrhythmias, based on the fact that AR modeling of an organized cardiac rhythm produces residuals dominated by the impulse, whereas atrial fibrillation yields a residual containing decorrelated noise. Apart from cardiac rhythms, the AR modeling technique has been applied to apnea detection and to estimation of respiration rate [5]. 
Respiration signals are assumed to be one-dimensional second-order AR signals, i.e., p=2 in Equation 5.1. Effective classification of different respiratory states and accurate detection of apnea are obtained from the functions of estimated AR model coefficients [5]. In addition, the AR modeling method is applied to heart rate (HR) variability analysis [6], whose purpose is to study the interaction between the autonomic nervous system and the heart sinus pacemakers. The long-term HR is said to be nonstationary because it has shown strong circadian variations. According to Thonet [6], a time-varying AR (TVAR) model is assumed for HR analysis: “the comparison of the TVAR coefficients significance rate has suggested an increasing linearity of HR signals from control subjects to patients suffering from a ventricular tachyarrhythmia.” The AR modeling technique has also been applied to code and decode the electrocardiogram (ECG) signals over the transmission between an ambulance and a hospital [7]. The AR model coefficients estimated in the higher-order statistical domain are transmitted instead of the real ECG signals. The transmission results were said to be
safe and efficient, even in the presence of high noise (17 dB) [7]. According to Palaniappan et al. [8], the AR modeling method is also applied to ECG signals, but this time the work concentrated on estimating the AR model orders using some conventional methods for two different mental tasks: a math task and geometric figure rotation. Spectral density functions are derived after the order of the AR model is obtained, and a neural-network technique is applied to assign the tasks to their respective categories [8].
5.1.2 TWO-DIMENSIONAL AUTOREGRESSIVE MODELING FOR BIOMEDICAL SIGNALS
The two-dimensional AR modeling technique has been applied to mammography [2, 9–11]. Stathaki [2] concentrated on the directionalities of the tissue shown in mammograms, because healthy tissue has specific properties with respect to the directionalities. “There exist decided directions in the observed X-ray images that show the underlying tissue structure as having distinct correlations in some specific direction of the image plane” [2]. Thus, when the two-dimensional AR modeling technique is applied to these two-dimensional signals, the variations in the parameters are crucial for directionality characterization. The AR model coefficients are obtained with the use of blocks of size between 2×2 and 40×40 and different “slices” (vertical, horizontal, or diagonal) (see Section 5.4 for details of slices). The preliminary comparative study on selecting cumulant slices in mammography by Stathaki [2] shows that the directionality is destroyed in the area of a tumor. The three types of slices used give similar performance, except in the case of [c1,c2]=[1,0]. The estimated AR model parameters tend to converge to a specific value as the size of the window increases [10]. In addition, the greater the calcification, the greater the deviation of the texture parameters of the lesions from the norm [2].
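As a concrete numerical illustration of the one-dimensional model of Section 5.1.1 (a sketch with arbitrarily chosen coefficients, not code or data from the studies cited above), the snippet below simulates a second-order AR signal (the order assumed for respiration signals in [5]) and recovers its coefficients by solving the Yule-Walker normal equations built from sample autocorrelations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate x[m] = -a[1]x[m-1] - a[2]x[m-2] + u[m]  (a second-order AR model, p = 2)
a_true = np.array([-1.2, 0.4])        # stable: characteristic roots inside the unit circle
M = 5000
u = rng.uniform(-1.0, 1.0, M)         # non-Gaussian, zero-mean driving input
x = np.zeros(M)
for m in range(M):
    x[m] = u[m]
    for i, ai in enumerate(a_true, start=1):
        if m - i >= 0:
            x[m] -= ai * x[m - i]

def r(k):
    """Biased sample autocorrelation r[k], an estimate of E{x[m] x[m+k]}."""
    return float(np.mean(x * x)) if k == 0 else float(np.mean(x[:M - k] * x[k:]))

# Yule-Walker equations: r[k] + a[1]r[k-1] + a[2]r[k-2] = 0 for k = 1, 2
R = np.array([[r(0), r(1)],
              [r(1), r(0)]])
b = -np.array([r(1), r(2)])
a_hat = np.linalg.solve(R, b)
print(a_hat)                          # close to a_true
```

The same normal-equation structure extends to the two-dimensional case discussed in Section 5.3, with autocorrelation lags indexed by pairs (k,l) instead of a single lag k.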
5.2 TWO-DIMENSIONAL AUTOREGRESSIVE MODEL
The two-dimensional autoregressive (AR) model is defined [12] as

x[m,n] = −Σ(i,j)≠(0,0) aij x[m−i,n−j] + u[m,n] (5.2)

where the sum is taken over the region of support, p1×p2 is the AR model order, aij is the AR model coefficient (with a00=1), and u[m,n] is the driving input, which is assumed to have the following properties [2, 13]:
1. u[m,n] is non-Gaussian.
2. Zero mean, i.e., E{u[m,n]}=0, where E{•} is the expectation operation.
3. Second-order white, i.e., the input autocorrelation function is E{u[m,n]u[m+k,n+l]}=σu²δ[k,l], where δ[k,l] is the two-dimensional Kronecker delta.
Texture characterization using autoregressive models
189
4. At least second-order stationary.
The first condition is imposed to enable the use of third-order statistics. A set of stable two-dimensional AR model coefficients can be obtained from two sets of stable one-dimensional AR model coefficients. Let a1 and a2 be row vectors, each representing a set of stable one-dimensional AR model coefficients. Then a = a1^T a2 is a set of stable two-dimensional AR model coefficients, where T denotes transposition. When a1 is equal to a2, the two-dimensional AR model coefficients, a, are symmetric [14].

5.3 YULE-WALKER SYSTEM OF EQUATIONS

The Yule-Walker system of equations is revisited for the two-dimensional AR model in this section. The truncated nonsymmetric half-plane (TNSHP) is taken to be the region of support of the AR model parameters [12].
Two examples of the TNSHP are shown in Figure 5.1. The dotted outline indicates the region of support when p1=1 and p2=3, and the solid outline is for p1=p2=2.
FIGURE 5.1 Examples of the truncated nonsymmetric half-plane region of support (TNSHP) for AR model parameters.
Medical image analysis method
190
The two-dimensional signal x[m,n] given in Equation 5.2 is multiplied by its shifted version, x[m−k,n−l]; under the assumption that all fields are wide-sense stationary, the expectation of this product gives us (5.3)
In Equation 5.3, the second-order moment, which is also regarded as the "autocorrelation," is defined as in Equation 5.4: m2x[k,l]=E{x[m,n]x[m+k,n+l]} (5.4) Because the region of support of the impulse response is the entire nonsymmetric half-plane, by applying the causal- and stable-filter assumptions we obtain (5.5)
Because h[k,l] is the impulse response of a causal filter, Equation 5.5 becomes
Because h[0,0] is assumed to be unity, the two-dimensional Yule-Walker equations [12] become (5.6)
For simplicity in our AR model coefficient estimation methods, the region of support is assumed to be a quarter plane (QP), which is a special case of the NSHP. Examples of QP models can be found in Figure 5.2. The shape filled with vertical lines indicates the region of support of QP when p1=2 and p2=3, and the shape filled with horizontal lines is the region of support of QP when p1=p2=1. The Yule-Walker system of equations for a QP model can be written [12] as
FIGURE 5.2 Examples of two quarter-plane regions of support for the AR parameters. (5.7)
Generalizing Equation 5.7 leads to the equations Mxx al = h (5.8) where Mxx is a matrix of size [(p1+1)(p2+1)]×[(p1+1)(p2+1)], and al and h are both vectors of size [(p1+1)(p2+1)]×1. More explicitly, Equation 5.8 can be written as (5.9)
where is a vector of size (p2+1)×1 h1=[1,0,…,0]T is a vector of size (p2+1)×1
0=[0,0,…,0]T is a vector of size (p2+1)×1
is a matrix of size (p2+1)×(p2+1). An example of the Yule-Walker system of equations for a 1×1 AR model is given below. (5.10)
These equations can be further simplified because the variance of the driving input is unknown, and the AR model coefficient a00 is assumed to be 1 in general. The Yule-Walker system of equations can be rewritten as (5.11)

Let the Yule-Walker system of equations for an AR model with model order p1×p2 be represented in the matrix form as

Ra = −r (5.12)

where
R is a [(p1+1)(p2+1)−1]×[(p1+1)(p2+1)−1] matrix of autocorrelation samples
a is a [(p1+1)(p2+1)−1]×1 vector of unknown AR model coefficients
r is a [(p1+1)(p2+1)−1]×1 vector of autocorrelation samples
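As an illustration of solving Equation 5.12, the sketch below estimates the simplest 1×1 quarter-plane model from a synthetic field. The helper names, the 200×200 field size, and the coefficient values (0.5, 0.5, 0.25) are assumptions chosen for the example, not values from the chapter:

```python
import numpy as np

def autocorr(x, k, l):
    """Biased sample autocorrelation m2x[k,l] = E{x[m,n] x[m+k,n+l]},
    valid for positive or negative lags."""
    M, N = x.shape
    m0, m1 = max(0, -k), min(M, M - k)
    n0, n1 = max(0, -l), min(N, N - l)
    return np.sum(x[m0:m1, n0:n1] * x[m0 + k:m1 + k, n0 + l:n1 + l]) / (M * N)

def yule_walker_2d_1x1(x):
    """Solve R a = -r (Eq. 5.12) for a 1x1 quarter-plane AR model.

    With a00 = 1 the unknowns are a01, a10, a11, and each lag (k,l) in
    {(0,1),(1,0),(1,1)} contributes one equation
    m2x[k,l] + sum_ij a_ij m2x[k-i,l-j] = 0.
    """
    lags = [(0, 1), (1, 0), (1, 1)]
    R = np.array([[autocorr(x, k - i, l - j) for (i, j) in lags]
                  for (k, l) in lags])
    r = np.array([autocorr(x, k, l) for (k, l) in lags])
    return np.linalg.solve(R, -r)   # [a01, a10, a11]

# Synthesize a stable 1x1 AR field (coefficients chosen for illustration)
rng = np.random.default_rng(0)
u = rng.standard_normal((200, 200))
x = np.zeros((200, 200))
for m in range(200):
    for n in range(200):
        x[m, n] = u[m, n]
        if n > 0:
            x[m, n] -= 0.5 * x[m, n - 1]
        if m > 0:
            x[m, n] -= 0.5 * x[m - 1, n]
        if m > 0 and n > 0:
            x[m, n] -= 0.25 * x[m - 1, n - 1]

a_hat = yule_walker_2d_1x1(x)   # close to [0.5, 0.5, 0.25]
```

Because the driving input here is noise-free, the estimates track the true coefficients closely; Section 5.4 addresses what happens when external Gaussian noise is added.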
5.4 EXTENDED YULE-WALKER SYSTEM OF EQUATIONS IN THE THIRD-ORDER STATISTICAL DOMAIN

The Yule-Walker system of equations is able to estimate the AR model coefficients when the power of the external noise is small compared with that of the signal. However, when the external noise becomes larger, the estimated values are influenced by the external noise statistics. These results correspond to the well-known fact that the autocorrelation function (ACF) samples of a signal are sensitive to additive Gaussian noise, because the ACF samples of Gaussian noise are nonzero [1, 15]. Estimation of the AR model coefficients using the Yule-Walker system of equations for a signal with large external Gaussian noise is therefore poor, and we are forced to work in the third-order statistical domain, where third-order cumulants are employed [2]. Consider the signal y[m,n] that is contaminated with external Gaussian noise v[m,n]: y[m,n]=x[m,n]+v[m,n]. The third-order cumulant of a zero-mean two-dimensional signal y[m,n], 1≤m≤M, 1≤n≤N, is estimated [1] by

C3y[k1,l1,k2,l2] = (1/MN) \sum_{m=1}^{M} \sum_{n=1}^{N} y[m,n] y[m+k1,n+l1] y[m+k2,n+l2]  (5.13)

The number of terms available is not necessarily the same as the size of the image because of the values k1, l1, k2, and l2; all the pixels outside the range are assumed to be zero. The difference in formulating the Yule-Walker system of equations between the second-order and third-order statistical domains is that in the latter version we multiply the output of the AR model by two shifted versions instead of just one [1]. The extended Yule-Walker system of equations in the third-order statistical domain can be written as shown in Equation 5.14 [11]. (5.14) where γu=E{u³[m,n]} is the skewness of the input driving noise, and a00=1.
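A direct sample version of the cumulant estimate in Equation 5.13 can be sketched as follows. The function below is a minimal illustration (its zero-padding of out-of-range pixels follows the text), and the Gaussian and exponential test fields are assumptions used only to show that Gaussian noise contributes asymptotically nothing to the third-order cumulants:

```python
import numpy as np

def third_order_cumulant(y, k1, l1, k2, l2):
    """Sample third-order cumulant C3y[k1,l1,k2,l2] of a 2-D signal
    (Eq. 5.13); pixels shifted outside the image are treated as zero."""
    y = y - y.mean()          # the estimate assumes a zero-mean signal
    M, N = y.shape

    def shift(a, k, l):
        # out[m, n] = a[m + k, n + l] where defined, zero elsewhere
        out = np.zeros_like(a)
        ms = slice(max(0, -k), min(M, M - k))
        ns = slice(max(0, -l), min(N, N - l))
        out[ms, ns] = a[max(0, k):M + min(0, k), max(0, l):N + min(0, l)]
        return out

    return np.mean(y * shift(y, k1, l1) * shift(y, k2, l2))

rng = np.random.default_rng(1)
g = rng.standard_normal((256, 256))          # Gaussian field
e = rng.exponential(1.0, (256, 256)) - 1.0   # zero-mean skewed field

c_gauss = third_order_cumulant(g, 0, 0, 0, 0)  # near 0: Gaussian cumulants vanish
c_skew = third_order_cumulant(e, 0, 0, 0, 0)   # near 2: third central moment of Exp(1)
```

The contrast between the two zero-lag values is exactly the property the extended Yule-Walker system exploits: the Gaussian contamination v[m,n] drops out of the cumulant-domain equations.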
From the derivation of the above relationship, it is evident that using Equation 5.14 makes it unnecessary to know the statistical properties of the external Gaussian noise, because they are eliminated from the equations, following the theory that the third-order cumulants of Gaussian signals are zero [16]. For a two-dimensional AR model with order p1×p2, we need at least a total of (p1+1)(p2+1) equations from Equation 5.14, where k1=0,…,p1, k2=k1, l1=0,…,p2, and l2=l1, in order to estimate the [(p1+1)(p2+1)−1] unknown AR parameters and the skewness of the driving noise, γu. Because we are only interested in estimating the AR model coefficients, we can rewrite Equation 5.14 as follows [2]
FIGURE 5.3 Different third-order cumulant slices for a one-dimensional signal.
(5.15) where k1+l1+k2+l2≠0 and k1,l1,k2,l2≥0. In this form, [(p1+1)(p2+1)−1] equations are required to determine the aij parameters (for details, see the literature [17–21]). When third-order cumulants are used, an implicit and additional degree of freedom is connected with the specific direction chosen for these to be used in the AR model [2]. Such a direction is referred to as a slice in the cumulant plane, as shown on the graph of third-order cumulants for one-dimensional signals in Figure 5.3 [2, 22]. Consider the third-order cumulant slice of a one-dimensional process, y, which can be estimated using C3y(k,l)=E{y(m) y(m+k) y(m+l)} [16]. The diagonal slice indicates that the value of k is the same as the value of l, whereas the vertical slices have a constant k value, and the horizontal slices have a constant l value. The idea can be extended to the third-order cumulants of two-dimensional signals. In Equation 5.13, if k1=l1 and k2=l2, the slice is diagonal; if k1 and l1 remain constant, the slice is vertical; if k2 and l2 are constant, the slice is horizontal. Let us assume that (k2,l2)=(k1+c1, l1+c2), where c1 and c2 are both constants. Then [2] (5.16)

By applying the symmetry properties of cumulants we obtain (5.17). Let k=c1+k1 and l=c2+l1. Hence, the equations above take the form [2, 10, 11] (5.18)
The extended Yule-Walker system of equations in the third-order statistical domain is formed from Equation 5.18, with k =0,…, p1 l=0,…, p2 [k,l]≠[0,0] Thus Equation 5.18 can be written in matrix-vector form as Cyyal=–cyy (5.19) More explicitly, Equation 5.19 can be written as [1, 16, 18–20] (5.20)
where is a vector of size (p2+1)×1 h1=[1,0,…,0]T is a vector of size (p2+1)×1 0=[0,0,…,0]T is a vector of size (p2+1)×1
is a matrix of size (p2+1)×(p2+1) The system in Equation 5.20 can be further simplified, as shown in Section 5.3. Let us take a 1×1 AR model as an example. We apply a diagonal slice, i.e., [c1, c2]=[k–i, l–j]; therefore, we obtain
Let us write the system of equations for the model order p1×p2 as

Ca = −c (5.21)

where
C is a [(p1+1)(p2+1)−1]×[(p1+1)(p2+1)−1] matrix of third-order cumulants
a is a [(p1+1)(p2+1)−1]×1 vector of unknown AR model coefficients
c is a [(p1+1)(p2+1)−1]×1 vector of third-order cumulants

In theory, everything seems to work properly. However, in practice, one of the main problems we face when working in the third-order statistical domain is the large variances that arise from the cumulant estimation [2].

5.5 CONSTRAINED-OPTIMIZATION FORMULATION WITH EQUALITY CONSTRAINTS

A method for estimating two-dimensional AR model coefficients is proposed in this section. The extended Yule-Walker system of equations in the third-order statistical domain is related to the conventional Yule-Walker system of equations through a constrained-optimization formulation with equality constraints [23]. The Yule-Walker system of equations is used in the objective function, and we consider most of the extended Yule-Walker system of equations in the third-order statistical domain as the set of constraints. In this work, only the last row of the extended Yule-Walker system of equations in the third-order statistical domain is eliminated; the last row was chosen after some statistical tests were carried out. Eliminating any other rows in this case did not lead to robust estimations. Mathematically, the formulation can be written [23] as

minimize \sum_{i=1}^{w} (R_i a + r_i)^2  subject to  C_l a = −c_l  (5.22)

where
w = number of rows in matrix R in Equation 5.12
Ri = ith row of the matrix R in Equation 5.12
ri = ith element of the vector r in Equation 5.12

and where Cl is defined as the matrix C in Equation 5.21 without the last row, cl is defined as the vector c in Equation 5.21 without the last row, and a is a [(p1+1)(p2+1)−1]×1 vector of unknown AR model coefficients. We use sequential quadratic programming [24] to solve Equation 5.22.
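Because the objective here is a linear least-squares form and the constraints are linear, the point an SQP iteration converges to can also be read directly off the KKT conditions. The sketch below uses that equivalence on a toy problem; R, r, Cl, and cl are random stand-ins constructed to be consistent with an assumed coefficient vector, not data from the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)
a_true = np.array([0.5, 0.5, 0.25])      # e.g. a01, a10, a11 of a 1x1 model

# Toy stand-ins: Yule-Walker system (R, r) and cumulant-domain system with
# its last row removed (Cl, cl), both consistent with a_true by construction.
R = rng.standard_normal((3, 3))
r = -R @ a_true
Cl = rng.standard_normal((2, 3))
cl = -Cl @ a_true

# minimize ||R a + r||^2  subject to  Cl a = -cl
# KKT system:  [2 R'R  Cl'] [a  ]   [-2 R'r]
#              [Cl     0  ] [lam] = [-cl   ]
K = np.block([[2 * R.T @ R, Cl.T],
              [Cl, np.zeros((2, 2))]])
rhs = np.concatenate([-2 * R.T @ r, -cl])
a_hat = np.linalg.solve(K, rhs)[:3]      # recovers a_true on this toy system
```

On real data R, r, Cl, and cl would come from the autocorrelation and cumulant estimates of the image, and the two systems would not be exactly consistent; the KKT (or SQP) solution then trades the Yule-Walker residual off against exact satisfaction of the cumulant-domain constraints.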
5.5.1 SIMULATION RESULTS

Two types of synthetic images of size 256×256 are generated for simulation purposes. The first one is a 2×2 symmetric AR model, which can be expressed as

x[m,n] = −0.16x[m−2,n−2] − 0.2x[m−2,n−1] − 0.4x[m−2,n] − 0.2x[m−1,n−2] − 0.25x[m−1,n−1] − 0.5x[m−1,n] − 0.4x[m,n−2] − 0.5x[m,n−1] + w[m,n]

The other type of synthetic image is created using a set of 2×2 nonsymmetric AR model coefficients and is expressed as

x[m,n] = −0.12x[m−2,n−2] − 0.15x[m−2,n−1] − 0.3x[m−2,n] − 0.16x[m−1,n−2] − 0.2x[m−1,n−1] − 0.4x[m−1,n] − 0.4x[m,n−2] − 0.5x[m,n−1] + w[m,n]

The input driving noise w[m,n] to both systems is zero mean and exponentially distributed. The final image, y[m,n] = x[m,n] + v[m,n], is contaminated with external Gaussian noise v[m,n] of zero mean and unity variance. The signal-to-noise ratio (SNR) of the system is calculated using

SNR = 10 log10(σx²/σv²)  (5.23)

where σx² is the variance of the signal and σv² is the variance of the noise. The estimation results are evaluated using a relative error measurement defined in the following equation [24]:

e = \sqrt{ \sum_{i}\sum_{j} (â_{ij} − a_{ij})^2 / \sum_{i}\sum_{j} a_{ij}^2 }  (5.24)
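A minimal sketch of this setup (a synthetic AR texture driven by zero-mean exponential noise, Gaussian contamination scaled to a target SNR, and the relative-error measure) might look as follows. The 128×128 size and the function names are illustrative assumptions; the book uses 256×256 images and 100 realizations:

```python
import numpy as np

def generate_ar_image(coeffs, size=128, rng=None):
    """Drive a 2-D AR model (dict {(i, j): a_ij}, a00 = 1 implicit) with
    zero-mean exponential input, as in the Section 5.5.1 setup."""
    rng = rng if rng is not None else np.random.default_rng(0)
    w = rng.exponential(1.0, (size, size)) - 1.0   # zero-mean, non-Gaussian
    x = np.zeros((size, size))
    for m in range(size):
        for n in range(size):
            s = w[m, n]
            for (i, j), a in coeffs.items():
                if m - i >= 0 and n - j >= 0:
                    s -= a * x[m - i, n - j]
            x[m, n] = s
    return x

def relative_error(a_hat, a):
    """Eq. 5.24: ||a_hat - a|| / ||a|| over all model coefficients."""
    a_hat, a = np.asarray(a_hat, float), np.asarray(a, float)
    return np.sqrt(np.sum((a_hat - a) ** 2) / np.sum(a ** 2))

# The book's 2x2 symmetric model
sym = {(0, 1): 0.5, (0, 2): 0.4, (1, 0): 0.5, (1, 1): 0.25,
       (1, 2): 0.2, (2, 0): 0.4, (2, 1): 0.2, (2, 2): 0.16}
x = generate_ar_image(sym)

# Contaminate with Gaussian noise scaled to a 5-dB SNR (Eq. 5.23)
rng = np.random.default_rng(1)
v = rng.standard_normal(x.shape)
v *= np.sqrt(x.var() / (v.var() * 10 ** (5.0 / 10)))
y = x + v
```

Scaling the noise field (rather than fixing it at unit variance) pins the realized SNR at exactly the requested value, which makes comparisons across realizations cleaner.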
TABLE 5.1 Results from Constrained-Optimization Formulation with Equality Constraints for Estimation of Two-Dimensional Symmetric AR Model Coefficients

                              SNR=5 dB                 SNR=30 dB
Parameter   Real Value   Estimated   Variance     Estimated   Variance
                         Value       (10^-3)      Value       (10^-3)
a01         0.5          0.4987      0.1913       0.4982      0.05743
a02         0.4          0.4033      0.6382       0.3984      0.08289
a10         0.5          0.5002      0.2259       0.4972      0.04793
a11         0.25         0.2505      0.6006       0.2486      0.07768
a12         0.2          0.2056      1.6108       0.1973      0.08340
a20         0.4          0.4019      0.6581       0.3992      0.07907
a21         0.2          0.2052      1.5428       0.1976      0.1058
a22         0.16         0.1670      2.0575       0.1633      0.2712
Relative error           0.08903                  0.02788
TABLE 5.2 Results from Constrained-Optimization Formulation with Equality Constraints for Estimation of Two-Dimensional Nonsymmetric AR Model Coefficients

                              SNR=5 dB                 SNR=30 dB
Parameter   Real Value   Estimated   Variance     Estimated   Variance
                         Value       (10^-3)      Value       (10^-3)
a01         0.5          0.4981      0.1441       0.4986      0.03209
a02         0.4          0.3985      0.4544       0.3988      0.07261
a10         0.4          0.4001      0.1849       0.3967      0.05428
a11         0.2          0.2012      0.2489       0.1991      0.06819
a12         0.16         0.1617      1.0757       0.1567      0.1029
a20         0.3          0.3039      0.4474       0.2984      0.06941
a21         0.15         0.1546      0.8747       0.1458      0.09315
a22         0.12         0.1289      1.1657       0.1279      0.2361
Relative error           0.08362                  0.03629
where âij is the estimated AR model coefficient, aij is the original AR model coefficient, and p1×p2 is the AR model order. The simulation results obtained from 100 realizations can be found in Table 5.1 for the symmetric model and in Table 5.2 for the nonsymmetric model. In Table 5.1, the simulation results show that the proposed method is able to estimate symmetric AR model coefficients in both low- and high-SNR systems. The variances over the 100 realizations are small, particularly in the high-SNR case. Similar performance is obtained when the method is applied to the nonsymmetric AR model.

5.6 CONSTRAINED OPTIMIZATION WITH INEQUALITY CONSTRAINTS

Based on the constrained optimization with equality constraints method, two methods that use both the Yule-Walker system of equations and the extended Yule-Walker system
of equations in the third-order statistical domain are proposed through constrained-optimization formulations with inequality constraints. Mathematically, the formulation minimizes \sum_{i=1}^{w} (R_i a + r_i)^2
subject to −ε ≤ Ca + c ≤ ε (5.25)

where
w = number of rows in matrix R in Equation 5.12
Ri = ith row of the matrix R in Equation 5.12
ri = ith element of the vector r in Equation 5.12
a = a [(p1+1)(p2+1)−1]×1 vector of unknown AR model coefficients

and where C and c are as derived in Equation 5.21 and ε is defined as shown below. Inequality constraints are introduced with an additional vector, ε. Two methods for estimating ε are proposed here, and both are related to the average difference between the estimated AR model coefficients of each block and the average AR model coefficients of all the blocks. We use sequential quadratic programming [24] to solve Equation 5.25.

5.6.1 CONSTRAINED-OPTIMIZATION FORMULATION WITH INEQUALITY CONSTRAINTS 1

Based on Equation 5.25, the constrained-optimization formulation with inequality constraints 1 can be implemented using the following steps [25]:
1. Divide the image into a number of blocks with a fixed size, z1×z2, so that B1×B2 blocks can be obtained.
2. Estimate the AR model coefficients of each block using the extended Yule-Walker system of equations in the third-order statistical domain in Equation 5.21.
3. From all of the AR model coefficient sets obtained, calculate the average AR model coefficients, aA, [i,j]≠[0,0].
4. The ε value is calculated using the following equation.
(5.26)

where B1×B2 is the number of blocks available, (b1,b2) is the block index, C(b1,b2) is the matrix C in Equation 5.21 for the block (b1,b2), c(b1,b2) is the vector c in Equation 5.21 for the block (b1,b2), and sum indicates the summation of all the
items in a vector. The vector ε is defined as ε=[ε,…,ε]T, which is a [(p1+1)(p2+1)−1]×1 vector.
5. Apply Equation 5.25 to obtain the AR model coefficient estimate.

5.6.2 CONSTRAINED-OPTIMIZATION FORMULATION WITH INEQUALITY CONSTRAINTS 2

Constrained optimization with inequality constraints 2 is almost the same as the first method, except that a separate ε value is generated for each coefficient [26]. In Step 4,

(5.27)

where b1=1,…,B1, b2=1,…,B2, and B1×B2 is the number of blocks available; the result is a [(p1+1)(p2+1)−1]×1 vector. Then

(5.28)

where
(i×p1+j) is the (i×p1+j)-th value of the vector
The vector ε formed from these values is a [(p1+1)(p2+1)−1]×1 vector.
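Equation 5.25 can be handed to an off-the-shelf sequential-quadratic-programming solver. The sketch below assumes SciPy's SLSQP implementation is available; the matrices R, r, C, c and the tolerance vector ε are toy stand-ins built around an assumed coefficient vector, not data from the chapter:

```python
import numpy as np
from scipy.optimize import minimize  # assumes SciPy is installed

rng = np.random.default_rng(0)
a_true = np.array([0.5, 0.5, 0.25])      # assumed 1x1-model coefficients

R = rng.standard_normal((3, 3))          # toy Yule-Walker system
r = -R @ a_true
C = rng.standard_normal((2, 3))          # toy cumulant-domain system
c = -C @ a_true
eps = 0.01 * np.ones(2)                  # illustrative tolerance vector

objective = lambda a: np.sum((R @ a + r) ** 2)
constraints = [
    {"type": "ineq", "fun": lambda a: eps - (C @ a + c)},   # C a + c <= eps
    {"type": "ineq", "fun": lambda a: (C @ a + c) + eps},   # C a + c >= -eps
]
res = minimize(objective, np.zeros(3), method="SLSQP",
               constraints=constraints)
a_hat = res.x                            # constrained coefficient estimate
```

Relaxing the equality constraints of Section 5.5 into the band [−ε, ε] is what lets the solver tolerate the large variances of the cumulant estimates instead of fitting them exactly.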
5.6.3 SIMULATION RESULTS

As in Section 5.5.1, the constrained-optimization formulations with inequality constraints are applied to the output—y[m,n], 1≤m≤256, 1≤n≤256—of both the two-dimensional symmetric and nonsymmetric AR models shown below, respectively:

x[m,n] = −0.16x[m−2,n−2] − 0.2x[m−2,n−1] − 0.4x[m−2,n] − 0.2x[m−1,n−2] − 0.25x[m−1,n−1] − 0.5x[m−1,n] − 0.4x[m,n−2] − 0.5x[m,n−1] + w[m,n]

and
TABLE 5.3 Results from Constrained-Optimization Formulation with Inequality Constraints 1 for Estimation of Two-Dimensional Symmetric AR Model Coefficients

                              SNR=5 dB                 SNR=30 dB
Parameter   Real Value   Estimated   Variance     Estimated   Variance
                         Value       (10^-4)      Value       (10^-4)
a01         0.5          0.5010      0.2163       0.4996      0.05580
a02         0.4          0.3953      0.6608       0.3988      0.06677
a10         0.5          0.4970      0.2482       0.4975      0.05795
a11         0.25         0.2451      0.5459       0.2487      0.05670
a12         0.2          0.2104      1.3664       0.2001      0.08460
a20         0.4          0.3966      0.6276       0.3990      0.9472
a21         0.2          0.1951      1.2547       0.2003      0.1038
a22         0.16         0.1852      3.7670       0.1630      0.1767
Relative error           0.03136                  0.004137
x[m,n] = −0.12x[m−2,n−2] − 0.15x[m−2,n−1] − 0.3x[m−2,n] − 0.16x[m−1,n−2] − 0.2x[m−1,n−1] − 0.4x[m−1,n] − 0.4x[m,n−2] − 0.5x[m,n−1] + w[m,n]

The output is y[m,n] = x[m,n] + v[m,n], where v[m,n] is additive Gaussian noise with zero mean and unity variance. The results obtained using the two different types of ε values are shown in the following tables. For the symmetric model, the results obtained from 100 realizations for the constrained-optimization formulation with inequality constraints 1 can be found in Table 5.3, and the results from the same formulation with inequality constraints 2 can be found in Table 5.4 and Table 5.5 for SNR equal to 5 and 30 dB, respectively. For the nonsymmetric model, the results can be found in Table 5.6, Table 5.7, and Table 5.8 in the same order as for the symmetric model. The ε value of the constrained-optimization formulation with inequality constraints 1 is 9.0759×10^-4 for SNR equal to 5 dB and 6.8434×10^-5 for SNR equal to 30 dB for the symmetric model; for the nonsymmetric model, the equivalent values are 8.2731×10^-4 and 5.9125×10^-5. The average ε values for each coefficient are also shown in the tables for constrained optimization with inequality constraints 2 (Table 5.4 and Table 5.5 for the symmetric model, and Table 5.7 and Table 5.8 for the nonsymmetric model). From Table 5.3 and Table 5.6, the AR model coefficients—estimated for the symmetric and nonsymmetric models, respectively, using the constrained-optimization formulation with inequality constraints 1—show high accuracy, as evidenced
TABLE 5.4 Results from Constrained-Optimization Formulation with Inequality Constraints 2 for Estimation of Two-Dimensional Symmetric AR Model Coefficients, SNR=5 dB

Parameter   Real Value   Estimated   Variance    Average ε
                         Value       (10^-3)     (10^-3)
a01         0.5          0.5044      0.2347      0.7625
a02         0.4          0.4017      0.7948      0.9159
a10         0.5          0.4087      0.1773      0.7403
a11         0.25         0.2493      0.4205      0.8332
a12         0.2          0.2183      1.5445      0.7781
a20         0.4          0.3981      0.6508      0.8602
a21         0.2          0.2011      1.2485      0.9326
a22         0.16         0.1924      4.4217      1.0811
Relative error           0.03581
TABLE 5.5 Results from Constrained-Optimization Formulation with Inequality Constraints 2 for Estimation of Two-Dimensional Symmetric AR Model Coefficients, SNR=30 dB

Parameter   Real Value   Estimated   Variance    Average ε
                         Value       (10^-3)     (10^-3)
a01         0.5          0.4997      0.04016     0.1342
a02         0.4          0.3996      0.08402     0.1501
a10         0.5          0.4970      0.04693     0.1334
a11         0.25         0.2474      0.05505     0.1458
a12         0.2          0.1978      0.1291      0.1388
a20         0.4          0.3974      0.09040     0.1535
a21         0.2          0.1974      0.07485     0.1471
a22         0.16         0.1605      0.1453      0.1676
Relative error           0.005722
by the small relative error in both low- and high-SNR systems. In Table 5.4 and Table 5.7, the estimated results for the constrained-optimization formulation (with inequality
constraints 2 and a 5-dB SNR for both the symmetric and nonsymmetric AR models) are very close to the original AR model coefficient values except for
TABLE 5.6 Results from Constrained-Optimization Formulation with Inequality Constraints 1 for Estimation of Two-Dimensional Nonsymmetric AR Model Coefficients

                              SNR=5 dB                 SNR=30 dB
Parameter   Real Value   Estimated   Variance     Estimated   Variance
                         Value       (10^-3)      Value       (10^-3)
a01         0.5          0.5004      0.1899       0.4981      0.04704
a02         0.4          0.4002      0.4406       0.3994      0.08673
a10         0.4          0.3997      0.2003       0.3978      0.04047
a11         0.2          0.2005      0.3900       0.1982      0.05897
a12         0.16         0.1697      0.9674       0.1595      0.08203
a20         0.3          0.3006      0.3426       0.2998      0.05015
a21         0.15         0.1514      0.7107       0.1493      0.07926
a22         0.12         0.1350      1.8185       0.1221      0.1085
Relative error           0.02242                  0.005107
TABLE 5.7 Results from Constrained-Optimization Formulation with Inequality Constraints 2 for Estimation of Two-Dimensional Nonsymmetric AR Model Coefficients, SNR=5 dB

Parameter   Real Value   Estimated   Variance    Average ε
                         Value       (10^-3)     (10^-3)
a01         0.5          0.4986      0.1486      0.4249
a02         0.4          0.3965      0.4471      0.5933
a10         0.4          0.3975      0.2005      0.4561
a11         0.2          0.1961      0.4790      0.4723
a12         0.16         0.1672      1.2616      0.5535
a20         0.3          0.2976      0.3899      0.5625
a21         0.15         0.1459      0.7963      0.5121
a22         0.12         0.1314      2.3533      0.6261
Relative error           0.02367
the coefficient a22 (whose variance over the 100 realizations is also greater than that of the other coefficients). In the high-SNR system, as shown in Table 5.5 and Table 5.8 for the symmetric and nonsymmetric AR models, respectively, the relative errors obtained are even smaller than in the low-SNR system, and the average ε value for each coefficient is also smaller than in the low-SNR system.
TABLE 5.8 Results from Constrained-Optimization Formulation with Inequality Constraints 2 for Estimation of Two-Dimensional Nonsymmetric AR Model Coefficients, SNR=30 dB

Parameter   Real Value   Estimated   Variance    Average ε
                         Value       (10^-4)     (10^-3)
a01         0.5          0.4985      0.3714      0.1121
a02         0.4          0.3979      0.6378      0.1443
a10         0.4          0.3966      0.4305      0.1093
a11         0.2          0.1971      0.5739      0.1413
a12         0.16         0.1578      0.9436      0.1301
a20         0.3          0.2970      0.5353      0.1211
a21         0.15         0.1480      0.6240      0.1377
a22         0.12         0.1212      0.8914      0.1465
Relative error           0.008605
5.7 AR MODELING WITH THE APPLICATION OF CLUSTERING TECHNIQUES

In Sections 5.3 to 5.6, the AR modeling methods are applied to the entire image. In this section, we divide images into a number of blocks under the assumption that the texture remains the same throughout the entire image. After applying an AR modeling method to each of these blocks, a number of sets of AR model coefficients are obtained, to which we apply a clustering technique and a weighting scheme to determine the final estimate of the AR model coefficients. Two clustering schemes are applied: the minimum hierarchical clustering scheme and the k-means algorithm.
5.7.1 HIERARCHICAL CLUSTERING SCHEME FOR AR MODELING

A hierarchical clustering scheme was proposed by Johnson in 1967 [27]. The intention is to put similar objects from a number of clusters in the same group. The hierarchical clustering scheme uses the agglomerative approach, i.e., it begins with each set of AR model coefficients in a distinct (singleton) cluster and successively merges clusters until the desired number of clusters is obtained or until the stopping criterion is met [27]. The modified minimum hierarchical clustering scheme for two-dimensional AR modeling is explained in the following steps [27, 28]. Let the size of the image be M×N.
1. We divide the image of interest into a number of blocks of size z1×z2.
2. For each block, we estimate a set of AR model coefficients, 1≤m≤S, using the constrained-optimization formulation with inequality constraints
1 in Section 5.6.1. Thus, we obtain S sets of AR model coefficients, where S=(M/z1)×(N/z2); M is divisible by z1, and N is divisible by z2.
3. The minimum hierarchical clustering scheme starts with S clusters, i.e., one set of AR model coefficients in each cluster.
4. We calculate the Euclidean distance between any two clusters using Equation 5.29.
FIGURE 5.4 Distance matrix for the hierarchical clustering scheme.
d(Bm, Bn) = \sqrt{ \sum_{i}\sum_{j} (a_{ij}^{(Bm)} − a_{ij}^{(Bn)})^2 }  (5.29)

where Bm indicates block m, m=1,…,S, and Bn indicates block n, n=1,…,S.
5. We form a distance matrix using the distances obtained in Step 4. An example of a distance matrix can be found in Figure 5.4.
6. We search for the shortest distance in the distance matrix, i.e., the blocks with the greatest similarity, and merge the corresponding blocks into one cluster to form a new distance
matrix. The distances between the new cluster and the other clusters need to be recalculated. Because a minimum hierarchical clustering scheme is used, the minimum distance between any member of the new cluster and any member of one of the other clusters is taken as the distance between the new cluster and that cluster.
7. Step 6 is repeated until the desired number of clusters is obtained.

5.7.2 k-MEANS ALGORITHM FOR AR MODELING

In addition to the minimum hierarchical clustering scheme, the k-means algorithm is also applied to selecting the AR model coefficients obtained from images [25]. Unlike minimum hierarchical clustering, the k-means algorithm starts with the number of desired clusters, i.e., k. The details of the k-means clustering scheme for AR modeling are described in the following steps [20, 29, 31].
1. Decide on how many clusters we would like to divide the sets of AR model coefficients into. Let the number of clusters be k.
2. Randomly choose k sets of AR model coefficients and assign one set to each cluster.
3. For each of the remaining sets, calculate the distance between the set and the mean of each cluster using Equation 5.29. Assign the set of AR model coefficients to the cluster to which it has the shortest distance, i.e., its closest cluster. Update the mean of the corresponding cluster.
4. Repeat Step 3 until no more changes in clusters take place.

5.7.3 SELECTION SCHEME

We propose a selection scheme for the sets of AR model coefficients obtained from the clustering schemes [26].
1. If one cluster contains 75% or more of the total number of sets, then the mean of the AR model coefficient values in that cluster is taken to be our final estimate. In other words, any cluster containing less than 25% of the total number of sets is ignored.
2. Otherwise, the new estimate is calculated using Equation 5.30. Any cluster with less than 25% of the total number of sets is ignored, and the rest of the clusters (1,…,T) are valid clusters.
(5.30)
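The minimum (single-linkage) merging loop and the selection scheme above can be sketched as follows. The six hand-made coefficient sets (five similar blocks plus one outlier) are illustrative only, and since the Equation 5.30 weighting is not reproduced here, a size-weighted mean over the valid clusters is used as a stand-in assumption:

```python
import numpy as np

def min_hierarchical_clusters(sets, n_clusters):
    """Agglomerative single-linkage ("minimum") clustering of coefficient
    sets under the Euclidean distance of Eq. 5.29."""
    clusters = [[i] for i in range(len(sets))]
    dist = lambda i, j: np.linalg.norm(sets[i] - sets[j])
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = min(dist(i, j) for i in clusters[p] for j in clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] += clusters.pop(q)   # merge the two closest clusters
    return clusters

def select_coefficients(clusters, sets):
    """Selection scheme of Section 5.7.3: a cluster holding >= 75% of the
    sets wins outright; clusters holding < 25% are ignored; otherwise the
    valid clusters are combined (size-weighted mean used here as a
    stand-in for Eq. 5.30)."""
    S = len(sets)
    for idx in clusters:
        if len(idx) >= 0.75 * S:
            return sets[idx].mean(axis=0)
    valid = [idx for idx in clusters if len(idx) >= 0.25 * S]
    w = np.array([len(idx) for idx in valid], dtype=float)
    means = np.array([sets[idx].mean(axis=0) for idx in valid])
    return (w[:, None] * means).sum(axis=0) / w.sum()

# Illustrative per-block coefficient sets: five consistent blocks, one outlier
sets = np.array([[0.50, 0.50, 0.25], [0.51, 0.49, 0.26], [0.49, 0.50, 0.24],
                 [0.50, 0.52, 0.25], [0.48, 0.50, 0.25], [0.20, 0.10, 0.05]])
clusters = min_hierarchical_clusters(sets, 2)   # the outlier ends up alone
estimate = select_coefficients(clusters, sets)  # mean of the dominant cluster
```

Isolating the outlier block and then averaging only over the dominant cluster is exactly the mechanism by which the clustering step improves on the plain average over all blocks reported in the "(all)" columns of Tables 5.9 to 5.12.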
5.7.4 SIMULATION RESULTS
We provide two synthetic examples to verify the above approaches. Two 1024×1024 synthetic images are generated using the following stable 2×2 AR models, symmetric and nonsymmetric, respectively:

x[m,n] = −0.16x[m−2,n−2] − 0.2x[m−2,n−1] − 0.4x[m−2,n] − 0.2x[m−1,n−2] − 0.25x[m−1,n−1] − 0.5x[m−1,n] − 0.4x[m,n−2] − 0.5x[m,n−1] + w[m,n]

and

x[m,n] = −0.12x[m−2,n−2] − 0.15x[m−2,n−1] − 0.3x[m−2,n] − 0.16x[m−1,n−2] − 0.2x[m−1,n−1] − 0.4x[m−1,n] − 0.4x[m,n−2] − 0.5x[m,n−1] + w[m,n]
TABLE 5.9 AR-Modeling Results of the Symmetric Model with Application of Clustering Schemes (two clusters, SNR=5 dB)

AR Model      Real     Estimated     Estimated     Estimated
Coefficient   Value    Value (all)   Value (MHC)   Value (k-means)
a01           0.5      0.4918        0.4929        0.4927
a02           0.4      0.3842        0.3863        0.3894
a10           0.5      0.4921        0.4930        0.4946
a11           0.25     0.2528        0.2537        0.2524
a12           0.2      0.1963        0.1978        0.1993
a20           0.4      0.3844        0.3863        0.3907
a21           0.2      0.1963        0.1979        0.1985
a22           0.16     0.1528        0.1547        0.1576
Relative error         0.06925       0.02774       0.004986
y[m,n] = x[m,n] + v[m,n], where v[m,n] is additive Gaussian noise with zero mean and unity variance. The image is divided into 16 blocks of size 256×256. For each block, we estimate a set of AR model coefficients using constrained optimization with inequality constraints 1 from Section 5.6. The minimum hierarchical clustering (MHC) scheme and the k-means algorithm proposed in Sections 5.7.1 and 5.7.2, respectively, are applied to the sets of AR model coefficients obtained. The selection scheme is then applied to the results from the clustering scheme. The SNR of the system is set to 5 dB. The results of dividing the sets of AR model coefficients into two clusters can be found in Table 5.9, where the third column shows the average results from all clusters, the fourth column shows the results after applying the MHC scheme, and the last column shows the results after applying the k-means algorithm. The results for classifying the sets of AR model coefficients into three clusters can be found in Table 5.10. In Table 5.11, the results for the nonsymmetric
model with two clusters can be found, and in Table 5.12 the results for the nonsymmetric model with three clusters are shown. From these results, we conclude that applying the clustering techniques to these sets of AR model coefficients improves the overall AR model coefficient estimation. The greatest improvement in performance is from the k-means algorithm with the number of clusters equal to 2.

5.8 APPLYING AR MODELING TO MAMMOGRAPHY

In this section, we apply the constrained-optimization technique with equality constraints to mammograms for the purpose of texture analysis. Masses and calcifications are the two major abnormalities that radiologists look for in mammograms [32]. We concentrate on the texture characterization of a mammogram with a mass under the assumption that the texture of the problematic area is different from the textures of its neighboring blocks, i.e., the AR model coefficients representing them are different. The mammograms used here are extracted from the MIAS database [33]. The images from the database come with detailed information, including the characteristics of the background tissue (fatty, fatty-glandular, or dense-glandular), the class of abnormality (calcification, well-defined/circumscribed masses, spiculated masses, other ill-defined masses, architectural distortion, asymmetry, or normal), the severity of the abnormality (benign or malignant), the image coordinates of the center of the abnormality, and the approximate radius in pixels of a circle enclosing the abnormality.
TABLE 5.10 AR-Modeling Results of the Symmetric Model with Application of Clustering Schemes (three clusters, SNR=5 dB)

AR Model      Real     Estimated     Estimated     Estimated
Coefficient   Value    Value (all)   Value (MHC)   Value (k-means)
a01           0.5      0.4918        0.4946        0.4119
a02           0.4      0.3842        0.3900        0.3218
a10           0.5      0.4921        0.4949        0.4122
a11           0.25     0.2528        0.2554        0.2112
a12           0.2      0.1963        0.2009        0.1641
a20           0.4      0.3844        0.3902        0.3221
a21           0.2      0.1963        0.2009        0.1640
a22           0.16     0.1528        0.1587        0.1270
Relative error         0.06925       0.02980       0.03896
TABLE 5.11 AR-Modeling Results of the Nonsymmetric Model with Application of Clustering Schemes (two clusters, SNR=5 dB)

AR Model      Real     Estimated     Estimated     Estimated
Coefficient   Value    Value (all)   Value (MHC)   Value (k-means)
a01           0.5      0.4936        0.4941        0.4949
a02           0.4      0.3879        0.3892        0.3925
a10           0.4      0.3927        0.3933        0.3948
a11           0.2      0.2036        0.2041        0.2031
a12           0.16     0.1592        0.1601        0.1612
a20           0.3      0.2884        0.2896        0.2934
a21           0.15     0.1486        0.1495        0.1507
a22           0.12     0.1163        0.1173        0.1199
Relative error         0.06817       0.02484       0.02038
TABLE 5.12 AR-Modeling Results of the Nonsymmetric Model with Application of Clustering Schemes (three clusters, SNR=5 dB)

AR Model      Real     Estimated     Estimated     Estimated
Coefficient   Value    Value (all)   Value (MHC)   Value (k-means)
a01           0.5      0.4936        0.4959        0.4145
a02           0.4      0.3879        0.3926        0.3259
a10           0.4      0.3927        0.3953        0.3302
a11           0.2      0.2036        0.2061        0.1710
a12           0.16     0.1592        0.1630        0.1341
a20           0.3      0.2884        0.2926        0.2425
a21           0.15     0.1486        0.1521        0.1251
a22           0.12     0.1163        0.1205        0.09807
Relative error         0.06817       0.02801       0.04677
FIGURE 5.5 Example of the mass and its 3×3 neighborhood in a mammogram.
For simplicity, we take the square block with side length equal to the given radius as the block of interest. We form a 3×3 neighborhood around the block of interest and then estimate the AR model coefficients of each block, as shown in Figure 5.5. The order of the AR model is assumed to be 1×1.
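Cutting the block of interest and its 3×3 neighborhood out of the image can be sketched as below; the function name, the 300×300 test image, and the 59-pixel block size are illustrative assumptions (59 pixels matches the mdb023 case that follows):

```python
import numpy as np

def neighborhood_blocks(img, center, side):
    """Cut the block of interest and its 3x3 neighborhood of side x side
    blocks out of `img`; `center` is the (row, col) of the central block.
    Note: MIAS coordinates have their origin at the bottom-left corner,
    so a row flip may be needed before using them as array indices."""
    r, c = center
    h = side // 2
    blocks = {}
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            r0, c0 = r + di * side - h, c + dj * side - h
            blocks[(di, dj)] = img[r0:r0 + side, c0:c0 + side]
    return blocks

# Illustrative image and block size (59 pixels, as in mdb023)
img = np.arange(300 * 300, dtype=float).reshape(300, 300)
blocks = neighborhood_blocks(img, (150, 150), 59)
```

Each of the nine blocks would then be passed to the 1×1 AR coefficient estimator, giving one (a01, a10, a11) triple per block.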
FIGURE 5.6 The mammogram with the mass marked: mdb023.

5.8.1 MAMMOGRAMS WITH A MALIGNANT MASS

We take three examples of mammograms with a malignant mass. The mammograms are named by their original index numbers in the database. The origin of the coordinate system is the bottom-left corner.

5.8.1.1 Case 1: mdb023

A well-defined mass with fatty-glandular background tissue is found in the square centered at (538, 681) with a side of 59 pixels [33]; the mammogram is shown in Figure 5.6. The AR model coefficients estimated from the block of size 59×59 centered at pixel (538, 681) and the eight blocks in its 3×3 neighborhood are shown in Table 5.13. From the results, we find that the AR model coefficients of the tumor block are almost symmetric. The degree of symmetry is calculated using Equation 5.31 [23], with smaller values indicating greater symmetry of the set of AR model coefficients:

a11 − a01×a10  (5.31)

5.8.1.2 Case 2: mdb028
In mammogram mdb028, shown in Figure 5.7, the malignant mass is found within the square centered at (338, 314) with length 113 pixels. The background tissue is fatty [33]. The AR model coefficients estimated from the block of size 113×113 centered at pixel (338, 314) and the eight blocks in its neighborhood are shown in Table 5.14. The block BP has the smallest value for the degree of symmetry, i.e., the AR model coefficients are more symmetric in this block than in the others.
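Selecting the most symmetric block amounts to evaluating Equation 5.31 per block and taking the minimum absolute value; a minimal sketch (the two sets of coefficient values are taken from Table 5.13, and the helper name is ours):

```python
def degree_of_symmetry(a01, a10, a11):
    # Eq. 5.31: values closer to zero mean a more symmetric coefficient set
    return a11 - a01 * a10

# (a01, a10, a11) per block, values from Table 5.13 (mdb023)
blocks = {
    "B1": (-0.9104, -0.9643, 0.8759),
    "BP": (-1.0403, -1.0291, 1.0696),
}

# the tumor block BP has the degree of symmetry closest to zero
most_symmetric = min(blocks, key=lambda b: abs(degree_of_symmetry(*blocks[b])))
```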
TABLE 5.13 AR Model Coefficients for Blocks of Pixels in Mammogram mdb023

AR Model Coefficient    B1        B2        B3        B4        BP        B5        B6        B7        B8
a01                     −0.9104   7.0822    2.4479    −1.3301   −1.0403   −1.1647   −0.1890   −0.8935   0.6717
a10                     −0.9643   −0.3474   −1.3119   −0.9151   −1.0291   −0.7296   −0.8346   −1.1864   −1.0080
a11                     0.8759    −7.7154   −2.3145   1.2453    1.0696    0.8944    0.1322    1.0800    −0.6707
Degree of symmetry      0.0020    5.2553    −1.0768   −0.0282   0.0010    −0.0446   0.0255    −0.0199   −0.016

FIGURE 5.7 The mammogram with the mass marked: mdb028.
5.8.1.3 Case 3: mdb058

The mammogram mdb058 is shown in Figure 5.8. A malignant mass is found in the square centered at (318, 359) with length equal to 55 pixels [33]. The AR model coefficients estimated from the block of size 55×55 centered at pixel (318, 359) and the eight blocks in its neighborhood are shown in Table 5.15. As in the previous cases, the AR model coefficients in block BP are more symmetric than those of the other blocks.

5.8.2 MAMMOGRAMS WITH A BENIGN MASS

Apart from mammograms with a malignant mass, we also apply the same method to estimate the AR model coefficients of mammograms with a benign mass. The three examples, mdb069, mdb091, and mdb142, are taken from the same database [33].

5.8.2.1 Case 1: mdb069

The mammogram mdb069 is shown in Figure 5.9 with its benign mass marked. The background tissue is fatty, and the mass is situated in the square centered at (462, 402) with 89 pixels as its length. The AR model coefficients estimated from the block of size 89×89 centered at pixel (462, 402) and the eight blocks in its neighborhood are shown in Table 5.16. The results are similar to those for the mammograms with a malignant mass, i.e., the block containing the benign mass can also be represented by a set of AR model coefficients that is more symmetric than those of the other blocks.
TABLE 5.14 AR Model Coefficients for Blocks of Pixels in Mammogram mdb028

AR Model Coefficient    B1        B2        B3        B4        BP        B5        B6        B7        B8
a01                     −0.0939   −0.9536   −0.8448   −1.2181   −1.0197   −1.0970   −1.1433   1.4875    −0.7789
a10                     −3.1346   −3.3854   0.4720    −0.7490   −1.0253   −0.9208   −2.6646   −0.1951   −1.6102
a11                     2.2266    3.3406    −0.6276   0.9675    1.0450    1.0176    2.8055    −2.2943   1.3893
Degree of symmetry      −1.9321   −0.1121   0.2288    −0.0551   0.0005    −0.0075   0.2410    2.0041    −0.1351
FIGURE 5.8 The mammogram with the mass marked: mdb058.

5.8.2.2 Case 2: mdb091

Figure 5.10 shows the mammogram mdb091, whose background tissue is fatty. The benign mass is situated in the square centered at (680, 494) with length equal to 41 pixels. The AR model coefficients estimated from the block of size 41×41 centered at pixel (680, 494) and the eight blocks in its neighborhood are shown in Table 5.17. Comparing the degree of symmetry calculated for each block shows that the AR model coefficients of block BP are the most symmetric.

5.8.2.3 Case 3: mdb142

The mammogram mdb142 is shown in Figure 5.11, with its benign mass highlighted. The background tissue is again fatty, and the mass lies within the square centered at (347, 636), with length equal to 53 pixels. Table 5.18 shows the AR model coefficients estimated from the block of size 53×53 centered at pixel (347, 636) and the eight blocks in its neighborhood. The degree of symmetry is small for all the blocks, and block BP has the smallest.
5.9 SUMMARY AND CONCLUSION

In this chapter, we investigated the possibility of applying the two-dimensional autoregressive (AR) modeling technique to characterize textures in mammograms. The two-dimensional AR model, the Yule-Walker system of equations, and the extended Yule-Walker system of equations in the third-order statistical domain were revisited. Three methods for estimating AR model coefficients using both the Yule-Walker system
TABLE 5.15 AR Model Coefficients for Blocks of Pixels in Mammogram mdb058

AR Model Coefficient    B1        B2        B3        B4        BP        B5         B6        B7        B8
a01                     −0.6095   −0.8822   −0.9133   −0.5295   −1.0697   12.5210    −0.4072   −0.9271   −1.1034
a10                     −1.6463   −1.2162   −0.6749   −5.0211   −1.0366   −1.6406    −0.6327   −0.8503   −0.8627
a11                     1.2561    1.0985    0.5886    4.5482    1.1062    −11.8717   0.2177    0.7775    0.9662
Degree of symmetry      −0.2528   −0.0256   0.0278    −1.8893   0.0026    −8.6708    0.0399    0.0108    −0.0142
FIGURE 5.9 The mammogram with the mass marked: mdb069.
of equations and the extended Yule-Walker system of equations in the third-order statistical domain were proposed. The simulation results showed that these methods are able to estimate two-dimensional AR model coefficients in both low- and high-SNR (signal-to-noise ratio) systems, and the variances generated from 100 realizations were sufficiently small. The AR modeling results were further improved for images with a single texture by clustering methods. Finally, one of the proposed methods was applied to characterize the texture of mammograms. A preliminary observation is that the 1×1 AR model coefficients representing the tumor area appear more symmetric than the AR model coefficients of the neighboring blocks.
TABLE 5.16 AR Model Coefficients for Blocks of Pixels in Mammogram mdb069

AR Model Coefficient    B1         B2        B3        B4        BP        B5        B6        B7        B8
a01                     −18.1696   −1.5331   −0.8151   −0.6781   −1.0925   −1.5952   −0.3248   −0.9324   1.7081
a10                     −19.0165   −0.7074   −1.2883   −0.6861   −1.1285   −1.6451   −1.2173   −0.6381   −14.3620
a11                     36.1743    1.2413    1.1037    0.3653    1.2211    2.2409    0.5417    0.5708    11.5490
Degree of symmetry      309.4      −0.1568   −0.0536   0.1000    0.0118    0.3832    −0.1463   0.0241    −35.9096

FIGURE 5.10 The mammogram with the mass marked: mdb091.
TABLE 5.17 AR Model Coefficients for Blocks of Pixels in Mammogram mdb091

AR Model Coefficient    B1        B2        B3        B4        BP        B5        B6        B7        B8
a01                     −1.0586   −0.8702   −1.1097   −1.0722   −1.0645   −0.8432   −0.9022   −0.3007   −0.7324
a10                     −0.7826   −0.9344   −1.0844   0.7697    −1.0395   −1.1193   −1.0801   −1.0465   −1.3942
a11                     0.8416    0.8048    1.1943    −0.6972   1.1042    0.9626    0.9626    0.3465    1.1268
Degree of symmetry      −0.0132   0.0083    0.0091    −0.1282   0.0023    −0.0188   −0.0081   −0.0318   −0.1056

FIGURE 5.11 The mammogram with the mass marked: mdb142.
TABLE 5.18 AR Model Coefficients for Blocks of Pixels in Mammogram mdb142

AR Model Coefficient    B1        B2        B3        B4        BP        B5        B6        B7        B8
a01                     −1.0429   −0.8479   −0.9107   −0.8579   −1.0396   −1.0608   −0.7939   −0.8885   −0.6772
a10                     1.0748    −1.1384   −0.8749   −0.8168   −1.0595   −0.8636   −2.2192   −0.7471   −0.6094
a11                     −1.0319   0.9858    0.7859    0.6767    1.0992    0.9243    2.0136    0.6357    0.2868
Degree of symmetry      −0.0889   −0.0213   0.0110    0.0255    0.0022    −0.0083   −0.2517   0.0280    0.1259
REFERENCES

1. Giannakis, G.B., Mendel, J.M., and Wang, W., ARMA modeling using cumulants and autocorrelation statistics, Proc. Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), 1, 61, 1987.
2. Stathaki, P.T., Cumulant-Based and Algebraic Techniques for Signal Modelling, Ph.D. thesis, Imperial College, London, 1994.
3. Ifeachor, E.G., Medical Applications of DSP, presented at IEEE Younger Members Tutorial Seminar on DSP: Theory, Applications and Implementation, IEEE, Washington, DC, 1996.
4. Bloem, D. and Arzbaecher, R., Discrimination of atrial arrhythmias using autoregressive modelling, Proc. Comput. Cardiol., Durham, NC, 235–238, 1992.
5. Nepal, K., Biegeleisen, E., and Ning, T., Apnea detection and respiration rate estimation through parametric modelling, Proc. IEEE 28th Ann. Northeast Bioeng. Conf., Philadelphia, 277–278, 2002.
6. Thonet, G. et al., Assessment of stationarity horizon of the heart rate, Proc. 18th Ann. Int. Conf. IEEE Eng. Medicine Biol. Soc., Bridging Disciplines for Biomedicine, 4, 1600, 1996.
7. Economopoulos, S.A. et al., Robust ECG coding using wavelet analysis and higher-order statistics, IEE Colloq. Intelligent Methods in Healthcare and Medical Applications, 15/1–15/6, Digest no. 1998/514, York, UK, 1998.
8. Palaniappan, R. et al., Autoregressive spectral analysis and model order selection criteria for EEG signals, Proc. TENCON 2000, 2, 126, Kuala Lumpur, Malaysia, 2000.
9. Stathaki, P.T. and Constantinides, A.G., Noisy texture analysis based on higher order statistics and neural network classifiers, Proc. IEEE Int. Conf. Neural Network Applications to DSP, 324–329, 1993.
10. Stathaki, P.T. and Constantinides, A.G., Robust autoregressive modelling through higher order spectral estimation techniques with application to mammography, Proc. 27th Ann. Asilomar Conf. Signals, Systems, Computers, 1, 189, 1993.
11. Stathaki, T. and Constantinides, A.G., Neural networks and higher order spectra for breast cancer detection, Proc. IEEE Workshop Neural Networks for Signal Processing, 473–481, 1994.
12. Kay, S.M., Modern Spectral Estimation: Theory and Application, Signal Processing Series, Prentice-Hall, Englewood Cliffs, NJ, 1987.
13. Bhattacharya, S., Ray, N.C., and Sinha, S., 2-D signal modelling and reconstruction using third-order cumulants, Signal Process., 62, 61, 1997.
14. Lee, S., Novel Methods on 2-D AR Modelling, M.Phil. to Ph.D. transfer report, Dept. of Electrical and Electronic Engineering, Imperial College, London, 2003.
15. Nikias, C.L. and Raghuveer, M., Bispectrum estimation: a digital signal processing framework, Proc. IEEE, 75, 869, 1987.
16. Mendel, J.M., Tutorial on higher order statistics (spectra) in signal processing and system theory: theoretical results and some applications, Proc. IEEE, 79, 278, 1991.
17. Giannakis, G.B., Cumulants: a powerful tool in signal processing, Proc. IEEE, 75, 1333, 1987.
18. Giannakis, G.B., Identification of nonminimum-phase systems using higher order statistics, IEEE Trans. ASSP, 37, 360, 1989.
19. Giannakis, G.B., On the identifiability of non-Gaussian ARMA models using cumulants, IEEE Trans. Automatic Control, 35, 18, 1990.
20. Giannakis, G.B., Cumulant-based order determination of non-Gaussian ARMA models, IEEE Trans. ASSP, 38, 1411, 1990.
21. Giannakis, G.B. and Swami, A., On estimating noncausal nonminimum phase ARMA models of non-Gaussian processes, IEEE Trans. ASSP, 38, 478, 1990.
22. Dickie, J.R. and Nandi, A.K., AR modelling of skewed signals using third-order cumulants, IEE Proc. Vision, Image, Signal Process., 142, 78, 1995.
23. Lee, S. and Stathaki, T., Texture characterisation using constrained optimisation techniques with application to mammography, presented at 5th Int. Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), on CD, Lisbon, Portugal, 2004.
24. Gill, P.E., Murray, W., and Wright, M.H., Practical Optimization, Academic Press, New York, 1981.
25. Lee, S., Stathaki, T., and Harris, F., Texture characterisation using a novel optimisation formulation for two-dimensional autoregressive modelling and k-means algorithm, 37th Asilomar Conf. Signals, Systems, Computers, 2, 1605, 2003.
26. Lee, S., Stathaki, T., and Harris, F., A two-dimensional autoregressive modelling technique using a constrained optimisation formulation and the minimum hierarchical clustering scheme, 38th Asilomar Conf. Signals, Systems, Computers, 2, 1690, 2004.
27. Johnson, S.C., Hierarchical clustering schemes, Psychometrika, 32, 241, 1967.
28. Borgatti, S.P., How to explain hierarchical clustering, Connections, 17, 78, 1994.
29. Anderberg, M.R., Cluster Analysis for Applications, Academic Press, New York, 1973.
30. Hartigan, J.A., Clustering Algorithms, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, New York, 1975.
31. Jain, A.K., Murty, M.N., and Flynn, P.J., Data clustering: a review, ACM Computing Surveys, 31, 264–323, 1999.
32. Jain, A. et al., Artificial Intelligence Techniques in Breast Cancer Diagnosis and Prognosis, Series in Machine Perception and Artificial Intelligence, 39, World Scientific, Singapore, 1–15, 2000.
33. Mammographic Image Analysis Society (MIAS), MiniMammography Database; available online at http://www.wiau.man.ac.uk/services/MIAS/MIASmini.html, last accessed 5/25/2004.
6 Locally Adaptive Wavelet Contrast Enhancement

Lena Costaridou, Philipos Sakellaropoulos, Spyros Skiadopoulos, and George Panayiotakis

6.1 INTRODUCTION

Breast cancer is the most frequently occurring cancer in women [1–3]. Detecting the disease in its early stages increases the rate of survival and improves the quality of patient life [4, 5]. Mammography is currently the technique with the highest sensitivity available for early detection of breast cancer in asymptomatic women. Detection of early signs of disease, such as microcalcifications (MCs) and masses, in mammography programs is a particularly demanding task for radiologists. This is attributed to the high volume of images reviewed as well as to the low-contrast character of mammographic imaging, especially in the case of the dense breast, which accounts for about 25% of the younger female population [6, 7]. Calcifications are calcium salts produced by processes carried out inside the breast ductal system. They are radiodense, usually appearing lighter than the surrounding parenchyma, due to their inherently high attenuation of X-rays. Depending on the X-ray attenuation of the surrounding parenchyma (i.e., dense breast), they can be low-contrast entities, with their low-contrast resolution limited by their size. Magnification of mammographic views, characterized by improved signal-to-noise ratio, results in improved visualization of MCs. Masses, which represent a more invasive process, are compact radiodense regions that also appear lighter than their surrounding parenchyma due to higher attenuation of X-rays. The major reason for the low contrast of malignant masses is the minor difference in X-ray attenuation between even large masses and normal dense surrounding parenchyma. The use of complementary mammographic views, craniocaudal (CC) and mediolateral oblique (MLO), is intended to resolve tissue superimposition in different projections [8, 9]. Identification and differentiation (benign vs.
malignant) of MCs and masses have been the major subject of computer-aided diagnosis (CAD) systems that are aimed at increasing the sensitivity and specificity of screening and interpretation of findings by radiologists. CAD systems in mammography have been an active area of research during the last 20 years [10–17]. In addition to dense breast regions, mammography periphery is also poorly imaged due to systematic lack of compressed breast tissue in this region [18, 19]. Although periphery visualization is associated with more advanced stages of disease, such as skin thickening and nipple retraction, it has attracted research attention, either as a
preprocessing stage of a CAD system [10], for enhancement [18–26], or for skin detection [27–29].

6.2 BACKGROUND

Digital image-enhancement methods have been widely used in mammography to enhance the contrast of image features. Development of mammographic image-enhancement methods is also motivated by recent developments in digital mammography and soft-copy display of mammograms [30, 31]. Specifically, image display and enhancement methods are needed to optimally adapt the increased dynamic range of digital detectors, up to 2¹² gray levels, to the human dynamic range, up to 2⁷ gray levels for expert radiologists. Different algorithms have advantages and disadvantages for the specific tasks required in breast imaging: diagnosis and screening. A simple but effective method for image enhancement is intensity windowing (IW) [32]. IW stretches a selected range of gray levels to the available display range. However, in mammography (unlike CT), there is no absolute correspondence between the recorded intensities and the underlying tissue, and thus IW settings cannot be predetermined. Manual contrast adjustment of a displayed digital mammogram with IW resembles adjustment of a screen-film mammogram's contrast on a light-view box. Automated algorithms have been developed to avoid user-dependent and time-consuming manual adjustments. Component-based IW techniques segment the mammographic image into its components (background, uncompressed fat, fat, dense, and muscle) and adjust IW parameters to emphasize the information in a single component. Mixture-modeling-based IW [33] uses statistical measures to differentiate fat from dense-component pixels to accentuate lesions in the dense part of the mammogram. A preprocessing step is applied to separate the edge border. Adaptive local-enhancement methods modify each pixel value according to some local characteristics of the neighborhood around the pixel's location.
Adaptive histogram equalization (AHE) is a well-known technique that uses regional histograms to derive local mapping functions [34]. Although AHE is effective, it tends to overemphasize noise. Contrast-limited AHE (CLAHE) was designed to overcome this problem, but the contrast-limit parameter is image and user dependent [35]. Local-range modification (LRM) is an adaptive method that uses local minima-maxima information to calculate local linear stretching functions [36]. LRM enhances image contrast, but it tends to create artifacts (dark or bright regions) in the processed image. Spatial filtering methods, like unsharp masking (UM) [37], adaptive contrast enhancement (ACE) [38], multichannel filtering [39], and enhancement using first derivative and local statistics [40] amplify mid- to high-spatial-frequency components to enhance image details. However, these methods are characterized by noise overenhancement and ringing artifacts caused by amplification of noise and high-contrast edges [41]. More complex filtering methods like contrast enhancement based on histogram transformation of local standard deviation [42] and just-noticeable-difference-guided ACE [41] attempt to overcome these problems by using smaller gains for smooth or high-contrast regions. Adaptive neighborhood contrast enhancement (ANCE) methods [43–46] directly manipulate the local contrast of regions, computed by comparing the intensity of each region with the intensity of its background. Region growing is used to identify regions and corresponding backgrounds.
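As a concrete illustration of the simplest of these methods, plain intensity windowing reduces to a linear stretch of a selected gray-level range followed by clipping to the display range. This is a hedged sketch; the function and parameter names are ours, not from a specific IW implementation.

```python
import numpy as np

def intensity_window(img, lo, hi, display_max=255.0):
    """Map gray levels in [lo, hi] linearly onto [0, display_max].

    Values below lo clip to 0; values above hi clip to display_max.
    """
    img = np.asarray(img, dtype=float)
    stretched = (img - lo) / (hi - lo) * display_max
    return np.clip(stretched, 0.0, display_max)
```

In mammography the window limits (lo, hi) cannot be predetermined, which is why the automated and component-based variants described above adapt them per image or per tissue component.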
A common characteristic of the above-mentioned techniques is that they are based on the single-scale spatial domain. Due to this fact, they can only enhance the contrast of a narrow range of sizes, as determined by the size of local-processing region. Additionally, they tend to increase the appearance of noise. To enhance features of all sizes simultaneously, multiresolution enhancement methods, based on the wavelet transform [47], have been developed. A multiscale representation divides the frequency spectrum of an image into a low-pass subband image and a set of band-pass subband images, indexed by scale s and orientation. The spatial and frequency resolution of the subband images are proportional to 1/s and s, respectively. Because sharp image variations are observed at small scales, they are analyzed with fine spatial resolution. By exploiting the location and frequency-selectivity properties of the wavelet transform, we can progressively “zoom” into image features and characterize them through scale-space. Mammographic image analysis can benefit from this strategy, because mammograms contain features with varying scale characteristics. The main hypothesis of image wavelet analysis is that features of interest reside at certain scales. Specifically, features with sharp borders, like MCs, are mostly contained within high-resolution levels (small scales) of a multiscale representation. Larger objects with smooth borders, like masses, are mostly contained in low-resolution levels (coarse scales). Different features can thus be selectively enhanced (or detected) within different resolution levels. Also, a noise-reduction stage could be applied prior to enhancement, exploiting the decorrelation properties of the wavelet transform. The main approach for wavelet-based enhancement (WE) uses a redundant wavelet transform [48] and linear or nonlinear mapping functions applied on Laplacian or gradient wavelet coefficients [49–52]. 
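A typical mapping of this kind amplifies small (low-contrast) coefficient magnitudes more than large ones. The piecewise-linear function below is a common textbook form given only as an illustrative sketch; the names and the continuity-preserving formulation are our choices, not the exact mapping functions of [49–52].

```python
import numpy as np

def enhance_magnitude(mag, gain=2.0, t=1.0):
    """Amplify coefficient magnitudes below threshold t by `gain`;
    shift larger magnitudes so the mapping stays continuous at t."""
    mag = np.asarray(mag, dtype=float)
    return np.where(mag < t, gain * mag, mag + (gain - 1.0) * t)
```

Applied per resolution level with level-dependent gain and threshold, such a mapping boosts subtle detail while leaving already high-contrast edges nearly unchanged.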
Such methods have demonstrated significant contrast enhancement of simulated mammographic features [50], and also improved assessed visibility of real mammographic features [51]. Another approach uses a multiscale edge representation, provided by the same type of wavelet transform, to accentuate multiscale edges [53]. Recently, spatially adaptive transformation of wavelet coefficients has been proposed [54] for soft-copy display of mammograms, aiming at optimized presentation of mammographic image contrast on monitor displays. Spatial adaptivity is motivated from the fact that mapping functions in previous methods [49, 50] are typically characterized by global parameters at each resolution level. Global parameters fail to account for regions of varying contrasts such as fat, heterogeneously dense, and dense in mammograms. This method provides an adaptive denoising stage, taking into account recent works for wavelet-based image denoising [55, 56], in addition to locally adaptive linear enhancement functions. Performance of contrast-enhancement methods is important for soft-copy display of mammograms in the clinical environment. It is usually differentiated with respect to task (detection or characterization) or type of lesion (calcifications or masses). Several enhancement methods have been evaluated as compared with their unprocessed digitized versions [46, 57–60], and a small number of intercomparison studies has been performed [54, 61, 62]. Intercomparison studies are useful in the sense that they are a first means of selecting different contrast-enhancement methods to be evaluated later on, carried out with an identical sample of original (unprocessed) images and observers. These intercomparison studies are usually based on observer preference as an initial step for
selection of an appropriate contrast-enhancement method (i.e., those with high preference). Receiver operating characteristics (ROC) studies should be conducted as a second step for comparative evaluation of these methods with respect to detection and classification accuracy of each lesion type [63]. Sivaramakrishna et al. [61] conducted a preference study for performance evaluation of four image contrast-enhancement methods (UM, CLAHE, ANCE, and WE) on a sample of 40 digitized mammograms containing 20 MC clusters and 20 masses (10 benign and 10 malignant in each lesion type). In the case of MCs, processed images based on the ANCE and WE methods were preferred in 49% and 28% of cases, respectively. For masses, the digitized (unprocessed) images and UM-based processed images were preferred in 58% and 28% of cases, respectively. The authors concluded that different contrast-enhancement approaches may be necessary, depending on the type of lesion. Pisano et al. [62] carried out a preference study for performance evaluation of eight image contrast-enhancement methods on a sample of 28 images containing 29 cancerous and 36 benign pathological findings (masses or MCs) produced from three different digital mammographic units. All processed images were printed on film and compared with respect to their corresponding screen-film images. Screen-film images were preferred to all processed images in the diagnosis of MCs. For the diagnosis of masses, all processed images were preferred to screen-film images. This preference was statistically significant in the case of the UM method. For the screening task of the visualization of anatomical features of main breast and breast periphery, screen-film images were generally preferred to processed images. No unique enhancement method was preferred. 
Recently, the spatially adaptive wavelet (AW) enhancement method has been compared with CLAHE, LRM, and two wavelet-based enhancement methods (global linear and nonlinear enhancement methods) in a sample of 18 MC clusters [54]. The AW method had the highest preference. The results of these preference studies show that a contrast-enhancement method with high performance in all tasks and types of lesions has not been developed. In addition, the small number of preference studies is not adequate to indicate the promising contrastenhancement methods for clinical acceptance. Further preference studies are needed comparing the performance of contrast-enhancement methods presented in the literature. Observer preference as well as ROC studies are not time-consuming nowadays because (a) a case sample can be selected from common mammographic databases (e.g., Digital Database for Screening Mammography—DDSM [64, 65], Mammographic Image Analysis Society—MIAS [66, 67]) and (b) high-speed processors can be used for lower computational times. A brief summary of redundant dyadic wavelet analysis is given in Sections 6.3.1 and 6.3.2. The basic principles of wavelet denoising and contrast enhancement are presented in Sections 6.3.3.1 and 6.3.4.1, while details of an adaptive denoising and enhancement approach are provided in Sections 6.3.3.2 and 6.3.4.2. The performance of the AW method is quantitatively assessed and compared with the IW method, by means of simulated MC clusters superimposed on dense breast parenchyma in Section 6.3.7. In Section 6.4, evaluation is carried out by an observer performance comparative study between original-plus-AW-processed and original-plus-IW-processed images with
respect to three tasks: detection, morphology characterization, and pathology classification of MC clusters on dense breast parenchyma.

6.3 MATERIALS AND METHODS

6.3.1 DISCRETE DYADIC WAVELET TRANSFORM REVIEW

The dyadic wavelet transform series of a function f(x) with respect to a wavelet function ψ(x) is defined by the convolution

W_{2^j}f(x) = f ∗ ψ_{2^j}(x)   (6.1)

where ψ_{2^j}(x) = (1/2^j) ψ(x/2^j) is the dilation of ψ(x) by a factor of 2^j. In general, f(x) can be recovered from its dyadic wavelet transform from the summation

f(x) = Σ_{j=−∞}^{+∞} (W_{2^j}f ∗ χ_{2^j})(x)   (6.2)

where the reconstruction wavelet χ(x) is any function whose Fourier transform satisfies

Σ_{j=−∞}^{+∞} ψ̂(2^j ω) χ̂(2^j ω) = 1   (6.3)

The approximation of f(x) at scale 2^j is defined as

S_{2^j}f(x) = f ∗ φ_{2^j}(x)   (6.4)

where φ(x) is a smoothing function called the scaling function that satisfies the equation

|φ̂(ω)|² = Σ_{j=1}^{+∞} ψ̂(2^j ω) χ̂(2^j ω)   (6.5)

In practice, the input signal is measured at a certain resolution, and thus the wavelet transform cannot be computed at an arbitrarily fine scale. However, a discrete periodic signal D, derived from a periodic extension of a discrete signal, can be considered as the sampling of a smoothed version of a function f(x) at the finest scale 1:

D(n) = S₁f(n)   (6.6)

As the scale 2^j increases, more details are removed by the S_{2^j} operator. Dyadic wavelet transform series between scales 2¹ and 2^j contain the details existing in the S₁f(x) representation that have disappeared in S_{2^j}f(x).
6.3.2 REDUNDANT DYADIC WAVELET TRANSFORM

Redundant (overcomplete) biorthogonal wavelet representations are more suitable for enhancement compared with orthogonal, critically sampled wavelet representations.
FIGURE 6.1 (a) A cubic spline function and (b) a wavelet that is a quadratic spline of compact support.

Avoiding the downsampling step after subband filtering ensures that wavelet-coefficient images are free from aliasing artifacts. Additionally, the wavelet representation is invariant under translation [68]. Smooth symmetrical or antisymmetrical wavelet functions can be used [69] to alleviate boundary effects via mirror extension of the signal. Mallat and Zhong have defined a fast, biorthogonal, redundant discrete wavelet transform (RDWT) that can be used to derive multiscale edges from signals [48]. It is based on a family of wavelet functions ψ(x) with compact support that are derivatives of corresponding Gaussian-like spline functions θ(x). The Fourier transforms of these functions are defined as follows:

θ̂(ω) = (sin(ω/4) / (ω/4))^{2n+2}   (6.7)

ψ̂(ω) = iω (sin(ω/4) / (ω/4))^{2n+2}   (6.8)

By choosing n = 1, we obtain a wavelet function ψ(x) that is a quadratic spline, while θ(x) is a cubic spline. These functions are displayed in Figure 6.1. For this particular class of wavelet functions, the wavelet transform series of f(x, y) for −∞ < j < +∞ has two components and is given by
(W¹_{2^j}f(x, y), W²_{2^j}f(x, y)) = (f ∗ ψ¹_{2^j}(x, y), f ∗ ψ²_{2^j}(x, y))   (6.9)

FIGURE 6.2 Filter-bank scheme used to implement the RDWT for two scales.

The discrete wavelet transform is a uniform sampling of the wavelet transform series, discretized over the scale parameter s at dyadic scales 2^j. The analyzing wavelets ψ¹(x, y) and ψ²(x, y) are partial derivatives of a symmetrical smoothing function θ(x, y) approximating the Gaussian, and j denotes the dyadic scale. The DWT is calculated up to a coarse dyadic scale J. Therefore, the original image is decomposed into a multiresolution hierarchy of subband images, consisting of a coarse approximation image S_J f and a set of wavelet images {W¹_{2^j}f, W²_{2^j}f}, j = 1, …, J, which provide the details that are available in S₁f but have disappeared in S_J f. All subband images have the same number of pixels as the original; thus, the representation is highly redundant. Figure 6.2 shows the filter-bank scheme used to implement the DWT (two dyadic scales). The transform is implemented using a filter-bank algorithm, called algorithme à trous (algorithm with holes) [70], which does not involve subsampling. The filter bank is characterized by discrete filters H(ω), G(ω), K(ω), and L(ω). All filters have compact support and are either symmetrical or antisymmetrical. At dyadic scale j, the discrete filters Hj(ω), Gj(ω), Kj(ω), and Lj(ω) are obtained by inserting zeros ("holes") between each of the coefficients of the corresponding filters. The coefficients of the filters are listed in Table 6.1. Equation 6.9 shows that the DWT computes the multiscale gradient vector. Coefficient subband images are proportional to the sampled horizontal and vertical components of the multiscale gradient vector, and thus they are related to local contrast. The magnitude-phase representation of the gradient vector, in the discrete case, is given by
M_{2^j}f(x, y) = sqrt( |W¹_{2^j}f(x, y)|² + |W²_{2^j}f(x, y)|² )
A_{2^j}f(x, y) = arctan( W²_{2^j}f(x, y) / W¹_{2^j}f(x, y) )   (6.10)
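The magnitude-phase representation of Equation 6.10 is a pointwise operation on the two coefficient subbands; a minimal sketch, where `w1` and `w2` stand for the horizontal and vertical coefficient images (names are ours):

```python
import numpy as np

def magnitude_phase(w1, w2):
    """Magnitude and phase of the multiscale gradient (Eq. 6.10)."""
    mag = np.hypot(w1, w2)       # sqrt(w1**2 + w2**2)
    phase = np.arctan2(w2, w1)   # gradient direction
    return mag, phase
```

`np.arctan2` is used instead of a plain arctan of the ratio so that the direction is well defined in all four quadrants and when `w1` is zero.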
Demonstrations of gradient-magnitude vector, superimposed on two mammogram regions containing masses, are presented in Figure 6.3. Magnitude of the
TABLE 6.1 Filter Coefficients for the Filters H(n), G(n), K(n), and L(n) Corresponding to the Quadratic Spline Wavelet of Figure 6.1

n     H(n)     G(n)     K(n)          L(n)
−3    —        —        0.0078125     0.0078125
−2    —        —        0.0546875     0.046875
−1    0.125    —        0.171875      0.1171875
0     0.375    2.0      −0.171875     0.65625
1     0.375    −2.0     −0.0546875    0.1171875
2     0.125    —        −0.0078125    0.046875
3     —        —        —             0.0078125
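Using the H and G coefficients of Table 6.1, one analysis level of the à trous scheme can be sketched in 1-D as follows. This is an illustrative undecimated convolution, not the chapter's full 2-D filter bank; boundary handling here is plain zero-padding rather than the mirror extension discussed above, and the function names are ours.

```python
import numpy as np

H = np.array([0.125, 0.375, 0.375, 0.125])  # smoothing filter H(n), Table 6.1
G = np.array([2.0, -2.0])                   # wavelet filter G(n), Table 6.1

def upsample_filter(h, j):
    """Insert zeros ("holes") between taps for dyadic scale j (j = 0: none)."""
    if j == 0:
        return h
    step = 2 ** j
    out = np.zeros((len(h) - 1) * step + 1)
    out[::step] = h
    return out

def atrous_level(signal, j):
    """One undecimated analysis level: approximation and detail at scale j."""
    approx = np.convolve(signal, upsample_filter(H, j), mode="same")
    detail = np.convolve(signal, upsample_filter(G, j), mode="same")
    return approx, detail
```

Because no subsampling is performed, both outputs keep the length of the input signal at every scale, which is exactly the redundancy property of the RDWT.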
gradient vector at each location corresponds to the length of the arrow, while phase corresponds to the direction of the arrow. It can be observed that the gradient-magnitude vector is perpendicular to lesion contours. Because contrast enhancement should be applied perpendicular to edge contours to avoid orientation distortions, subsequent processing is applied on the multiscale magnitude values.

6.3.3 WAVELET DENOISING

6.3.3.1 Noise Suppression by Wavelet Shrinkage

Digitized mammograms are corrupted by noise due to the acquisition and digitization process. Prior to contrast enhancement, a denoising stage is desirable to avoid or reduce amplification of noise. Conventional noise-filtering techniques reduce noise by suppressing the high-frequency image components. The drawback of these methods is that they cause edge blurring. Wavelet-based denoising methods, however, can effectively reduce noise while preserving the edges. The two main approaches for wavelet-based noise suppression are (a) denoising by analyzing the evolution of multiscale edges across scales [48] and (b) denoising by wavelet shrinkage [71]. The algorithm of Mallat and Hwang [48] is based on the behavior of multiscale edges across scales of the wavelet transform. They proved that signal singularities (edges) are characterized by positive Lipschitz exponents, and thus the magnitude values of edge
points increase with increasing scale. Noise singularities, on the other hand, are characterized by negative Lipschitz exponents, and thus the magnitude values of edge points caused by noise decrease with increasing scale. The algorithm computes Lipschitz exponents from scales 2^2 and 2^3 in order to eliminate edge points with negative exponents and then reconstructs the maxima at the finest scale 2^1, which is most affected by noise. The drawback of this method is that, although the reconstruction from the multiscale edge representation produces a close approximation of the initial image, some image details are missed; it is also very computationally intensive. A simpler denoising method, widely used in signal and image processing, is wavelet shrinkage [71]. This method compares wavelet coefficients against a threshold that separates signal coefficients from noise coefficients. The approach is justified by the decorrelation and energy-compaction properties of the wavelet transform: signal (or image) energy in the wavelet domain is mostly concentrated in a few large coefficients. Therefore, coefficients below the threshold are attributed to noise and are set to zero, while coefficients above the threshold are either kept unmodified (hard-thresholding) or modified by subtracting the threshold (soft-thresholding). Using the RDWT as a basis for wavelet shrinkage is beneficial because thresholding in a shift-invariant transform outperforms thresholding in an orthogonal transform by reducing artifacts such as pseudo-Gibbs phenomena [72]. Thresholding is applied on gradient-magnitude values to avoid orientation distortions [50]. Soft-thresholding can be mathematically described by Equation 6.11:
FIGURE 6.3 Gradient magnitude vector superimposed on mammogram regions containing lesions.
\hat{M}_s(m,n) = \begin{cases} M_s(m,n) - T_s, & M_s(m,n) \ge T_s \\ 0, & M_s(m,n) < T_s \end{cases}   (6.11)

where \hat{M}_s(m,n) is the denoised gradient value and T_s is the threshold at scale s, specified at a fixed percentile of the cumulative histogram of the gradient-magnitude subimage. The denoised image is obtained after reconstruction from the thresholded wavelet coefficients. Hard-thresholding can be mathematically described as follows:

\hat{M}_s(m,n) = \begin{cases} M_s(m,n), & M_s(m,n) \ge T_s \\ 0, & M_s(m,n) < T_s \end{cases}   (6.12)
where \hat{M}_s(m,n) is the denoised gradient value and T_s is the threshold at scale s, again specified at a percentile value of the cumulative histogram of the gradient-magnitude subimage. The soft- and hard-thresholding functions are graphically displayed in Figure 6.4. Soft-thresholding has the advantage that the transformation function is continuous, so artifacts are avoided; however, it can result in some edge blurring, especially if a large threshold is used.

FIGURE 6.4 Examples of (a) soft-thresholding and (b) hard-thresholding functions.

6.3.3.2 Adaptive Wavelet Shrinkage

The thresholding methods described above use a global threshold at each subband. Although algorithms have been proposed for its calculation, it is difficult to define an optimal threshold. Moreover, a global threshold cannot accommodate varying image characteristics. In smooth regions the coefficients are dominated by noise, so most of them should be removed. In regions with large variations, the coefficients carry signal information, so they should be only slightly modified to preserve signal details. In this work, a spatially adaptive thresholding strategy is proposed. Specifically, soft-thresholding using a local threshold is applied on the wavelet coefficient magnitudes. The threshold is calculated at each (dyadic) scale and position by applying a local window and using the formula

T_s(m,n) = \frac{\sigma^2_{N,s}}{\sigma_{M,s}(m,n)}   (6.13)

where σ_{N,s} is the noise standard deviation at scale s, and σ_{M,s}(m,n) is the signal standard deviation at scale s and position (m,n). This formula was proposed by Chang et al. [55] under the assumption that the wavelet coefficients can be modeled as generalized-Gaussian-distributed random variables and the noise as Gaussian. The local standard deviation of the signal is calculated from a local window centered at each coefficient position:

\sigma_{M,s}(m,n) = \sqrt{\frac{1}{N^2}\sum_{i=-L}^{L}\sum_{j=-L}^{L} M_s^2(m+i,\,n+j) \;-\; \sigma^2_{N,s}}   (6.14)
where N = 2L + 1 is the window size and M_s(m,n) is the gradient magnitude at position (m,n). The term σ²_{N,s} is subtracted because the gradient magnitudes are noisy and the noise is considered independent of the signal. The noise standard deviation σ_{N,s} is estimated from the mammogram background area. An automatic procedure defines the background area used for noise estimation. Specifically, a rectangular window with width equal to 20% of the mammogram width scans the image, and two metrics are measured: the mean value of the gradient magnitudes and their 98th percentile. The area under the window at the position where both metrics are minimized defines the background area. The 100th percentile (maximum) is avoided in order to exclude bright pixels caused by artifacts. This background-identification process has been accurate in all cases tested. After background identification, the noise standard deviation at each dyadic scale (σ_{N,s}) is calculated by applying the transform to a background area of the mammogram and measuring the standard deviation of the gradient magnitudes. Denoised gradient magnitudes are given by soft-thresholding with the local threshold:

\hat{M}_s(m,n) = \begin{cases} M_s(m,n) - T_s(m,n), & M_s(m,n) \ge T_s(m,n) \\ 0, & \text{otherwise} \end{cases}   (6.15)

To speed up the local standard-deviation calculations, an interpolation procedure [34] was used. The local threshold is first calculated on a sample grid defined by the centers of contiguous blocks (windows); the local threshold at each position is then obtained by interpolation from the local thresholds assigned to the four surrounding grid points. A window size of 9×9 was used.
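A minimal pure-Python sketch of the locally adaptive shrinkage of Equations 6.13 to 6.15 follows. The grid-interpolation speedup is omitted, and the function and variable names are ours; the default half-window of 4 gives the 9×9 window used in the text.

```python
import math

def local_threshold(mags, m, n, noise_sigma, half=4):
    """Equations 6.13-6.14: T = noise_var / local signal std, where the
    signal std comes from a (2*half+1)^2 window of gradient magnitudes
    with the noise variance subtracted (9x9 window for half = 4)."""
    rows, cols = len(mags), len(mags[0])
    acc, cnt = 0.0, 0
    for i in range(max(0, m - half), min(rows, m + half + 1)):
        for j in range(max(0, n - half), min(cols, n + half + 1)):
            acc += mags[i][j] ** 2
            cnt += 1
    sig_std = math.sqrt(max(acc / cnt - noise_sigma ** 2, 0.0))
    if sig_std == 0.0:
        return float("inf")  # smooth region: treat everything as noise
    return noise_sigma ** 2 / sig_std

def soft_shrink(mags, noise_sigma):
    """Equation 6.15: soft-threshold every gradient magnitude with its
    own locally computed threshold."""
    return [[max(mags[m][n] - local_threshold(mags, m, n, noise_sigma), 0.0)
             for n in range(len(mags[0]))] for m in range(len(mags))]
```

In a flat region whose energy equals the noise variance the threshold becomes infinite and all magnitudes are zeroed, which is the intended behavior for smooth, noise-dominated areas.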
6.3.4 WAVELET CONTRAST ENHANCEMENT

6.3.4.1 Global Wavelet Mapping

In the framework of the RDWT, the main approach for contrast enhancement is linear or nonlinear mapping of wavelet coefficients [50]. This approach is justified by the nature of the wavelet coefficients: the wavelet used in the RDWT is the first derivative of a smoothing function, so the wavelet coefficients are proportional to image-intensity variations and are related to local contrast. Evidence of the relationship between contrast and wavelet coefficients for mammographic images has been provided in a recent work [73]. Using the RDWT as a basis for contrast enhancement is beneficial because of the shift invariance and lack of aliasing of the transform. Because contrast enhancement should be perpendicular to edge contours to enhance the contrast of structures, subsequent processing is applied on the multiscale magnitude values. Contrast enhancement by linear enhancement consists of linear stretching of the multiscale gradients, which ensures that all regions of the image are enhanced in the same way. It can be mathematically expressed by

\hat{M}_s(m,n) = k_s M_s(m,n)   (6.16)

where \hat{M}_s(m,n) is the enhanced gradient-magnitude value at position (m,n) and scale s, and k_s > 1 is a gain parameter. The gain can vary across scales to selectively enhance features of different sizes; for example, a larger k_1 ensures that the fine structures are enhanced more. Usually, however, the same gain k is used across all scales. Linear enhancement is equivalent to multiscale UM [50]. This can be shown easily in the one-dimensional case. If we denote the frequency-channel responses of the wavelet transform by C_m(ω) and assume that the same gain k > 1 is used for all band-pass channels (0 ≤ m ≤ N−1), the system frequency response becomes

V(\omega) = k \sum_{m=0}^{N-1} C_m(\omega) + C_N(\omega) = 1 + (k-1)\,[\,1 - C_N(\omega)\,]   (6.17)

since the channel responses sum to unity.
The spatial response of the system is thus

y(m) = x(m) + (k-1)\,[\,x(m) - (x * C_N)(m)\,]   (6.18)

C_N(ω) is a low-pass filter, so x(m) − (x * C_N)(m) is a high-pass version of the signal. Because UM consists of adding a scaled high-pass version to the original, Equation 6.18 describes an unsharp-masking operation. A drawback of linear enhancement is that it uses the available dynamic range inefficiently, because it emphasizes high-contrast and low-contrast edges with the same gain. For example, a single high-contrast MC in a mammogram, enhanced linearly, will cause gross rescaling within the available dynamic range of the display. The subtle features contained in the processed mammogram will then have low contrast, and their detection will be difficult. To avoid this drawback and enhance the visibility of low-contrast regions, the mapping function must avoid overenhancement of the large gradient-magnitude values. A nonlinear mapping function used to emphasize enhancement of low-contrast features has the following form:

\hat{M}_s(m,n) = \begin{cases} k_s M_s(m,n), & M_s(m,n) < T_s \\ M_s(m,n) + (k_s - 1)\,T_s, & M_s(m,n) \ge T_s \end{cases}   (6.19)
where \hat{M}_s(m,n) is the enhanced gradient-magnitude value at position (m,n) and scale s, k_s > 1 is a gain parameter, and T_s is a low-contrast threshold. By selecting different gain parameters at each dyadic scale, the contrast of specific-sized features can be selectively enhanced. The low-contrast threshold can be set in two ways: (a) as a percentage of the maximum gradient value in the gradient-magnitude subimage or (b) as a percentile value of its cumulative histogram. The linear and nonlinear contrast-enhancement mapping functions are graphically displayed in Figure 6.5.

FIGURE 6.5 Examples of (a) linear and (b) nonlinear contrast-enhancement mapping functions.

6.3.4.2 Adaptive Wavelet Mapping

With respect to wavelet contrast enhancement in the framework of the redundant wavelet transform, the main approach is linear or nonlinear mapping of wavelet coefficients. Linear mapping multiplies the wavelet coefficients at each scale by a uniform gain G. However, linear enhancement emphasizes strong and low contrasts in the same way: when the processed image is rescaled to fit the available display dynamic range, weak signal features with low contrast are suppressed. For this reason, nonlinear enhancement was introduced. It uses a nonlinear mapping function that is piecewise linear with two parts. The first part has a slope G > 1 (where G is the gain) and is used to emphasize enhancement of low-contrast features, up to a threshold T. The second part has slope equal to 1, to avoid overenhancement of high-contrast features. However, a drawback of the method is that the parameters of the transformation function at each scale are global. Because mammograms contain regions characterized by different local energy of wavelet coefficients, a global threshold and gain cannot be optimal. If a large gain G is used to ensure that all low-contrast features are emphasized, the second part of the mapping function essentially clips coefficient values and thus distorts edge information. A satisfactory value for the global threshold T also cannot be easily determined: if a large T is used to include a greater portion of low-contrast features, the nonlinear mapping function approximates the linear one. Sakellaropoulos et al. [54] tried an adaptive approach using a locally defined linear mapping function, similar to the LRM method [36]. The enhancement process of LRM has been modified and is applied on gradient-magnitude values. The enhanced gradient-magnitude coefficient values are given by

\hat{M}_s(m,n) = G_{L,s}(m,n)\, M_s(m,n)   (6.20)

The limited adaptive gain G_{L,s}(m,n) is derived by

G_{L,s}(m,n) = \min\left(L,\; \frac{M_{1,\max}}{M_{s,\max}(m,n)}\right)   (6.21)

where M_{1,max} is the maximum value of the magnitude subband image at scale 1, M_{s,max}(m,n) is the local maximum value in an N×N window of the magnitude subband image at scale s and position (m,n), and L is a local gain-limit parameter. Before the application of the adaptive mapping function, the magnitude values are clipped at the top 2% of their histogram. Clipping is used to alleviate a problem inherent in the LRM method.
Specifically, if the unclipped maximum values are used, isolated bright points in the magnitude subband image result in a significantly decreased gain G_s(m,n) in a region around them; after reconstruction, those regions would appear blurred in the processed image. Because the gradient-magnitude mapping has to be monotonically increasing, the local minimum values used in the LRM method are not used in Equation 6.20. The adaptive gain G_{L,s}(m,n) forces local maxima to become equal or close to a target global maximum; the local enhancement process therefore stretches gradient magnitudes in low-contrast regions. However, overenhancement of contrast in such regions can yield unnatural-looking images, so the local gain-limit parameter L is used to limit the adaptive gain. Setting L equal to 20 provided satisfactory results for all images processed in this study. Use of the same target global maximum (M_{1,max}) for all subband images was found to produce sharp-looking processed images that emphasize local details. To speed up the calculation of local maxima, the interpolation procedure of the LRM method is used [36]. This procedure involves two passes through the image. In the first pass, the maximum gradient-magnitude values are found for half-overlapping
windows of size N, centered at a rectangular sample grid. In the second pass, local maximum values at each pixel position are calculated by interpolating the maximum values assigned to the four surrounding grid points. The interpolation makes the local gain vary smoothly across the image, so it is preferable to direct calculation of local maxima at each position; calculation time is also significantly reduced, even when a large window size is used. The method is not sensitive to the window size; a constant window size of 21×21 pixels was used for the results of this study. The gradient-magnitude mapping function defined in Equation 6.20 can be extended to a nonlinear one by introducing a gamma factor g, as follows:

\hat{M}_s(m,n) = G_{L,s}(m,n)\, M_{s,\max}(m,n) \left(\frac{M_s(m,n)}{M_{s,\max}(m,n)}\right)^{g}   (6.22)

Note that Equation 6.20 is a special case of Equation 6.22 for g = 1. Values of g smaller than 1 favor enhancement of low-contrast features, while values higher than 1 favor enhancement of high-contrast features. In this work, only linear local enhancement (g = 1) is used. Figure 6.6 compares global nonlinear coefficient mapping with adaptive linear mapping for the magnitude subband image at scale 2. It can be observed that the adaptive process emphasizes the low-contrast edge information more, while avoiding overenhancement of high-contrast edge information. To obtain the processed image after denoising and contrast enhancement in the wavelet domain, two more steps are needed: first, a polar-to-Cartesian transformation to recover the horizontal and vertical wavelet coefficients from the magnitude and phase of the gradient vector; second, reconstruction (inverse two-dimensional DWT) from the modified wavelet coefficients.
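The mapping functions of this section can be sketched compactly. The function names are ours; the piecewise form of the global nonlinear mapping follows the slope-G / slope-1 description given above, and the gain limit L = 20 and the target M_{1,max} follow Equations 6.20 and 6.21.

```python
def nonlinear_global(m, k, t):
    """Global nonlinear mapping: gain k below the low-contrast
    threshold t, unit slope above it, continuous at m = t."""
    return k * m if m < t else m + (k - 1.0) * t

def limited_adaptive_gain(local_max, global_max, limit=20.0):
    """Equation 6.21: push local maxima toward the global target
    M1,max, capped by the local gain limit L (L = 20 in the text)."""
    return min(limit, global_max / local_max)

def adaptive_enhance(m, local_max, global_max, limit=20.0):
    """Equation 6.20: locally adaptive linear stretch (the g = 1 case
    of Equation 6.22)."""
    return limited_adaptive_gain(local_max, global_max, limit) * m
```

In a low-contrast region (small local maximum) the adaptive gain is large, but never larger than the limit, which is what prevents the unnatural-looking overenhancement discussed above.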
FIGURE 6.6 Example of mapping of gradient-magnitude coefficients at scale 2 corresponding to a mammographic region. (a) Original gradient-magnitude coefficients; (b) result of global mapping; (c) result of adaptive mapping.
6.3.5 IMPLEMENTATION

The method was implemented in Visual C++ 6.0 and integrated into a previously developed medical-image visualization tool [74, 75]. Redundant-wavelet-transform routines were taken from the C software package "Wave2" [76]. Software implementation of the method was simplified by exploiting an object-oriented C++ code framework for image processing that was established during the development of the above-mentioned tool. ROI tools, wavelet-coefficient display, and windowing operations were helpful during development and refinement of the method. In addition, the capability of the tool to execute scripts written in the standard Windows VBScript language enabled batch processing and measurements. The methods to which the proposed method is compared are also implemented and integrated in this tool. The computer used for processing has a P4 processor running at 1.5 GHz and 1 GB of RAM. Computation time for a 1400×2300 (100-µm resolution) DDSM image is 96 sec, and the average computation time for the image sample was 122 sec. The adaptive modification of wavelet coefficients accounts for 25% of the total computation time. For a 2800×4600 (50-µm resolution) image, the computation time scales by a factor of more than four (628 sec) because of the RAM size limitation. Computation time was kept as low as possible by exploiting interpolation methods and the speed offered by the C++ language. In addition, to overcome virtual-memory inefficiency on limited-RAM configurations, a memory manager was used to swap wavelet-coefficient images not currently being processed to hard disk. Further reductions in processing time could be achieved by using the lifting scheme to compute the RDWT [77], by exploiting techniques such as parallel processing and compiler optimization, and by employing faster computer systems.
6.3.6 TEST IMAGE DEMONSTRATION AND QUANTITATIVE EVALUATION

To demonstrate the effectiveness of the denoising and enhancement processes, a digital phantom was created. It contains five circular details, with contrasts ranging
FIGURE 6.7 Digital phantom with five circular objects of varying contrast and added noise.
FIGURE 6.8 Digital phantom scan-line profiles for global wavelet processing. (a) Original; (b) global denoising; (c) global nonlinear enhancement.

from 1 to 10%, added on a uniform background. Gaussian noise with a normalized standard deviation of 2% was also added; the resulting image is shown in Figure 6.7. A horizontal scan line passing through the middle of the objects was used to generate profiles of the signal and the multiscale gradient magnitudes. Figure 6.8 and Figure 6.9 show the original and processed profiles for global nonlinear and adaptive wavelet enhancement, respectively. The corresponding gradient-magnitude values are also shown to demonstrate the effect of processing on the magnitude wavelet coefficients. It can be observed that adaptive processing preserves the sharpness of the object edges while significantly enhancing the contrast of the lowest-contrast object. The aim of the quantitative evaluation is to measure the improvement of contrast for features of interest (i.e., MCs) with respect to their background [78]. However, to measure the contrast of features correctly, an exact definition of their borders is required. An approach to overcome this difficulty is to use simulated calcifications and lesions, enabling quantitative assessment of image-enhancement algorithms. This approach also allows the characteristics of the input features to be varied quantitatively, and thus the behavior of the enhancement to be analyzed. Mathematical models of MCs and masses have been previously used to construct simulated lesions [49, 79]. The simulated lesions are blended into normal mammograms, and a contrast-improvement index is derived for each type of lesion between the original and processed images. In this study, we follow a similar approach to quantify contrast enhancement of MCs. A set of phantoms of simulated MC clusters was constructed, based on the
FIGURE 6.9 Digital phantom scan-line profiles for adaptive wavelet processing. (a) Original; (b) adaptive denoising; (c) adaptive enhancement.

assumption of Strickland and Hahn [80] that MCs can be modeled as two-dimensional Gaussian functions. The input parameters for each cluster were the size and amplitude of the MCs, while the positions of individual MCs were randomly determined and then kept fixed. In this study, we used three MC sizes (400, 600, and 800 µm) and ten amplitudes (ranging linearly between 10 and 400 gray-level values). Simulated clusters were subsequently blended into two normal mammograms characterized by density 3 (heterogeneously dense breast) and density 4 (extremely dense breast), according to the Breast Imaging Reporting and Data System (BIRADS) lexicon. The resulting images were processed with the IW and AW methods using the same set of parameters for all images. The contrast of each MC in the cluster was then measured, and the average of the contrast values was taken as the cluster contrast. For the contrast measurements, we adopted the optical definition of contrast introduced by Morrow et al. [45]. The contrast C of an object is defined as

C = \frac{f - b}{f + b}   (6.23)

where f is the mean gray level of the object (foreground), and b is the mean gray level of its background, defined as a region surrounding the object. Figure 6.10 and Figure 6.11 show graphs of IW- and AW-processed cluster contrast vs. original cluster contrast (size 600 µm). It can be observed that both methods enhance MC contrast; however, the AW method is more effective, especially in the case of dense breast parenchyma. Similar results were obtained for the other MC cluster sizes studied. Figure 6.12a and Figure 6.13a present two examples of original ROIs containing simulated clusters superimposed on heterogeneously dense parenchyma and dense parenchyma, respectively. Figure 6.12b and Figure 6.13b present the resulting ROIs after IW processing, and Figure 6.12c and Figure 6.13c after AW processing.
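The simulation and measurement just described can be sketched as follows. This is a hedged illustration: the Gaussian MC model follows Strickland and Hahn as cited above, the contrast formula is Equation 6.23, and the function names and the choice of sigma are ours.

```python
import math

def gaussian_mc(size, sigma, amplitude):
    """2-D Gaussian model of a microcalcification (after Strickland
    and Hahn): amplitude at the center, decaying radially."""
    c = (size - 1) / 2.0
    return [[amplitude * math.exp(-((i - c) ** 2 + (j - c) ** 2) / (2 * sigma ** 2))
             for j in range(size)] for i in range(size)]

def contrast(foreground_mean, background_mean):
    """Equation 6.23: optical contrast C = (f - b) / (f + b)."""
    return (foreground_mean - background_mean) / (foreground_mean + background_mean)
```

Blending such a patch into a normal mammogram region and computing `contrast` before and after processing gives the per-MC values that are averaged into the cluster contrast.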
FIGURE 6.10 Contrast enhancement of a simulated MC cluster (600-µm size) superimposed on dense parenchyma of B-3617_1.RMLO of the DDSM database, for the IW and AW enhancement methods.
FIGURE 6.11 Contrast enhancement of a simulated MC cluster (600-µm size) superimposed on heterogeneously dense parenchyma of B-3009_1.LMLO of the DDSM database, for the IW and AW enhancement methods.
FIGURE 6.12 ROIs with simulated MCs (600-µm size, 230 gray-level amplitude) on heterogeneously dense parenchyma. (a) Original region; (b) result of processing with IW
enhancement method, and (c) result of processing with AW enhancement method.
FIGURE 6.13 ROIs with simulated MCs (600-µm size, 90 gray-level amplitude) on dense parenchyma. (a) Original region; (b) result of processing with the IW enhancement method; (c) result of processing with the AW enhancement method.

6.4 OBSERVER PERFORMANCE EVALUATION

The objective of the observer performance evaluation study is to validate the effectiveness of the spatially adaptive AW enhancement method and the manual IW method with respect to detection, morphology characterization, and pathology classification of MC clusters on dense breast parenchyma. IW was selected as representative of one of the most effective contrast-enhancement methods.

6.4.1 CASE SAMPLE

Our sample consists of 86 mammographic images, 32 of density 3 and 54 of density 4 according to the BIRADS lexicon. Specifically, the sample consists of 43 mammographic images, each containing a cluster of MCs (29 malignant and 14 benign), and 43 images without abnormalities (normal), originating from the DDSM mammographic database [64]. Concerning MC cluster morphology, of the 29 malignant and 14 benign clusters, 2 and 4 are punctate, 24 and 5 are pleomorphic (granular),
TABLE 6.2
Volume, Density, Morphology, and Pathology of Each Microcalcification Cluster for the 86 Mammographic Images (43 abnormal and 43 normal)

Images with Cluster (abnormal)

A/A   Volume      Mammogram        D^a   M^b   P^c
1     Cancer 01   B-3005_1.LCC     3     3     M
2     Cancer 01   B-3009_1.RCC     3     2     M
3     Cancer 01   B-3009_1.RMLO    3     2     M
4     Cancer 06   A-1113_1.RCC     4     2     M
5     Cancer 06   A-1113_1.RMLO    4     2     M
6     Cancer 06   A-1152_1.LCC     4     2     M
7     Cancer 06   A-1152_1.LMLO    4     2     M
8     Cancer 06   A-1185_1.RCC     4     2     M
9     Cancer 06   A-1188_1.RMLO    4     2     M
10    Cancer 07   A-1220_1.RCC     4     2     M
11    Cancer 07   A-1238_1.LCC     4     2     M
12    Cancer 07   A-1238_1.LMLO    4     2     M
13    Cancer 08   A-1508_1.RCC     3     2     M
14    Cancer 08   A-1517_1.RCC     4     2     M
15    Cancer 08   A-1517_1.RMLO    4     2     M
16    Cancer 12   D-4110_1.RCC     3     3     M
17    Cancer 12   D-4110_1.RMLO    3     3     M
18    Cancer 12   D-4158_1.LCC     4     2     M
19    Cancer 12   D-4158_1.LMLO    4     2     M
20    Cancer 14   A-1897_1.LCC     4     2     M
21    Cancer 14   A-1897_1.LMLO    4     2     M
22    Cancer 14   A-1905_1.LCC     3     2     M
23    Cancer 14   A-1905_1.LMLO    3     2     M
24    Cancer 14   A-1930_1.LMLO    4     2     M
25    Cancer 15   B-3002_1.LCC     4     2     M
26    Cancer 15   B-3440_1.RCC     4     1     M
27    Cancer 15   B-3440_1.RMLO    4     1     M
28    Cancer 15   B-3509_1.LCC     3     2     M
29    Cancer 15   B-3510_1.LCC     3     2     M
30    Cancer 01   B-3030_1.RMLO    3     2     B
31    Benign 04   B-3120_1.RCC     4     1     B
32    Benign 04   B-3120_1.RMLO    4     1     B
33    Benign 04   B-3363_1.RCC     4     2     B
34    Benign 04   C-300_1.LCC      3     2     B
35    Benign 06   B-3418_1.LCC     3     3     B
36    Benign 06   B-3418_1.LMLO    3     3     B
37    Benign 06   B-3419_1.RCC     4     2     B
38    Benign 06   B-3422_1.RCC     3     2     B
39    Benign 06   B-3423_1.LCC     3     1     B
40    Benign 06   B-3425_1.LCC     4     4     B
41    Benign 06   B-3425_1.LMLO    4     4     B
42    Benign 06   B-3426_1.LCC     3     1     B
43    Benign 06   C-407_1.RMLO     3     3     B

Images without Cluster (normal)

A/A   Volume      Mammogram        D^a
1     Cancer 01   B-3005_1.RCC     3
2     Cancer 01   B-3009_1.LCC     3
3     Cancer 01   B-3009_1.LMLO    3
4     Cancer 06   A-1113_1.LCC     4
5     Cancer 06   A-1113_1.LMLO    4
6     Cancer 06   A-1152_1.RMLO    4
7     Cancer 06   A-1185_1.LMLO    4
8     Cancer 06   A-1188_1.LMLO    4
9     Cancer 07   A-1220_1.LCC     4
10    Cancer 07   A-1238_1.RCC     4
11    Cancer 08   A-1508_1.LCC     3
12    Cancer 08   A-1517_1.LCC     4
13    Cancer 08   A-1517_1.LMLO    4
14    Cancer 12   D-4110_1.LCC     3
15    Cancer 12   D-4110_1.LMLO    3
16    Cancer 12   D-4158_1.RMLO    4
17    Cancer 14   A-1897_1.RCC     4
18    Cancer 14   A-1897_1.RMLO    4
19    Cancer 14   A-1905_1.RMLO    3
20    Cancer 14   A-1930_1.RMLO    4
21    Cancer 15   B-3509_1.RCC     3
22    Cancer 15   B-3440_1.LCC     4
23    Cancer 15   B-3440_1.LMLO    4
24    Benign 04   B-3120_1.LCC     4
25    Benign 04   B-3120_1.LMLO    4
26    Benign 04   B-3363_1.LCC     4
27    Benign 04   C-300_1.RCC      3
28    Benign 06   B-3418_1.RCC     3
29    Benign 06   B-3418_1.RMLO    3
30    Benign 06   B-3419_1.LCC     4
31    Benign 06   B-3422_1.LCC     3
32    Benign 06   B-3423_1.RCC     3
33    Benign 06   B-3425_1.RCC     4
34    Benign 06   B-3425_1.RMLO    4
35    Benign 06   B-3426_1.RCC     3
36    Normal 07   D-4506_1.RCC     4
37    Normal 07   D-4522_1.LCC     4
38    Normal 07   D-4582_1.RCC     4
39    Normal 07   D-4591_1.LCC     4
40    Normal 09   B-3606_1.RCC     4
41    Normal 09   B-3614_1.LMLO    4
42    Normal 09   B-3617_1.RMLO    4
43    Normal 09   B-3653_1.RMLO    4

^a Density: 3 = heterogeneously dense breast; 4 = extremely dense breast.
^b Morphology: 1 = punctate; 2 = pleomorphic (granular); 3 = amorphous; 4 = fine linear branching (casting).
^c Pathology: B = benign; M = malignant.
3 and 3 are amorphous, and 0 and 2 are fine linear branching (casting), respectively, according to the DDSM database. The mammographic images selected were digitized with either a Lumisys or a Howtek scanner, at 12-bit pixel depth, with spatial resolutions of 50 µm and 43.5 µm, respectively. The images were subsampled to 100 µm to overcome restrictions in RAM and processing time. Table 6.2 provides the volume, name, and density of each mammographic image of the sample, as given by the DDSM database, for both groups (images with MC clusters and normal ones), as well as the MC cluster morphology and pathology (malignant or benign). The entire sample (86 mammographic images) was processed with the two image contrast-enhancement methods, manual IW and AW.

6.4.2 OBSERVER PERFORMANCE

Two general-purpose display LCD monitors (FlexScan L985EX, EIZO NANAO Corp., Ishikawa, Japan) were used for the observer performance study. Specifically, one monitor
was used for the presentation of each original mammogram of the sample, and the other for the presentation of the corresponding IW- or AW-processed version. The two mammograms (original plus IW-processed, or original plus AW-processed) were presented to the radiologists simultaneously. The evaluation study was performed with a medical-image visualization software tool developed in our department [74, 75]. The sample (original-plus-IW-processed images as well as original-plus-AW-processed images) was presented to two experienced, qualified radiologists specialized in mammography, in a different, random order. The radiologists were asked to perform three tasks. First, they rated their interpretation of the absence or presence of an MC cluster in the mammographic image (detection task) using a five-point rating (R) scale: 1 = definite presence of calcification clusters; 2 = probable presence; 3 = cannot determine; 4 = probable absence; 5 = definite absence. Second, in the case of the presence of a cluster (a rating of 1 or 2 in the detection task), they were asked to assess its morphology, according to the BIRADS lexicon, using one of four categories: 1 = punctate; 2 = pleomorphic (granular); 3 = amorphous; 4 = fine linear branching (casting). The third task for each radiologist was to classify each MC cluster with respect to pathology, benign or malignant (classification task), according to the BIRADS lexicon, again using five levels (L) of confidence: level 1, definitely benign; level 2, probably benign, suggest short-interval follow-up; level 3, cannot determine, additional diagnostic workup required; level 4, probably malignant, biopsy should be considered; level 5, definitely malignant, biopsy is necessary. The radiologists were informed of the composition of the sample (half images with clusters, half normal). Missed detections, namely cases with calcification clusters rated 5, 4, or 3 in the detection process, were not classified in the classification process.
During reading, the room illumination was dimmed and kept constant, while reading time and radiologist-to-monitor distance were not restricted.

6.4.3 STATISTICAL ANALYSIS

In this study, three (null) hypotheses were tested:
1. There is no difference between the method based on original-plus-IW-processed images and the method based on original-plus-AW-processed images with respect to the MC detection (presence or absence) task.
2. There is no difference between the two above-mentioned methods with respect to the MC morphology-characterization task.
3. There is no difference between the two methods with respect to the MC pathology-classification (benign or malignant) task.

Two statistical analyses were used: one based on a nonparametric statistical test (the Wilcoxon signed-ranks test) and the other based on ROC curves. Both are briefly described in the following subsections.

6.4.3.1 Wilcoxon Signed Ranks Test

The Wilcoxon signed-ranks test [81] was selected for the analysis of the radiologists' responses because the data can be transformed into difference scores from two
related subsamples. (One subsample consists of the original-plus-IW-processed images and the other of the original-plus-AW-processed images.) For each case of the sample (abnormal or normal original mammogram), two responses are provided: one from the original-plus-IW-processed image (RIW) and one from the original-plus-AW-processed image (RAW). The difference in responses for each case is then D = RIW − RAW (paired data). These differences indicate which member of a pair is "greater than" the other (i.e., the sign of the difference) and can also be ranked in order of absolute size. Pairs with D = 0 are not taken into account in the statistical analysis. This test is used for the statistical analysis of the radiologists' responses in the MC-cluster detection task as well as the pathology-classification task, because these two tasks employ two related subsamples and yield difference scores that can be ranked in order of absolute magnitude.
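The mechanics of the signed-ranks computation on paired ratings can be sketched in pure Python. This is illustrative only: a real analysis would use a statistics package, and the conversion of the rank sum to a p-value is omitted.

```python
def signed_rank_statistic(riw, raw):
    """Wilcoxon signed-ranks mechanics: form D = RIW - RAW, drop zero
    differences, rank the absolute differences (average ranks for
    ties), and return W+ = sum of ranks with D > 0, plus the number
    of retained pairs."""
    d = [a - b for a, b in zip(riw, raw) if a != b]   # D = 0 pairs dropped
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1                                    # extend the tie group
        avg = (i + j) / 2.0 + 1.0                     # average rank of the group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    return w_plus, len(d)
```

For example, ratings RIW = [3, 2, 4, 1] and RAW = [1, 2, 2, 2] give differences [2, 0, 2, −1]; the zero pair is discarded, the two tied |D| = 2 values share the average rank 2.5, and W+ = 5.0 over the 3 remaining pairs.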
Derived values of p < 0.05 indicate superiority of one contrast-enhancement method over the other, whereas values of p > 0.05 indicate no difference between the two methods.

6.4.4 RESULTS

6.4.4.1 Detection Task

Table 6.3 shows the number of cases per category of difference in rating (R) between the IW and the AW methods (D = RIW − RAW) for the MC cluster detection task. In this table, we observe that most differences are zero (D = 0) and that the number of "positive" differences (D > 0) is almost equal to the number of "negative" differences (D < 0) for both abnormal and normal mammographic images for each of the two radiologists. More analytically, the distribution of the differences between the two methods for abnormal and normal images for both radiologists is provided in Table 6.4. Concerning abnormal images, the frequency of positive differences (D > 0) represents the number of images where the AW method is superior to the IW method. The frequency of negative differences (D < 0) represents the number of images where
Locally adaptive wavelet contrast enhancement
247
TABLE 6.3 Frequency of Abnormal and Normal Images with Respect to Differences in Ratings (D = RIW − RAW) between the Original-Plus-IW and the Original-Plus-AW Methods in the Microcalcification-Cluster Detection Task for Two Radiologists

                        Radiologist A                Radiologist B
D = RIW − RAW    Abnormal  Normal  Total      Abnormal  Normal  Total
D > 0                   6       9     15             7       7     14
D = 0                  31      22     53            29      27     56
D < 0                   6      12     18             7       9     16
Total                  43      43     86            43      43     86
TABLE 6.4 Frequency of Differences in Ratings (D = RIW − RAW) between the Original-Plus-IW and the Original-Plus-AW Methods in the Microcalcification-Cluster Detection Task for Abnormal and Normal Mammographic Images Analyzed by Two Radiologists

Abnormal images
                        Radiologist A       Radiologist B
D = RIW − RAW          D > 0   D < 0       D > 0   D < 0
D = ±1    2→1              1       5           4       4
          3→2              2       0           0       1
          4→3              0       0           0       0
          5→4              0       0           0       0
D = ±2    3→1              1       0           1       1
          4→2              0       0           0       0
          5→3              0       0           0       0
D = ±3    4→1              1       1           1       1
          5→2              0       0           1       0
D = ±4    5→1              1       0           0       0
Total                      6       6           7       7

Normal images
                        Radiologist A       Radiologist B
D = RIW − RAW          D < 0   D > 0       D < 0   D > 0
D = ±1    1→2              0       1           0       0
          2→3              1       0           1       0
          3→4              3       3           3       3
          4→5              3       3           1       2
D = ±2    1→3              0       0           0       0
          2→4              1       0           2       0
          3→5              2       2           0       2
D = ±3    1→4              1       0           1       0
          2→5              0       0           1       0
D = ±4    1→5              1       0           0       0
Total                     12       9           9       7

Note: D = ±i (with i = 1 to 4) means that the difference is ±i. For abnormal images, the frequency of positive differences (D > 0) represents the number of images in which the AW method is superior to the IW method; the opposite holds for negative differences (D < 0). However, for normal images, the frequency of negative differences (D < 0) represents the number of images in which the AW method is superior to the IW method; the opposite holds for positive differences (D > 0).
the IW method is superior to the AW method. The total number of positive differences (D > 0) is equal to the total number of negative differences (D < 0) for both radiologists. Specifically, 50% (6/12) and 57% (8/14) of the differences involved ratings 2 (probable presence) and 1 (definite presence) for radiologists A and B, respectively. The only noteworthy difference between the two methods is that both radiologists detected one more MC cluster by means of the AW method: the rating of both radiologists was 5 (definite absence) with the IW method, whereas their corresponding ratings were 1 (definite presence) and 2 (probable presence), respectively, with the AW method. Concerning normal images, the frequency of negative differences (D < 0) represents the number of images where the AW method is superior to the IW method, and the opposite occurs for positive differences (D > 0). The total number of negative differences (D < 0) is almost equal to the total number of positive differences (D > 0) for both radiologists. Specifically,
TABLE 6.5 Results of Wilcoxon Statistical Test for Microcalcification-Cluster Detection Task for Two Radiologists

                           Radiologist A                        Radiologist B
Mammographic Images    Na    SIb       p-valuec   SDd       Na    SIb       p-valuec   SDd
Abnormal               12    T+: 45    0.6772     NS        14    T+: 56.5  0.8316     NS
Normal                 21    z: 1.01   0.3124     NS        16    z: 0.80   0.4238     NS
Entire sample          33    z: 1.10   0.2714     NS        30    z: 0.76   0.4472     NS

a N: number of mammographic images.
b SI: statistical index. The statistical index T+ (sum of positive ranks) was used for small samples (abnormal images), and the statistical index z, which assumes a normal distribution, was used for large samples (normal images and entire sample).
c p-value: probability values were calculated for two-tailed statistical tests.
d SD: statistical differences are not significant (NS).
67% (14/21) and 63% (10/16) of the differences were obtained with a one-rating difference (D = ±1) for radiologists A and B, respectively. The only noteworthy difference between the two methods is that both radiologists correctly rated two more normal images by means of the AW method. Specifically, the ratings of radiologist A for these two images were 1 (definite presence) with the IW method, and the corresponding ratings were 4 (probable absence) and 5 (definite absence), respectively, with the AW method. The ratings of radiologist B for the two images were 1 (definite presence) and 2 (probable presence) with the IW method, and the corresponding ratings were 4 (probable absence) and 5 (definite absence), respectively, with the AW method. The statistical results using the Wilcoxon signed ranks test for paired data are presented in Table 6.5. Statistical analysis was performed for abnormal and normal mammographic images, as well as for the entire sample (abnormal plus normal images), for both radiologists. The differences are not statistically significant (p > 0.05), indicating that the two contrast-enhancement methods, IW and AW, are equivalent with respect to detection performance for MC clusters. In other words, use of the IW-processed images with the original ones aids in the detection of MC clusters in a similar way as the AW-processed images do. In Table 6.6, Az values, SEs, and asymmetric 95% CIs for both the original-plus-IW-processed and the original-plus-AW-processed images are provided for each radiologist. Combining the radiologists' responses, the Az values are 0.938 and 0.981, and their corresponding CIs are (0.892, 0.968) and (0.956, 0.992) for the original-plus-IW-processed and the original-plus-AW-processed mammograms, respectively. Conventional binormal ROC curves for pooled data are presented in Figure 6.14.
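The T+ and z indices reported in Table 6.5 can be reproduced with a few lines of code. The sketch below (pure Python, with illustrative paired ratings rather than the study's data) discards zero differences, assigns mid-ranks to ties, sums the ranks of the positive differences to obtain T+, and forms the large-sample normal-approximation z; the tie correction to the variance is omitted for brevity:

```python
# Wilcoxon signed-ranks statistic for paired ratings, as in Table 6.5.
# T+ = sum of ranks of positive differences; z uses the large-sample
# normal approximation (tie correction omitted for brevity).
from math import sqrt

def wilcoxon_t_plus_and_z(d):
    d = [x for x in d if x != 0]            # pairs with D = 0 are dropped
    n = len(d)
    ordered = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                            # assign mid-ranks to ties
        j = i
        while j + 1 < n and abs(d[ordered[j + 1]]) == abs(d[ordered[i]]):
            j += 1
        mid = (i + j) / 2 + 1               # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = mid
        i = j + 1
    t_plus = sum(r for x, r in zip(d, ranks) if x > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (t_plus - mean) / sqrt(var)
    return t_plus, z

# Hypothetical differences D = RIW - RAW, NOT the study's data:
print(wilcoxon_t_plus_and_z([1, -1, 2, 0, 3, -2]))  # T+ = 10.0, z ~ 0.67
```

The exact p-values of Table 6.5 additionally require the exact null distribution of T+ (or the tie-corrected variance); SciPy's scipy.stats.wilcoxon provides both.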
Although detection performance for the method based on the original-plus-AW-processed images is higher than that for the method based on the original-plus-IW-processed images, the differences in Az values are not statistically significant for each radiologist independently, as well as for the two radiologists as a whole
TABLE 6.6 Statistical Results for the Original-Plus-IW-Processed and the Original-Plus-AW-Processed Images for Two Radiologists Independently and Collectively (pooled data)

               Original Plus Intensity Windowing        Original Plus Adaptive Wavelet
Radiologist    Aza     SEb     CIc                      Aza     SEb     CIc
A              0.936   0.028   (0.862, 0.975)           0.984   0.011   (0.947, 0.996)
B              0.941   0.026   (0.871, 0.977)           0.978   0.013   (0.935, 0.994)
Pooled         0.938   0.019   (0.892, 0.968)           0.981   0.009   (0.956, 0.992)

a Az: area under the ROC (receiver operating characteristic) curve.
b SE: standard error.
c CI: asymmetric 95% confidence interval.
FIGURE 6.14 ROC curves for the MC cluster-detection task, combining the responses of the two radiologists (pooled data) for both the original-plus-AW-processed (Az = 0.981) and the original-plus-IW-processed (Az = 0.938) mammographic images.

(Student's t-test, p > 0.05), indicating that the two contrast-enhancement methods have high (>0.90) but similar detection performance. These differences in Az values derive from the small differences in the radiologists' ratings between the two methods for abnormal and normal mammographic images, as discussed above. Representative examples of original, IW-processed, and AW-processed images containing MC clusters are presented in Figure 6.15 and Figure 6.16. Figure 6.15a and Figure 6.16a present two CC-view mammographic images, corresponding to extremely dense breasts, with arrows indicating MC clusters. Figure 6.15b, Figure 6.16b and Figure 6.15c, Figure 6.16c present the two mammographic images after processing with the IW- and AW-enhancement methods, respectively. Magnified ROIs indicating the MC clusters for the original (Figure 6.15d, Figure 6.16d), IW-processed (Figure 6.15e, Figure 6.16e), and AW-processed images (Figure 6.15f, Figure 6.16f) are also provided. In Figure 6.15, the MC cluster was not detected with the method based on original-plus-IW-processed images because the ratings were 3 (cannot determine) and 5 (definite absence) for the two
radiologists. With the method based on original-plus-AW-processed images, the two radiologists were certain of the presence of the MC cluster (rating 1 for both). In Figure 6.16, the MC cluster was detected with both methods, but the confidence level for the MC cluster's presence was higher in the original-plus-AW-processed image for both radiologists, because the ratings were 2 (probable presence) for the original-plus-IW-processed image and 1 (definite presence) for the original-plus-AW-processed image.

6.4.4.2 Morphology Characterization Task

Responses of each of the two radiologists with respect to morphology characterization of the 43 MC clusters of the sample were compared with those of the BIRADS lexicon for each of the two methods. Table 6.7 shows the number and percentage of true responses in cluster-morphology characterization for each contrast-enhancement method. The average morphology characterization is higher (37/43) with the method based on original-plus-AW-processed images than with the method based on original-plus-IW-processed images (29/43). The average increase in cluster-morphology characterization achieved is 18.6%, indicating that the morphology of the MC clusters can be characterized more accurately with the AW method. In Table 6.8, frequencies of shiftings in morphology characterization of MC clusters are provided for both radiologists. Specifically, the frequency of shifting in the radiologists' responses from the AW to the IW method is listed in the column named TIW (true intensity windowing). For instance, the frequency of the 3→2 shifting is 2 and 0 for radiologists A and B, respectively. Conversely, the frequency of shifting in the radiologists' responses from the IW to the AW method is listed in the column named TAW (true adaptive wavelet). For instance, the frequency of the 3→2 shifting is 5 and 4 for radiologists A and B, respectively.
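The shifting totals in Table 6.8 (e.g., 6 true-IW vs. 13 true-AW shiftings for radiologist A) are paired win/loss counts of the kind the study tests with a two-tailed sign test. A minimal sketch of that test under a fair-coin null, using only the standard library:

```python
# Two-tailed sign test: under H0 each of the n shiftings is equally
# likely to favor either method, so the count favoring one method is
# Binomial(n, 0.5).
from math import comb

def sign_test_two_tailed(wins, losses):
    n = wins + losses
    k = max(wins, losses)
    # P(X >= k) for X ~ Binomial(n, 0.5), doubled for a two-tailed test
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Radiologist A's shifting totals from Table 6.8: 6 TIW vs. 13 TAW
print(round(sign_test_two_tailed(13, 6), 3))  # prints 0.167, i.e., p > 0.05
```

This is consistent with the chapter's finding that the morphology improvement, although sizable, does not reach statistical significance at p < 0.05.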
The total number of shiftings in morphology characterization is 19 (44%) and 21 (49%) for radiologists A and B, respectively. Although the number of true shiftings is higher (i.e., morphology characterization is more accurate) with the method based on original-plus-AW-processed images for both radiologists, there is no statistically significant difference between the two methods (two-tailed sign test, p > 0.05).

6.4.4.3 Pathology Classification Task

In this task, the radiologists' responses were analyzed by means of confidence levels. For the 14 benign MC clusters of the case sample, responses were considered true when the radiologists' confidence levels were 1 or 2. On the other hand, for the 29
FIGURE 6.15
(a) Original mammogram (B3419_1.RCC) containing an MC cluster (arrow); (b, c) results of processing with the IW- and AW-enhancement methods, respectively; (d-f) magnified ROIs containing the MC cluster for the original, IW-processed, and AW-processed image, respectively. The MC cluster was detected only with the method based on the original-plus-AW-processed image.
FIGURE 6.16
TABLE 6.7 Individual and Averaged Number (N) and Percentage of True Responses in Morphology Characterization of 43 Microcalcification Clusters for Two Radiologists Using the Original-Plus-IW and Original-Plus-AW Methods

                                     Radiologist A      Radiologist B      Average
Enhancement Method                   N    Percentage    N    Percentage    N    Percentage
Original plus intensity windowing    30   69.8          28   65.1          29   67.4
Original plus adaptive wavelet       37   86.0          37   86.0          37   86.0
TABLE 6.8 Frequency of Shiftings in Morphology Characterization of Microcalcification Clusters for TIW and TAW Methods as Applied by Two Radiologists

                                           Radiologist A      Radiologist B
Morphology                  Shifting       TIW    TAW         TIW    TAW
1: Punctate                 2→1             0      0           0      1
2: Pleomorphic (granular)   0→2             1      4           3      3
                            1→2             0      1           0      2
                            3→2             2      5           0      4
                            4→2             0      0           1      2
3: Amorphous                1→3             0      0           0      2
                            2→3             1      2           1      0
                            4→3             0      1           0      0
4: Fine linear branching    1→4             1      0           1      0
   (casting)                3→4             1      0           0      1
Total                                       6     13           6     15

Note: TIW (true intensity windowing) shows the frequency of shiftings from the adaptive wavelet method to the intensity windowing one; TAW (true adaptive wavelet) shows the frequency of shiftings from the intensity windowing method to the adaptive wavelet one.
TABLE 6.9 Individual and Averaged Percentage of True Responses in Pathology Classification (benign or malignant) of the 43 Microcalcification Clusters for Two Radiologists Using Two Contrast-Enhancement Methods

Enhancement Method                   Radiologist A    Radiologist B    Average
Original plus intensity windowing    62.8             74.4             69.8
Original plus adaptive wavelet       65.1             76.7             72.1
Original DDSM assessmenta            —                —                62.8

a Corresponding percentage derived from radiologists' assessments provided by the Digital Database for Screening Mammography (DDSM).
TABLE 6.10 Frequency of Benign and Malignant Microcalcification Clusters of the Case Sample with Respect to Difference in Confidence Levels (D = LIW − LAW) between the Original-Plus-IW and the Original-Plus-AW Methods in the Microcalcification-Cluster Classification Task for Two Radiologists

                 Benign Cluster                       Malignant Cluster
D = LIW − LAW    Radiologist A   Radiologist B        Radiologist A   Radiologist B
D > 0                 5               4                    7               7
D = 0                 7               3                   12              15
D < 0                 1               4                    7               5
Total                13              11                   26              27

Note: Missed detections of benign or malignant microcalcification clusters are not taken into account.
malignant MC clusters of the case sample, responses were considered true when confidence levels were 3, 4, or 5. All other cases were considered misclassification responses. Table 6.9 shows the percentages of true responses in pathology classification of the 43 MC clusters of the sample for each radiologist, as well as their average, for both contrast-enhancement methods. The average pathology classification of MC clusters is almost equal for the two methods: specifically, 72.1% (31/43) with the method based on original-plus-AW-processed images and 69.8% (30/43) with the method based on original-plus-IW-processed images. Similar accuracy of MC cluster classification is thus achieved with both contrast-enhancement methods. These classification accuracies are low, but they are higher than that derived from the radiologists' assessments (62.8%) provided by the DDSM database. Table 6.10 shows the number of cases per category of difference in confidence levels (L) between the original-plus-IW and the original-plus-AW methods (D = LIW − LAW). In this table, we observe that most differences are zero (D = 0) and that the number of positive differences (D > 0) is almost equal to the number of negative differences (D < 0) for both benign and malignant MC clusters for each of the two radiologists. For the cases with no difference in confidence levels (D = 0), Table 6.11 shows the numbers of correctly classified (true) and misclassified (false) cases for benign and malignant clusters of the sample for both radiologists. For benign clusters, the true-classification rate is rather low: 43% for radiologist A and 33% for radiologist B. For malignant clusters, the true-classification rate is quite high: 100% for radiologist A and 87% for radiologist B. As a result, both contrast-enhancement methods misclassified the same clusters, especially benign ones.
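The truth rule just described (confidence levels 1-2 count as correct for benign clusters, levels 3-5 for malignant) is straightforward to encode. The sketch below applies it to hypothetical responses, not the study's data:

```python
# Score pathology-classification responses on the 5-level confidence
# scale: levels 1-2 mean "benign", levels 3-5 mean "malignant".

def true_response(pathology, level):
    """True if the confidence level agrees with the known pathology."""
    return level <= 2 if pathology == "benign" else level >= 3

def accuracy(cases):
    """cases: list of (pathology, confidence_level) pairs."""
    correct = sum(true_response(p, l) for p, l in cases)
    return correct / len(cases)

# Hypothetical reading session, NOT the study's data:
cases = [("benign", 2), ("benign", 4), ("malignant", 5), ("malignant", 3)]
print(accuracy(cases))  # 3 of 4 correct -> 0.75
```

Applying this rule to the study's responses yields the per-radiologist percentages collected in Table 6.9.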
TABLE 6.11 Frequency of Correct (true) and Incorrect (false) Responses in Classification of Benign and Malignant Clusters in Cases Where There Were No Differences in Confidence Levels (D = 0) between the Original-Plus-IW and the Original-Plus-AW Methods as Applied by Two Radiologists

                           Benign Cluster                      Malignant Cluster
Pathology Classification   Radiologist A   Radiologist B       Radiologist A   Radiologist B
True                            3               1                   12              13
False                           4               2                    0               2
Total                           7               3                   12              15
Table 6.12 shows the distribution of the differences between the two methods for benign and malignant clusters for both radiologists. Concerning benign clusters, the frequency of positive differences (D > 0) represents the number of clusters where the AW method is superior to the IW method; the frequency of negative differences (D < 0) represents the number of clusters where the IW method is superior to the AW method. The total number of positive differences (D > 0) is higher than the total number of negative differences (D < 0) for radiologist A and equal for radiologist B. Specifically, 83% (5/6) and 62% (5/8) of the differences were obtained from a one-level difference (D = ±1) for radiologist A and a two-level difference (D = ±2) for radiologist B, respectively. The only noteworthy difference between the two methods is that both radiologists accurately classified two more benign MC clusters by means of the AW method. Specifically, the confidence levels of radiologist A for these two clusters were 3 (additional diagnostic workup is required) and 5 (definitely malignant) with the IW method, and the corresponding confidence level was 2 (probably benign) for both clusters with the AW method. The confidence levels of radiologist B for the two clusters were 4 (probably malignant) and 5 (definitely malignant) with the IW method, and the corresponding confidence level was 2 (probably benign) for both clusters with the AW method. Concerning malignant clusters, the frequency of negative differences (D < 0) represents the number of clusters where the AW method is superior to the IW method, and the opposite occurs for positive differences (D > 0). The total number of negative differences (D < 0) is lower than the total number of positive differences (D > 0) for radiologist B and equal for radiologist A. Specifically, 64% (9/14) and 50% (6/12) of the differences were obtained from a one-level difference (D = ±1) for radiologist A and a two-level difference (D = ±2) for radiologist B, respectively.
There is no significant difference between the two methods with respect to malignant clusters.
TABLE 6.12 Frequency of Differences in Confidence Levels (D = LIW − LAW) between the Original-Plus-IW and the Original-Plus-AW Methods in the Microcalcification-Classification Task for Benign and Malignant Clusters of the Case Sample for Two Radiologists

Benign clusters
                        Radiologist A       Radiologist B
D = LIW − LAW          D > 0   D < 0       D > 0   D < 0
D = ±1    2→1              0       0           0       0
          3→2              1       0           0       0
          4→3              0       0           0       0
          5→4              3       1           0       1
D = ±2    4→2              0       0           3       2
          5→3              0       0           0       0
D = ±3    4→1              0       0           0       0
          5→2              1       0           1       0
D = ±4    5→1              0       0           0       1
Total                      5       1           4       4

Malignant clusters
                        Radiologist A       Radiologist B
D = LIW − LAW          D < 0   D > 0       D < 0   D > 0
D = ±1    1→2              0       1           0       0
          2→3              1       0           0       0
          3→4              2       0           1       0
          4→5              3       2           0       2
D = ±2    2→4              1       4           2       3
          3→5              0       0           1       0
D = ±3    1→4              0       0           0       1
          2→5              0       0           1       1
D = ±4    1→5              0       0           0       0
Total                      7       7           5       7

Note: D = ±i (with i = 1 to 4) means that the difference is ±i. For benign clusters, the frequency of positive differences (D > 0) represents the number of clusters for which the AW method is superior to the IW method; the opposite holds for negative differences (D < 0). However, for malignant clusters, the frequency of negative differences (D < 0) represents the number of clusters for which the AW method is superior to the IW method; the opposite holds for positive differences (D > 0).
The statistical results using the Wilcoxon signed ranks test for paired data are presented in Table 6.13. Statistical analysis was performed for benign and malignant clusters, as well as for the entire abnormal sample (benign plus malignant clusters), for both radiologists. The differences are not statistically significant (p > 0.05), indicating that the two contrast-enhancement methods are equivalent with respect to pathology-classification performance for MC clusters. In other words, use of the IW-processed images with the original ones aids the pathology classification (benign vs. malignant) of MC clusters in a similar way as the AW-processed images do. Representative examples of original, IW-processed, and AW-processed ROIs of mammographic images containing MC clusters (arrows) are presented in Figure 6.17
TABLE 6.13 Results of Wilcoxon Statistical Test for Pathology-Classification Task for Two Radiologists

                       Radiologist A                        Radiologist B
                   Na    SIb        p-valuec   SDd      Na    SIb        p-valuec   SDd
Benign              6    T+: 18     0.1562     NS        8    T+: 20     0.8438     NS
Malignant          14    T+: 42     0.5416     NS       12    T+: 32.5   0.7054     NS
Entire sample      20    z: 0.26    0.7948     NS       20    z: 0.28    0.7794     NS

a N: number of mammographic images.
b SI: statistical index. The statistical index T+ (sum of positive ranks) was used for small samples (benign and malignant clusters), and the statistical index z, which assumes a normal distribution, was used for large samples (entire sample).
c p-value: probability values were calculated for two-tailed statistical tests.
d SD: statistical differences are not significant (NS).
and Figure 6.18. In Figure 6.17, the benign MC cluster was misclassified by both radiologists with the method based on original-plus-IW-processed images (confidence level 5: definitely malignant). With the method based on original-plus-AW-processed images, the two radiologists rated the MC cluster with confidence level 2 (probably benign). In Figure 6.18, the MC cluster was correctly classified as malignant with both methods (confidence level 4 for each radiologist), but it was incorrectly classified as probably benign (short-interval follow-up) by the radiologists' assessment provided by the DDSM database.

6.5 DISCUSSION

Multiscale wavelet processing is one of the most promising approaches to mammographic image enhancement. The spatially adaptive wavelet (AW) enhancement method attempts to optimize medical-image contrast by locally adaptive transformation of gradient-magnitude values obtained with the redundant wavelet transform of Mallat and Zhong. The method is generic and can also be applied to other medical images or
imaging modalities. Emphasis in this work is directed toward finding the best way to treat wavelet coefficients. However, the identification of the most appropriate basis functions for enhancing specific types of mammographic features needs further investigation. Denoising performance, and specifically local threshold estimation, could benefit from recent advances in the wavelet-denoising field, such as context modeling to group coefficients according to their activity level and to estimate the local standard deviation of the signal from coefficients belonging to the same context group, as proposed by Chang et al. [90]. In addition, a noise-equalization preprocessing step would be beneficial, because mammographic noise often depends on the gray level and signal activity [91], especially in the case of certain digitizers [92, 93].
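To make the denoising idea concrete, the sketch below soft-thresholds detail coefficients using a noise level estimated from the coefficients themselves (median absolute deviation) and a universal threshold. A single-level Haar decomposition of a short 1-D signal stands in here for the redundant Mallat-Zhong transform of a mammogram, so this is only an illustrative simplification, not the AW method itself:

```python
# Wavelet soft-threshold denoising sketch (1-D, one Haar level,
# even-length input). The noise level is estimated from the detail
# coefficients via the median absolute deviation (MAD), as is standard
# in wavelet denoising.
from math import log, sqrt
from statistics import median

def haar_denoise(signal):
    # One-level Haar analysis: pairwise averages and half-differences.
    approx = [(a + b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    # Robust noise estimate and universal threshold sqrt(2 ln n) * sigma.
    sigma = median(abs(d) for d in detail) / 0.6745
    thr = sqrt(2 * log(len(signal))) * sigma
    # Soft thresholding: kill sub-threshold details, shrink the rest.
    detail = [0.0 if abs(d) <= thr else d - thr * (1 if d > 0 else -1)
              for d in detail]
    # Synthesis: invert the Haar step.
    out = []
    for m, d in zip(approx, detail):
        out += [m + d, m - d]
    return out

# Small-amplitude wiggles on a piecewise-constant signal are removed:
print(haar_denoise([5.0, 5.2, 5.0, 4.8]))
```

In the AW method proper, thresholds are estimated locally per subband of the redundant transform rather than once globally as here, and an adaptive gain stage follows the denoising step.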
FIGURE 6.17 Magnified ROIs containing a benign MC cluster from (a) the original, (b) the IW-processed, and (c) the AW-processed mammographic image (B-3425.1_LCC). The MC cluster was correctly classified only with the method based on the original-plus-AW-processed image. The radiologists' assessment with the method based on the original-plus-IW-processed image, as well as the radiologists' assessment provided by the DDSM database, was false positive, since they misclassified the MC cluster as definitely malignant and as a suspicious abnormality, respectively.
FIGURE 6.18 Magnified ROIs containing a malignant MC cluster from (a) the original, (b) the IW-processed, and (c) the AW-processed mammographic image (D-4110.1_RMLO). The MC cluster was correctly classified with both enhancement methods, in contrast to the radiologists' assessment provided by the DDSM database, which rated it as probably benign.

Beyond image contrast enhancement, a more interesting extension could be toward lesion-specific enhancement by exploiting interscale analysis [94] or orientation information [95, 96]. In this study, the effectiveness of the AW enhancement method was assessed and compared with the IW enhancement method with respect to detection, morphology characterization, and pathology classification of MC clusters on dense breast parenchyma. The detection accuracy of the method based on original-plus-AW-processed images is higher than that of the method based on original-plus-IW-processed images, but the differences in ratings (Wilcoxon signed ranks test), as well as in Az values (ROC test), are not statistically significant, indicating that the two contrast-enhancement methods have similar detection performance. The detection performance of both contrast-enhancement methods is high (>0.93) in a difficult task, such as detecting MC clusters on dense breast parenchyma. With respect to morphology characterization of MC clusters, the method based on original-plus-AW-processed images is more accurate (an 18.6% increase), but the differences between the two methods are not statistically significant, as shown by a two-tailed sign test. Concerning the pathology-classification task, similar performance (≈70%) was achieved with both contrast-enhancement methods (Wilcoxon signed ranks test). Although this classification accuracy is relatively low, it is higher than that derived from the radiologists' assessments of the DDSM database (≈63%), indicating the increased difficulty of classifying MC clusters on dense parenchyma as benign or malignant.
The advantage of the AW enhancement method is its use of adaptive denoising and enhancement stages, which make the method less dependent on parameter settings, an issue frequently associated with the performance of image-postprocessing techniques [35, 58–62, 97–99], such as the manual IW enhancement
method studied. However, the AW method, besides enhancing MC clusters, inevitably enhances the MCs' background parenchyma within the adapting window. Further refinement of the method toward selective lesion-vs.-background adaptation is expected to improve its performance further. Finally, a more complete evaluation study should consider: (a) a larger case sample, (b) the participation of more radiologists, (c) detection and classification tasks for different types of lesions, such as circumscribed and stellate masses, and (d) intercomparisons with other contrast-enhancement methods proposed for soft-copy display of mammograms.

ACKNOWLEDGMENT

This work is supported by the European Social Fund (ESF), the Operational Program for Educational and Vocational Training II (EPEAEK II), particularly the Program PYTHAGORAS, and by the Caratheodory Programme (2765) of the University of Patras. We also thank the staff of the Department of Radiology at the University Hospital of Patras for their contribution to this work.

REFERENCES

1. Dean, P.B., Overview of breast cancer screening, in Proc. 3rd Int. Workshop on Digital Mammography, Doi, K., Giger, M.L., Nishikawa, R.M., and Schmidt, R.A., Eds., Elsevier Science, Amsterdam, 1996, p. 19.
2. Landis, S.H. et al., Cancer statistics, Cancer J. Clin., 48, 6, 1998.
3. Smigel, K., Breast cancer death rates decline for white women, J. Nat. Cancer Inst., 87, 73, 1995.
4. Candelle, L.A. et al., Update on breast cancer mortality, Health Reports, 9, 31, 1995.
5. Schneider, M.A., Better detection: improving our chances, in Proc. 5th Int. Workshop on Digital Mammography, Yaffe, M.J., Ed., Medical Physics Publishing, Madison, WI, 2000, p. 3.
6. Page, D.L. and Winfield, A.C., The dense mammogram, AJR, 147, 487, 1986.
7. Jackson, V.P. et al., Imaging of the radiographically dense breast, Radiology, 188, 297, 1993.
8. Tabar, L. and Dean, P.B., Anatomy of the breast, in Teaching Atlas of Mammography, 2nd ed., Frommhold, W. and Thurn, P., Eds., Thieme, New York, 1985, chap. 1.
9. Dahnert, W., Breast, in Radiology Review Manual, Dahnert, W., Ed., Williams and Wilkins, Baltimore, 1996, chap. 3.
10. Giger, M.L., Huo, Z., Kupinski, M., and Vyborny, C.J., Computer-aided diagnosis in mammography, in Handbook of Medical Imaging, Vol. 2, Medical Image Processing and Analysis, Sonka, M. and Fitzpatrick, J.M., Eds., SPIE Press, Bellingham, WA, 2000, p. 915.
11. Karssemeijer, N., Computer-aided detection and interpretation in mammography, in Proc. 5th Int. Workshop on Digital Mammography, Yaffe, M.J., Ed., Medical Physics Publishing, Madison, WI, 2000, p. 243.
12. Birdwell, R.L. et al., Mammographic characteristics of 115 missed cancers later detected with screening mammography and the potential utility of computer-aided detection, Radiology, 219, 192, 2001.
13. Freer, T.W. and Ulissey, M.J., Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast center, Radiology, 220, 781, 2001.
14. Nishikawa, R., Assessment of the performance of computer-aided detection and computer-aided diagnosis systems, Semin. Breast Disease, 5, 217, 2002.
15. Nishikawa, R.M., Detection of microcalcifications, in Image-Processing Techniques for Tumor Detection, Strickland, R.N., Ed., Marcel Dekker, New York, 2002, chap. 6.
16. Karssemeijer, N., Detection of masses in mammograms, in Image-Processing Techniques for Tumor Detection, Strickland, R.N., Ed., Marcel Dekker, New York, 2002, chap. 8.
17. Kallergi, M., Computer-aided diagnosis of mammographic microcalcification clusters, Med. Phys., 31, 314, 2004.
18. Panayiotakis, G. et al., Evaluation of an anatomical filter-based exposure equalization technique in mammography, Br. J. Radiol., 71, 1049, 1998.
19. Skiadopoulos, S. et al., A phantom-based evaluation of an exposure equalization technique in mammography, Br. J. Radiol., 72, 977, 1999.
20. Bick, U. et al., Density correction of peripheral breast tissue on digital mammograms, RadioGraphics, 16, 1403, 1996.
21. Byng, J.W., Critten, J.P., and Yaffe, M.J., Thickness-equalization processing for mammographic images, Radiology, 203, 564, 1997.
22. Panayiotakis, G., Equalization techniques in mammography, in Proc. VI Int. Conf. on Med. Phys., Kappas, C. et al., Eds., Monduzzi Editore, Bologna, 1999, p. 163.
23. Highnam, R. and Brady, M., Mammographic Image Analysis, Kluwer Academic, Dordrecht, Netherlands, 1999.
24. Stefanoyiannis, A.P. et al., A digital density equalization technique to improve visualization of breast periphery in mammography, Br. J. Radiol., 73, 410, 2000.
25. Attikiouzel, Y. and Chandrasekhar, R., DSP in mammography, in Proc. 14th Int. Conf. on Digital Signal Processing, Vol. 1, Skodras, A.N. and Constantinides, A.G., Eds., Typorama, Greece, 2002, p. 29.
26. Stefanoyiannis, A. et al., A digital equalization technique improving visualization of dense mammary gland and breast periphery in mammography, Eur. J. Radiol., 45, 139, 2003.
27. Mendez, A. et al., Automatic detection of breast border and nipple in digital mammograms, Comp. Meth. Prog. Biom., 49, 253, 1996.
28. Chandrasekhar, R. and Attikiouzel, Y., A simple method for automatically locating the nipple in mammograms, IEEE Trans. Medical Imaging, 16, 483, 1997.
29. Katartzis, A. et al., Model-based technique for the measurement of skin thickness in mammography, Med. Biol. Eng. Comp., 40, 153, 2002.
30. Yaffe, M.J., Technical aspects of digital mammography, in Proc. Int. Workshop on Digital Mammography, Doi, K. et al., Eds., Elsevier Science, Amsterdam, 1996, p. 33.
31. Evertsz, C.J.G. et al., Soft-copy reading environment for screening mammography-screen, in Proc. 5th Int. Workshop on Digital Mammography, Yaffe, M.J., Ed., Medical Physics Publishing, Madison, WI, 2000, p. 566.
32. Castleman, K.R., Digital Image Processing, Prentice Hall, Englewood Cliffs, NJ, 1979.
33. Aylward, S.R., Hemminger, M.B., and Pisano, E.D., Mixture modelling for digital mammogram display and analysis, in Proc. 4th Int. Workshop on Digital Mammography, Karssemeijer, N. et al., Eds., Kluwer Academic, Dordrecht, Netherlands, 1998, p. 305.
34. Pizer, S.M. et al., Adaptive histogram equalization and its variations, Comput. Vision Graph. Image Process., 39, 355, 1987.
35. Pisano, E.D. et al., Contrast-limited adaptive histogram equalization image processing to improve the detection of simulated spiculations in dense mammograms, J. Digit. Imaging, 11, 193, 1998.
36. Fahnestock, J.D. and Schowengerdt, R.A., Spatially variant contrast enhancement using local range modification, Opt. Eng., 22, 378, 1983.
37. Levi, L., Unsharp masking and related image-enhancement techniques, Comput. Graph. Image Proc., 3, 163, 1974.
38. Lee, J.S., Digital image enhancement and noise filtering by using local statistics, IEEE Trans. Pattern Anal. Machine Intell., 2, 165, 1980.
Locally adaptive wavelet contrast enhancement
263
39. Tahoces, P. et al., Enhancement of chest and breast radiographs by automatic spatial filtering, IEEE Trans. Medical Imaging, 10, 330, 1991. 40. Kim, J.K. et al., Adaptive mammographic image enhancement using first derivative and local statistics, IEEE Trans. Medical Imaging, 16, 495, 1997. 41. Ji, T.L., Sundareshan, M.K., and Roehrig, H., Adaptive image contrast enhancement based on human visual properties, IEEE Trans. Medical Imaging, 13, 573, 1994. 42. Chang, D. and Wu, W., Image contrast enhancement based on a histogram transformation of local standard deviation, IEEE Trans. Medical Imaging, 17, 518, 1998. 43. Dhawan, A.P., Buelloni, G., and Gordon, R., Enhancement of mammographic features by optimal adaptive neighborhood image processing, IEEE Trans. Medical Imaging, 5, 8, 1986. 44. Dhawan, A.P. and Royer, E.L., Mammographic feature enhancement by computerized image processing, Comput. Methods Programs Biomed., 27, 23, 1998. 45. Morrow, W.M., Paranjape, R.B., and Rangayyan, R.M., Region-based contrast enhancement of mammograms, IEEE Trans. Medical Imaging, 11, 392, 1992. 46. Rangayyan, R.M. et al., Region-based adaptive contrast enhancement, in ImageProcessing Techniques for Tumor Detection, Strickland, R.N., Ed., Marcel Dekker, New York, 2002, chap. 9. 47. Mallat, S., A Wavelet Tour of Signal Processing, Academic Press, London, 1998. 48. Mallat, S. and Zhong, S., Characterisation of signals from multiscale edges, IEEE Trans. Pattern Anal. Machine Intell., 14, 710, 1992. 49. Laine, A. et al., Mammographic feature enhancement by multiscale analysis, IEEE Trans. Medical Imaging, 13, 725, 1994. 50. Laine, A., Fan, J., and Yang, W., Wavelets for contrast enhancement of digital mammography, IEEE Eng. Med. Biol., 14, 536, 1995. 51. Laine, A. et al., Mammographic image processing using wavelet processing techniques, Eur. Radiol., 5, 518, 1995. 52. 
Sakellaropoulos, P., Costaridou, L., and Panayiotakis, G., Integrating wavelet-based mammographic image visualisation on a Web browser, in Proc. Int. Conf. Image Proc., 2, 873, 2001. 53. Lu, J. and Heally, D.M., Contrast enhancement of medical images using multiscale edge representation, Opt. Eng., 33, 2151, 1994. 54. Sakellaropoulos, P., Costaridou, L., and Panayiotakis, G., A wavelet-based spatially adaptive method for mammographic contrast enhancement, Phys. Med. Biol., 48, 787, 2003. 55. Chang, S., Yu, B., and Vetterli, M., Image denoising via lossy compression and wavelet thresholding, IEEE Trans. Image Proc., 9, 1532, 2000. 56. Chang., S., Yu, B., and Vetterli, M., Spatially adaptive wavelet thresholding with context modeling for image denoising, IEEE Trans. Image Proc., 9, 1522, 2000. 57. Kallergi, M. et al., Interpretation of calcifications in screen/film, digitized, and waveletenhanced monitor-displayed mammograms: a receiver operating characteristic study, Acad. Radiol, 3, 285, 1996. 58. Pisano, E.D. et al., The effect of intensity windowing on the detection of simulated masses embedded in dense portions of digitized mammograms in a laboratory setting, J. Digit. Imaging, 10, 174, 1997. 59. Pisano, E.D. et al., Does intensity windowing improve the detection of simulated calcifications in dense mammograms? J. Digit. Imaging, 10, 79, 1997. 60. Mekle, R., Laine, A.F., and Smith, S.J., Evaluation of a multiscale enhancement protocol for digital mammography, in Image-Processing Techniques for Tumor Detection, Strickland, R.N, Ed., Marcel Dekker, New York, 2002, chap. 7. 61. Sivaramakrishna, R. et al., Comparing the performance of mammographic enhancement algorithms: a preference study, Acad. J. Radiol., 175, 45, 2000. 62. Pisano, E.D. et al., Radiologists’ preferences for digital mammographic display, Radiology, 216, 820, 2000.
Medical image analysis method
264
63. Wagner, R.F. et al., Assessment of medical imaging and computer-assist systems: Lessons from recent experience, Acad. Radiol., 8, 1264, 2002. 64. Heath, M. et al., The digital database for screening mammography, in Proc. 5th Int. Workshop on Digital Mammography, Yaffe, M.J., Ed., Medical Physics Publishing, Madison, WI, 2000, p. 212. 65. DDSM: Digital Database for Screening Mammography; available on-line at http://%20marathon.csee.usf.edu/Mammography/Database.html. last accessed March 2005. 66. Suckling, J. et al., The mammographic image analysis society digital mammographic database, in Proc. 2nd Int. Workshop on Digital Mammography, Gale, A.G., Ed., Elsevier Science, York, U.K., 1994, p. 375. 67. MIAS: Mammographic Image Analysis Society; available on-line at http://www.wiau.%20man.ac.uk/services/MIAS/MIASweb.html. last accessed March 2005. 68. Simoncelli, E.P et al., Shiftable multiscale transforms, IEEE Trans. Inform. Theory, 38, 587, 1992. 69. Vetterli, M. and Kovacevi, J., Wavelets and Subband Coding, Prentice Hall, Englewood Cliffs, NJ, 1995. 70. Shensa, M.J., The discrete wavelet transform: wedding the a trous and Mallat algorithms, IEEE Trans. Signal Proc., 40, 2464, 1992. 71. Donoho, D.L., Denoising by soft-thresholding, IEEE Trans. Inform. Theory, 41, 613, 1995. 72. Coifman, R.R. and Donoho, D.L., Translation-invariant denoising, in Wavelets and Statistics, Antoniadis, A. and Oppenheim, G., Eds., Springer-Verlag, Berlin, 1995. 73. Costaridou, L. et al., Quantifying image quality at breast periphery vs. mammary gland in mammography using wavelet analysis, Br. J. Radiol., 74, 913, 2001. 74. Sakellaropoulos, R, Costaridou, L., and Panayiotakis, G., An image visualisation tool in mammography, Med. Inform., 24, 53, 1999. 75. Sakellaropoulos, P., Costaridou, L., and Panayiotakis, G., Using component technologies for Web-based wavelet enhanced mammographic image visualization, Med. Inform., 25, 171, 2000. 76. 
Bacry, E., Fralen, J., Kalifa, J., Pennec, E.L., Hwang, W.L., Mallat, S. and Zhong, S., Lastwave 2.0 Software, Centre de Mathématiques Appliquees, France; available on-line at http://www.cmap.polytechnique.fr/~bacry/Lastwave. last accessed March 2005. 77. Stoffel, A., Remarks on the unsubsampled wavelet transform and the lifting scheme, Signal Proc., 69, 177, 1998. 78. Bovis, K. and Singh, S., Enhancement technique evaluation using quantitative measures on digital mammograms, in Proc. 5th Int. Workshop on Digital Mammography, Yaffe, M.J., Ed., Medical Physics Publishing, Madison, WI, 2000, p. 547. 79. Skiadopoulos, S. et al., Simulating mammographic appearance of circumscribed lesions, Eur. Radiol., 13, 1137, 2003. 80. Strickland, R.N. and Hahn, H.I., Detection of microcalcifications in mammograms using wavelets, in Proc. SPIE Conf. Wavelet Applications in Signal and Image Processing II, 2303, 430, 1994. 81. Siegel, S. and Castellan, N.J., The case of one sample, two measures or paired replicates, in Nonparametric Statistics for the Behavioral Sciences, 2nd ed., McGrawHill, New York, 1988, p. 87. 82. Goodenough, D.J., Rossmann, K., and Lusted, L.B., Radiographic applications of receiver operating characteristic (ROC) curves, Radiology, 110, 89, 1974. 83. Metz, C.E., ROC methodology in radiologic imaging, Invest. Radiol., 21, 720, 1986. 84. Erkel, A.R. and Pattynama, P.M., Receiver operating characteristic (ROC) analysis: basic principles and applications in radiology, Eur. J. Radiol., 27, 88, 1998. 85. Medical Image Perception Society; available on-line at http://www.mips.ws/. last accessed March 2005. 86. Metz, C.E., University of Chicago; available on-line at http://wwwradiology.%20uchicago.edu/krl/roc_soft.htm. last accessed March 2005.
Locally adaptive wavelet contrast enhancement
265
87. Metz, C.E., Rockit version 0.9B, User’s Guide, Department of Radiology, University of Chicago, 1998. 88. Dorfman, D.D. and Alf, E., Maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals: rating data method, J. Math. Psychol., 6, 487, 1969. 89. Metz, C.E., Quantification of failure to demonstrate statistical significance: the usefulness of confidence intervals, Invest. Radial, 28, 59, 1993. 90. Chang, S., Yu, B., and Vetterli, M., Spatially adaptive wavelet thresholding with context modeling for image denoising, IEEE Trans. Image Proc., 9, 1522, 2000. 91. Veldkamp, W. and Karssemeijer, N., Normalization of local contrast in mammograms, IEEE Trans. Medical Imaging, 19, 731, 2000. 92. Poissonnier, M. and Brady, M., “Noise equalization” for microcalcification detection? in Proc. 5th Int. Workshop on Digital Mammography, Yaffe, M.J., Ed., Medical Physics Publishing, Madison, WI, 2000, p. 334. 93. Efstathopoulos, E.P. et al., A protocol-based evaluation of medical image digitizers, Br. J. Radiol., 74, 841, 2001. 94. Heinlein, P., Drexl, J., and Schneider, W., Integrated wavelets for enhancement of microcalcifications in digital mammography, IEEE Trans. Medical Imaging, 22, 402, 2003. 95. Koren, I., Laine, A., and Taylor, R, Enhancement via fusion of mammographic features, in Proc. IEEE Int. Conf. on Image Proc., 2, 722, 1998. 96. Wang, Y.P., Wu, Q., and Castleman, K., Image enhancement using multiscale oriented wavelets, in Proc. Int. Conf. on Image Processing, Greece, Thessaloniki, October 2001, 1, 610. 97. Costaridou, L., Computer methods in mammography, in Proc. VI Int. Conf. on Med. Phys., Kappas, C. et al., Eds., Monduzzi Editore, Bologna, Italy, 1999, p. 201. 98. Hemminger, B.M. et al., Improving the detection of simulated masses in mammograms through two different image-processing techniques, Acad. Radiol., 8, 845, 2001. 99. Cole, E.B. 
et al., Diagnostic accuracy of digital mammography in patients with dense breasts who underwent problem-solving mammography: effects of image processing and lesion type, Radiology, 226, 153, 2003.
7 Three-Dimensional Multiscale Watershed Segmentation of MR Images

Ioannis Pratikakis, Hichem Sahli, and Jan Cornelis

7.1 INTRODUCTION

The goal of image segmentation is to produce primitive regions that exhibit homogeneity and then to impose a hierarchy on those regions so that they can be grouped into larger-scale objects. The first requirement, homogeneity, can be fulfilled very well using the principles of watershed analysis [1]. Specifically, our primitive regions are obtained by applying the watershed transform to the modulus of the gradient image. We argue that, in the absence of contextual knowledge, the only way to enrich our knowledge about the significance of the segmented pixel groups is to create a hierarchy guided by the knowledge that emerges from the superficial and the deep image structure. Current approaches to creating hierarchies among primitive regions produced by the watershed transformation consider either the superficial structure [1–4] or the deep image structure [5, 6] alone. In this chapter, we present the novel concept of dynamics of contours in scale-space, which integrates the two image-structure types into a single one. Together with the incorporation of a stopping criterion, the proposed integration embodies three different features, namely homogeneity, contrast, and scale. The application is demonstrated in a medical-image analysis framework. The output of the proposed algorithm can simplify scenarios used in an interactive environment for the precise delineation of nontrivial anatomical objects. Specifically, we present an objective and quantitative comparison of the proposed scheme with schemes that construct hierarchies using information from either the superficial structure or the deep image structure alone. Results are demonstrated for a neuroanatomical structure (the white matter of the brain) for which manual segmentation is a tedious task.
Our evaluation considers both phantom and real images.

7.2 WATERSHED ANALYSIS

7.2.1 THE WATERSHED TRANSFORMATION

In the field of image processing, and more particularly in mathematical morphology, gray-scale images are considered as topographic reliefs, where the numerical value of a pixel stands for the elevation at that point. With this representation in mind, we can give an intuitive description of the watershed transformation as in geography, where
watersheds are defined in terms of the drainage patterns of rainfall. If a raindrop falls on a certain point of the topographic surface, it flows down the surface, following a line of steepest descent, toward some local minimum of the surface. The set of all points that are attracted to a particular minimum defines the catchment basin of that minimum. Adjacent catchment basins are separated by divide lines, or watershed lines. A watershed line is a ridge, a raised line where two sloping surfaces meet. Raindrops falling on opposite sides of a divide line flow into different catchment basins (Figure 7.1).

FIGURE 7.1 (Color figure follows p. 274.) Watershed construction during flooding in two dimensions (2-D).

Another definition describes the watershed line as the set of connected points that lie along the singularities (i.e., creases or curvature discontinuities) of the distance transform. It can also be considered as the crest line, which admits two interpretations: first, as the line consisting of the local maxima of the modulus of the gradient, and second, as the line consisting of the zeros of the Laplacian. These intuitive descriptions of the watershed-line construction have been formalized in both the continuous and the discrete domain.

7.2.1.1 The Continuous Case

In the continuous domain, formal definitions of the watershed have been worked out by Najman [7] and Meyer [8]. The former definition is based on a partial-ordering relation among the critical points that are above several minima.

Definition 1: A critical point b is above a if there exists a maximal descending line of the gradient linking b to a.

Definition 2: A path γ: ]−∞, +∞[ → R² is called a maximal line of the gradient if it is an integral curve of the gradient field,

γ′(t) = ∇f(γ(t)) for all t ∈ R

and the limits limt→−∞ γ(t) and limt→+∞ γ(t) exist (they are critical points of f).
Definition 3: A maximal line is descending if f decreases monotonically along it, in the direction of traversal.
Definition 4: Let P(f) be the subset of the critical points a of f that are above several minima of f. Then the watershed of f is the set of the maximal lines of the gradient linking two points of P(f).

The definition of Meyer [8] is based on a distance function called the topographical distance. Consider a function f: R^n → R and let supp(f) be its support. The topographical distance between two points p and q is defined by considering the set Γ(p,q) of all paths between p and q that belong to supp(f).

Definition 5: If p and q lie on a common line of steepest slope (with f(q) > f(p)), then the topographical distance is equal to

TD(p,q) = f(q) − f(p)

Definition 6: We define the catchment basin of a regional minimum mi, CB(mi), as the set of points that are closer to mi than to any other regional minimum with respect to the topographical distance

CB(mi) = {p ∈ supp(f): ∀j ≠ i, f(mi) + TD(p,mi) < f(mj) + TD(p,mj)}

Definition 7: The watershed line of a function f is the set of points of the support of f that do not belong to any catchment basin

Wshed(f) = supp(f) \ ∪i CB(mi)
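In one dimension, these definitions are easy to make concrete. The following sketch (ours, not part of the chapter) assigns each sample of a 1-D signal to the regional minimum reached by steepest descent; along such a descent path, the topographical distance between consecutive samples is simply the difference of their values (Definition 5), and the resulting label map partitions the signal into catchment basins (Definition 6). The function name is an illustrative choice.

```python
def steepest_descent_label(f):
    """Label each sample with the index of the regional minimum it drains to."""
    n = len(f)

    def descend(i):
        # Follow the steepest-descent path until no lower neighbor exists.
        while True:
            neighbors = [j for j in (i - 1, i + 1) if 0 <= j < n]
            lower = [j for j in neighbors if f[j] < f[i]]
            if not lower:
                return i  # reached a (regional) minimum
            i = min(lower, key=lambda j: f[j])  # steepest descent

    return [descend(i) for i in range(n)]

f = [3, 1, 2, 4, 2, 0, 1]
labels = steepest_descent_label(f)
print(labels)  # → [1, 1, 1, 1, 5, 5, 5]: samples drain to the minima at indices 1 and 5
```

Note that the sample at index 3 sits on the divide between the two basins; a tie-breaking rule (here, preferring the left neighbor) decides which basin absorbs it, which is exactly the ambiguity the watershed line formalizes.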
7.2.1.2 The Discrete Case

Meyer's definition [8] also applies in the discrete case if we replace the continuous topographical distance TD by its discrete counterpart. Another definition is given by Beucher [1] and Vincent [9]. The basic idea of the watershed construction is to assign an influence zone to each of the regional minima of the image. In that respect, there is a one-to-one mapping between the regional minima and the catchment basins.

Definition 8: The geodesic influence zone IZA(Bi) of a connected component Bi of B in A is the set of points of A for which the geodesic distance to Bi is smaller than the geodesic distance to any other component of B.

Definition 9: The skeleton by influence zones of B in A, denoted as SKIZA(B), is the set of points of A that do not belong to any IZA(Bi).
SKIZA(B) = A \ IZA(B), with IZA(B) = ∪i IZA(Bi)

Definition 10: The set of catchment basins of the gray-scale image I is equal to the set Xhmax obtained after the following recursion (Figure 7.2):

Xhmin = Thmin(I)
Xh+1 = minh+1 ∪ IZTh+1(I)(Xh), hmin ≤ h < hmax

where hmin and hmax are the minimum and maximum gray levels of the image, respectively; Th(I) is the threshold of the image I at height h; and minh is the set of the regional minima at height h.

Definition 11: The points of an image that do not belong to any catchment basin are the watershed points.
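In practice, the immersion recursion is usually implemented with an ordered queue, as in the flooding algorithms discussed in Section 7.2.1.4. The following is a minimal sketch of such a priority-flooding scheme (ours, not the chapter's implementation): pixels are flooded in order of increasing gray level, starting from seed pixels placed at the regional minima; the function name and the first-come tie-breaking at basin boundaries are illustrative choices.

```python
import heapq

def flood_watershed(img, seeds):
    """Flood a 2-D gray-scale image from labeled seed pixels (one per
    regional minimum); every pixel ends up in the basin of one seed."""
    rows, cols = len(img), len(img[0])
    label = [[0] * cols for _ in range(rows)]  # 0 means "not yet flooded"
    heap, order = [], 0
    for (r, c), lab in seeds.items():
        label[r][c] = lab
        heapq.heappush(heap, (img[r][c], order, r, c)); order += 1
    while heap:
        # Always flood the lowest unprocessed pixel first (immersion order).
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and label[nr][nc] == 0:
                label[nr][nc] = label[r][c]  # water spills into the neighbor
                heapq.heappush(heap, (img[nr][nc], order, nr, nc)); order += 1
    return label

img = [[0, 1, 2, 1, 0]]  # two minima separated by a ridge at the center
basins = flood_watershed(img, {(0, 0): 1, (0, 4): 2})  # [[1, 1, 1, 2, 2]]
```

The ridge pixel (gray level 2) is reached last and is absorbed by whichever basin's front arrives first; variants of the algorithm instead mark such pixels explicitly as watershed points.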
[Color plate insert follows p. 274: color versions of Figures 3.1, 7.1, 7.2, 7.11, 7.20–7.36, 9.2, 11.1, and 11.5–11.8, with their captions repeated from the corresponding chapters.]
FIGURE 7.2 (Color figure follows p. 274.) Illustration of the recursive immersion process.

7.2.1.3 The 3-D Case

A brief but explicit discussion of watersheds in three-dimensional (3-D) space was initiated by Koenderink [10], who considered watersheds to be a subset of the density ridges. According to his definition, “the density ridges are the surfaces generated by the singular integral curves of the gradient, that is, those integral curves that separate the families of curves going to distinct extrema.” If we consider only the families of curves that go to distinct minima, the density ridges thus produced are the watersheds. For a formal definition of watersheds in 3-D, the reader can straightforwardly extend the definitions of Sections 7.2.1.1 and 7.2.1.2. For the definition of Najman [7] in the 3-D case, we have to consider that the points in P(f) are the maxima and those hypersaddles (of the two types) that are connected to two distinct minima. These points have, in one of the three principal-curvature directions, slope lines descending to the distinct minima; the two slope lines run in opposite directions along the principal-curvature direction. These points are the anchor points for a watershed surface defined by the points themselves and the slope lines that connect them.
7.2.1.4 Algorithms for Watersheds

The watershed transformation has been implemented using the following classes of methods: iterative, sequential, arrowing, flow-line oriented, and flooding. The iterative methods were initiated by Beucher and Lantuéjoul [11], who suggested an algorithm based on the immersion paradigm; the method expands the influence zones around local minima within the gray-scale levels via binary thickenings until idempotence is reached. The sequential methods rely on scanning the pixels in a predefined order, such that the new value of each pixel is taken into account when processing subsequent pixels; Friedlander and Meyer [12] proposed a fast sequential algorithm based on horizontal scans. The arrowing method was presented by Beucher [1] and describes the image as a directed graph: each pixel is a node, and the node is connected to those neighbors with a lower gray value. The word “arrowing” comes from the directed connections between the pixels. The flow-line-oriented methods make explicit use of the flow lines in the image to partition it by watersheds [5]. The flooding methods are based on immersion simulations; in this category there are two main algorithms, that of Vincent and Soille [9] and that of Meyer [13]. For an extensive analysis and comparison of the flooding-based algorithms, the interested reader can refer to the literature [14, 15].

7.2.2 THE GRADIENT WATERSHEDS

Whenever the watershed transformation is used for segmentation, it should be applied to the gradient magnitude of an image rather than to the image itself: the gradient-magnitude information guides the watershed lines to follow the crest lines, so that the real boundaries of the objects emerge.
Therefore, from now on we will refer to gradient watersheds, thus explicitly implying that the watershed lines have been retrieved from the modulus of the gradient image. Examples of gradient watersheds in two dimensions (2-D) and 3-D can be seen in Figure 7.3 and Figures 7.4 and 7.5, respectively. The singularities of the squared gradient in 2-D occur at the critical points of the image and at the points where the second-order structure vanishes in one direction. This can be formulated as:

Lw = 0 (7.1)

Lww = Lvw = 0 (7.2)
FIGURE 7.3 Gradient watersheds in 2-D.
FIGURE 7.4 (a) The cross-sections of the 3-D object and (b) their 3-D gradient watersheds.

where x, y denote Cartesian coordinates and w, ν denote gauge coordinates [16]. The gradient can be estimated in different ways. It can be computed as (a) the absolute maximum difference in a neighborhood; (b) the pixelwise difference between a unit-size morphological dilation and a unit-size morphological erosion of the original image; or (c) horizontal and vertical differences of local sums, guided by operators such as the Roberts, Prewitt, Sobel, or isotropic operators. The
application of gradient operators as in case (c) reduces the effect of noise in the data [17]. In the current study, the gradient magnitude is computed by applying the Sobel operator. Accordingly, in the 3-D case, the singularities of the squared gradient occur under the following conditions:

Lw = 0 (7.3)

Lww = Luw = Lvw = 0 (7.4)

where x, y, z denote Cartesian coordinates and w, ν, u denote gauge coordinates, with w in the gradient direction and (u, ν) in the plane perpendicular to w (the tangent plane to the isophote). As in the 2-D case, the gradient magnitude in 3-D can be estimated in different ways. All of the existing approaches issue from a generalization of 2-D
FIGURE 7.5 A rendered view of the 3-D gradient watershed surface and the orthogonal sections.

edge detectors. Lui [18] proposed generalizing the Roberts operator to 3-D using a symmetric gradient operator. Zucker and Hummel [19] extended the Hueckel operator [20] to 3-D; they propose an optimal operator that turns out to be a generalization of the 2-D Sobel operator. The morphological gradient in 2-D has been extended to 3-D by Gratin [21]. Finally, Monga [22] extends the optimal 2-D Deriche edge detector [23] to 3-D. For the implementation of the gradient watersheds in 3-D, the current study has adopted the 3-D Zucker operator for the gradient-magnitude computation.
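As a concrete illustration of case (c) above, a sketch (ours, not the chapter's code) of the 2-D Sobel gradient magnitude follows; the function name is an illustrative choice, and border pixels are left at zero for brevity.

```python
def sobel_gradient_magnitude(img):
    """2-D Sobel gradient magnitude (interior pixels only, for brevity)."""
    # Sobel kernels for the horizontal and vertical derivatives
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    rows, cols = len(img), len(img[0])
    mag = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = sum(kx[i][j] * img[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(ky[i][j] * img[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            mag[r][c] = (gx * gx + gy * gy) ** 0.5
    return mag

# A vertical step edge: the gradient magnitude forms a crest along the edge
img = [[0, 0, 9, 9]] * 4
mag = sobel_gradient_magnitude(img)
```

Applying the watershed transform to `mag` rather than to `img` is exactly the gradient-watershed idea: the crest of `mag` coincides with the object boundary, so the watershed line is attracted to it.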
7.2.3 OVERSEGMENTATION: A PITFALL TO SOLVE IN WATERSHED ANALYSIS

The use of the watershed transformation for segmentation purposes is advantageous because:

• Watersheds form closed curves, providing a full partitioning of the image domain; thus it is a purely region-based segmentation that does not require any closing or connecting of edges.
• Gradient watersheds can play the role of a multiple-point detector, thus handling any case of multiple-region coincidence [7].
• There is a one-to-one relationship between the minima and the catchment basins; therefore, a whole region can be represented by its minimum.

These advantages can be exploited provided that the oversegmentation inherent to the watershed transformation can be eliminated. An example of oversegmentation is shown in Figure 7.6. The problem can be treated by following two different strategies: the first selects markers on the image and introduces them into the watershed transformation, and the second constructs hierarchies among the regions to guide a merging process. The remaining sections of this chapter are dedicated to methods following the second strategy.
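Because each regional minimum spawns exactly one catchment basin, the severity of oversegmentation can be gauged simply by counting the minima of the (gradient) image. The sketch below (ours, not from the chapter; strict minima only, for simplicity) counts the local minima of a 1-D profile: every spurious minimum introduced by noise would become a separate watershed region.

```python
def count_strict_minima(f):
    """Count strict local minima of a 1-D signal: each one would seed
    its own catchment basin, i.e., its own watershed region."""
    n = len(f)
    count = 0
    for i in range(n):
        neighbors = [f[j] for j in (i - 1, i + 1) if 0 <= j < n]
        if all(f[i] < v for v in neighbors):
            count += 1
    return count

smooth = [5, 4, 3, 2, 3, 4, 5]          # one valley, hence one basin
noisy = [5, 3, 4, 2, 3, 1, 4, 2, 5]     # noise adds spurious minima
print(count_strict_minima(smooth), count_strict_minima(noisy))  # → 1 4
```

Marker selection and hierarchy construction are two ways of deciding which of those spurious basins should be suppressed or merged.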
FIGURE 7.6 Example of an oversegmented image.

7.3 SCALE-SPACE AND SEGMENTATION

7.3.1 THE NOTION OF SCALE

As Koenderink notes [24], in every imaging situation one has to face the problem of scale. The extent of any real-world object is determined by two scales: the inner and the outer scale. The outer scale of an object corresponds to the minimum size of a window
that completely contains the object and is consequently limited by the field of view. The inner scale corresponds to the resolution (the pixel size) and is determined by the resolution of the sampling device. If no a priori knowledge about the image being measured is available, we cannot decide on the right scale. In this case, it makes sense to interpret the image at different scales simultaneously. The same principle is followed by the human visual front-end system. Our retina typically has 10^8 rods and cones, and a weighted sum of local groups of them makes up a receptive field (RF). The profile of such an RF handles the perception of detail in an image by scaling up to a larger inner scale in a very specific way. Numerous physiological and psychophysical results support the theory that the cortical RF profiles can be modeled by Gaussian filters (or their derivatives) of various widths [25].

7.3.2 LINEAR (GAUSSIAN) SCALE-SPACE

Several authors [24, 26–35] have postulated that the blurring process must essentially satisfy a set of hypotheses such as linearity and translation invariance, regularity, locality, causality, symmetry, homogeneity and isotropy, separability, and scale invariance. These postulates lead to the family of Gaussian functions as the unique filter for scale-space blurring. It has been shown that the normalized Gaussian Gσ(x) is the only filter kernel that satisfies the conditions listed above:

Gσ(x) = (2πσ²)^(−d/2) exp(−(x·x)/(2σ²)) (7.5)
FIGURE 7.7 An MR brain image blurred at different scales: (a) σ=1, (b) σ=4, (c) σ=8, (d) σ=16.

Here x·x is the scalar product of two vectors, and d denotes the dimension of the domain. The extent of blurring or spatial averaging is defined by the standard deviation σ of the Gaussian, which represents the scale parameter. An example of this spatial blurring can be seen in Figure 7.7. The example clearly shows how the level of detail in the image decreases as the level of blurring increases, while the major structures are retained. The scale-space representation of an image is denoted by the family of derived images L(x,σ) and is obtained as follows. Let L(x) be an image acquired by some acquisition method. Because this image has a fixed resolution determined by the acquisition method, it is convenient to fix its inner scale as zero. The linear scale-space L(x,σ) of the image is defined as

L(x,σ) = Gσ(x) ⊗ L(x) (7.6)
where ⊗ denotes spatial convolution. Note that the family of derived images L(x,σ) depends only on the original image and the scale parameter σ. Lindeberg [29] has pointed out that the scale-space properties of the Gaussian kernel hold only for continuous signals. For discrete signals, it is necessary to blur with a modified Bessel function, which, for an infinitesimal pixel size, approaches the Gaussian function. Koenderink [24] has also shown that the generation of the scale-space as defined in Equation 7.6 can be viewed as solving the heat equation or diffusion equation

\frac{\partial L}{\partial t} = c\,\nabla^2 L \quad (7.7)

The conductance term c controls the rate of blurring at each time step. If c is a constant, the diffusion process is called linear diffusion, and the Gaussian kernel is the Green's function of Equation 7.7. In this case, the time parameter replaces the scale parameter in Equation 7.6 with t = σ²/(2c), given the initial condition L(x,0) = L(x). The diffusion flow is a local process, and its speed depends only on the intensity difference between neighboring pixels and the conductance c. The diffusion process reaches a state of equilibrium at t→∞, when all pixels approach the same intensity value.

7.3.3 SCALE-SPACE SAMPLING

The scale-space representation is a continuous representation. In practice, however, it is necessary to sample the scale-space at some discrete values of scale. An equidistant sampling of scale-space would violate the important property of scale invariance [30]. The basic argument for scale invariance has been taken from physics, expressing the independence of physical laws from the choice of fundamental parameters. This corresponds to what is known as dimensional analysis, which states that a function relating physical observables must be independent of the choice of dimensional units. The only way to introduce a dimensionless parameter is by introducing a logarithmic measure [30].
Thus, the sampling should follow a linear and dimensionless scale parameter δτ, which is related to σ according to the following:

\sigma_n = \varepsilon\, e^{\tau_0 + n\,\delta\tau} \quad (7.8)

where n denotes the quantization level. A convenient choice for τ0 is zero, which implies that the inner scale σ0 of the initial image is taken to be equal to the linear grid measure ε. At coarse scales, the ratio between successive scales will be about constant, while at fine scales the differences between successive scales will be approximately equal.

7.3.4 MULTISCALE IMAGE-SEGMENTATION SCHEMES

The concept of scale-space has numerous applications in image analysis. For a concise overview, the interested reader can refer to the literature [16]. In this chapter, scale-space theory concepts are used for image-segmentation purposes.
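As an illustration of Equations 7.5, 7.6, and 7.8, the following sketch (ours, not from the chapter; the `blur` and `scale_levels` helpers are hypothetical names) builds a small scale-space stack with logarithmically sampled scale levels:

```python
import numpy as np

def gaussian_kernel(sigma):
    """Sampled, normalized 1-D Gaussian kernel (Eq. 7.5 with d = 1)."""
    radius = max(1, int(4 * sigma + 0.5))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x * x / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(image, sigma):
    """L(x, sigma): separable Gaussian convolution of a 2-D image (Eq. 7.6)."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def scale_levels(n_levels, dtau, eps=1.0, tau0=0.0):
    """sigma_n = eps * exp(tau0 + n * dtau): logarithmic scale sampling (Eq. 7.8)."""
    return eps * np.exp(tau0 + dtau * np.arange(n_levels))

# Toy scale-space stack of a noisy step image: detail fades as sigma grows.
rng = np.random.default_rng(0)
img = np.zeros((64, 64))
img[:, 32:] = 100.0
img += rng.normal(0.0, 5.0, img.shape)

sigmas = scale_levels(4, dtau=0.6)   # constant ratio between successive scales
stack = {float(s): blur(img, s) for s in sigmas}
```

With τ0 = 0 and ε = 1, the ratio between successive σ values is constant, which is exactly the scale-invariant (logarithmic) sampling argued for above.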
7.3.4.1 Design Issues

For the implementation of a multiscale image-segmentation scheme, a number of considerations must be kept in mind. A general recipe for any multiscale segmentation algorithm consists of the following tasks:

1. Select a scale-space generator that will build the deep structure and govern the simplification process for the image structure.

2. Determine the linking scheme that will connect the components (features) in the deep image structure. Naturally, an immediate question arises about which features are the ones to be linked. The answer to this question is one of the components that make up the linking-scheme description. The other components are the rules that will steer the linking and the options that will be examined for the linkages (bottom-up, top-down, doubly linked lists).

3. Attribute a significance measure to each scale-space segment. This implies that a valuation has to be introduced at the localization level, for either the regions or the segmentation contours, by retrieving information from their scale-space hierarchical tree.

All the above considerations have been combined in different ways and have led different authors to advocate their own multiscale segmentation schemes. In the following section, the state of the art is presented.

7.3.4.2 The State of the Art

In Lifshitz and Pizer's work [36], a multiscale "stack" representation was constructed considering isointensity paths in scale-space. The gray level at which an extremum disappears is used to define a region in the original image by local thresholding on that gray level. The same authors observed that this approach suffers from the serious problem of noncontainment. This problem refers to the case in which a point that at one scale has been classified as belonging to a certain region (associated with a local maximum) can escape from that region when the scale parameter increases. Moreover, the isointensity paths followed can be intertwined in a rather complicated way.
Lindeberg [37] has based his approach on formal scale-space theory to construct his scale-space primal sketch. This representation is achieved by applying a linking among light or dark blobs. Because a blob corresponds to an extremum, he used catastrophe theory to describe the proposed linking as interactions between saddles and extrema. To attribute a significance measure to a scale-space blob, he considered three features: spatial extent, contrast, and lifetime in scale-space. Correspondence between two blobs at consecutive scales is established by measuring their degree of overlap. Multiscale segmentation of unsharp blobs has also been reported by Gerig et al. [38]. They applied Euclidean shortening flow, which progressively smoothes the level curves and lets them converge to circles before they disappear at singularity points. Object detection is interleaved with shape computation by analyzing the continuous extremum paths of singularities in scale-space. Assuming radially symmetric structures, the singularity traces are related to the evolution time. Eberly [39] constructed a hierarchy based on annihilation of ridges in scale-space. He segmented each level of the scale-space by decomposing the ridges into curvilinear
segments, followed by labeling. Using a ridge flow model, he made a one-to-one correspondence of each ridge segment to a region. At each pixel in the image, the flow line is followed until it intersects a ridge. Every pixel along the path is assigned the label of the terminal ridge point. The links in the hierarchical tree are inserted based on how primitive regions at one scale become blurred into single regions at the next scale. The latter single primitive region is considered to be the parent of the original two regions because it overlaps those two regions more than any other region at the current scale. The segmentation scheme of Vincken [40, 41] and Koster [42] is based on the hyperstack, which is a generalization to 3-D of the stack proposed by Koenderink [24]. Between voxels at adjacent scale levels, child-parent linkages are established according to a measure of affection [42]. This measure is a weighted sum of different measures such as intensity difference, ground-volume size, and ground-volume mean intensity. A ground volume is the finest-scale slice of a 4-D scale-space segment. This linking-model-based segmentation scheme has been applied not only to the linear scale-space; experiments have also been reported [43] for gradient-dependent diffusion and Euclidean shortening flow. Vincken et al. [40, 41] used the hyperstack in combination with a probabilistic linking, wherein a child voxel can be linked to more than one parent voxel. The multiparent linkage structure is translated into a list of probabilities that also indicate the partial-volume voxels and to what extent these voxels belong to the partial-volume class. Thus, an explicit treatment is proposed for partial-volume effects, which are caused by the limited resolution of the acquisition method and lead to voxels containing multiple objects. Using linear-scale evolution of gray-scale images, Kalitzin et al.
[44] proposed a hierarchical segmentation scheme where, for each scale, segments are generated as Voronoi diagrams, with a distance measure defined on the image landscape. The set of centers of the Voronoi cells is the set of the local extrema of the image. This set is localized by using the winding number distribution of the gradient-vector field. The winding number represents the number of times that the normalized gradient turns around its origin, as a test point circles around a given contour. The process is naturally described in terms of singularity catastrophes within the smooth scale evolution. In short, this approach is a purely topological segmentation procedure, based on singular isophotes. Griffin et al. [45] proposed a multiscale n-ary hierarchy. The basic idea is to create a hierarchical description for each scale and then link these hierarchies across scale. In a hierarchical description of the structure, the segments are ordered in a tree structure. A segment is either the sum of its subobjects or a single pixel. This hierarchy is built by iteratively merging adjacent objects. The order of merging is based on an edge-strength measure that combines pure edge strength along with perceptual significance of the edge, determined by the angle of the edge trajectory. The linking of the hierarchies proceeds from coarse to fine scales and from the top of the hierarchies to the bottom. First, the roots in the hierarchies are linked, then the subobjects of the roots are matched, etc. This results in a multiscale n-ary hierarchy. The multiscale segmentation framework presented in this chapter deals with regions produced after the application of the watershed transformation and its subsequent
tracking in scale-space. In a similar spirit, other authors have produced works in this field: Jackway [46] applied morphological scale-space theory to control the number of extrema in the image, and by subsequent homotopy-linking of the gradient extrema to the image extrema, he obtained a scale-space segmentation via the gradient-watershed transformation [46]. In this case, the watershed arcs that are created at different scales move spatially and are not a subset of those at zero scale. Gauch and Pizer [5] presented an association of scale with watershed boundaries after a gradual blurring with a series of Gaussians. When an intensity minimum annihilates into a saddle point, the water that drains towards the annihilated minimum now drains to some other intensity minimum in the image. This defines the parent-child relationship between these two watershed regions. By continuing this process for all intensity minima in the image, a hierarchy of watershed regions is defined. Olsen [6] analyzed the deep structure of segmentation using catastrophe theory. In this way, he advocated a correspondence between regions produced by the gradient watersheds at different scales.
7.4 THE HIERARCHICAL SEGMENTATION SCHEME

The relationship between watershed analysis and scale-space can be attributed to the simplification process that is offered by the scale-space. On the one hand, a decreasing number of singularities occurs during an increasing smoothing of the image. On the other hand, the watershed segments are in duality with their respective minima in the gradient image, so their number decreases accordingly. Both contribute to an attractive framework for the examination of a merging process in a segmentation task. A detailed explanation of this relationship, along with the produced results, will be given in the following.

7.4.1 GRADIENT MAGNITUDE EVOLUTION

As discussed in Section 7.3.4.1, when we think about the implementation of a multiscale segmentation scheme, certain considerations have to draw our attention. The very first consideration is the selection of the image-evolution scheme. In this work, we have studied the gradient-magnitude evolution. The basic motivation is that, when treating a problem with an uncommitted front end, contrast and scale are the only useful information. Gradient magnitude provides the contrast information, and scale is inherent to the evolution itself. During the image evolution according to the heat equation L_t = \nabla^2 L, the gradient-squared image follows an evolution according to the following.

In tensor notation (with summation over repeated indices):

\partial_t(L_i L_i) = 2 L_i L_{ijj} \quad (7.9)

Using a Cartesian coordinate system
\partial_t(L_x^2 + L_y^2) = 2 L_x (L_{xxx} + L_{xyy}) + 2 L_y (L_{xxy} + L_{yyy}) \quad (7.10)

Computing the Laplacian of the gradient squared gives

\Delta(L_i L_i) = 2 L_{ij} L_{ij} + 2 L_i L_{ijj} \quad (7.11)
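The claim that the gradient-squared image does not obey a pure diffusion (Equation 7.9 through Equation 7.11) can be checked symbolically on a concrete solution of the heat equation. This verification sketch is ours, not the chapter's:

```python
import sympy as sp

x, y, t = sp.symbols("x y t")

# A concrete solution of the heat equation L_t = L_xx + L_yy (c = 1):
L = sp.exp(-2 * t) * sp.sin(x) * sp.sin(y)
heat = sp.simplify(sp.diff(L, t) - sp.diff(L, x, 2) - sp.diff(L, y, 2))

# Gradient-squared image and the terms appearing in Eqs. 7.9 and 7.11.
g = sp.diff(L, x) ** 2 + sp.diff(L, y) ** 2
lap_g = sp.diff(g, x, 2) + sp.diff(g, y, 2)
hess_sq = (sp.diff(L, x, 2) ** 2 + 2 * sp.diff(L, x, y) ** 2
           + sp.diff(L, y, 2) ** 2)

# g_t = lap(g) - 2 * |Hessian|^2: an extra, non-diffusive term is present.
residual = sp.simplify(sp.diff(g, t) - (lap_g - 2 * hess_sq))
```

Both `heat` and `residual` simplify to zero, confirming that the gradient squared picks up the extra term −2 L_ij L_ij relative to pure diffusion.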
We observe that, combining Equation 7.9 and Equation 7.11, \partial_t(L_i L_i) = \Delta(L_i L_i) - 2 L_{ij} L_{ij}. Hence the gradient-squared evolution is not governed by the diffusion equation, and consequently the corresponding singular points or regional minima evolve in a different manner.

7.4.2 WATERSHED LINES DURING GRADIENT MAGNITUDE EVOLUTION

The second consideration (see Section 7.3.4.1) for building up a multiscale segmentation scheme is the determination of the linking scheme for the selected features in the deep image structure. In a watershed-analysis framework, the selected features are the regions that are produced by the gradient watersheds, each of them corresponding to a singularity (regional minimum) of the gradient-magnitude image. Because the proposed segmentation scheme relies on the behavior of singularities in time, we have used catastrophe theory to study an explicit classification of the topological changes that occur during evolution and to explain their linking in scale-space. In this study, we have drawn the conclusion that two types of catastrophes (fold and cusp) occur during the gradient-magnitude evolution. The detailed algebraic analysis can be found in a work by Pratikakis [47]. Using the duality between the regional minima and the regions produced by the gradient watersheds, we can observe how the watershed lines behave during this evolution. Figure 7.8, Figure 7.9, and Figure 7.10 give clear insight into this behavior. Looking at Figure 7.9, we can observe both catastrophe types. The fold catastrophe is perceived as an annihilation of the regional minimum, and the cusp catastrophe is perceived as a merging of two regional minima into one minimum. This behavior is reflected in the watershed-line construction by an annihilation of watershed-line segments. Obviously, this demonstrates why the placement of watershed
analysis into a scale-space framework makes it an attractive merging process.

FIGURE 7.8 Successive blurring of the original image.

Nevertheless, there is a major pitfall. In Figure 7.10, it is clearly evident that, during the evolution of the gradient magnitude, the watershed lines become increasingly delocalized. This situation does not permit us to have a merging process by only considering the critical-point events and retrieving the produced segments at the desired scale. This also explains why the deep image structure has to be viewed as one body and not as a
collection of separated scaled versions of the image under study. To achieve a single-body description of the deep image structure, we need to link (connect) all the components or features of this structure. For segmentation purposes, this linking is useful because it guides us to achieve a segmentation at the localization scale. This is feasible by describing all the spatial changes and interactions of the singularities that also influence the saliency measure of the localized watershed segments. The next section of this chapter provides a detailed description of the proposed linking scheme.
FIGURE 7.9 Behavior of the regional minima during the gradient-magnitude evolution.

7.4.3 LINKING ACROSS SCALES

We have already explained that interaction between singularities during the gradient-magnitude evolution corresponds to the behavior of either a fold or a cusp catastrophe. Critical points disappear with increasing scale, and this event is generic. The term generic means that if the image is changed slightly, the event may change position in scale-space, but it will still be present. Apart from the disappearance, another event is also generic: the appearance of two critical points [36, 48, 49]. In more detail, the generic events of the minima in the gradient magnitude are as follows:

No interaction with other singularities (Figure 7.11a)
Creation in a pair with a saddle (Figure 7.11b)
Splitting into a saddle and two minima (Figure 7.11c)
FIGURE 7.10 Watershed segment merging and delocalization of the watershed lines during the gradient-magnitude evolution.

Annihilation with a saddle (Figure 7.11b)
Merging with a saddle and another minimum into one minimum (Figure 7.11c)

In Figure 7.11, all the generic events are schematically described, using broken lines to indicate linking between the minima of two adjacent regions in scale-space. As scale increases from bottom to top, one can observe how interactions between critical points can lead to merging of two adjacent regions, due to the underlying one-to-one correspondence between a minimum and a region. Linking of the minima (parent-child relationship) for successive levels is applied by using the proximity criterion [24]. This criterion checks the relative distance for all the minima at scale σi that have been projected onto the same influence zone IZ(Bj) at scale
σi+1 with respect to the original minimum of this influence zone. An example can be seen in Figure 7.12, which represents the linking for two
FIGURE 7.11 (Color figure follows p. 274.) Generic events for gradient-magnitude evolution.
successive levels of the evolution example that is depicted in Figure 7.8, Figure 7.9, and Figure 7.10.

FIGURE 7.12 Linking in two successive levels: (a) scale N and (b) scale N+1.

Figure 7.12a shows the regional minima at scale σi that have been spatially projected onto level σi+1. The watershed lines at level σi+1 are also shown, and these delimit the influence zones at this level. The regional minima at scale σi+1 can be seen in Figure 7.12b. For the sake of clarity, each regional minimum in Figure 7.12a and Figure 7.12b has a marker of different shape and gray value that makes it distinct. The linking between the minima (mj)σi at scale σi and the minima at scale σi+1 appears in Figure 7.12c. After the linking stage, we have, for each quantization scale level, a labeling of the minima with respect to their linking ability. These labels are of two types: either the minimum is annihilated/merged and will not be considered in the next levels, or the minimum does not interact with other singularities and takes up the role of the father label for all the minima that were annihilated or merged and were situated in the same influence zone. This labeling is guided by the proximity criterion. The projected minimum that is closest to the minimum of the influence zone is considered the father, and the rest of the minima projected onto the same influence zone are
considered annihilated. Closeness is defined with respect to the topographic distance (see Section 7.2.1.1), which is a natural distance measure following the steepest gradient path inside the catchment basin. From the implementation point of view, we have to mention that we use ordered queues to guide the propagation toward the minima. In that way, we avoid problems caused by the presence of plateaus. Being consistent with the theory, we have to keep in mind that a generic event in gradient-magnitude evolution is also the creation/splitting of minima. In practice, this event can be understood as an increase in the number of minima at successive levels of the evolution. Due to the quantization of scale, such an increase in the number of minima rarely occurs, and even if it happens, its scale-space lifetime is very short. This motivated us to keep the same linking scheme for this event, too. In the case that a creation contributes to an increasing number of minima, linking is done with the closer of the two new minima, while the other is ignored. The proposed linking scheme is a single-parent scheme that links the regional minima and their respective dual watershed regions at successive evolution levels. This region-based approach is chosen to avoid the problems of a pixel-based linking caused by the noncontainment problem (see Section 7.3.4). An additional advantage of a region-based approach, specifically when a watershed-analysis framework is used, is the inherent definition of a search area for the linking process, namely the influence zones, which otherwise, in a pixel-based approach, has to be defined in an ad hoc manner. The aim of the proposed linking is to determine which contours (common borders) can be regarded as significant, without any a priori information about scale, spatial location, or the shape of primitives.
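A minimal sketch of the single-parent linking described above (ours, not the chapter's implementation): Euclidean distance stands in for the topographic distance, and the influence-zone lookup `zone_of` is a hypothetical helper supplied by the caller.

```python
import numpy as np

def link_minima(fine_minima, coarse_minima, zone_of):
    """Single-parent linking of regional minima across two scale levels.

    fine_minima   : tuples (x, y) of minima at scale sigma_i
    coarse_minima : tuples (x, y) of minima at scale sigma_{i+1}, one per zone
    zone_of       : maps a projected position to the index of the influence
                    zone it lands in at the coarser scale (assumed helper)
    Returns per fine minimum either ('father', zone) or ('annihilated', zone).
    """
    zones = {}
    for m in fine_minima:                      # project minima of scale i
        zones.setdefault(zone_of(m), []).append(m)

    labels = {}
    for z, members in zones.items():
        ref = np.asarray(coarse_minima[z])     # minimum of the influence zone
        dists = [float(np.linalg.norm(np.asarray(m) - ref)) for m in members]
        father = members[int(np.argmin(dists))]
        for m in members:                      # closest projection is the father
            labels[m] = ("father", z) if m == father else ("annihilated", z)
    return labels

# Toy example: two influence zones split at x = 5.
coarse = [(2, 2), (8, 8)]
links = link_minima([(1, 2), (3, 3), (8, 7)], coarse,
                    zone_of=lambda p: 0 if p[0] < 5 else 1)
```

In the toy example, (1, 2) and (3, 3) land in the same zone; the closer one to the zone's minimum becomes the father and the other is labeled annihilated.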
7.4.4 GRADIENT WATERSHEDS AND HIERARCHICAL SEGMENTATION IN SCALE-SPACE

Once the linking between the regional minima in the deep image structure has been obtained, an explicit hierarchy is attributed to these minima. The currently obtained
hierarchy is only based on scale. At this point, we describe how to enrich this hierarchy and make it more consistent. Consistency will be obtained because the hierarchy will be based on more features than scale alone. A hierarchical segmentation of an image is a tree structure, by inclusion, of connected image regions. The construction of the tree structure follows a model consisting of two modules. The first module is dedicated to evaluating each contour arc with a salient measure, and the second module identifies the different hierarchical levels by using a stopping criterion. A schematic diagram can be seen in Figure 7.13.

FIGURE 7.13 Dynamics of contours in scale-space algorithm.

As mentioned in Section 7.3.4.1, the third consideration for constructing a multiscale segmentation scheme is the significance measure. The following subsection explains this measure and how we attribute it to the watershed segments at the localized scale.

7.4.5 THE SALIENT-MEASURE MODULE

7.4.5.1 Watershed Valuation in the Superficial Structure: Dynamics of Contours

The principle of dynamics of contours [3] uses the principle of dynamics of minima [2] as initial information for the common-contour valuation of adjacent regions (see Figure 7.14). The additional information that is used is based on the tracking of the flooding history. In this way, a contour valuation can be found by comparing the dynamics of the regions that have reached the contour of interest during a flooding. The dynamics of a minimum m1 is easily defined with a flooding scenario. Let h be the altitude of the flood when, for the first time, a catchment basin with a deeper minimum m2 (m2 < m1) is reached. The dynamics of m1 is then simply equal to h − altitude(m1). Each catchment basin is attributed the value of the dynamics of its minimum.
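The flooding definition of the dynamics of minima can be illustrated in one dimension; the sketch below is our reading of the definition above, not the chapter's implementation:

```python
def dynamics_of_minima(signal):
    """Dynamics of each local minimum of a 1-D signal (flooding definition).

    For a minimum m, the dynamics is h - altitude(m), where h is the lowest
    flood level at which a basin with a strictly deeper minimum is reached.
    The global minimum gets infinite dynamics.
    """
    n = len(signal)
    dyn = {}
    for i in range(n):
        if (i > 0 and signal[i] > signal[i - 1]) or \
           (i < n - 1 and signal[i] > signal[i + 1]):
            continue                        # not a local minimum
        best_pass = float("inf")
        for step in (-1, 1):                # climb out to the left and right
            barrier = signal[i]
            j = i + step
            while 0 <= j < n:
                if signal[j] < signal[i]:   # reached a deeper basin
                    best_pass = min(best_pass, barrier)
                    break
                barrier = max(barrier, signal[j])
                j += step
        dyn[i] = best_pass - signal[i] if best_pass != float("inf") else best_pass
    return dyn

# Local minima at indices 1, 3 (global minimum), and 5.
dyn = dynamics_of_minima([3, 1, 4, 0, 5, 2, 6])
```

The minimum at index 1 (altitude 1) must climb over the pass of altitude 4 to reach the deeper basin at index 3, so its dynamics is 3; the global minimum at index 3 has infinite dynamics.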
The contour valuation that is attributed to each common border of the regions at a certain scale σ=σ0 (superficial structure) is denoted as DC(a), where a denotes the lower point (saddle point) of the arc that belongs to the watershed, and Bi,σ denotes an open connected component that belongs to the topological open set Bas(a)σ, which is defined as
FIGURE 7.14 Flooding list for point P, the basis for computing the dynamics of contours that correspond to P.

(7.12) (7.13)

This contour valuation is based on contrast and is characterized by a high degree of noise sensitivity. Experimental results have been reported in the literature [50]. Motivated by this shortcoming in noise sensitivity, it was a natural step to obtain a contour valuation by considering the behavior of the catchment basins in scale-space.

7.4.5.2 Dynamics of Gradient Watersheds in Scale-Space

Once the parent-child linkages have been completed, the next step is to valuate the gradient watersheds at the localization scale σ0. Let us assume that we want to valuate the gradient watershed that separates regions A and B (see Figure 7.15). From the linking step, we have created a linkage list, Λ(m,n) (see Figure 7.15a), where m and n denote the regions at the localization scale. In our example, m=A and n=B. This list provides the following information: region F is attributed as the root for regions A and B at scale S4. This has occurred because (a) region A has been linked to region D at scale S2, and region D has been
linked to region F; and (b) region B has been linked to region C at scale S1, region C has been linked to region E at scale S3, and region E has been linked to region F.

FIGURE 7.15 Down-projection and contour valuation (the linkage list Λ(A,B)).

Based upon this linkage list, we compute the multiscale contour dynamics for the contour arc that separates regions A and B at the localization scale σ0. As demonstrated in Figure 7.15b, at each scale quantization level l in the linkage list Λ(m,n) where no root creation has occurred, we compute the dynamics of contours DCl for the region couple (pl, ql) that constitutes part of the branches in the linkage list at the scale quantization level l. This computation answers the question of how contrasted the regions are, rather than attributing a valuation to a contour arc that separates those regions at a scale quantization level Sl. The reason is that the regions may not be adjacent at a quantization scale other than S0. Nevertheless, all the valuations DCl during the evolution of regions in scale-space will be integrated to provide the dynamics of the localized gradient watersheds at σ0. Taking into account Equation 7.12, the dynamics of contours in scale-space are denoted as

DC_{ss}(a) = \sum_{l=S_0}^{S_a} DC_l(p_l, q_l) \quad (7.14)
where p and q denote the regions that have the common contour a at the localization scale, S0 denotes the localization-scale quantization level, and Sa denotes the annihilation-scale quantization level. The difference ∆t=Sa−S0 denotes the scale-space lifetime. The determination of a salient measure with respect to the dynamics of contours in scale-space is described by the flow diagram in Figure 7.13. To retrieve the different hierarchical levels HLi, a second phase follows that we call the stopping-criterion phase.
This phase involves a merging criterion for region couples that have been sorted according to their dynamics-of-contours-in-scale-space valuation.

7.4.6 THE STOPPING-CRITERION STAGE

When the calculation of the saliency measure has been completed, we sort these values. As a result, we obtain the ranking of the adjacent region couples that are to be merged. From there on, the consecutive hierarchical levels can be constructed based on the ability of each merged couple to satisfy the null hypothesis during a hypothesis test as we scan the ranked values. For our problem, the hypothesis set is defined as:

H0: Two adjacent regions belong to the same label at level k.
H1: Two adjacent regions belong to different labels at level k.

Given that definition, it is important to note that for every level transition from k to k+1, we update the statistics used to formulate the hypothesis. This update is expressed by the error term ε_{k−1}. Considering that (a) our initial partitioning of the image (the oversegmented image) consists of homogeneous regions and (b) objects at a certain hierarchical level k are expected to exhibit a certain homogeneity, we choose the variance σ² as the most suitable statistic. Therefore, we formulate our hypothesis test as follows:

H_0: \sigma_{ij}^2 \le \hat{\sigma}_{ij}^2 \quad (7.15)

H_1: \sigma_{ij}^2 > \hat{\sigma}_{ij}^2 \quad (7.16)

where σ²_{ij} denotes the variance of the merged region that results from the updated regions Oi and Oj, and where \hat{\sigma}_{ij}^2 is expressed as

\hat{\sigma}_{ij}^2 = \frac{n_i \sigma_i^2 + n_j \sigma_j^2}{n_i + n_j} + \varepsilon_{k-1} \quad (7.17)

The first term denotes the estimated variance of the merged region that results from the updated regions Oi and Oj; it is obtained by weighting the variances of the constituent regions by their region areas n_i and n_j. The term ε_{k−1} denotes the error in variance at level k−1, due to which the alternative hypothesis was validated and, consequently, the hierarchical level k−1 was created. Initially, ε is set to zero. Applying the chi-squared test to the calculated variance s² of the merged regions Oi and Oj, we get
P(i,j) = \text{true} \quad \text{if } n s^2 / \hat{\sigma}_{ij}^2 \le \chi^2_{n,\alpha} \quad (7.18)

P(i,j) = \text{false} \quad \text{otherwise} \quad (7.19)

P(i,j) is a decision function that is true if updated partitions i and j belong to the same object at level k; otherwise it is false. Here n is the number of pixels in the merged region. In our case, n >> 0, so we introduce the following approximation of the chi-squared critical value [51]:

\chi^2_{n,\alpha} \approx n \left(1 - \frac{2}{9n} + u_\alpha \sqrt{\frac{2}{9n}}\right)^3 \quad (7.20)

where u_α is the right critical value for a one-tailed hypothesis test carried out on a standard normal distribution N(0,1). Substituting the approximation of Equation 7.20 in Equation 7.18 and Equation 7.19 leads to Equation 7.21:

s^2 \le \hat{\sigma}_{ij}^2 \left(1 - \frac{2}{9n} + u_\alpha \sqrt{\frac{2}{9n}}\right)^3 \quad (7.21)
Therefore, our test is expressed as

P(i,j) = \text{true} \quad \text{if } s^2 \le \hat{\sigma}_{ij}^2 \left(1 - \frac{2}{9n} + u_\alpha \sqrt{\frac{2}{9n}}\right)^3 \quad (7.22)

P(i,j) = \text{false} \quad \text{otherwise} \quad (7.23)
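The merging decision can be sketched as follows (our variable names, not the chapter's; it uses the large-n chi-squared approximation with u_α = 1.96 for α = 0.05 and the area-weighted variance estimate described for Equation 7.17):

```python
import math

U_ALPHA = 1.96  # right critical value of N(0,1) for alpha = 0.05

def merge_allowed(var_merged, n, var_i, n_i, var_j, n_j, eps_prev=0.0):
    """Chi-squared homogeneity test for merging two regions Oi and Oj.

    var_merged : sample variance s^2 of the union of the two regions
    n          : number of pixels in the merged region
    var_i, n_i, var_j, n_j : variances and areas of the constituents
    eps_prev   : accumulated variance error from the previous level
    """
    # Area-weighted estimate of the merged variance, plus the error slack.
    est = (n_i * var_i + n_j * var_j) / (n_i + n_j) + eps_prev
    if est == 0.0:
        return var_merged == 0.0
    # Large-n approximation of the chi-squared critical value:
    # chi2_{n,alpha} ~ n * (1 - 2/(9n) + u_a * sqrt(2/(9n)))^3
    chi2_crit = n * (1.0 - 2.0 / (9.0 * n)
                     + U_ALPHA * math.sqrt(2.0 / (9.0 * n))) ** 3
    # Null hypothesis holds (merge) if n * s^2 / est <= chi2_crit.
    return n * var_merged / est <= chi2_crit
```

For example, two regions of 100 pixels each with unit variance merge when the union also has roughly unit variance, but the merge is rejected when the union's variance is about twice the weighted estimate.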
During all the experimental work, the significance level used was α = 0.05, which corresponds to u_α = 1.96. As a last note, it must be mentioned that when we update the variances of the merged regions, we do not have to recompute them from the pixel data, because the statistics of a merged region can be computed from the statistics of its constituent partitions, which have already been computed at the very beginning.

7.4.7 THE INTELLIGENT INTERACTIVE TOOL

Automatic segmentation methods are known to be unreliable due to the complexity and variability of medical images, and they cannot be applied without supervision by the user. On the other hand, manual segmentation is a tedious and time-consuming process, lacking precision and reproducibility. Moreover, it is impractical when applied to extensive temporal and spatial sequences of images. Therefore, to perform an image-segmentation task in a generic and efficient way, a compromise has to be found between automatic and manual modes. Under this principle, a hybrid scheme has to be constructed that minimizes the user input and allows the presence of appropriate tools to provide fast and accurate manual correction. The user input is minimized by constructing an image
description rich in meaningful regions with low cardinality, and an interactive tool ensures accuracy and reproducibility without requiring any special training by the user. A meaningful image description is obtained by following the reported hierarchy on an initial partitioning extracted by the gradient watersheds. Similar efforts that use the watershed transformation have been reported by other authors. Pizer et al. [52] reported a system for interactive object definition implemented on top of an automatically computed image description of sensible image regions obtained after successive blurring. This image description is a quasihierarchy of ridges and subridges in the intensity surface corresponding to the image. The same approach could not be applied in our case because, during successive blurring, the watersheds that belong to the same ridge do not always merge together before the parent ridge merges with a neighbor. In a work by Maes et al. [53], partitioning the image into segmentation primitives was treated as a global optimization problem based on a minimum-description-length criterion derived from attributes that describe similarity between regions. An updated hierarchy was provided, using the maximum current description-length reduction as the indication for merging at each iteration. At a second stage, the output-image partitioning provides the input for an intelligent paintbrush tool. This approach lacks flexibility because it always stops at the global minimum. Therefore, there is no way to merge further and compare different levels. Instead, our proposed hierarchical segmentation scheme can help the application through the automatic retrieval of a hierarchy that consists of a small number of levels. With this scheme in place, browsing through the levels does not become a tedious task, and each of the levels can provide meaningful delineations with respect to a certain degree of homogeneity.
Following our approach, once the complete stack of all possible hierarchical levels has been calculated, the user can interact and intervene for the final precise delineation of the 3-D structures [54]. Interactivity is applied directly on the volume of interest by manipulating its orthogonal views. The proposed scenario is initiated by fast browsing through the different 3-D hierarchical levels, which enables the user to select the optimal level in terms of minimal interaction. Browsing among the different hierarchical levels, the user sees all the possible segmentations provided by the complete hierarchical stack and then chooses the hierarchical level that best approximates the object of interest. The notion of interactivity entails two basic actions. The first action, the inclusion/exclusion of a 3-D segment, can be activated by a simple mouse click; in the event that needed segments are not present, on-the-fly selection of a finer level out of the hierarchical set is feasible. The second action is the conversion of the contour, as obtained thus far, to a contour representation based on Catmull-Rom splines [55]. This enables further flexible manual adjustment in those regions where the border of the object is not respected (Figure 7.16, bottom right view). In this representation, parts of the object that are very smooth are characterized by only a few points, while jagged edges can be approximated to any accuracy by adding points or by dragging points to the edges, with immediate update of the spline contour. A contour represented by Catmull-Rom splines, which belong to the family of cardinal splines, passes through all the points marked by the user, which is not the case for representations based on B-splines or Bezier curves, where
Medical image analysis method
320
a distinction is made between the points the curve is passing through and the control knots, which allow the user to control the slopes of the curve.

FIGURE 7.16 Top left view: Wiremesh of the 3-D delineated contour set. Other views: simultaneous 3-D orthogonal viewing for interactive correction.

Definition of these control knots, under the constraint that the spline should follow the trace of a visible edge, makes manual delineation less efficient. During the editing of the contour, the consistency of the object in 3-D can be checked by looking at the cross-sections of the contour in the other orthogonal planes (transversal, coronal, and sagittal). Once the user is completely satisfied, the contour is added to the stack of contours of the object and converted to a 3-D voxel representation of the object, which is scanned in real time to retrieve and show the updated outlines of the object in the other orthogonal planes (Figure 7.16). The whole concept has been integrated into a 3-D modality-editor software tool with the aim of creating reference sets of anatomical objects (i.e., digital atlases) and facilitating morphometrical studies.

7.5 EXPERIMENTAL RESULTS
The dynamics-of-contours-in-scale-space algorithm in 3-D has been tested on artificial test images and on real-world (medical) images.

7.5.1 ARTIFICIAL IMAGES

For the experiments, we have used the HAND100 artificial 3-D image. The original HAND image contains the pixel values 0 (background), 500 (thumb), 800 (forefinger), 1000 (palm of the hand), 1250 (middle finger and ring finger), and 1500 (little finger). From this original image, two different images have been derived by adding Gaussian noise with σ=100. We point out that, while these images are originally 16-bit, we have utilized their 8-bit versions.

FIGURE 7.17 Cross-sections of the HAND100 volumetric image. (Provided by Koen Vincken from the Image Sciences Institute, Utrecht, Netherlands.)

FIGURE 7.18 Rendering of the thresholded HAND100 volumetric image. (Provided by Koen Vincken from the Image Sciences Institute, Utrecht, Netherlands.)

The cross-sections and the volume renderings using different thresholds for the HAND100 volumetric image can be seen in Figure 7.17 and Figure 7.18. The application of the dynamics-of-contours-in-scale-space algorithm yields a hierarchical hyperstack that consists of three levels of segmented volumes in the case of HAND100. The hyperstacks can be seen in Figure 7.19a to Figure 7.19c; in these figures, the gradient watersheds appear in white. Using the produced hyperstack for the HAND100 image, we have segmented the "whole hand" and its parts, namely the "palm" and the "fingers." To segment the "fingers," we used only the first level of the hyperstack (Figure 7.19a) because, at this level, we had a complete definition for all of them. Their assignment as segmented objects occurs by a simple mouse click for each of them. We note that when we assign a partition to an object, we can view the propagation of the assignment in 3-D space by looking at the other orthogonal views. This facilitates the interactive task because operators have complete control of their corrections across the volume. In Figure 7.20, one can see the segmented "fingers" in red along with their volume rendering. Segmentation of the "palm" of the hand has also been obtained using only the first level of the hyperstack; the assignment of this object was done with a single mouse click. The produced segmentation can be seen in Figure 7.21. Finally, the whole hand was segmented using the third level of the hyperstack (Figure 7.19c). To assign this object, one has to merge the thumb with the rest of the hand, which occurs by dragging the mouse over the area of these two parts or by applying two mouse clicks in the respective areas.
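The preparation of such noisy test data can be sketched as follows. This is an illustrative reconstruction, assuming NumPy; the function name and the clip-and-rescale mapping from the original intensity range down to 8 bits are our assumptions, since the text only states that Gaussian noise with σ=100 was added and that 8-bit versions of the 16-bit images were used.

```python
import numpy as np

def make_noisy_volume(labels, sigma=100.0, rng=None):
    """Derive a noisy 8-bit test volume from a piecewise-constant labeled
    volume, in the spirit of the HAND100 experiments.

    `labels` holds the original intensities, e.g. 0 (background),
    500 (thumb), ..., 1500 (little finger).
    """
    rng = np.random.default_rng() if rng is None else rng
    # add i.i.d. Gaussian noise with standard deviation sigma
    noisy = labels.astype(np.float64) + rng.normal(0.0, sigma, labels.shape)
    # stay in the original intensity range, then rescale to 8 bits
    noisy = np.clip(noisy, 0, 1500)
    return np.round(noisy * 255.0 / 1500.0).astype(np.uint8)
```

Calling this twice with different random seeds would yield the two noisy derived images mentioned above.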
The produced result can be seen in Figure 7.22.
FIGURE 7.19 Cross-sections of each segmented volumetric level in the hierarchical hyperstack for HAND 100 volume. (Provided by Koen Vincken from the Image Sciences Institute, Utrecht, Netherlands.)
FIGURE 7.20 (Color figure follows p. 274.) Segmentation of the fingers in the HAND100 volume. (Provided by Koen Vincken from the Image Sciences Institute, Utrecht, Netherlands.)

The volume rendering of the segmented "whole hand" in the artificial volume HAND100 is compared with the volume rendering of thresholded versions of the original data (Figure 7.23). We observe that when only thresholding is used, parts of the whole hand are either obscured by noise or disappear due to a high threshold. In contrast, the segmented whole hand preserves the features of the original data.

7.5.2 MEDICAL IMAGES

The dynamics-of-contours-in-scale-space algorithm in 3-D has been tested for the segmentation of the cerebellum of the brain. The produced hyperstack consisted of four volumetric levels, which are shown in Figure 7.24, Figure 7.25, Figure 7.26, and Figure 7.27. In these figures, the different levels are demonstrated and compared by showing the same orthogonal views for each level.

FIGURE 7.21 (Color figure follows p. 274.) Segmentation of the palm in the HAND100 volume. (Provided by Koen Vincken from the Image Sciences Institute, Utrecht, Netherlands.)

For the segmentation of the cerebellum, the optimum coarse partitioning is obtained by selecting level four. After the level selection, we clicked once with the mouse inside the area of the cerebellum. The result can be seen in Figure 7.28, where the red line indicates the delineation of the assigned object. This coarse segmentation step produces a volume that is rendered in the same figure. While the orthogonal views of Figure 7.28 indicate that the coarse segmentation is very close to the real object, the rendered view shows that structures that do not belong to the cerebellum have been assigned as such; for example, the elongated part at the bottom of the rendered view does not belong to the cerebellum.
Thus, we have to browse through the selected hierarchical level for the coarse segmentation and indicate the parts of the 3-D object that have to be corrected. Figure 7.29 and Figure 7.30 show examples of the coarse segmentation from other orthogonal views. In these figures we can see that inclusion and exclusion of areas that do not belong to the cerebellum are needed.

FIGURE 7.22 (Color figure follows p. 274.) Segmentation of the whole hand in the HAND100 volume. (Provided by Koen Vincken from the Image Sciences Institute, Utrecht, Netherlands.)

To refine our segmentation, we superimpose the coarse segmented object onto a level of the hyperstack that can provide the segments needed for the refinement. This superposition is shown in Figure 7.31, Figure 7.32, and Figure 7.33, which correspond to the coarse segmentations of Figure 7.28, Figure 7.29, and Figure 7.30, respectively. The final segmentation can then be achieved by including/excluding regions using mouse clicks or by dragging over the regions. The application of the refinement step has resulted in the final segmentation shown in Figure 7.34, Figure 7.35, and Figure 7.36. The improvement after the refinement step can also be seen in Figure 7.37, where volume renderings from different views are given for the cerebellum in the case of (a) the coarse segmentation and (b) the segmentation after refinement with manual correction.
FIGURE 7.23 (Color figure follows p. 274.) Volume rendering of HAND100 in the case of (a) thresholding and (b) segmentation. (Provided by Koen Vincken from the Image Sciences Institute, Utrecht, Netherlands.)

7.6 CONCLUSIONS

In this chapter, we discussed our novel multiscale segmentation scheme, which is based on the principles of watershed analysis and Gaussian scale-space. In particular, the proposed scheme relies on the concept of the dynamics of contours in scale-space, which incorporates a segment-linking strategy motivated by the study of the topological changes of the critical-point configurations. An algebraic classification of these topological changes for the gradient-squared evolution in 2-D has been studied. We have investigated the performance of the algorithm by setting up an objective evaluation method. Our conclusion is that it performs better than algorithms using either the
superficial or the deep image structure alone. There is a simple explanation for this behavior: the proposed approach integrates three types of information, namely homogeneity, contrast, and scale, into a single algorithm, and therefore utilizes far more information to guide the segmentation process. This good behavior suggested that an extension to a fully 3-D segmentation algorithm would be worthwhile. Hence, we implemented this extension to 3-D, and our experimental observations are very encouraging. By coupling the production of meaningful 4-D hyperstacks with a user interface adapted to 4-D data manipulation that requires no training of the user, the 3-D algorithm can lead to robust and reproducible segmentations. These conclusions have been drawn from experiments involving both artificial and real medical images.
FIGURE 7.24 (Color figure follows p. 274.) Hierarchical hyperstack: Level 1.
FIGURE 7.25 (Color figure follows p. 274.) Hierarchical hyperstack: Level 2.
FIGURE 7.26 (Color figure follows p. 274.) Hierarchical hyperstack: Level 3.
FIGURE 7.27 (Color figure follows p. 274.) Hierarchical hyperstack: Level 4.
FIGURE 7.28 (Color figure follows p. 274.) Coarse segmentation.
FIGURE 7.29 (Color figure follows p. 274.) Coarse segmentation.
FIGURE 7.30 (Color figure follows p. 274.) Coarse segmentation.
FIGURE 7.31 (Color figure follows p. 274.) Coarse segmentation superimposed on a final hierarchical level.
FIGURE 7.32 (Color figure follows p. 274.) Coarse segmentation superimposed on a final hierarchical level.
FIGURE 7.33 (Color figure follows p. 274.) Coarse segmentation superimposed on a final hierarchical level.
FIGURE 7.34 (Color figure follows p. 274.) Final segmentation.
FIGURE 7.35 (Color figure follows p. 274.) Final segmentation.
FIGURE 7.36 (Color figure follows p. 274.) Final segmentation.
FIGURE 7.37 Three different views for the segmentation of the cerebellum in the case of (a) coarse segmentation and (b) segmentation after refinement with manual correction.
REFERENCES

1. Beucher, S., Segmentation d'Images et Morphologie Mathématique, Ph.D. Thesis, Ecole Nationale Supérieure des Mines de Paris, Fontainebleau, 1990.
2. Grimaud, M., A new measure of contrast: the dynamics, SPIE Proc., 1769, 292–305, 1992.
3. Najman, L. and Schmitt, M., Geodesic saliency of watershed contours and hierarchical segmentation, IEEE Trans. Pattern Analysis and Machine Intelligence, 18, 1163–1173, 1996.
4. Pratikakis, I.E., Sahli, H., and Cornelis, J., Hierarchy determination of the gradient watershed adjacent groups, in 10th Scandinavian Conf. on Image Analysis, Lappeenranta, Finland, 1997, pp. 685–692.
5. Gauch, J.M. and Pizer, S.M., Multiresolution analysis of ridges and valleys in greyscale images, IEEE Trans. Pattern Analysis and Machine Intelligence, 15, 635–646, 1993.
6. Olsen, O.F., Multiscale watershed segmentation, in Gaussian Scale-Space Theory, Sporring, J. et al., Eds., Kluwer Academic, Dordrecht, Netherlands, 1997, pp. 191–200.
7. Najman, L. and Schmitt, M., Watershed of a continuous function, Signal Process., 38, 99–112, 1994.
8. Meyer, F., Topographic distance and watershed lines, Signal Process., 38, 113–125, 1994.
9. Vincent, L. and Soille, P., Watersheds in digital spaces: an efficient algorithm based on immersion simulations, IEEE Trans. Pattern Analysis and Machine Intelligence, 13, 583–598, 1991.
10. Koenderink, J.J., Solid Shape, MIT Press, Cambridge, MA, 1990.
11. Beucher, S. and Lantuéjoul, C., Use of watersheds in contour detection, in Int. Workshop Image Process., Real-Time Edge and Motion Detection/Estimation, Rennes, France, 1979, pp. 17–21.
12. Friedlander, F. and Meyer, F., A sequential algorithm for detecting watersheds on a grey level image, Acta Stereologica, 6, 663–668, 1987.
13. Meyer, F., Un algorithme optimal de ligne de partage des eaux, in Proc. 8e Congrès Reconnaissance des Formes et Intelligence Artificielle, Lyon, France, 1991, pp. 847–857.
14. Dobrin, B.P., Viero, T., and Gabbouj, M., Fast watershed algorithms: analysis and extensions, SPIE Proc., 2180, 209–220, 1994.
15. Hagyard, D., Razaz, M., and Atkin, P., Analysis of watershed algorithms for gray scale images, in Proc. IEEE Int. Conf. on Image Processing, Lausanne, Switzerland, 1996, pp. 41–44.
16. ter Haar Romeny, B.M., Introduction to scale-space theory: multiscale geometric image analysis, Tech. Rep. 96–21, Utrecht University, Netherlands, 1996.
17. Jain, A., Fundamentals of Digital Image Processing, Prentice Hall, Englewood Cliffs, NJ, 1989.
18. Liu, H.K., Two- and three-dimensional boundary detection, Comput. Graphics Image Process., 6, 123–134, 1977.
19. Zucker, S.W. and Hummel, R.A., A three-dimensional edge operator, IEEE Trans. Pattern Analysis and Machine Intelligence, 3, 324–331, 1981.
20. Hueckel, M.H., An operator which locates edges in digitized pictures, J. Assoc. Comput. Mach., 18, 113–125, 1971.
21. Gratin, C., De la Représentation des Images au Traitement Morphologique d'Images Tridimensionnelles, Ph.D. Thesis, Ecole Nationale Supérieure des Mines de Paris, Fontainebleau, 1993.
22. Monga, O., Deriche, R., and Rocchisani, J.-M., 3-D edge detection using recursive filtering: application to scanner images, CVGIP: Image Understanding, 53, 76–87, 1991.
23. Deriche, R., Using Canny's criteria to derive a recursively implemented optimal edge detector, Int. J. Comput. Vision, 1, 167–187, 1987.
24. Koenderink, J.J., The structure of images, Biological Cybernetics, 50, 363–370, 1984.
25. Young, R., The Gaussian derivative model for machine vision: visual cortex simulation, J. Optical Soc. Am., July 1987.
26. Weickert, J., Anisotropic Diffusion in Image Processing, B.G. Teubner, Stuttgart, Germany, 1998.
27. Yuille, A.L. and Poggio, T.A., Scaling theorems for zero crossings, IEEE Trans. Pattern Analysis and Machine Intelligence, 8, 15–25, 1986.
28. Babaud, J., Witkin, A., Baudin, M., and Duda, R., Uniqueness of the Gaussian kernel for scale-space filtering, IEEE Trans. Pattern Analysis and Machine Intelligence, 8, 26–33, 1986.
29. Lindeberg, T., Scale-space for discrete signals, IEEE Trans. Pattern Analysis and Machine Intelligence, 12, 234–254, 1990.
30. Florack, L.M.J., ter Haar Romeny, B.M., Koenderink, J.J., and Viergever, M.A., Scale and the differential structure of images, Image Vision Computing, 10(6), 376–388, 1992.
31. Alvarez, L., Guichard, F., Lions, P.-L., and Morel, J.-M., Axioms and fundamental equations of image processing, Arch. Rational Mechanics Anal., 123, 199–257, 1993.
32. Pauwels, E.J., Van Gool, L.J., Fiddelaers, P., and Moons, T., An extended class of scale-invariant and recursive scale-space filters, IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 691–701, 1995.
33. Nielsen, M., Florack, L., and Deriche, R., Regularization, scale-space and edge detection filters, J. Math. Imag. Vision, 7, 291–307, 1997.
34. Lindeberg, T., On the axiomatic foundations of linear scale-space, in Gaussian Scale-Space Theory, Sporring, J. et al., Eds., Kluwer Academic, Dordrecht, Netherlands, 1997, pp. 75–97.
35. Florack, L., Data, models and images, in Proc. IEEE Int. Conf. Image Processing, Lausanne, Switzerland, 1996, pp. 469–472.
36. Lifshitz, L.M. and Pizer, S.M., A multiresolution hierarchical approach to image segmentation based on intensity extrema, IEEE Trans. Pattern Analysis and Machine Intelligence, 12, 529–541, 1990.
37. Lindeberg, T., Scale-Space Theory in Computer Vision, Kluwer Academic, Dordrecht, Netherlands, 1994.
38. Gerig, G., Szekely, G., Israel, G., and Berger, M., Detection and characterization of unsharp blobs by curve evolution, in Information Processing in Medical Imaging, Bizais, Y. et al., Eds., Kluwer Academic, Dordrecht, Netherlands, 1995, pp. 165–176.
39. Eberly, D., Geometric Methods for Analysis of Ridges in N-dimensional Images, Ph.D. Thesis, University of North Carolina at Chapel Hill, Chapel Hill, 1994.
40. Vincken, K.L., Koster, A.S.E., and Viergever, M.A., Probabilistic multiscale image segmentation, IEEE Trans. Pattern Analysis and Machine Intelligence, 19, 109–120, 1997.
41. Vincken, K.L., Probabilistic Multiscale Image Segmentation by the Hyperstack, Ph.D. Thesis, University of Utrecht, Netherlands, 1995.
42. Koster, A., Linking Models for Multiscale Image Segmentation, Ph.D. Thesis, University of Utrecht, Netherlands, 1995.
43. Niessen, W.J., Vincken, K.L., Weickert, J.A., and Viergever, M.A., Nonlinear multiscale representations for image segmentation, Comput. Vision Image Understanding, 66, 233–245, 1997.
44. Kalitzin, S.N., ter Haar Romeny, B.M., and Viergever, M., On topological deep-structure segmentation, in Proc. IEEE Int. Conf. on Image Processing, Santa Barbara, CA, 1997, pp. 863–866.
45. Griffin, L.D., Robinson, G., and Colchester, A.C.F., Hierarchical segmentation satisfying constraints, in Proc. Br. Machine Vision Conf., Hancock, E., Ed., 1994, pp. 135–144.
46. Jackway, P.T., Gradient watersheds in morphological scale-space, IEEE Trans. Image Process., 5, 913–921, 1996.
47. Pratikakis, I., Watershed-Driven Image Segmentation, Ph.D. Thesis, Vrije Universiteit Brussel, Brussels, 1998.
48. Johansen, P., Local analysis of image scale-space, in Gaussian Scale-Space Theory, Sporring, J. et al., Eds., Kluwer Academic, Dordrecht, Netherlands, 1997, pp. 139–146.
49. Damon, J., Local Morse theory for Gaussian blurred functions, in Gaussian Scale-Space Theory, Sporring, J. et al., Eds., Kluwer Academic, Dordrecht, Netherlands, 1997, pp. 147–163.
50. Pratikakis, I.E., Sahli, H., and Cornelis, J., Low-level image partitioning guided by the gradient watershed hierarchy, Signal Process., 75, 173–195, 1999.
51. Papoulis, A., Probability, Random Variables and Stochastic Processes, McGraw-Hill International, Singapore, 1991.
52. Pizer, S.M., Cullip, T.J., and Fredericksen, R.E., Toward interactive object definition in 3-D scalar images, in 3-D Imaging in Medicine, Höhne, K.H. et al., Eds., vol. F60, NATO ASI Series, Springer-Verlag, Berlin, 1990, pp. 83–105.
53. Maes, F., Vandermeulen, D., Suetens, P., and Marchal, G., Automatic image partitioning for generic object segmentation in medical images, in Information Processing in Medical Imaging, Bizais, Y. et al., Eds., Kluwer Academic, Dordrecht, Netherlands, 1995, pp. 215–226.
54. Pratikakis, I.E., Deklerck, R., Salomie, A., and Cornelis, J., Improving precise interactive delineation of 3-D structures in medical images, in Computer Assisted Radiology, Lemke, H.U., Ed., Elsevier, Berlin, 1997, pp. 215–220.
55. Hearn, D. and Baker, M.P., Computer Graphics, Prentice-Hall, Englewood Cliffs, NJ, 1994.
8 A MRF-Based Approach for the Measurement of Skin Thickness in Mammography

Antonis Katartzis, Hichem Sahli, Jan Cornelis, Lena Costaridou, and George Panayiotakis

8.1 INTRODUCTION

Breast skin changes are considered by physicians as an additional sign of breast pathology. They can be divided into two major categories, namely skin retraction and localized or generalized skin thickening, which can be either benign or malignant. The skin can attain a thickness of 10 to 20 times normal before it can be perceived as abnormal by palpation [1, 2]. Both retraction and thickening may be evident mammographically before they can be detected clinically. The existing techniques for the measurement of breast skin thickness are based on manual estimations on the mammograms, using simple measuring devices [3, 4]. Considering the continuous evolution of computer-aided diagnostic systems, these manual methods appear quite obsolete. As far as time and accuracy are concerned, the quantitative analysis of breast skin changes can be substantially improved with a computer-assisted measurement technique. We have developed a computerized method for the measurement of breast skin thickness from digitized mammograms that involves a salient feature (hereinafter denoted as a skin feature) capturing the radiographic properties of the skin region and a dedicated Markovian model characterizing its geometry [5]. During a first processing stage, we apply a combination of global and local thresholding operations for breast-border extraction. The estimation of the skin feature comprises a method for the exclusion of the region of the nipple and an estimation of the gray-level gradient orientation, based on a multiscale wavelet decomposition of the image. Finally, the region of the skin is identified based on two anatomical properties, namely its shape and its relative position with respect to the surrounding mammographic structures.
This a priori knowledge can be easily modeled in the form of a Markov random field (MRF), which captures the contextual constraints of the skin pixels. The proposed MRF model is defined on a binary set of interpretation labels (skin, no skin), and the labeling process is carried out using a maximum a posteriori probability (MAP) estimation rule. The method is tested on a series of mammograms with enhanced contrast at the breast periphery, obtained by an exposure-equalization technique during image acquisition. The results are compared with manual measurements performed on each of the films.
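The chapter does not name the particular criteria used for the global and local thresholding operations in the breast-border extraction stage. As one purely illustrative choice for the global step, Otsu's criterion selects the threshold that maximizes the between-class variance of the gray-level histogram. A minimal sketch, assuming NumPy and an 8-bit image; the function name is ours:

```python
import numpy as np

def otsu_threshold(img):
    """Global threshold by Otsu's criterion: pick the gray level that
    maximizes the between-class variance of the histogram."""
    hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))
    p = hist / hist.sum()
    omega = np.cumsum(p)                    # class-0 probability mass
    mu = np.cumsum(p * np.arange(256))      # class-0 cumulative mean
    mu_t = mu[-1]                           # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b = np.nan_to_num(sigma_b)        # ignore degenerate splits
    return int(np.argmax(sigma_b))
```

A local-thresholding step could then re-apply a criterion of this kind within sliding windows near the breast border, but that refinement is model-specific and not shown here.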
The chapter is organized as follows. In Section 8.2 we present the main principles of Markov random field theory and its application to labeling problems, and we provide an overview of related work on mammographic image analysis. In Section 8.3 we describe the image-acquisition process and state the main properties of the skin as viewed in a typical mammogram. Section 8.4 initially refers to the extraction of the salient feature that discriminates the skin from other anatomical structures at the breast periphery; the section concludes with a description of the proposed Markovian model and the labeling scheme for the extraction of the skin region. The validation of our method, which includes representative results for the measurement of skin thickness, is presented in Section 8.5. Finally, a discussion and suggested directions for future research are given in Section 8.6.

8.2 BACKGROUND

8.2.1 MRF LABELING

The use of contextual constraints is indispensable for every complex vision system. A scene is understood through the spatial and visual context of the objects in it; the objects are recognized in the context of object features at a lower-level representation; the object features are identified based on context primitives at an even lower level; and the primitives are extracted in the context of image pixels at the lowest level of abstraction. Markov random field (MRF) theory provides a convenient and consistent way of modeling context-dependent entities constituting the nodes of a graph [6]. This is achieved by characterizing mutual influences among such entities using MRF probabilities. The theory tells us how to model the a priori probability of context-dependent patterns: a particular MRF model favors its own class of patterns by associating them with larger probabilities than other pattern classes.
Such models, defined on regular lattices of image pixels, have been used effectively in texture description and segmentation [7], as well as in image restoration and denoising [8, 9]. At higher levels of abstraction, MRF models are able to encode the spatial dependencies between object features, giving rise to efficient schemes for perceptual grouping and object recognition [10]. We briefly review the concept of a MRF defined on a graph. Let G={S, N} be a graph, where S={1, 2, …, m} is a discrete set of nodes, representing either image pixels or structures of higher abstraction levels, and N={Ni | i ∈ S} is a given neighborhood system on G. Ni is the set of all nodes in S that are neighbors of i, such that

1. i ∉ Ni
2. i ∈ Nj if and only if j ∈ Ni

Let L={L1, L2, …, Lm} be a family of random variables defined on S, in which each random variable Li takes a value li in a given set (the random variables Li can be numeric as well as symbolic, e.g., interpretation labels). The family L is called a MRF, with respect to the neighborhood system N, if and only if

1. P(L=l) > 0, for all realizations l of L
2. P(li | lj, j ∈ S−{i}) = P(li | lj, j ∈ Ni)
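The two defining properties of a neighborhood system (a node is never its own neighbor, and the neighboring relation is symmetric) can be made concrete with a small sketch. The example below builds the standard first-order (4-connected) neighborhood on an image lattice and checks both axioms; the function names are our own, not from the chapter:

```python
def four_neighborhood(rows, cols):
    """Build the first-order (4-connected) neighborhood system N on a
    rows x cols lattice; sites are indexed i = r * cols + c."""
    N = {}
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            N[i] = set()
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    N[i].add(rr * cols + cc)
    return N

def is_valid_neighborhood(N):
    """Check the two axioms: no site is its own neighbor, and the
    relation is symmetric (j in N[i] iff i in N[j])."""
    return all(i not in N[i] for i in N) and \
           all(i in N[j] for i in N for j in N[i])
```

For higher levels of abstraction, the same dictionary-of-sets representation works with nodes standing for object features instead of pixels.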
where P(L=l)=P(L1=l1, L2=l2, …, Lm=lm) (abbreviated by P(l)) and P(li|lj) are the joint and conditional probability functions, respectively. Intuitively, a MRF is a random field with the property that the statistics at a particular node depend only on those of its neighbors. An important feature of the MRF model defined above is that its joint probability density function has a general functional form, known as the Gibbs distribution, that is defined based on the concept of cliques. A clique c, associated with the graph G, is a subset of S such that it contains either a single node or several nodes that are all neighbors of each other. If we denote the collection of all the cliques of G, with respect to the neighborhood system N, as C(G,N), then the general form of a realization of P(l) can be expressed as the following Gibbs distribution:

P(l) = (1/Z) exp[−U(l)]   (8.1)

where U(l) = Σc∈C(G,N) Vc(l) is called the Gibbs energy function and Vc(l) the clique potential defined on the corresponding clique c. The functional form of these clique potentials conveys the main properties of the Markovian model. Finally, Z = Σl exp[−U(l)] is a normalizing constant called the partition function. In the case of a labeling problem, where L represents a set of interpretation labels and d={d1, …, dm} is a set of physical measurements that correspond to the realization of an observation field D on S, the optimal labeling of the graph G can be obtained based on a maximum a posteriori probability (MAP) criterion. According to Bayes' rule, the posterior probability can be computed using the following formulation:

P(L=l|D=d) = p(D=d|L=l) P(L=l) / p(D=d)   (8.2)

where P(L=l) is the prior probability of the labeling l, p(D=d|L=l) is the conditional probability distribution function (PDF) of the observations d, also called the likelihood function of l for d fixed, and p(D=d) is the density of d, which is constant when d is given.
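For a tiny graph, the Gibbs distribution of Equation 8.1 can be evaluated exactly by enumeration. The sketch below assumes pairwise cliques with a Potts-type potential (an illustrative choice; the chapter leaves the clique potentials Vc generic) and computes the partition function Z by brute force, which is feasible only for very small site sets:

```python
import math
from itertools import product

def gibbs_energy(labels, N, beta=1.0):
    """U(l) as a sum of pairwise clique potentials: a Potts-type model
    pays beta for each pair of neighboring sites with different labels."""
    return beta * sum(1 for i in N for j in N[i]
                      if j > i and labels[i] != labels[j])

def gibbs_distribution(N, sites, label_set, beta=1.0):
    """Exact P(l) = exp(-U(l)) / Z over all configurations of the sites;
    Z is the partition function (brute force, tiny graphs only)."""
    configs = [dict(zip(sites, ls))
               for ls in product(label_set, repeat=len(sites))]
    weights = [math.exp(-gibbs_energy(l, N, beta)) for l in configs]
    Z = sum(weights)
    return [(l, w / Z) for l, w in zip(configs, weights)]
```

On a two-site graph with binary labels, the two agreeing configurations receive equal and larger probability than the two disagreeing ones, which is exactly the "favored pattern class" behavior described above.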
In a more simplified form, Equation 8.2 can be written as

P(l|d) ∝ p(d|l) P(l)   (8.3)

By associating an energy function with p(d|l) and P(l), the posterior probability takes the following form:

P(l|d) ∝ exp[−U(l|d)], with U(l|d) = U(d|l) + U(l)   (8.4)

Following this formulation, the optimal labeling is accomplished via minimization of the posterior energy function U(l|d) [6]. The combinatorial problem of finding the global minimum of U(l|d) is generally solved using one of the following relaxation algorithms: (a) simulated annealing (SA) [8] or (b) iterated conditional modes (ICM) [12].
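Of the two relaxation algorithms, ICM is the simpler: it greedily updates each site to the label minimizing its local posterior energy, converging to a local minimum of U(l|d). The sketch below assumes a Gaussian likelihood with per-class means and a Potts prior with weight beta; these are illustrative choices and parameter names of our own, since the actual energy terms are model-specific:

```python
def icm(d, N, means, sigma=1.0, beta=1.0, iters=10):
    """Iterated conditional modes: greedily assign each site i the label k
    minimizing the local posterior energy
        (d_i - means[k])^2 / (2 sigma^2) + beta * #{j in N_i : l_j != k}."""
    # initialize with the maximum-likelihood label at each site
    labels = {i: min(range(len(means)), key=lambda k: (d[i] - means[k]) ** 2)
              for i in N}
    for _ in range(iters):
        changed = False
        for i in N:
            def local_energy(k):
                lik = (d[i] - means[k]) ** 2 / (2 * sigma ** 2)
                prior = beta * sum(labels[j] != k for j in N[i])
                return lik + prior
            best = min(range(len(means)), key=local_energy)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:      # converged to a local minimum of U(l|d)
            break
    return labels
```

With a strong prior (large beta), an isolated noisy observation is smoothed to agree with its neighbors; with beta=0 the result reduces to the pure maximum-likelihood labeling. Simulated annealing replaces the greedy update with stochastic sampling at a decreasing temperature and can escape such local minima, at a much higher computational cost.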
8.2.2 MRF-BASED MAMMOGRAPHIC IMAGE ANALYSIS

Several mammographic image analysis techniques based on MRF models have been proposed in the literature. These models are capable of representing explicit knowledge of the spatial dependence between different anatomical structures and can lead to very efficient image-segmentation schemes. The segmentation process is performed by defining either a MRF on the original lattice of image pixels or a cascade of MRF models on a multiresolution, pyramidal structure of the image. In both cases, the parameter estimation of the Markovian priors is carried out either empirically or using selected training data. In the early work of Karssemeijer [13], a stochastic Bayesian model was used for segmenting faint calcifications from connective-tissue structures. The method was based on local contrast and orientation observation measures and a single-resolution MRF describing both spatial tissue dependencies and the clustering characteristics of microcalcifications. Comer et al. [14] proposed a statistical algorithm for the segmentation of mammograms into homogeneous texture regions. In their approach, both the mammographic image and the underlying label field (representing a finite number of tissue classes) are modeled as discrete-parameter random fields. The labeling is performed via a maximization of the posterior marginals (MPM) process [11], where the unknown likelihood parameters are estimated using the expectation-maximization (EM) algorithm. In recent years, the need to reduce the complexity of MRF models on large image lattices gave rise to a series of hierarchical/multiresolution analysis methods. Li et al. [15] developed a technique for tumor detection based on an initial segmentation using a multiresolution MRF model and a postprocessing classification step based on fuzzy, binary decision trees.
With a pyramidal image representation and a predefined set of tissue labels, the segmentation is carried out in a top-down fashion, starting from the lowest spatial resolution and considering the label configurations as the realizations of a dedicated MRF. The segmentation at each resolution level comprises a likelihood-parameter estimation step and a MAP labeling scheme using the ICM algorithm, initialized with the result of the previous resolution. In the approach of Zheng et al. [16], a similar hierarchical segmentation scheme is applied on a multiresolution tower constructed with the use of the discrete wavelet transform. At each resolution, the low-frequency subband is modeled as a MRF that represents a discrete set of spatially dependent image-intensity levels (tissue signatures) contaminated with independent Gaussian noise. Finally, Vargas-Voracek and Floyd [17] introduced a hierarchical MRF model for mammographic structure extraction using both multiple spatial and intensity resolutions. The authors presented qualitative results for the identification of the breast skin outline, the breast parenchyma, and the mammographic image background. All of the aforementioned labeling techniques consider the image labels (tissue types) as being mutually exclusive, without taking into account the projective nature of the mammographic image modality. McGarry and Deriche [18] presented a hybrid model that describes both anatomical tissue structural information and tissue-mixture densities, derived from the mammographic imaging process. Spatial dependencies among anatomical structures are modeled as a MRF, whereas image observations, which represent the mixture of several tissue components, are expressed in terms of their linear
A MRF-based approach for the measurement of skin thickness
attenuation coefficients. These two sources of information are combined into a Bayesian framework to segment the image and extract the regions of interest. The MRF-based method presented in this chapter falls within the scope of image segmentation/interpretation for the identification of an anatomical structure situated at the breast periphery (skin region). It uses (a) an observation field that encompasses the projective, physical properties of the mammographic image modality and (b) a MRF model, defined on the full-resolution image lattice, that describes the geometric characteristics of the skin in relation to its neighboring anatomical structures. The following sections present in detail the different modules of the proposed approach.
8.3 DATA AND SCENE MODEL
8.3.1 IMAGE ACQUISITION
In general, the effect of overexposure at the region of the film corresponding to the breast periphery results in a poor visualization of the skin region, hampering its identification. Contrast enhancement at the breast periphery can be accomplished with a series of exposure or density-equalization techniques. Exposure equalization can be performed using either anatomical filters [19, 20] or more sophisticated techniques that modulate the entrance exposure, based on feedback of the regional variations in X-ray attenuation [21, 22]. The existing methods for density equalization mainly employ computer-based procedures for the matching of the optical density between the periphery and the central part of the breast [23–27]. In our study, during the acquisition of each mammogram, we used the anatomical filter-based exposure-equalization (AFEE) technique of Panayiotakis et al. [20]. This technique utilizes a set of solid anatomical filters made of Polyamide 6, as this material meets the basic requirements of approximately unit density, homogeneity, and ease of manufacture. The anatomical filters have a semicircular band shape with increasing thickness toward the periphery.
The AFEE technique produces images of improved contrast characteristics at the breast periphery, ensuring minimization of the total dose to the breast through the elimination of a secondary exposure to patients with an indication of peripheral breast lesions. Its performance has been extensively evaluated using both clinical and phantom-based evaluation methods [28, 29]. The mammographic images used in this study were digitized using an Agfa DuoScan digitizer (Agfa Gevaert, Belgium) at 12-bit pixel depth and a spatial resolution of 100 µm/pixel. According to quality control measurements, this film digitizer is suitable for mammogram digitization, as the optical-density range of the cases used for validation falls into the linear range of its input/output response curve [30]. Figure 8.1 shows an example from our test set of mammograms.
Medical image analysis method
8.3.2 RADIOGRAPHIC AND GEOMETRICAL PROPERTIES OF THE SKIN
Our approach for breast skin thickness extraction involves the construction of a physical model of the skin region that describes both its radiographic and geometric properties. This model is based on the following three assumptions.
1. Anatomically, if we consider an axial section of the breast, the skin is a thin strip of soft tissue situated at its periphery. In its vicinity lies the subcutaneous fat, which radiographically appears as a structure of higher optical density than that of the skin. This anatomical information, together with the fact that mammography is a projection imaging modality, will be the basis of our model. The region of the image that the physicians indicate as skin does not correspond to the real one at any of the breast sections, and it is always greater than the skin thickness that a histological examination might give. In fact, this virtual skin, indicated by the physicians, is the superposition of thin strips of soft tissue that correspond to the real skin at several axial sections of the breast (Figure 8.2).
FIGURE 8.1 Original image.
2. The shape of the skin’s external border should coincide with the shape of the breast. Most of the time, this appears to be regular, and it can be approximated by a circle or
an ellipse. In an effort to make the shape estimation more accurate and reliable, we will not consider the breast border as a whole. Instead, we make the assumption that it can be divided into smaller segments, each of them corresponding to an arc of a circle.
3. From the configuration of Figure 8.2, we can infer that the external border of the skin in a mammographic image is mainly formed by the projection of the central section of the breast. As we move inward, starting from the breast periphery, we also notice the projections of the skin segments that belong to breast sections situated above and below the central one. In the digitized gray-level image, this results in a gradient at the periphery of the breast (where the skin is located), oriented perpendicularly to the breast border.
All the previously described assumptions are the main components of our model. Their combination leads to the following conclusion: the salient feature (skin feature) that reveals the skin layer of the breast (as viewed on the mammogram) is the angle formed by the gradient vector and the normals to the breast border. Deeper structures, underneath the skin layer, do not conform to the previously mentioned radiographic and geometrical skin model.
FIGURE 8.2 Geometrical representation of the imaging process of the skin.
8.4 ESTIMATION AND EXTRACTION METHODS
8.4.1 SKIN FEATURE ESTIMATION
8.4.1.1 External Border of the Skin
The external border of the skin separates the breast from the surrounding background; thus, it coincides with the breast border. Several computerized schemes have been developed for the automatic detection of the breast region. Most of them make use of the gray-level histogram of the image. Yin et al. [31] developed a method to identify the breast region on the basis of a global histogram analysis. Bick et al. [32] suggested a method based on the analysis of the local gray-value range to classify each pixel in the image. Davies and Dance [33] used a histogram-derived threshold in conjunction with a mode filter to exclude uniform background areas from the image. Chen et al. [34] proposed an algorithm that detects the skin-line edge on the basis of a combination of histogram analysis and a Laplacian edge detector. Mendez et al. [35] used a fully automatic technique to detect the breast border and the nipple based on the gradient of the gray-level values. Our approach initially employs a noise-suppression median-filtering step (with a filter size equal to five pixels), followed by an automated histogram-thresholding technique. We assume that the histogram of each mammogram exhibits a certain bimodality: each pixel in the image belongs either to the directly exposed region (image background) or to the potential object of interest (breast). For this purpose, we have chosen the minimum-error thresholding technique proposed by Kittler and Illingworth [36]. The principal idea behind this method is the minimization of a criterion function related to the average pixel-classification error rate, under the assumption that the object and the background gray-level values are normally distributed. Unfortunately, the presence of the anatomical filter, used for exposure equalization, disturbs the bimodality of the image histogram.
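As an illustration of the criterion this technique minimizes, the following NumPy sketch scans every candidate threshold of a gray-level histogram; the function name and the arrangement of the computation are ours, not the chapter's.

```python
import numpy as np

def kittler_illingworth(hist):
    """Minimum-error threshold of Kittler and Illingworth [36].

    Assumes the histogram is a mixture of two Gaussians and returns the
    bin index minimizing the average classification-error criterion
    J(T) = 1 + q1*ln(v1) + q2*ln(v2) - 2*(q1*ln(q1) + q2*ln(q2)).
    """
    hist = hist.astype(np.float64)
    bins = np.arange(hist.size)
    best_t, best_j = None, np.inf
    for t in range(1, hist.size):
        p1, p2 = hist[:t].sum(), hist[t:].sum()
        if p1 == 0 or p2 == 0:
            continue                                    # one class empty
        m1 = (bins[:t] * hist[:t]).sum() / p1           # class means
        m2 = (bins[t:] * hist[t:]).sum() / p2
        v1 = ((bins[:t] - m1) ** 2 * hist[:t]).sum() / p1   # class variances
        v2 = ((bins[t:] - m2) ** 2 * hist[t:]).sum() / p2
        if v1 <= 0 or v2 <= 0:
            continue                                    # degenerate class
        q1, q2 = p1 / (p1 + p2), p2 / (p1 + p2)         # prior probabilities
        j = 1 + q1 * np.log(v1) + q2 * np.log(v2) \
              - 2 * (q1 * np.log(q1) + q2 * np.log(q2))
        if j < best_j:
            best_t, best_j = t, j
    return best_t
```

On a clearly bimodal histogram the returned threshold falls in the valley between the two modes; in the scheme above it would be applied first globally and then within each overlapping window.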
A threshold selection, using the histogram of the whole image, results in an inaccurate identification of the breast border. More specifically, the gray values corresponding to the anatomical filter induce a systematic error that increases the value of the threshold compared with the optimal one. The size of the resulting binary region will always be smaller than the real size of the breast. To overcome this problem, we try to combine both local and global information. Initially, an approximation of the breast’s border is estimated by performing a global thresholding on the histogram of the whole image using the method of Kittler and Illingworth. After thresholding, the breast border is extracted by using a morphological opening operator with a square flat structuring element of size 5, followed by a 4-point connectivity tracking algorithm. We then define overlapping square windows along the previously estimated border, where we apply local thresholding using the same approach as before (Figure 8.3). All the pixels situated outside the union of the selected windows keep the label attributed to them by the initial global
FIGURE 8.3 Application of local thresholding for the extraction of the skin’s external border.
thresholding process. The size of each window is empirically set to a physical length of approximately 1.5 cm (150×150 pixels). The histogram of each window can now be considered bimodal, containing only pixels from the breast and the filter. For each window, a threshold is estimated using the method of Kittler and Illingworth [36]. Its final value is the average of the threshold found in the current window and those of its two neighbors. Because of the overlap between neighboring windows, the resulting binary image is smooth, with no abrupt changes in curvature. Finally, the rectified breast border is obtained by applying, once again, a morphological opening operator with a square flat structuring element of size 5, followed by a tracking algorithm. The final result of our approach, applied to the image of Figure 8.1, is presented in Figure 8.4.
FIGURE 8.4 External border of the skin.
8.4.1.2 Exclusion of the Region of the Nipple and Estimation of the Normals to the Breast Border
Based on the second assumption of our skin model (see Section 8.3.2), we can divide the breast border into several segments of equal length and consider each of them as
belonging to a circular arc. The parameters of these circles (namely, their radii and the coordinates of their centers) are estimated using the Levenberg-Marquardt iterative method for curve fitting [37]. A χ² merit function is defined that reflects the agreement between the data and the model. In our case, the data are the coordinates of the border points, and the model is a circle. The optimal solution corresponds to the minimum of the merit function. Unfortunately, this circular model of the breast border is disturbed by the presence of the nipple. Moreover, when physicians examine a mammogram, they usually search for possible skin changes along the breast border, except for the region behind the nipple, mainly because of the presence of other anatomical structures with densities similar to that of the skin (e.g., the breast areola). For these reasons, we first exclude the region of the nipple and then work only with the remaining part of the breast border. Nipple detection and localization is an ongoing research topic in mammographic image analysis [35, 38]. In our scheme, the exclusion of the nipple is performed in three steps [5]:
1. The breast border is divided into three equal segments.
2. We choose the central border segment (nipple included) and estimate the parameters of the circle that corresponds to it using the method of Levenberg-Marquardt [37].
3. We consider the profile of distances between the center of the circle and each point of the central border segment. The border points that correspond to the nipple are situated between the two most significant extrema of the first derivative of the profile of distances.
This technique works well in practice, except for extreme cases where the nipple is not visible in the mammogram because of retraction or other types of deformation. In these cases, manual intervention is needed.
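Assuming the border points are available as pixel coordinates, steps 2 and 3 might be sketched as follows, with SciPy's Levenberg-Marquardt solver standing in for the Numerical Recipes implementation [37]; the function names are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_circle(x, y):
    """Fit a circle (cx, cy, r) to border points by Levenberg-Marquardt,
    minimizing the chi-square of the radial residuals (step 2)."""
    cx0, cy0 = x.mean(), y.mean()                 # crude initial guess
    r0 = np.hypot(x - cx0, y - cy0).mean()

    def residuals(p):
        cx, cy, r = p
        return np.hypot(x - cx, y - cy) - r       # signed distance to circle

    return least_squares(residuals, [cx0, cy0, r0], method="lm").x

def nipple_extent(border, cx, cy):
    """Bound the nipple along the central border segment: the two most
    significant extrema of the first derivative of the centre-to-border
    distance profile (step 3).  `border` is an (n, 2) array of points."""
    dist = np.hypot(border[:, 0] - cx, border[:, 1] - cy)
    deriv = np.diff(dist)                         # first derivative of profile
    i_max, i_min = int(np.argmax(deriv)), int(np.argmin(deriv))
    return min(i_max, i_min), max(i_max, i_min)
```

Border points between the returned index pair would then be excluded before the circular arcs are refitted to the remaining segments.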
The removal of the nipple allows an efficient fitting of circular arcs to the remaining breast border and an accurate estimation of the directions normal to it. Experiments have shown that five circles are sufficient for this purpose. The directions normal to the breast border can be found by simply connecting every point of each border segment to the center of the circle that corresponds to it.
8.4.1.3 Estimation of Gradient Orientation
Most of the time, the image gradient is considered part of the general framework of edge detection. The basic gradient operators of Sobel, Prewitt, or Roberts [39] are very sensitive to noise, are not flexible, and cannot respond to a variety of edges. To cope with these problems, several multiscale approaches for edge detection have been proposed in the literature, such as the Gaussian scale-space approach of Canny [40] or methods based on the wavelet transform [41, 42]. In our study, the estimation of the multiscale gradient is performed using the wavelet approach presented by Mallat and Zhong [42], which is equivalent to the multiscale operator of Canny. However, due to the pyramidal algorithm involved in the calculation of the wavelet transform, its computational complexity is significantly lower than that of Canny’s approach. In wavelet-based gradient estimation, the length of the filters involved in the filtering operation is
constant, while the number of coefficients of the Canny filters increases as the scale increases. The method of Mallat and Zhong [42] is based on wavelet filters that correspond to the horizontal and vertical components of the gradient vector. Let f(x, y) be a two-dimensional (2-D) function representing the image, and θ(x, y) a smoothing function that becomes zero at infinity and whose integral over x and y is equal to 1. If we define two wavelet functions ψ¹(x, y) and ψ²(x, y) such that

\psi^1(x, y) = \frac{\partial \theta(x, y)}{\partial x}, \qquad \psi^2(x, y) = \frac{\partial \theta(x, y)}{\partial y} \qquad (8.5)

then the wavelet transform of f(x, y) at a scale s has two components defined by

W_s^1 f(x, y) = (f * \psi_s^1)(x, y), \qquad W_s^2 f(x, y) = (f * \psi_s^2)(x, y) \qquad (8.6)

By \psi_s^i(x, y) we denote the dilation of ψⁱ(x, y) by the scale factor s, so that:

\psi_s^i(x, y) = \frac{1}{s^2} \, \psi^i\left(\frac{x}{s}, \frac{y}{s}\right)

Following these notations, the orientation of the gradient vector is given in Equation 8.7:

A_s f(x, y) = \arctan\left(\frac{W_s^2 f(x, y)}{W_s^1 f(x, y)}\right) \qquad (8.7)

In the case of a discrete 2-D signal, the previously described wavelet model does not keep a continuous scale parameter s. Instead, it takes the form of a discrete dyadic wavelet transform, which constrains the scale to vary only along the dyadic sequence (2^j), j ∈ Z. When we pass from the finest scale (j = 1) to coarser ones (j > 1), the signal-to-noise ratio in the image increases. This results in the elimination of random and spurious responses related to the presence of noise. On the other hand, as the scale increases, the gradient computation becomes less sensitive to small variations of the gray-level values, resulting in low precision of edge localization and blurring of the image boundaries. The selection of the optimal scale depends on the spatial resolution of the digitized mammograms. For our images (spatial resolution of 100 µm/pixel), we found that the third decomposition scale (j = 3) gives a good approximation of the image gradient, as far as our region of interest (the breast periphery) is concerned. An empirical study showed that the second and the fourth scales of the wavelet decomposition are optimal for mammograms digitized at 200 µm/pixel and 50 µm/pixel, respectively. In our application, the wavelet decomposition and the estimation of the gradient orientation (Equation 8.7) were performed using the Wave2 source code [43] developed by Mallat and Zhong [42]. Knowing the gradient orientation and the normals to the breast border, we can produce a transformed image that represents the values of our skin feature and highlights the region of the skin. At each point of the original image, the skin feature (as defined in Section 8.3.2) can be derived by estimating the angular difference between the gradient vector and the normal to the breast border. Figure 8.5 shows the transformed image that represents the estimated angular difference for the example of Figure 8.1,
where black represents a difference of zero degrees and white a difference of 180°. The dark stripe along the breast periphery corresponds to the region of the skin. Note that the middle part of the image, where the nipple is situated, has been removed.
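A minimal sketch of how such a skin-feature image could be computed is given below; Gaussian-derivative gradients stand in for the dyadic wavelet gradient of Mallat and Zhong [42], and the function name and the sigma parameter are assumptions, not from the chapter.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def skin_feature(image, points, normals, sigma=4.0):
    """Angular difference (radians, in [0, pi]) between the image gradient
    and the border normal at each point.  `points` are (row, col) pixel
    coordinates; `normals` are (dy, dx) unit-direction vectors."""
    smoothed = gaussian_filter(image.astype(float), sigma)
    gy, gx = np.gradient(smoothed)           # row (y) and column (x) derivatives
    feats = []
    for (r, c), (ny, nx) in zip(points, normals):
        g = np.array([gy[r, c], gx[r, c]])
        n = np.array([ny, nx])
        cosang = g @ n / (np.linalg.norm(g) * np.linalg.norm(n) + 1e-12)
        feats.append(np.arccos(np.clip(cosang, -1.0, 1.0)))
    return np.array(feats)
```

In the skin region the gradient points roughly along the normal, so the feature is close to 0 (black in Figure 8.5); deeper structures yield larger angles.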
FIGURE 8.5 Spatial distribution of the skin feature throughout the whole image.
8.4.2 SKIN-REGION EXTRACTION—MRF FRAMEWORK
The knowledge of the spatial distribution of the skin feature (Figure 8.5) is the starting point for the identification of the skin. This is carried out with a labeling process based on a Markovian skin model. The following two subsections present the basic principles of our labeling process.
8.4.2.1 Selection of a Region of Interest
To reduce the computational burden of the labeling algorithm, we extract a region of interest (ROI), situated at the breast periphery, containing the skin and a part of the inner structures of the breast. The ROI is a strip with length equal to the length of the breast
border. Its width is approximately 3 cm and corresponds to the maximum of the clinically observed thicknesses for the region that contains the skin and the subcutaneous fat. Figure 8.6(a) shows an example of our region of interest, situated at the lower part of Figure 8.5. After the extraction of the ROI, we perform a transformation of the coordinates of its pixels to facilitate the skin-identification process. Let Ny be the number of pixels that corresponds to the width of our ROI, and Nx the number of pixels of the breast border. The result of the spatial transformation is an Nx×Ny array with the following properties:
FIGURE 8.6 (a) ROI corresponding to the lower part of Figure 8.5. (b) Stretched version of the selected ROI (array A).
• The first row represents the Nx pixels of the skin’s external border.
• The following rows correspond to the Ny layers of pixels, situated behind the skin, toward the breast parenchyma.
• Every column contains the Ny pixels found by scanning the ROI along a line perpendicular to the breast border.
The resulting array (denoted by A) can be considered as a stretched version of our ROI (Figure 8.6(b)).
8.4.2.2 Markovian Skin Model Labeling Scheme
We consider the image formed by the array A of Figure 8.6(b) and represent its rectangular lattice as a graph G = {S, N}, where S = {1, 2, …, m} is the discrete set of pixels and N a particular neighborhood system. At each node i we associate an observation measure di that represents the value of the skin feature at the current position, and a binary label li, where li = 1 if i belongs to the skin and li = 0 otherwise. Every configuration of the labels l = {l1, …, lm} is considered as the realization of a Markov random field denoted by L = {L1, …, Lm}. Following a MAP estimation criterion, as described in Section 8.2.1, the optimal labeling of G is found by minimizing the posterior energy function U(l|d) (see Equation 8.4). In our application, the conditional energy term U(d|l) associates a Gaussian distribution with the observations of the skin and no-skin classes. The prior energy U(l) is expressed in terms of clique potential functions that describe contextual dependencies between the labels. The selection of the neighborhood system and the potential functions is driven by our a priori knowledge about the geometrical characteristics of the skin region. The following three subsections describe the explicit form of U(l|d) and the optimization procedure for its minimization.
8.4.2.2.1 Conditional Probability Distribution
We assume that each observation di is only conditioned by the corresponding label li, and that the dependencies between the different observations are exclusively determined by the dependencies between the labels li.
In this case, the conditional probability distribution p(d|l) can be defined as

p(d|l) = \prod_{i \in S} p(d_i|l_i) \qquad (8.8)

This type of probability density function can be deduced from the observation field d and reflects the likelihood of every pixel as either belonging or not belonging to the skin. We assume that the observation values d of both skin and no-skin regions are normally distributed. This implies that

p(d_i|l_i) = \frac{1}{\sqrt{2\pi}\,\sigma_{l_i}} \exp\left(-\frac{(d_i - \mu_{l_i})^2}{2\sigma_{l_i}^2}\right) \qquad (8.9)
where \mu_{l_i} and \sigma_{l_i} are the mean value and standard deviation of the class designated by li. From Equation 8.8 and Equation 8.9, we obtain the following expression for the conditional energy term U(d|l):

U(d|l) = \sum_{i \in S} \left[\frac{(d_i - \mu_{l_i})^2}{2\sigma_{l_i}^2} + \log\left(\sqrt{2\pi}\,\sigma_{l_i}\right)\right] \qquad (8.10)
The mean value and standard deviation of the skin (li = 1) and no-skin (li = 0) classes (µ1, σ1 and µ0, σ0, respectively) can be estimated using the skin-feature values at the first and last row of the array A, respectively, as both are good representatives of the two classes.
8.4.2.2.2 Prior Probability of Labelings
Our a priori knowledge about the geometrical characteristics of the skin generates the following two assumptions:
1. A pixel i belongs to the skin if:
• All pixels between i and the external border of the skin (outer layer of our ROI), situated on the same perpendicular to the border line as i, also belong to the skin.
• There are neighboring pixels, situated at the same breast layer as i, belonging to the skin.
2. A pixel i does not belong to the skin if:
• All pixels between i and the inner layer of our ROI, situated on the same perpendicular to the border line as i, do not belong to the skin.
• There are neighboring pixels, situated at the same breast layer as i, that do not belong to the skin.
To express these contextual dependencies, we define a neighborhood system where the neighbors Ni of a pixel i are all the pixels, except i, situated in the same column of the array A, together with its V closest horizontal neighbors (V/2 at each side). The parameter V can be considered as a quantization factor that depends on the resolution of the digitized mammograms and represents the minimum expected length along the skin over which no variations of its thickness are present. If we consider only pairwise cliques of the form c = {i, j}, the prior probability of labelings P(l) can be expressed in terms of a prior energy function U(l) and a set of clique potentials Vc(li, lj):

P(l) = \frac{1}{Z} \exp(-U(l)), \qquad U(l) = \sum_{c \in C} V_c(l_i, l_j) \qquad (8.11)

where Vc(li, lj) is a clique potential function associated with each clique c(i, j), and Z is a normalizing constant. For each pixel i (with coordinates (xi, yi)), the clique potential Vc(li, lj) depends on the label li and on the relative position of its neighbor j (with coordinates (xj, yj)).
In particular, the potential function has the following form:
(8.12)
where

(8.13)

These types of potential functions penalize configurations of labels that are inconsistent with assumptions 1 and 2. High values of the penalization factor w favor more uniform representations of the skin region but at the same time suppress small variations of the skin thickness. The optimal value of w should satisfy both requirements of uniformity and accuracy.
8.4.2.2.3 MAP Estimation
From the combination of Equation 8.4, Equation 8.10, and Equation 8.11, the posterior probability P(l|d) can be expressed in terms of a global energy function U(l|d), where

U(l|d) = U(d|l) + U(l) \qquad (8.14)
The MAP configuration of the label field is estimated by minimizing the energy function U(l|d). For the minimization of U(l|d), we follow a simulated annealing scheme based on a polynomial-time cooling schedule [44]. Figure 8.7 shows the evolution of the labeling process toward the minimum-energy state, using as an example the array A of Figure 8.6(b). In this particular case, the parameters V and w were set to 20 and 2, respectively. Finally, the last step of our approach consists of mapping the labeled pixels of A back to the coordinates of the original image.
8.5 RESULTS
8.5.1 MEASUREMENT OF SKIN THICKNESS
In our study, the measurements of the skin thickness are taken at regular intervals along the breast border. Starting from each border point, we consider a line segment perpendicular to the border, which extends up to the internal border of the skin. The skin thickness at the particular border point corresponds to the length of this line segment.
For the representation of the measurement results, we use the position of the nipple as a reference point. We consider a polar representation of the breast border points using the orthogonal coordinate system of Figure 8.8. The x-axis corresponds to the image border, occupied by the largest part of the breast, and the y-axis is a vertical line that passes through the middle of the nipple. The measurement position of the skin thickness at a given border point P is adequately defined by the polar coordinate θ of this particular point. Following these notations, the angle θ takes values in the interval [–90°, +90°], depending on the relative position of the measuring point P with respect to the nipple.
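The thickness measurement along the normals, together with the polar angle θ of Figure 8.8, might be sketched as follows for the labeled, stretched array A of Section 8.4.2.1; the function names and the array orientation are assumptions, not from the chapter.

```python
import numpy as np

def skin_thickness_mm(labels, pixel_mm=0.1):
    """Skin thickness at each border point.  Each column of the stretched
    array A (rows = breast layers, columns = border points) runs along a
    normal to the border, so the thickness is the number of skin-labeled
    pixels per column times the pixel size (0.1 mm at 100 um/pixel)."""
    return labels.sum(axis=0) * pixel_mm

def border_theta_deg(px, py, nipple_x):
    """Polar angle of border point P in the frame of Figure 8.8, measured
    from the vertical axis through the nipple, with y counted from the
    image border; lies in [-90, 90] for points above the image border."""
    return np.degrees(np.arctan2(px - nipple_x, py))
```

Mapping the labeled pixels back to the original image coordinates then attaches each thickness value to its angle θ along the border.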
FIGURE 8.7 Energy minimization using simulated annealing. The parameters V and w are equal to 20 and 2, respectively. (a) Temperature T=100. (b) Temperature T=50. (c) Final result after convergence at temperature T=0.01.
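The annealing scheme illustrated in Figure 8.7 can be sketched as follows; note that the 4-neighbor Potts-like prior below is a simplified stand-in for the column-based potentials of Equation 8.12, and the geometric cooling replaces the polynomial-time schedule of [44]. All names are illustrative.

```python
import numpy as np

def posterior_energy(labels, d, mu, sigma, w):
    """U(l|d) = U(d|l) + U(l) of Equation 8.14, with the Gaussian
    conditional term of Equation 8.10 and a pairwise label-disagreement
    prior standing in for the potentials of Equation 8.12."""
    m, s = mu[labels], sigma[labels]
    u_cond = ((d - m) ** 2 / (2 * s ** 2) + np.log(np.sqrt(2 * np.pi) * s)).sum()
    u_prior = w * ((labels[:-1, :] != labels[1:, :]).sum()
                   + (labels[:, :-1] != labels[:, 1:]).sum())
    return u_cond + u_prior

def anneal(d, mu, sigma, w=2.0, t0=100.0, t_end=0.01, alpha=0.95, rng=None):
    """Metropolis simulated annealing toward the MAP labeling of the
    binary field, with geometric cooling from t0 down to t_end."""
    rng = rng if rng is not None else np.random.default_rng(0)
    labels = (d > d.mean()).astype(int)           # crude initialization
    t = t0
    while t > t_end:
        for _ in range(labels.size):              # one sweep of proposals
            i = rng.integers(labels.shape[0])
            j = rng.integers(labels.shape[1])
            old = posterior_energy(labels, d, mu, sigma, w)
            labels[i, j] ^= 1                     # propose flipping one label
            new = posterior_energy(labels, d, mu, sigma, w)
            if new > old and rng.random() >= np.exp((old - new) / t):
                labels[i, j] ^= 1                 # reject: flip back
        t *= alpha
    return labels
```

For clarity the sketch recomputes the full energy at every proposal; a practical implementation would evaluate only the local energy difference of the flipped pixel, which is what makes the 15-min runtime reported below feasible on full-size arrays.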
FIGURE 8.8 Polar representation of the breast border points.
8.5.2 CLINICAL EVALUATION
Our approach was tested on ten different cases of mammographic images with craniocaudal (CC) views of the breasts, two of them exhibiting advanced skin thickening at the breast periphery. The normal range of breast skin thickness in CC views, as reported in the survey of Pope et al. [4], is between 0.5 and 2.4 mm, with a standard deviation of approximately ±0.3 mm. Figure 8.1, Figure 8.10(a), and Figure 8.11(a) present three examples of normal cases, with no severe skin changes along the breast periphery. Figure 8.12(a) corresponds to a pathological case, with advanced skin thickening, which is clearly visible at the upper part of the mammogram. The skin-detection results for these four examples are presented in Figure 8.9(a, b), Figure 8.10(b, c), Figure 8.11(b, c), and Figure 8.12(b, c), respectively. The results were obtained using the same values for the parameters V and w. Given the resolution of our images, V has been set to 20 pixels. The penalization factor w in Equation 8.13 has been empirically set to 2. On the other hand, the parameters \mu_{l_i} and \sigma_{l_i} in Equation 8.10 are estimated on each image separately, as explained in Section 8.4.2.2.
FIGURE 8.9 (a) The detected skin region that corresponds to the
mammogram of Figure 8.1. (b) Skin thickness along the breast border.
FIGURE 8.10 (a) Original image. (b) Detected skin region. (c) Skin thickness along the breast border.
FIGURE 8.11 (a) Original image. (b) Detected skin region. (c) Skin thickness along the breast border.
The validation of our method is performed by comparing the detected skin-thickness values with those obtained by manual measurement on each film at several predefined points along the breast periphery. This process resulted in an average root mean square (RMS) error of 0.3 mm for normal cases, reaching a maximum value of 0.5 mm in pathological cases with skin thickening. The maximum RMS error was observed in the case of Figure 8.12(a), in which the exact borders of the skin are not clearly defined because of its advanced deformation. Compared with the normal range of breast skin thickness, the estimated errors are relatively small and do not influence the clinical assessments.
FIGURE 8.12 (a) Original image. (b) Detected skin region. (c) Skin thickness along the breast border.
The computational time of our approach is rather demanding, mainly because of the optimization step (simulated annealing). Nevertheless, the optimization scheme is stable and converges to a good approximation of the global minimum solution, independently of the initial realization of labelings. For a 2300×1400 image on a Pentium III at 500 MHz, the estimation of the spatial distribution of the skin feature lasts around 1 min, whereas the labeling process takes approximately 15 min.
8.6 CONCLUSIONS
We present a model-based method for the measurement of skin thickness in mammography and, at the same time, tackle secondary issues emerging from the solution of this problem, such as the identification of the breast border and the exclusion of the region of the nipple. The skin model uses physical and geometrical a priori knowledge about the skin to reveal the feature that discriminates it from the other anatomical structures of the breast. The MRF framework embeds this a priori knowledge in a labeling scheme, which identifies the skin structure. Experimental results illustrate the efficiency of our method, which produced results comparable with manual measurements performed on each film. The estimation of the proposed salient skin feature requires a good visualization of the breast periphery. The employed anatomical filter for exposure equalization at the breast periphery currently limits the application of the technique to craniocaudal (CC) views. A potential alternative could be a digital density-equalization technique [25–27] that allows the use of both CC and mediolateral (ML) views. Finally, future work will involve the extension of our method toward a hierarchical/multiresolution Markovian approach. The multiresolution pyramid can be created via the dyadically subsampled counterpart of the wavelet transform of Section 8.4.1.3.
Based on such a hierarchy, the skin feature can be estimated at each resolution level separately, without the empirical choice of any particular decomposition scale, and the labeling process can be performed using a computationally efficient top-down hierarchical scheme, as presented by Li et al. [15].
REFERENCES
1. Putman, C.E. and Ravin, C.E., Textbook of Diagnostic Imaging, W.B. Saunders Co., 1994.
2. Tabar, L. and Dean, P.B., Anatomy of the breast, in Teaching Atlas of Mammography, 2nd ed., Frommhold, W. and Thurn, P., Eds., Thieme, New York, 1985.
3. Willson, S.A., Adam, A.J., and Tucker, A.K., Patterns of breast skin thickness in normal mammograms, Clin. Radiol., 33, 691, 1982.
4. Pope, T.L. et al., Breast skin thickness: normal range and causes of thickening shown on film-screen mammography, J. Can. Assoc. Radiologists, 85, 365, 1984.
5. Katartzis, A. et al., A model-based technique for the measurement of skin thickness in mammography, Med. Biol. Eng. Comput., 40, 153, 2002.
6. Li, S.Z., Markov Random Field Modeling in Computer Vision, Computer Science Workbench, Springer-Verlag, Heidelberg, 1995.
7. Derin, H. and Elliott, H., Modeling and segmentation of noisy textured images using Gibbs random fields, IEEE Trans. Pattern Anal. Mach. Intell., 9, 39, 1987.
8. Geman, S. and Geman, D., Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Anal. Mach. Intell., 6, 721, 1984.
9. Pizurica, A. et al., A joint inter- and intrascale statistical model for Bayesian wavelet-based image denoising, IEEE Trans. Image Processing, 11, 545, 2002.
10. Katartzis, A. et al., A model-based approach to the automatic extraction of linear features from airborne images, IEEE Trans. Geoscience and Remote Sensing, 39, 2073, 2001.
11. Marroquin, J., Mitter, S., and Poggio, T., Probabilistic solution of ill-posed problems in computational vision, J. Am. Stat. Assoc., 82, 76, 1987.
12. Besag, J., On the statistical analysis of dirty pictures, J. R. Stat. Soc. B, 48, 259, 1986.
13. Karssemeijer, N., Stochastic model for automated detection of calcifications in digital mammograms, Image Vision Computing, 10, 370, 1992.
14. Comer, M.L., Liu, S., and Delp, E.J., Statistical segmentation of mammograms, Digital Mammography, 72, 475, 1996.
15. Li, H.D. et al., Markov random field for tumor detection in digital mammography, IEEE Trans. Medical Imaging, 14, 565, 1995.
16. Zheng, L. et al., Detection of cancerous masses for screening mammography using DWT-based multiresolution Markov random field, J. Digital Imaging, 12 (Suppl. 1), 18, 1999.
17. Vargas-Voracek, R. and Floyd, C.E., Hierarchical Markov random field modeling for mammographic structure segmentation using multiple spatial and intensity image resolutions, SPIE Conf. Image Proc., 3661, 161, 1999.
18. McGarry, G. and Deriche, M., Mammographic image segmentation using a tissue mixture model and Markov random fields, IEEE Int. Conf. Image Proc., 3, 416, 2000.
19. Lam, K.L. and Chan, H.P., Effects of X-ray beam equalization on mammographic imaging, Medical Phys., 17, 242, 1990.
20. Panayiotakis, G.
et al., An anatomical filter for exposure equalization in mammography, Eur. J. Radiol., 15, 15, 1992. 21. Oestmann, J.W. et al., Scanning equalization mammography preliminary evaluation, RadioGraphics, 14, 123, 1994. 22. Sabol, J.M. and Plewes, D.B., Analytical description of the high- and low-contrast behavior of a scan rotate geometry for equalization mammography, Medical Phys., 23, 887, 1996. 23. Bick, U. et al., Density correction of peripheral breast tissue on digital mammograms, RadioGraphics, 16, 1403, 1996. 24. Byng, J.W., Critten, J.P., and Yaffe, M.J., Thickness-equalization processing for mammographic images, Radiology, 203, 564, 1997. 25. Highnam, R.P., Brandy, M., and Stepstone, B.J., Mammographic image analysis, Eur. J. Radiol., 24, 20, 1997. 26. Stefanoyiannis, A.P. et al., A digital equalization technique improving visualization of dense mammary gland and breast periphery in mammography, Eur. J. Radiol., 45, 139, 2003. 27. Veldkamp, W.J.H. and Karssemeijer, N., Normalization of local contrast in mammograms, IEEE Trans. Medical Imaging, 19, 731, 2000. 28. Panayiotakis, G. et al., Evaluation of an anatomical filter-based exposure equalization technique in mammography, Br. J. Radiol., 71, 1049, 1998. 29. Skiadopoulos, S. et al., A phantom-based evaluation of an exposure equalization technique in mammography, Br. J. Radiol., 72, 997, 1999. 30. Kocsis, O. et al., A tool for designing digital test objects for module performance evaluation in medical digital imaging, Medical Informatics, 24, 291, 1999. 31. Yin, F.F. et al., Computerized detection of masses in digital mammograms: analysis of bilateral subtraction images, Medical Phys., 18, 955, 1991. 32. Bick, U. et al., Automated segmentation of digitized mammograms, Academic Radiol., 2, 1, 1995.
Medical image analysis method
370
33. Davies, D.H. and Dance, D.R., The automatic computer detection of subtle calcification in radiographically dense breasts, Phys. Med. Radiol., 37, 1385, 1992. 34. Chen, J., Flynn, M.J., and Rebner, M., Regional contrast enhancement and data compression for digital mammographic images, Proc. Soc. Photo-Op. Instrum. Eng., 1905, 752, 1993. 35. Mendez, A. et al., Automatic detection of breast border and nipple in digital mammograms, Comput. Methods Programs Biomed., 49, 253, 1996. 36. Kittler, J. and Illingworth, J., Minimum error thresholding, Pattern Recognition, 19, 41, 1986. 37. Press, W.H. et al., Numerical Recipes in C: the Art of Scientific Computing, 2nd ed., Cambridge University Press, Cambridge, U.K., 1992. 38. Chandrasekhar, R. and Attikiouzel, Y, A simple method for automatically locating the nipple on mammograms, IEEE Trans. Medical Imaging, 16, 483, 1997. 39. Gonzalez, R.C. and Woods, R.E., Digital Image Processing, Addison-Wesley, Reading, MA, 1992. 40. Canny, J., A computational approach to edge detection, IEEE Trans. Pattern Anal. Mach. Intell., 8, 679, 1986. 41. Costaridou, L. et al., Quantifying image quality at breast periphery vs. mammary gland in mammography using wavelet analysis, Br. J. Radiol., 74, 913, 2001. 42. Mallat, S. and Zhong, S., Characterization of signals from multiscale edges, IEEE Trans. Pattern Anal. Mach. Intell, 14, 710, 1992. 43. Wave2 software; available on-line at http://www.ftp://cs.nyu.edu/pub/software, last accessed 6/10/2003. 44. Aarts, E.H.L. and Korst, J.H.M., Simulated Annealing and Boltzmann Machines, John Wiley & Sons, New York, 1989.
9 Landmark-Based Registration of Medical-Image Data

J. Ruiz-Alzola, E. Suarez-Santana, C. Alberola-Lopez, and Carl-Fredrik Westin

9.1 INTRODUCTION

Image registration consists of finding the geometric (coordinate) transformation that relates two different images, source and target. Hence, when the transformation is applied to the source image, an image with the same geometry as the target one is obtained. Should both images be obtained with the same acquisition modality and illumination conditions, the transformed source image would ideally become identical to the target one. Image registration is a crucial element of computerized medical-image analysis that is also present in other nonmedical applications of image processing and computer vision. In computer vision, for example, it appears as the so-called correspondence problem for stereo calibration [1] and for motion estimation [2], which is also of paramount importance in video coding [3]. In remote sensing, registration is needed to equalize image distortion [4], and in the broader area of geographic information systems (GIS), registration is needed to accommodate different maps in a common reference system [5].

In this chapter we propose a geostatistical framework for the registration of medical images. Our motivation is to provide the highest possible accuracy to computer-aided clinical systems in order to estimate the geometric (coordinate) transformation between two multidimensional, possibly multimodal, datasets. Hence, in addition to being accurate, the approach must be fast if it is to operate in clinically acceptable times. Even though the framework presented here could be applied to several fields, such as the ones mentioned above, this chapter focuses on the application of image registration to the medical field.
Registration of medical (both two- and three-dimensional) images, from the same or different imaging modalities, is needed by computer-aided clinical systems for diagnosis, preoperative planning, intraoperative procedures, and postoperative follow-up. Registration is also needed to perform comparisons across a population, for deterministic and statistical atlas construction, and to embed anatomic knowledge in segmentation algorithms. A good review of the current state of the art in medical-image registration can be found in the literature [6]. Our framework is based on the reconstruction of a dense arbitrary displacement field by interpolating the displacements measured at control points [7]. To this end, the statistical second-order characterization of the displacement field is estimated from the result of a general-purpose intensity-based registration algorithm, and it is used to make the best linear unbiased estimate of the displacement at every point using a fast
implementation of universal Kriging, an optimal estimation scheme customarily used in geostatistics.

Several schemes have been proposed in the past to interpolate sparse displacement fields for medical-image registration. Most of them fit into one of two categories: PDE-based and spline-based. PDE-based approaches [8, 9] rely on a mechanical dynamic model stated as a set of partial differential equations, where the sparse displacements are associated with actuating forces. The mechanical model provides an ad hoc regularization of the interpolation problem that produces a physically feasible result. However, the assumption that the physical difference between the source and the target image can actually be represented by some specific model is by no means evident. Moreover, mechanical properties must also be assigned to the anatomic structures in order to obtain a proper model. Spline-based approaches usually make an independent interpolation for each of the components of the vector field. Interpolating or smoothing thin-plate splines [10-12] are used, depending on whether the sparse displacements are considered to be noiseless or not. The former need the order of the spline to be specified in advance, while the latter also need the regularization parameter to be specified. Adaptiveness can be obtained by spatially changing the spline order and the regularization term. The bending term in the spline energy functional could, in principle, also be modified to account for nonisotropic behavior, and even a set of covariables could be added to the coupling term of the functional. None of these improvements is usually implemented, possibly because of the difficulty of obtaining an objective design from data.

Our framework departs from the previous two approaches by adopting a geostatistical approach. Related work in the field of statistical shape analysis has been previously reported by Dryden and Mardia [13].
The underlying idea is to use an experimental approach that makes the fewest a priori assumptions by statistically analyzing the available data, i.e., the displacement field obtained from approximate intensity-based image registration. Our method consists of locally applying the so-called universal Kriging estimator [14] to obtain the best linear unbiased estimator (BLUE) of the displacement at every point from the displacements initially obtained at the control points. Central to this approach is the estimation of the second-order characterization of the displacement field, now modeled as a vector random process. The estimated variogram [14] (a statistic related to the spatial covariance function, or covariogram) plays the role of the spline kernel, though now it is obtained directly from data and not from an a priori dynamic model. Remarkably, thin-plate splines can be considered a special case of universal Kriging estimation [15].

9.2 DEFORMATION MAPS

Consider two multidimensional images I1(x) (source) and I2(x′) (target). Registration consists of finding the mapping

x′ = Y(x) (9.1)
that geometrically transforms the source image onto the target image. The components of the mapping can be made explicit as

x′i = Yi(x1, …, xn), i = 1, …, n (9.2)
The vector field Y(x) is commonly termed deformation or warp. Sometimes the displacement field is considered instead, i.e.,

D(x) = Y(x) − x (9.3)

A deformation mapping must satisfy two basic properties:

1. Bijective: a one-to-one and onto mapping, which means that the inverse mapping exists
2. Differentiable: continuous and smooth, ideally a diffeomorphism, so that the inverse mapping is also differentiable, thus ensuring that no foldings are present

In addition, the construction method of the mapping must be equivariant with respect to some global transformations. For example, to be equivariant with respect to affine transformations, if both (source and target) images are affinely transformed, the mapping should be consistently affinely transformed too. Any deformation must also accommodate both global and local differences, i.e., the mapping can be decomposed into a global and a local component. Global differences are large-scale trends, such as an overall polynomial, affine, or rigid transformation. Local differences are on a smaller scale, highlighting changes in a local neighborhood, and are less smooth. Local differences are the remainder of the deformation once the global difference has been compensated. The definition of the global and local components depends on whether they are composed or added to form the total map

Y(x) = YG(x) + YL(x) = TL[TG(x)] (9.4)

where YG, YL and TG, TL refer to the global and local components of the mapping, in the addition and in the composition forms, respectively*. Most commonly, the global deformation consists of a polynomial map (of moderate order to avoid oscillations). Translations, rotations (i.e., Euclidean maps), and affine maps are the most usual global maps. The global polynomial map can be expressed as

YG(x) = c0 + C1x + xtC2x + … = Λ(x)a (9.5)

where a contains all the unknown coefficients in c0, C1, C2, etc. Registration algorithms must estimate the deformation from the corresponding source and target images.
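As a concrete illustration of Equation 9.5, the coefficients a of a first-order (affine) global map can be estimated by least squares from point correspondences. The correspondences and coefficient values below are made up purely for the example:

```python
import numpy as np

def fit_affine_2d(src_pts, tgt_pts):
    """Least-squares fit of a first-order global map YG(x) = c0 + C1 x
    (Equation 9.5 without the quadratic term) from point correspondences.
    Rows of Lambda are [1, x, y]; the coefficient matrix stacks c0 and C1^t."""
    n = len(src_pts)
    Lam = np.hstack([np.ones((n, 1)), src_pts])      # Lambda(x) design matrix
    a, *_ = np.linalg.lstsq(Lam, tgt_pts, rcond=None)
    return a                                         # shape (3, 2)

# Hypothetical exact affine correspondences.
src = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
C1 = np.array([[1.1, 0.2], [0.0, 0.9]])
c0 = np.array([2.0, -1.0])
tgt = src @ C1.T + c0

a = fit_affine_2d(src, tgt)
# The fitted map reproduces the correspondences exactly here.
assert np.allclose(np.hstack([np.ones((4, 1)), src]) @ a, tgt)
```

With noisy landmark positions the same least-squares fit returns the best-fitting affine trend rather than an exact interpolant, which is exactly the detrending role the global map plays later in the chapter.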
This process can be done in one step by obtaining directly both the global and the local deformation, usually decomposed as an addition. Alternatively, many
registration algorithms use a two-step approach in which the global map is obtained first and the local map is then obtained from the globally transformed source image and the target one, leading to the composition formulation.

* Both forms are equivalent, and it is possible to switch easily between them: identifying TG with YG, we have YL(x) = TL[TG(x)] − TG(x) and TL(x′) = x′ + YL[TG−1(x′)].
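The displacement field of Equation 9.3 and the two decompositions of Equation 9.4 can be illustrated with a short NumPy sketch; the affine coefficients and the analytic local component are hypothetical, chosen only so the identities can be checked numerically:

```python
import numpy as np

H, W = 4, 5
ys, xs = np.mgrid[0:H, 0:W]
X = np.stack([xs, ys], axis=-1).astype(float)      # grid coordinates x

# Hypothetical global affine component YG(x) = c0 + C1 x and a small
# analytic local component YL(x).
C1 = np.array([[1.05, 0.02], [0.00, 0.97]])
c0 = np.array([0.5, -0.3])
YG = X @ C1.T + c0
YL = lambda p: 0.1 * np.sin(p / 3.0)

Y = YG + YL(X)                 # addition form of Equation 9.4
D = Y - X                      # displacement field, Equation 9.3
assert np.allclose(X + D, Y)

# Composition form: taking TG = YG, the local map is TL(y) = y + YL(TG^-1(y)),
# so TL[TG(x)] reproduces Y(x), i.e., both forms are equivalent.
TG_inv = (YG - c0) @ np.linalg.inv(C1).T           # TG^-1 evaluated at TG(x)
TL_of_TG = YG + YL(TG_inv)
assert np.allclose(TL_of_TG, Y)
```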
9.3 LANDMARK-BASED IMAGE ANALYSIS

Landmarks are singular points of correspondence in objects with a highly descriptive power. They are commonly used in morphometrics [13] to describe shape and to analyze intra- and interpopulation statistical differences. In particular, local differences of shape between two objects are commonly studied by reconstructing a deformation that maps one object onto the other from their homologous landmark correspondences. The most popular approach to reconstructing the deformation is based on independent interpolating thin-plate splines for each coordinate [10]. Approximating thin-plate splines can also be used when a trade-off between the actual confidence in landmark positions and the smoothness of the deformation is desired. The trade-off is controlled with a smoothing parameter that can be either estimated by cross-validation (usually a nonstraightforward optimization) or set by an ad hoc guess [13]. This approach has also been applied to image registration [10, 12]. In that setting, two two-dimensional (2-D) or three-dimensional (3-D) images contain the corresponding objects, and the deformation is the geometric mapping between both images. Hence, registration consists of finding this mapping from both images, with the landmarks extracted from the images themselves. Landmarks are referred to in the literature in different ways, e.g., control points, fiducials, markers, vertices, sampling points, etc. Different applications and communities, as ever, usually have different jargons. This is also true for the different classifications of landmark types.
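The per-coordinate thin-plate spline interpolation described above can be sketched with SciPy's radial-basis interpolator, which implements both the interpolating (smoothing=0) and the approximating (smoothing>0) variants; the landmark positions below are made up for the example:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Hypothetical homologous landmarks in source and target (2-D).
src = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.5, 0.5]])
tgt = src + np.array([[0.05, 0.0], [0.0, 0.1], [-0.05, 0.0],
                      [0.0, -0.1], [0.1, 0.1]])

# One thin-plate spline per coordinate, fitted jointly. smoothing=0 gives
# the interpolating variant; smoothing>0 the approximating (noise-tolerant)
# variant discussed in the text.
tps = RBFInterpolator(src, tgt, kernel='thin_plate_spline', smoothing=0.0)

# Evaluate the reconstructed deformation on a dense set of points.
grid = np.stack(np.meshgrid(np.linspace(0, 1, 11),
                            np.linspace(0, 1, 11)), axis=-1).reshape(-1, 2)
warped = tps(grid)

# With smoothing=0 the landmarks are interpolated exactly.
assert np.allclose(tps(src), tgt, atol=1e-6)
```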
For example:

A usual classification
- Anatomical landmark: a point assigned by an expert that corresponds between organisms in some biologically meaningful way
- Mathematical landmark: a point located on an object according to some mathematical property (e.g., a curvature maximum)
- Pseudo-landmark: a point located between anatomical or mathematical landmarks to complete a description. Pseudo-landmarks can also lie along outlines; continuous curves and surfaces can be approximated by a large number of pseudo-landmarks.

Another usual classification
- Type I landmark: a point whose location is supported by the strongest evidence, such as the joins of tissue/bone or a small patch of some unusual histology
- Type II landmark: a point whose location is defined by a local geometric property
- Type III landmark: a landmark having at least one deficient coordinate, for instance, either end of a longest diameter, or the bottom of a concavity. Type III landmarks characterize more than one region.

A useful classification for image registration
- Normal landmark: a point with a unique position or with an approximately isotropic uncertainty around a mean position
- Quasi- (or semi-) landmark: a point with one or more degrees of freedom, i.e., it can slide along some direction or, with a highly anisotropic location uncertainty, around a mean position

Yet another classification
- Unlabeled landmark: a point for which no natural labeling is available
- Labeled landmark: a point for which a natural and unique identification exists
9.4 LANDMARK DETECTION AND LOCATION

Before any deformation map can be reconstructed, landmarks must be detected and located. Neither task is easy, even for human experts. On the one hand, no general detection paradigm (i.e., answering the question: is there any landmark around?) can be used, because the definition of a landmark varies from application to application. On the other hand, locating landmarks accurately on images (once a landmark has been detected, it is necessary to estimate its exact position) is extremely difficult, because digital images are defined on discrete grids, and quite often the landmarks are quasi-landmarks defined on smooth boundaries (and consequently have a high uncertainty along those boundaries). For a human expert, things become even more complicated when the images are 3-D, no matter what interaction approach is implemented to click on the landmark locations. Therefore, it is important to rely on reconstruction schemes for the deformation map that are able to deal with the uncertainty in the extracted landmark positions. A first step toward this goal is the use of the approximating thin-plate splines mentioned previously. Nevertheless, this scheme only considers isotropic noise models for the landmark positions. A remarkable extension due to Rohr [16, 17] allows the incorporation of anisotropic noise models and, hence, quasi-landmarks, something important in order to deal with the registration of smooth boundaries. Anisotropic noise models correspond to nondiagonal covariance matrices, with the obvious consequence of coupling the thin-plate splines formerly acting on each coordinate independently. The locations of N landmarks, extracted by any means from both images, can be modeled as realizations of independent Gaussian random vectors with means equal to the correct landmark positions and covariance matrices Cxi and Cx′i for the source and target images, respectively. Notice that nondiagonal covariance matrices account for anisotropic uncertainty.
Another remarkable achievement of Rohr, which will be used in this chapter extensively, is the derivation of the Cramer-Rao lower bound for the estimation of a point
landmark position [12] from discrete images of arbitrary dimensionality in additive white Gaussian noise,

Σ ≥ σn2 ( Σx∈M(m) ∇g(x) ∇g(x)t )−1 (9.6)

where σn2 denotes the variance of the noise, g the noise-free image, and M(m) is a neighborhood around the landmark with m elements. We will also use this result to model the covariance of the manually extracted landmarks directly from the image data.

9.5 OUR APPROACH TO LANDMARK-BASED REGISTRATION

We will consider that the deformation that puts the source and target images into correspondence is a realization of a vector random field. The global component of the deformation corresponds to the trend (mean) of the random field, whereas the local component of the deformation is modeled by an intrinsically stationary random field. The field is sampled by means of landmark correspondences, i.e., to each landmark in the source image corresponds a landmark in the target one, and these correspondences are then used to reconstruct the whole realization of the random deformation field. The geostatistical method tries to honor the actual observations by estimating the model's spatial variability directly from the available data. This essentially consists of estimating the variogram of the field, which is a difficult problem, especially if it is to be done from landmark displacements, because there are usually just a few. This has possibly prevented Kriging from being used in landmark-based registration. Here we propose a practical way to circumvent these difficulties by splitting the approach into three steps:

1. Image-based global registration: Estimating the variogram of the displacement field requires detrending the data. To avoid introducing any subsequent bias into the variogram estimation, we propose to perform an intensity-based global (i.e., rigid or affine) registration first to remove the trend effect, with a variety of algorithms being available. For example, rigid registration by maximization of mutual information is a well-known algorithm [18] that can be used when image intensities in both images differ.

2.
Model estimation: Estimating the variogram structure of the detrended displacement field is still a difficult task. The number of landmarks available in most practical applications is almost never enough to make a good variogram estimate, and trying to extract a significant number from the images would render the method impractical. We propose to use a fast, general-purpose, nonrigid registration algorithm to obtain an approximate dense displacement field. Again, a number of algorithms are available, although we are using, with excellent results, a regularized block-matching scheme with a mutual-information (among other) similarity measure that was developed by our team [19]. The variogram is then readily estimated from this field.
3. Landmark-based local registration: Landmarks are extracted from the registered image pair and used to reconstruct a realization of a zero-mean random deformation field using ordinary Kriging, with the variogram structure just estimated.
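Steps 2 and 3 above can be sketched for one component of the displacement field. The axis-aligned, isotropic variogram estimator and the linear variogram model below are deliberate simplifications for illustration, not the chapter's exact implementation:

```python
import numpy as np

def empirical_variogram(field, max_lag):
    """Isotropic empirical variogram of a scalar 2-D field, using
    axis-aligned lags only: gamma(h) = 0.5 * E[(Z(x+h) - Z(x))^2]."""
    gammas = []
    for h in range(1, max_lag + 1):
        dx = field[:, h:] - field[:, :-h]          # horizontal lag h
        dy = field[h:, :] - field[:-h, :]          # vertical lag h
        diffs = np.concatenate([dx.ravel(), dy.ravel()])
        gammas.append(0.5 * np.mean(diffs ** 2))
    return np.array(gammas)

def ordinary_kriging(pts, vals, query, gamma):
    """Ordinary Kriging (BLUE under an unknown constant mean) of scalar
    landmark values `vals` at positions `pts`, evaluated at `query`,
    using variogram function `gamma`."""
    n = len(pts)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(np.linalg.norm(pts[:, None] - pts[None, :], axis=-1))
    A[n, n] = 0.0                                  # unbiasedness constraint
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(pts - query, axis=-1))
    w = np.linalg.solve(A, b)                      # n weights + Lagrange mult.
    return w[:n] @ vals

# A linear variogram model standing in for one fitted to the empirical values.
gamma = lambda h: 0.5 * np.asarray(h, dtype=float)

# One displacement component at four hypothetical landmarks.
pts = np.array([[0., 0.], [2., 0.], [0., 2.], [2., 2.]])
vals = np.array([0.0, 1.0, 1.0, 2.0])
est = ordinary_kriging(pts, vals, np.array([1., 1.]), gamma)
```

In this symmetric configuration the four Kriging weights are equal, so the estimate at the center is the sample mean; an anisotropic or nested variogram model would change the weighting, which is precisely the data-driven flexibility the chapter argues for.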
9.6 DEFORMATION MODEL ESTIMATION

9.6.1 INTENSITY-BASED REGISTRATION

The model estimation, as noted previously, relies on a fast, general-purpose, intensity-based, nonrigid registration algorithm to obtain an approximate dense displacement field. This registration framework is presented in the following subsections. To motivate the design criteria of our algorithm, general properties of registration algorithms are reviewed first. To simplify the exposition, we restrict the discussion to three-dimensional medical images. Let I1 and I2 be two medical images, i.e., two scalar functions defined on two regions of space. We will use two different coordinate systems, x and x′, one for each image. The registration problem consists of finding the transformation x′=Y(x) that relates every point x in the coordinate system of I1 with a point x′ in the coordinate system of I2. The criteria of correspondence are usually set by means of high-level information, for example anatomical knowledge. However, when coding the correspondence into a registration algorithm, some properties should be satisfied.

Invertibility of the solution: A registration algorithm should provide an invertible solution. Invertibility implies the existence of an inverse transformation x=Y*(x′) that relates every point on I2 back to a point on I1, where Y*=Y−1. It is satisfied if the Jacobian of the transformation is positive.

No boundary restriction: A registration algorithm should not impose any boundary condition. Boundary restrictions, sometimes in the model, sometimes in the representation of the warping, are usually set to help either the implementation or the convergence of the search technique. However, boundaries are acquisition dependent, not data dependent, so they introduce a fictitious matching into the solution. Thus, ideal registration should provide free-form warpings.
Intensity channel insensitivity: Another desirable property of a registration algorithm is insensitivity to noise or to a bias field in the acquisitions. These variations are usually dealt with by an entropy-based similarity measure.

Possibility of large deformations: Some registration schemes are based on models, such as linear elastic models, that are not suited to large deformations. The theory of linear elasticity holds only when relative displacements are small. Hence, mechanical models should be used with care when trying to register tissue deformations.
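The positive-Jacobian condition behind the invertibility property can be checked numerically on a sampled mapping; a minimal 2-D sketch using finite differences (the identity map serves as the test case):

```python
import numpy as np

def jacobian_determinant(Y):
    """Determinant of the Jacobian of a 2-D mapping Y sampled on a grid,
    Y.shape == (H, W, 2), via central differences in the interior.
    Values > 0 everywhere indicate a locally invertible (fold-free) map."""
    dYx = np.gradient(Y[..., 0])   # (dYx/dy, dYx/dx), gradient per axis
    dYy = np.gradient(Y[..., 1])
    return dYx[1] * dYy[0] - dYx[0] * dYy[1]

# The identity map has Jacobian determinant 1 everywhere.
H, W = 8, 8
ys, xs = np.mgrid[0:H, 0:W].astype(float)
identity = np.stack([xs, ys], axis=-1)
det = jacobian_determinant(identity)
assert np.allclose(det, 1.0)
```

Composing only small per-level displacements, as the pyramid scheme below does, is one way to keep this determinant positive at every level.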
9.6.1.1 Template Matching

Intensity-based registration methods, i.e., those directly using the full content of the image rather than simplifying it to a set of features to steer the registration, usually belong to one of two important families: template matching and variational. The former was popular years ago because of its conceptual simplicity [20]. Nevertheless, in its conventional formulation, it is not powerful enough to address the challenging needs of medical-image registration. Variational methods rely on the minimization of a functional (energy) that is usually formulated as the sum of two terms: data coupling and regularization, the former forcing the similarity between both data sets (target and source deformed with the estimated field) to be high, and the latter enforcing the estimated field to fulfill some constraint (usually spatial coherence/smoothness). As opposed to variational methods, template matching does not impose any constraint on the resulting fields, which, moreover, due to the discrete movement of the template, turn out to be discrete as well. These facts have led to an increasing popularity of variational methods for registration, while template matching has been losing ground in this arena.

Template matching finds the displacement for every voxel in a source image by minimizing a local cost measure that is obtained from a small neighborhood of the source image and a set of potential corresponding neighborhoods in a target image. The main disadvantage of template matching is that it estimates the displacement field independently at every voxel, and no spatial coherence is imposed on the solution. Another disadvantage of template matching is that it needs to test several discrete displacements to find a minimum. There are several optimization-based template-matching solutions that provide a real-valued solution for every voxel, although they are slow [21].
Therefore, most template-matching approaches render discrete displacement fields. Another problem associated with template matching is commonly denoted the aperture problem in the computer-vision literature [22]. This is essentially the inability to make a good match when no discriminant structure is available, such as in homogeneous regions, on surfaces, and along edges. When this fact is not taken into account, the matching process is steered by noise rather than by local structure, because none is available. The model-estimation registration algorithm that we present here maintains the simplicity of template matching while addressing its drawbacks. It consists of a weighted regularization of the template-matching solution, where the weights are obtained from the local structure, to render spatially coherent, real-valued deformation fields. Thanks to the multiscale nature of our approach, only displacements of one voxel on every scale are necessary when matching the local neighborhoods.

9.6.1.2 Multiresolution Pyramid

The algorithm works in a way that is similar to the Kovacic and Bajcsy elastic warping [23], in which images are decomposed into Gaussian multiresolution pyramids. On the highest level, the deformation field is estimated by regularized template matching steered by local structure (details in the following subsections). On the next level, the source data set is deformed with a deformation field obtained by spatial interpolation of the one obtained on the first level. The deformed source and the target data sets on the current
level are then registered to obtain the deformation field corresponding to the current level of resolution. This process is iterated on every level. The algorithm implementation is summarized in Figure 9.1.
FIGURE 9.1 Algorithm pipeline for pyramidal level (i).
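One pyramid level of the discrete matching can be sketched in 2-D as follows; the ±1-pixel search range follows the one-voxel-per-scale design, while the box-filter SSD cost is a stand-in for the similarity measures discussed later in the chapter:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def block_match_one_voxel(src, tgt, block=5):
    """One pyramid level of discrete template matching: for every pixel,
    pick the displacement in {-1, 0, 1}^2 whose block-wise SSD against the
    target is smallest. Returns an integer displacement field (H, W, 2)
    storing (dy, dx). In structureless regions all candidates tie and the
    first one tested wins -- the aperture problem in action."""
    H, W = src.shape
    best_cost = np.full((H, W), np.inf)
    disp = np.zeros((H, W, 2), dtype=int)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            shifted = np.roll(tgt, (-dy, -dx), axis=(0, 1))
            # Local SSD for every pixel at once via a box filter: the
            # convolution trick described in the text.
            cost = uniform_filter((src - shifted) ** 2, size=block)
            better = cost < best_cost
            best_cost[better] = cost[better]
            disp[better] = (dy, dx)
    return disp

# A target shifted down by one pixel is recovered at structured pixels.
src = np.zeros((16, 16))
src[6:10, 6:10] = 1.0
tgt = np.roll(src, (1, 0), axis=(0, 1))
d = block_match_one_voxel(src, tgt)
```

Running this at every pyramid level, with the source pre-warped by the upsampled field from the coarser level, reproduces the loop of Figure 9.1.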
FIGURE 9.2 (Color figure follows p. 274.) MRI T1-weighted axial slice of human brain and its structure tensors. (Hot color represents high structure.)

9.6.1.3 Local Structure

Local structure measures the quantity of discriminant spatial information at every point of an image, and it is crucial for template-matching performance: the higher the local structure, the better the result obtained in that region with template matching. To quantify local structure, a structure tensor is defined as

Ta = (∇I ∇It)a

where the subscript a indicates a local smoothing. The structure tensor is a symmetric positive-semidefinite 3×3 matrix that can be associated with an ellipsoid, i.e., its eigenvectors and eigenvalues correspond to the ellipsoid's axes directions and lengths, respectively. A scalar measure of the local structure can be obtained as [16, 17, 24] (9.7)

Figure 9.2 shows an MRI T1-weighted axial slice of the brain and the estimated structure tensors overlaid as ellipsoids. Small eigenvalues indicate a lack of gradient variation along the associated principal direction, and therefore, high structure is indicated by big
(large eigenvalues), round (no eigenvalue is small) ellipsoids. The color coding represents the scalar structure measure, with hot colors indicating higher structure.
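A structure-tensor computation along these lines can be sketched in 2-D as follows. The scalar measure det(T)/trace(T) is one common choice from the corner-detection literature and is an assumption here, since Equation 9.7 is given only by reference in the text:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor_2d(img, sigma=1.5):
    """Locally smoothed structure tensor T_a = (grad I grad I^t)_a and a
    scalar structure measure det(T)/trace(T) (an assumed choice)."""
    gy, gx = np.gradient(img)
    Txx = gaussian_filter(gx * gx, sigma)   # subscript `a`: local smoothing
    Tyy = gaussian_filter(gy * gy, sigma)
    Txy = gaussian_filter(gx * gy, sigma)
    trace = Txx + Tyy
    det = Txx * Tyy - Txy ** 2
    measure = det / (trace + 1e-12)         # 0 in homogeneous regions
    return (Txx, Txy, Tyy), measure

# A corner (gradients in both directions) scores higher than flat background.
img = np.zeros((32, 32))
img[10:22, 10:22] = 1.0
_, m = structure_tensor_2d(img)
assert m[10, 10] > m[2, 2]
```

Edges yield one large and one small eigenvalue (small det, hence a low measure), while corners yield two large eigenvalues, matching the point/curve/surface ordering described for Figure 9.3.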
FIGURE 9.3 (Top) MRI T1-weighted cross-sections; (bottom) local structure measure (arrows point at higher-structure regions).

Figure 9.3 shows cross-sections of a T1-weighted MRI dataset of a human brain (top row) and the scalar measure of local structure obtained from them, represented with a logarithmic histogram correction (bottom row). Note how anatomical landmarks have the highest measure of local structure, corresponding to the points indicated by the arrows in the top row. Curves are detected with lower intensity than points, and surfaces have even lower intensity. Homogeneous areas have almost no structure.

Template matching provides a discrete deformation field on which no spatial coherence constraints have been imposed. In this subsection, this field is regularized so as to obtain a mathematically consistent continuous mapping. We will consider the deformation field to be a diffeomorphism, i.e., an invertible, continuously differentiable mapping. To be invertible, the Jacobian of the deformation field must be positive. On every scale level, the displacement is small enough to guarantee this condition. For every level of the pyramid, the mapping is obtained by composing the transformation from the higher level with the one on the current level, so that the positive-Jacobian condition is preserved.

Spatial regularization is achieved by locally projecting the deformation field provided by template matching onto an appropriate signal subspace, while simultaneously taking into account the quality of the matching as indicated by the scalar measure of local structure. We propose here to use normalized convolution [25, 26], a popular refinement of
weighted least squares that explicitly implements the so-called signal/certainty philosophy. Essentially, the scalar measure of structure is incorporated as a weighting function in a least-squares fashion. The field obtained from template matching is projected onto a vector space described by a nonorthogonal basis, i.e., the dot products between the field and every element of the basis provide covariant components that must be converted into contravariant ones by an appropriate metric tensor. Normalized convolution provides a simple implementation of this operation. Moreover, an applicability function is enforced on the basis elements to guarantee proper localization and avoid high-frequency artifacts; this essentially corresponds to weighting each basis element with a Gaussian window. The desired transformation is related to the displacement field by the simple relation shown in Equation 9.3. Because the transformation is differentiable, we can write the function in different orders of approximation

Y(x) ≈ Y(x0) (9.8)

Y(x) ≈ Y(x0) + J(x0)·(x − x0) (9.9)

Equation 9.8 and Equation 9.9 correspond to linear decompositions over bases of 3 and 12 elements, respectively. We have not found a relevant experimental improvement of the registration algorithm by using the linear approximation instead of the zero-order one, probably due to the local nature of the algorithm. The basis set used is then the zero-order one, with one constant basis element per vector component (9.10)
Figure 9.4 shows a 2-D discrete deformation field that has been regularized using the certainty on the left side and a 2-D Gaussian applicability function with σ=0.8.
FIGURE 9.4 (Left) certainty, (center) discrete matching deformation, (right) weight-filtered deformation.
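With the zero-order (constant) basis, normalized convolution reduces to normalized averaging: each field component is smoothed with the Gaussian applicability, weighted by the certainty. A minimal 1-row sketch:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def normalized_averaging(field, certainty, sigma=0.8):
    """Zero-order normalized convolution (normalized averaging): the
    discrete matching field is smoothed with a Gaussian applicability
    weighted by the certainty (the scalar structure measure), so reliable
    matches dominate their neighborhoods."""
    num = gaussian_filter(field * certainty, sigma)
    den = gaussian_filter(certainty, sigma) + 1e-12
    return num / den

# A sample with certainty 0 is filled in from its confident neighbors.
field = np.array([[1., 1., 5., 1., 1.]])   # one spurious match (value 5)
cert  = np.array([[1., 1., 0., 1., 1.]])   # the spurious match has no support
out = normalized_averaging(field, cert, sigma=1.0)
```

Here the spurious value is replaced by the consensus of its certain neighbors, which is exactly the behavior illustrated in Figure 9.4. A first-order variant would add the Jacobian terms of Equation 9.9 to the basis.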
9.6.1.4 Entropy-Based Similarity Measure

In a work by Suarez et al. [19], the registration framework was tested using square blocks that were matched using the sum of squared differences and the correlation coefficient as similarity measures. In the current work, we introduce entropy-based similarity measures into this framework, although they can be used by any algorithm based on template matching. A similarity measure can be interpreted as a function defined on the joint probability space of two random variables to be matched. In the case of block matching, each block represents a set of samples of each random variable. When this probability density function (PDF) is known, the mutual information can be computed as

MI(I1, I2) = ∫Ω p(i1, i2) log [ p(i1, i2) / (p(i1) p(i2)) ] di1 di2 (9.11)

where I1, I2 are the images to register, and Ω is the joint probability function space. A discrete approximation is to compute the mutual information from the PDF and a small number N of samples (i1[k], i2[k]):

MI ≈ (1/N) Σk fp(i1[k], i2[k]) (9.12)

where fp is a coupling function defined on Ω, fp(i1, i2) = log [ p(i1, i2) / (p(i1) p(i2)) ]. Therefore, the local evaluation of the mutual information for a displaced block containing N voxels can be computed just by summing the coupling function fp over the samples that belong to the block.

We propose to compute a set of multidimensional images, each of them containing at each voxel the local similarity measure corresponding to a single displacement applied to the whole target image. A decision will be made for each voxel, depending on which displacement renders the greatest similarity. A problem associated with local entropy-based similarity measures is the local estimation of the joint PDF of both blocks, because there are never enough samples available. We propose to overcome this problem by using the joint PDF corresponding to the whole displaced source image and the target one. The PDF to be used for a given displacement is the global joint-intensity histogram of the reference image with the displaced target image.
This is crucial for higher pyramidal levels, where one voxel displacement drastically changes the PDF estimation. It is straightforward to compute the local mutual information for a given discrete displacement in the whole image. This requires only the convolution of a square kernel representing the block window and the evaluation of the coupling function for every pair of voxels. Furthermore, because the registration framework only needs discrete deformation fields, no interpolation is needed in this step. Any similarity measure that can be computed as a kernel convolution can be implemented this way.
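As an illustration of the convolution-based evaluation described above, a minimal NumPy/SciPy sketch follows. It is restricted to 2-D images; the function name, bin count, and Gaussian width are choices made for this example and are not part of the original framework, which operates on multidimensional volumes.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_mi_map(ref, tgt, shift, bins=32, sigma=2.0):
    """Local mutual-information map for one discrete displacement of tgt.

    The coupling function log(p12 / (p1 * p2)) is evaluated per voxel from
    the GLOBAL joint histogram of the displaced pair (robust even when the
    local blocks contain few samples), and is then averaged locally by a
    Gaussian convolution instead of a square block window (Equation 9.12).
    """
    shifted = np.roll(tgt, shift, axis=(0, 1))  # discrete shift: no interpolation

    def quantize(img):
        span = img.max() - img.min() + 1e-12
        return np.clip((img - img.min()) / span * (bins - 1), 0, bins - 1).astype(int)

    r, t = quantize(ref), quantize(shifted)

    # Global joint and marginal PDFs of the displaced image pair
    joint, _, _ = np.histogram2d(r.ravel(), t.ravel(),
                                 bins=bins, range=[[0, bins], [0, bins]])
    joint /= joint.sum()
    p1, p2 = joint.sum(axis=1), joint.sum(axis=0)

    # Coupling function f_p sampled at every voxel pair
    eps = 1e-12
    fp = np.log((joint[r, t] + eps) / (p1[r] * p2[t] + eps))

    # Local averaging of f_p yields the per-voxel MI estimate
    return gaussian_filter(fp, sigma)
```

Because only the coupling function changes with the displacement, one such map is computed per candidate displacement, and the decision at each voxel picks the displacement with the largest value.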
Landmark-based registration of medical-image data
383
FIGURE 9.5 (Left) target image to be matched, (center) reference image where the similarity measure is to be estimated for every discrete displacement, (right) for every discrete displacement, the similarity measure is computed for every voxel by performing a convolution.

A small sketch of this technique is shown in Figure 9.5. For smoothness and locality reasons, we have chosen to convolve using Gaussian kernels instead of square ones. To achieve a further computational saving, Equation 9.12 can be written as

MI(I_1, I_2) \approx \frac{1}{N} \sum_{k=1}^{N} \left[ \log p(i_1[k], i_2[k]) - \log p(i_1[k]) - \log p(i_2[k]) \right]    (9.13)

The displacement field defines the displacement of a voxel in the source image. The similarity measure is referred to the source-image reference system (image 1). For a given voxel in the source image, the comparison of Equation 9.13 for different displacements will always contain the same term depending on p(i1[k]). Thus, we can remove this term and modify the coupling function accordingly to reduce the computational cost. Any other entropy-based similarity measure can be estimated in a similar way. The computational cost is then very similar to that of any other similarity measure not based on entropy.

9.6.2 VARIOGRAM ESTIMATION

The variogram is estimated, under the assumption of intrinsic stationarity (i.e., the mean of the displacement field must be constant), from the displacement field obtained by intensity-based image registration. Should intrinsic stationarity not hold, a trend model must be pre-estimated so that it can be subtracted from the field prior to estimating the variogram. This process is undesirable because it introduces bias into the variogram estimation due to its inherent circularity: the probabilistic characterization of the random component of the field must be known to estimate the trend, but the trend must also be
known to estimate the probabilistic characterization of the random component. Nevertheless, this issue is present in any model with a trend and a random component, and, in fact, estimating the sample variogram instead of the sample autocovariance has several advantages [14] from this point of view:

If the mean value of the field is an unknown constant, it is not necessary to pre-estimate it, because the variogram sample estimator is based on differences. Hence, in this case, the sample variogram can be estimated without bias.

The sample variogram estimator is more robust against mean-model mismatch than the sample autocovariance estimator.

The sample variogram estimator is less biased than the sample autocovariance estimator when the mean model is pre-estimated and subtracted from the field realization before the spatial-dependence model is estimated.
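The difference-based (method-of-moments) estimator implied here can be sketched in a few lines. The restriction to a 1-D transect of one displacement component and the function name are simplifications for illustration only.

```python
import numpy as np

def sample_variogram(field, max_lag):
    """Method-of-moments semivariogram estimator on a 1-D transect:
    gamma(h) = 0.5 * mean[(Z(x+h) - Z(x))^2].

    Because the estimator is built on differences, an unknown constant
    mean cancels out and needs no pre-estimation (intrinsic stationarity).
    """
    gammas = []
    for h in range(1, max_lag + 1):
        d = field[h:] - field[:-h]          # all pairs at lag h
        gammas.append(0.5 * np.mean(d ** 2))
    return np.array(gammas)
```

For a white-noise field of unit variance the semivariogram is flat at 1 for every positive lag (pure nugget), and adding any constant to the field leaves the estimate unchanged.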
9.7 LANDMARK-BASED LOCAL REGISTRATION

9.7.1 DISPLACEMENT FIELD MODEL

The reconstruction of the local displacement field DL(x) can be cast as the optimal prediction of the displacement at every location x from our set of observations.* These observations are obtained by measuring the displacement between pairs of point landmarks extracted from both images. The observation process is then

Z(x) = X′(x) − X(x) = D(x) + NZ(x)    (9.14)

where X, X′ are the landmark position random processes, D is the stochastic characterization of the local displacement field, and NZ is a zero-mean Gaussian random noise field with autocovariance independent of D. From the model, it follows that

µZ(x) = µD(x)    (9.15)

CZ(x) = CX′(x) + CX(x)    (9.16)

CZ(xi, xj) = CD(xi, xj)    (9.17)

Furthermore, Equation 9.16 can be rewritten for the sampled landmarks (xl, x′l) as (9.18), where the Cramér–Rao lower bound introduced in Section 9.4 has been used.
* Hereinafter, the L subscript will be omitted.
9.7.2 ORDINARY KRIGING PREDICTION OF DISPLACEMENT FIELDS

The mean of each component of the displacement field, µD(x), is assumed to be an unknown constant. We have found this to be a very convenient model, even after the global preregistration that should render zero-mean values for the resulting displacement components. The reason is that a locally varying mean structure can usually model much of the local deformation. Therefore, in this case we will not use all the samples but a limited number around the prediction location. This has the added benefit of reducing the computational burden. For the sake of simplicity, the positions of the observed landmarks will be denoted by the set O = {x1, …, xN}, and the observation vector is denoted

Zr(O) = [Zr(x1) … Zr(xN)]t    (9.19)

The ordinary co-Kriging (i.e., multivariate Kriging) predictor takes the form of a linear combination of the observations (9.20), with the coefficients collected in the matrix K(x, O).
If there is no second-order probabilistic dependence among the field components, each of them is dealt with independently, leading to a block-diagonal K(x,O) matrix and resulting in the conventional ordinary Kriging predictor for each component. The ordinary Kriging coefficients must minimize the mean square prediction error
(9.21)
subject to the unbiasedness constraint (9.22), which requires the Kriging weights for each component to sum to unity. Closed-form equations for the coefficients and for the achieved squared error can be readily obtained after some algebra (see, for example, Cressie [14]). Because of space constraints, we present only the coefficients’ equation, expressed in terms of covariances. The matrix A is block diagonal, with each diagonal block equal to a column vector of ones, and the vector λr is a zero row vector with a single 1 in the r position: (9.23)

Extensions of ordinary Kriging are possible by incorporating more complex mean-structure models. Although this may seem appealing in principle, it has the serious drawback of hindering the estimation of the spatial-variability model, because the mean structure has to be filtered out before the covariance structure can be estimated. Notice that estimating the variogram does not require pre-estimation of the mean, as it is constant.

9.8 RESULTS

We are currently using the proposed framework in a number of applications. To better illustrate its behavior, we have selected two simple experiments. Figure 9.6(a) shows a T1w MRI axial slice of a multiple sclerosis patient, and Figure 9.6(b) a corresponding T2w axial slice of a different patient. Ellipsoids representing landmark covariances have been overlaid (seven landmarks in the brain and four on the skull). Figure 9.6(d) and Figure 9.6(e) show two T1w mid-sagittal slices of MS patients, also with covariance landmark ellipsoids overlaid (11 landmarks in the brain and 3 on the skull). In each case, the second image is to be warped onto the first one. In both cases, the images are first globally registered. Then a forward displacement field is obtained for each case using our general-purpose registration scheme [19] in order to estimate the variograms. Sample variograms and their weighted least-squares fits to theoretical models (linear combinations of Gaussian and power models) are shown in Figure 9.6(g) and Figure 9.6(h).
For this purpose, 5000 displacements were sampled, which makes the estimation highly accurate.
Registration results obtained by ordinary Kriging prediction of the displacement field, using only the displacements of the landmarks shown on the images, are presented in Figure 9.6(c) and Figure 9.6(f). Notice how, even with so few landmarks, a good result is achieved, especially in areas closer to the landmarks, because of the proper estimation of the random displacement field. The open-source software Gstat [27] was used in these experiments.
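Gstat performs the prediction internally; for concreteness, a minimal NumPy sketch of the ordinary Kriging system of Section 9.7.2 might look as follows. The Gaussian covariance model and all names are illustrative choices, not the chapter's implementation.

```python
import numpy as np

def ordinary_kriging(coords, values, x, cov):
    """Ordinary Kriging prediction at location x from scattered observations.

    Solves the bordered system  [C 1; 1^T 0][lambda; mu] = [c(x); 1],
    where C holds the covariances among observations and c(x) the
    covariances between x and each observation; the Lagrange multiplier mu
    enforces the unbiasedness constraint sum(lambda) = 1.
    """
    n = len(coords)
    A = np.ones((n + 1, n + 1))
    A[n, n] = 0.0
    for i in range(n):
        for j in range(n):
            A[i, j] = cov(np.linalg.norm(coords[i] - coords[j]))
    b = np.ones(n + 1)
    b[:n] = [cov(np.linalg.norm(x - c)) for c in coords]
    w = np.linalg.solve(A, b)
    lam = w[:n]                     # Kriging weights, sum to 1
    return float(lam @ values), lam
```

With a covariance model free of a nugget term, the predictor interpolates the observations exactly, which is the behavior exploited at the landmark positions.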
FIGURE 9.6 Experimental results: (a) axial T1, (b) axial T2, (c) warped axial T2, (d) first T1 sagittal, (e) second T1 sagittal, (f) warped second sagittal, (g) displacement variograms (axial), and (h) displacement variograms (sagittal).
9.9 CONCLUSIONS

We have presented a practical approach to the statistical prediction of displacement fields from pairs of landmarks. The method is grounded in the solid theory of ordinary Kriging, and it also provides a way of estimating the spatial-dependence models from image data, thus circumventing some of the hurdles found when using Kriging. The fact that the statistical relation between both geometries is successfully used makes the method highly accurate and particularly well suited for image-registration and shape-analysis applications. It is remarkable that thin-plate splines can be considered a particular case of Kriging, and in this sense our approach generalizes this popular registration method.

APPENDIX 9.1 GEOSTATISTICAL SPATIAL MODELING

Consider a random field Zr(x) (the superscript r is meant to index several random fields, such as the components of a vector random field) such that

2\gamma^r(h) = \mathrm{Var}[Z^r(x_i) - Z^r(x_j)]    (9.24)

with h = xi − xj. The function 2γr(h), assuming it exists, is called the variogram of the random field Zr(x) and is the central parameter used to model the spatial dependence of the random field in the geostatistical method. The function γr(h) (without the factor of 2) is usually called the semivariogram. The variogram can be easily related to the variance and covariance through the relation

2\gamma^r(h) = \mathrm{Var}[Z^r(x_i)] + \mathrm{Var}[Z^r(x_j)] - 2\,\mathrm{Cov}[Z^r(x_i), Z^r(x_j)]    (9.25)

The shape of a variogram is summarized by the following parameters:

Nugget: the size of the discontinuity of the semivariogram at the origin. Note that a nonzero nugget indicates that the random field is not continuous. The presence of a nugget effect is usually attributed to measurement noise and to a very local random component of the field that appears uncorrelated at the working resolution. Both effects are usually superimposed and modeled with white noise.

Sill: if the variogram is bounded, the sill is the value of the bound. Reaching the sill indicates total noncorrelation, as occurs, for example, with white noise.
Usually, random fields become uncorrelated at large lags, thus reaching a sill.

Partial sill: the difference between the sill and the nugget.

Range: the lag at which the sill is reached, assuming, of course, that the variogram has a sill.

Various approaches for constructing valid theoretical variogram models are available [14, 27–30]. Most often, existing variogram models such as the nugget (white field), spherical,
linear, exponential, power, etc. are used as building blocks in a linear combination of valid variogram models, making use of the convexity of the set of valid variograms. The variogram can be extended to the multivariate case [14]. The pseudo-cross-variogram function is defined as

2\gamma^{rs}(h) = \mathrm{Var}[Z^r(x + h) - Z^s(x)]    (9.26)

A9.1.1 INTRINSIC STATIONARITY

The scalar random field Zr(x) is said to be intrinsically stationary if it has a constant mean and its variogram exists. Moreover, any conditionally negative-definite function 2γ(h) is the variogram of an intrinsically stationary random field. The variogram of an intrinsic random field Zr(x) is

2\gamma^r(h) = \mathrm{Var}[Z^r(x + h) - Z^r(x)]    (9.27)

A9.1.2 RELATION BETWEEN INTRINSIC AND SECOND-ORDER STATIONARITIES

Note that the family of intrinsically stationary fields is larger than the second-order stationary one. In particular, unbounded valid variograms, i.e., variograms without a sill, do not have a corresponding autocovariance function. For second-order stationary fields, there is a simple relation between the variogram and the autocovariance, i.e.,

\gamma^r(h) = C^r(0) - C^r(h)    (9.28)

It is clear that, in the common situation for second-order stationary fields where the covariance approaches zero for large space lags, the sill of the variogram is Cr(0), the variance of the field.

ACKNOWLEDGMENT

This work has been partially funded by the Spanish Government (MCyT) under research grant TIC-2001-3808-C02-01.

REFERENCES

1. Faugeras, O., Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press, Cambridge, MA, 1993.
2. Shah, M. and Jain, R., Eds., Motion-Based Recognition, Vol. 9, Computational Imaging and Vision, Kluwer, Dordrecht, Netherlands, 1997.
3. Tekalp, A.M., Digital Video Processing, Signal Processing Series, Prentice Hall, Upper Saddle River, NJ, 1995.
4. Lillesand, T.M. and Kiefer, R.W., Remote Sensing and Image Interpretation, 4th ed., John Wiley & Sons, New York, 1999.
5. Burrough, P.A. and McDonnell, R.A., Principles of Geographical Information Systems (Spatial Information Systems and Geostatistics), 2nd ed., Oxford University Press, Oxford, U.K., 1998.
6. Maintz, J.B.A. and Viergever, M.A., A survey of medical-image registration, Medical Image Anal., 2, 1-36, 1998.
7. Ruiz-Alzola, J., Suarez, E., Alberola-Lopez, C., Warfield, S.K., and Westin, C.-F., Geostatistical medical-image registration, in Lecture Notes in Computer Science, no. 2879, Springer-Verlag, New York, 2003, pp. 894-901.
8. Bajcsy, R. and Kovacic, S., Multiresolution elastic matching, Computer Vision, Graphics, Image Process., 46, 1-21, 1989.
9. Christensen, G.E., Joshi, S.C., and Miller, M.I., Volumetric transformation of brain anatomy, IEEE Trans. Medical Imaging, 16, 864-877, 1997.
10. Bookstein, F.L., Principal warps: thin-plate splines and the decomposition of deformations, IEEE Trans. Pattern Anal. Machine Intelligence, 11, 567-585, 1989.
11. Rohr, K., Image registration based on thin-plate splines and local estimates of anisotropic landmark localization uncertainties, in Lecture Notes in Computer Science, no. 1496, Springer-Verlag, Heidelberg, 1998, pp. 1174-1183.
12. Rohr, K., Landmark-Based Image Analysis (Using Geometry and Intensity Models), Vol. 21, Computational Imaging and Vision, Kluwer, Dordrecht, Netherlands, 2001.
13. Dryden, I.L. and Mardia, K.V., Statistical Shape Analysis, Wiley Series in Probability and Statistics, John Wiley & Sons, New York, 1998.
14. Cressie, N.A.C., Statistics for Spatial Data, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, New York, 1993.
15. Matheron, G., Splines and Kriging: their formal equivalence, in Down-to-Earth Statistics: Solutions Looking for Geological Problems, Syracuse University Geological Contributions, Syracuse, NY, 1981, pp. 77-95.
16.
Rohr, K., Differential operators for detecting point landmarks, Image Vision Computing, 15, 219-233, 1997.
17. Harris, C. and Stephens, M., A combined corner and edge detector, in Proc. Fourth Alvey Vision Conference, 1988, pp. 147-151.
18. Wells, W.M., Viola, P., Atsumi, H., Nakajima, S., and Kikinis, R., Multimodal volume registration by maximization of mutual information, Medical Image Anal., 1, 35-51, 1996.
19. Suarez, E., Westin, C.-F., Rovaris, E., and Ruiz-Alzola, J., Nonrigid registration using regularized matching weighted by local structure, in Lecture Notes in Computer Science, no. 2489, Springer-Verlag, Heidelberg, 2002, pp. 581-589.
20. Duda, R.O. and Hart, P.E., Pattern Classification and Scene Analysis, John Wiley & Sons, New York, 1973.
21. Suarez, E., Cárdenes, R., Alberola, C., Westin, C.-F., and Ruiz-Alzola, J., A general approach to nonrigid registration: decoupled optimization, in 23rd Ann. Int. Conf. IEEE Eng. Med. Biol. Soc., IEEE, Washington, DC, 2000.
22. Poggio, T., Torre, V., and Koch, C., Computational vision and regularization theory, Nature, 317, 314-319, 1985.
23. Kovacic, S. and Bajcsy, R.K., Multiscale/multiresolution representations, in Brain Warping, Academic Press, New York, 1999, pp. 45-65.
24. Ruiz-Alzola, J., Kikinis, R., and Westin, C.-F., Detection of point landmarks in multidimensional tensor data, Signal Process., 81, 2243-2247, 2001.
25. Westin, C.-F., A Tensor Framework for Multidimensional Signal Processing, Ph.D. Thesis, Linköping University, Sweden, 1994.
26. Knutsson, H. and Westin, C.-F., Normalized and differential convolution: methods for interpolation and filtering of incomplete and uncertain data, in Proc. Computer Vision and Pattern Recognition, IEEE, New York, 1993, pp. 515-523.
27. Pebesma, E.J. and Wesseling, C.G., Gstat: a program for geostatistical modelling, prediction and simulation, Comput. Geosci., 24, 17-31, 1998.
28. Chiles, J.-P. and Delfiner, P., Geostatistics: Modeling Spatial Uncertainty, Wiley Series in Applied Probability and Statistics, Wiley-Interscience, New York, 1999.
29. Ripley, B.D., Statistical Inference for Spatial Processes, repr., Cambridge University Press, Cambridge, U.K., 1991.
30. Arlinghaus, S.L. and Griffith, D.A., Eds., Practical Handbook of Spatial Statistics, rev. ed., CRC Press, Boca Raton, FL, 1995.
10 Graph-Based Analysis of Amino Acid Sequences

Luciano da Fontoura Costa

10.1 INTRODUCTION

One of the most essential features underlying natural phenomena and dynamical systems is the many connections, implications, and causalities among the several elements and processes involved. For instance, the whole dynamics of gene activation can be understood as a highly complex network of interactions, in the sense that some genes are enhanced while others are inhibited by several environmental factors, including the current biochemical composition of the individual (such as the presence of specific genes/proteins) as well as external effects such as temperature and interaction with other individuals. Interestingly, such a network of effects extends well beyond the individual in time and space, in the sense that any living being is affected by history (i.e., evolutionary processes) and spatial interactions (i.e., ecology). Although biology can only be fully understood and explained by considering the whole of such an intricate network of effects, reductionist approaches can still provide many insights into biological phenomena that are more localized in time and space, such as the genetic dynamics during an individual lifetime or an infectious process.

The large masses of data produced by experimental work in biology, molecular biology, and genetics can only be properly organized, analyzed, and modeled by using computational concepts including databases, networks, parallel computing, and artificial intelligence, with special emphasis placed on signal processing and pattern recognition. The incorporation of such modern computational concepts and tools into biology and genetics has been called bioinformatics [1]. The applications of this new area to genetics are manifold, ranging from nucleotide analysis to animal development.
Among the several signal-processing methods considered in bioinformatics [2] are the application of Markov random fields to model sequences of nucleotides, the use of correlation and covariance to characterize sequences of nucleotides and amino acids, and wavelets [2, 3].

One particularly important problem concerns the analysis of proteins, the basic building blocks of life [4, 5]. Constituted by sequences of amino acids, proteins participate in all vital processes, acting as catalysts; providing the mechanical scaffolding for cells, organs, and tissues; and participating in DNA expression. Proteins are polymers of amino acids, determined from the DNA through the process of protein expression. Many of the properties of proteins derive from their spatial shape and electrical affinities, which are both defined by the specific sequences of constituent amino acids [4, 5]. Therefore, given the sequence of amino acids specified by the DNA, the protein folds into specific forms while taking into account the interactions between the amino acids and the external influence
Graph-based analysis of amino acid sequences
393
of chaperones. It remains an open problem how to determine the structural properties of proteins from the respective amino acid sequences, a problem known as protein folding [4, 5]. Except for some basic motifs, such as alpha-helices and beta-sheets, which are structures that appear repeatedly in proteins, the prediction of protein shape constitutes an intense research area. Experimentally, the sequences of amino acids underlying proteins can be obtained by using sequencing machines capable of reading the nucleotides, which are subsequently translated into amino acids by considering triples of nucleotides, the so-called codons, translated according to the genetic code.

By being inherently oriented toward representing connections and implications, graphs stand out as one of the most general and interesting data structures that can be used to represent biological systems. Basically, a graph is a representational structure composed of nodes, which are connected through directed or undirected edges. Any structure or phenomenon can be represented, to varying degrees of completeness, in terms of graphs, where each node would correspond to an aspect of the phenomenon and the edges to interactions. Such a potential for representation and modeling is greatly extended by the many types of graphs, including those with weighted edges, different types of coexisting nodes or edges, and hypergraphs, to name only a few. Interestingly, most biological phenomena can be properly represented in terms of graphs, including gene activation, metabolic networks, evolution (recall that hierarchical structures such as trees are special kinds of graphs), ecological interactions, and so on. However, despite the natural potential of graphs for representing and studying natural phenomena, their application was timid until the recent advent of the area of complex networks.
One of the possible reasons for this is that graphs had often been understood as representations of static interactions, in the sense that the connections between nodes were typically assumed not to change with time. Thus, the uses of graphs in biology, for instance, were mainly constrained to representing evolutionary hierarchies (in terms of trees) and metabolic networks. This situation underwent an important recent change, sparked mainly by the pioneering developments in random networks by Rapoport [6] and Erdős and Rényi [7], the small-world models of Watts and Strogatz [8], and the scale-free networks of Barabási [9]. The research on such types of complex graphs became united under the name of complex networks [10–12]. Now, in addition to the inherent potential of graphs to nicely represent natural phenomena, important connections were established with dynamical systems, statistical physics, and critical phenomena, while many possibilities for multidisciplinary research were established between areas such as graph theory, statistical physics, nonlinear dynamical systems, and complexity theory. Despite such promising perspectives, one of the often overlooked reasons why complex networks have become so important for modern science is that studies in this area tend to investigate the dynamical evolution of the graphs [10–12], which can provide key insights about the relationship between the topology and function of such complex systems. For example, one of the most interesting properties exhibited by random graphs is the abrupt appearance, as new edges are progressively added at random, of a giant cluster that dominates the graph structure and connections henceforth. Thus, in addition to being typically large (several studies in complex networks consider infinitely large graphs), graphs are now used to model growing processes.
Allied to the inherent vocation of graphs to represent connections, interactions, and causality, the possibility of modeling dynamical evolution in terms of
complex networks has made this area into one of the most promising sources of scientific concepts and tools.

The present chapter is aimed at addressing how complex-network research has been applied to bioinformatics, with special attention given to the characterization and analysis of amino acid sequences in proteins. The text starts by reviewing the basic context, concepts, and tools of complex-network research and continues by presenting some of the main applications of this area in bioinformatics. The remainder of the chapter describes the more specific investigation of amino acid sequences in terms of complex networks obtained for graphs derived from subsequence strings.

10.2 COMPLEX-NETWORKS CONCEPTS AND TOOLS

10.2.1 BRIEF HISTORIC PERSPECTIVE

The beginnings of complex-network research can be traced back to the pioneering and outstanding works by Rapoport [6] and Erdős and Rényi [7], who concentrated attention on the type of networks currently known as random networks. This name is somewhat misleading, in the sense that many other network models are also random. The essential property of random networks as understood in graph theory, therefore, is not merely that they are random, but that they follow a particular probabilistic model, namely the uniform distribution [13]. In other words, given a set of N nodes, connections are established by choosing pairs of nodes according to the uniform probability density. In the case of undirected graphs, the edges are uniformly sampled out of the N(N−1)/2 possible connections. Consequently, random networks correspond to the maximum-entropy hypothesis of connectivity evolution, providing a suitable null hypothesis against which several real and theoretical models can be compared and contextualized. One of the most interesting features of random networks is the fact that the progressive addition of new edges tends to abruptly form a giant, dominating cluster (or connected component) in the graph.
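The abrupt appearance of the giant cluster is easy to reproduce numerically. The short simulation below is an illustrative sketch only; the function name and parameters are choices for this example, not material from the chapter.

```python
import random
from collections import Counter

def giant_component_fraction(n, m, seed=0):
    """Fraction of nodes in the largest connected component of a uniform
    random (Erdos-Renyi-type) graph with n nodes and m added edges.

    Uses union-find with path halving; edges are sampled with replacement,
    which is a good approximation for sparse graphs.
    """
    rng = random.Random(seed)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    added = 0
    while added < m:
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j:
            parent[find(i)] = find(j)      # merge the two components
            added += 1
    sizes = Counter(find(v) for v in range(n))
    return max(sizes.values()) / n
```

With n = 2000 nodes, adding 500 edges (average degree 0.5) leaves only tiny components, while 2000 edges (average degree 2) already produce a cluster containing most of the nodes; the percolation transition sits at an average degree of 1.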
Such a critical transition is particularly interesting not only because it represents a sudden change of the network connectivity, but also because it provides a nice opportunity for connecting graph theory to statistical physics. Indeed, the appearance of the giant cluster can be understood as a percolation of the graph, similar to the critical phenomena (phase transitions) underlying the transformation of ice into water. Basically, percolation corresponds to an abrupt change of some property of the analyzed system as some parameter is continually varied. This interesting connection between graph theory and statistical physics has provided unprecedented opportunities for multidisciplinary work and applications, nicely bridging the gap between areas such as complexity analysis, which is typical of graph theory, and the study of systems involving large numbers of elements, typical of statistical physics. In addition to such an exciting perspective, random networks attracted much interest as possible models of real structures and phenomena in nature, with special emphasis given to the Internet and the World Wide Web. After the fruitful studies of Rapoport and of Erdős and Rényi, the study of large networks (note that the term complex network was not typical at that time) went through a period of continuing academic investigation followed by few applications, except for promising
investigations in areas such as sociology. Indeed, one of the next important steps shaping the modern area of complex networks was the investigation of personal interactions in society, of which the 1998 work by Watts and Strogatz [8] represents the basic reference. Basically, experimental investigations regarding social contacts led to the result that the average length between any two nodes (i.e., persons) is rather small, hence the name small-world networks. The typical mathematical model of such networks starts with a regular graph, which subsequently has a percentage of its connections rewired according to a uniform probability. Although such investigations brought many insights to the area, the small-world property was later verified to be an almost ubiquitous property of complex networks. The subsequent investigations of the topological properties of the Internet and WWW performed by Albert and Barabási [9] led to the important discovery that the statistical distribution of the node degrees (i.e., the number of connections of a node) in several complex networks tends to follow a power law, indicating scale-free behavior. Unlike the random model, this property favors the appearance of nodes concentrating many of the connections, the so-called hubs. Such an underlying structure has several implications, such as resilience to random attack, which becomes particularly fragile under attacks targeted at hubs. From then on, the developments in complex-network research boomed, covering several types of natural systems, from epidemics to the economy. The interested reader is encouraged to check the excellent surveys of this area [10–12] for complementary information.

10.2.2 BASIC MATHEMATICAL CONCEPTS

This section provides a brief introductory review of basic concepts and measurements in graph theory, statistics, random graphs, and small-world and scale-free networks. Readers who are already familiar with such topics can proceed directly to Section 10.2.3.
10.2.2.1 Graph Theory Basics

Basically, a typical graph [14–17] in complex-network theory [10–12] involves a collection of N nodes i = 1, 2, …, N that are connected through edges (i,j) that can have weights w(i,j). Such a data structure is precisely and completely represented by the respective weight matrix W, where each entry W(j,i) represents the weight of edge (i,j). Nonexistent edges are represented as null entries in that matrix. The adjacency matrix K of the graph is a matrix where the value 1 is assigned to an element (i,j) whenever there is an edge connecting node j to i, and 0 otherwise. The adjacency matrix can be obtained from the weight matrix by setting each element greater than or equal to a specific threshold value T to 1, and assigning 0 otherwise. Such adjacency matrices, henceforth represented as KT, provide an indication of the network structure defined by the weights that are higher than the threshold. Therefore, the adjacency matrix for high values of T can be understood as the strongest component, or “kernel,” of the weighted graph. Observe that it is also possible to consider the complementary matrix of KT with respect to K, which is defined as follows: each element (i,j) of such a matrix, henceforth abbreviated as QT, receives value 1 iff KT(i,j) = 0 and K(i,j) ≠ 0. An undirected graph is characterized by undirected edges, so that K(j,i) = 1 iff K(i,j) = 1, i.e., K is symmetric. A directed graph, or digraph, is characterized by directed edges and a not necessarily symmetric adjacency matrix.
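The construction of K_T and its complement Q_T translates directly into NumPy. The sketch below assumes nonnegative edge weights and a positive threshold; the function names are ours.

```python
import numpy as np

def threshold_adjacency(W, T):
    """K_T: entry 1 wherever an existing edge has weight >= T."""
    return ((W >= T) & (W > 0)).astype(int)

def complementary_adjacency(W, T):
    """Q_T: edges present in the graph but below the threshold,
    i.e. Q_T(i,j) = 1 iff K_T(i,j) = 0 and K(i,j) != 0."""
    K = (W != 0).astype(int)
    KT = threshold_adjacency(W, T)
    return ((K == 1) & (KT == 0)).astype(int)
```

By construction, K_T and Q_T partition the edges of the graph: their sum recovers the full adjacency matrix K.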
One of the most basic and interesting local features of a graph or network is the number of connections of a specific node i, which is called the node degree and often abbreviated as ki. Observe that a directed graph has two types of such degrees, the indegree and the outdegree, corresponding to the number of incoming and outgoing edges, respectively. Figure 10.1 illustrates the concepts introduced here with respect to an undirected graph G and a directed graph H, identifying the nodes, edges, and weights. This figure also shows the respective weight matrices WG and WH and adjacency matrices AG and AH. The degree of node 1 in G is 2, the outdegree of node 1 in H is 2, and the indegree of node 1 in H is 1. N is equal to 4 for both graphs. A great part of the importance of graphs stems from their generality for representing, in an intuitive and explicit way, virtually any discrete structure while emphasizing the involved entities (nodes) and connections. Indeed, virtually every data structure (e.g., tree, queue, list) is a particular case of a graph. In addition, graphs
FIGURE 10.1 Basic concepts in graph theory: examples of undirected (G) and directed (H) graphs, with respective nodes, edges, and weights. The weight matrices of G and H are WG and WH, and the respective adjacency matrices considering threshold T=1 are given as AG and AH.
Graph-based analysis of amino acid sequences
397
can be used to represent the most general mesh of points used for numeric simulation of dynamic systems, from the regular orthogonal lattice used in image representation to the most intricate adaptive triangulations. As such, graphs are poised to provide one of the keys for connecting not only structure and function, but also several different biological areas and even the whole of science.

Several measurements or features have been proposed and used to express meaningful and useful global properties of the network structure. In a fashion similar to feature selection in the area of pattern recognition (e.g., [13]), the choice of such features has to take into account the specific problem of interest. For instance, a problem of communication along the network needs to take into account the distance between nodes. It should be observed that, in most cases, the selected set of features is degenerate, in the sense that it is not enough to reproduce the original network structure. Therefore, great care must be taken when deriving general conclusions based on incomplete sets of measurements, as is almost always the case. Some of the more traditional network measurements are reviewed in the following paragraphs.

The global measurement usually derived from the node degree is its average value ⟨k⟩ along the whole network. Observe that, for a digraph, the average indegree and outdegree are necessarily identical. The average node degree gives a first idea about the overall connectivity of the network. Additional information about the network connectivity can be obtained from the average clustering coefficient ⟨C⟩. Given one specific node i, the immediately connected nodes are identified, and the ratio between the number of connections among them and the maximum possible number of those connections defines the clustering coefficient of node i, i.e., Ci. This feature tends to express the local connectivity around each node.
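The two local measurements just described can be sketched for an undirected graph given as a 0/1 adjacency matrix (a minimal illustration in Python; the function names are ours, not the chapter's):

```python
# Sketch of the node degree k_i and the clustering coefficient C_i
# for an undirected graph with symmetric 0/1 adjacency matrix K.

def degree(K, i):
    """Number of connections of node i."""
    return sum(K[i])

def clustering_coefficient(K, i):
    """Edges among the neighbors of i, divided by the maximum possible."""
    neigh = [j for j in range(len(K)) if K[i][j] == 1 and j != i]
    k = len(neigh)
    if k < 2:
        return 0.0
    # edges actually present among the neighbors of i (each pair once)
    links = sum(K[u][v] for u in neigh for v in neigh if u < v)
    return links / (k * (k - 1) / 2)

# A 4-node example: a triangle 0-1-2 plus a pendant node 3 attached to 0
K = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
print(degree(K, 0))                  # 3
print(clustering_coefficient(K, 0))  # 1 edge among {1, 2, 3} out of 3
```

Averaging these quantities over all nodes yields ⟨k⟩ and ⟨C⟩.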
Another interesting and frequently used network measurement is the length L(i,j) between any two nodes i and j. This distance may refer either to the minimal sum of weights along a path from i to j, or to the minimal number of edges along a path between those two nodes. The present work is restricted to the latter. The respectively derived global feature is the average length ⟨L⟩ considering all possible pairs of network nodes. This measurement provides an idea not only about the proximity between nodes, but also about the overall network connectivity, in the sense that low average-distance values tend to indicate a densely connected structure. Another interesting measurement that has been used to characterize complex networks is the betweenness centrality. Roughly, the betweenness centrality of a specific network node in an undirected graph corresponds to the number of shortest paths between any pair of nodes in the network that cross that node [18].

10.2.2.2 Probabilistic Concepts

Any measurement whose outcome cannot be exactly predicted, such as the weight of an inhabitant of Chicago, can be represented in terms of a random variable [13, 19]. Such variables can be completely characterized in terms of the respective density functions, which can be approximated in terms of the respective relative frequency histograms. Alternatively, a random variable can also be represented in terms of its moments (possibly infinitely many), including the mean, variance, and so on. Statistical density functions of special interest for this chapter include the uniform distribution, which assigns the same probability to any possible measurement, and the Poisson distribution, which is
characterized in terms of a rate of event occurrence per length, area, or volume. For instance, the chance of having a failure in an electricity transmission cable may be equal to one failure per 10,000 km. The chance of observing the event along the considered structure (e.g., the transmission cable) is then equiprobable along the considered parameter (e.g., length or time). Such concepts can be immediately extended to multivariate measurements by introducing the concept of a random vector. For instance, the temperature and pressure of an inhabitant of Chicago can be represented as the two-dimensional random vector [T, P]. Such statistical entities are also completely characterized, in statistical terms, by their respective multivariate densities. Statistical and probabilistic concepts and techniques are essential for representing and modeling natural phenomena and biological data because of the intrinsic variation of such measurements.

10.2.2.3 Random Graph Models

The first type of complex networks to be systematically investigated were the random graphs [6, 7, 10–12, 20]. To construct such graphs, one starts with N unconnected nodes and progressively adds edges between pairs of nodes chosen according to the uniform distribution. Although the measurements described in Section 10.2.2.1 are useful for characterizing the structure of such networks, it is also important to take into account parameters and measurements governing their dynamical evolution, including the critical phenomenon of percolation. As more connections are progressively added to a growing network, there is a definite tendency to form a giant cluster (percolation), which henceforth dominates the growing dynamics. Given a network, a cluster is understood as a set of nodes (and respective interconnecting edges) such that one can reach any node while starting from any other node in the cluster, i.e., the cluster is a connected component of the graph.
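The growth process just described can be simulated directly (a sketch under our own conventions, not the chapter's code): start from N isolated nodes, add uniformly chosen edges, and track the size of the largest connected component with a union-find structure.

```python
import random

# Sketch of random-graph growth: N isolated nodes, edges added between
# uniformly chosen node pairs, tracking the giant-cluster (largest
# connected component) size via union-find with path compression.

def giant_cluster_sizes(N, n_edges, seed=0):
    rng = random.Random(seed)
    parent = list(range(N))
    size = [1] * N

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    largest = 1
    history = []
    for _ in range(n_edges):
        i, j = rng.randrange(N), rng.randrange(N)
        ri, rj = find(i), find(j)
        if ri != rj:                       # merge the two clusters
            if size[ri] < size[rj]:
                ri, rj = rj, ri
            parent[rj] = ri
            size[ri] += size[rj]
            largest = max(largest, size[ri])
        history.append(largest)
    return history

h = giant_cluster_sizes(N=1000, n_edges=3000)
print(h[0], h[-1])  # the giant cluster grows as edges are added
```

Plotting such a history against the number of added edges makes the percolation transition visible as a sudden jump in the largest-cluster size.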
The giant cluster corresponds to the cluster with the largest number of nodes at a given step of the network evolution. For an undirected random network, this phenomenon has been found to take place when the fraction of existing connections with respect to the maximum possible number of connections is about 1/N [5].

10.2.2.4 Small-World and Scale-Free Models

The types of complex networks known as small world and scale free were identified and studied years after Erdős and Rényi investigated random graphs. Small-world networks [8, 10] are characterized by a short path between any pair of their constituent nodes. A typical example of such a network is the web of social interactions within a given society, in the sense that there are just a few (about five or six) intermediate relations between any two persons. Characterized later than small-world models, scale-free networks [10–12] are characterized by the fact that the statistical distribution of the respective node degrees follows a power law, i.e., the representation of such a density in a log-log plot produces a straight line. Such densities, unlike those observed for other types of networks, imply a substantially higher chance of having nodes of high degree, which are traditionally called hubs. As reviewed in the next section, such nodes have been identified as playing an especially important role in biological networks. Scale-free networks can be produced by
using the preferential-attachment growth strategy [10–12], characterized by the progressive addition of new nodes with a fixed number of edges that are connected preferentially to nodes of higher degree, giving rise to the paradigm that has become known as “the rich get richer.” At the same time, scale-free networks have also been shown to be less resilient to the removal of their most-connected nodes than other types of networks, such as random graphs [10].

10.3 COMPLEX-NETWORKS APPROACHES TO BIOINFORMATICS

Several possibilities for using complex networks and statistical physics in biology have been described and reviewed by Bose in his interesting and extensive survey [21]. Special attention is given to relationships between a network’s topology and functional properties, and the following three situations are covered in considerable depth:

1. The topology of complex biological networks, such as metabolic and protein-interaction networks
2. Nonlinear dynamics in gene expression
3. The effect of stochasticity on the network dynamics

While we review in the following some of the most representative works applying complex-network research to biology, the reader is encouraged to complement and extend our revision by referring to Bose’s survey.

Metabolic reactions, one of the key elements of life, were among the first to be studied by complex-network approaches. Such networks have their nodes representing the molecular compounds (or substrates), and the edges indicate the metabolic reactions connecting substrates. Incoming links to a substrate are understood to correspond to the reactions of which that substrate is a product. The pioneering investigation by Jeong et al. [22] considered networks available for 43 organisms, yielding average node indegrees and outdegrees in the range from 2.5 to 4, with the respective distributions being understood as scale free with exponents close to 2.2. The metabolic reactions of E. coli have been studied as undirected graphs by Wagner and Fell [23], yielding an average node degree of 7 and a clustering coefficient (approximately 0.3) much larger than would be obtained for a comparable random network. An interesting investigation into whether the duplication of information in genomes can significantly affect the power-law exponents was reported by Chung et al. [24]. By using probabilistic methods to analyze the evolution of graphs under duplication mechanisms, those authors were able to show that such mechanisms can produce networks with low power-law exponents, which are compatible with many biological networks [25]. The decomposition of biochemical networks into hierarchies of subnetworks, i.e., networks obtained by considering a subset of the nodes of the original graph and some of the respective edges, has been addressed by Holme and Huss [18]. These authors use the algorithm of Girvan and Newman [26] for tracing subnetworks, in a form adapted to bipartite representations of biochemical networks. The underlying principle of the algorithm is the fact that vertices between densely connected areas have high betweenness centrality, so that the removal of vertices with high betweenness leads to the partition of the
whole network into subnetworks that are contained in previous clusters, thereby producing a hierarchy of subnetworks. Another extremely important type of biological network, corresponding to genomic regulatory systems (i.e., the set of processes controlling gene expression), has also been the subject of increasing attention in complex-network research. This type of directed network is characterized by having nodes corresponding to components of the system, with the edges representing the gene-expression regulations [11]. An important type of network in this category is that obtained from protein-protein interactions. In this type of network, each node corresponds to a protein, and the directed edges represent the interactions. A model of regulatory networks has been described by Kuo and Banzhaf [27]. A pioneering approach in this area is the work of Jeong et al. [28], which considered protein-protein interaction networks of S. cerevisiae containing thousands of edges and nodes. The degree distribution was interpreted as following scale-free behavior with an approximate exponent of 2.5. One of the most important conclusions of that investigation was that the removal of the most-connected proteins (i.e., hubs, the nodes of a complex network receiving a large number of connections) can have disastrous effects on the proper functioning of the individual. The issue of protein-protein interaction networks has also been considered in a number of other works, including those of Qin et al. [29], Wagner [30], and Pastor-Satorras et al. [31], which study the properties and evolution of such networks. Another related work, described by Wuchty [32], considered graphs obtained by assigning a node to every protein domain (or module) and an edge whenever two such domains are found in the same protein. The important problem of determining protein function has been addressed from the perspective of networks of physical interaction by Vazquez et al. [33].
Their method is based on the minimization of the number of pairs of interacting proteins assigned to different categories, so that the function estimation can be performed on a global scale while considering the entire connectivity of the protein network. The obtained results corroborate the validity of using protein-protein interaction networks as a means of inferring protein function, despite the unavoidable presence of imperfections and the incompleteness of protein networks. The analysis of gene-expression networks in terms of embedded complex logistic maps (ECLM), a hybrid method blending some concepts from wavelets and coupled logistic maps, has been reported by Shaw [34]. That study considered 112 genes collected at nine different time instants over 25 days, with each time point being fitted to an ECLM model with a high Pearson correlation coefficient, and the connections between genes were determined by considering models with high pairwise correlation. The obtained connections were interpreted as following scale-free behavior in both topology and dynamics. A work by Bumble et al. [35] suggests that the study of pathways of network syntheses of genes, metabolism, and proteins should be extended to the investigation of the causes and treatment of diseases. Their approach involves methods capable of yielding, for a specific set of candidate reactions, a complete metabolic-pathway network. Interesting results are obtained by investigating qualitative attributes, including relationships regarding the connectivity between vertices and the strength of connections, the relationship of interaction energies and chemical potentials with the coordination
number of the lattice models, and how the stability of the networks is related to their topology.

An interesting approach to analyzing the amino acid sequences of a protein in terms of successively overlapping strings of length K has been described by Hao et al. [36]. The strings of amino acids are represented as graphs by associating each possible subsequence of length K with a graph node and having the edges represent the observed successive transitions between subsequences. Their investigation targeted the reconstruction of the original sequences from the overlapping string networks, which can be approached by counting the number of Eulerian loops (i.e., cyclic sequences of connected edges that are followed without repetition). More specifically, the sequences are reconstructed while starting with the same initial subsequence, using each of the subsequences the same number of times as observed in the original data, and respecting a fixed sequence length. It was thereby verified that the reconstruction is unique for K ≥ 5 for the majority of the considered networks (PDB.SEQ database [37]).

FIGURE 10.2 The grouping scheme considered in this work, including two successive windows of size m and n, with overlap of g elements.

The present work addresses co-occurrence strings of amino acids (or any other basic biological element) in a way similar to the scheme described in the previous paragraph, but here the subsequences do not necessarily overlap, and the number of times a subsequence is followed by another is represented by the weight of the respective edge in the associated graph, following the same scheme used for concept association as described in the literature [38, 39]. More specifically, whenever a subsequence of amino acids B is followed by another subsequence C, the weight of the edge connecting the two nodes representing those subsequences is increased by 1. Therefore, such a weighted, directed graph provides information about the number of times a specific subsequence is followed by each other possible subsequence, which can be related to the statistical concept of correlation, with the difference that the order of the data is, unlike in the correlation, taken into account. As such, the obtained graph can be explored to characterize and model sequences of amino acids according to varying subsequence sizes. Moreover, by thresholding the weight matrix at successive threshold values, it is possible to identify subgraphs of the network corresponding to a strongly connected kernel of subsequences.
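The kernel idea above can be sketched as follows (an illustration under our own conventions, not the chapter's code): threshold the weight matrix and collect the connected components, ignoring edge direction, that survive.

```python
# Sketch: extract the "kernels" of a weighted digraph W by keeping only
# edges with weight >= T and returning the connected components
# (direction is ignored when grouping nodes; W[i][j] = weight of i -> j).

def kernels(W, T):
    n = len(W)
    adj = [set() for _ in range(n)]       # undirected neighbor lists
    for i in range(n):
        for j in range(n):
            if i != j and (W[i][j] >= T or W[j][i] >= T):
                adj[i].add(j)
                adj[j].add(i)
    seen, comps = set(), []
    for s in range(n):
        if s in seen or not adj[s]:       # skip isolated nodes
            continue
        stack, comp = [s], set()          # depth-first traversal
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(sorted(comp))
    return comps

W = [[0, 9, 0, 0],
     [0, 0, 8, 0],
     [0, 0, 0, 1],
     [2, 0, 0, 0]]
print(kernels(W, T=5))  # only the strongly weighted core {0, 1, 2} survives
```

Lowering T lets weaker edges back in, so the kernels grow and merge, which is exactly the threshold-based family of graphs exploited later in the chapter.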
10.4 SEQUENCES OF AMINO ACIDS AS WEIGHTED, DIRECTED COMPLEX NETWORKS

A protein can be specified in terms of its respective sequence of amino acids, represented by the string S = A1 A2 … AN, where each element Ai corresponds to one of the 20 possible amino acids, as indicated in Table 10.1. It is possible to subsume an amino acid sequence S by grouping subsequences of amino acids into new numerical codes with higher values, in a way similar to that described by Hao et al. [36]. The grouping scheme adopted in this work is illustrated in Figure 10.2, where the first and second groups contain m and n amino acids, respectively. While it is possible to consider m ≠ n, we henceforth adopt m = n. The groups are taken with an overlap of g positions, with 0 ≤ g ≤ m. For each reference position i, we have two numerical codes B and C, obtained as follows:

B = (Ai − 1)20^(m−1) + ··· + (Ai+m−2 − 1)20 + Ai+m−1   (10.1)

C = (Ai+m−g − 1)20^(n−1) + ··· + (Ai+m+n−g−2 − 1)20 + Ai+m+n−g−1   (10.2)

Therefore, we have that 1 ≤ B ≤ 20^m and 1 ≤ C ≤ 20^n.
TABLE 10.1 Amino Acids and Respective Numerical Codes

Abbreviation   Numerical Code
A              1
R              2
D              3
N              4
C              5
E              6
Q              7
G              8
H              9
I              10
L              11
K              12
M              13
F              14
P              15
S              16
T              17
W              18
Y              19
V              20
An example of this coding scheme is given in the following. Let the original protein sequence in abbreviated amino acids be S = MEQWPLLFVVALCI or, in numerical codes, S = (13)(6)(7)(18)(15)(11)(11)(14)(20)(20)(1)(11)(5)(10). For m = n = 2 and g = 0, we have:

i     B     C
1     246   138
2     107   355
3     138   291
4     355   211
5     291   214
6     211   280
7     214   400
8     280   381
9     400   11
10    381   205
11    11    90
Similarly, for m = n = 3 and g = 1, we obtain:

i     B      C
1     4907   2755
2     2138   7091
3     2755   5811
4     7091   4214
5     5811   4280
6     4214   5600
7     4280   7981
8     5600   205
9     7981   4090
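Equations (10.1) and (10.2) can be sketched in a few lines of code (the function names are ours, not the chapter's; the sequence is assumed already translated to numerical codes 1..20 via Table 10.1). Note that each (A − 1) acts as a base-20 digit, with 1 added at the end so that the codes run from 1 to 20^m.

```python
# Sketch of the subsequence coding of equations (10.1) and (10.2).

def window_code(window):
    """Map a window (A_1, ..., A_k) of codes 1..20 to a value in 1 .. 20**k."""
    v = 0
    for a in window:
        v = v * 20 + (a - 1)     # the (A - 1) terms are base-20 digits
    return v + 1

def code_pairs(seq, m, n, g):
    """Yield (B, C) for each reference position; windows overlap by g."""
    pairs = []
    i = 0
    while i + m + n - g <= len(seq):          # both windows must fit
        B = window_code(seq[i:i + m])
        C = window_code(seq[i + m - g:i + m - g + n])
        pairs.append((B, C))
        i += 1
    return pairs

# The chapter's example: S = MEQWPLLFVVALCI in numerical codes
S = [13, 6, 7, 18, 15, 11, 11, 14, 20, 20, 1, 11, 5, 10]
print(code_pairs(S, m=2, n=2, g=0)[0])   # (246, 138), as in the first table
```

For m = n = 2 and g = 0 this reproduces the eleven (B, C) pairs tabulated above.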
Observe that the different ranges of i obtained in these two examples are a direct consequence of the fact that the larger size of the subsequences in the second example reduces the number of possible subsequence associations. Now, having defined the grouping scheme and the resulting sequences B and C, the graph representing the subsequent (with possible overlap) co-occurrences of numerical codes in these sequences is obtained as follows:

1. Each code in the sequences B and C is represented as one of the N nodes of the graph, whose number corresponds to the code produced for the respective subsequence. For instance, the sequence (13)(6) implies a graph with two nodes, identified as 13 and 6, containing a directed edge from node 13 to node 6. Therefore, for a given m = n, we have a maximum of 20^m nodes, numbered from 1 to 20^m. Observe, however, that the resulting network does not necessarily include all possible nodes, allowing a reduction of the network size.

2. Every time a code B is followed by a code C, the weight of the edge connecting node B to node C is incremented by 1. In other words, the weight of the edge uniting two specific sequences B and C is equal to the number of times those two sequences are found to follow one another, in that same order, along the analyzed sequence of amino acids.

Figure 10.3 illustrates the graph obtained from the sequence (13)(6)(7)(18)(15)(11)(11)(14)(20)(20)(1)(11)(5)(10)(15)(11)(14) considering m = 1, where each node is represented by the respective code, and the edge weights (shown in italics) represent the number of successive subsequence (in this case, single amino acid) transitions. In this sense, the obtained graph represents the “unidirectional” correlations between two subsequent (with possible overlap) subsequences of amino acids in the analyzed protein. Such a network can be understood as a statistical model of the original protein for the specific correlation length implied by m and g.
As such, it is possible to obtain simulated sequences of amino acids following such statistical models by performing Monte Carlo simulation over the outgoing edges of each node, in the sense that each outgoing edge is taken with a frequency corresponding to its respective normalized weight (i.e., the normalized weights of the outgoing edges of a node must add up to 1). Therefore, the transition probabilities are proportional to the respective weights. Observe that the statistically normalized weight matrix of the network corresponds to a Markov chain, as the sum of any of its columns will be equal to 1.
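Steps 1 and 2 above, together with the Markov normalization, can be sketched as follows (names and sparse dictionary representation are ours; the chapter uses a dense weight matrix):

```python
from collections import defaultdict

# Sketch: accumulate edge weights for successive (B, C) code pairs,
# then normalize each node's outgoing weights into transition
# probabilities (the Markov chain mentioned in the text).

def build_network(pairs):
    w = defaultdict(int)
    for b, c in pairs:
        w[(b, c)] += 1           # step 2: one more B -> C transition
    return dict(w)

def transition_probabilities(w):
    out = defaultdict(int)       # total outgoing weight per node
    for (b, _), weight in w.items():
        out[b] += weight
    return {(b, c): weight / out[b] for (b, c), weight in w.items()}

# The m = 1 sequence of Figure 10.3; pairs are successive amino acids
seq = [13, 6, 7, 18, 15, 11, 11, 14, 20, 20, 1, 11, 5, 10, 15, 11, 14]
pairs = list(zip(seq, seq[1:]))
w = build_network(pairs)
p = transition_probabilities(w)
print(w[(15, 11)])   # 2: the transition 15 -> 11 occurs twice
print(p[(11, 14)])   # 0.5: of node 11's four outgoing transitions, two go to 14
```

Sampling the next code according to p, starting from any node, yields the Monte Carlo sequence generation described above.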
FIGURE 10.3 The network obtained for m = 1 for the amino acid sequence (13)(6)(7)(18)(15)(11)(11)(14)(20)(20)(1)(11)(5)(10)(15)(11)(14). The weights of the edges are shown in italics.

By thresholding the weight matrix for successive values of T (see Section 10.2.2.1), it is possible to obtain a family of graphs that can be understood as follows. The clusters defined for the highest values of T represent the kernels of the whole weighted network, corresponding to the subsequence associations that are most representative and most frequent along the whole protein. As the threshold is lowered, these kernels are augmented by the incorporation of new nodes and the merging of existing clusters. Such a threshold-based evolution of the graph can be related to the evolutionary history of the protein formation, in the sense that the kernels would have appeared first and served as organizing structures around which the rest of the molecule evolved. At the same time, the strongest connections in the obtained network also reflect the repetition of basic protein motifs, such as alpha helices and beta sheets.

10.5 RESULTS

In the following investigations, we consider proteins from three animal species: zebra fish, Xenopus (frog), and rat. The gene sequencing data were obtained from the NIH Gene Collection repository (http://zgc.nci.nih.gov/, files dr_mgc_cds_aa.fasta, xl_mgc_cds_aa.fasta, and rn_mgc_cds_aa.fasta). The raw data consisted of sequences of amino acids for the 2948, 1977, and 640 proteins (each containing, on average, about 400 amino acids) in each of those files. The obtained results, which consider m = n = 2 and g = 0, are presented for each species in the following subsections. The average node degree was obtained by adding all columns of the adjacency matrix. The clustering coefficient was obtained by identifying the n nodes connected to each
node and dividing the number of existing edges between those nodes by n(n−1)/2, i.e., the maximum number of edges between those nodes. The minimum distances were calculated by using Dijkstra’s method [14].

10.5.1 ZEBRA FISH

The obtained 400×400 weight matrix (recall from the previous section that 400 = 20^m = 20^2) had a maximum value of 487, obtained for the transition from SS to SS, and a minimum value of zero, obtained for 15,274 transitions. The maximum weight for a transition between different nodes was 170, observed for the transition from EE to ED. The performed measurements included the average node degree (Figure 10.4(a)), clustering coefficient (Figure 10.5(a)), average length (Figure 10.6(a)), and maximum cluster size (Figure 10.7(a)) for the series of thresholded matrices KT (solid lines) and QT (dashed lines) obtained for T = 1, 2, …, 170. We also calculated the indegree and outdegree densities, which are shown in Figure 10.8(a) and Figure 10.8(b), respectively, for T = 0. It is clear from this figure that both node degrees tend to be similar to one another, presenting a plateau for 6
FIGURE 10.4 The average node degree as a function of the weight threshold T (solid line =KT, dashed line=QT) for (a) zebra-fish data, (b) Xenopus, and (c) rat.
10.5.3 RAT

The weight matrix had a maximum value of 98, obtained for the transition from LL to LL, and a minimum value of zero, obtained for 69,792 transitions. Such a large number of null transitions is a consequence of the smaller number of proteins available for this animal in the original data. The maximum weight for a transition between different nodes was 35, observed for the transition from LL to AA. The performed measurements included the average node degree (Figure 10.4(c)), clustering coefficient (Figure 10.5(c)), average length (Figure 10.6(c)), and maximum cluster size (Figure 10.7(c)) for the series of thresholded matrices KT (solid lines) and QT (dashed lines) obtained for T = 1, 2, …, 35.
The indegree and outdegree densities are shown in Figure 10.12(a) and Figure 10.12(b), respectively, for T = 0. Both of the resulting node degrees were again similar to one another, presenting a plateau for 4 < log(k) < 6 followed by a moderate decrease of node degree. The self-connections between nodes representing subsequences of two identical amino acids are given in Table 10.2. The initial kernel was also identified for T = 22, with the obtained digraph shown in Figure 10.13. The dominant amino acids were L and A.

10.6 DISCUSSION

Despite the different numbers of proteins and overall amino acid sequence lengths available for each of the three species, the clustering coefficient, average length, and maximum cluster size are determined from the respective adjacency matrices (not the weights), and therefore they are more significant statistically, so that we can attempt a comparison between such measurements in the case of zebra fish and Xenopus. It is clear from Figure 10.4 that, as expected, the average node degree of the graph KT decreases monotonically with the threshold value T, while the opposite happens
for QT. The abrupt way in which the average node degree varies for the thresholded and complementary matrices suggests that a kind of phase transition (critical phenomenon) takes place as the value of T is increased. As shown in Figure 10.5, the average clustering coefficient for KT tends to decrease steadily with the threshold value, undergoing a relatively abrupt transition (near T = 20 for zebra fish), while the clustering coefficient of QT increases even more abruptly near T = 10, suggesting a phase transition also for this measurement. Generally, the local connectivity reaches less than 10% of its maximum value after just one-third of the considered T excursion, which suggests that the network connectivity is dominated by stronger connections surrounded by much smaller connection weights. The average lengths of KT shown in Figure 10.6 suffer from the typical problem that such distances tend to fall as a consequence of the disappearance of connections. In other words, because nonexistent edges are not considered in the average-length calculation, a network containing no connections has null average length, less than that of a fully connected network, for which the average length would be 1 (overlooking self-connections). In any case, the average length presents a sharp discontinuity (near T = 80 for zebra fish, 60 for Xenopus, and 20 for rat), possibly indicating that a large number of edges are cut by thresholds larger than these values. At the same time, the maximum average lengths in each case are similar and relatively small. An abrupt increase of the average length is observed for QT for small values of T, indicating that that matrix indeed undergoes an abrupt change in its connectivity for small threshold values. The graphs in Figure 10.7 show that the maximum cluster size for KT decreases steadily for higher threshold values, as expected. The maximum cluster size for QT remained fixed at 400, confirming that the complementary matrix is highly connected.
As indicated in Figure 10.8, Figure 10.10, and Figure 10.12, the node degree densities tend to present two distinct regions: a plateau portion at the left-hand side, followed by an abrupt descending portion at the right-hand side of the graph. While the indegree and outdegree densities produced similar profiles for the three species, the respective kernels, identified at different threshold levels (because of the different lengths of the amino acid sequences), were found to be rather different, with distinct pairs of amino acids dominating each kernel. While such a result may be strongly affected by the different amounts of data available for each of the considered species, it may also suggest different fundamental structures for the amino acid sequencing in those animals.
FIGURE 10.5 Average clustering coefficient as a function of the weight threshold T (solid line = KT, dashed line = QT) for (a) zebra fish, (b) Xenopus, and (c) rat data.
10.7 CONCLUDING REMARKS AND FUTURE WORK This chapter has addressed the promising perspective of using modern complex-network concepts and tools as a means of characterizing, modeling, and analyzing biological sequences, with special attention given to amino acid sequences in proteins. After presenting a brief historic perspective of complex-network research and some of its most representative applications to bioinformatics, the basic concepts of complex networks and respective topological measurements were presented. The problem of characterizing proteins in terms of weighted digraphs obtained from consecutive (with possible overlap) subsequences of amino acids was addressed
FIGURE 10.6 Average length as a function of the weight threshold T (solid line = KT, dashed line = QT) for (a) zebra fish, (b) Xenopus, and (c) rat data.
next, with respect to protein data of zebra fish, Xenopus, and rat. This investigation included the calculation of the average node degree, the average clustering coefficient, the average length (in number of edges), and the size of the maximum cluster in the graph for a sequence of threshold values. The obtained curves were found to provide interesting insights into the structure of the overall protein data, especially regarding the appearance of critical transitions in several of the considered measurements as T was increased. In addition, kernels were identified for each case, suggesting an interesting basic organization in the amino acid sequences. Despite
TABLE 10.2 Self-Connections of Subsequences Composed of Two Identical Amino Acids (Number of Self-Connections per Species)

Subsequence   Zebra fish   Xenopus   Rat
AA            274          126       27
RR            85           41        10
DD            216          186       11
NN            23           13        2
CC            14           3         3
EE            467          293       49
QQ            216          95        11
GG            310          79        21
HH            67           48        0
II            8            8         2
LL            161          126       98
KK            188          104       29
MM            0            4         0
FF            4            9         0
PP            299          176       71
SS            487          233       61
TT            52           16        6
WW            0            0         0
YY            6            2         0
VV            13           19        3
FIGURE 10.7 Maximum cluster size of KT as a function of the weight threshold T for (a) zebra fish, (b) Xenopus, and (c) rat data.

the different sizes of the amino acid sequences, which do imply problems of statistical meaningfulness, some interesting trends have been identified regarding the comparison of the measurements obtained for the three different species, especially the general
similarity between the topological properties for each species, while completely different kernels and dominant amino acids have been identified for those cases. Future extensions of this work include the consideration of other m, n, and g configurations; the use of additional structural features, such as betweenness centrality, as well as the ratios suggested in the literature [40, 41]; and the identification of the hierarchical backbone of the directed network, as suggested in the literature [39].
It would also be possible to consider the progressive merging of nodes and connected components into the initial kernel to obtain the hierarchical structure underlying the growth of the kernel, with possible applications to the complex problem of protein folding [42]. Finally, it would be interesting to use such measurements to compare proteins (in terms of amino acids and bases) from the same or distinct individuals, as well as to infer the phylogenetic evolution of the proteins. In the case of DNA analysis, the obtained topological measurements can provide a means for distinguishing between coding and noncoding regions.

ACKNOWLEDGMENTS

The author is grateful to Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP, proc. 99/12765-2), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, proc. 308231/03-1), and the Human Frontier Science Program for financial support.
FIGURE 10.8 Log-log plot of the (a) in-degree and (b) out-degree distributions for T=0, weighted by the intensity of the edges (zebra-fish data).
FIGURE 10.9 The ten-node kernel obtained for T=95 for zebra-fish data. The weights are represented in terms of the edge widths. The maximum and minimum weights are 170 and zero, the latter corresponding to self-connections, as these have been excluded from the matrix used to obtain this picture.
FIGURE 10.10 Log-log plot of the (a) in-degree and (b) out-degree distributions for T=0, weighted by the intensity of the edges (Xenopus data).
FIGURE 10.11 The nine-node kernel obtained for T=64 for Xenopus data. The weights are represented in terms of the edge widths.
FIGURE 10.12 Log-log plot of the (a) in-degree and (b) out-degree distributions for T=0, weighted by the intensity of the edges (rat data).
FIGURE 10.13 The ten-node kernel obtained for T=2 for the rat data. The weights are represented in terms of the edge widths.

REFERENCES

1. Baldi, P. and Brunak, S., Bioinformatics, MIT Press, Cambridge, MA, 2001.
2. da F. Costa, L., Signal processing in bioinformatics, IEEE Proc. Digital Signal Process. Conf., New Jersey, 2002, pp. 23–27.
3. Durbin, R., Eddy, S., Krogh, A., and Mitchison, G., Biological Sequence Analysis, Cambridge University Press, Cambridge, U.K., 1998.
4. Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K., and Watson, J.D., Molecular Biology of the Cell, 3rd ed., Garland Publishing, New York, 1994.
5. Garrett, R.H. and Grisham, C.M., Biochemistry, Saunders College Publishing, Fort Worth, TX, 1995.
6. Rapoport, A., Contribution to the theory of random and biased nets, Bull. Math. Biophys., 19, 257–277, 1957.
7. Erdös, P. and Rényi, A., On the evolution of random graphs, Publicationes Mathematicae, 6, 290–297, 1959.
8. Watts, D.J. and Strogatz, S.H., Collective dynamics of small-world networks, Nature, 393, 440–442, 1998.
9. Albert, R., Jeong, H., and Barabasi, A.-L., The diameter of the world-wide web, Nature, 401, 130–131, 1999.
10. Albert, R. and Barabasi, A.-L., Statistical mechanics of complex networks, Rev. Mod. Phys., 74, 47–97, 2002.
11. Dorogovtsev, S.N. and Mendes, J.F.F., Evolution of networks, Adv. Phys., 51, 1079–1187, 2002.
12. Newman, M.E.J., The structure and function of complex networks, SIAM Rev., 45, 167–256, 2003.
13. da F. Costa, L. and Cesar, R.M., Jr., Shape Analysis and Classification: Theory and Practice, CRC Press, Boca Raton, FL, 2001.
14. Aldous, J.M. and Wilson, R.J., Graphs and Applications: An Introductory Approach, Springer-Verlag, London, 2000.
15. West, D.B., Introduction to Graph Theory, Prentice Hall, Upper Saddle River, NJ, 2001.
16. Harary, F., Graph Theory, Addison-Wesley, Reading, MA, 1995.
17. Bollobas, B., Modern Graph Theory, Springer-Verlag, Heidelberg, 1998.
18. Holme, P., Huss, M., and Jeong, H., Subnetwork hierarchies of biochemical pathways, Bioinformatics, 19, 532–538, 2003.
19. Alon, N. and Spencer, J.H., The Probabilistic Method, Wiley Interscience, New York, 2000.
20. Bollobas, B., Random Graphs, Cambridge University Press, Cambridge, U.K., 2001.
21. Bose, I., Biological networks, available online at http://arxiv.org/abs/cond-mat/0202192, last accessed March 2005.
22. Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N., and Barabasi, A.-L., The large-scale organization of metabolic networks, Nature, 407, 651–654, 2000.
23. Wagner, A. and Fell, D.A., The small world inside large metabolic networks, Proc. R. Soc. London B, 268, 1803–1810, 2001.
24. Chung, F., Lu, L., Dewey, T.G., and Galas, D.J., Duplication models for biological networks, Journal of Computational Biology, 10(5), 677–688, 2003.
25. Aiello, W., Chung, F., and Lu, L., in Proc. 32nd Annu. ACM Symp. Theory Computing, 171–180, 2000.
26. Girvan, M. and Newman, M.E.J., Community structure in social and biological networks, Proc. Nat. Acad. Sci. USA, 99, 7821–7826, 2002.
27. Kuo, P.D. and Banzhaf, W., Small world and scale-free network topologies in an artificial regulatory network model, Journal of Biological Physics and Chemistry, 4, 85–92, 2004.
28. Jeong, H., Mason, S.P., Barabasi, A.-L., and Oltvai, Z.N., Lethality and centrality in protein networks, Nature, 411, 41–42, 2001.
29. Qin, H., Lu, H.S.S., Wu, W.B., and Li, W.H., Evolution of the yeast protein interaction network, Proc. Nat. Acad. Sci., 100, 12820–12824, 2003.
30. Wagner, A., How large protein interaction networks evolve, Proceedings of the Royal Society of London Series B, 270, 457–466, 2003.
31. Pastor-Satorras, R., Smith, E., and Solé, R.V., Evolving protein interaction networks through gene duplication, J. Theor. Biol., 222, 199–210, 2003.
32. Wuchty, S., Scale-free behavior in protein domain networks, Mol. Biol. Evol., 18, 1697–1702, 2001.
33. Vazquez, A., Flammini, A., Maritan, A., and Vespignani, A., Global protein function prediction in protein-protein interaction networks, Nat. Biotech., 21, 697–700, 2003.
34. Shaw, S., Evidence of scale-free topology and dynamics in gene regulatory networks, in Proc. ISCA 12th International Conference on Intelligent and Adaptive Systems and Software Engineering, 37–40, 2003 (ISBN 1-880843-47-1).
35. Bumble, S., Friedler, F., and Fan, L.T., A toy model for comparative phenomenon in molecular biology and the utilization of biochemical applications of PNS in genetic applications, available online at http://arxiv.org/abs/cond-mat/0304348.
36. Hao, B., Xie, H., and Zhang, S., Compositional representation of protein sequence and the number of Eulerian loops, available online at http://arxiv.org/abs/physics/0103028.
37. PDB.SEQ, A Collection of SWISS-PROT Entries, available online at http://www.expasy.org/sprot.
38. da F. Costa, L., What's in a name?, International Journal of Modern Physics C, 15, 371–379, 2004.
39. da F. Costa, L., The hierarchical backbone of complex networks, Physical Review Letters, 93(9), paper 98702, 4p., 2004.
40. da F. Costa, L., L-percolations of complex networks, Physical Review E, 70, paper 056106, 8p., 2004.
41. da F. Costa, L., Reinforcing the resilience of complex networks, Physical Review E, 69, paper 066127, 7p., 2004.
42. Crescenzi, P., Goldman, D., Papadimitriou, C., Piccolboni, A., and Yannakakis, M., On the complexity of protein folding, in Annu. Conf. Res. Computational Molecular Biol., ACM, New York, 1998, pp. 61–62.
11 Estimation of Human Cortical Connectivity with Multimodal Integration of fMRI and High-Resolution EEG

Laura Astolfi, Febo Cincotti, Donatella Mattia, Serenella Salinari, and Fabio Babiloni

11.1 INTRODUCTION

Human neocortical processes involve temporal and spatial scales spanning several orders of magnitude, from the rapidly shifting somatosensory processes, characterized by a temporal scale of milliseconds and a spatial scale of a few square millimeters, to memory processes, involving time periods of seconds and a spatial scale of square centimeters. Information about brain activity can be obtained by measuring different physical variables arising from brain processes, such as the increase in consumption of oxygen by the neural tissues or a variation of the electric potential over the scalp surface. All these variables are connected in a direct or indirect way to the ongoing neural processes, and each variable has its own spatial and temporal resolution. The different neuroimaging techniques are thus confined to the spatio-temporal resolution offered by the monitored variables. For instance, it is known from physiology that the temporal resolution of the hemodynamic deoxyhemoglobin increase/decrease lies in the range of 1 to 2 sec, while its spatial resolution is generally observable with current imaging techniques at the scale of a few millimeters. Today, no neuroimaging method offers both a spatial resolution on a millimeter scale and a temporal resolution on a millisecond scale. Hence, it is of interest to study the possibility of integrating the information offered by the different physiological variables in a unique mathematical context. This operation is called the "multimodal integration" of variables X and Y, where the X variable typically has a particularly appealing spatial resolution (millimeter scale), and the Y variable has particularly attractive temporal properties (millisecond scale).
Nevertheless, the issue of several temporal and spatial domains is critical in the study of brain function, because different properties can become observable depending on the spatio-temporal scales at which the brain processes are measured. Electroencephalography (EEG) and magnetoencephalography (MEG) are two interesting techniques that present a high temporal resolution, on the millisecond scale, adequate to follow brain activity. However, both techniques have a relatively modest spatial resolution, beyond the centimeter scale. Spatial resolution for these techniques is fundamentally limited by the intersensor distances and by the fundamental laws of electromagnetism [1]. On the other hand, the use of a priori information from other neuroimaging techniques with high spatial resolution, like functional magnetic resonance imaging (fMRI), can improve the localization of sources from EEG/MEG data.
Estimation of human cortical connectivity
427
The initial part of this chapter then deals with the multimodal integration of electrical, magnetic, and hemodynamic data to locate neural sources responsible for the recorded EEG/MEG activity. The rationale of the multimodal approach based on fMRI, MEG, and EEG data to locate brain activity is that neural activity generating EEG potentials or MEG fields increases glucose and oxygen demands [2]. This results in an increase in the local hemodynamic response that can be measured by fMRI [3, 4]. On the whole, such a correlation between electrical and hemodynamic concomitants provides the basis for a spatial correspondence between fMRI responses and EEG/MEG source activity. However, static images of brain regions activated during particular tasks do not convey the information of how these regions communicate with each other. The concept of brain connectivity is viewed as central for the understanding of the organized behavior of cortical regions beyond the simple mapping of their activity [5, 6]. This organization is thought to be based on the interaction between different and differently specialized cortical sites. Cortical-connectivity estimation aims at describing these interactions as connectivity patterns that hold the direction and strength of the information flow between cortical areas. To achieve this, several methods have already been applied on data gathered from both hemodynamic and electromagnetic techniques [7–11]. Two main definitions of brain connectivity have been proposed over the years: functional and effective connectivity [12]. While functional connectivity is defined as temporal correlation between spatially remote neurophysiologic events, the effective connectivity is defined as the simplest brain circuit that would produce the same temporal relationship as observed experimentally between cortical sites. 
As for functional connectivity, the various computational methods proposed to estimate how different brain areas work together typically involve the estimation of covariance properties between the time series measured at the different spatial sites during motor and cognitive tasks studied by EEG and fMRI techniques [13–16]. In contrast, structural equation modeling (SEM) is a different technique that has been used for a decade to assess effective connectivity between cortical areas in humans by using hemodynamic and metabolic measurements [7, 17–19]. The basic idea of SEM differs from the usual statistical approach of modeling individual observations, because SEM considers the covariance structure of the data [17]. However, the estimation of cortical effective connectivity obtained by applying the SEM technique to fMRI data has a low temporal resolution (on the order of 10 sec), which is far from the time scale at which the brain normally operates. Hence, it becomes of interest to understand whether the SEM technique could be applied to cortical activity estimated by applying linear-inverse techniques to high-resolution EEG (HREEG) data [20–23]. In this way, it would be possible to study time-varying patterns of brain connectivity linked to the different parts of the experimental task studied. So far, the estimation of functional connectivity on EEG signals has been addressed by applying either linear or nonlinear methods, both of which can track the direct flow of information between scalp electrodes in the time domain, although with different computational demands [21, 24–31]. In addition, given the evidence that important information in the EEG signals is often coded in the frequency rather than the time domain (reviewed in [32]), research attention has focused on detecting frequency-specific interactions in EEG or MEG signals by analyzing the coherence between the activity of pairs of structures [33–35].
However, coherence analysis does not have a directional
nature (i.e., it just examines whether a link exists between two neural structures by describing instances when they are in synchronous activity), and it does not directly provide the direction of the information flow. In this respect, a multivariate spectral technique called the directed transfer function (DTF) was proposed [36] to determine the directional influences between any given pair of channels in a multivariate data set. This estimator can simultaneously characterize both the directional and spectral properties of the brain signals, requiring only one multivariate autoregressive (MVAR) model estimated from all of the EEG channel recordings. The DTF technique has recently been demonstrated [37] to rely on the key concept of Granger causality between time series [38], according to which an observed time series x(n) causes another series y(n) if knowledge of x(n)'s past significantly improves the prediction of y(n). This relation between time series is not reciprocal, i.e., x(n) may cause y(n) without y(n) necessarily causing x(n). This lack of reciprocity is what allows the evaluation of the direction of information flow between structures. In this study, we propose to estimate the patterns of cortical connectivity by exploiting the SEM and DTF techniques applied to high-resolution EEG signals, which exhibit a higher spatial resolution than conventional cerebral electromagnetic measures. Indeed, this EEG technique includes the use of a large number of scalp electrodes, realistic models of the head derived from structural magnetic resonance images (MRIs), and advanced processing methodologies related to the solution of the linear-inverse problem. These methodologies facilitate the estimation of cortical current density from sensor measurements [39–41].
To pursue the aim of this study, we first explored the behavior of the SEM and DTF methods in a simulation context under various conditions that affect the EEG recordings, mainly the signal-to-noise ratio (factor SNR) and the length of the recordings (factor LENGTH). In particular, the following questions were addressed: What is the influence of a variable SNR level (imposed on the high-resolution EEG data) on the accuracy of the connectivity-pattern estimation obtained by SEM and DTF? What amount of high-resolution EEG data is needed to estimate the connectivity between cortical areas accurately? To answer these questions, a simulation study was performed on the basis of a predefined connectivity scheme that linked several modeled cortical areas. Cortical connections between these areas were retrieved by the estimation process under different experimental SNR and LENGTH conditions. Indexes of the errors in the estimation of the connection strength were defined, and statistical multivariate analyses were performed by ANOVA (analysis of variance) and Duncan post hoc tests, with these error indexes as dependent variables. Subsequently, both the SEM and DTF methods were applied to the cortical estimates obtained from high-resolution EEG data related to a simple finger-tapping experiment in humans to underline the capability of the proposed methodology to draw patterns of cortical connectivity between brain areas during a simple motor task. Finally, we also present both the mathematical principles and the practical applications of the multimodal integration of high-resolution EEG and fMRI for the localization of sources responsible for intentional movements.
11.2 METHODS

11.2.1 MONITORING THE CEREBRAL HEMODYNAMIC RESPONSE BY fMRI

A brain-imaging method, known as functional magnetic resonance imaging (fMRI), has gained favor among neuroscientists over the last few years. Functional MRI reflects oxygen consumption, and because oxygen consumption is tied to processing or neural activation, it can give a map of functional activity. When neurons fire, they consume oxygen, and this causes the local oxygen levels to decrease briefly and then actually increase above the resting level as nearby capillaries dilate to let more oxygenated blood flow into the active area. The most commonly used acquisition paradigm is the so-called blood oxygenation level dependent (BOLD) paradigm, in which the fMRI scanner works by imaging blood oxygenation. The BOLD paradigm relies on brain mechanisms that overcompensate for oxygen usage (activation causes an influx of oxygenated blood in excess of that used, and therefore the local oxyhemoglobin concentration increases). Oxygen is carried to the brain in the hemoglobin molecules of red blood cells. Figure 11.1 shows the physiologic principle at the base of the generation of fMRI signals. This figure shows how the hemodynamic responses elicited by increased neuronal activity (Figure 11.1(a)) reduce the deoxyhemoglobin content of the blood flow in the same neuronal district after a few seconds (Figure 11.1(b)). The magnetic properties of hemoglobin when saturated with oxygen are different than when it has given up its oxygen. Technically, deoxygenated hemoglobin is "paramagnetic" and therefore has a short relaxation time. As the ratio of oxygenated to deoxygenated hemoglobin increases, so does the signal recorded by the MRI. Deoxyhemoglobin increases the rate of dephasing of the hydrogen nuclei creating the MR signal, thus decreasing the intensity of the T2 image. The bottom line is that image intensity increases with increasing brain activation.
The problem is that at the standard intensity used for the static magnetic field (1.5 Tesla), this increase is small (usually less than 2%) and easily obscured by noise and various artifacts. By increasing the static field of the fMRI scanner, the signal-to-noise ratio increases to more convenient values. Static-field values of 3 Tesla are now commonly used for research on humans, while an fMRI scanner at 7 Tesla was recently employed to map hemodynamic responses in the human brain [42]. At such a high field value, there is a possibility of detecting the initial increase of deoxyhemoglobin (the initial "dip"). The interest in the detection of the dip is based on the fact that this hemodynamic response happens on a time scale of 500 msec (as revealed by hemodynamic optical measures [43]), compared with the 1 to 2 sec needed for the response of the vascular system to the oxygen demand. Furthermore, in the latter case, the response has a temporal extension well beyond the activation that has occurred (10 sec).
FIGURE 11.1 (Color figure follows p. 274.) Physiologic principle at the base of the generation of fMRI signals. (a) Neurons increase their firing rates, which increases oxygen consumption. (b) The hemodynamic response, on a time scale of seconds, increases the diameter of the vessels close to the activated neurons. The induced increase in blood flow overcomes the need for oxygen supply. As a consequence, the percentage of deoxyhemoglobin in the blood flow decreases in the vessel with respect to (a).

As a last point, the spatial distribution of the initial dip (as described by using the optical dyes [43]) is sharper than that related to the vascular response of the oxygenated hemoglobin. Recently, with high-field-strength MR scanners at 7 or even 9.4 Tesla (on animals), a resolution down to the cortical-column level has been achieved [44]. However, at the standard field intensities commonly used in fMRI studies (1.5 or 3 Tesla), the identification of such an initial transient increase of deoxyhemoglobin is controversial. Compared with positron emission tomography (PET) or single-photon emission computed tomography (SPECT), fMRI does not require the injection of radio-labeled substances, and its images have a higher resolution (as reviewed in the literature [45]). PET, however, is still the most informative technique for directly imaging metabolic processes and neurotransmitter turnover.
11.2.2 STRUCTURAL EQUATION MODELING

In structural equation modeling (SEM), the parameters are estimated by minimizing the difference between the observed covariances and those implied by a structural or path model. In terms of neural systems, a measure of covariance represents the degree to which the activities of two or more regions are related. The SEM consists of a set of linear structural equations containing observed variables and parameters defining causal relationships among the variables. Variables in the equation system can be endogenous (i.e., dependent on the other variables in the model) or exogenous (independent of the model itself). The structural equation model specifies the causal relationships among the variables, describes the causal effects, and assigns the explained and unexplained variance. Let us consider a set of variables (expressed as deviations from their means) with N observations. In this study, these variables represent the activity estimated in each cortical region, obtained with the procedures described in the following section. The SEM for these variables is the following:

y = By + Γx + ζ (11.1)

where:

y is the (m×1) vector of dependent (endogenous) variables
x is the (n×1) vector of independent (exogenous) variables
ζ is the (m×1) vector of equation errors (random disturbances)
B is the (m×m) matrix of coefficients of the endogenous variables
Γ is the (m×n) matrix of coefficients of the exogenous variables
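For a toy model with m = 2 endogenous and n = 1 exogenous variables, the model-implied covariance of y can be computed by solving Equation 11.1 for y, giving E[yyT] = (I − B)⁻¹(ΓΦΓT + Ψ)((I − B)⁻¹)T, where Φ and Ψ are the covariances of x and ζ defined just below. A minimal sketch with pure-Python matrix helpers (all numeric values are invented for illustration):

```python
def mmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def implied_cov_yy(B, Gamma, Phi, Psi):
    """E[yy^T] = (I - B)^{-1} (Gamma Phi Gamma^T + Psi) ((I - B)^{-1})^T."""
    ImB = [[(1.0 if i == j else 0.0) - B[i][j] for j in range(2)]
           for i in range(2)]
    S = inv2(ImB)                                # (I - B)^{-1}
    GPG = mmul(mmul(Gamma, Phi), transpose(Gamma))
    inner = [[GPG[i][j] + Psi[i][j] for j in range(2)] for i in range(2)]
    return mmul(mmul(S, inner), transpose(S))
```

For example, with B = [[0, 0], [0.5, 0]] (y1 drives y2), Γ = [[2], [0]], Φ = [[1]], and unit error variances, the implied covariance of y works out to [[5.0, 2.5], [2.5, 2.25]].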
It is assumed that ζ is uncorrelated with the exogenous variables, and B is supposed to have zeros on its diagonal (i.e., an endogenous variable does not influence itself) and to satisfy the assumption that (I − B) is nonsingular, where I is the identity matrix. The covariance matrices of this model are the following:

Φ = E[xxT] is the (n×n) covariance matrix of the exogenous variables
Ψ = E[ζζT] is the (m×m) covariance matrix of the errors

If z is a vector containing all the p = m + n variables, exogenous and endogenous, in the following order:

zT = [x1, …, xn, y1, …, ym] (11.2)

then the observed covariances can be expressed as

Σobs = (1/(N − 1))·Z·ZT (11.3)

where Z is the (p×N) matrix of the p observed variables for N observations. The covariance matrix implied by the model can be obtained as follows:

Σmod = | E[xxT]  E[xyT] |
       | E[yxT]  E[yyT] | (11.4)
where

E[yyT] = E[(I − B)−1(Γx + ζ)(Γx + ζ)T((I − B)−1)T] = (I − B)−1(ΓΦΓT + Ψ)((I − B)−1)T (11.5)

because the errors ζ are not correlated with the x, and where

E[xxT] = Φ (11.6)

E[yxT] = (I − B)−1ΓΦ (11.7)

E[xyT] = ((I − B)−1ΓΦ)T (11.8)

because Σmod is symmetric. The resulting covariance matrix, in terms of the model parameters, is the following:

Σmod = | Φ             ((I − B)−1ΓΦ)T                      |
       | (I − B)−1ΓΦ   (I − B)−1(ΓΦΓT + Ψ)((I − B)−1)T | (11.9)

Without other constraints, the problem of minimizing the differences between the observed covariances and those implied by the model is undetermined, because the number of variables (elements of the matrices B, Γ, Ψ, and Φ) is greater than the number of equations, (m + n)(m + n + 1)/2. For this reason, the SEM technique is based on the a priori formulation of a model on the basis of anatomical and physiological constraints. This model implies the existence of just some causal relationships among the variables, represented by arcs in a "path" diagram; all the parameters related to arcs not present in the hypothesized model are forced to zero. For this reason, all the parameters to be estimated are called free parameters. If t is the number of free parameters, it must be that t ≤ (m + n)(m + n + 1)/2. These parameters are estimated by minimizing a function of the observed and implied covariance matrices. The most widely used objective function for SEM is the maximum likelihood (ML) function:

FML = log|Σmod| + tr(Σobs·Σmod−1) − log|Σobs| − p (11.10)

where tr(·) is the trace of a matrix. In the context of multivariate, normally distributed variables, the minimum of the ML function multiplied by (N − 1) follows a χ2 distribution with [p(p + 1)/2] − t degrees of freedom, where t is the number of parameters to be estimated, and p is the total number of observed variables (endogenous + exogenous). The
χ2 statistic test can then be used to infer the statistical significance of the structural equation model obtained. In the present study, the software package LISREL [46] was used to implement the SEM technique.

11.2.3 DIRECTED TRANSFER FUNCTION

In this study, the DTF technique was applied to the set of estimated cortical waveforms

Z(t) = [z1(t), z2(t), …, zN(t)]T (11.11)

obtained for the N ROIs considered, as will be described in detail in the following sections. The following MVAR process is an adequate description of the data set Z:

Λ(0)Z(t) + Λ(1)Z(t − 1) + … + Λ(q)Z(t − q) = e(t), with Λ(0) = I (11.12)

where e(t) is a vector of a multivariate zero-mean uncorrelated white-noise process; Λ(1), Λ(2), …, Λ(q) are the N×N matrices of model coefficients; and q is the model order, chosen in our case with the Akaike information criterion for MVAR processes [37]. To investigate the spectral properties of the examined process, Equation 11.12 is transformed to the frequency domain:

Λ(f)Z(f) = E(f) (11.13)

where

Λ(f) = Σ(k=0..q) Λ(k) e−i2πfΔtk (11.14)

and Δt is the temporal interval between two samples. Equation 11.13 can then be rewritten as

Z(f) = Λ−1(f)E(f) = H(f)E(f) (11.15)

Here, H(f) is the transfer matrix of the system, whose element Hij represents the connection between the jth input and the ith output of the system. With these definitions, the causal influence of the cortical waveform estimated in the jth ROI on that estimated in the ith ROI (the directed transfer function θ2ij(f)) is defined as

θ2ij(f) = |Hij(f)|2 (11.16)

To enable comparison of the results obtained for cortical waveforms with different power spectra, a normalization was performed by dividing each estimated DTF by the squared sums of all elements of the relevant row, thus obtaining the so-called normalized DTF [36]
γ2ij(f) = |Hij(f)|2 / Σ(m=1..N) |Him(f)|2 (11.17)

where γ2ij(f) expresses the ratio of influence of the cortical waveform estimated in the jth ROI on the cortical waveform estimated in the ith ROI, with respect to the influence of all the estimated cortical waveforms. Normalized DTF values lie in the interval [0, 1], and the normalization condition

Σ(n=1..N) γ2in(f) = 1 (11.18)

is applied.

11.2.4 COMPUTER SIMULATION

11.2.4.1 The Simulation Study

The experimental design we adopted was meant to analyze the recovery of the connectivity patterns obtained under the different levels of SNR and signal temporal length imposed during the generation of sets of test signals simulating average cortical activations. As described in the following subsections, the simulated signals were obtained from actual cortical data estimated with the high-resolution EEG procedures available at the High-Resolution EEG Laboratory of the University of Rome.

11.2.4.2 Signal Generation for the SEM Methodology

Different sets of test signals were generated to fit an imposed connectivity pattern (shown in Figure 11.2) and to respect imposed levels of temporal duration (LENGTH) and signal-to-noise ratio (SNR). In the following discussion, using a more compact notation, signals are represented by the z vector defined in Equation 11.2, containing both the endogenous and the exogenous variables. Channel z1 is a reference source waveform, estimated from a high-resolution EEG (128 electrodes) recording in a healthy subject during the execution of unaimed self-paced movements of the right finger. Signals z2, z3, and z4 were obtained from the contributions of the signals on all the other channels, with an amplitude variation, plus zero-mean uncorrelated white-noise processes with appropriate variances, as shown in Equation 11.19:

z[k] = A·z[k] + W[k] (11.19)

where z[k] is the [4×1] vector of signals, W[k] is the [4×1] noise vector, and A is the [4×4] parameter matrix obtained from the Γ and B matrices in the following way:
FIGURE 11.2 Connectivity pattern imposed in the generation of simulated signals. z1, …, z4 represent the average activities in four cortical areas. Values on the arcs represent the connection strengths (a21=1.4, a31=1.1, a32=0.5, a42=0.7, a43=1.2).

A = |  0    0    0    0 |
    | a21   0    0    0 |
    | a31  a32   0    0 |
    |  0   a42  a43   0 | (11.20)
where βij stands for the generic (i,j) element of the B matrix, and γi is the ith element of the vector Γ. All procedures of signal generation were repeated under the following conditions:

SNR factor levels = 1, 3, 5, 10, and 100
LENGTH factor levels = 60, 190, 310, and 610 sec, corresponding, for instance, to 120, 380, 620, and 1220 EEG epochs, each 500 msec long.
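The cascade generation of Equation 11.19 with the Figure 11.2 strengths can be sketched as follows; the reference waveform, helper name, and SNR handling (noise variance scaled so that each channel's signal-to-noise power ratio equals `snr`) are illustrative assumptions, not the study's actual implementation:

```python
import random

def generate_sem_signals(z1, snr, a21=1.4, a31=1.1, a32=0.5, a42=0.7, a43=1.2):
    """Generate z2..z4 from a reference z1 per the imposed connectivity
    pattern, adding zero-mean Gaussian noise at the requested SNR."""
    def add_noise(sig):
        power = sum(v * v for v in sig) / len(sig)
        sd = (power / snr) ** 0.5          # noise power = signal power / SNR
        return [v + random.gauss(0.0, sd) for v in sig]
    z2 = add_noise([a21 * v for v in z1])
    z3 = add_noise([a31 * u + a32 * v for u, v in zip(z1, z2)])
    z4 = add_noise([a42 * u + a43 * v for u, v in zip(z2, z3)])
    return z2, z3, z4
```

Repeating this generation for each SNR and LENGTH level yields the simulated data sets on which the estimation is evaluated.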
It is worth noting that the levels chosen for both the SNR and LENGTH factors cover the typical range for the cortical activity estimated with high-resolution EEG techniques.

11.2.4.3 Signal Generation for the DTF Methodology

Different sets of test signals were generated to fit an imposed coupling scheme involving four different cortical areas (shown in Figure 11.2) while also respecting imposed levels of signal-to-noise ratio (factor SNR) and duration (factor LENGTH). Signal z1(t) was a reference cortical waveform estimated from a high-resolution EEG (96 electrodes) recording in a healthy subject during the execution of self-paced movements of the left finger. Subsequent signals z2(t) to z4(t) were iteratively obtained according to the imposed scheme (Figure 11.2) by adding to signal zj contributions from the other signals, delayed by intervals τij and amplified by factors aij, plus an uncorrelated Gaussian white noise. Coefficients of the connection strengths were chosen in a range of realistic values as met in previous studies during the application of other connectivity-estimation techniques, such as structural equation modeling, in several memory, motor, and sensory tasks [7]. Here, the values used for the connection strengths were a21=1.4, a31=1.1, a32=0.5, a42=0.7, and a43=1.2. The values used for the delay from the ith ROI to the jth one (τij) ranged from one sample up to q − 2, where q was the order of the MVAR model used. Because the statistical analysis performed with different values of these delays returned the same information with respect to the variation of this parameter, in the following we particularize the results to the case τ21=τ31=τ32=τ42=τ43=1 sample, which for a sampling rate of 64 Hz corresponds to a delay of about 15 msec.
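For a known two-channel MVAR(1) model, Equations 11.13 to 11.17 reduce to a few lines: form Λ(f) = I − A·e^(−i2πfΔt), invert it to obtain H(f), and normalize the squared magnitudes row-wise. The coefficient values below are invented for illustration:

```python
import cmath

def inv2c(M):
    """Inverse of a 2x2 complex matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def normalized_dtf(A, f, dt):
    """gamma^2_ij(f) for the 2-channel MVAR(1) model Z(t) = A Z(t-1) + e(t)."""
    ph = cmath.exp(-2j * cmath.pi * f * dt)
    Lam = [[1 - A[0][0] * ph, -A[0][1] * ph],     # Lambda(f) = I - A e^{-i2pi f dt}
           [-A[1][0] * ph, 1 - A[1][1] * ph]]
    H = inv2c(Lam)                                 # H(f) = Lambda(f)^{-1}
    gamma = []
    for i in range(2):
        row_power = sum(abs(H[i][m]) ** 2 for m in range(2))
        gamma.append([abs(H[i][j]) ** 2 / row_power for j in range(2)])
    return gamma
```

With A = [[0.5, 0], [0.7, 0.2]] (channel 1 drives channel 2, no feedback), the normalized DTF from channel 2 to channel 1 is zero at every frequency, while each row sums to one, as required by the normalization condition of Equation 11.18.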
All signal-generation procedures were repeated under the following conditions: SNR factor levels = (0.1, 1, 3, 5, 10); LENGTH factor levels = (960, 2,880, 4,800, 9,600, 19,200, 38,400) data samples, corresponding to signal lengths of (15, 45, 75, 150, 300, 600) sec at a sampling rate of 64 Hz, or to (7, 22, 37, 75, 150, 300) EEG trials of 2 sec each. The levels chosen for both the SNR and LENGTH factors cover the typical range for cortical activity estimated with high-resolution EEG techniques. The MVAR model was estimated by means of the Nuttall-Strand method, or multivariate Burg algorithm, which is one of the most common estimators for MVAR models and has been demonstrated to provide the most accurate results [48-50].

11.2.4.4 Performance Evaluation

The quality of the estimation was evaluated using the Frobenius norm of the matrix of differences between the estimated (via SEM) and the imposed connection values (relative error). The norm was computed for the connectivity patterns obtained with the SEM methodology
Estimation of human cortical connectivity
437
Erel = ||Best − Bimp||F / ||Bimp||F    (11.21)

where Best and Bimp denote the matrices of estimated and imposed connection strengths, respectively.
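The Frobenius-norm relative error described above reduces to a one-line computation; this is a minimal sketch, with matrix names chosen for illustration.

```python
import numpy as np

def relative_error(estimated, imposed):
    """Frobenius-norm relative error between the estimated and imposed
    connection matrices, as used to score the SEM and DTP estimates."""
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm
    return np.linalg.norm(estimated - imposed) / np.linalg.norm(imposed)
```

The same index is reused below for the DTP evaluation, with the matrices holding band-averaged DTP values instead of SEM connection strengths.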
In the case in which the DTP method was used, the statistical evaluation of DTP performance required a precise definition of an error function describing the goodness of the estimated pattern. This was achieved by focusing on the MVAR model structure described in Equation 11.12 and comparing it with the signal-generation scheme. The elements of the matrices Λ(k) of MVAR model coefficients can be put in relation with the coefficients used in the signal generation; they are different from zero only for k=τij, where τij is the delay chosen for each pair (i,j) of ROIs and for each direction between them. In particular, for the independent reference source waveform z1(t), an autoregressive model of the same order as the MVAR was estimated, whose coefficients a11(1), ..., a11(q) correspond to the elements Λ11(1), ..., Λ11(q) of the MVAR coefficient matrices. Thus, with the estimation of the MVAR model parameters, we aim to recover the original coefficients aij(k) used in the signal generation. In this way, reference DTP functions were computed on the basis of the signal-generation parameters. The error function was then computed as the difference between these reference functions and the estimated ones (both averaged in the frequency band of interest). To evaluate the performance in retrieving the connections between areas, the same index used in the case of the SEM was adopted, with slight differences in notation, i.e., the Frobenius norm of the matrix of differences between the estimated and the imposed connection values (total relative error):

Etot = ||Dest − Dimp||F / ||Dimp||F    (11.22)
In Equation 11.22, the generic element of the matrix represents the average value of the DTP function from j to i within the frequency band of interest. For both SEM and DTP, the simulations were performed by repeating each generation-estimation procedure 50 times to increase the robustness of the subsequent statistical analysis.

11.2.4.5 Statistical Analysis

The results obtained were subjected to separate ANOVAs. The main factors of the ANOVA for the DTP method were the SNR (with five levels: 0.1, 1, 3, 5, 10) and the signal LENGTH (with six levels: 960, 2,880, 4,800, 9,600, 19,200, 38,400 data samples,
equivalent to 15, 45, 75, 150, 300, 600 sec at a 64-Hz sampling rate). In the case of the SEM method, the main factors were identical, but LENGTH had only four levels (equal to 60, 190, 310, and 610 sec at 64 Hz). For all of the methodologies used, ANOVA was performed on the error index adopted (relative error). The Greenhouse-Geisser correction for violation of the sphericity hypothesis was used. Post hoc analysis with the Duncan test at the p=0.05 statistical significance level was then performed.

11.2.5 APPLICATION TO MOVEMENT-RELATED POTENTIALS

The estimation of connectivity patterns by using DTP and SEM on high-resolution EEG recordings was applied to the analysis of a simple movement task. In particular, we considered a right-hand finger-tapping movement that was externally paced by a visual stimulus. This task was chosen because it has been widely studied in the literature with various brain-imaging techniques such as EEG and fMRI [51-53].

11.2.5.1 Subject and Experimental Design

Three right-handed healthy subjects (age 23.3 ± 0.58 years; one male, two females) participated in the study after providing informed consent. Subjects were seated comfortably in an armchair with both arms relaxed and resting on pillows, and they were asked to perform fast, repetitive right-finger movements. During this motor task, the subjects were instructed to avoid eye blinks, swallowing, or any movement other than the required finger movements.

11.2.5.2 Head and Cortical Models

A realistic head model of each subject, reconstructed from T1-weighted MRIs, was employed in this study. The scalp, skull, and dura mater compartments were segmented from the MRIs with software originally developed at the Department of Human Physiology of Rome, and these structures were triangulated with about 1,000 triangles for each surface. The source model was built with the following procedure:

1. The cortex compartment was segmented from the MRIs and triangulated to obtain a fine mesh with about 100,000 triangles.
2. A coarser mesh was obtained by resampling the fine mesh to about 5,000 triangles. The downsampling was performed with an adaptive algorithm designed to represent with a sufficient number of triangles the parts of the cortex where the curvature was high (for instance, at the bending of a sulcus), while representing with fewer triangles the flatter parts of the cortical surface (for instance, on the upper part of the gyri).
3. An orthogonal unitary equivalent-current dipole was placed at each node of the triangulated surface, with its direction parallel to the vector sum of the normals of the surrounding triangles.
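Step 3 above, assigning each mesh node a dipole direction from the normals of its surrounding triangles, can be sketched as follows; the function name and array layout are assumptions of this illustration.

```python
import numpy as np

def node_dipole_orientations(vertices, triangles):
    """Assign each mesh node a unit dipole direction parallel to the vector
    sum of the normals of its surrounding triangles.

    vertices  : array (n_nodes, 3) of node coordinates
    triangles : iterable of index triples defining the mesh faces
    """
    normals = np.zeros_like(vertices, dtype=float)
    for tri in triangles:
        v0, v1, v2 = vertices[tri]
        n = np.cross(v1 - v0, v2 - v0)      # face normal (area-weighted)
        normals[tri] += n                   # accumulate on the 3 corner nodes
    lengths = np.linalg.norm(normals, axis=1, keepdims=True)
    return normals / np.where(lengths > 0, lengths, 1.0)
```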
11.2.5.3 EEG Recordings

Event-related potential (ERP) data were recorded with 96 electrodes; data were recorded with a left-ear reference and submitted to an artifact-removal process. Six hundred ERP trials of 600-msec duration were acquired at an analog-to-digital sampling rate of 250 Hz. The surface electromyographic (EMG) activity of the muscle was also collected, and the onset of the EMG response served as zero time. All data were visually inspected, and trials containing artifacts were rejected. We used semiautomatic supervised threshold criteria for the rejection of trials contaminated by ocular and EMG artifacts, as described in detail elsewhere [54]. After the EEG recording, the electrode positions were digitized using a three-dimensional localization device with respect to the anatomic landmarks of the head (nasion and the two preauricular points). The analysis period for the potentials time-locked to the movement execution was set from 300 msec before to 300 msec after the EMG trigger (zero time). The ERP time course was divided into two phases relative to the EMG onset: the first, labeled the "PRE" period, covered the 300 msec before the EMG onset and was intended as a generic preparation period; the second, labeled "POST," lasted up to 300 msec after the EMG onset and was intended to capture the arrival of the movement's somatosensory feedback. We kept the same PRE and POST nomenclature for the signals estimated at the cortical level.
11.2.5.4 Statistical Evaluation of Connectivity Measurements by SEM and DTP

As described previously, the statistical significance of the connectivity pattern estimated with the SEM technique was ensured by the fact that, in the context of multivariate, normally distributed variables, the minimum of the maximum-likelihood function FML, multiplied by (N−1), follows a χ2 distribution with [p(p+1)/2]−t degrees of freedom, where t is the number of parameters to be estimated and p is the total number of observed variables (endogenous + exogenous). The χ2 statistic can therefore be used to infer the statistical significance of the structural equation model obtained. The situation for the statistical significance of the DTP measurements is different, because the DTP functions have a highly nonlinear relation to the time-series data from which they are derived, and the distribution of their estimators is not well established, which makes tests of significance difficult to perform. A possible solution to this problem was proposed by Kaminski et al. [37]. Their solution involves the use of a surrogate-data technique [55], in which an empirical distribution for the random fluctuations of a given estimated quantity is generated by estimating the same quantity from several realizations of surrogate data sets in which the deterministic interdependency between variables has been removed. To ensure that all features of each data set are as similar as possible to the original, with the exception of channel coupling, the very same data are used, and any time-locked coupling between channels is disrupted by shuffling the phases of the original multivariate signal. Because the EEG signal had been divided into single trials, each surrogate data set was built by scrambling the order of epochs, using a different sequence for each channel.
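The epoch-scrambling step just described can be sketched as follows; the trial-array layout is an assumption of this illustration.

```python
import numpy as np

def make_surrogate(trials, rng=None):
    """Build one surrogate data set from trial-segmented EEG.

    trials : array (n_trials, n_channels, n_samples).  For each channel the
    trial order is shuffled independently, which disrupts time-locked
    coupling between channels while leaving each channel's single-trial
    samples, and hence its spectra, untouched."""
    rng = np.random.default_rng() if rng is None else rng
    n_trials, n_channels, _ = trials.shape
    surrogate = np.empty_like(trials)
    for ch in range(n_channels):
        order = rng.permutation(n_trials)
        surrogate[:, ch, :] = trials[order, ch, :]
    return surrogate
```

Repeating this many times (the chapter uses 1,000 surrogate sets), estimating the DTP spectra from each, and taking the 99th percentile per channel pair and frequency bin then yields the significance threshold described below.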
In this procedure, every single-channel EEG epoch was used once and only once, and only occasionally (with a very low probability) did two channels in the same surrogate trial come from the same actual trial. The properties of the univariate surrogate signals are not affected by this shuffling procedure, because only
the epoch order is varied. Moreover, because no shuffling was performed between single samples, the temporal correlation, and thus the spectral features, of the univariate signals are the same for the original and surrogate data sets, making it possible to estimate different distributions of DTP fluctuations for each frequency band. A total of 1,000 surrogate data sets was generated as described previously, and DTP spectra were estimated from each data set. For each channel pair and each frequency bin, the 99th percentile was computed and subsequently used as a significance threshold.

11.2.5.5 Estimation of Cortical Source Current Density

The solution of the following linear system

Lz = d + e    (11.23)

provides an estimate of the dipole source configuration z that generates the measured EEG potential distribution d. The system also includes the measurement noise e, assumed to be normally distributed [39]. In Equation 11.23, L is the lead field, or forward transmission matrix, in which each jth column describes the potential distribution generated on the scalp electrodes by the jth unitary dipole. The current-density solution vector ξ was obtained as follows [39]:

ξ = arg min z (||Lz − d||M² + λ||z||N²)    (11.24)

where M and N are the matrices associated with the metrics of the data space and the source space, respectively, λ is the regularization parameter, and ||z||N represents the N-norm of the vector z. The solution of Equation 11.24 is given by the inverse operator G as follows:

ξ = Gd,  G = N−1L′(LN−1L′ + λM−1)−1    (11.25)

An optimal regularization of this linear system was obtained by the L-curve approach [56, 57]. As the metric in the data space we used the identity matrix, while in the source space we used the following norm:

(N−1)ii = 1/||L.i||²    (11.26)

where (N−1)ii is the ith diagonal element of the inverse of the diagonal matrix N, and all other elements Nij, for each i≠j, are set to 0. The L2 norm of the ith column of the lead-field matrix L is denoted by ||L.i||.
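The inverse operator of Equation 11.25, with the identity data metric and the column-norm source metric of Equation 11.26, can be sketched numerically as follows; the function is an illustration, not the authors' implementation.

```python
import numpy as np

def inverse_operator(L, lam):
    """Weighted minimum-norm inverse operator
    G = N^-1 L' (L N^-1 L' + lam * M^-1)^-1,
    assuming M = identity (data metric) and N diagonal with
    (N^-1)_ii = 1 / ||L_.i||^2 (column-norm weighting)."""
    Ninv = np.diag(1.0 / np.sum(L**2, axis=0))   # (N^-1)_ii = ||L_.i||^-2
    Minv = np.eye(L.shape[0])                    # identity data metric
    return Ninv @ L.T @ np.linalg.inv(L @ Ninv @ L.T + lam * Minv)

# The current-density estimate for a measured potential vector d is then:
# xi = inverse_operator(L, lam) @ d
```

In practice λ would be chosen by the L-curve approach mentioned in the text; here it is simply a parameter.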
Here, we present two characterizations of the source metric N that provide a basis for including information about the statistical hemodynamic activation of the ith cortical voxel in the linear-inverse estimation of the cortical source activity. In fMRI analysis, several methods have been developed to quantify the brain's hemodynamic response to a particular task. In the following, however, we analyze the case in which a particular fMRI quantification technique, the percent change (PC) technique, has been
used. This measure quantifies the percent increase of the fMRI signal during task performance with respect to the rest state [58]. The visualization of the distribution of voxels in the brain whose signal is statistically increased during the task condition with respect to rest is called the PC map. The difference between the mean rest-related and movement-related signal intensity is generally calculated voxel by voxel. The rest-related fMRI signal intensity is obtained by averaging the premovement and recovery fMRI signals. A Bonferroni-corrected Student's t-test is used to minimize alpha-inflation effects due to the multiple statistical voxel-by-voxel comparisons (Type I error; p<0.05). The introduction of fMRI priors into the linear-inverse estimation produces a bias in the estimation of the current-density strength of the modeled cortical dipoles. Statistically significantly activated fMRI voxels, as returned by the percent-change approach [58], are weighted to account for the EEG-measured potentials. In fact, a reasonable hypothesis is that there is a positive correlation between local electric or magnetic activity and the local hemodynamic response over time. This correlation can be expressed as a decrease of the cost in the functional of Equation 11.24 for the sources zj in which fMRI activation is observed, which increases the probability for those particular sources zj to be present in the solution of the electromagnetic problem. These considerations can be formalized by particularizing the source metric N to take into account the information coming from the fMRI. The inverse of the resulting metric is proposed as follows [59]:

(N−1)ii = g(αi)² / ||L.i||²    (11.27)

in which (N−1)ii and ||L.i|| have the same meaning as described previously. The term g(αi) is a function of the statistically significant percent increase of the fMRI signal assigned to the ith dipole of the modeled source space.
This function is expressed as

(11.28)

where αi is the percent increase of the fMRI signal during the task state for the ith voxel, and the factor K tunes the fMRI constraints in the source space. Fixing K=1 disregards the fMRI priors, returning a purely electrical solution; a value of K»1 allows only the sources associated with fMRI-active voxels to participate in the solution. It has been shown that a value of K on the order of 10 (90% of the weight assigned to the fMRI information) is useful to avoid mislocalization due to overconstrained solutions [60-62]. In the discussion that follows, the estimation of cortical activity obtained with this metric is denoted diag-fMRI, because the above definition of the source metric N results in a matrix whose off-diagonal elements are zero.

11.2.5.6 Regions of Interest (ROIs)

Several cortical regions of interest (ROIs) were drawn by two independent expert neuroradiologists on the computer-based cortical reconstruction of the individual head models. In the cases where the SEM methodology was adopted, we selected ROIs based on previously available knowledge about the flow of connections between different cortical macroareas, as derived from neuroanatomy and fMRI studies. In particular, information
flows were hypothesized to exist from the parietal (P) areas toward the sensorimotor (SM), premotor (PM), and prefrontal (PF) areas [63-65]. The prefrontal areas (PF) were defined to include Brodmann areas 8, 9, and 46; the premotor areas (PM), Brodmann area 6; the sensorimotor areas (SM), Brodmann areas 4, 3, 2, and 1; and the parietal areas (P) were generated by the union of Brodmann areas 5 and 7 (see the colored areas in Figure 11.3). In the cases where the DTP method was used, we selected ROIs representing the left and right primary somatosensory (S1) areas, which included Brodmann areas (BA) 3, 2, and 1, while the ROIs representing the left and right primary motor (M1) areas included BA 4. The ROIs representing the supplementary motor area (SMA) were obtained from the cortical voxels belonging to BA 6; we further separated the SMA into proper and anterior subdivisions, labeled BA 6P and 6A, respectively. Furthermore, ROIs for the right and left parietal areas (BA 5, 7) and the occipital areas (BA 19) were also considered, and in the frontal regions BA 46, 8, and 9 were selected (see Color Figure 11.4 following page 274).

11.2.5.7 Cortical Current Waveforms

By using the relations described above, at each time point of the gathered ERP data an estimate of the signed magnitude of the dipole moment for each of the 5,000 cortical dipoles was obtained. In fact, because the orientation of each dipole was already defined to be perpendicular to the local cortical surface of the model, the estimation process returned a scalar rather than a vector field. To obtain the cortical current waveforms for all time points of the recorded EEG time series, we used a unique quasi-optimal regularization value λ for all the analyzed EEG potential distributions.
This quasi-optimal regularization value was computed as the average of the λ values obtained by solving the linear-inverse problem for a series of EEG potential distributions. These distributions were characterized by an average global field power (GFP) relative to the highest and lowest GFP values observed across the recorded waveforms. The instantaneous average of the signed magnitudes of the dipoles belonging to a particular ROI gives the representative time value of the cortical activity in that ROI. By iterating this procedure over all time instants of the gathered ERP, the cortical ROI current-density waveforms were obtained; these can be taken as representative of the average activity of the ROI during the task performed by the experimental subjects. These waveforms could then be subjected to SEM and DTP processing to estimate the connectivity pattern between ROIs, taking into account the time-varying increase or decrease of the power spectra in the frequency bands of interest. Here, we present the results obtained for the connectivity pattern in the alpha band (8 to 12 Hz), because ERP data related to movement preparation and execution are particularly responsive within this frequency interval (for a review, see Pfurtscheller and Lopes da Silva [32]).
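The ROI-averaging step just described, collapsing the per-dipole waveforms into one representative waveform per ROI, can be sketched as follows; the array shapes and label encoding are assumptions of this illustration.

```python
import numpy as np

def roi_waveforms(dipole_ts, roi_labels):
    """Collapse estimated dipole time series into one waveform per ROI.

    dipole_ts  : array (n_dipoles, n_times) of signed dipole magnitudes
    roi_labels : length-n_dipoles array assigning each dipole to an ROI id
    Returns a dict {roi_id: mean waveform across that ROI's dipoles}."""
    return {roi: dipole_ts[roi_labels == roi].mean(axis=0)
            for roi in np.unique(roi_labels)}
```

The resulting per-ROI waveforms are the inputs to the SEM and DTP connectivity estimation.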
FIGURE 11.3 Cortical connectivity patterns obtained with the SEM
method for the period preceding and following the movement onset in the alpha (8 to 12 Hz) frequency band. The patterns are shown on the realistic head model and cortical envelope (obtained from sequential MRIs) of the subject analyzed. Functional connections are represented by arrows moving from one cortical area to another. The colors and sizes of the arrows code the strengths of the functional connectivity observed between ROIs. The labels give the names of the ROIs employed. (a) Connectivity pattern obtained from ERP data before the onset of the right-finger movement (electromyographic onset, EMG). (b) Connectivity pattern obtained after the EMG onset.
FIGURE 11.4 Cortical connectivity patterns obtained with the DTP method for the period preceding and following
the movement onset in the alpha (8 to 12 Hz) frequency band. The patterns are shown on the realistic head model and cortical envelope (obtained from sequential MRIs) of the subject analyzed. Functional connections are represented by arrows moving from one cortical area to another. The colors and sizes of the arrows code the strengths of the connections. (a) Connectivity pattern obtained from ERP data before the onset of the right-finger movement (electromyographic onset, EMG). (b) Connectivity pattern obtained after the EMG onset.

11.3 RESULTS

11.3.1 COMPUTER SIMULATIONS FOR SEM

Each set of signals was generated as described in the Methods section (Section 11.2) to fit a predefined connection model and to respect different levels of the two factors SNR and LENGTH of the recordings. The resulting signals were analyzed by means of the freeware software LISREL, which returned an estimate of the connection strengths. Figure 11.2 shows the connection model used in the signal generation and in the parameter estimation. The arrows represent the existence of a connection directed from signal zi toward signal zj, and the values aij on the arcs represent the connection parameters described in Equation 11.20. The results of the statistical analysis performed on the 50 repetitions of the procedure are reported in Figure 11.5, which plots the means of the relative error with respect to signal LENGTH and SNR. ANOVA identified a strong statistical significance for both factors considered (factor LENGTH: F=288.60, p<0.0001; factor SNR: F=22.70, p<0.0001). Figure 11.5(a) shows the plot of means of the relative error with respect to the signal-length levels, which reveals a decrease of the connectivity-estimation error with an increase in the length of the available data. Figure 11.5(b) shows the plot of means with respect to the different SNR levels employed in the simulation.
Because the main factors were found to be highly statistically significant, post hoc tests (Duncan, at 5%) were then applied. These tests showed statistically significant differences between all levels of the factor LENGTH, whereas there was no statistically significant difference between levels 3, 5, and 10 of the factor SNR.
11.3.2 COMPUTER SIMULATIONS FOR DTP

The connectivity model used in the signal generation was the same as that used for the SEM simulation, shown in Figure 11.2. A multivariate autoregressive model of order 8 was fitted to each set of simulated data, and the normalized DTP functions were then computed from each autoregressive model. The procedure of signal generation and DTP estimation was carried out 50 times for each level of the factors SNR and LENGTH. The performance index used, i.e., the relative error, was computed for each generation-estimation procedure and then subjected to ANOVA. In this statistical analysis, the relative error was the dependent variable, and the different SNR and LENGTH values imposed in the signal generation were the main factors. ANOVA revealed a strong statistical influence of both main factors (SNR: F=3295.5, p<0.0001; LENGTH: F=1012.4, p<0.0001). Figure 11.6 shows the influence of the factors SNR and LENGTH on the relative error. In detail, Figure 11.6(a) shows the plot of means of the relative error with respect to the signal LENGTH levels, which reveals a decrease of the connectivity-estimation error with an increase in the length of the available data; Figure 11.6(b) shows the plot of means with respect to the different SNR levels employed in the simulation. In particular, for an SNR between 3 and 10, the expected error in the estimation of the connectivity pattern was generally under 7%, and similar values were obtained for ERP recordings longer than 150 sec. Because the main factors were found to be statistically significant, post hoc tests (Duncan test at 5%) were then applied. The results showed statistically significant differences between the 15- and 45-sec levels of the factor LENGTH (960 and 2,880 samples, respectively) and the other levels, but no statistically significant difference between levels 3, 5, and 10 of the factor SNR.
11.3.3 APPLICATION TO HIGH-RESOLUTION EVENT-RELATED POTENTIAL RECORDINGS

The results of the application of the SEM method for estimating connectivity from the event-related potential recordings are depicted in Figure 11.3, which shows the statistically significant cortical connectivity patterns obtained for the period preceding the movement onset in subject no. 1, in the alpha frequency band. Each pattern is represented by arrows that connect one cortical area (the source) to another (the target). The colors and sizes of the arrows code the strength of the functional connectivity observed between ROIs, and the labels indicate the names of the ROIs employed. Note that the connectivity pattern during the period preceding the movement in the alpha band mainly involves the left parietal ROI (Pl), coincident with Brodmann areas 5 and 7, functionally connected to the left and right premotor ROIs (PMl and PMr), the left sensorimotor area (SMl), and both prefrontal ROIs (PFl and PFr). The strongest functional connections are those linking the left parietal and the premotor areas of both cerebral hemispheres. After the preparation and the beginning of the finger movement, in the POST period, changes in the connectivity pattern can be noted. In particular, the origin of the functional connectivity links moves to the left sensorimotor cortical area (SMl), from which functional links are established with the left prefrontal area (PFl) and both premotor areas (PMl and PMr). A functional link emerged in this
condition, connecting the right parietal area (Pr) with the right sensorimotor area (SMr). The left parietal area (Pl), so active in the previous condition, was instead linked with the left sensorimotor (SMl) and right premotor (PMr) cortical areas. Connectivity estimates obtained by DTP on the movement-related potentials were first analyzed from a statistical point of view via the previously described shuffling procedure. The order of the MVAR model used for each DTP estimation had to be determined for each subject and for each temporal interval of the cortical waveform segmentation (PRE and POST intervals). The Akaike information criterion (AIC) procedure was used and returned an optimal order between 6 and 7 for all subjects in both the PRE and POST intervals. The DTP computational procedure described in the Methods section (Section 11.2) was then applied to these cortical waveforms. Figure 11.4 shows the cortical connectivity patterns obtained for the period preceding and following the movement onset in subject no. 1. Here, we present the results obtained for the connectivity pattern in the alpha band (8 to 12 Hz), as the ERP data related to movement preparation and execution are particularly responsive within this frequency interval (for a review, see Pfurtscheller and Lopes da Silva [32]). The task-related pattern of cortical connectivity was obtained by calculating the DTP between the cortical current-density waveforms estimated in each ROI depicted on the realistic cortex model. The connectivity patterns between ROIs are represented by arrows pointing from one cortical area to another; the arrows' color and size code the strength of the functional connectivity estimated between the source and target ROIs. Labels indicate the ROIs involved in the estimated connectivity pattern. Only the cortical connections statistically significant at p<0.01 are represented, according to the thresholds obtained by the shuffling procedure.
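The AIC-based selection of the MVAR model order mentioned above can be sketched as follows. Note that this sketch fits each candidate order by ordinary least squares for simplicity, whereas the chapter uses the Nuttall-Strand (multivariate Burg) estimator; the AIC form with the residual covariance determinant is the standard multivariate one.

```python
import numpy as np

def mvar_aic(data, max_order=12):
    """Pick an MVAR model order by the Akaike information criterion.

    data : array (n_channels, n_samples).  For each candidate order p the
    model Y[t] = sum_k A_k Y[t-k] + E[t] is fitted by ordinary least squares
    and AIC(p) = ln det(Sigma_p) + 2*p*n^2 / T_eff is evaluated, with
    Sigma_p the residual covariance and T_eff the number of fitted samples.
    (OLS fitting is a simplification of the estimators used in the chapter.)"""
    n, T = data.shape
    aics = {}
    for p in range(1, max_order + 1):
        Y = data[:, p:]                                         # (n, T-p) targets
        X = np.vstack([data[:, p - k:T - k] for k in range(1, p + 1)])
        A, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)           # stacked A_k
        E = Y - A.T @ X                                         # residuals
        Sigma = E @ E.T / (T - p)
        aics[p] = np.log(np.linalg.det(Sigma)) + 2 * p * n * n / (T - p)
    return min(aics, key=aics.get), aics
```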
It can be noted that the connectivity patterns during the periods preceding and following the movement in the alpha band involve the parietal and sensorimotor ROIs bilaterally, which are also functionally connected with the premotor cortical ROIs. A minor involvement of the prefrontal ROIs is also observed. The strongest functional connections link the premotor and prefrontal areas of both cerebral hemispheres. After the preparation and the beginning of the finger movement, in the POST period, slight changes in the connectivity patterns can be noted.
FIGURE 11.5 (Color figure follows p. 274.) Results of ANOVA performed on the relative error resulting from the SEM simulations. (a) Plot of means with respect to signal LENGTH as a
function of time (seconds). ANOVA shows a high statistical significance for the factor LENGTH (F=288.60, p<0.0001). The Duncan post hoc test (performed at the 5% level of significance) shows statistically significant differences between all levels. (b) Plot of means with respect to the signal-to-noise ratio. Here, too, a strong influence of the factor SNR on the estimation error is shown (F=22.70, p<0.0001). The Duncan post hoc test (performed at the 5% level of significance) shows no statistically significant difference between levels 3, 5, and 10 of the factor SNR.
FIGURE 11.6 (Color figure follows p. 274.) Results of ANOVA performed on the relative error resulting from the DTP simulations. (a) Plot of means with respect to signal LENGTH as a function of time (seconds). ANOVA
shows a high statistical significance for the factor LENGTH (F=1012.36, p<0.0001). The Duncan post hoc test (performed at the 5% level of significance) shows statistically significant differences between the 15- and 45-sec levels at the 64-Hz sampling rate (equivalent to 960 and 2,880 samples, respectively) of the factor LENGTH and all the other levels. (b) Plot of means with respect to the signal-to-noise ratio. Here, too, a strong influence of the factor SNR on the estimation error is shown (F=3295.45, p<0.0001). The Duncan post hoc test (performed at the 5% level of significance) shows no statistically significant difference between levels 3, 5, and 10 of the factor SNR.

11.3.4 APPLICATION OF THE MULTIMODAL EEG-FMRI INTEGRATION TECHNIQUES TO THE ESTIMATION OF SOURCES OF SELF-PACED MOVEMENTS

In this section, we provide a practical example of the application of the multimodal EEG-fMRI integration techniques (described theoretically in the previous sections) to the problem of detecting the neural sources subserving unilateral self-paced movements in humans. The high-resolution EEG recordings (128 scalp electrodes) were performed on normal healthy subjects using the facilities of the laboratory of the Department of Human Physiology, University of Rome. Realistic head models were used, each provided with a cortical surface reconstruction tessellated with 3,000 current dipoles. Separate block-design and event-related fMRI recordings of the same subjects were performed using the facilities of the Istituto di Tecnologie Avanzate Biomediche (ITAB) of Chieti, Italy. Distributed linear-inverse solutions using hemodynamic constraints were obtained according to the previously described methodology. Figure 11.7 presents the typical situation that occurs when different imaging methods are used to characterize the brain activity generated during a specific task.
In particular, the task performed by the subject was a self-paced movement of the middle finger of the right hand. This task was performed three times under three different scanners, namely fMRI, high-resolution EEG (HREEG), and MEG. On the left of Figure 11.7 is a view of some cerebral areas active during the movement, as reported by fMRI. The maximum values of the fMRI responses are located in the
FIGURE 11.7 (Color figure follows p. 274.) (Left) A view of some cerebral areas active during the self-paced movement of the right finger, as reported by fMRI. (Right) Dura mater potential distribution estimated with the use of the SL operator over a cortical surface reconstruction. The deblurred distribution is obtained at 100 msec after the EMG onset of the right middle finger.

voxels roughly corresponding to the primary somatosensory and motor areas (hand representation) contralateral to the movement. In fact, during self-paced unilateral finger extension, somatosensory reafference inputs from the finger joints and cutaneous nerves are directed to the primary somatosensory area, while centrifugal commands from the primary motor area are directed toward the spinal cord via the pyramidal system. At the center of the figure, the dura mater potential distribution estimated with the use of the SL operator is represented over a cortical surface reconstruction. The deblurred distribution is obtained at 100 msec after the EMG onset of the right middle finger; note the characteristic reversed negative and positive SL fields on the left hemisphere. It is easy to appreciate the different time resolutions of the different techniques: the fMRI data refer to the whole time course of the experiment, whereas the high-resolution EEG data refer to a particular span of milliseconds of the cortical electromagnetic field evolution related to the same experiment. Simulations performed to test the efficacy of the multimodal integration of HREEG and fMRI indicate that the inclusion of fMRI priors improves the reconstruction of cortical activity [22, 60]. Figure 11.8(a) presents three cortical current-density distributions. The left one shows the cortical regions roughly corresponding to the
supplementary motor area and the left motor cortex, with the imposed activations represented in black. The imposed activations generated a potential distribution over the scalp electrodes (not shown in the figure). From this potential distribution, different inverse operators, with and without the use of fMRI priors (located in the supplementary and left motor areas), attempted to estimate the current-density distribution. The current-density reconstruction at the center of Figure 11.8(a) shows the results of the estimation of the sources presented on the left map (obtained using the minimum-norm estimate procedure) without the use of fMRI
FIGURE 11.8 (Color figure follows p. 274.) (a) Three cortical current-density distributions. The left one shows the simulated cortical regions roughly corresponding to the supplementary motor area and the left motor cortex, with the imposed activations represented in black. The current-density reconstruction at the center of the figure presents the results of the estimation of sources (obtained using the minimum-norm estimate
procedure) presented on the left map without the use of fMRI priors. The current-density reconstruction on the right of the figure presents the cortical activations recovered by the use of fMRI priors in agreement with Equation 11.27. (b) Distributions of the current density estimated with the linear-inverse approaches from the potential distribution relative to the movement preparation, about 200 msec before a right middle finger extension. The distributions are represented on the realistic subject's head volume conductor model. (Left) Scalp potential distribution recorded 200 msec before movement execution. (Center) Cortical estimate obtained without the use of fMRI constraints, based on the minimum-norm solutions. (Right) Cortical estimate obtained with the use of fMRI constraints based on Equation 11.27.

priors. The current-density reconstruction on the right of the figure presents the cortical activations recovered by the use of fMRI priors in agreement with Equation 11.27. Figure 11.8(b) illustrates the cortical distributions of the current density (estimated with the linear-inverse approaches from the potential distribution relative to the movement preparation) about 200 msec before the extension of the right middle finger. This approach used no fMRI constraints as well as the fMRI constraints based on Equations 11.27 and 11.28. The left of Figure 11.8(b) shows the topographic map of the readiness potential distribution recorded at the scalp about 200 msec before extension of the right middle finger for another of the subjects analyzed. Note the extension of the maximum of the negative scalp potential distribution, roughly overlying the frontal and centroparietal areas contralateral to the movement.
FIGURE 11.9 (a) Amplitude gray-scale three-dimensional maps obtained by linear-inverse estimates from high-resolution electroencephalographic (HREEG) and combined functional magnetic resonance imaging (fMRI)-HREEG data computed from a subject about 50 msec before (readiness potential peak, RPp) and 20 msec after (motor potential peak, MPp) the onset of the electromyographic activity associated with self-paced movements of the right middle finger. The percent gray scale of the HREEG and combined fMRI-HREEG data is normalized with reference to the maximum amplitude calculated for each map. Maximum negativity (−100%) is coded in white and maximum positivity (+100%) in black. (b) Estimation of the current-density waveforms in regions of interest (ROIs) coincident with the Brodmann areas. The waveforms estimated are relative to the estimation
performed with the use of information from hemodynamic measurements.

The cortical distributions are represented on the realistic subject's head volume conductor model in the center and on the right of Figure 11.8(b). The linear-inverse solution obtained with the fMRI priors presents more localized spots of activation than those obtained without fMRI priors. Remarkably, the spots of activation were localized in the hand region of the primary somatosensory (postcentral) and motor (precentral) areas contralateral to the movement. In addition, spots of minor activation were observed in the frontocentral medial areas (including the supplementary motor area) and in the primary somatosensory and motor areas of the ipsilateral hemisphere. Figure 11.9 provides another example of multimodal integration between EEG and fMRI, related to a simple voluntary movement task, using only the hemodynamic information relative to the strength of the fMRI data (according to Equation 11.27). Figure 11.9(a) shows the amplitude gray-scale maps of linear-inverse source estimates from EEG (first column) and combined fMRI-EEG (second column) data. The estimates were computed about 50 msec before (readiness potential peak, RPp; first row) and 20 msec after (motor potential peak, MPp; second row) the onset of the electromyographic response to voluntary right-finger movements. The linear-inverse estimates of neural activity were mapped over the cortical compartment of a realistic MRI-constructed subject's head model. The RPp map (first row) presents maximum responses in the contralateral M1 and S1 and in the modeled SMA. Activation is stronger in the proximity of movement onset (MPp maps, second row). With respect to the high-resolution EEG solutions (left column), the fMRI-EEG solutions present more circumscribed M1, S1, and SMA responses (second column). In addition, the contralateral M1 and S1 responses have similar intensity and are spatially dissociated.
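The effect of fMRI priors on a linear-inverse estimate can be sketched numerically. The snippet below is a minimal illustration of the general idea, not the chapter's exact Equation 11.27: it assumes a random lead-field matrix, and it encodes the hemodynamic prior as a diagonal source-weighting matrix that boosts the fMRI-active sources in a weighted minimum-norm solution. All names, sizes, and the boost factor are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_sensors, n_sources = 32, 200
L = rng.standard_normal((n_sensors, n_sources))  # hypothetical lead field

# True activity confined to a small patch of "fMRI-active" sources
active = np.array([10, 11, 12])
j_true = np.zeros(n_sources)
j_true[active] = 1.0
b = L @ j_true + 0.05 * rng.standard_normal(n_sensors)  # simulated scalp data

def weighted_min_norm(L, b, w, lam=1e-2):
    """Weighted minimum-norm estimate: j = W L^T (L W L^T + lam*s*I)^{-1} b."""
    W = np.diag(w)
    G = L @ W @ L.T
    reg = lam * np.trace(G) / len(G)  # regularization scaled to the data
    return W @ L.T @ np.linalg.solve(G + reg * np.eye(len(G)), b)

# Without priors: uniform weights. With priors: boost fMRI-active sources.
w_flat = np.ones(n_sources)
w_fmri = np.ones(n_sources)
w_fmri[active] = 10.0  # assumed strength-of-activation weighting

j_flat = weighted_min_norm(L, b, w_flat)
j_prior = weighted_min_norm(L, b, w_fmri)

def frac_in_patch(j):
    # Fraction of estimated energy falling on the truly active patch
    return np.sum(j[active] ** 2) / np.sum(j ** 2)

print(frac_in_patch(j_flat), frac_in_patch(j_prior))
```

In this toy setting the prior-informed estimate concentrates more of its energy on the truly active patch, mirroring the "more localized spots of activation" described above.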
Figure 11.9(b) shows the cortical distribution of the current density estimated with the linear-inverse approach (from the potential distribution of the movement-related potentials) with the inclusion of the fMRI priors. Also presented are the current-density waveforms relative to the average values of the estimated activations over the course of the task. Note that the cortical activity relative to the Brodmann areas is estimated here with just the use of noninvasive electrophysiological and hemodynamic measurement procedures.

11.4 DISCUSSION

11.4.1 SIMULATION RESULTS FOR SEM

The experimental design adopted for the present simulation study was chosen with the aim of analyzing the most common situations in which the proposed application of the SEM technique to high-resolution EEG data might take place. The levels chosen for the main factors SNR and LENGTH covered the most typical situations that can occur in this analysis. The obtained results indicate a clear influence of the different levels of the main factors SNR and LENGTH on the efficacy of the estimation of cortical connectivity via SEM. In short:
1. A variable SNR level imposed on the high-resolution EEG data significantly influenced the accuracy of the connectivity pattern estimation. In particular, SNR = 3 appeared sufficient to obtain good accuracy, as there were no significant differences in performance at higher SNR values.

2. A usable accuracy in the estimation of connectivity between cortical areas was achieved with a minimum of 190 sec of EEG registration (equivalent, for instance, to 380 trials of 500 msec each). However, an increase in the length of the available EEG data is always related to a decrease in the connectivity estimation error.

It might be wondered how the present findings, obtained by using several levels of cortical SNR, can be extended to the SNR of the scalp-recorded EEG data. In general, there is a difference between the imposed SNR at the cortical level and that observed at the scalp level. This difference is due to errors in the procedure for estimating the cortical activity. These errors, already described in simulation studies in the literature [62, 66, 67], can be treated as an additional source of noise in the propagation from the cortex to the scalp. Such simulations indicated that, for high-resolution EEG studies with a realistic head-model tessellation ranging from 3000 to 5000 dipoles, the relative errors in the cortical estimation are less than 10%. Hence, we can treat this 10% error in the cortical estimate due to the inversion process as an additional noise source. Under this hypothesis, the cortical SNR can hardly be higher than 10, even if the scalp SNR is very high, due to the inversion error introduced by the use of Equation 11.25. On the other hand, when the scalp SNR is much lower than 10, the contribution of the inversion error vanishes. In the intermediate cases, the cortical SNR is only slightly lower than the scalp SNR; a scalp SNR equal to 3, for instance, would yield a cortical SNR equal to 2.3.
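The figures quoted above are reproduced by treating the ~10% inversion error as an independent noise source with an effective SNR of 10 and adding the noise terms at the amplitude level. This combination rule is our reading of the text, not a formula given explicitly in the chapter:

```python
# Combine the scalp SNR with the ~10% inversion error, treated as an
# additional noise source of effective SNR = 10 (assumed combination rule:
# noise contributions add at the amplitude level).
def cortical_snr(scalp_snr, inversion_snr=10.0):
    return 1.0 / (1.0 / scalp_snr + 1.0 / inversion_snr)

print(round(cortical_snr(3), 2))    # 2.31, the value quoted in the text
print(round(cortical_snr(1e6), 2))  # 10.0: capped by the inversion error
print(round(cortical_snr(0.5), 2))  # 0.48: inversion error contribution vanishes
```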
It is worth noting that these SNR conditions are generally obtained in many standard EEG recordings of event-related activity in humans, usually characterized by SNR values ranging from 3 (movement-related potentials) to 10 (sensory-evoked potentials) and a total recording length starting from 50 sec [68]. The results obtained with the SEM technique seem to indicate an opportunity to use connectivity models that are not too detailed, in terms of cortical areas involved, as a first step of the network modeling. By using a coarse model of the cortical network to be fitted to the EEG data, the statistical power increases and the possibility of generating an error on a single arc of the network decreases [69]. In the present human study, this observation was taken into account by selecting a coarse model for the brain areas subserving the task being analyzed. This simplified model does not take complete account of all the possible regions engaged in the task, or of all the possible connections between them. However, elaborate models that permit cyclical connections between regions can become computationally unstable [19].
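For a recursive (acyclic) structural model with uncorrelated errors, the SEM path coefficients can be recovered equation by equation with ordinary least squares. The following toy three-area network is a minimal sketch of this idea; the area labels, coefficients, and noise levels are invented for illustration, and full SEM packages such as LISREL instead fit the model-implied covariance matrix by maximum likelihood:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4800

# Toy recursive model over three hypothetical cortical areas:
# y1 -> y2 (0.7), y1 -> y3 (0.3), y2 -> y3 (0.5); coefficients are invented
y1 = rng.standard_normal(n)
y2 = 0.7 * y1 + 0.5 * rng.standard_normal(n)
y3 = 0.3 * y1 + 0.5 * y2 + 0.5 * rng.standard_normal(n)

# For a recursive (acyclic) model with uncorrelated errors, each structural
# equation can be estimated separately by ordinary least squares
est_a21 = np.linalg.lstsq(y1[:, None], y2, rcond=None)[0][0]
est_a31, est_a32 = np.linalg.lstsq(np.column_stack([y1, y2]), y3, rcond=None)[0]

print(est_a21, est_a31, est_a32)  # close to the imposed 0.7, 0.3, 0.5
```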
11.4.2 SIMULATION RESULTS FOR DTP

In this study, we proposed applying the DTP method [36, 37] to cortical data whose estimation was performed with the linear-inverse problem solution, based on realistic models of the head as a volume conductor and on high-resolution EEG recordings.
This approach was meant to overcome the principal limits of other methods already utilized for determining brain connectivity. A series of simulations was performed to evaluate the use of the DTP technique on test signals generated to simulate the average electrical activity of cerebral cortical regions, gathered under different conditions of noise and length of the high-resolution EEG recordings. The values used for the strength coefficients in the simulations are consistent with those estimated in previous studies for a large sample of subjects performing memory, motor, and sensory tasks [7]. Our findings indicated a clear influence of the different levels of the main factors SNR and LENGTH on the efficacy of the estimation of cortical connectivity via DTP. In particular, it was noted that an SNR equal to 3 and a LENGTH of the estimated cortical data of 75 sec at 64 Hz (4800 data samples) were necessary to decrease significantly the errors on the quality indexes adopted. These conditions are generally obtained in many standard EEG recordings of event-related activity in humans, usually characterized by SNR values ranging from 3 (movement-related potentials) to 10 (sensory-evoked potentials) [68]. The information obtained from the simulation study was used to evaluate the applicability of this method to actual event-related recordings. The gathered ERP signals related to the analyzed finger-tapping data showed an SNR between 3 and 5. Furthermore, the total recording length of the gathered ERP data was obtained from 600 trials of 600-msec length. Therefore, according to the simulation results, we applied the DTP method to the estimated cortical current-density data expecting a limited amount of error in the estimation of cortical connectivity patterns.
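The directed-transfer-function computation underlying these simulations can be sketched on synthetic data of the length discussed above (4800 samples at 64 Hz). This is a minimal two-channel, order-1 MVAR illustration with invented coupling coefficients, not the chapter's actual simulation setup:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two simulated channels where channel 0 drives channel 1 with a one-sample
# lag (the coupling strength 0.5 is invented for illustration)
n = 4800  # 75 sec at 64 Hz, the minimum length suggested by the simulations
x = np.zeros((n, 2))
for t in range(1, n):
    x[t, 0] = 0.6 * x[t - 1, 0] + rng.standard_normal()
    x[t, 1] = 0.6 * x[t - 1, 1] + 0.5 * x[t - 1, 0] + rng.standard_normal()

# Fit an order-1 MVAR model by least squares: x[t] = A x[t-1] + e[t]
A = np.linalg.lstsq(x[:-1], x[1:], rcond=None)[0].T

def dtf(A, f, fs=64.0):
    """Normalized DTF at frequency f: |H_ij| / sqrt(sum_m |H_im|^2)."""
    H = np.linalg.inv(np.eye(2) - A * np.exp(-2j * np.pi * f / fs))
    return np.abs(H) / np.sqrt((np.abs(H) ** 2).sum(axis=1, keepdims=True))

D = dtf(A, f=10.0)
print(D[1, 0], D[0, 1])  # inflow from 0 to 1 dominates; 1 to 0 is near zero
```

The asymmetry of the estimated matrix (D[1, 0] large, D[0, 1] near zero) is the directional information that the next paragraph contrasts with nondirectional measures such as coherence.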
The use of DTP to assess cortical connectivity is an interesting procedure because it provides directional information, i.e., it allows establishing the direction of the information flow between two particular cortical areas. This information is not generally available with several other techniques used to assess coupling between signals, such as, for instance, standard coherence. The evaluation of several methods for the computation of the functional connectivity between coupled EEG/MEG signals was recently performed [70]. It was concluded that nonlinear methods such as mutual information, nonlinear correlation, and generalized synchronization [28, 71–73] might be preferred when studying EEG broadband signals, which are sensitive to dynamic coupling and nonlinear interactions expressed over many frequencies. However, linear measures are still very useful because they afford a rapid and straightforward characterization of functional connectivity.

11.4.3 APPLICATION OF CONNECTIVITY ESTIMATION METHODS TO REAL EEG DATA

In the case where the SEM methodology was applied to the recorded high-resolution EEG data, our model of interactions between cortical areas was based on previous results obtained with fMRI on similar tasks. This model is sufficient to address some key questions regarding the influence of the premotor and motor areas on the prefrontal cortical areas during the task analyzed. The finger-tapping data analyzed here present a high SNR and a large number of trials, resulting in an extended record of ERP data. Hence, the present simulation results suggest optimal performance of the SEM method as applied to the human ERP potentials.
The connectivity pattern estimated via SEM reveals the potential of the employed methodology, which combines high-resolution EEG recordings, the generation of a realistic head model from sequential MRIs, and the estimation of cortical activity via the solution of the linear-inverse problem. With this methodology, it is possible not only to detect which cortical areas are activated during a particular (motor) task, but also how these areas are effectively interconnected in subserving that task. In particular, an influence of the parietal area on the premotor cortical areas was observed during task preparation, consistent with the role that the parietal areas play in the engagement of attentive resources as well as in temporization, as assessed by several electrophysiological studies on primates and hemodynamic studies on humans [74]. It is of interest that there is a shift in which cortical area behaves as the most relevant origin of functional links when the somatosensory reafferences arrive from the periphery at the cortex. In fact, the left sensorimotor area becomes very active with respect to the left parietal one, which is mainly active in the time period preceding the finger movement. Connections between the sensorimotor area and the premotor and left prefrontal areas are appropriate for distributing the information related to the movement of the finger to the higher functional regions (prefrontal and premotor). From a physiological point of view, the results obtained by estimating the connectivity patterns with DTP are consistent with, and complement, results already present in the literature on simple finger movements obtained with neuroelectric measurements. A study employing ERP measurements from scalp electrodes and the assessment of connectivity with nondirectional coherence methods underlined the role of the primary sensorimotor and supplementary motor areas in the processing of movements [75].
The connectivity patterns depicted in the premotor and prefrontal ROIs analyzed here are in agreement with earlier hypotheses formulated in the literature [76–78]. The aforementioned studies suggested that the dorsolateral and ventral premotor cortices are specifically activated by movements guided by sensory information, as opposed to movements carried out with no sensory control. The activity noted in the parietal areas (BA 5) in the present study can be associated with the role that this area plays in the somatosensory-motor integration underlying the performance of movement. In fact, it has been hypothesized that this area can be regarded as a higher-order somatosensory zone devoted to the analysis of proprioceptive information from joints for appropriate motor control [79].

11.4.4 MULTIMODAL INTEGRATION OF EEG AND fMRI DATA

This chapter reviewed a mathematical framework for the integration of EEG and fMRI data. In general, there is a rather large consensus on the need for and utility of the multimodal integration of metabolic, hemodynamic, and neuroelectrical data. Results reviewed in the literature, as well as those presented here, suggest a real improvement in the spatial detail of the estimated neural sources from performing multimodal integration. However, while a precise electromagnetic theory exists for the multimodal integration of EEG and magnetoencephalographic (MEG) data, a clear mathematical and physiological link between metabolic demands and the firing rates of neurons is still lacking. When this link is further clarified, the modeling of the interaction between hemodynamic and neural
firing rate will undoubtedly be further refined. This will lead to a better characterization of the issues of visible and invisible sources that, at the moment, represent the major concern about the applicability of multimodal integration techniques [80]. The results for the multimodal integration of EEG/MEG and fMRI presented in this chapter are in line with those regarding the coupling between cortical electrical activity and hemodynamic measures, as indicated by a direct comparison of maps obtained using voltage-sensitive dyes (which reflect depolarization of neuronal membranes in superficial cortical layers) and maps derived from intrinsic optical signals (which reflect changes in light absorption due to changes in blood volume and oxygen consumption) [81]. Furthermore, previous studies on animals have also shown a strong correlation between local field potentials, spiking activity, and voltage-sensitive dye signals [82]. Finally, studies in humans comparing the localization of functional activity by invasive electrical recordings and fMRI have provided evidence of a correlation between the local electrophysiological and hemodynamic responses [4]. This link was investigated in a recent study [83] in which intracortical recordings of neural signals and simultaneous fMRI signals were acquired in monkeys, and comparisons were made between the local field potentials, the multiunit spiking activity, and the BOLD signals in the visual cortex. The study supports the link between the local field potentials and the BOLD mechanism, which is at the basis of the procedure for the multimodal integration of EEG/MEG with fMRI described in this chapter. This suggests that the local fMRI responses can be reliably used to bias the estimation of the electrical activity in the regions showing a prominent hemodynamic response.
It could be argued that combined EEG-fMRI responses might be less reliable for the modeling of cortical activation in the case of a spatial mismatch between the electrical and hemodynamic responses. However, previous studies have suggested that, by using the fMRI data as a partial constraint in the linear-inverse procedure, it is possible to obtain accurate source estimates of electrical activity even in the presence of some spatial mismatch between the generators of the EEG data and the fMRI signals [60, 62]. Furthermore, it is questionable whether the level of bias for the hemodynamic constraints in the linear-inverse estimation is the same for the diag-fMRI and corr-fMRI approaches. This issue seems worthy of a specific simulation study using indexes from the literature capable of assessing the quality of linear-inverse solutions [39, 40].

11.5 CONCLUSIONS

Taken together, our findings suggest that an accurate estimation of cortical connectivity patterns can be achieved by using realistic models for the head and cortical surfaces, high-resolution EEG recordings, and estimates of effective and functional cortical connectivity obtained with the SEM and DTP methods. The simulation findings suggest that, in conditions largely met in ERP recordings (SNR of at least 3 and a recording length above 75 sec at 64 Hz, i.e., 4800 data samples), the computation of functional connectivity by SEM or DTF can be performed with moderate quantitative errors. The use of high-resolution EEG recordings and the estimation of the cortical waveforms in ROIs via the solution of the linear-inverse problem facilitate evaluation of the functional
cortical connectivity patterns related to the task performed. These computational tools (high-resolution EEG, estimation of cortical activity via the linear-inverse problem, SEM, and DTF) could be of interest in assessing time-varying functional connectivity patterns from noninvasive EEG recordings in humans. Such procedures could be integrated with the information coming from hemodynamic measurements (such as fMRI), as it has been demonstrated that the inclusion of the fMRI priors improves the estimation of the sources of cortical activity. In conclusion, we have presented here an integrated approach to estimating brain cortical connectivity by using noninvasive methodologies involving the multimodal integration of electrophysiological and hemodynamic measurements. These methodologies enable us to detect the level of statistical significance of the estimated cortical activations in the selected ROIs, and to follow the time-varying pattern of connectivity that develops during simple motor tasks in humans. This body of methodologies could be suitable for the analysis of simple as well as complex movements or cognitive tasks in humans.

REFERENCES

1. Nunez, P., Electric Fields of the Brain, Oxford University Press, New York, 1981.
2. Magistretti, P.J. et al., Energy on demand, Science, 283, 496–497, 1999.
3. Grinvald, A. et al., Functional architecture of cortex revealed by optical imaging of intrinsic signals, Nature, 324, 361–364, 1986.
4. Puce, A. et al., Comparison of cortical activation evoked by faces measured by intracranial field potentials and functional MRI: two case studies, Hum. Brain Mapping, 5, 298–305, 1997.
5. Lee, L., Harrison, L.M., and Mechelli, A., A report of the functional connectivity workshop, Dusseldorf 2002, Neuroimage, 19, 457–465, 2003.
6. Horwitz, B., The elusive concept of brain connectivity, Neuroimage, 19, 466–470, 2003.
7. Buchel, C.
and Friston, K.J., Modulation of connectivity in visual pathways by attention: cortical interactions evaluated with structural equation modelling and fMRI, Cereb. Cortex, 7, 768–778, 1997.
8. Gevins, A.S. et al., Event-related covariances during a bimanual visuomotor task: II, Preparation and feedback, Electroencephalogr. Clin. Neurophysiol., 74, 147–160, 1989.
9. Urbano, A. et al., Dynamic functional coupling of high-resolution EEG potentials related to unilateral internally triggered one-digit movements, Electroencephalogr. Clin. Neurophysiol., 106, 477–487, 1998.
10. Brovelli, A. et al., Medium-range oscillatory network and the 20-Hz sensorimotor induced potential, Neuroimage, 16, 130–141, 2002.
11. Taniguchi, M. et al., Movement-related desynchronization of the cerebral cortex studied with spatially filtered magnetoencephalography, Neuroimage, 12, 298–306, 2000.
12. Friston, K.J., Functional and effective connectivity in neuroimaging: a synthesis, Hum. Brain Mapping, 2, 56–78, 1994.
13. Gerloff, C. et al., Functional coupling and regional activation of human cortical motor areas during simple, internally paced and externally paced finger movements, Brain, 121, 1513–1531, 1998.
14. Gevins, A.S. et al., Event-related covariances during a bimanual visuomotor task: II, Preparation and feedback, Electroencephalogr. Clin. Neurophysiol., 74, 147–160, 1989.
15. Jancke, L. et al., Cortical activations during paced finger-tapping applying visual and auditory pacing stimuli, Brain Res. Cognit. Brain Res., 10, 51–66, 2000.
16. Urbano, A. et al., Dynamic functional coupling of high-resolution EEG potentials related to unilateral internally triggered one-digit movements, Electroencephalogr. Clin. Neurophysiol., 106, 477–487, 1998.
17. Bollen, K.A., Structural Equations with Latent Variables, John Wiley & Sons, New York, 1989.
18. Schlosser, R. et al., Altered effective connectivity during working memory performance in schizophrenia: a study with fMRI and structural equation modeling, Neuroimage, 19, 751–763, 2003.
19. McIntosh, A.R. and Gonzalez-Lima, F., Structural equation modelling and its application to network analysis in functional brain imaging, Hum. Brain Mapping, 2, 2–22, 1994.
20. Gevins, A.S. et al., Event-related covariances during a bimanual visuomotor task: II, Preparation and feedback, Electroencephalogr. Clin. Neurophysiol., 74, 147–160, 1989.
21. Nunez, P.L., Neocortical Dynamics and Human EEG Rhythms, Oxford University Press, New York, 1995.
22. Babiloni, F. et al., Multimodal integration of high-resolution EEG and functional magnetic resonance imaging data: a simulation study, Neuroimage, 19, 1–15, 2003.
23. Babiloni, F. et al., High-resolution electroencephalogram: source estimates of Laplacian-transformed somatosensory-evoked potentials using a realistic subject head model constructed from magnetic resonance images, Med. Biol. Eng. Comput., 38, 512–519, 2000.
24. Clifford Carter, G., Coherence and time delay estimation, Proc. IEEE, 75, 236–255, 1987.
25. Gevins, A.S. et al., Event-related covariances during a bimanual visuomotor task: II, Preparation and feedback, Electroencephalogr. Clin. Neurophysiol., 74, 147–160, 1989.
26. Urbano, A. et al., Responses of human primary sensorimotor and supplementary motor areas to internally triggered unilateral and simultaneous bilateral one-digit movements: a high-resolution EEG study, Eur. J. Neurosci., 10, 765–770, 1998.
27. Inouye, T. et al., Inter-site EEG relationships before widespread epileptiform discharges, Int. J.
Neurosci., 82, 143–153, 1995.
28. Stam, C.J. and van Dyk, B.W., Synchronization likelihood: an unbiased measure of generalized synchronization in multivariate data sets, Physica D, 163, 236–251, 2002.
29. Stam, C.J. et al., Nonlinear synchronization in EEG and whole-head MEG recordings of healthy subjects, Hum. Brain Mapping, 19, 63–78, 2003.
30. Tononi, G., Sporns, O., and Edelman, G.M., A measure for brain complexity: relating functional segregation and integration in the nervous system, Proc. Nat. Acad. Sci. USA, 91, 5033–5037, 1994.
31. Quian Quiroga, R. et al., Performance of different synchronization measures in real data: a case study on electroencephalographic signals, Phys. Rev. E, 65, 041903–1/14, 2002.
32. Pfurtscheller, G. and Lopes da Silva, F.H., Event-related EEG/MEG synchronization and desynchronization: basic principles, Clin. Neurophysiol., 110, 1842–1857, 1999.
33. Bressler, S.L., Large-scale cortical networks and cognition, Brain Res. Brain Res. Rev., 20, 288–304, 1995.
34. Gross, J. et al., Dynamic imaging of coherent sources: studying neural interactions in the human brain, Proc. Nat. Acad. Sci. USA, 98, 694–699, 2001.
35. Gross, J. et al., Properties of MEG tomographic maps obtained with spatial filtering, Neuroimage, 19, 1329–1336, 2003.
36. Kaminski, M.J. and Blinowska, K.J., A new method of the description of the information flow in the brain structures, Biol. Cybern., 65, 203–210, 1991.
37. Kaminski, M. et al., Evaluating causal relations in neural systems: Granger causality, directed transfer function and statistical assessment of significance, Biol. Cybern., 85, 145–157, 2001.
38. Granger, C.W.J., Investigating causal relations by econometric models and cross-spectral methods, Econometrica, 37, 424–438, 1969.
39. Grave de Peralta Menendez, R. and Gonzalez Andino, S.L., Distributed source models: standard solutions and new developments, in Analysis of Neurophysiological Brain Functioning, Uhl, C., Ed., Springer-Verlag, Heidelberg, 1999.
40. Pascual-Marqui, R.D., Reply to comments by Hamalainen, Ilmoniemi and Nunez, ISBET Newsl., 6, 16–28, December 1995.
41. Babiloni, F. et al., Spatial enhancement of EEG data by surface Laplacian estimation: the use of magnetic resonance imaging-based head models, Clin. Neurophysiol., 112, 724–727, 2001.
42. Bonmassar, G. et al., Spatiotemporal brain imaging of visual-evoked activity using interleaved EEG and fMRI recordings, Neuroimage, 13, 1035–1043, 2001.
43. Malonek, D. and Grinvald, A., Interactions between electrical activity and cortical microcirculation revealed by imaging spectroscopy: implications for functional brain mapping, Science, 272, 551–554, 1996.
44. Kim, D.S., Duong, T.Q., and Kim, S.G., High-resolution mapping of iso-orientation columns by fMRI, Nat. Neurosci., 3, 164–169, 2000.
45. Rosen, B.R., Buckner, R.L., and Dale, A.M., Event-related functional MRI: past, present, and future, Proc. Nat. Acad. Sci. USA, 95, 773–780, 1998.
46. Jöreskog, K. and Sörbom, D., LISREL 8.53, software, December 2002, Scientific Software International, Inc., Lincolnwood, IL. Available online at http://www.ssicentral.com/.
47. Stam, C.J. et al., Dynamics of the human alpha rhythm: evidence for non-linearity? Clin. Neurophysiol., 110, 1801–1813, 1999.
48. Kay, S.M., Modern Spectral Estimation: Theory and Application, Prentice Hall, Englewood Cliffs, NJ, 1988.
49. Marple, S.L., Digital Spectral Analysis with Applications, Prentice Hall, Englewood Cliffs, NJ, 1987.
50. Schlogl, A., Comparison of Multivariate Autoregressive Estimators, 2003. Available online at http://www.dpmi.tugraz.ac.at/~schloegl/publications/TR_MVARcomp201.pdf.
51. Jancke, L.
et al., Cortical activations during paced finger-tapping applying visual and auditory pacing stimuli, Brain Res. Cognit. Brain Res., 10, 51–66, 2000.
52. Gevins, A.S. et al., Event-related covariances during a bimanual visuomotor task: II, Preparation and feedback, Electroencephalogr. Clin. Neurophysiol., 74, 147–160, 1989.
53. Gerloff, C. et al., Functional coupling and regional activation of human cortical motor areas during simple, internally paced and externally paced finger movements, Brain, 121, 1513–1531, 1998.
54. Moretti, D.V. et al., Computerized processing of EEG-EOG-EMG artifacts for multicentric studies in EEG oscillations and event-related potentials, Int. J. Psychophysiol., 47, 199–216, 2003.
55. Theiler, J. et al., Testing for nonlinearity in time series: the method of surrogate data, Physica D, 58, 77–94, 1992.
56. Hansen, P.C., Numerical tools for the analysis and solution of Fredholm integral equations of the first kind, Inverse Problems, 8, 849–872, 1992.
57. Hansen, P.C., Analysis of discrete ill-posed problems by means of the L-curve, SIAM Rev., 34, 561–580, 1992.
58. Kim, S.G. et al., Functional magnetic resonance imaging of motor cortex: hemispheric asymmetry and handedness, Science, 261, 615–617, 1993.
59. Babiloni, F. et al., Cortical source estimate of combined high resolution EEG and fMRI data related to voluntary movements, Methods Inf. Med., 41, 443–450, 2002.
60. Liu, A.K., Belliveau, J.W., and Dale, A.M., Spatiotemporal imaging of human brain activity using functional MRI constrained magnetoencephalography data: Monte Carlo simulations, Proc. Nat. Acad. Sci. USA, 95, 8945–8950, 1998.
61. Dale, A.M. et al., Dynamic statistical parametric mapping: combining fMRI and MEG for high-resolution imaging of cortical activity, Neuron, 26, 55–67, 2000.
Estimation of human cortical connectivity
465
62. Liu, A.K., Spatiotemporal brain imaging, Ph.D. Dissertation, Massachusetts Institute of Technology, Cambridge, MA, 2000. 63. Gerloff, C. et al., Functional coupling and regional activation of human cortical motor areas during simple, internally paced and externally paced finger movements, Brain, 121, 1513–1531, 1998. 64. Jancke, L. et al., Cortical activations during paced finger-tapping applying visual and auditory pacing stimuli, Brain Res. Cognit. Brain Res., 10, 51–66, 2000. 65. Babiloni, C., Babiloni, F, Carducci, F, Cincotti, F, Cocozza, G., Del Percio, C., Moretti, D.V., Rossini, P.M., Human cortical electroencephalography (EEG) rhythms during the observation of simple aimless movements: a high-resolution EEG study, Neuroimage, 2002 Oct; 17(2): 559– 572 66. Babiloni, F. et al., Multimodal integration of EEG and MEG data: a simulation study with variable signal-to-noise ratio and number of sensors, Hum. Brain Mapping, 22, 52–62, 2004. 67. Liu, A.K., Dale, A.M., and Belliveau, J.W., Monte Carlo simulation studies of EEG and MEG localization accuracy, Hum. Brain Mapping, 16, 47–62, 2002. 68. Regan, D., Evoked Potentials and Evoked Magnetic Fields in Science and Medicine, Elsevier Press, New York, 1989. 69. Horwitz, B., The elusive concept of brain connectivity, Neuroimage, 19, 466–470, 2003. 70. David, O., Cosmelli, D., and Friston, K.J., Evaluation of different measures of functional connectivity using a neural mass model, Neuroimage, 21, 659–673, 2004. 71. Roulston, M.S., Estimating the errors on measured entropy and mutual information, Physica D, 125, 285–294, 1999. 72. Pijn, J.P., Velis, D.N., and Lopes da Silva, F.H., Measurement of interhemispheric time differences in generalised spike-and-wave, Electroencephalogr. Clin. Neurophysiol, 83, 169– 171, 1992. 73. Stam, C.J. et al., Nonlinear synchronization in EEG and whole-head MEG recordings of healthy subjects, Hum. Brain Mapping, 19, 63–78, 2003. 74. Culham, J.C. 
and Kanwisher, N.G., Neuroimaging of cognitive functions in human parietal cortex, Curr. Opin. Neurobiol, 11, 157–163, 2001. 75. Gerloff, C. et al., Functional coupling and regional activation of human cortical motor areas during simple, internally paced and externally paced finger movements, Brain, 121, 1513–1531, 1998. 76. Sekihara, K. and Scholz, B., Generalized Wiener estimation of three-dimensional current distribution from biomagnetic measurements, IEEE Trans. Biomed. Eng., 43, 281–291, 1996. 77. Classen, J. et al., Integrative visuomotor behavior is associated with interregionally coherent oscillation in the human brain, J. Neurophysiol, 3, 567–573, 1998. 78. Rothwell, J.C. et al., Stimulation of the human motor cortex through the scalp, Exp. Physiol, 76, 159–200, 1991. 79. Rizzolatti, G., Luppino, G., and Matelli, M., The organization of the cortical motor system: new concepts, Electroencephalogr. Clin. Neurophysiol, 106, 283–296, 1998. 80. Nunez, P.L. and Silberstein, R.B., On the relationship of synaptic activity to macroscopic measurements: does co-registration of EEG with fMRI make sense? Brain Topogr., 13, 79–96, 2000. 81. Shoham, D. et al., Imaging cortical dynamics at high spatial and temporal resolution with novel blue voltage-sensitive dyes, Neuron, 24, 791–802, 1999. 82. Arieli, A. et al., Dynamics of ongoing activity: explanation of the large variability in evoked cortical responses, Science, 273, 1868–1871, 1996. 83. Logothetis, N.K. et al., Neurophysiological investigation of the basis of the fMRI signal, Nature, 412, 150–157, 2001.
12 Evaluation Strategies for Medical-Image Analysis and Processing Methodologies

Maria Kallergi

12.1 INTRODUCTION

Image-processing and pattern-recognition methodologies have found a variety of applications in medical imaging and diagnostic radiology. Medical-image processing has been an area of intensive research over the last two decades, with remarkable results. A variety of classical methodologies from the signal-processing and pattern-recognition domains, together with new ones, have been implemented and tested for diverse applications. Based on their output, the various approaches can be assigned to one of the three groups shown in the block diagram in Figure 12.1:

Image analysis can be defined as the process where the input to an operator is an image and the output is a measurement. This group includes such processes as automated detection and diagnosis of disease, organ area and volume segmentation, size measurements, and risk estimates [1–6].

Image processing can be defined as the process where the input to an operator is an image and the output is another image with similar content to the original but different in appearance. This group includes such processes as image enhancement, restoration, compression, registration, and reconstruction [7–10].

Image understanding can be defined as the process where the input to an operator is an image and the output is a different level of description, such as transforms and pixel mappings [11].

Depending on the goal of the application, the operator in Figure 12.1 could be a signal-processing algorithm, a pattern-recognition algorithm, a contrast-enhancement or noise-reduction function, a transformation, a mathematical measurement, or a combination of these. The most extensive and successful development so far has
Evaluation strategies for medical-image analysis
467
FIGURE 12.1 Block diagram of the various medical-image processes. Depending on the operator type, the output may be an image, a measurement, or a transformation.

occurred in the fields of computer-aided detection (CAD detection) and computer-aided diagnosis (CAD diagnosis), i.e., in the image-analysis field, with image enhancement following closely behind. CAD detection is now a clinical reality for breast and lung cancer imaging. Several commercial systems are now available for breast cancer imaging with screen/film mammography (SFM) or full-field direct digital mammography (FFDM) [2]. Similar systems are currently in beta-testing stages for lung cancer imaging using computed radiography, standard chest radiography, or computed tomography (CT). CAD detection usually refers to the process where areas suspicious for disease are automatically detected on medical images and their locations are pointed out to the observer for further review [1, 2]. In addition to pointing out the location of a potential abnormality, CAD detection algorithms may include a segmentation step, namely a process where the margins of the detected lesion (such as lung nodules in lung cancer images, or calcifications and masses in mammograms) are outlined, and the outline may be presented to the reader rather than merely a pointer to the lesion's location [12]. CAD diagnosis differs from CAD detection in that the detected lesions (found either by the observer or by the computer) are differentiated (classified) into groups of disease and nondisease lesions [13, 14]. In this chapter, following historical precedent, the plain term CAD will be used to refer to both technologies, i.e., both detection and diagnosis algorithms, but we will add a detection or diagnosis qualifier where a specific and unique reference is required.
As new medical-image analysis and processing tools become available and new versions of existing algorithms appear on the market, the validation of the new and updated methodologies remains a critical issue with ever-increasing complications and needs. The general goal of validation is twofold: (a) to ensure the best possible performance (efficacy) of each step of the process outlined in Figure 12.1, so that it yields optimum output, and (b) to determine the real-world impact of the entire process
Medical image analysis method
468
(effectiveness) [15]. The first goal is usually achieved in the laboratory with retrospective patient data of proven pathology and disease status and various statistical analysis tools that do not involve human observers or experts. The second goal usually requires the execution of clinical trials that involve experts and, usually, prospective patient data. Clinical studies are, in most medical applications, inevitable and are the gold standard in medical technology validation. However, the laboratory or nonobserver studies that precede them are critical in establishing the optimum technique that will be tested by the observers, so that no funds, time, or effort are wasted [15, 16]. Furthermore, laboratory tests are sufficient when validating updated versions of algorithms once the original versions have demonstrated their clinical significance. This chapter will not elaborate on the aspects of clinical trials or theoretical validation issues. Rather, it focuses on the major and practical aspects of the preclinical and clinical evaluation of diagnostic medical-image analysis and processing methodologies and computer algorithms. We will further narrow down our discussion to selected tests and performance measures that are currently recognized as the standard in the evaluation of computer algorithms that are designed to assist physicians in the interpretation of medical images. We will discuss observer vs. nonobserver tests and ROC vs. non-ROC tests and related interpretation and analysis aspects. Our goal is to provide a basic and practical guide to the methods commonly used in the validation of computer methodologies for medical imaging in an effort to improve the evaluation of these techniques, advance development, and facilitate communication within the scientific community. Section 12.2 provides a brief overview of the current validation models and designs of clinical trials.
Section 12.3 introduces the standard performance measurements and tests applicable in medical imaging. Section 12.4 summarizes the most important nonobserver validation methodologies that usually precede the observer-based validation techniques described in Section 12.5. Section 12.6 discusses practical issues in the implementation of the various validation strategies. Conclusions and new directions in validation are summarized in Section 12.7.

12.2 VALIDATION MODELS AND CLINICAL STUDY DESIGNS

Entire industry conferences are dedicated to issues of validation and clinical study design, including the annual meetings of the Medical Image Perception Society (MIPS) and the Medical Imaging Symposium of the Society of Photo-Optical Instrumentation Engineers (SPIE). At least two workshops have also been organized in the U.S. since 1998 on clinical trial issues for radiology, sponsored by the U.S. Public Health Service's Office on Women's Health, the National Cancer Institute, and the American College of Radiology. One workshop, entitled Methodological Issues in Diagnostic Clinical Trials: Health Services and Outcome Research in Radiology, was held on March 15, 1998, in Washington, DC, and participating papers were published in a dedicated supplement issue of Academic Radiology [17]. A second workshop, entitled Joint Working Group on Methodological Issues in Clinical Trials in Radiological Screening and Related Computer Modeling, was held on January 25, 1999, and yielded recommendations on various aspects of clinical trials, a summary of which can be found at http://www3.cancer.gov/bip/method_issues.pdf.
Validation models usually start with tests of the diagnostic performance of the imaging modality or computer methodology, followed by measurements of the clinical impact or efficacy of the diagnostic test on patient management and follow-up, and ending with broader clinical studies on patient health effects (morbidity and mortality) and societal impact, including cost analysis. Clinical study types are usually differentiated by the nature of the patient data used and can be categorized as: (a) observational vs. experimental, (b) cohort vs. case-control, and (c) prospective vs. retrospective. There is an extensive, in-depth bibliography on the various aspects of clinical studies, the various types, and their advantages and disadvantages [18–20]. An excellent glossary summary of the various terms encountered in clinical epidemiology and evidence-based medicine is given by Gay [21]. Fryback and Thornbury proposed a six-tiered hierarchical model of efficacy that is now embraced by the medical-imaging community involved in outcomes research and technology assessment [15, 17, 22]. Different measures of analysis are applied at the various levels of the model. Level 1 is called “technical efficacy” and corresponds to the “preclinical evaluation” stage. At this level, the technical parameters of a new system are defined and measured, including resolution and image noise measurements, pixel distribution characteristics, probability density functions, and error and standard deviation estimates [15, 22]. Clinical efficacy is measured in the next three levels of the model, with tests to determine the “diagnostic accuracy efficacy” (Level 2), the “diagnostic thinking efficacy” (Level 3), and the “therapeutic efficacy” (Level 4) [15, 22]. Levels 2 and 3 correspond to what imaging scientists often term “clinical evaluation” and include measurements of performance parameters and observer experiments that are the focus of this chapter and will be further discussed in the following subsections. Level 4 is more specific to therapy-related systems and is not within the scope of this discussion, which deals with diagnostic systems. Level 5 deals with “patient outcome efficacy” and Level 6 with “societal efficacy” [15], both beyond the scope of this review. This six-tiered model provides an excellent guide for pharmaceutical and therapy trials. Its extension to screening and diagnostic medical-imaging technologies is less straightforward due to the unique characteristics of the target population, the diversity of the applications, the observer variability, and the issues of low prevalence for several disease types, including cancer. In some cases the model appears to be noninclusive; in other cases it is not entirely applicable or is not linearly applicable. Hendee [23] suggested the expansion of the model to include a factor related to the development stage or phase of evolution of the validated technology. This may lead to a model more applicable to imaging. Another approach recommended for medical-imaging technology validation was developed by Phelps and Mushlin [23, 24]. This approach is recommended as a way to define “challenge regions” and as a preliminary step guiding the design of the more expensive and time-consuming clinical trials to test the efficacy of the technology as proposed by Fryback and Thornbury [15]. The Phelps and Mushlin model, however, seems to be limited in scope and applicability, and an expansion is necessary to accommodate a broader spectrum of imaging technologies [23]. Different clinical study designs may be applicable to Levels 2 and 3 of the Fryback and Thornbury model.
The most commonly used design is the observational, case-control, retrospective study, which can use a variety of performance measures. The current
standard for these studies in medical imaging is the receiver operating characteristic (ROC) experiment, with the corresponding measure being the ROC curve [25, 26]. ROC experiments are time-consuming and expensive. Hence, non-ROC approaches are explored and applied either as less-expensive precursors or as replacements for the more extensive and expensive ROC studies. Non-ROC studies may or may not involve observers. The selection of one method over the other depends on the application and the question to be answered. There is a vast literature on CAD development. Numerous algorithms have been reported, and the majority of reports include some type of validation that depends on the investigators' knowledge of the field but mostly on available medical and statistical resources at the time. The lack of agreement on “appropriate” methodologies leads to a lack of standard criteria and of a “how-to” guide that could significantly improve scientific communications and comparisons. Only recently do we find publications that present broader methodological issues of validation and offer some guidelines. Nishikawa [27] discusses the differences in the validation of CAD detection and CAD diagnosis methodologies and offers a good summary of the ways ROC and free-response ROC (FROC), computer- or observer-based, can
TABLE 12.1 Clinical Performance Indices

                                          Signal or Disease Present    Signal or Disease Absent
Observer or computer response positive    Hit (TP)                     False alarm (FP)
Observer or computer response negative    Miss (FN)                    Correct rejection (TN)

Source: Beytas, E.M., Debatin, J.F., and Blinder, R.A., Invest. Radiol., 27, 374, 1992. (With permission.)
be used in algorithm validation. Houn et al. [28] and Wagner et al. [29] discuss issues of ROC study design and analysis in the evaluation of breast cancer imaging technologies that are particular to U.S. Food & Drug Administration (FDA) concerns but also applicable to the broader scientific community. King et al. [30] present alternative validation approaches through observer-based non-ROC studies. This chapter follows the spirit of these latest efforts. It attempts to provide a short, practical guide through the maze of problems and methodologies associated with the validation of medical-image analysis and processing methodologies, in the form of a summary of the most critical elements of validation and the most “popular” and “recognized” methodologies in the field. The prerequisite for this chapter is that the reader be familiar with the basic theoretical concepts of ROC analysis, which plays a major role in medical-image validation studies. There is a vast literature in the field, and there are several Web sites with free ROC software and lists of related articles that the novice reader could use to become familiar with the topic [31, 32].
12.3 CLINICAL PERFORMANCE INDICES

The clinical performance of a medical test, including imaging, is usually determined by estimating indices for the true positives (TP), true negatives (TN), false positives (FP), false negatives (FN), sensitivity (SENS), specificity (SPEC), positive predictive value (PPV), negative predictive value (NPV), and accuracy. In medical imaging, the response to the question, “Is there a signal in the image or not?” or “Is there disease present in the image or not?” is given by a human observer or by a computer. The answer to these questions is often expressed in the terms presented in Table 12.1, borrowed from signal-detection theory [33]. A TP is a case that is both test positive and disease positive; “test” here represents the outcome of the observer or the computer process. A TN is a case that is both test negative and disease negative. An FP is a case that is test positive but disease negative. Such misclassifications are undesirable because they have a major impact on health-care costs and health-care delivery; these cases are equivalent to a statistical Type I error (α). An FN is a case that is test negative but disease positive. Such misclassifications are undesirable because they lead to improper patient follow-up and missed cases with disease; these cases are equivalent to a statistical Type II error (β). Sensitivity is the probability of a positive response for the cases with presence of signal or disease, and it is defined as

SENS = TP/(TP + FN)
Specificity is the probability of a negative response for the cases with absence of signal or disease, and it is defined as

SPEC = TN/(TN + FP)
Positive and negative predictive values of radiological tests are then defined as

PPV = TP/(TP + FP)
NPV = TN/(TN + FN)
PPV and NPV depend on sensitivity and specificity but are also directly related to prevalence, namely the proportion of cases in the test population with signal or disease, which is defined as

PR = (TP + FN)/(TP + FP + FN + TN)
The higher the prevalence, the higher the predictive value. Accuracy depends linearly on prevalence, and it is defined as

ACCURACY = PR × (SENS − SPEC) + SPEC
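Putting the definitions of this section together, the indices follow directly from the 2×2 counts of Table 12.1. The following is a minimal illustrative sketch (the function and variable names are ours, not from the chapter):

```python
def performance_indices(tp, fp, fn, tn):
    """Clinical performance indices from the 2x2 counts of Table 12.1."""
    sens = tp / (tp + fn)                     # sensitivity
    spec = tn / (tn + fp)                     # specificity
    ppv = tp / (tp + fp)                      # positive predictive value
    npv = tn / (tn + fn)                      # negative predictive value
    pr = (tp + fn) / (tp + fp + fn + tn)      # prevalence
    acc = pr * (sens - spec) + spec           # accuracy; equals (tp + tn)/total
    return {"SENS": sens, "SPEC": spec, "PPV": ppv, "NPV": npv,
            "PR": pr, "ACCURACY": acc}
```

For example, with tp = 40, fp = 5, fn = 10, tn = 45 (prevalence 0.5), this gives SENS = 0.8, SPEC = 0.9, and ACCURACY = 0.85, which is indeed (tp + tn)/100.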
Accuracy is equal to specificity at 0% prevalence and is equal to sensitivity at 100% prevalence. Note that for oncology applications, one needs to be a little more explicit about what can be considered a positive response, because a positive interpretation may be an interpretation that leads to the recommendation for biopsy or an interpretation where a suspicious finding is identified and further work-up is requested before biopsy is recommended. These two definitions lead to different estimates of the sensitivity, specificity, and predictive values and need to be carefully reviewed prior to the design of a validation experiment in this field.

A condition that is often considered in medical studies and causes some confusion in their design is incidence, and it is worthy of a brief discussion here. Incidence is the proportion of new cases in the test population with the signal or disease of interest. The incidence rate is a smaller number than the prevalence rate because the latter includes both old and new cases having the disease within a certain period of time (usually one year). The use of the incidence or prevalence rate to configure a study population depends on the study aims, the imaging modality, and the tested parameters. In CAD validation experiments, the incidence-vs.-prevalence dilemma may be bypassed altogether by focusing on sensitivity and specificity estimates and avoiding PPV and accuracy measurements, which depend on prevalence.

Validation of medical-image-processing schemes aims at relative or absolute estimates of one or more of the above indices of performance before and after the process is applied; sensitivity and specificity are usually the parameters most often targeted. Theoretically, one should be able to estimate these parameters accurately for any diagnostic procedure with a sufficiently large sample size. But the latter was, and continues to be, the biggest, and often insurmountable, obstacle in medical-imaging research.
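The scale of this obstacle can be seen with a simple standard-error argument, sketched below. The numbers used (baseline sensitivity 0.85, target standard error 0.05, incidence 0.5%) are illustrative assumptions matching the example discussed next; this is a back-of-the-envelope binomial calculation, not a full power analysis:

```python
def screens_needed(sensitivity, std_err, incidence):
    """Screening cases needed for a sensitivity estimate with a given
    standard error, via the binomial-proportion relation SE^2 = p(1-p)/n."""
    n_diseased = sensitivity * (1 - sensitivity) / std_err ** 2
    return n_diseased / incidence  # total screens to accrue that many cancers

n = screens_needed(0.85, 0.05, 0.005)  # on the order of 10,000 mammograms
```

About 50 diseased cases are needed to pin sensitivity down to a 5% standard error, and at 0.5% incidence that translates into roughly ten thousand screens.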
For example, a prohibitively large sample size is required to evaluate the impact of a CAD detection algorithm on mammography's sensitivity using standard statistical methods. Specifically, approximately 10,000 screening mammograms are needed to detect a change in sensitivity of 0.05 caused by the use of a CAD system, from 0.85 to 0.90, with a standard error of 5%, assuming that breast cancer incidence is 0.5% (i.e., 5 out of 1000 screened women will have breast cancer) [16]. Similar estimates are obtained for other imaging modalities and processes. Consequently, statistical methodologies such as the ROC type of tests are highly desirable because they require significantly fewer resources than classical statistical approaches, and their results can be used to determine the above performance indices. ROC curves, for example, combine SENS and (1 − SPEC) data in the same plot for different test cutoff values. Hence, the curves can be used to establish the best cutoff for a test with variable parameters. The optimum cutoff depends on the relative costs of FP and FN cases. Accuracy could also be determined by a single point on an ROC curve. However, accuracy is a composite index (it depends on prevalence) and could generate confusion, as mentioned earlier, so it is better avoided and replaced by the sensitivity and specificity indices, which are prevalence independent. In addition to the sample size, the availability of expert observers to participate in a study is often another major obstacle in the validation process. Hence, there is a need for nonobserver validation strategies that could still measure performance indices without the
experts and without large sample sizes. Computer ROC and FROC are two such methods that will be discussed in more detail in the following sections.

12.4 NONOBSERVER EVALUATION METHODOLOGIES

Nonobserver evaluation methodologies are primarily used for the optimization and validation of a computer algorithm before testing its clinical efficacy. They are the first step toward final development and provide valuable information to the researcher on the direction of the work and the likelihood of its success. These approaches are usually low-cost, easy, and fast to implement. They may not yield the higher power of the observer-based studies, but they provide sufficient information to optimize the methodology and ensure that the best technique will be tested clinically. The list of techniques presented in this section is by no means comprehensive. It includes, however, the most commonly used nonobserver methodologies and those that are accepted for the validation of medical-image analysis and processing schemes. It should be noted that measurements of the physical image-quality parameters, as in the case of image display or restoration techniques [34], and mathematical error analysis, as in the case of compression techniques [8], might also be considered nonobserver validation techniques. However, these measurements usually precede the nonobserver experiments described in this section. Physical and mathematical error analysis is specific to the algorithm and application and will not be discussed in this chapter, the only exception being error-analysis issues pertaining to the validation of image-segmentation techniques. Image segmentation holds a major role in medical-image analysis and processing and poses unique challenges in validation. In this chapter, we will give an overview of these challenges and of the options and metrics available and commonly used for segmentation validation.
12.4.1 COMPUTER ROC TEST

Computer ROC analysis is an adaptation of the standard observer ROC analysis that will be discussed in more detail in the following section [26, 35]. In this form, ROC principles are implemented for the laboratory testing of pattern-recognition and classification algorithms [27]. Classification schemes usually differentiate between two conditions, such as benign and malignant lesions, diseased and nondiseased cases, or disease type 1 and disease type 2 cases. Pairs of sensitivity and specificity indices can thus be generated by adjusting an algorithm's parameters and setting conventions on how the numbers of correctly and incorrectly classified cases are to be determined. The results are plotted as a true positive fraction (TPF) vs. false positive fraction (FPF) curve using standard ROC analysis software [32]. Figure 12.2 shows typical computer ROC curves obtained from the preclinical, computer ROC evaluation of four CAD diagnosis systems that differentiate between benign and malignant mammographic microcalcification clusters [13, 36]. The global, regional, and local metrics of the standard observer ROC analysis can also be used to quantify absolute and relative performance in computer ROC experiments. These metrics include:
The area under the curve (global performance index), which ranges from 0.5 to 1, where 0.5 corresponds to random responses (guessing) and 1 to the ideal observer [26, 27]. The curves of Figure 12.2 all have areas greater than 0.9.

The partial area under the curve (regional performance index), which is estimated at selected sensitivity or specificity thresholds, e.g., 0.9 TPF or 0.1 FPF, and provides more meaningful results in clinical applications where high sensitivity is desirable and needs to be maintained [37]. The partial sections of the curves in Figure 12.2 at a 0.9 TPF threshold are shown in Figure 12.3. There is no publicly available software today for estimating the area under these curves; however, a polygon method [25] or the method described by Jiang et al. [37] can be implemented for this purpose.
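The mechanics behind such computer ROC curves can be sketched as follows: sweep a decision cutoff over the algorithm's output scores, record the resulting (FPF, TPF) operating points, and integrate. This is an illustrative sketch, not any particular ROC package; the area is computed with a simple trapezoidal rule, and the partial-area refinements cited above are not included:

```python
def roc_points(pos_scores, neg_scores):
    """(FPF, TPF) operating points from classifier scores of diseased
    (pos) and nondiseased (neg) cases, sweeping every score as a cutoff."""
    pts = {(0.0, 0.0), (1.0, 1.0)}
    for t in set(pos_scores) | set(neg_scores):
        tpf = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpf = sum(s >= t for s in neg_scores) / len(neg_scores)
        pts.add((fpf, tpf))
    return sorted(pts)

def auc(points):
    """Trapezoidal area under the ROC curve: 0.5 ~ guessing, 1.0 ~ ideal."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

With perfectly separated scores the area is 1.0; with completely overlapping scores the two points (0, 0) and (1, 1) remain and the area falls to 0.5, the chance level.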
FIGURE 12.2 Computer ROC curves obtained from the laboratory evaluation of four CAD diagnosis schemes designed to differentiate between benign and malignant
microcalcification clusters in digitized screen/film mammography.
FIGURE 12.3 Partial curves used to estimate the partial area indices of the computer ROC data shown in Figure 12.2.

Operating points (local performance indices), i.e., selected (TPF, FPF) pairs that provide insight on the potential clinical impact and benefits of the method.

12.4.2 COMPUTER FROC TEST

Computer FROC is the laboratory adaptation of the observer FROC analysis, which will also be discussed in more detail below. Computer FROC is the method of choice for the laboratory or preclinical evaluation of CAD detection algorithms. These algorithms are usually adjusted to provide a TP rate (number of true signals correctly detected by the algorithm) and a corresponding average number of FP detections per image (total number of FP detections divided by the number of tested images) [38]. The plot of TP rate vs. the average FP signals per image gives an FROC curve. FROC curves differ from the ROC curves in the variable plotted on the x-axis of the graph, because in this case, we consider the algorithm's detection performance on an image-by-image basis. The analysis of the computer FROC data is a relatively easy and straightforward process. One critical element in the process is the conventions followed for the estimation of the numbers of true and false detections, because they significantly alter the results [38].
FIGURE 12.4 Computer FROC plots generated to compare the performance of two generations of a CAD detection algorithm for breast masses in screen/film mammography. (Data provided by Dr. Lihua Li of the Department of Radiology, University of South Florida, internal report, 2003.)

Figure 12.4 shows typical computer FROC curves generated to compare the performance of two generations of a CAD detection algorithm that performs mass detection in screen/film mammography [39]. The TP rate is plotted vs. the average number of FP signals per image. The plots allow the direct comparison of the two algorithm versions. The better the performance, the higher the curve and the closer it is to the upper left corner of the graph, where the ideal performance would be plotted, i.e., a sensitivity (TP rate) of 100% with no FP signals per image. Both computer ROC and FROC tests require a statistical analysis step at the end to determine the significance of differences between the ROC or FROC curves. Common statistical significance tests are the paired or unpaired Student's t-test when only reader variation is considered, nonparametric tests when only case-sample variation is considered, and semiparametric multivariate tests when both sources of variation are considered [40].
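The computer FROC bookkeeping described above can be sketched in the same spirit as the ROC case, except that FP detections are normalized per image rather than per case. The data layout below (one list of scored detections per image, each flagged as true or false against ground truth) is a hypothetical convention for illustration; as noted in the text, the actual scoring conventions significantly affect the results:

```python
def froc_points(detections, n_lesions):
    """detections: one list per image of (score, is_true_detection) pairs.
    Returns (average FP per image, TP rate) pairs, one per score cutoff."""
    n_images = len(detections)
    cutoffs = sorted({s for image in detections for s, _ in image})
    pts = []
    for t in cutoffs:
        tp = sum(hit for image in detections for s, hit in image if s >= t)
        fp = sum(not hit for image in detections for s, hit in image if s >= t)
        pts.append((fp / n_images, tp / n_lesions))
    return sorted(pts)
```

For instance, two images with detections [(0.9, True), (0.4, False)] and [(0.8, True)] against two true lesions trace the points (0.0, 0.5), (0.0, 1.0), and (0.5, 1.0) as the cutoff is lowered.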
12.4.3 SEGMENTATION VALIDATION TESTS

Segmentation poses its own special requirements on validation. Image segmentation is the process during which an object of interest (organ, anatomical features, tumor) is extracted from an image and an outline of its area or volume margins is generated [41]. Segmentation algorithms are usually evaluated with analytical and empirical methods. The analytical methods examine algorithm design issues and parameters. The empirical methods evaluate the quality of the segmentation output [42–44]. Udupa et al. [44] distinguish three groups of performance metrics according to whether they are used to evaluate the precision, the accuracy, or the efficiency of the segmentation process. Generally, the size, shape, area, or volume of the object are parameters commonly used to evaluate the segmented outcome. The same parameters also have clinical value because they help in diagnosis, therapy decisions, or assessment of treatment response. Here, we will only discuss major issues raised by the empirical methods, as they are the most relevant to the clinical application. The main requirement to validate a segmentation output is to know the “ground truth,” namely the true size, shape, or other spatial features of the object of interest. Such ground truth is an elusive concept in medical imaging because there is no clear and absolute way to define it. The only, and often the best, option is to have human expert observers define ground truth by generating manual outlines of the organs, areas, or objects of interest such as tumors. This process is not only costly and time-consuming, but often biased, incomplete, and inconsistent, with significant inter- and intraobserver variability [45]. Researchers have proposed various remedies to increase the accuracy and reduce the variability of the experts' ground truth, increase the speed of the process, and reduce its cost.
Using multiple observers repeatedly, in combinations or independently, has been proposed as a way to improve accuracy and reduce variability [44]. Using trained technicians, unsupervised or supervised by experts, has been proposed as a way to reduce the cost of the experts and speed up the process [45]. Similarly, semiautomated approaches have been proposed where the expert defines only a few points on the contour of interest, and the algorithm extrapolates to the full outline under the supervision (or not) of the expert. The equivalency or superiority of any of these approaches relative to the single "human expert" has not yet been demonstrated. An alternative to expert-defined clinical ground truth for validation is to use simulation or phantom studies [46], relative performance measures, or indirect measures such as evaluating segmentation by its impact on an outcome of clinical significance, such as clinical diagnosis or a patient-management decision [47]. Each of these alternative approaches has its limitations, and none is generally applicable. Investigators need to generate simulation or phantom data independently if a specific application is to be tested, or use publicly available data sets from certain imaging modalities if only a segmentation methodology is to be tested. For the latter, the images and data generated by the Visible Human Project offer a high-quality, standardized data set [48]. The free set of manual segmentations of CT images of a male human is another good reference resource [49]. Once a ground-truth file is available, a variety of metrics can be used to validate the segmentation results [43]. Researchers should select those that are particularly suited to the specific application and the way the ground truth was generated [50–52].
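One simple way to pool several experts' outlines into a consensus reference is a per-pixel majority vote over their binary masks. The sketch below uses hypothetical 1-D "masks" for brevity; it is only an illustration of the pooling idea, and more sophisticated label-fusion schemes exist.

```python
# Sketch: strict per-pixel majority vote over several observers'
# binary segmentation masks (hypothetical 1-D masks for illustration).
import numpy as np

def majority_vote(masks):
    """Consensus mask: a pixel is foreground if more than half of the
    equally shaped binary masks mark it as foreground."""
    votes = np.stack(masks).astype(int).sum(axis=0)
    return (2 * votes > len(masks)).astype(int)

# Three hypothetical observers' masks:
m1 = np.array([1, 1, 0, 0])
m2 = np.array([1, 0, 1, 0])
m3 = np.array([1, 1, 1, 0])
print(majority_vote([m1, m2, m3]))  # [1 1 1 0]
```

The same function applies unchanged to 2-D or 3-D masks, since the vote is taken elementwise.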
Medical image analysis method
478
Preferred measures that are relatively easy to compute and not limited to specific shape patterns include [51]:

1. The Hausdorff distance h(A,B) between two contours of the same object (tumor), one generated by an expert (A) and one generated by the computer (B). Let A = \{a_1, a_2, \ldots, a_m\} and B = \{b_1, b_2, \ldots, b_n\} be the sets of points on the two contours (each point representing a pair of x and y coordinates); then the distance of a point a_i to the closest point on curve B is defined as

d(a_i, B) = \min_{b_j \in B} \| a_i - b_j \|

Similarly, the distance of a point b_j to the closest point on curve A is defined as

d(b_j, A) = \min_{a_i \in A} \| b_j - a_i \|

The Hausdorff distance h(A,B) is defined as the maximum of the above distances between the two contours, i.e.,

h(A, B) = \max \left( \max_{i} d(a_i, B), \; \max_{j} d(b_j, A) \right)
2. The degree of overlap OL between the areas G and E encompassed by contours A and B. The overlap is defined as the ratio of the intersection and the union of the two areas, i.e., the ground-truth area G and the experimental computer-generated area E:

OL = \frac{|G \cap E|}{|G \cup E|}
The ratio is 1 if there is perfect agreement and 0 if there is complete disagreement.

3. The mean absolute contour distance (MACD). MACD is a measure of the difference between the two contours. To estimate MACD, a one-to-one correspondence between the points of the two curves is required. Once this correspondence is established, the distances between corresponding points are estimated; their average is the MACD.

In addition to the absolute differences entering the MACD calculation, the signed distances between the curves can also be computed and used to determine the bias of an algorithm or any regional effects on the segmentation process; e.g., pancreatic areas closer to the liver may be less accurately segmented than areas away from large organs [51]. The first two metrics are sensitive to the size and shape of the segmented objects and also depend on the image spatial resolution. The third metric is independent of object size and image resolution, and it is preferred when an application uses images from different sources that have different resolution characteristics. We have tested all three measures
for brain tumor segmentations in MRI scans, segmentations of the pancreas and pancreatic tumors in CT scans, and bone segmentation in computed radiographs. They seem to offer a reliable, nonobserver validation approach in all cases where human expert outlines are available as ground truth. A statistical analysis of the agreement between the measured parameters from different segmentation algorithms, or of the agreement between computer and observer performances, is the last segment of the validation process. Computer and expert data are compared with a variety of statistical tools that are generally applicable and not unique to segmentation. The most frequently reported ones include: (a) linear regression analysis to study the relationship of the means in the various segmentation sets [40, 53], (b) the paired t-test to determine agreement between the computer method(s) and the experts [53, 54], (c) the Williams index to measure interobserver or interalgorithm variability in the generation of manual outlines [51], and (d) receiver operating characteristic (ROC) analysis and related methods to obtain sensitivity and specificity indices by estimating the true-positive and false-positive fractions detected by the algorithm or the observer [26]. In place of or in addition to these types of analysis, one could apply the method proposed by Bland and Altman [55], assuming that the comparison of segmentation data sets is analogous to the problem of "assessing agreement between two methods of clinical measurement." In their famous 1986 paper, Bland and Altman [55] showed that the correlation coefficient and regression analysis are not appropriate techniques for the comparison of measurement methods when "true" values are unknown. Their "95% limits of agreement" method offers an alternative and elegant, if not more accurate, approach to what is usually followed in the medical-image segmentation literature.
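As a concrete illustration, the three contour metrics defined above and the Bland–Altman limits of agreement can all be computed in a few lines. This is a minimal sketch, not the VALMET or ITK implementation; the contours and tumor areas below are hypothetical.

```python
# Sketch: Hausdorff distance, area overlap, MACD, and Bland-Altman
# 95% limits of agreement, with hypothetical toy data.
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance h(A,B) between contours given as m x 2 and
    n x 2 arrays of (x, y) points."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # pairwise
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def overlap(G, E):
    """Degree of overlap OL = |G & E| / |G | E| for binary masks."""
    G, E = G.astype(bool), E.astype(bool)
    return (G & E).sum() / (G | E).sum()

def macd(A, B):
    """Mean absolute contour distance, assuming A[i] corresponds to B[i]."""
    return np.linalg.norm(A - B, axis=1).mean()

def bland_altman(x, y):
    """Bias and 95% limits of agreement: bias +/- 1.96 * SD of differences."""
    d = np.asarray(x, float) - np.asarray(y, float)
    bias, sd = d.mean(), d.std(ddof=1)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Two hypothetical parallel contours one unit apart:
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 1.0], [1.0, 1.0]])
print(hausdorff(A, B), macd(A, B))  # both 1.0 for this toy pair

# Hypothetical tumor areas (cm^2): algorithm vs. expert outlines.
bias, limits = bland_altman([4.1, 5.0, 3.2, 6.4, 2.9],
                            [4.0, 5.3, 3.0, 6.0, 3.1])
print(round(bias, 3), limits)
```

Note that `macd` presumes the point correspondence discussed in the text has already been established; establishing it is itself a nontrivial step.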
We should finally mention the major publicly available software tools that, although not comprehensive, provide a valuable resource to the researcher. First, the VALMET tool allows the estimation of several segmentation metrics, including those listed above, in two- and three-dimensional data sets [50]. The Insight Segmentation and Registration Toolkit (ITK) is another free tool that can be used for medical-image registration, segmentation, and statistical analysis. ITK is open-source software with widespread use [56]. ITK development is an effort initiated and funded by the National Library of Medicine (NLM) to support its Visible Human Project [48]. ITK includes several segmentation and registration methodologies and statistical measures, and it has been implemented for a variety of medical-imaging applications. A third software tool, 3DVIEWNIX, was developed by Udupa [44] at the University of Pennsylvania and is available for a fee at http://mipgsun.mipg.upenn.edu/~Vnews/.

12.5 OBSERVER EVALUATION METHODOLOGIES

This group of methodologies can be considered the second stage in a validation process, a stage that follows laboratory testing and is applied only to the optimized and most promising computer algorithms. In this group, we find the traditional applications of the ROC family of tests as well as other observer-based methods that are often faster to execute and of lower cost than the ROC approaches.
12.5.1 ROC TEST

ROC was introduced to medical imaging more than a quarter century ago and is the current standard in the evaluation of new medical-imaging systems and computer algorithms, including CAD [31, 57–60]. ROC is based on principles of signal-detection theory; its name originates from the methodology's initial use (identification of radar signals in the military) and has no literal meaning in its current medical-imaging use. In ROC experiments, a signal of interest may or may not be present in an image, and the observer uses a rating scale to express his/her confidence regarding the presence or absence of the signal. ROC measures the performance of the entire imaging system, including observer, environment, and imaging conditions. The outcome of an ROC experiment may be considered sufficient, under certain conditions, to prove efficacy or show equivalence of one diagnostic modality over another and, consequently, may eliminate the need for conventional prospective clinical trials. The ROC measurements generate plots of the TP response fraction (TPF), or hit rate, as a function of the observer's decision criterion (decision threshold or operating point), which also causes the FP fraction (FPF), or false-alarm rate, to change. A typical observer ROC curve is shown in Figure 12.5. A strict decision threshold corresponds to the lower part of the curve, and a relaxed decision criterion corresponds to the upper part, where higher sensitivity (TPF) but also higher FPF would be observed. An ideal observer corresponds to a curve with 100% TPF and 0 FPF. A chance decision corresponds to an area under the curve (Az) of 0.5 (the curve becomes a diagonal line) [26]. ROC is used to compare observer performance between two different observation conditions or parameters, e.g., two different imaging systems (screen/film vs. digital radiography, MRI vs. CT, or computed vs.
digital radiography), two different image formats (original vs. enhanced, original unaided vs. original with CAD, original vs. compressed), or two different tasks (detection of cancer vs. detection of artifacts, detection of cancer vs. diagnosis of cancer). Here, we focus on major issues related to the design of ROC experiments for the validation of medical-imaging technologies and make practical recommendations for the least “painful” but most meaningful implementation of these tests. The path to a successful ROC study and meaningful outcome, one that preferably shows statistical significance, takes us through the consideration of various design parameters prior to the initiation of the study. ROC design is constrained by practicality. The goal is to design efficient and economical experiments that involve the minimum possible number of physicians, reading sessions, and cases per session [61]. Table 12.2 summarizes all factors required to set up an ROC experiment. This list was formed by the author based on an initial idea from Dr. Dorfman (University of Iowa) and provides a way to ensure that all aspects of an ROC study are addressed prior to implementation. Specifically, one should:
FIGURE 12.5 Typical observer ROC curve obtained with the LABMRMC tool of ROCKIT for a study that involved four readers, 212 four-view mammograms (106 normal, 55 cancer, and 51 benign), and two treatments (film and soft-copy mammography).

1. Define clearly the hypothesis and the treatments to be tested. Here, one should consider whether it is sufficient to show equivalency or whether superiority of one treatment over another needs to be demonstrated. For example, in testing a CAD algorithm, some aspect of superiority needs to be demonstrated, either relative to the standard of practice or to another application, in order to justify its clinical use. Furthermore, several treatment pairs can be included in the evaluation.

2. Define the number of observers required to participate in the study in order to achieve statistical significance and meet power requirements. Roe and Metz [62] recommended the use of a minimum of five observers based on multireader simulations with continuous ratings. However, successful ROC tests have been conducted with fewer readers (three or
TABLE 12.2 Factors Considered in the Design of an ROC Experiment for the Evaluation of Medical-Imaging Technologies, Including CAD

Factor: Type of Information

Hypothesis: State the hypothesis to be tested.

Treatments: List the number of treatments and give a brief description of each.

Readers: List the number of readers and describe them, including subgroups; e.g., a study may use ten readers from two groups, five from academia and five from the community.

Data set: List the number of cases used in the study and give a brief description, including the breakdown of case types. For example, an oncology study may use 1000 cases, 500 of which may be negative, 250 benign biopsied cases, and 250 cancer cases. Images may be single views or multiple views.

Rating method and scale: Discrete (five-point, ten-point), continuous, pseudocontinuous, BIRADS.

Reading protocol: Sequential, mixed, random; reading schedule (address potential bias issues).

Analysis tools: List the algorithm to be used for analysis of the data; e.g., a study in which readers read the same cases in both treatments requires MRMC analysis, whereas a study using two readers and one treatment requires CORROC2 analysis.

Performance measures: Area under the curve, sensitivity, specificity, decision thresholds, confidence intervals, p-value.

Presentation setup and data collection: Processing and display hardware, software, reporting methodology, forms (hard copy vs. soft copy).
four). The reason is that the number of readers, reader performance (e.g., Az value), and number of cases (discussed next) are interdependent issues. Dorfman et al. [63] actually showed that the number of readers is less important than the number of cases or the Az value, and they avoided specific recommendations, placing more emphasis on the quality of the readers and cases. One issue encountered in the selection of readers is reader expertise. Usually ROC readers have similar educational and professional experience, unless a large reader population is available from which to select several sufficiently large groups, e.g., five radiology fellows, five junior radiologists, and five senior radiologists, that will allow generalization of the results across reader expertise. Depending on availability and other experimental constraints, one might consider a design of matched readers and matched cases (same readers reading the same cases imaged with different modalities), or matched readers only, or matched cases only [59].

3. Define the sample size and contents of the data set. The minimum sample size depends on the anticipated performance and the number of readers (see item 2). For example, assuming that five readers will participate in the study and the average expected Az will be 0.85 or less, a minimum of 50 positive and 50 negative cases are required [62]. For higher area values, larger sample sizes should be used [62, 64]. The positive and negative
case definitions depend on the experiment. Positive cases are considered those that contain the “signal” of interest, such as cancer, a disease type, a fractured bone, etc. Negative cases are usually matched cases with no signal or with a signal that has different properties than that of the positive cases. For example, negative cases in a classification experiment might be cases with benign tumors. Some bias issues associated with the sample size and types are discussed in Section 12.6. 4. Select rating method and scale, e.g., quasicontinuous, discrete five-point, or discrete ten-point rating scale. Metz et al. [65] recommended the quasicontinuous rating for optimum results. Dorfman et al. [63] argued that the discrete or pseudocontinuous rating scales can be used interchangeably in image-evaluation studies when Az is the performance index of interest. Dorfman et al. favored the discrete rating when operating points as well as Az were of interest. They further suggested that, for mammography applications, one can even consider using the classes of the Breast Imaging Reporting and Data System (BIRADS) of the American College of Radiology for rating, because they represent actual clinical decision thresholds [63]. The validity of the latter for ROC experiments is disputed and remains to be shown [28, 66]. Note that traditional rating scales and ROC analysis methods require the definition of a clear and simple task for the reader, preferably a task that is limited to the detection or the classification of a signal. This is not always possible or even an accurate representation of the joint detection and classification tasks performed in clinical situations. New observer methodologies are likely to emerge soon that will be able to handle complex clinical decision processes such as those represented by the BIRADS classes. 5. Select independent or sequential reading modes [67–69]. 
Because the same subjects or cases are commonly used in the evaluation of imaging techniques, there has been much emphasis on how to present cases to the readers so as to eliminate or significantly reduce reading-order effects. Until recently, the recommendations were to change the modality and case reading order as well as to leave a sufficiently long time interval (4 to 8 weeks) between readings of images from the same patient [67]. Recent ROC studies of CAD algorithms have suggested that the sequential reading mode, i.e., reading one treatment after the other without a time interval or order randomization, may be a more sensitive probe of differences between standard and computer-assisted readings than the independent reading mode, where reading-order effects are reduced following the guidelines above [68]. Although it can be argued that sequential reading is not appropriate for all imaging applications and evaluations [69], there seems to be agreement that it may be beneficial for CAD evaluation, hence providing a faster, more practical, and more sensitive validation process.

6. Select the appropriate ROC analysis software tool. We mentioned previously that there are several free software packages available for the analysis of ROC data [32]. The type of software to be used depends on the study design. The most popular package is ROCKIT (current version 0.9.1B) offered by Dr. Metz at the University of Chicago [70]. This package includes
several algorithms that can be used to analyze single-reader and multireader studies using the same or different cases, i.e., independent or correlated data sets, two treatments, and discrete or continuous (quasicontinuous) rating scales. Multireader, multicase studies, for example, should be analyzed with the LABMRMC algorithm [70] or the MRMC algorithm from the University of Iowa [71]. All of these algorithms perform individual reader data analysis. Additional data processing is required to generate a pooled instead of an individual ROC curve [61, 72, 73]. 7. Select metrics for the estimation of absolute or relative performance and performance differences. ROC analysis allows the estimation of various global, regional, and local performance indices, including: (a) the area under the ROC curve, Az, (b) the partial area index, (c) the TPF at a selected FPF and vice versa or sensitivity and specificity pairs, (d) statistical errors and confidence intervals for the differences between treatments, and (e) decision thresholds. ROC curve fitting is also performed for plot generation. 8. The final step in the design process is the actual implementation and experimental setup. One needs to determine the reading environment, e.g., dark room, lightbox conditions, ambient light levels, sitting conditions, length of reading sessions, film-hanging protocols, and the reporting mechanism, e.g., dictation or manually on forms or computer interface. For a better understanding of the various factors entering the design of an ROC experiment, consider the following example often found in the CAD literature of the last decade. An ROC study is designed to determine the effect of a CAD detection algorithm on the interpretation of digitized mammograms on computer monitors (soft-copy digitized mammography). The hypothesis is that soft-copy digitized mammography with CAD detection has higher breast cancer detection sensitivity than conventional film mammography. 
The investigators want to display the CAD detection output (segmented areas that correspond to calcifications and masses) on a computer monitor as an overlay on the corresponding digitized mammogram. So, there are three image formats and two hypotheses to be evaluated in this experiment: film mammography, soft-copy digitized mammography, and computer-aided soft-copy digitized mammography. The various ROC factors and design parameters for this experiment are listed in Table 12.3. Although specific to the mammography example, a similar logic can be followed for all medical-imaging tests. ROC analysis is probably the most powerful tool we have today for the evaluation of medical-imaging technologies, including CAD. But ROC has its problems and limitations, such as:

- It requires good ground-truth data for the selected data set, which are not always available in medical imaging or are impractical to generate.
- The outcome is sensitive to the data-set contents, particularly the subtlety of the selected cases and how well they represent the general population.
TABLE 12.3 ROC Study Design for the Evaluation of a CAD Detection Algorithm for Soft-Copy Digitized Mammography

Factor: Type of Information

Hypothesis: (1) Unaided interpretation of film mammography is equivalent to unaided interpretation of soft-copy digitized mammography. (2) Computer-aided interpretation of soft-copy digitized mammography is more sensitive and less variable than standard interpretation of film mammography.

Treatments: Total: 3. Treatment 1: unaided standard film reading of mammograms. Treatment 2: unaided soft-copy digitized mammogram reading. Treatment 3: aided soft-copy mammogram reading with CAD detection overlay for calcifications and masses.

Readers: Total: 6. Groups: 2 (three academic and three community radiologists).

Data set: Total: 500 four-view mammograms; 250 normal with 2 years of negative follow-up; 120 benign biopsied cases with calcifications and masses (50/50); 130 malignant cases with calcifications and masses (50/50). The power of the study is a function of the number of readers and the data-set size; with the numbers above, the study is expected to detect differences in Az (δAz) on the order of 0.04 with α = 0.05 (Type I error) and β = 0.2 (Type II error).

Rating method and scale: 100-point scale, pseudocontinuous rating of the likelihood of presence (100) or absence (0) of breast cancer.

Reading protocol: Patients and readers fully crossed with treatments. Readers read each mammogram in all treatment cells. Reading sessions (1 h) with 20 cases per session. Random mix of cases and treatments, different for each reader. Random reading sequence. Time interval (8 weeks) between same cases in different treatments.

Analysis tools: Highly correlated data, multireader multicase design. Use the MRMC algorithm from UI or LABMRMC from UC.

Performance measures: Area under the curve, Az. Partial area index at TPF of 0.9. Test of difference between treatment means and confidence intervals. Statistical significance at the 0.05 level.

Presentation setup and data collection: Film multiviewer for film display following clinical hanging protocol. Two high-resolution monitors for soft-copy display. Film digitization at 100 µm and 12 bits per pixel. Custom interface for display. Custom interface for electronic reporting and generation of ROC input files for analysis by the ROC algorithm.
Note: Film mammography was first compared with soft-copy digitized mammography before evaluating the benefits of the CAD detection algorithm.
- In some applications, reading-order effects or memory bias may have a significant negative impact on the results.
- There are degeneracy problems when readers do not use the full rating scale.
- Studies are costly, time consuming, slow, and complicated.
- The methods of analysis can handle only binary decisions (negative vs. cancer, benign vs. malignant, disease vs. no disease) that are not representative of the complexity of medical images, of the diagnostic process, or even of the output of CAD algorithms.
- The analysis does not differentiate between cases with single and multiple lesions or consider whether the observer evaluated the true lesion, as there is no lesion localization involved in the process.

Methodologies developed to address some of the weaknesses of ROC are briefly described in the following subsections.

12.5.2 LROC TEST

The LROC methodology was developed originally by Starr et al. [74] and was revisited and formalized by Swensson several years later [75]. LROC is not widely used, probably because the formal statistical analysis came much later than that of ROC, and it still lacks robustness. Swensson's 1998 software package is free but runs only on Windows 98 and lacks the user-friendly interface of the ROC packages [32]. This method, however, takes target-localization accuracy into account, compares results with the standard ROC, and estimates similar performance metrics. LROC experiments are designed in ways similar to ROC ones. Notable differences from ROC, beyond the analysis parts, are found in the interpretation process. In LROC, images may contain one or more targets (lesions or areas of interest), and each target is localized and rated using a discrete rating scale. The highest-rated report of a finding on each image is used as the "summary rating" that represents the entire image in the analysis process [76].
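The summary-rating step is mechanical: keep the highest-rated localized finding on each image. A minimal sketch with hypothetical finding reports:

```python
# Sketch (hypothetical reports): the LROC "summary rating" keeps the
# highest-rated localized finding on each image.
findings = [("img1", 2), ("img1", 4), ("img2", 3), ("img3", 1), ("img3", 5)]

summary = {}
for image_id, rating in findings:
    summary[image_id] = max(summary.get(image_id, 0), rating)
print(summary)  # {'img1': 4, 'img2': 3, 'img3': 5}
```

The resulting per-image ratings then enter the LROC analysis in place of the full list of reports.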
Images with no targets (controls, benign, or negative cases) are also rated by selecting a single "most suspicious" area in the image and assigning it a low rating (forced localization choice). We have used this method successfully to evaluate an enhancement and a compression algorithm for digital mammography, where improvements in localization accuracy are an important aspect of the algorithms' performance [77].

12.5.3 FROC TEST

This methodology was formalized for medical-imaging evaluation studies by Chakraborty et al. [78, 79]. The acceptance and application of the FROC methodology has also been limited, primarily because its statistical analysis procedures were again developed well after those of ROC. However, we now have the models required to fit FROC data and produce measurable outcomes. Notable differences from ROC in
this case include: (a) multiple lesions or areas of interest can be present in an image, (b) all need to be localized, and (c) a four-point rating scale is used to rate each one. Furthermore, there is no forced localization choice here, as there is in LROC. Unmarked images or locations reflect "definitely negative" decisions. Dedicated software (FROCFIT) or the ROCFIT program can be used to analyze the FROC data [79]. Two methods are proposed for handling FROC data, generating ROC-like curves, and estimating performance metrics: the FROC method, which uses the FROCFIT program [78], and the alternative FROC (AFROC) method, which uses the ROCFIT program [79]. The methods differ in the way the false-positive data are scored. The newer AFROC method is recommended because it needs no assumption about the distribution of the false-positive detections in an image (FROC assumes a Poisson distribution).

12.5.4 AFC AND MAFC TESTS

The alternative forced choice (AFC) and multiple AFC (MAFC) tests belong to a family of methods proposed by Burgess [80] as an alternative to ROC for a more direct and faster measurement of the observer's sensitivity in medical imaging. Observers in this case have an easier task than in ROC studies. They are required to identify a signal (target, lesion, area of interest) that is always present in one of two (2AFC) or one of M (MAFC) images, regions of images, or alternative signals. This experiment is generally easier and faster to execute. However, the interpretation process and the selection and presentation of the data are critical elements in these studies and should be carefully considered [80]. It has been shown that an AFC study can provide the same power as an ROC study for a given sample size and number of participating readers (usually twice as many are required for equivalent results).
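The sensitivity measured in a 2AFC test is commonly summarized as a detectability index: under the usual equal-variance Gaussian signal-detection model, d' = sqrt(2) * z(Pc), where Pc is the proportion of correct choices and z is the inverse standard-normal CDF. A minimal sketch with hypothetical trial counts:

```python
# Sketch: converting 2AFC percent-correct into a detectability index d'
# under the equal-variance Gaussian model (trial counts hypothetical).
from math import sqrt
from statistics import NormalDist

def dprime_2afc(n_correct, n_trials):
    """d' = sqrt(2) * z(Pc), with Pc the 2AFC proportion correct."""
    pc = n_correct / n_trials
    return sqrt(2) * NormalDist().inv_cdf(pc)

# Hypothetical session: 85 correct choices in 100 trials.
print(round(dprime_2afc(85, 100), 2))  # about 1.47
```

Chance performance (Pc = 0.5) maps to d' = 0, and higher proportions correct map monotonically to higher detectability.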
The outcome of an AFC study is easily correlated with clinical experience, as the only measured indices of performance are related to detectability and to the TP and FP rates. An MAFC experiment is particularly well suited for studying signal detectability with synthetic or simulated data over a wide range of signal-to-noise ratios [80].

12.5.5 PREFERENCE TESTS

Preference tests are non-ROC, observer-based experiments that may be highly sensitive to performance differences and are easy and fast to implement at relatively low cost, even though inter- and intraobserver variability necessitates that these studies involve multiple observers and large, representative data sets. Preference tests are useful for selecting a modality or CAD scheme to be tested further with an ROC study, or for setting the boundary conditions under which it makes sense to perform an ROC study. There does not seem to be a formal statistical approach or theory termed "preference methodology." The name was probably adopted by the medical-imaging community to indicate observer studies where the reader selects or rates an image, setup, or process from a group of similar images, setups, or processes [81–85]. Terms such as "visual grading analysis" and "observer preference analysis" are often used in preference studies that measure observer performance in relation to image quality [86, 87].
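Preference ratings are typically compared with standard nonparametric tests, such as Friedman's test across several treatments and Wilcoxon's signed-rank test for pairwise follow-up. A minimal SciPy sketch; the quality ratings from eight hypothetical readers for three hypothetical processing methods are illustrative only:

```python
# Sketch (hypothetical ratings): nonparametric comparison of reader
# preference ratings for three image-processing methods using SciPy.
from scipy.stats import friedmanchisquare, wilcoxon

# Quality ratings (1-5) from 8 readers for three methods.
orig     = [3, 2, 4, 3, 2, 3, 4, 2]
enhanced = [4, 4, 5, 4, 3, 4, 5, 3]
cad      = [4, 3, 5, 3, 3, 4, 4, 3]

f_stat, f_p = friedmanchisquare(orig, enhanced, cad)  # any global difference?
w_stat, w_p = wilcoxon(orig, enhanced)                # pairwise follow-up
print(round(f_p, 4), round(w_p, 4))
```

With these toy ratings, both tests report a significant preference for the processed images; with real data, the pairwise follow-ups should be corrected for multiple comparisons.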
Several approaches are reported in the literature for rating signal detectability or ranking overall image quality. They include: (a) multipoint rank-order studies, (b) just-noticeable-difference studies, (c) rating-scale studies, (d) forced-choice studies where the best of two or more images is selected, and (e) the method of paired comparisons. The latter method was the subject of much study in the 1950s and 1960s [88] but seems to have received little attention after the mid 1970s, with a more recent application appearing in 2001 [89]. Few of the reported studies include a detailed statistical analysis or provide a quantitative evaluation of the data. Depending on the hypothesis, design, number of participating readers, and database size, various statistical tests may be applicable, including: (a) the Friedman two-way nonparametric test for N observations [85], (b) Wilcoxon's signed rank test to assess pairwise comparisons of the various images or processes [85], (c) Kendall's coefficient of consistence, coefficient of concordance, and rank correlation coefficient [87], (d) Student's t-test and confidence intervals to assess the significance of results [84], and (e) the Student–Newman–Keuls test to determine the significance of the differences between mean scores [86]. There seems to be a sufficient statistical basis in "preference studies" to suggest that such tests can be used in the evaluation of medical-image-processing methodologies and CAD. We certainly gain from the speed and simplicity of executing these studies. However, a good statistical analysis needs to accompany the study to provide a sound quantitative assessment and not merely qualitative or anecdotal observations.

12.6 STUDY POWER AND BIASES

In this section, we look in more detail at issues affecting the power of a study and issues that might bias the measurements [90, 91].
These are also the issues most frequently argued about and the "softest spots" in the development and validation of medical-imaging methodologies, the ones that routinely receive the heaviest criticism. The reasons may be that several validation aspects are truly controversial, several succumb to serious logistical and pragmatic constraints, and several suffer from a lack of standards. It is probably fair to say that it is impossible to design a study with everyone's "seal of approval." There are probably as many views on medical-imaging validation as there are researchers in the field. The imaging scientist, however, should carefully consider and openly discuss all aspects of a study: its strengths and, particularly, its weaknesses. In this section, we assume that the researcher is past the definition of the study and its objective, the decision on the hypothesis to be tested (i.e., equivalence, superiority, or other), the selection of validation methodology and analysis tools, and the planning of timetable, funds, and effort, and is now faced with the logistics of the experimental plan before actually executing it. In this case, the following need to be addressed:
Database generation
Algorithm training and testing and database effects
Estimation of performance parameters and rates
Presentation setup
Statistical analysis

Each of these topics is discussed in the following subsections.

12.6.1 DATABASE GENERATION

Database generation may be labeled the hottest topic in validation; it covers areas such as content, size, case type and representation, usage in development, ground truth, documentation, quality, standardization, and more [92-95]. The power and significance of a study are directly linked to the database, and measurements can be seriously biased if the sample is not properly constructed.

12.6.1.1 Database Contents and Case/Control Selection

Some issues to consider: common databases vs. individually generated databases; the dilemma of representation and quality control; difficult vs. easy cases; negative-only and cancer-only databases, as used by the commercial CAD companies [96]. Using only negative and cancer cases seems to bias the outcome favorably; see, for example, what happens to the performance of commercial CAD systems when benign cases are included. A description of the contents helps communicate with other researchers in the field and allows comparisons. The use of histograms to represent the range of lesion sizes and contrast, for example, has proven useful in mammography [92, 93]. As we go beyond images and include nonimage, demographic, and clinical information in our development, a description of these factors will become necessary; histograms may still have a role in describing a database's contents. How do we match cases and controls, i.e., normal and abnormal cases? Do we match them by image appearance only or by patient demographics as well? Until now, we were focused on the images, but as we move beyond images, matching demographics may be another factor to consider.
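The histogram-style content description recommended above can be produced with a few lines of code. The sketch below is illustrative only: the lesion diameters are invented numbers, and `describe_distribution` is a hypothetical helper, not a tool from the cited studies.

```python
from statistics import median

def describe_distribution(values, n_bins, label):
    """Summarize one database attribute (e.g., lesion size in mm) as a
    text histogram -- the kind of content description that helps other
    groups judge how representative a shared database is."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0          # guard against identical values
    counts = [0] * n_bins
    for v in values:
        i = min(int((v - lo) / width), n_bins - 1)  # clamp the max value into the last bin
        counts[i] += 1
    lines = [f"{label}: n={len(values)}, median={median(values):.1f}, range=({lo}, {hi})"]
    for i, c in enumerate(counts):
        left, right = lo + i * width, lo + (i + 1) * width
        lines.append(f"  [{left:6.1f}, {right:6.1f}): {c:3d} {'#' * c}")
    return "\n".join(lines)

# Hypothetical lesion diameters (mm), for illustration only
sizes_mm = [4.2, 5.0, 5.5, 6.1, 7.0, 7.7, 8.2, 9.5, 10.1, 12.4, 14.0, 18.3, 22.5]
print(describe_distribution(sizes_mm, n_bins=4, label="Lesion diameter (mm)"))
```

The same summary could be generated per attribute (size, contrast, subtlety rating, patient age) and published alongside the database documentation.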
12.6.1.2 Database Size and Study Power

We previously discussed the experts' recommendations on the sample size required for ROC studies. The ROC recommendations were a function of the area Az; they are also a function of the correlation of the data. The recommended sample sizes should be considered general observations that may not hold true for a specific experiment. To quote Dr. Berbaum of the University of Iowa, an authority on ROC methodology: "For ROC studies, it is often not very helpful to worry about how much power you have—I have never had as much as I wanted. Simply collect as much data as possible." For non-ROC studies, there are several options for estimating sample size. Almost all require that the following parameters first be defined for a two-treatment evaluation [40]:

Statistical significance level α (the Type I error, or FP rate)
Power (1-β), where β is the Type II error; power equals (1 - FN rate)
Treatment 1 performance or effect
Estimate of treatment 2 performance or effect
Estimate of the standard deviation, if dealing with means and treatment differences

For most studies, α=0.05 (a 5% significance level) and β=0.2 (80% power). Treatment 1 is the standard of practice, and treatment 2 is the new methodology to be tested against the standard. For a study in lung cancer imaging, for example, treatment 1 might be chest radiography and treatment 2 helical CT imaging. For breast cancer imaging, treatment 1 might be mammography and treatment 2 mammography with CAD. The effect of treatment 1 is usually found in the clinical literature. The effect of the new treatment 2 is estimated either from pilot studies or by defining a clinically important effect. The latter can be estimated by considering the effect required to change current clinical practice. Remember that justification is necessary: simply stating a desired effect is not only insufficient, it also risks being unrealistically high and could lead the study to failure. Based on the five parameters above, tables, standard statistical equations, or software can be used for sample-size estimates [97].

12.6.1.3 Ground Truth or Gold Standard

We have already discussed this issue with respect to the requirements of image-segmentation validation. Detection and classification algorithms, however, have slightly different requirements, and they may not always need an outline of the area of interest, as segmentation does. Generally, ground truth in medical imaging is established by:

Clinical proof that includes image information from radiology (single- or multimodality imaging), clinical information from laboratory and clinical examinations, and pathology information from biopsy reports.
Opinion of the expert(s) participating in the study.
If a panel of experts is used, ground truth may be established by relative decision rate, majority rule, or consensus among the experts.
Opinion of expert(s) not participating in the study. This can be done before the study, as a review, or after, as feedback to the overall process.

12.6.1.4 Quality Control

The implementation of a quality-control program is necessary to ensure that database generation conforms to generally accepted standards, that digitized or digital images are of the highest quality, that artifacts during image acquisition or film digitization are avoided, and that the same image quality is maintained over time. Film digitizers pose the greatest challenge in database generation. Test films and phantom images can be used to monitor image quality and ensure high-quality data for further processing [98].
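Returning to the sample-size calculation outlined in Section 12.6.1.2: for two treatments compared on a binary endpoint (e.g., detection rate), one textbook normal-approximation formula combines α, power, and the two effect estimates. The sketch below is a minimal version of that calculation; the 80% vs. 90% detection rates are hypothetical numbers chosen for illustration, not values from the cited studies.

```python
from math import sqrt, ceil
from statistics import NormalDist

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for a two-sided
    comparison of two proportions (standard pooled-variance formula)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # quantile for the Type I error
    z_b = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2                       # pooled proportion under H0
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

# Hypothetical effects: treatment 1 (standard of practice) detects 80% of
# cancers; treatment 2 (new method) is hoped to detect 90%.
print(n_per_arm(0.80, 0.90))
```

Note how quickly the required sample grows as the claimed effect shrinks, which is why an unjustified, optimistic effect estimate can doom a study.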
Finally, we should mention the consistent and diligent effort over the past several years by academic and federal institutions to develop publicly available, centralized, large, and well-documented data sets for validating various medical-imaging applications. Efforts have been initiated in human anatomy [48], breast cancer [99, 100], and lung cancer research [101, 102]. It is anticipated that these databases will provide valuable resources to researchers who do not have immediate access to data and will advance development and relative evaluation. They may also provide a common reference that allows comparison of different algorithms or processes. In addition, metrics of performance widely used in other fields may now attract our attention. One might consider, for example, the automatic target recognition (ATR) analysis method applied to the evaluation of detection and classification algorithms in military imaging applications [103]. In ATR, algorithm performance is measured through a set of probabilities that resemble the true and false rate definitions above. Although not currently used in medical imaging, due primarily to the small sample sizes traditionally available for medical studies, ATR principles may be of increasing interest and applicability now that larger data sets are planned and will soon be available to the scientific community.

12.6.2 ALGORITHM TRAINING AND TESTING AND DATABASE EFFECTS

The database(s) used for algorithm training and testing, and the way they are used, may be another source of bias in development. The bias usually comes from a small sample size, from inadequate representation of cases and controls in the set, from poor criteria applied to the learning process, and from learning techniques that are likely to overestimate performance. This is a large area of research, and we will not discuss it here in detail.
We will only review the generally accepted procedures for training and testing algorithms on small data sets, as is the case in medical applications. Given that most algorithms are developed, trained, and tested on small data sets, mathematical methods are required in the learning process to reduce the small-sample bias and variance contributions to the estimates, to stop an algorithm's training at the right point, and to construct an unbiased rule for future predictions. The major methodologies recommended and often applied to statistical and nonstatistical pattern-recognition algorithms in medical imaging and CAD are summarized in Table 12.4. The table is by no means comprehensive; it only aims to point out the major differences between the various terms and methodologies that are often used, and confused, in the medical-imaging literature. The reader is prompted to consult the excellent publications in this field for more in-depth theoretical analysis and review of applications [104-107]. The method missing from Table 12.4 is the one in which the same set of cases is used for both training and testing an algorithm. Although often used by investigators, this is not an accepted approach, because it significantly overestimates an algorithm's performance and yields unrealistic results. A few more interesting remarks before we leave this subject:
If you have ever asked the question, "How many cases are needed for training an artificial neural network or any classification algorithm?" you would know that the answers vary from "as many as possible" to "it all depends on how representative the set is," but a specific number is never given. The reason is that a specific number depends on several factors, and there is no single good answer for all applications. Most pattern-recognition algorithms are trained on sets of features, or feature vectors, extracted from the medical image. The size of the input feature set and the sample size are directly related, particularly when the latter is small. It is known that "the size of the training data grows exponentially with the dimensionality of the input space," a phenomenon referred to as the "curse of dimensionality" [108]. If we are forced to work with limited data sets (as in the case of medical imaging), we cannot afford to keep increasing the dimensionality of the input space. To enhance the accuracy of our algorithm, the number of variables must be reduced or the model must be simplified [109]. As a rule of thumb, the number of predictors or features f in a classification scheme should be

TABLE 12.4 Methods Commonly Used and Recommended for Estimating the Error Rate of a Prediction or Decision Rule

Split-sample or hold-out validation
  Principle: Data are divided in two subsets: one set for training, the other for error estimation.
  Estimated parameter: Generalization error function (prediction error).
  Comments: No crossing of samples; used for early stopping; not robust for small sets.

k-fold cross-validation
  Principle: Data are divided in k subsets of equal size; k-1 subsets are used for training, with the one left out used for error estimation (k is usually 10).
  Estimated parameter: Generalization error function (prediction error).
  Comments: Superior for small sets.

Leave-one-out cross-validation (round robin)
  Principle: Cross-validation with k equal to the sample size; k-1 cases are used for training, with the one left out used for error estimation.
  Estimated parameter: Generalization error function (prediction error).
  Comments: Better than k-fold cross-validation for continuous error functions but may perform poorly for noncontinuous ones.

Jackknifing
  Principle: Same as leave-one-out.
  Estimated parameter: Bias of a statistic.
  Comments: A more complex version can estimate the generalization error.

Bootstrapping
  Principle: N subsamples of the data are used for learning; each subsample has size k and is randomly selected with replacement from the full data set.
  Estimated parameter: Generalization error function (prediction error), confidence intervals.
  Comments: Less variability than cross-validation in many cases; several versions; best using the .632+ rule.

Note: Methods apply to algorithms that employ statistical and nonstatistical pattern-recognition techniques. Sources: Tourassi, G.D. and Floyd, C.E., Medical Decision Making, 17, 186, 1997; Efron, B., The Jackknife, the Bootstrap, and Other Resampling Plans, Society for Industrial and Applied Mathematics, Philadelphia, 1982; Efron, B. and Tibshirani, R., J. Am. Stat. Assoc., 92, 548, 1997; Efron, B. and Tibshirani, R., Science, 253, 390, 1991.
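The k-fold and leave-one-out schemes of Table 12.4 can be sketched in a few lines. The nearest-mean classifier below is a toy stand-in for a real CAD classifier, and the feature values and labels are invented for illustration; only the resampling logic is the point.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k roughly equal, shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_error(xs, ys, k, fit, predict):
    """Generic k-fold cross-validation error rate; setting k equal to
    len(xs) gives leave-one-out (the 'round robin' of Table 12.4)."""
    errors = 0
    for test_fold in k_fold_indices(len(xs), k):
        held_out = set(test_fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors += sum(predict(model, xs[i]) != ys[i] for i in test_fold)
    return errors / len(xs)

# Toy classifier: assign a case to the class whose training mean is closest
def fit(xs, ys):
    return {c: sum(x for x, y in zip(xs, ys) if y == c) /
               sum(1 for y in ys if y == c)
            for c in set(ys)}

def predict(means, x):
    return min(means, key=lambda c: abs(x - means[c]))

xs = [1.0, 1.2, 0.8, 1.1, 3.0, 3.2, 2.9, 3.1]   # hypothetical feature values
ys = [0, 0, 0, 0, 1, 1, 1, 1]                   # hypothetical class labels
print("4-fold CV error:    ", cv_error(xs, ys, 4, fit, predict))
print("Leave-one-out error:", cv_error(xs, ys, len(xs), fit, predict))
```

Crucially, no case ever appears in both the training and the test portion of a fold; reusing the same cases for both, the "missing method" of Table 12.4, is exactly what this structure prevents.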
12.6.3 ESTIMATION OF PERFORMANCE PARAMETERS AND RATES

a standard and allow relative evaluations [38]. Estimating the same performance rates for CAD diagnosis algorithms is less complicated because, in this case, usually two states are considered (benign vs. malignant, normal vs. abnormal, disease vs. nondisease, etc.), and these are usually defined in pathology or clinical reports. Another related aspect of the estimation of performance rates, one that could significantly change the outcome, is whether rates are determined on a per-image or a per-case basis; a patient's examination often involves more than one image, e.g., a mammographic examination involves four views (two of each breast). Setting clear conventions and criteria ahead of time, and maintaining them during the evaluation process, is critical for reporting results and obtaining consistent and unbiased performance.

12.6.4 PRESENTATION SETUP

The way an algorithm's output is presented to the observer in observer-based validation studies can influence the validation outcome [113]. Current commercial CAD systems for mammography and lung radiography point out suspicious areas using a specific symbol assigned to each type of abnormality, e.g., a triangle for a calcification cluster or a circle for a potential mass site. These CAD outputs can be presented in hard-copy form, i.e., a printout of the original image marked with the CAD output (if any), or in soft-copy form, i.e., the image displayed on a low-resolution computer monitor marked with the CAD output. These displays are presented side-by-side with the regular film or digital display, i.e., next to the multiviewer if films are reviewed, or next to the monitors used for primary diagnosis if digital images are reviewed, as in the case of CT and MRI. Key elements in the presentation of an algorithm's output are:

Hanging protocol and workflow: The sequence of image presentation with and without CAD should be designed to have minimum impact on the standard workflow.
The addition of processed images should not significantly delay the interpretation process. A reasonable reading load should be maintained throughout the study to avoid fatigue and unbalanced case interpretation.

Type of computer monitor used for soft-copy display: The spatial resolution and luminance of the selected monitor should match the imaging task and application and ensure the highest possible image quality. Pairs of CRTs are usually recommended for medical-imaging applications: 1-Mpixel CRTs are used for general radiography, and 5-Mpixel systems are used for mammography. Recently, significant technological advances have been achieved in LCD flat-panel displays, which are currently being evaluated for medical applications but are not yet clinically accepted [114]. A quality-control program that meets established standards should be in place for the display systems [115].

Use of color vs. black-and-white image display: This is an ambiguous issue, because the bulk of medical images today involves only gray-scale information, and readers are trained in gray-scale interpretation. Color has certain advantages, however, and some segmentation and three-dimensional reconstruction algorithms have used color effectively, showing a positive impact on physician performance.

Workstation user interface: The interface should be user friendly and intuitive. It should offer both automatic and manual adjustments for interactive processes and, if built in-house, should be validated independently before being used in an algorithm-validation study [116].

Observer training: Training is critical to the success of a validation study. Observers should be thoroughly trained, on a separate but representative data set, in how to interpret processed data, how to report, what criteria to use during the study, how the algorithm operates, and what its output means. Knowledge of the laboratory performance of the tested algorithm is useful for extrapolating to its potential clinical significance. Readers should also become familiar with the rating approach and apply consistent rating criteria to all cases throughout the study. Algorithms designed to assist in the detection of disease (CAD detection) are usually easier to evaluate than classification algorithms (CAD diagnosis schemes) that present a pathology prediction, particularly when the latter outperform the human reader; in this case, there is substantial risk that the reader will be biased by the algorithm's performance and accept its recommendation without a critical review [20].

Environment and ambient light conditions: Reading conditions are critical for both hard-copy (film) and soft-copy (computer monitor) reading. Ambient light should be controlled and conform to specifications for radiology environments. Cross-talk between monitors or display devices and light sources should be eliminated. Readers should be positioned at the recommended height and distance from the display. Ergonomics should be fully considered to avoid fatigue and viewing distortions [117].

Reporting mechanisms:
Dictation, hard-copy forms, and computer interfaces are all options for reporting, and choosing one over another is a practical issue. A computer interface offers the greatest flexibility for the investigator because it allows information to be exported directly to the analysis tools and minimizes error.

12.6.5 STATISTICAL ANALYSIS

Earlier, we discussed the statistical analysis associated with ROC-type studies; this is usually part of the publicly available software packages. Non-ROC studies also require statistical analysis to test the differences between groups of data, the differences between variables, or the relationship between measured variables. The data or variables may be, for example, any of the performance indices described in Section 12.3. There are numerous statistical tests to choose from, depending on the data type and experimental conditions. The value of a biostatistician's guidance in selecting the right test cannot be overemphasized. Table 12.5 summarizes tests that could be used for the statistical analysis of data from non-ROC validation studies of computer algorithms in medical imaging. We observe that
some of these tests are common among studies; namely, they are used in the analysis of both ROC and non-ROC experiments, and of both observer and nonobserver experiments. Generally, it is the characteristics of the data set(s) and of the input and output variables that determine the type of test to be used.

TABLE 12.5 Statistical Tests Commonly Used for Analysis of Single-Measurement Data Obtained from Medical-Image Analysis and Processing Algorithms

Parametric
  Continuous data, to compare means of two dependent groups: Paired t-test
  Continuous data, to compare means of two independent groups: Unpaired t-test
  Continuous data, to compare means of two or more independent groups: Analysis of variance (ANOVA)
  Continuous data, to measure association of two variables: Pearson's correlation coefficient

Nonparametric
  Binomial or nominal data, to compare two dependent samples: McNemar's test; sign test
  Continuous data, to compare two dependent groups: Wilcoxon signed rank test
  Binomial or nominal data, to compare two independent groups: Pearson's χ2 test; Fisher's exact test
  Continuous data, to compare two independent groups: Wilcoxon-Mann-Whitney (rank sum) test
  Binomial or nominal data, to compare two or more independent groups: Pearson's χ2 test
  Continuous data, to compare two or more independent groups: Kruskal-Wallis test; log-rank test (for survival data)
  Continuous data, to measure association: Spearman's correlation coefficient
  Binomial or categorical data, to measure agreement: Cohen's (weighted) kappa

Note: Tests are grouped into parametric and nonparametric approaches. Depending on the type of data and the goal of the application, one or more tests may be applicable. Details and examples of these and other tests can be found in the extensive biostatistics literature. Source: Dr. Ji-Hyun Lee of the Biostatistics Core at the H. Lee Moffitt Cancer Center & Research Institute contributed valuable comments on the role and use of the various statistical tests. Her assistance in the generation of this table is greatly appreciated.

The first step in selecting the right statistical test is to determine whether the data follow a Gaussian (normal) distribution (parametric) or not (nonparametric). Note that for every parametric test there is a nonparametric equivalent; nonparametric tests apply when the sample size is too small for any distribution assumptions to be made. The next step looks at the data type, e.g., continuous, nominal, categorical [40]. Finally, one
should determine whether the data or variables to be tested are independent (unpaired, unmatched, or uncorrelated) or dependent (paired, matched, or correlated). Dependent groups of data are common in medical-imaging validation studies, where images from the same patient are used multiple times in a data set or the same cases are reviewed by the same observers [40]. An example of a paired t-test application is the analysis of image-segmentation data that include breast-density measurements from mammograms taken pre- and post-treatment of breast cancer patients. The unpaired t-test can be applied when breast-density measurements are compared between two treatment groups. Analysis of variance (ANOVA) is applicable when breast-density measurements are compared among three patient groups receiving different drug treatments. Finally, Pearson's correlation coefficient is appropriate for correlating the lung-nodule detection rates of CAD detection schemes for chest radiography and CT. Nonparametric tests are applicable to the same examples when the sample size is small.

Note that Table 12.5 lists only methods applied to data obtained from single measurements, as opposed to data acquired from repeated measurements. The latter should be treated with longitudinal analysis methods, which are applicable to data on the same experimental parameter collected over time [118]. Repeated data collection is common in medical imaging when monitoring a biological process or a patient's response to treatment or another type of long-term intervention. Computer algorithms, such as segmentation methods, that are applied to repeated images of the same patient should be analyzed with appropriate longitudinal methodologies to avoid biased p-values.

12.7 DISCUSSION AND CONCLUSIONS

This chapter summarizes the most popular and accepted methodologies applicable to the evaluation of image-analysis and image-processing techniques for medical-imaging applications.
The approaches described here can be used to: (a) discriminate, early in development, the methods that are most likely to succeed and rank their performances, and (b) assess the clinical value of the top candidates and test specific hypotheses. The development of new medical-image-processing methodologies should be thoroughly scrutinized through robust but low-cost and fast validation approaches. Feasibility studies that test new image-processing ideas or new medical-imaging applications could avoid observer-based ROC-type tests. Preference studies or computer-based ROC-type
experiments or mathematical error analysis could provide the information necessary to discriminate, compare, and optimize methodologies at the early stages of development. Proven concepts could then be tested with observer-based, retrospective ROC experiments.

New developments in the field of validation methodologies address the limitations of existing techniques. For example, a differential ROC (DROC) methodology was proposed by Chakraborty et al. [119, 120] for measuring small differences in the performance of two medical-imaging modalities in a timely and cost-effective way. In DROC, the observer sees pairs of images of the same patient (one from each modality), selects the one preferred for the assigned task, and rates it using a five-point rating scale similar to that used in ROC. The method seems promising in that it may require fewer cases and fewer observers than an ROC study while yielding equivalent power. It is likely to be most applicable to the evaluation of different imaging modalities, e.g., MRI and mammography, or digital and screen/film mammography, or of image-processing techniques, e.g., image-enhancement and compression algorithms. It may be less applicable to the evaluation of CAD detection or CAD diagnosis schemes. Another new development could lead to an ROC tool that can handle more than two classes. This is required, for example, for the analysis of three-class data obtained from classifiers that differentiate among benign, malignant, and false-positive computer detections on medical images [121].

But does validation stop here? And if not, what comes next? According to Fryback and Thornbury's model [15], these evaluation steps take us only halfway to the final goal of total efficacy assessment of a diagnostic tool.
Following these experiments, the computer algorithms that showed significant potential for positive clinical impact should be further tested for therapeutic efficacy, patient-outcome efficacy, and societal efficacy. Prospective clinical trials, then, are what should come next. Unfortunately, we have no historical precedent demonstrating how such a trial should be conducted in this field. Commercial CAD systems for mammography were the first to enter the clinic, but they did so without going through traditional clinical trials, based only on the positive outcome of retrospective ROC studies. This is probably the reason for the controversial findings that followed regarding their clinical value, which will continue to be questioned until a large prospective clinical trial is performed [122].

Finally, this chapter attempted to provide practical, albeit limited, solutions to the admittedly complicated issues of validation, such as the problem of ground-truth definition required for the validation of image-segmentation techniques, or the question of sample size. We should strive to find absolute and robust solutions to these validation problems, but the lack thereof should not hinder algorithm development, considering the high rate of technological advancement in medical-imaging equipment and diagnostic procedures. Establishing standards on metrics and validation criteria, and reaching consensus on the use of currently available techniques, could ease the burden of unattainable perfection while satisfying our current requirements, significantly improving the validation process, and yielding meaningful results.
ACKNOWLEDGMENTS

The author would like to thank Robert A. Clark, John J. Heine, Ji-Hyun Lee, Lihua Li, Jerry A. Thomas, Anand Manohar, Joseph Murphy, Angela Salem, and Mugdha Tembey for their valuable discussions and comments on algorithm evaluation issues and their assistance in the preparation of this manuscript.

REFERENCES

1. Giger, M.L., Computer-aided diagnosis, in Syllabus: A Categorical Course in Physics: Technical Aspects of Breast Imaging, Haus, A.G. and Yaffe, M.J., Eds., RSNA, Chicago, IL, 1993, p. 283.
2. Feig, S.A., Clinical evaluation of computer-aided detection in breast cancer screening, Semin. Breast Dis., 5, 223, 2002.
3. Li, L. et al., Improved method for automatic identification of lung regions on chest radiographs, Acad. Radiol., 8, 629, 2001.
4. Vaidyanathan, M. et al., Comparison of supervised MRI segmentation methods for tumor volume determination during therapy, Magn. Resonance Imaging, 13, 719, 1995.
5. Heine, J.J. and Malhotra, P., Mammographic tissue, breast cancer risk, serial image analysis, and digital mammography: Part 1, tissue-related risk factors, Acad. Radiol., 9, 298, 2002.
6. Heine, J.J. and Malhotra, P., Mammographic tissue, breast cancer risk, serial image analysis, and digital mammography: Part 2, serial breast tissue change and related temporal influences, Acad. Radiol., 9, 317, 2002.
7. Clarke, L.P. et al., Hybrid wavelet transform for image enhancement for computer-assisted diagnosis and telemedicine applications, in Time Frequency and Wavelets in Biomedical Signal Processing, Akay, M., Ed., IEEE Press Series in Biomedical Engineering, IEEE, New York, 1998, chap. 21.
8. Yang, Z. et al., Effect of wavelet bases on compressing digital mammograms, IEEE Eng. Med. Biol. Mag., 14, 570, 1995.
9. Masero, V., Leon-Rojas, J.M., and Moreno, J., Volume reconstruction for health care: a survey of computational methods, Ann. N.Y. Acad. Sci., 980, 198, 2000.
10. Sallam, M.Y. and Bowyer, K.W., Registration and difference analysis of corresponding mammogram images, Medical Image Anal., 3, 103, 1999.
11. Deans, S.R. et al., Wavelet transforms, in Encyclopedia of Electrical and Electronics Engineering, Webster, J.G., Ed., J. Wiley & Sons, New York, 1999.
12. Qian, W. et al., Computer-assisted diagnosis for digital mammography, IEEE Eng. Med. Biol. Mag., 14, 561, 1995.
13. Kallergi, M., Computer-aided diagnosis of mammographic microcalcification clusters, Med. Phys., 31, 314, 2004.
14. Shiraishi, J. et al., Computer-aided diagnosis to distinguish benign from malignant solitary pulmonary nodules on radiographs: ROC analysis of radiologists' performance—initial experience, Radiology, 221, 469, 2003.
15. Fryback, D.G. and Thornbury, J.R., The efficacy of diagnostic imaging, Medical Decision Making, 11, 88, 1991.
16. Pepe, M.S., The Statistical Evaluation of Medical Tests for Classification and Prediction, Oxford University Press, Oxford, U.K., 2002.
17. Methodological issues in diagnostic clinical trials: health services and outcomes research in radiology, Symposium Proceedings, Washington, DC, March 15–16, 1998, Acad. Radiol., 6, Suppl. 1, S1–S136, 1999.
18. Hulley, S.B. et al., Designing Clinical Research: An Epidemiologic Approach, 2nd ed., Lippincott Williams & Wilkins, Philadelphia, PA, 2000.
19. Friedman, L.M., Furberg, C., and DeMets, D.L., Fundamentals of Clinical Trials, 3rd ed., Springer-Verlag, Heidelberg, 1998.
20. Zhou, X.H., McClish, D.K., and Obuchowski, N.A., Statistical Methods in Diagnostic Medicine, Wiley, New York, 2002.
21. Gay, J., Clinical Epidemiology & Evidence Based Medicine Glossary: Experimental Design and Statistics Terminology, August 22, 1999, Washington State University; Available online at http://www.vetmed.wsu.edu/courses-jmgay/GlossExpDesign.htm, last accessed 3/05.
22. Thornbury, J.R., Intermediate outcomes: diagnostic and therapeutic impact, Acad. Radiol., 6, S58, 1999.
23. Hendee, W.H., Technology Assessment, National Cancer Institute Imaging Sciences Working Group Technology Evaluation Committee, Final Report, December 16, 1997; Available online at http://imaging.cancer.gov/reportsandpublications/ReportsandPresentations/ImagingSciencesWorkingGroup/page2, last accessed 3/05.
24. Phelps, C.E. and Mushlin, A.I., Focusing technology assessment using medical decision theory, Medical Decision Making, 8, 270, 1988.
25. Chesters, M.S., Human visual perception and ROC methodology in medical imaging, Phys. Med. Biol., 37, 1433, 1992.
26. Metz, C.E., ROC methodology in radiologic imaging, Invest. Radiol., 21, 720, 1986.
27. Nishikawa, R., Assessment of the performance of computer-aided detection and computer-aided diagnosis systems, Semin. Breast Dis., 5, 217, 2002.
28. Houn, F. et al., Study design in the evaluation of breast cancer imaging technologies, Acad. Radiol., 7, 684, 2000.
29. Wagner, R.F. et al., Assessment of medical imaging and computer-assist systems: lessons from recent experience, Acad. Radiol., 9, 1264, 2002.
30. King, J.L. et al., Identification of superior discriminators during non-ROC studies, Proc. SPIE, 4686, 54, 2002.
31. Zou, K.H., Receiver Operating Characteristic (ROC) Literature Research, Department of Radiology, Brigham and Women's Hospital, Department of Health Care Policy, Harvard Medical School; Available online at http://splweb.bwh.harvard.edu:8000/pages/ppl/zou/roc.html, last accessed 3/05.
32. Medical Image Perception Society, ROC References-ROC & Related Programs to Analyze Observer Performance; Available online at http://www.mips.ws/, last accessed 3/05.
33. Beytas, E.M., Debatin, J.F., and Blinder, R.A., Accuracy and predictive value as measures of imaging test performance, Invest. Radiol., 27, 374, 1992.
34. Li, H.D. et al., Markov random field for tumor detection in digital mammography, IEEE Trans. Medical Imaging, 14, 565, 1995.
35. Kallergi, M., Interpretation of calcifications in screen/film, digitized, and wavelet-enhanced, monitor-displayed mammograms: a receiver operating characteristic study, Acad. Radiol., 3, 285, 1996.
36. Tembey, M., Computer-Aided Diagnosis for Mammographic Microcalcification Clusters, M.S. Thesis, Computer Science Department, College of Engineering, University of South Florida, Tampa, FL, 2003.
37. Jiang, Y., Metz, C.E., and Nishikawa, R.M., A receiver operating characteristic partial area index for highly sensitive diagnostic tests, Radiology, 201, 745, 1996.
38. Kallergi, M., Carney, G., and Gaviria, J., Evaluating the performance of detection algorithms in digital mammography, Med. Phys., 26, 267, 1999.
39. Li, L. et al., False-positive reduction in CAD mass detection using a competitive strategy, Med. Phys., 28, 250, 2001.
40. Mould, R.F., Introductory Medical Statistics, 3rd ed., Institute of Physics, Philadelphia, 1998.
Evaluation strategies for medical-image analysis
41. Pal, N.R. and Pal, S.K., A review on image segmentation techniques, Pattern Recogn., 26, 1277, 1993.
42. Zhang, Y.J., A survey on evaluation methods for image segmentation, Pattern Recogn., 29, 1335, 1996.
43. Zhang, Y.J., A review of recent evaluation methods for image segmentation, Proc. 6th Int. Symp. Signal Processing and Its Applications (ISSPA), Kuala Lumpur, Malaysia, August 13–16, 2001, IEEE, Piscataway, NJ, 148–151, 2001.
44. Udupa, J.K. et al., A methodology for evaluating image segmentation algorithms, Proc. SPIE, 4684, 266, 2002.
45. Filippi, M. et al., Intra- and inter-observer variability of brain MRI lesion volume measurements in multiple sclerosis: a comparison of techniques, Brain, 118, 1593, 1995.
46. Kallergi, M. et al., A simulation model of mammographic calcifications based on the ACR BIRADS, Acad. Radiol., 5, 670, 1998.
47. Kallergi, M. et al., Resolution effects on the morphology of calcifications in digital mammograms, in Medicon 98, Proc. VIII Mediterranean Conf. Medical and Biological Engineering and Computing, Lemesos, Cyprus, 1998.
48. United States National Library of Medicine, National Institutes of Health, The Visible Human Project®; Available online at http://www.nlm.nih.gov/research/visible/visible_human.html, last accessed 3/05.
49. Zubal, I.G. et al., Computerized three-dimensional segmented human anatomy, Med. Phys., 21, 299, 1994.
50. Gerig, G. et al., A new validation tool for assessing and improving 3-D object segmentation, MICCAI, 2208, 516–528, 2001; Available online at http://www.cs.unc.edu/Research/MIDAG/pubs/papers/MICCAI01-gerig-valmet.pdf, last accessed 3/05.
51. Chalana, V. and Kim, Y., A methodology for evaluation of boundary detection algorithms on medical images, IEEE Trans. Medical Imaging, 16, 642, 1997.
52. Kelemen, A., Székely, G., and Gerig, G., Elastic model-based segmentation of 3-D neuroradiological data sets, IEEE Trans. Medical Imaging, 18, 828, 1999.
53. Motulsky, H., Intuitive Biostatistics, Oxford University Press, Oxford, U.K., 1995.
54. Mould, R.F., Introductory Medical Statistics, 3rd ed., Institute of Physics Publishing, Philadelphia, 1998.
55. Bland, J.M. and Altman, D.G., Statistical methods for assessing agreement between two methods of clinical measurement, Lancet, 1, 307, 1986.
56. Yoo, T.S., Ed., Insight into Images: Principles and Practice for Segmentation, Registration, and Image Analysis, A.K. Peters Ltd., Wellesley, MA, 2004; ITK software available online at http://www.itk.org/, last accessed 3/05.
57. Goodenough, D.J., Rossmann, K., and Lusted, L.B., Radiographic applications of signal detection theory, Radiology, 105, 199, 1972.
58. Chesters, M.S., Human visual perception and ROC methodology in medical imaging, Phys. Med. Biol., 37, 1433, 1992.
59. Dorfman, D.D., Berbaum, K.S., and Lenth, R.V., Multireader, multicase receiver operating characteristic methodology: a bootstrap analysis, Acad. Radiol., 2, 626, 1995.
60. Judy, P.F. et al., Measuring the observer performance of digital systems, in Computed Digital Radiography in Clinical Practice, Green, R.E. and Oestmann, J.W., Eds., Thieme Medical Publishers, New York, 1992, p. 59.
61. Berbaum, K.S., Dorfman, D.D., and Franken, E.A., Measuring observer performance by ROC analysis: indications and complications, Invest. Radiol., 24, 228, 1989.
62. Roe, C.A. and Metz, C.E., Dorfman-Berbaum-Metz method for statistical analysis of multireader, multimodality receiver operating characteristic data: validation with computer simulation, Acad. Radiol., 4, 298, 1997.
63. Dorfman, D.D. et al., Monte Carlo validation of a multireader method for receiver operating characteristic discrete rating data: factorial experimental design, Acad. Radiol., 5, 591, 1998.
64. Beam, C.A., Strategies for improving power in diagnostic radiology research, AJR, 159, 631, 1992.
65. Rockette, H.E., Gur, D., and Metz, C.E., The use of continuous and discrete confidence judgments in receiver operating characteristic studies of diagnostic imaging techniques, Invest. Radiol., 27, 169, 1992.
66. Kallergi, M., Hersh, M.R., and Thomas, J.A., Using BIRADS categories in ROC experiments, Proc. SPIE, 4686, 60, 2002.
67. Metz, C.E., Some practical issues of experimental design and data in radiological ROC studies, Invest. Radiol., 24, 234, 1989.
68. Beiden, S.V. et al., Independent vs. sequential reading in ROC studies of computer-assist modalities: analysis of components of variance, Acad. Radiol., 9, 1036, 2002.
69. Chakraborty, D., Counterpoint to analysis of ROC studies of computer-assisted modalities, Acad. Radiol., 9, 1044, 2002.
70. ROC software, Kurt Rossmann Laboratories for Radiologic Image Research, Department of Radiology, The University of Chicago; Available online at http://www-radiology.uchicago.edu/krl/roc_soft.htm, last accessed 3/05.
71. ROC Software, The Medical Image Perception Laboratory, Department of Radiology, The University of Iowa; Available online at http://perception.radiology.uiowa.edu/, last accessed 3/05.
72. Gatsonis, C. and McNeil, B.J., Collaborative evaluations of diagnostic tests: experience of the Radiology Diagnostic Oncology Group, Radiology, 175, 571, 1990.
73. Angelos-Tosteson, A.N. and Begg, C.B., A general regression methodology for ROC curve estimation, Medical Decision Making, 8, 204, 1988.
74. Starr, S.J. et al., Visual detection and localization of radiographic images, Radiology, 116, 533, 1975.
75. Swensson, R.G., Unified measurement of observer performance in detecting and localizing target objects on images, Med. Phys., 23, 1709, 1996.
76. Swensson, R.G. et al., Using incomplete and imprecise localization data on images to improve estimates of detection accuracy, Proc. SPIE, 3663, 74, 1999.
77. Kallergi, M. et al., Improved interpretation of digitized mammography with wavelet processing: a localization response operating characteristic study, AJR, 182, 697, 2004.
78. Chakraborty, D.P., Maximum likelihood analysis of free-response receiver operating characteristic (FROC) data, Med. Phys., 16, 561, 1989.
79. Chakraborty, D.P. and Winter, L.H.L., Free-response methodology: alternate analysis and a new observer-performance experiment, Radiology, 174, 873, 1990.
80. Burgess, A.E., Comparison of receiver operating characteristic and forced choice observer performance measurement methods, Med. Phys., 22, 643, 1995.
81. Pisano, E.D. et al., Radiologists’ preferences for digital mammographic display, The International Digital Mammography Development Group, Radiology, 216, 820, 2000.
82. Strotzer, M. et al., Clinical application of a flat-panel X-ray detector based on amorphous silicon technology: image quality and potential for radiation dose reduction in skeletal radiography, AJR, 172, 835, 1999.
83. Rosen, E.L. and Soo, M.S., Tissue harmonic imaging sonography of breast lesions: improved margin analysis, conspicuity, and image quality compared to conventional ultrasound, Clin. Imaging, 25, 379, 2001.
84. Volk, M. et al., Digital radiography of the skeleton using a large-area detector based on amorphous silicon technology: image quality and potential for dose reduction in comparison with screen-film radiography, Clin. Radiol., 55, 615, 2000.
85. Sivaramakrishna, R. et al., Comparing the performance of mammographic enhancement algorithms: a preference study, AJR, 175, 45, 2000.
86. Kheddache, S. and Kvist, H., Digital mammography using storage phosphor plate technique: optimizing image processing parameters for the visibility of lesions and anatomy, Eur. J. Radiol., 24, 237, 1997.
87. Caldwell, C.B. et al., Evaluation of mammographic image quality: pilot study comparing five methods, AJR, 159, 295, 1992.
88. Davidson, R.R. and Farquhar, P.H., A bibliography on the method of paired comparisons, Biometrics, 32, 241, 1976.
89. Silverstein, D.A. and Farrell, J.E., An efficient method for paired comparison, J. Electron. Imaging, 10, 394, 2001.
90. Beam, C.A., Fundamentals of clinical research for radiologists: statistically engineering the study for success, AJR, 179, 47, 2002.
91. Beam, C.A., Strategies for improving power in diagnostic radiology research, AJR, 159, 631, 1992.
92. Kallergi, M., Clark, R.A., and Clarke, L.P., Medical-image databases for CAD applications in digital mammography: design issues, Stud. Health Technol. Inform., 43, Pt B, 601, 1997.
93. Nishikawa, R.M. et al., Effect of case selection on the performance of computer-aided detection schemes, Med. Phys., 21, 265, 1994.
94. Zink, S. and Jaffe, C.C., Medical-imaging databases: a National Institutes of Health workshop, Invest. Radiol., 28, 366, 1993.
95. Noether, G.E., Sample size determination of some common nonparametric tests, JASA, 82, 645, 1987.
96. Gur, D. et al., Practical issues of ROC analysis: selection of controls, Invest. Radiol., 25, 583, 1990.
97. Woodward, M., Epidemiology: Study Design and Data Analysis, Chapman & Hall/CRC Press, Boca Raton, FL, 1999.
98. Kallergi, M. et al., Evaluation of a CCD-based film digitizer for digital mammography, Proc. SPIE, 3032, 282, 1997.
99. Digital Database for Screening Mammography (DDSM), University of South Florida, Digital Mammography Home Page; Available online at http://marathon.csee.usf.edu/Mammography/Database.html, last accessed 3/05.
100. Digital Mammographic Imaging Screening Trial, National Cancer Institute; Available online at http://cancer.gov/dmist, last accessed 3/05.
101. Lung Image Database Consortium (LIDC), Cancer Imaging Program, National Cancer Institute; Available online at http://imaging.cancer.gov/programsandresources/InformationSystems/LIDC, last accessed 3/05.
102. Fifth National Forum on Biomedical Imaging in Oncology, Bethesda, MD, 2004; Available online at http://cancer.gov/dctd/forum/summary04.pdf, last accessed 3/05.
103. Target Recognizer Definitions and Performance Measures, Report of the Joint U.S. Department of Defense and Industry Working Group on Automatic Target Recognizers, ATRWG No. 86–001, 1986, Storming Media, Washington, DC.
104. Tourassi, G.D. and Floyd, C.E., The effect of data sampling on the performance evaluation of artificial neural networks in medical diagnosis, Medical Decision Making, 17, 186, 1997.
105. Efron, B., The Jackknife, the Bootstrap, and Other Resampling Plans, CBMS-NSF Regional Conference Series in Applied Mathematics, Society for Industrial and Applied Mathematics, Philadelphia, 1982.
106. Efron, B. and Tibshirani, R., Improvements on cross-validation: the .632+ bootstrap method, J. Am. Stat. Assoc., 92, 548, 1997.
107. Efron, B. and Tibshirani, R., Statistical data analysis in the computer age, Science, 253, 390, 1991.
108. Bishop, C.M., Neural Networks for Pattern Recognition, Clarendon Press, Oxford, U.K., 1995.
109. Harrell, F.E., Jr., Lee, K.L., and Mark, D.B., Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors, Statistics Med., 15, 361, 1996.
110. Zheng, B. et al., Adequacy testing of training sample sizes in the development of a computer-assisted diagnosis scheme, Acad. Radiol., 4, 497, 1997.
111. Jiang, Y. et al., Improving breast cancer diagnosis with computer-aided diagnosis, Acad. Radiol., 6, 22, 1999.
112. Metz, C.E., Herman, B.A., and Shen, J.H., Maximum-likelihood estimation of receiver operating characteristic (ROC) curves from continuously distributed data, Statistics Med., 17, 1033, 1998.
113. Begg, C.B. and McNeil, B.J., Assessment of radiologic tests: control of bias and other design considerations, Radiology, 167, 565, 1988.
114. Muka, E., Blume, H., and Daly, S., Display of medical images on CRT soft-copy displays: a tutorial, Proc. SPIE, 2431, 341, 1995.
115. Digital Imaging and Communications in Medicine (DICOM) Part 14: Grayscale Standard Display Function, National Electrical Manufacturers Association (NEMA), Rosslyn, VA, 2003; Available online at http://medical.nema.org/dicom/2003/03_14PU.PDF, last accessed 3/05.
116. Gohel, H.J. et al., A workstation interface for ROC studies in digital mammography, Proc. SPIE, 3031, 440, 1997.
117. Abdullah, B.J.J. and Ng, K.H., In the eyes of the beholder: what we see is not what we get, BJR, 74, 675, 2001.
118. Diggle, P.J., Liang, K.Y., and Zeger, S.L., Analysis of Longitudinal Data, Oxford University Press, Oxford, U.K., 1994.
119. Chakraborty, D.P. et al., The differential receiver operating characteristic (DROC) method, Proc. SPIE, 3338, 234, 1998.
120. Chakraborty, D.P., Howard, N.S., and Kundel, H.L., The differential receiver operating characteristic (DROC) method: rationale and results of recent experiments, Proc. SPIE, 3663, 82, 1999.
121. Edwards, D.C. et al., Estimating three-class ideal observer decision variables for computerized detection and classification of mammographic mass lesions, Med. Phys., 31, 81, 2004.
122. James, J.J., The current status of digital mammography, Clin. Radiol., 59, 1, 2004.
Index
2-D gradient watersheds, 276–278 3-D gradient watersheds, 277, 278 3-D watershed transformations, 275 4-point connectivity tracking algorithm, 343 A Academic Radiology, 436 Accuracy, 438–439, 440 improvement via segmentation validation tests, 444 Acoustic shadows, 96 Active contour model, 71 AdaBoost algorithm, 171. See also Boosting techniques Adaptive contrast enhancement (ACE), 55, 227 Adaptive filtering methodologies, 57, 187 Adaptive histogram equalization (AHE), 227 Adaptive linear mapping, 240 Adaptive local-enhancement methods, 227 Adaptive wavelet enhancement method, 229, 258, 262 advantage of, 265 Adaptive wavelet mapping, 238–241 Adaptive wavelet shrinkage, 235–237 AFC test, 454 Akaike information criterion (AIC) procedure, 417 Alberola-López, Carlos, 341, xi Algorithm performance, 458 Algorithm training, 455, 458 Alternative forced choice (AFC) test, 454 Ambient light conditions, 462 American College of Radiology, 436–438, 450 Amino acid sequences average length of, 382 co-occurrence strings of, 372–373 graph-based analysis of, 363–365 numerical coding scheme for, 374 in rat, 378–379 self-connections of subsequences, 383 as weighted/directed complex networks, 373–376 in xenopus, 377–378 in zebra fish, 377 Analysis of variance (ANOVA), 464 and brain connectivity studies, 407 results of, 416 Anatomical filter-based exposure-equalization (AFEE), 320, 323 Anatomical landmarks, 345 Angular second moment, in GLDS, 123–124
Aperture problems, 349 Area under curve, 64–65, 441 Arrowing watershed methods, 276 Arteries, three-dimensional reconstruction of, 66–78 Artificial images, watershed segmentation experimental results, 297–300, 301–303 Artificial intelligence, 54 Artificial neural networks, 60, 61, 62, 458 Astolfi, Laura, 395, xi Asymmetry, 208 Asymptomatic plaque, 88–89, 93, 109 segmented, 96 verbal interpretations of arithmetic values, 107 Atherosclerosis, 66 in carotid plaque, 88 Atherosclerotic plaque rupture, 91 Autocorrelation function (ACF), 193 Automatic boundary-tracking process, 31 Automatic target recognition (ATR), 458 Autoregressive modeling for blocks of pixels in mammograms, 211, 213, 215, 217, 219, 221 with clustering techniques, 204–207 and electrocardiogram signals, 187–188 hierarchical clustering scheme for, 204–205 k-means algorithm for, 205–206 mammography applications, 207–214 one-dimensional, 187–188 texture characterization with, 185–188 two-dimensional, 188 B Babiloni, Fabio, 395, xi Bagging techniques, 138, 167–168 Band-pass filters, 54 Barabasi scale-free networks, 365 Bayesian classifiers, 138, 142–143, 318–319, 319 and model-based search methods, 174 Bayesian quadratic and linear classifiers, 60 Benign cysts, 13 Benign masses, mammograms with, 212–214 Betweenness centrality, 369 Biases, 455–456 algorithm training-related, 458–460 database effects-related, 458–460 database generation-related, 456–458 presentation setup, 461–462 statistical analysis, 462–464 Bijective properties, 344 Bilateral subtraction, 14, 29 Binary decisions, 453 Bioinformatics, 364 complex networks approaches to, 370–373 Biomedical image classification methods, 137–138.
See also Classification methods Biplane coronary angiography, 66 combining with IVUS, 75 Black-and-white image display, 461 Blood noise reduction (BNR), 69 Blood-oxygen level dependence (BOLD), 399, 427 Blurring, successive, 286 Boosting techniques, 138, 167–168, 170, 171 Borders, refining in mammography, 17 Borders of interest, in IVUS images, 67 Boundary restrictions, 348 Boundary sharpening, 54 Brain activity, measuring, 396–397 Brain connectivity, 397 patterns of, 405, 412–413, 417 Brain tumor segmentations, 446 Breast border extraction, 316, 323 division into segments, 324–325 exclusion of nipple region in, 324–325 normals to, 326 polar representation of, 332 Breast-boundary tracking program, 31 Breast cancer computer-aided diagnosis of, 1–5, 226 detection sensitivity, 451 usefulness of ultrasound tissue characterization in, 168 Breast cancer detection with full-field direct digital mammography, 435 improving sensitivity and specificity, 3 Breast computed tomography, 40 Breast-deformation model, for compressed breasts, 29 Breast Imaging Reporting and Data System (BIRADS), 243, 253, 450 Breast masses CAD methods for assessing, 15–16 computerized detection of, 13–15 feature extraction and classification, 19–21 FROC analysis of detection accuracy, 21–27 object refinement process, 17–18 performance of mass detection algorithms, 25–27 preprocessing and segmentation of imaging, 16–17 training and test data sets in study, 21–23 training and testing procedures, 23–25 true positives and false positives, 23 Breast skin changes, 315–316 Busyness, in NGTDM, 125 C CAD architecture, 54 CAD detection algorithms, 435, 451 ROC study design for evaluation of, 452 CAD systems architecture of, 54
automated methods for IVUS ROI detection, 68–72 basics of, 52–66 classification systems in, 59–64 evaluation methodologies, 64–65 feature analysis by, 57–59 fragmentation stage, 56–57 historical overview, 53–54 integrated, 65–66 IVUS image analysis limitations, 73 IVUS image interpretation in, 67–68 in mammography, 53, 226 medical-image processing and analysis for, 51–52, 78–79 plaque characterization in IVUS images, 73–75 preprocessing stage, 54–56 in three-dimensional artery reconstruction, 66–78 three-dimensional reconstruction methods, 75–78 Calcifications, 208, 226 Calcified plaque, 74 Calibration marks, 68 Carotid artery, ultrasound image, 90 Carotid endarterectomy, 88 Carotid plaque classification results of KNN classifiers, 111–112 classification results of SOM classifiers, 108–111 classifier combiner, 117 defined, 89–90 feature extraction and selection, 103–108, 115–116 future study directions, 117–118 materials in ultrasound study of, 93 plaque classification, 116–117 previous work on characterization of, 91–93 proposed system of classification, 113–115 results of classifier combiner, 112–113 study results, 103–115 texture-feature-extraction algorithms, 118–132 Carotid plaque multifeature/multiclassifier system, 93–95 classifier combiner, 101–103 feature extraction, 97–99 feature selection, 99 Fourier power spectrum (FPS), 98 fractal dimension texture analysis (FDTA), 98 gray-level difference statistics (GLDS), 97 image acquisition and standardization, 95 Laws’s texture energy measures (TEM), 98 morphological features, 99 neighborhood gray-tone-difference matrix (NGTDM), 98 plaque classification, 99–101 with KNN classifier, 101 with SOM classifier, 100–101 plaque identification and segmentation, 95–96 shape parameters, 98 spatial gray-level dependence matrices (SGLDM), 97
statistical-feature matrix (SFM), 98 statistical features, 97 Cascade correlation, 148 Case/control selection, 456 Case control studies, 436, 437 Case histories mammograms with benign masses, 212–214 with malignant masses, 210–212 wavelet analysis observer performance evaluation, 245–247 Catastrophe theory, 284 Catchment basin, 272, 274 Categorical data type, 462 Catmull-Rom splines, 296 Caudocranial view, 155 Center of abnormality, image coordinates of, 208 Cerebellum images, 300–311 Cerebral hemodynamic response, monitoring by fMRI, 399–400 Cerebral infarction, association with echolucent plaques, 91 Challenge regions, 437 Chan, Heang-Ping, 1, xi Characterization errors, 2 Characterization schemes, 55 Christodoulou, Christodoulos I., 87, xi Cincotti, Febo, 395, xi Classification, 53–54, 441 Classification methods, 59–64, 137–138 evaluation of, 64–65 Classifier combiner, 101–103, 117 results of, 112–113 Classifier ensembles, 138, 166–171 Clear-cell carcinoma, genomic classification methods for, 158–159 Clinical decision thresholds, 450 Clinical evaluation, 437 Clinical performance indices, 438–440 Clinical study designs, 436–438 Clinical trials, 435 Clique potential functions, 318 Cliques, 317–318 Cluster size, 384 Clustered microcalcifications, 2, 5 Clustering, 186, 187 AR modeling with, 204–207 false-positive reduction using, 9 hierarchical scheme, 204–205 nonsymmetric model with, 208 symmetric model with, 208 Clustering coefficients, 379, 380 Co-occurrence strings, 372 Coarseness in NGTDM, 125 in SFM, 127
Coherence analysis, 398 Cohort studies, 436 Colon polyps, computer-aided detection of, 158 Color image display, 461 Columnwise histogram stretching, 68 Complex networks, 365 approaches to bioinformatics in, 370–373 betweenness centrality of, 369 future research directions for, 381–385 history of, 365–367 investigation results, 376–379 mathematical concepts in, 367–370 sequences of amino acids as, 373–376 and statistical physics, 370 topology of biological, 370 Complexity analysis of, 366 in NGTDM, 126 Compression techniques, 441 Computer-aided detection/diagnosis (CAD), 40, 435. See also CAD systems of breast cancer, 1–5 of colon polyps, 158 effects on radiologists’ performance, 12–13 of internal carotid artery plaque, 88–89 role in reducing missed cancers, 3 Computer-assisted measurement, of skin thickness, 316 Computer FROC test, 442–444 Computer monitors, 461 Computer ROC test, 441–442 Computer simulation of cortical connectivity, 404–407 for directed transfer functions, 415–417, 424–425 for structural equation modeling, 415, 423–424 Computer vision, 54, 138 Condensation algorithm, 173, 174 Conditional probability distribution, 318, 329 Confidence levels, 253–254 differences in, 259, 261 Constrained-optimization models, 187 with equality constraints, 196–198 with inequality constraints, 199–204 Contextual classifiers, 148–149 Contextual constraints, 316, 317 Continuous data type, 462 Continuous-domain watershed transformation, 273–274 Contour-detection techniques, 70, 291–292, 293 Contrast in GLDS, 123 in NGTDM, 125 in SFM, 127 Contrast adjustment, manual, 227
Contrast enhancement, 14, 54, 228, 239, 244, 434 at breast periphery, 320 detection performance of, 265 in magnetic-resonance breast imaging, 40 via fuzzy-logic techniques, 56 Contrast-limited AHE, 227 Contrast stretch, 55 Control points, 345 Conventional classifiers, 60–61 Convolution neural network classifier, 7–9 Coordinate system, for object localization in mammograms, 31 Coordinate transformations, 69 Cornelis, Jan, 271, 315, xi Coronary artery disease, 66 Correlation, 373 in amino acid sequences, 364 Cortical connectivity application of estimation methods to EEG data, 425–427 application to estimation of sources of self-paced movements, 419–423 application to high-resolution potential recordings, 417–419 application to real EEG data, 425–426 computer simulation of, 404–407 directed transfer function (DTF) and, 403–404 estimation of cortical source current density, 410–411 estimation using head models, 427 estimation with fMRI and EEG, 395–399 experimental design, 408 head and cortical models in, 408 methods for monitoring, 399–407 monitoring cerebral hemodynamic response by fMRI, 399–400 patterns of, 412–413 statistical evaluation of measurements, 409–410 structural equation modeling of, 400–402 study results, 415–423 task-related pattern of, 419 Cortical current density, 398 distributions of, 421 Cortical current waveforms, 412–415 Cortical models, 408 Cortical source current density, estimation of, 410–411 Costaridou, Lena, 225, 315, xi Covariance, in amino acid sequencing, 364 Cramer-Rao lower bound, 346, 355 Craniocaudal (CC) view, 2, 226, 332 vs. MLO view, 32 Crest lines, 273 Cumulants, 194–195 Current-density reconstruction, 420, 423 Curse of dimensionality, 459–460 Curve fitting methods, 324 Cusp catastrophe, 285, 287
D da Fontoura Costa, Luciano, 363, xi Data and scene model, 320–321 Data sequence, 373 Data sets, 449 in carotid plaque analysis, 93 in FROC analysis, 21–23 Database contents, 456 Database effects, 458 Database generation, 455, 456–458 Database size, 456–457 Decision boundaries, 151 Decision rules, 459 Decision thresholds, 447 Decision tree classifiers, 144–145 Deformation maps, 343–344 Deformation model estimation entropy-based similarity measure, 353–354 intensity-based registration, 348 local structure, 350–352 multiresolution pyramid, 349–350 template matching, 348–349 Degeneracy problems, 453 Denoising, 262. See also Wavelet denoising Density ridges, 275 Density-weighted contrast enhancement (DWCE) filter, 15, 17 Deoxyhemoglobin, 399 Dependent variables, 463 Descending maximal lines, 273 Detection errors, 2 Detection performance, 265. See also Observer performance evaluation Detection schemes, 55 Detective quantum efficiency (DQE), 4 Diagnostic accuracy, 437 Diagnostic decision making, CAD systems in, 52 Diagnostic thinking efficacy, 437 Difference-image technique, 5, 6 Difference-of-Gaussian band-pass filter, 5, 6 Differentiable properties, 344 Digital Database for Screening Mammography (DDSM), 229 Digital image enhancement, 226 Digital imaging systems, 53 Digital phantoms, 242, 243 Digital-subtraction mammography, 40 Dimensionality, curse of, 459–460 Directed complex networks, 371, 373–376 Directed graphs, 367, 368 Directed transfer function (DTF), 403–404 ANOVA results on relative error from, 418 computer simulation results for, 415–417, 424–425
cortical connectivity patterns obtained with, 414, 425 estimation of cortical connectivity via, 424 signal generation for, 406 structural evaluation of connectivity measurements, 409–410 Discrete-case watershed transformation, 274 Discrete dyadic wavelet transform, 229–230, 232, 326 Discrete matching deformation, 352 Discrete rating scales, 450 Displacement field model of registration, 355 Displacement variograms, 358 Distance measures, 139 for hierarchical clustering scheme, 205 Distributed classification methods, 171 model-based search methods, 174–175 particle filters, 172–174 Divide lines, 272 Double reading, 3 Dual-energy contrast-enhanced digital-subtraction mammography, 40 Dual problem, 152 3DVIEWNIX, 446 Dynamic contour models, 69, 291–292 experimental results, 297–303 in scale-space algorithm, 291, 298, 300–303 E ECG-gated image acquisition, 73 Echo-contrast injection, 68 Echogenic plaque, 91, 92 Echolucent plaques, 91 Ecological interactions, 365 Edge-detection filters, 54 Edge gradient, 5 Edge preservation, with noise reduction, 233 Effective connectivity, 397 Effectiveness, 435 Efficacy, 435 measurements of, 436 six-tiered hierarchical model of, 436 Eigen-decomposition, 159 Electrocardiogram signals, AR modeling technique and, 187 Electroencephalography (EEG), 396. See also Multimodal integration application of connectivity estimation methods to, 425–426, 426–427 application to estimation of sources of self- paced movements, 419–423 recordings of cortical connectivity, 408–409 Elementary regions, 74 Energy minimization, 332 Entropy, in GLDS, 124 Entropy-based similarity measure, 353–354 Environment conditions, 462 Equality constraints, 187 constrained-optimization formulations with, 196–198
Equations extended Yule-Walker system of, 192–196 inequality constraints, 199 one-dimensional autoregressive modeling, 187 two-dimensional autoregressive modeling, 188–189 Yule-Walker system, 189–192 Error analysis, 441 Error estimates, 437 Error rate, 156, 459 Estimation of distribution algorithms (EDA), 175 Euclidean shortening, 68, 283 Evaluation methodologies, 64–65, 433–436 Event-related potential (ERP) data, 408–409 Experimental studies, 436 Expert observers, 440 Extended Yule-Walker system of equations, 192–196, 214 External elastic membrane (EEM) detection, 69–70 Extracted SGLDM features, 123 F False-alarm rate, 447 False negatives (FNs), 438, 460 False positive fraction (FPF), 441, 446, 447 False-positive reduction, 33, 34 rule-based, 7 using clustering, 9 using convolution neural network classifier, 7–9 False positives (FPs), 4–5, 19, 23, 438, 443, 460 classification systems for reduction of, 59–64 feature analysis to reduce, 57–59 number of marks per image, 11 Fast Fourier transform (FFT), 187 Feature extraction, 5, 19–21, 30, 52–53, 57–59, 115–116 in carotid artery plaque studies, 97–99, 103–108 in proposed plaque analysis, 115 and risk of misclassification, 58 Feature selection, 115–116 in carotid artery plaque analysis, 99 Feed-forward multilayer structure, 146 Fibroadenomas, 13 Filmless technology, 53 First-order statistics, 74 Flooding lines, 292 Flooding watershed methods, 276 Flow-line oriented watershed methods, 276 Fold catastrophe, 285, 287 Forced-choice studies, 455 Fotiadis, Dimitrios I., 51, xi Fourier power spectrum (FPS), 93, 98, 106, 130 angular sum, 130 radial sum, 130 Fractal dimension texture analysis (FDTA), 93, 98, 106, 129–130
Fractal geometry methods, 56 in IVUS images, 74 Free-response receiver operating characteristic (FROC), 9, 64, 437. See also FROC analysis curves of, 158 Friedman two-way nonparametric test, 455 FROC analysis, of detection accuracy, 9–12, 21–27 FROC test, 453–454 FROCFIT software, 454 Full-field digital mammography (FFDM), 4, 435 Functional connectivity, 397 Functional magnetic resonance imaging (fMRI), 395, 397 application to estimation of sources of self-paced movements, 419–423 data computed from, 422 monitoring cerebral hemodynamic response with, 399–400 Fusion analysis, 15, 35–36 in IVUS images, 75–76, 77 Fuzzy c-means, 141–142 Fuzzy logic techniques, 56, 60, 61–62 Fuzzy region-growing method, 14 G Gaussian noise, additive, 201 Gaussian RBF kernel functions, 155 Gaussian scale-space, 279–281 Gaussian smoothing, 68 Gene activation dynamics, 363, 364 Gene-expression networks, 372 Generic events, 287–289, 290 Genetic algorithm (GA), 8, 59 Genomic profiling, 158–159 Genomic regulatory systems, 371 Geodesic influence zone, 274 Geographic information systems (GIS), 342 Geometrical modeling, 30–33, 34, 36–39 Geostatistical spatial modeling, 343, 359–360 Gibbs distribution, 317 Gibbs energy function, 318 Global deformation, 344 Global intensity threshold, 56, 57 Global performance index, 441, 451 Global wavelet mapping, 237–238 Gold standard, 457 clinical trials as, 435 Gradient-dependent diffusion, 283 Gradient descent, 61 Gradient magnitude evolution, 284–285, 287, 288, 290 watershed lines during, 285–287 Gradient magnitude mapping, 234, 239, 240, 242 Gradient orientation, estimation in skin thickness measurement, 325–326 Gradient vector, 321 Gradient watersheds, 276–278, 290–291
dynamics in scale-space, 292–293 and hierarchical segmentation in scale-space, 290–291 Gram-Schmidt procedure, 164 Graph-based analysis, 363–365 Graph evolution, 376 Graph percolation, 366 Graph-searching method, 69–70 Graph theory, 367–369 and statistical physics, 366 Graphs directed, 367 dynamical evolution of, 365 Gray-level-based texture descriptors, 74 Gray-level difference (GLD), 58, 93, 97, 105 statistics, 123–124 Gray-level run length (GLRL), 58 Gray-level thresholding technique, 6 Gray scale median (GSM), 108 of ultrasound plaque images, 92 Greatest variance, 159 Greenhouse-Geisser correction, 407 Ground truth, 444, 451, 457 Growing techniques, 55, 69 Gstat software, 347 H Hadjiiski, Lubomir, 1, xi HAND100 image set, 297–303 Hanging protocol, 461 Haralick’s method, 74 Hard plaque, 74 Hausdorff distance, 445 Hierarchical clustering, for AR modeling, 204–205 Hierarchical segmentation scheme, 284 gradient magnitude evolution in, 284–285 gradient watersheds and, 290–291 intelligent interactive tool, 295–297 linking across scales, 287–290 salient-measure module, 291–294 stopping criterion stage, 294–295 watershed lines in, 285–287 High-resolution event-related potential recordings, 417–419 Histogram-based analysis, 57 Histogram modification/equalization, 54 Hit rate, 447 Hopfield neural networks, 71, 148 Hubs, 366–367, 370 Human cortical connectivity. See Cortical connectivity Hyperechoic plaques, 92 Hyperstacks, 282–283, 298, 304 Hypoechoic carotid plaques, and stroke risk, 92 Hypothesis definition, 448
I Ill-defined masses, 208 Image acquisition in carotid plaque analysis, 114 in data and scene model, 320 Image analysis, 434 Image-based global registration, 347 Image classification methods, 137–138 distributed methods, 171 model-based search methods, 174–175 particle filters, 172–174 general, 138 basic supervised classifiers, 142–145 contextual classifiers, 148–149 neural networks, 145–148 unsupervised clustering algorithms, 138–142 modern advances in, 149 classifier ensembles, 166–171 independent component analysis, 161–166 kernel-based methods, 149–161 Image-difference technique (IDT), 156 Image display factors, 441 Image enhancement, 226, 435 Image gradient, estimation of, 325 Image-processing techniques, 52, 54, 434 evaluation strategies for, 433–436 in IVUS images, 68–69 Image quality, in IVUS sequences, 73 Image registration authors’ approach to, 347 deformation maps in, 343–344 deformation model estimation, 348–355 and geostatistical spatial modeling, 359–360 landmark-based, 341–343, 345–346 landmark detection and location, 346–347 local registration, 355–357 variogram estimation, 354–355 Image segmentation, 441, 444 Image understanding, 434 ImageChecker, 65 Incidence rate, 439 Indegree distributions, 367 in amino acid sequencing, 379, 381 dilog plot of, 386, 388, 390 for rat, 390 for xenopus, 388 for zebra fish, 386 Independent component analysis, 138, 161–166 Independent reading mode, 450 Independent variables, 463 Individual mass scoring, 26
Inequality constraints, 186 constrained optimization with, 199–204 Insight Segmentation and Registration Toolkit (ITK), 446 Istituto di Tecnologie Avanzate Biomediche (ITAB), 419 Integrated CAD systems, 65–66 Intelligent interactive tools, 295–297 Intensity-based registration, 348 Intensity channel insensitivity, 348 Intensity windowing (IW), 227, 248 Inter-observer variability, 444 Interactivity, automated, 295–297 Internal carotid artery stenosis, 88 Interval-change analysis, 5, 29 Intravascular ultrasound (IVUS) images, 66–67 automated methods for ROI detection, 68–72 image preprocessing for, 68–69 image segmentation, 69–72 interpretation of, 67–68 limitations in analysis of, 73 plaque characterization in, 73–75 Intrinsic stationarity, 354, 360 Ischemic cerebrovascular events, association with carotid artery ultrasound plaque, 92 ISODATA, 140–141 Isoechoic plaques, 92 Iterative Dichotomiser 3 (ID3) algorithm, 144 Iterative watershed methods, 274–276 J Just-noticeable-difference-guided ACE, 227, 455 K k-means clustering, 138–140 for AR modeling, 205–206 k-nearest neighbor classifiers, 138, 143–144 Kallergi, Maria, 433, xii Katartzis, Antonis, 315, xii Kendall’s coefficient of consistence, 455 Kernel-based classification methods, 149 kernel principal-component analysis, 159–161 support vector machines, 149–159 Kernel function, 63 KNN classifier, 99, 101, 116–117 classification results of, 111–112 diagnostic yield, 113, 114 Kohonen neural network, 148 Kriging. See Ordinary Kriging; Universal Kriging Kurtosis, 119, 163 Kyriacou, Efthyvoulos, 87, xii L Labeled landmarks, 346
LABMRMC algorithm, 451 Lagrange multipliers, 151, 153 Landmark-based image registration, 341–343, 345–346 authors’ approach to, 347 deformation maps in, 343–344 deformation model estimation, 348–355 and geostatistical spatial modeling, 359–360 local registration, 355–357 variogram estimation, 354–355 Landmark-based local registration displacement field model, 355 ordinary Kriging prediction and, 356–357 Landmark detection/location, 346–347 Laplacian-Gaussian (LG) edge detector, 16, 17, 58, 228, 273, 285, 322 Laws’s texture energy measures (TEM), 74, 93, 98, 106, 128 Lee, Sarah, 185, xii Levenberg-Marquardt iterative method, 324 Limited adaptive gain, 239 Linear discriminant analysis (LDA), 16 stepwise method, 20 Linear enhancement, 237, 238, 239 Linear filters, 187 Linear-inverse estimates, 422 Linear regression analysis, 446 Linear-scale evolution, 283 Linear scale-space, 279–281 Linear unsharp filtering, 55 Linking schemes, 282, 290 across scales, 287–291 Lipschitz exponents, 233 LISREL freeware software, 415 Local performance index, 442, 451 Local-range modification (LRM), 227 Local registration, landmark-based, 347 Local structure, 345, 350–352 Local thresholding criteria, 57, 323 Locally adaptive wavelet contrast enhancement, 225–226. See also Wavelet contrast enhancement Location-specific receiver operating characteristic (LROC), 64 Long run emphasis, 74 Longitudinal analysis methods, 464 Longitudinal catheter twist, 76, 77 Low-contrast regions, suppression of, 17 LROC test, 453 Lumen, defining border of, 66 Lumen/intima border, 67 Lung cancer imaging, 435, 457 M Machine learning, 166 statistical approaches to, 138
MAFC test, 454 Magnetic resonance images segmented samples, 298–311 three-dimensional multiscale watershed segmentation of, 271–272 Magnetoencephalography (MEG), 396 Majority voting, 102 in carotid plaque analysis, 95 Malignant masses mammograms with, 210–212 nonspiculated, 14 Mammex TR, 66 Mammographic image-enhancement methods, 226–229 Mammographic interpretation, 2, 247, 248–262 comparison of current and prior mammograms, 39 Mammographic samples, 254–257 Mammographic size, 23 Mammographic views with malignant mass, 210–212 relationship between structures in multiple, 29–30 Mammography, 2, 155 applying autoregressive modeling to, 207–214 computer-aided methodologies in, 53, 226 measurement of skin thickness in, 315–316, 337 texture analysis of, 207–210 texture characterization of, 187 Mammography Quality Standards Act (MQSA), 21 MammoReader, 65 Manual contrast adjustment, 227 MAP estimation, 331 Margin of separation, 151 Margins, well-circumscribed vs. ill-defined, 13–14 Markov random field (MRF) model, 57, 315–316, 327, 337, 364 labeling scheme, 316–318, 328–331 mammographic image analysis via, 318–319 Mass detection by CAD systems, 226, 443 comparison of one- and two-view analysis, 37–39 fusion analysis, 35–36 geometrical modeling in, 30–33, 36–37 methods used in, 30–36 one-view analysis in, 33–35 study results, 36–39 two-view analysis, 35 with two-view information, 27–39 Mass-detection algorithm, 23 performance of, 25–27 Mathematical concepts in graph theory, 367–369 probabilistic concepts, 369 random graph models, 369–370 small-world and scale-free models, 370 Mathematical error analysis, 441
Mathematical landmarks, 345 Mathematical morphology, 272 Mattia, Donatella, 395, xii Maximization of posterior marginals, 319 Maximum entropy hypothesis, 366 Mean absolute contour distance (MACD), 445–446 Mean intensity values, 57 Mean value, 118 in GLDS, 124 MedDetect, 66 Media/adventitia border, 67 Median filtering, 68, 322 Median value, 119 Medical decision trees, 144–145 Medical-image analysis clinical performance indices, 438–440 clinical study designs for, 436–438 evaluation strategies for, 433–436 nonobserver evaluation methodologies, 440–446 observer evaluation methodologies, 447–455 study power and biases, 455–464 validation models of, 436–438 Medical Image Perception Society (MIPS), 436 Medical images, watershed segmentation experimental results, 300–303, 304–311 Medical technology validation, 435 Mediolateral oblique (MLO) view, 2, 22, 226 vs. craniocaudal view, 32 Metabolic networks, 364, 371 MIAS database, 208 Microcalcification characterization, 54, 58 Microcalcification clusters, 30, 54, 251 differences in confidence levels, 261 frequency of benign and malignant, 259 ROC curves for, 252 sample of benign, 263 sample of malignant, 264 volume, density, morphology, and pathology of, 246 Microcalcifications, 226 computerized detection of, 5–13 and effects of computer-aided detection on radiologists’ performance, 12–13 false-positive reduction using clustering, 9 using convolution neural network classifier, 7–9 FROC analysis of detection accuracy, 9–12 methods for computerized detection, 6–9 preprocessing technique, 6 rule-based false-positive reduction of, 7 segmentation of, 6 Military imaging applications, 458 Minimum-distance classifiers, 142 in amino acid sequences, 377 Minimum-error thresholding technique, 323
Misclassification, and number of features, 58 Missed cancers, 2 Model-based search methods, 171, 174–175 Model estimation, 347 Morphological analysis, 97, 99 of ultrasound images of carotid plaque, 87–93, 109, 131–132 Morphological filters, 54, 187 Morphological opening operator, 323 Morphological-reduction stage, 19, 33, 34, 35 Morphology characterization, 253, 258 Morphometrics, 345 Movement-related potentials, 408–415 MRF-based mammographic image analysis, 318–319 Multi-reader studies, 451 Multichannel filtering, 227 Multimodal integration, 422 application to estimation of sources of self-paced movements, 419–423 of fMRI and high-resolution EEG, 395–399, 426–427 methods for, 399–407 Multiple alternative forced choice (MAFC) test, 454 Multiple image information, 40 Multiple observers, 444 Multipoint-rank-order studies, 455 Multiresolution enhancement methods, 227 Multiresolution pyramid, 349–350 Multiscale image-segmentation schemes, 281, 325 current state of art, 282–284 design issues for, 281–282 Multiscale watershed segmentation, 271–272 Multiscale wavelet decomposition, 316 Multiscale wavelet processing, 262–265 Multivariate autoregressive (MVAR) model, 398, 415 Multivariate Burg algorithm, 406 N Nasuto, Slawomir J., 137, xii National Cancer Institute, 436 National Library of Medicine (NLM), 446 Negative predictive value (NPV), 438, 439 Neighborhood gray-tone difference matrices (NGTDM), 74, 93, 98, 103, 105, 124–125 busyness in, 125 coarseness in, 125 complexity in, 126 contrast in, 125 strength in, 126 Neighborhood system, 317 Neocortical processes, 396 Neural networks, 60–62, 138, 145 in detection and classification of carotid plaque, 88, 95, 102 supervised, 145–148 unsupervised, 148 Neuronal firing rates, 426
Nicolaides, Andrew, 87, xii Nine-node kernel, 389 Nipple region, exclusion in estimation of normals to breast border, 324–325 Node degrees, 367 densities, 381 as function of weight threshold, 378 Noise due to digitization of film, 53 introduction by global intensity threshold techniques, 56 measurements of, 436 suppression of, 54 Noise equalization, 262 Noise-filtering techniques, 233 Noise overenhancement, 227 Noise reduction, 228, 434 in IVUS images, 68 Noise suppression, 322 by wavelet shrinkage, 233–235 Nominal data type, 462 Non-Gaussian inference problems, 138, 171 Non-ROC tests, 435 Nonlinear dynamics, in gene expression, 370 Nonlinear mapping, 153, 240 Nonobserver evaluation methodologies, 435, 436, 440–441, 446 computer FROC test, 442–444 computer ROC test, 441–442 segmentation validation tests, 444–446 Nonparametric classifiers, 61, 248 Nonspiculated masses, 14 performance curves for, 28 Nonsymmetric AR model coefficients, 198, 203, 204 with clustering schemes, 209 Nonuniform rotational distortion, 73 Normal landmarks, 345 Normalized convolution, 351 Normalized radial length (NRL), 19 Nucleotide sequences, 364 Nugget variograms, 359 Nuttall-Strand method, 406 O Object correspondence score, 34 Object matching, 30 Object refinement, 17–18 Observational studies, 437 Observer evaluation methodologies, 435, 436, 447 AFC and MAFC tests, 454 FROC test, 453–454 LROC test, 453 preference tests, 454–455 ROC test, 447–453 Observer experiments, 437
Observer performance evaluation, 247–248 case sample, 245–247 detection task results, 249–253 morphology characterization task results, 253 pathology classification task results, 253–262 wavelet analysis, 245 Observer preference analysis, 454 Observer training, 461–462 Observers, defining number of, 448–449 One-dimensional autoregressive modeling, 186, 187–188 third-order cumulants with, 194 One-view analysis comparison with two-view analysis, 37–39 mammographic, 33–35 Open-source software, 357 Ordinary Kriging, 358 prediction of displacement fields, 356–357 Outdegree distributions, 367 in, 386 in amino acid sequencing, 379, 381 Overlap fraction equation, 33 Overlap reduction, 34 Overlap threshold, 23 Oversegmentation, 278, 279 P Paired t-tests, 446, 463 Panayiotakis, George, 225, 315, xii Pantziaris, Marios, 87, xii Papadopoulos, Athanassios N., 51, xii Paquerault, Sophie, 1, xii Partial area indices, 442 Partial sill variograms, 359 Particle filters, 138, 172–174 Partition function, 318 Parzen window, 143 Pathology classification task, 253–262 Patient outcome efficacy, 437 Pattern-recognition approach, 364, 434, 458, 459 to carotid plaques, 93 unsupervised classification and, 138 Pattichis, Constantinos S., 87, xii Pattichis, Marios S., 87, xiii Pearson’s correlation coefficients, 464 Per-case approach in mass detection algorithm, 25–27 mass detection performance at three marker rates, 28 Per-mammogram performance curves, 25–27 Performance curves FROC analysis, 25–27 for spiculated and nonspiculated masses, 28 Performance metrics, 436, 458
Performance parameters, 437 estimation of, 455, 460–461 Periodicity, in SFM, 127 Petrick, Nicholas E., 1, xiii Phantom studies, 444 Pixel distribution characteristics, 436 Plaque characterization in IVUS images, 73–75 defining borders of, 66 distinguishing from blood in lumen, 68 Plaque classifications, 88, 99–101, 116–117 Plaque composition, 73–74 Plaque identification and segmentation, 95–96 Plissiti, Marina E., 51, xiii Polynomial kernel functions, 155 Positive predictive value (PPV), 438, 439 Positron-emission tomography (PET), 400 Pratikakis, Ioannis, 271, xiii Preclinical evaluation stage, 436 Prediction error (PE), 57 Prediction model, for radial distance of object in two views, 36–37 Prediction rules, 459 Predictive value, 439 Preference methodology, 454 Preference tests, 454–455 Prefrontal information flows, 412, 417 Premotor information flows, 412 Preprocessing techniques, 29, 54–56 in breast imaging, 6 in carotid plaque analysis, 114 and segmentation, 16–17 Presentation setup, 461–462 Prevalence relationship to predictive value, 439 vs. incidence rate, 439–440 Primal problem, 151 Principal component analysis (PCA), 99, 110, 138, 159–161 kurtosis based, 163 Prior probability of labeling, 330–331 Probabilistic concepts, 369, 458 Probability density functions, 436 Promam, 66 Prospective studies, 436 Protein analysis, 364, 372 Protein folding, 364 Proximity criterion, 288, 290 Pseudo-landmarks, 345 Pseudocontinuous rating scales, 450 Pullback path, 75–76 Q Quadratic spline wavelet, 231, 233
Quality control, 457–458 Quasi-landmarks, 346 Quasicontinuous rating, 450 R Radial profile, 74 Radiologist accuracy, improved by CAD, 4, 12–13 Random network models, 365, 369–370 Random networks, 365 Random vector, 369 Range variograms, 359 Rat, 376 amino acid sequences in, 378–379 average clustering coefficients for, 380 average sequence length in, 382 in- and outdegree distributions for, 390 ten-node kernel for, 391 Rating methods, 450 Rating-scale studies, 455 Reader expertise, 449 Reading environment, 451 Reading-order effects, 450 Receiver operating characteristic (ROC) methodologies, 3, 64, 228, 437, 446 ROC curves for MC cluster-detection task, 252 Receptive fields (RFs), 279 Recursive immersion process, 275 Redundant dyadic wavelet transform, 229, 230–233 Region-based enhancement techniques, 56 Region growing, 69. See also Growing techniques Region of interest selection in MRF measurement, 327–328 in studies of cortical connectivity, 412 Regional performance index, 441, 451 Regional-registration technique, 5, 15, 30 Registration algorithms, 344 Regression methods, 149 Relative error, 416–417, 418 Relative performance measures, 444 Relaxed decision criterion, 447 Reliability, and automated methods, 68 Remote sensing, 342 Reporting mechanisms, 462 Reproducibility, and automated methods, 68 Resilient back-propagation, 61 Restoration techniques, 441 Retrospective studies, 436, 437 Ridge flow models, 283 Ridges, in watershed analysis, 272 ROC analysis, 249 software tools for, 450 ROC curves, 437, 440, 448
ROC study design, 452 ROC test, 435, 447–453 ROCFIT software, 454 ROCKIT software, 451 ROI detection, in intravascular ultrasound images, 68–72 Root mean square (RMS) error, 335 Root-mean-square (RMS) noise, 6 Roughness, in SFM, 127–128 Ruiz, Virginie R, 137, xiii Ruiz-Alzola, Juan, 341, xiii Rule-based false-positive reduction, 7, 16, 33, 60 S Saddle points, 151, 288 Sahiner, Berkman, 1, xiii Sahli, Hichem, 271, 315, xiii Sakellaropoulos, Philipos, 225, xiii Salient-measure module, 291–294 Salinari, Serenella, 395, xiii Sample size, 440, 449, 456 Sampling approaches, 171 Scale linking across, 287–290 notion of, 279 Scale-free graph models, 365, 370 Scale invariance, 281 Scale-space dynamics of gradient watersheds in, 292–293 and estimation of breast border, 325–326 gradient watersheds and hierarchical segmentation in, 290–291 linear, 279–281 multiscale image-segmentation schemes, 281–284 notion of, 279 and segmentation, 279–284 Scale-space generator, 282 Scale-space primal sketch, 282 Scale-space sampling, 281 Scalp signal-to-noise ratio, 424 Screen-film images, vs. digital images, 229 Second Look Digital/AD, 65 Second opinion, CAD systems providing, 53 Second-order stationarity, 360 Seed objects, 17, 33, 57 Segmentation techniques, 56–57, 435, 441 automatic, 295–297 deep structure and catastrophe theory, 284 hierarchical, 284–297 for IVUS images, 69–72 and preprocessing, 16–17 scale-space and, 279–284 three-dimensional, 72 Segmentation validation tests, 444–446
Selective enhancement, 228 Self-connections, 383 Self-organizing feature map (SOM), 95, 110, 112. See also SOM classifiers Self-paced movements, 419–423 cerebral areas active during, 420 Sensitivity, 23, 33–34, 438, 439, 440, 441, 446 of breast cancer detection, 3 improvements by CAD systems, 53 of two-view analysis, 38 using two views vs. one view, 4 Sensorimotor information flows, 412, 417 Sequential Forward Selection, 59 Sequential reading mode, 450 Sequential watershed methods, 276 Shape-analysis applications, 359 Shape parameters, 93, 97, 98, 106, 130–131 Side branch attachments, 73 Signal/certainty philosophy, 351 Signal detectability, 454, 455 Signal detection theory, 447 Signal generation for DTP methodology, 406 for SEM methodology, 404–405 Signal of interest, 447, 450 Signal processing, 364 Signal-to-noise ratio AFC and MAFC tests in, 454 in brain connectivity pattern estimation, 423, 424 for DTP methodology, 406 low in CAD systems, 53 preprocessing to enhance, 6 for SEM methodology, 404 Significance measures, 282 Sill variograms, 359 Similarity measures, 353–354 Simulated annealing (SA), 8, 10, 332, 336 Simulation studies, 444 Single-energy contrast-enhanced digital-subtraction mammography, 40 Single-measurement data, statistical tests for, 463 Single-photon emission computed tomography (SPECT), 400 Single-reader studies, 451 Single-view mammograms feature analysis, 34 mass-detection program for, 15–16 Singularities, 273, 285 Skeleton by influence zone, 274 Skewness, 119 Skiadopoulos, Spyros, 225, xiii Skin external border of, 322–324 radiographic and geometrical properties of, 320–321
Skin border, 322–324 Skin estimation/extraction methods, 322–331 Skin features, 316, 321 estimation of, 322–326 Skin imaging process, 322 Skin-region extraction, 327–331 Skin retraction, 315, 316 Skin thickness measurement, 331–332 along breast border, 333, 334, 335, 336 background of, 316–319 clinical evaluation, 332–336 data and scene model, 320–321 estimation and extraction methods, 322–331 estimation of gradient orientation in, 325–326 in mammography, 315–316 and MRF-based mammographic image analysis, 318–319 and MRF labeling, 316–318 radiographic and geometrical skin properties and, 320–321 results of, 331–335 Small-sample estimation bias, 458 Small-world models, 365, 366, 370 Snake properties, 71 Societal efficacy, 437 Society of the Photo-optical Instrumentation Engineers (SPIE), 436 Soft-copy digitized mammography, 451 ROC study design for, 452 Soft-copy display, 461 Soft plaque, 74 Soft-thresholding, 235 Software packages, for ROC analysis, 450–451, 454 SOM classifiers, 95, 99, 100–101, 116 classification results of, 108–111 Somatosensory areas, 412, 420 Spatial-dependence models, 358 Spatial filtering methods, 227 Spatial gray-level-dependence matrices (SGLDM), 93, 97, 104–105, 119–123 Spatial regularization, 351 Specificity, 438, 439, 440, 441, 446 of breast cancer detection, 3 using two views vs. one view, 4 Spiculated masses, 14, 208 performance curves for, 28 Standard deviation, 119 estimates of, 437, 457 Stathaki, Tania, 185, xiv Static architectures, 167 Statistical analysis, 440 Bayesian quadratic and linear classifiers, 60 for brain connectivity studies, 407 of cortical connectivity models, 407 in IVUS image analysis, 74
in machine learning, 138 observer performance evaluation, 248–249 and study power/biases, 462–464 Statistical-feature matrix (SFM), 93, 98, 105, 126–127 coarseness in, 127 contrast in, 127 periodicity in, 127 roughness in, 127–128 Statistical features (SF), 97, 118–119 of carotid plaques, 93 Statistical physics and complex networks, 370 and graph theory, 366 Statistical significance tests, 444 for two-treatment evaluation, 457 Steepest descent (SD), 8 Stent detection, 70–71 Stepwise discriminant analysis, 59 Stochasticity, in network dynamics, 370 Stopping-criterion stage, 294–295 Strength, in NGTDM, 126 Strict decision threshold, 447 Stroke assessment using carotid plaques, 87–93 and hypoechoic carotid plaques, 92 incidence of, 88 and plaque morphology, 116 Structural equation modeling (SEM), 397, 400–402 ANOVA results on relative error from, 416 application to movement-related potentials, 408–415 computer simulation results for, 415, 423–424 patterns obtained with, 413 signal generation for, 404–405, 406 structural evaluation of connectivity measurements, 409–410 Structure tensors, 350 Student-Newman-Keuls test, 455 Study design parameters, 441–449 Study power and algorithm training, 458–460 biases affecting, 455–456 and database effects, 458–460 and database generation, 456–458 and database size, 456–457 and estimation of performance parameters and rates, 460–461 and ground truth vs. gold standard, 457 and presentation setup, 461–462 and quality control, 457–458 and statistical analysis, 462–464 Suarez-Santana, Eduardo, 341, xiii Successive enhancement learning (SEL) scheme, 154 Summary rating, 453 Superficial structure-dynamics of contours, 291–292
Supervised classifiers, 138, 142 Bayesian classifiers, 142–143 decision tree, 144–145 k-nearest neighbor, 143–144 minimum-distance classifiers, 142 Supervised neural networks, 145–148 Support-vector machines, 60, 62–64, 138, 149–159 Surround region dependence matrix (SRDM), 58 Symmetric AR model coefficients, 198, 201, 202 with application of clustering schemes, 207 Symptomatic plaque, 88, 93, 109 segmented, 96 verbal interpretations of arithmetic values, 107 Systolic-diastolic image artifacts, 73 T Technical efficacy, 436 Template matching, 348–349 Ten-node kernels, 387 Test image demonstration, 241–245 Test sets, 21–23, 23–25 Texture-based classification, 19, 35, 103 of carotid plaque ultrasound images, 87–93 in IVUS images, 70, 74 Texture characterization, 214 with autoregressive models, 185–188 of mammography, 187, 207–210, 214 Texture-feature-extraction algorithms, 97 Fourier power spectrum (FPS), 130 angular sum, 130 radial sum, 130 shape parameters, 130–131 fractal dimension texture analysis (FDTA), 129–130 gray-level-difference statistics (GLDS), 123 angular second moment, 123–124 contrast, 123 entropy, 124 mean, 124 Laws’s texture energy measures (TEM), 128 morphological features, 131–132 neighborhood gray-tone-difference matrix (NGTDM), 124–125 busyness, 125 coarseness, 125 complexity, 126 contrast, 125 strength, 126 spatial gray-level-dependence matrices (SGLDM), 119–123 statistical-feature matrix (SFM), 126–127 coarseness, 127 contrast, 127 periodicity, 127 roughness, 127–128
statistical features kurtosis, 119 mean value, 118 median value, 119 skewness, 119 standard deviation, 119 Texture features, 104–106 Texture measures, 121–123 Texture-spectrum method, 74 Therapeutic efficacy, 437 Thin-plate splines, 359 Third-order statistics, 192–196, 214 Three-dimensional artery reconstruction, 66–67, 118 models of, 74–78 Three-dimensional multiscale watershed segmentation, 271–272 Three-dimensional ultrasound imaging, 40 Thresholding techniques, 69 Time gain compensation (TGC) curve, 95 Tissue-mixture densities, 319 Topographical distance, 273, 274 TP response reaction (TPF), 447 Training data set, 21, 23–25, 60 for CAD analysis, 12 for SOM classifier, 100 Training methods, 61 for carotid artery plaque analysis, 114–115 True negatives (TNs), 438, 460 True positive fraction (TPF), 441, 446, 451 True positives (TPs), 5, 23, 64, 438, 460 Truncated nonsymmetric half-plane region of support (TNSHP), 189 Two-dimensional autoregressive modeling, 186, 188–189, 214 results from constrained optimization with equality constraints, 198 Two-stage multilayer neural network (TMNN), 156 Two-treatment evaluation, 456–457 Two-view fusion method, 15, 34. See also Fusion analysis comparison with one-view analysis, 37–39 Two-view information breast mass detection with, 27–30 comparison of one- and two-view analysis, 37–39 fusion analysis of, 35–36 geometrical modeling of, 30–33 methods used in, 30–36 one-view analysis and, 33–35 two-view analysis, 35 U Ultrasound imaging of carotid plaque for stroke assessment, 87–93 predictive value of, 91 three-dimensional, 40 vascular, 89–91
Undirected graphs, 368 Uniform random distribution, 366 Unilateral finger extension, 420 Universal Kriging, 342 University of South Florida (USF), 22 Unlabeled landmarks, 346 Unsharp blobs, multiscale segmentation of, 282 Unsharp masking, 55, 227 Unsupervised clustering algorithms, 138 fuzzy c-means, 141–142 ISODATA, 140–141 k-means clustering, 138–140 Unsupervised neural networks, 148 U.S. Public Health Service Office on Women’s Health, 436 V Validation, 57–59 goals of, 435 of medical-imaging technology, 437 role of database generation in, 456 Validation data set, for CAD analysis, 12 Validation models, 436–438 VALMET tool, 446 Variability, reducing through segmentation validation tests, 444 Variational matching, 348–349 Variogram estimation, 354–355 Variogram shapes, 359 Vessel borders, 68 Vessel wall calcification, 73 Virtual skin, 320 Visible Human Project, 445, 446 Visual C++, 241 Visual cortex, BOLD signals in, 427 Visual grading analysis, 454 Voltage-sensitive dyes, 427 Voronoi diagrams, 283 W Warps, 343 Watershed algorithms, 275–276 Watershed analysis, 272 gradient watersheds, 276–278 oversegmentation and, 278 watershed transformation, 272–276 Watershed lines, 272, 274 during gradient magnitude evolution, 285–287 Watershed segmentation, 271–272 experimental results in, 297–311 scale-space and, 279–284 Watershed transformation, 272–276 Watershed valuation, 291–292
Watts and Strogatz small-world models, 365 Wave2 software, 241 Wavelet analysis, 58, 227 in amino acid sequencing, 364 Wavelet-based enhancement (WE), 228 Wavelet contrast enhancement adaptive wavelet mapping, 238–241 background, 226–229 discussion, 262–265 global wavelet mapping, 237–238 locally adaptive, 225–226 materials and methods discrete dyadic wavelet transform review, 229–230 implementation, 241 redundant dyadic wavelet transform, 230–233 test image demonstration and quantitative evaluation, 241–245 wavelet contrast enhancement, 237–241 wavelet denoising, 233–237 observer performance evaluation, 245 case sample, 245–247 observer performance, 247–248 results, 249–262 statistical analysis, 248–249 Wavelet decomposition, 156 Wavelet denoising, 233 adaptive wavelet shrinkage, 235–237 noise suppression by wavelet shrinkage, 233–235 Wavelet filtering, 6, 14, 56 Wavelet shrinkage, 235 Wavelet transform methods, 227–228, 232, 241, 325, 326 Weight-filtered deformation, 352 Weight matrix thresholding, 376 average clustering coefficients and, 380 and average sequence length, 382 and maximum cluster size, 384 and node degree, 378 Weighted averaging based on confidence measure, 102–103 in carotid plaque analysis, 95 Weighted networks, 373–376 Well-defined masses, 208 Westin, Carl-Fredrik, 341, xiv White matter, watershed segmentation of, 272, 280, 300–311 Wilcoxon signed ranks test, 248, 455 results for microcalcification cluster detection, 251, 262 Wilks’s lambda criterion, 20 Williams index, 446 Workstation-user interface, 461 X Xenopus, 376 amino acid sequences in, 377–378
average clustering coefficients for, 380 average sequence length in, 382 in- and outdegree distributions, 388 nine-node kernel for, 389 Y Yule-Walker system of equations, 186, 189–192, 214 extended, 192–196 Z Zebra fish, 376 amino acid sequences in, 377 average clustering coefficients for, 380 average sequence length in, 382 in- and outdegree distributions for, 386