Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Moshe Y. Vardi, Rice University, Houston, TX, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
4792
Nicholas Ayache Sébastien Ourselin Anthony Maeder (Eds.)
Medical Image Computing and Computer-Assisted Intervention – MICCAI 2007 10th International Conference Brisbane, Australia, October 29 – November 2, 2007 Proceedings, Part II
Volume Editors

Nicholas Ayache
INRIA, Asclepios Project-Team
2004 Route des Lucioles, 06902 Sophia-Antipolis, France
E-mail: [email protected]

Sébastien Ourselin and Anthony Maeder
CSIRO ICT Centre, e-Health Research Centre
20/300 Adelaide St., Brisbane, Queensland 4000, Australia
E-mail: {sebastien.ourselin, anthony.maeder}@csiro.au
Library of Congress Control Number: 2007937392
CR Subject Classification (1998): I.5, I.4, I.3.5-8, I.2.9-10, J.3, J.6
LNCS Sublibrary: SL 6 – Image Processing, Computer Vision, Pattern Recognition, and Graphics
ISSN 0302-9743
ISBN-10 3-540-75758-9 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-75758-0 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. Springer is a part of Springer Science+Business Media springer.com © Springer-Verlag Berlin Heidelberg 2007 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12175437 06/3180 543210
Preface
The 10th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2007, was held at the Brisbane Convention and Exhibition Centre, South Bank, Brisbane, Australia from 29th October to 2nd November 2007. MICCAI has become a premier international conference in this domain, with in-depth papers on the multidisciplinary fields of biomedical image computing, computer assisted intervention and medical robotics. The conference brings together biological scientists, clinicians, computer scientists, engineers, mathematicians, physicists and other interested researchers, and offers them a forum to exchange ideas in these exciting and rapidly growing fields. The conference is both very selective and very attractive: this year we received a record number of 637 submissions from 35 countries and 6 continents, from which 237 papers were selected for publication. Some interesting facts about the distribution of submitted and accepted papers are shown graphically at the end of this preface.
A number of modifications were introduced into the selection process this year.
1. An enlarged Program Committee of 71 members was recruited by the Program Chair and Co-chair to provide a larger body of expertise and broader geographical coverage.
2. New keywords, grouped into 7 new categories, were introduced to describe the content of the submissions and the expertise of the reviewers.
3. Each submitted paper was assigned to 3 Program Committee members, whose responsibility it was to assign each paper to 3 external experts (outside the Program Committee membership) who provided scores and detailed reports in a double-blind procedure.
4. Program Committee members provided a set of normalized scores for the whole set of papers for which they were responsible (typically 27 papers). They did this using the external reviews and their own reading of the papers, and had to complete missing reviews themselves. Program Committee members eventually had to provide a recommendation for acceptance of the top 35% of their assigned papers.
5. During a 2-day meeting of about 20 members of the Program Committee in Sophia-Antipolis, France, borderline papers were examined carefully and the final set of papers to appear in the LNCS proceedings was accepted. A top list of about 100 papers was scrutinized to provide the Program Chair and Co-chair with a list of 54 potential podium presentations.
6. From this list, the Program Chair and Co-chair selected 38 podium presentations to create a program with a reasonable number of oral sessions and a good spread of content.
7. Because 199 excellent contributions would be presented as posters, it was decided in consultation with the MICCAI Society Board to increase the time allocated to the poster sessions, and to replace the oral poster teasers with continuous video teasers run on large screens during the conference.
The selection procedure was highly competitive, and many good papers remained among the 400 rejected. We received 9 factual complaints from the authors of rejected papers. A subcommittee of the Program Committee treated all of them equally, checking carefully that no mistake had been made during the selection procedure. In a few cases, an additional review was requested from an independent Program Committee member. In the end, all the original decisions were maintained, but some additional information was provided to the authors to better explain the final decision.
Seven MICCAI Young Scientist Awards were presented by the MICCAI Society on the last day of the conference. The selection was made before the conference by automatically nominating the 21 eligible papers with the highest normalized scores (provided by the Program Committee during the reviewing procedure) and grouping them into the 7 main categories of the conference. A subgroup of the Program Committee then voted to elect one paper out of 3 in each category. The 2007 MedIA-MICCAI Prize was offered by Elsevier to the first author of an outstanding article in the special issue of the Medical Image Analysis journal dedicated to the previous conference, MICCAI 2006. The selection was organized by the guest editors of this special issue.
We want to thank wholeheartedly all Program Committee members for their exceptional work, as well as the numerous external expert reviewers (who are listed on the next pages). We should also acknowledge the substantial contribution made towards the successful execution of MICCAI 2007 by the BioMedical Image Analysis Laboratory team at the CSIRO ICT Centre / e-Health Research Centre.
It was our pleasure to welcome MICCAI 2007 attendees in Brisbane. This was the first time the conference had been held in Australia, and indeed only the second time outside Europe and North America, the other being MICCAI 2002 in Japan. This trend will continue with MICCAI 2010, which is planned for Beijing. The vibrant sub-tropical river city of Brisbane, with its modern style and world-class conference venue, was a popular choice and a convenient point of departure for delegates who took the opportunity while there to see more of the Australian outback. We thank our two invited keynote speakers, Prof. Peter Hunter from the Bioengineering Institute at the University of Auckland, New Zealand, and Prof. Stuart Crozier from Biomedical Engineering at the University of Queensland, Brisbane, whose excellent presentations were a highlight of the conference. We also acknowledge with much gratitude the contributions of Terry Peters, MICCAI 2007 General Co-chair, whose strong connection with the MICCAI Society and past MICCAI conferences proved invaluable to us. We also note our thanks
to our sponsors, without whose financial assistance the event would have been a far lesser one. We look forward to welcoming you to MICCAI 2008, to be held 4-8 September in New York City, USA, and to MICCAI 2009, scheduled to be held in London, UK.

October 2007
Nicholas Ayache
Sébastien Ourselin
Anthony Maeder
[Figure: distribution of the 237 accepted papers by topic: General Medical Image Computing 46%, Computer Assisted Interventional Systems and Robotics 14%, Innovative Clinical and Biological Applications 11%, Computational Anatomy 8%, Neuroscience Image Computing 8%, Computational Physiology 6%, General Biological Image Computing 3%, Visualization and Interaction 3%, None Specified 1%; with a breakdown of the two largest categories by primary keyword]
Fig. 1. View at a glance of MICCAI 2007 accepted submissions based on the declared primary keyword. A total of 237 full papers were presented.
[Figure: distribution of the 637 full paper submissions by continent: Europe 41%, North America 38%, Asia 15%, Australia and Oceania 5%, with the remaining 1% from South America and Africa; with a per-country breakdown of submission counts]
Fig. 2. Distribution of MICCAI 2007 submissions (637 in total) by continent
MICCAI Young Scientist Awards
The MICCAI Young Scientist Award is a prize of US$500 awarded to the first author (in person) for the best paper in a particular topic area, as judged by reviewing and presentation (oral or poster). At MICCAI 2007, up to 7 prizes were available, in the topic areas publicised in the conference CFP:
1. General Medical Image Computing
2. Computer Assisted Intervention Systems and Robotics
3. Visualization and Interaction
4. General Biological and Neuroscience Image Computing
5. Computational Anatomy
6. Computational Physiology
7. Innovative Clinical and Biological Applications
All current first-author students and early-career scientists attending MICCAI 2007 were eligible. The awards were announced and presented at the closing session of the conference on Thursday, 1st November 2007.
MICCAI 2005 Student Awards
Image Segmentation and Analysis: Pingkun Yan, “MRA Image Segmentation with Capillary Active Contour”
Image Registration: Ashraf Mohamed, “Deformable Registration of Brain Tumor Images via a Statistical Model of Tumor Induced Deformation”
Computer-Assisted Interventions and Robotics: Henry C. Lin, “Automatic Detection and Segmentation of Robot Assisted Surgical Motions”
Simulation and Visualization: Peter Savadjiev, “3D Curve Inference for Diffusion MRI Regularization”
Clinical Application: Srinivasan Rajagopalan, “Schwarz Meets Schwann: Design and Fabrication of Biomorphic Tissue Engineering Scaffolds”
MICCAI 2006 Student Awards
Image Segmentation and Registration: Delphine Nain, “Shape-Driven 3D Segmentation Using Spherical Wavelets”
Image Analysis: Karl Sjöstrand, “The Entire Regularization Path for the Support Vector Domain Description”
Simulation and Visualization: Andrew W. Dowsey, “Motion-Compensated MR Valve Imaging with COMB Tag Tracking and Super-Resolution Enhancement”
Computer-Assisted Interventions and Robotics: Paul M. Novotny, “GPU Based Real-Time Instrument Tracking with Three Dimensional Ultrasound”
Clinical Applications: Jian Zhang, “A Pilot Study of Robot-Assisted Cochlear Implant Surgery Using Steerable Electrode Arrays”
The 2007 MedIA-MICCAI Prize
This prize is awarded each year by Elsevier to the first author of an outstanding article of the previous MICCAI conference, published in the MICCAI special issue of the Medical Image Analysis journal.
In 2006, the prize was awarded to T. Vercauteren, first author of the article:
Vercauteren, T., Perchant, A., Pennec, X., Malandain, G., Ayache, N.: Robust mosaicing with correction of motion distortions and tissue deformations for in vivo fibered microscopy. Med. Image Anal. 10(5), 673–692 (2006)
In 2005, the prize was awarded to D. Burschka and M. Jackowski, who are the first authors of the articles:
Burschka, D., Li, M., Ishii, M., Taylor, R.H., Hager, G.D.: Scale invariant registration of monocular endoscopic images to CT-scans for sinus surgery. Med. Image Anal. 9(5), 413–426 (2005)
Jackowski, M., Kao, C.Y., Qiu, M., Constable, R.T., Staib, L.H.: White matter tractography by anisotropic wave front evolution and diffusion tensor imaging. Med. Image Anal. 9(5), 427–440 (2005)
Organization
Executive Committee
General Chair: Anthony Maeder (CSIRO, Australia)
General Co-chair: Terry Peters (Robarts Research Institute, Canada)
Program Chair: Nicholas Ayache (INRIA, France)
Program Co-chair: Sébastien Ourselin (CSIRO, Australia)
Program Committee
Elsa Angelini (ENST, Paris, France) Simon R. Arridge (University College London, UK) Leon Axel (University Medical Centre, USA) Christian Barillot (IRISA, Rennes, France) Margrit Betke (Boston University, USA) Elizabeth Bullitt (University of North Carolina, Chapel Hill, USA) Albert Chung (Hong Kong University of Science and Technology, China) Ela Claridge (The University of Birmingham, UK) Stuart Crozier (University of Queensland, Australia) Christos Davatzikos (University of Pennsylvania, USA) Marleen de Bruijne (University of Copenhagen, Denmark) Rachid Deriche (INRIA, Sophia Antipolis, France) Etienne Dombre (CNRS, Montpellier, France) James S. Duncan (Yale University, USA) Gary Egan (Howard Florey Institute, Australia) Randy Ellis (Queen's University, Canada) Gabor Fichtinger (Johns Hopkins University, USA) Alejandro Frangi (Pompeu Fabra University, Barcelona, Spain) Guido Gerig (University of North Carolina, Chapel Hill, USA) Polina Golland (Massachusetts Institute of Technology, USA) Miguel Angel Gonzalez Ballester (University of Bern, Switzerland) Richard Hartley (Australian National University, Australia) David Hawkes (University College London, UK) Pheng Ann Heng (The Chinese University of Hong Kong, China) Robert Howe (Harvard University, USA) Peter Hunter (The University of Auckland, New Zealand) Tianzi Jiang (The Chinese Academy of Sciences, China) Sarang Joshi (University of Utah, USA)
Leo Joskowicz (The Hebrew University of Jerusalem, Israel) Hans Knutsson (Linköping University, Sweden) Rasmus Larsen (Technical University of Denmark, Denmark) Boudewijn Lelieveldt (Leiden University Medical Centre, Netherlands) Cristian Lorenz (Philips, Hamburg, Germany) Frederik Maes (Katholieke Universiteit Leuven, Belgium) Gregoire Malandain (INRIA, Sophia Antipolis, France) Jean-Francois Mangin (CEA, SHFJ, Orsay, France) Dimitris Metaxas (Rutgers University, New Jersey, USA) Kensaku Mori (Nagoya University, Japan) Nassir Navab (TUM, Munich, Germany) Poul Nielsen (The University of Auckland, New Zealand) Wiro Niessen (Erasmus Medical School, Rotterdam, Netherlands) Alison Noble (Oxford University, UK) Jean-Christophe Olivo-Marin (Institut Pasteur, Paris, France) Nikos Paragios (Ecole Centrale de Paris, France) Xavier Pennec (INRIA, Sophia Antipolis, France) Franjo Pernus (University of Ljubljana, Slovenia) Josien Pluim (University Medical Center, Utrecht, Netherlands) Jean-Baptiste Poline (CEA, SHFJ, Orsay, France) Jerry L. Prince (Johns Hopkins University, USA) Richard A. Robb (Mayo Clinic, College of Medicine, Rochester, Minnesota, USA) Daniel Rueckert (Imperial College, London, UK) Tim Salcudean (The University of British Columbia, Canada) Yoshinobu Sato (Osaka University, Japan) Achim Schweikard (Institute for Robotics and Cognitive Systems, Germany) Pengcheng Shi (Hong Kong University of Science and Technology, China) Stephen Smith (Oxford University, UK) Lawrence Staib (Yale University, USA) Colin Studholme (University of California, San Francisco, USA) Gabor Székely (ETH, Zurich, Switzerland) Russell Taylor (Johns Hopkins University, USA) Jean-Philippe Thiran (EPFL, Lausanne, Switzerland) Jocelyne Troccaz (CNRS, Grenoble, France) Bram van Ginneken (University Medical Center, Utrecht, Netherlands) Koen Van Leemput (HUS, Helsinki, Finland) Baba Vemuri (University of Florida, USA) Simon Warfield (Harvard University, USA) Sandy Wells (Massachusetts Institute of Technology, USA) Carl-Fredrik Westin (Harvard University, USA) Ross Whitaker (University of Utah, USA) Chenyang Xu (Siemens Corporate Research, USA) Guang-Zhong Yang (Imperial College, London, UK)
MICCAI Board
Nicholas Ayache, INRIA, Sophia Antipolis, France
Alan Colchester, University of Kent, Canterbury, UK
James Duncan, Yale University, New Haven, Connecticut, USA
Gabor Fichtinger, Johns Hopkins University, Baltimore, Maryland, USA
Guido Gerig, University of North Carolina, Chapel Hill, North Carolina, USA
Anthony Maeder, University of Queensland, Brisbane, Australia
Dimitris Metaxas, Rutgers University, Piscataway Campus, New Jersey, USA
Nassir Navab, Technische Universität, Munich, Germany
Mads Nielsen, IT University of Copenhagen, Copenhagen, Denmark
Alison Noble, University of Oxford, Oxford, UK
Terry Peters, Robarts Research Institute, London, Ontario, Canada
Richard Robb, Mayo Clinic College of Medicine, Rochester, Minnesota, USA
MICCAI Society

Society Officers
President and Board Chair: Alan Colchester
Executive Director: Richard Robb
Executive Secretary: Nicholas Ayache
Treasurer: Terry Peters
Elections Officer: Karl Heinz Hoehne

Society Staff
Membership Coordinator: Gabor Székely, ETH, Zurich, Switzerland
Publication Coordinator: Nobuhiko Hata, Brigham and Women's Hospital, Boston, USA
Communications Coordinator: Kirby Vosburgh, CIMIT, Boston, USA
Industry Relations Coordinator: Tina Kapur, Brigham and Women's Hospital, Boston, USA
Local Planning Committee
Sponsors and Exhibitors: Oscar Acosta-Tamayo
Registration and VIP Liaison: Tony Adriaansen
Tutorials and Workshops: Pierrick Bourgeat
Posters: Hans Frimmel, Olivier Salvado
Social Events: Justin Boyle
Technical Proceedings Support: Jason Dowling
Professional Society Liaison: Brian Lovell
Webmaster: Jason Pickersgill & Josh Passenger
Student & Travel Awards: Olivier Salvado
Sponsors
CSIRO ICT Centre e-Health Research Centre
Northern Digital, Inc.
Medtronic, Inc.
The Australian Pattern Recognition Society
CSIRO Preventative Health Flagship
Siemens Corporate Research
GE Global Research
Reviewers
Abend, Alan Abolmaesumi, Purang Acar, Burak Acosta Tamayo, Oscar Acton, Scott T. Adali, Tulay Aja-Fernández, Santiago Alexander, Daniel Allen, Peter Alterovitz, Ron Amini, Amir An, Jungha Andersson, Mats Antiga, Luca Ardekani, Babak Ashburner, John Atkins, Stella Atkinson, David Avants, Brian Awate, Suyash Aylward, Stephen Azar, Fred S. Azzabou, Noura Babalola, Kolawole Bach Cuadra, Meritxell Baillet, Sylvain Bajcsy, Ruzena Bansal, Ravi Bardinet, Eric Barmpoutis, Angelos Barratt, Dean Bartoli, Adrien
Bartz, Dirk Basser, Peter Batchelor, Philip Baumann, Michael Bazin, Pierre-Louis Beckmann, Christian Beichel, Reinhard Bello, Fernando Benali, Habib Berger, Marie-Odile Bhalerao, Abhir Bharkhada, Deepak Bhatia, Kanwal Bilston, Lynne Birkfellner, Wolfgang Bischof, Horst Blanquer, Ignacio Blezek, Daniel Bloch, Isabelle Bockenbach, Olivier Boctor, Emad Bodensteiner, Christoph Bogunovic, Hrvoje Bosch, Johan Botha, Charl Bouix, Sylvain Boukerroui, Djamal Bourgeat, Pierrick Bresson, Xavier Brummer, Marijn Bucki, Marek Buehler, Katja
Buelow, Thomas Bueno Garcia, Gloria Buie, Damien Buzug, Thorsten Caan, Matthan Cai, Wenli Calhoun, Vince Camara, Oscar Cameron, Bruce Cammoun, Leila Camp, Jon Cardenas, Valerie Carneiro, Gustavo Carson, Paul Cates, Joshua Cathier, Pascal Cattin, Philippe Cavusoglu, Cenk Celler, Anna Chakravarty, Mallar Chaney, Edward Chang, Sukmoon Chappelow, Jonathan Chefd’hotel, Christophe Chen, Jian Chen, Ting Chi, Ying Chinzei, Kiyoyuki Chiu, Bernard Christensen, Gary Chua, Joselito Chui, Chee Kong Chui, Yim Pan Chung, Adrian Chung, Moo Chung, Pau-Choo Cinquin, Philippe Ciuciu, Philippe Clarysse, Patrick Clatz, Olivier Cleary, Kevin Cois, Constantine Collins, Louis Collins, David Colliot, Olivier
Commowick, Olivier Cootes, Tim Corso, Jason Cotin, Stephane Coulon, Olivier Coupe, Pierrick Crouch, Jessica Crum, William D'Agostino, Emiliano Dam, Erik Dan, Ippeita Darkner, Sune Dauguet, Julien Davis, Brad Dawant, Benoit De Craene, Mathieu Deguchi, Daisuke Dehghan, Ehsan Delingette, Hervé DeLorenzo, Christine Deng, Xiang Desai, Jaydev Descoteaux, Maxime Dey, Joyoni Diamond, Solomon Gilbert Dieterich, Sonja Dijkstra, Jouke Dillenseger, Jean-Louis DiMaio, Simon Dirk, Loeckx Dodel, Silke Dornheim, Jana Dorval, Thierry Douiri, Abdel Duan, Qi Duay, Valérie Dubois, Marie-Dominique Duchesne, Simon Dupont, Pierre Durrleman, Stanley Ecabert, Olivier Edwards, Philip Eggers, Holger Ehrhardt, Jan El-Baz, Ayman
Ellingsen, Lotta Elson, Daniel Ersbøll, Bjarne Fahmi, Rachid Fan, Yong Farag, Aly Farman, Allan Fenster, Aaron Fetita, Catalin Feuerstein, Marco Fieten, Lorenz Fillard, Pierre Fiorini, Paolo Fischer, Bernd Fischer, Gregory Fitzpatrick, J. Michael Fleig, Oliver Fletcher, P. Thomas Florack, Luc Florin, Charles Forsyth, David Fouard, Celine Freiman, Moti Freysinger, Wolfgang Fripp, Jurgen Frouin, Frédérique Funka-Lea, Gareth Gangloff, Jacques Garnero, Line Gaser, Christian Gassert, Roger Gavrilescu, Maria Gee, James Gee, Andrew Genovesio, Auguste Gerard, Olivier Ghebreab, Sennay Gibaud, Bernard Giger, Maryellen Gilhuijs, Kenneth Gilmore, John Glory, Estelle Gobbi, David Goh, Alvina Goksel, Orcun
Gong, Qiyong Goodlett, Casey Goris, Michael Grady, Leo Grau, Vicente Greenspan, Hayit Gregoire, Marie Grimson, Eric Groher, Martin Grunert, Ronny Gu, Lixu Guerrero, Julian Guimond, Alexandre Hager, Gregory D Hahn, Horst Hall, Matt Hamarneh, Ghassan Han, Xiao Hansen, Klaus Hanson, Dennis Harders, Matthias Hata, Nobuhiko He, Huiguang He, Yong Heckemann, Rolf Heintzmann, Rainer Hellier, Pierre Ho, HonPong Hodgson, Antony Hoffmann, Kenneth Holden, Mark Holdsworth, David Holmes, David Hornegger, Joachim Horton, Ashley Hu, Mingxing Hu, Qingmao Hua, Jing Huang, Junzhou Huang, Xiaolei Huang, Heng Hutton, Brian Iglesias, Juan Eugenio Jäger, Florian Jain, Ameet
James, Adam Janke, Andrew Jannin, Pierre Jaramaz, Branislav Jenkinson, Mark Jin, Ge John, Nigel Johnston, Leigh Jolly, Marie-Pierre Jomier, Julien Jordan, Petr Ju, Tao Kabus, Sven Kakadiaris, Ioannis Karjalainen, Pasi Karssemeijer, Nico Karwoski, Ron Kazanzides, Peter Keil, Andreas Kerdok, Amy Keriven, Renaud Kettenbach, Joachim Khamene, Ali Kier, Christian Kikinis, Ron Kindlmann, Gordon Kiraly, Atilla Kiss, Gabriel Kitasaka, Takayuki Knoerlein, Benjamin Kodipaka, Santhosh Konietschke, Rainer Konukoglu, Ender Korb, Werner Koseki, Yoshihiko Kozerke, Sebastian Kozic, Nina Krieger, Axel Kriegeskorte, Nikolaus Krissian, Karl Krol, Andrzej Kronreif, Gernot Krupa, Alexandre Krupinski, Elizabeth Krut, Sébastien
Kukuk, Markus Kuroda, Kagayaki Kurtcuoglu, Vartan Kwon, Dong-Soo Kybic, Jan Lai, Shang-Hong Lambrou, Tryphon Lamperth, Michael Lasser, Tobias Law, W.K. Lazar, Mariana Lee, Su-Lin Lee, Bryan Leemans, Alexander Lekadir, Karim Lenglet, Christophe Lepore, Natasha Leung, K. Y. Esther Levman, Jacob Li, Kang Li, Shuo Li, Ming Liao, Shu Liao, Rui Lieby, Paulette Likar, Bostjan Lin, Fuchun Linguraru, Marius George Linte, Cristian Liu, Yanxi Liu, Huafeng Liu, Jimin Lohmann, Gabriele Loog, Marco Lorenzen, Peter Lueders, Eileen Lum, Mitchell Ma, Burton Macq, Benoit Madabhushi, Anant Manduca, Armando Manniesing, Rashindra Marchal, Maud Marchesini, Renato Marsland, Stephen
Martel, Sylvain Martens, Volker Martí, Robert Martin-Fernandez, Marcos Masood, Khalid Masutani, Yoshitaka McGraw, Tim Meas-Yedid, Vannary Meier, Dominik Meikle, Steve Melonakos, John Mendoza, Cesar Merlet, Jean-Pierre Merloz, Philippe Mewes, Andrea Meyer, Chuck Miller, James Milles, Julien Modersitzki, Jan Mohamed, Ashraf Monahan, Emily Montagnat, Johan Montillo, Albert Morandi, Xavier Moratal, David Morel, Guillaume Mueller, Klaus Mulkern, Robert Murgasova, Maria Murphy, Philip Nakamoto, Masahiko Nash, Martyn Navas, K.A. Nelson, Bradley Nichols, Thomas Nicolau, Stephane Niemeijer, Meindert Nikou, Christophoros Nimsky, Christopher Novotny, Paul Nowinski, Wieslaw Nuyts, Johan O'Donnell, Lauren Ogier, Arnaud Okamura, Allison
O'Keefe, Graeme Olabarriaga, Silvia Ólafsdóttir, Hildur Oliver, Arnau Olsen, Ole Fogh Oost, Elco Otake, Yoshito Ozarslan, Evren Padfield, Dirk Padoy, Nicolas Palaniappan, Kannappan Pang, Wai-Man Papademetris, Xenios Papadopoulo, Théo Patriciu, Alexandru Patronik, Nicholas Pavlidis, Ioannis Pechaud, Mickael Peine, William Peitgen, Heinz-Otto Pekar, Vladimir Penney, Graeme Perperidis, Dimitrios Peters, Terry Petit, Yvan Pham, Dzung Phillips, Roger Pichon, Eric Pitiot, Alain Pizer, Stephen Plaskos, Christopher Pock, Thomas Pohl, Kilian Maria Poignet, Philippe Poupon, Cyril Prager, Richard Prastawa, Marcel Prause, Guido Preim, Bernhard Prima, Sylvain Qian, Zhen Qian, Xiaoning Raaymakers, Bas Radaelli, Alessandro Rajagopal, Vijayaraghavan
Rajagopalan, Srinivasan Rasche, Volker Ratnanather, Tilak Raucent, Benoit Reinhardt, Joseph Renaud, Pierre Restif, Christophe Rettmann, Maryam Rexilius, Jan Reyes, Mauricio Rhode, Kawal Rittscher, Jens Robles-Kelly, Antonio Rodriguez y Baena, Ferdinando Rohlfing, Torsten Rohling, Robert Rohr, Karl Rose, Chris Rosen, Jacob Rousseau, François Rousson, Mikael Ruiz-Alzola, Juan Russakoff, Daniel Rydell, Joakim Sabuncu, Mert Rory Sadowsky, Ofri Salvado, Olivier San Jose Estepar, Raul Sanchez Castro, Francisco Javier Santamaria-Pang, Alberto Schaap, Michiel Schilham, Arnold Schlaefer, Alexander Schmid, Volker Schnabel, Julia Schwarz, Tobias Seemann, Gunnar Segonne, Florent Sermesant, Maxime Shah, Shishir Sharma, Aayush Sharp, Peter Sharp, Greg Shekhar, Raj
Shen, Hong Shen, Dinggang Shimizu, Akinobu Siddiqi, Kaleem Sielhorst, Tobias Sijbers, Jan Sinha, Shantanu Sjöstrand, Karl Sled, John Smith, Keith Soler, Luc Sonka, Milan Stewart, Charles Stewart, James Stindel, Eric Stoel, Berend Stoianovici, Dan Stoll, Jeff Stoyanov, Danail Styner, Martin Suetens, Paul Sugita, Naohiko Suinesiaputra, Avan Sun, Yiyong Sundar, Hari Szczerba, Dominik Szilagyi, Laszlo Tagare, Hemant Talbot, Hugues Talib, Haydar Talos, Ion-Florin Tanner, Christine Tao, Xiaodong Tarte, Segolene Tasdizen, Tolga Taylor, Zeike Taylor, Jonathan Tek, Huseyin Tendick, Frank Terzopoulos, Demetri Thévenaz, Philippe Thirion, Bertrand Tieu, Kinh Todd-Pokropek, Andrew Todman, Alison
Toews, Matthew Tohka, Jussi Tomazevic, Dejan Tonet, Oliver Tong, Shan Tosun, Duygu Traub, Joerg Trejos, Ana Luisa Tsao, Jeffrey Tschumperlé, David Tsechpenakis, Gavriil Tsekos, Nikolaos Twining, Carole Urschler, Martin van Assen, Hans van de Ville, Dimitri van der Bom, Martijn van der Geest, Rob van Rikxoort, Eva van Walsum, Theo Vandermeulen, Dirk Ventikos, Yiannis Vercauteren, Tom Verma, Ragini Vermandel, Maximilien Vidal, Rene Vidholm, Erik Vilanova, Anna Villa, Mari Cruz Villain, Nicolas Villard, Caroline von Berg, Jens von Lavante, Etienne von Siebenthal, Martin Vosburgh, Kirby Vossepoel, Albert Vrooman, Henri Vrtovec, Tomaz Wang, Defeng Wang, Fei Wang, Guodong Wang, Yongmei Michelle Wang, Yongtian Wang, Zhizhou Wassermann, Demian
Weese, Jürgen Wegner, Ingmar Wein, Wolfgang Weisenfeld, Neil Wengert, Christian West, Jay Westenberg, Michel Westermann, Ruediger Whitcher, Brandon Wiemker, Rafael Wiest-Daessle, Nicolas Wigstrom, Lars Wiles, Andrew Wink, Onno Wong, Ken Wong, Kenneth Wong, Stephen Wong, Tien-Tsin Wong, Wilbur Wood, Bradford Wood, Fiona Worsley, Keith Wörz, Stefan Wörn, Heinz Wu, Jue Xia, Yan Xie, Jun Xu, Sheng Xu, Ye Xue, Hui Xue, Zhong Yan, Pingkun Yang, King Yang, Lin Yang, Yihong Yaniv, Ziv Yeo, Boon Thye Yeung, Sai-Kit Yogesan, Kanagasingam Yoshida, Hiro Young, Alistair Young, Stewart Yu, Yang Yue, Ning Yuen, Shelten
Yushkevich, Paul Zacharaki, Evangelia Zemiti, Nabil Zerubia, Josiane Zhan, Wang Zhang, Fan Zhang, Heye Zhang, Hui Zhang, Xiangwei
Zhang, Yong Zheng, Guoyan Zheng, Yefeng Zhou, Jinghao Zhou, Kevin Zhou, Xiang Ziyan, Ulas Zollei, Lilla Zwiggelaar, Reyer
Table of Contents – Part II
Computer Assisted Intervention and Robotics - II
Real-Time Tissue Tracking with B-Mode Ultrasound Using Speckle and Visual Servoing . . . Alexandre Krupa, Gabor Fichtinger, and Gregory D. Hager
Intra-operative 3D Guidance in Prostate Brachytherapy Using a Non-isocentric C-arm . . . Ameet K. Jain, A. Deguet, Iulian I. Iordachita, Gouthami Chintalapani, J. Blevins, Y. Le, E. Armour, C. Burdette, Danny Y. Song, and Gabor Fichtinger
A Multi-view Opto-Xray Imaging System: Development and First Application in Trauma Surgery . . . Joerg Traub, Tim Hauke Heibel, Philipp Dressel, Sandro Michael Heining, Rainer Graumann, and Nassir Navab
Towards 3D Ultrasound Image Based Soft Tissue Tracking: A Transrectal Ultrasound Prostate Image Alignment System . . . Michael Baumann, Pierre Mozer, Vincent Daanen, and Jocelyne Troccaz
A Probabilistic Framework for Tracking Deformable Soft Tissue in Minimally Invasive Surgery . . . Peter Mountney, Benny Lo, Surapa Thiemjarus, Danail Stoyanov, and Guang-Zhong Yang
Precision Targeting of Liver Lesions with a Needle-Based Soft Tissue Navigation System . . . L. Maier-Hein, F. Pianka, A. Seitel, S.A. Müller, A. Tekbas, M. Seitel, I. Wolf, B.M. Schmied, and H.-P. Meinzer
Dynamic MRI Scan Plane Control for Passive Tracking of Instruments and Devices . . . Simon P. DiMaio, E. Samset, Gregory S. Fischer, Iulian I. Iordachita, Gabor Fichtinger, Ferenc A. Jolesz, and Clare MC Tempany
Design and Preliminary Accuracy Studies of an MRI-Guided Transrectal Prostate Intervention System . . . Axel Krieger, Csaba Csoma, Iulian I. Iordachita, Peter Guion, Anurag K. Singh, Gabor Fichtinger, and Louis L. Whitcomb
1
9
18
26
34
42
50
59
Thoracoscopic Surgical Navigation System for Cancer Localization in Collapsed Lung Based on Estimation of Lung Deformation . . . . . . . . . . . . Masahiko Nakamoto, Naoki Aburaya, Yoshinobu Sato, Kozo Konishi, Ichiro Yoshino, Makoto Hashizume, and Shinichi Tamura
68
Visualization and Interaction Clinical Evaluation of a Respiratory Gated Guidance System for Liver Punctures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . S.A. Nicolau, Xavier Pennec, Luc Soler, and Nicholas Ayache
77
Rapid Voxel Classification Methodology for Interactive 3D Medical Image Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Qi Zhang, Roy Eagleson, and Terry M. Peters
86
Towards Subject-Specific Models of the Dynamic Heart for Image-Guided Mitral Valve Surgery . . . Cristian A. Linte, Marcin Wierzbicki, John Moore, Stephen H. Little, Gérard M. Guiraudon, and Terry M. Peters
94
pq-space Based Non-Photorealistic Rendering for Augmented Reality . . . Mirna Lerotic, Adrian J. Chung, George P. Mylonas, and Guang-Zhong Yang
102
Eye-Gaze Driven Surgical Workflow Segmentation . . . . . . . . . . . . . . . . . . . . A. James, D. Vieira, Benny Lo, Ara Darzi, and Guang-Zhong Yang
110
Neuroscience Image Computing - I Prior Knowledge Driven Multiscale Segmentation of Brain MRI . . . . . . . . Ayelet Akselrod-Ballin, Meirav Galun, John Moshe Gomori, Achi Brandt, and Ronen Basri
118
Longitudinal Cortical Registration for Developing Neonates . . . . . . . . . . . Hui Xue, Latha Srinivasan, Shuzhou Jiang, Mary A. Rutherford, A. David Edwards, Daniel Rueckert, and Joseph V. Hajnal
127
Regional Homogeneity and Anatomical Parcellation for fMRI Image Classification: Application to Schizophrenia and Normal Controls . . . . . . Feng Shi, Yong Liu, Tianzi Jiang, Yuan Zhou, Wanlin Zhu, Jiefeng Jiang, Haihong Liu, and Zhening Liu
136
Probabilistic Fiber Tracking Using Particle Filtering . . . . . . . . . . . . . . . . . . Fan Zhang, Casey Goodlett, Edwin Hancock, and Guido Gerig
144
SMT: Split and Merge Tractography for DT-MRI . . . Uğur Bozkaya and Burak Acar
153
Tract-Based Morphometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lauren J. O’Donnell, Carl-Fredrik Westin, and Alexandra J. Golby
161
Towards Whole Brain Segmentation by a Hybrid Model . . . . . . . . . . . . . . . Zhuowen Tu and Arthur W. Toga
169
Computational Anatomy - II
A Family of Principal Component Analyses for Dealing with Outliers . . . J. Eugenio Iglesias, Marleen de Bruijne, Marco Loog, François Lauze, and Mads Nielsen
Automatic Segmentation of Articular Cartilage in Magnetic Resonance Images of the Knee . . . Jurgen Fripp, Stuart Crozier, Simon K. Warfield, and Sébastien Ourselin
Automated Model-Based Rib Cage Segmentation and Labeling in CT Images . . . Tobias Klinder, Cristian Lorenz, Jens von Berg, Sebastian P.M. Dries, Thomas Bülow, and Jörn Ostermann
178
186
195
Efficient Selection of the Most Similar Image in a Database for Critical Structures Segmentation . . . Olivier Commowick and Grégoire Malandain
203
Unbiased White Matter Atlas Construction Using Diffusion Tensor Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hui Zhang, Paul A. Yushkevich, Daniel Rueckert, and James C. Gee
211
Innovative Clinical and Biological Applications - II Real-Time SPECT and 2D Ultrasound Image Registration . . . . . . . . . . . . Marek Bucki, Fabrice Chassat, Francisco Galdames, Takeshi Asahi, Daniel Pizarro, and Gabriel Lobo
219
A Multiphysics Simulation of a Healthy and a Diseased Abdominal Aorta . . . Robert H.P. McGregor, Dominik Szczerba, and Gábor Székely
227
New Motion Correction Models for Automatic Identification of Renal Transplant Rejection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ayman S. El-Baz, Georgy Gimel’farb, and Mohamed A. El-Ghar
235
Detecting Mechanical Abnormalities in Prostate Tissue Using FE-Based Image Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Patrick Courtis and Abbas Samani
244
Real-Time Fusion of Ultrasound and Gamma Probe for Navigated Localization of Liver Metastases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Thomas Wendler, Marco Feuerstein, Joerg Traub, Tobias Lasser, Jakob Vogel, Farhad Daghighian, Sibylle I. Ziegler, and Nassir Navab Fast and Robust Analysis of Dynamic Contrast Enhanced MRI Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Olga Kubassova, Mikael Boesen, Roger D. Boyle, Marco A. Cimmino, Karl E. Jensen, Henning Bliddal, and Alexandra Radjenovic
252
261
Spectroscopic and Cellular Imaging Functional Near Infrared Spectroscopy in Novice and Expert Surgeons – A Manifold Embedding Approach . . . . . . . . . . . . . . . . . . . . . . . . Daniel Richard Leff, Felipe Orihuela-Espina, Louis Atallah, Ara Darzi, and Guang-Zhong Yang
270
A Hierarchical Unsupervised Spectral Clustering Scheme for Detection of Prostate Cancer from Magnetic Resonance Spectroscopy (MRS) . . . . . Pallavi Tiwari, Anant Madabhushi, and Mark Rosen
278
A Clinically Motivated 2-Fold Framework for Quantifying and Classifying Immunohistochemically Stained Specimens . . . . . . . . . . . . . . . . Bonnie Hall, Wenjin Chen, Michael Reiss, and David J. Foran
287
Cell Population Tracking and Lineage Construction with Spatiotemporal Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kang Li, Mei Chen, and Takeo Kanade
295
Spatio-Temporal Registration Spatiotemporal Normalization for Longitudinal Analysis of Gray Matter Atrophy in Frontotemporal Dementia . . . . . . . . . . . . . . . . . . . . . . . . Brian Avants, Chivon Anderson, Murray Grossman, and James C. Gee Population Based Analysis of Directional Information in Serial Deformation Tensor Morphometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Colin Studholme and Valerie Cardenas Non-parametric Diffeomorphic Image Registration with the Demons Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tom Vercauteren, Xavier Pennec, Aymeric Perchant, and Nicholas Ayache Three-Dimensional Ultrasound Mosaicing . . . . . . . . . . . . . . . . . . . . . . . . . . . Christian Wachinger, Wolfgang Wein, and Nassir Navab
303
311
319
327
General Medical Image Computing - III
Automated Extraction of Lymph Nodes from 3-D Abdominal CT Images Using 3-D Minimum Directional Difference Filter . . . Takayuki Kitasaka, Yukihiro Tsujimura, Yoshihiko Nakamura, Kensaku Mori, Yasuhito Suenaga, Masaaki Ito, and Shigeru Nawano
Non-Local Means Variants for Denoising of Diffusion-Weighted and Diffusion Tensor MRI . . . Nicolas Wiest-Daesslé, Sylvain Prima, Pierrick Coupé, Sean Patrick Morrissey, and Christian Barillot
Quantifying Calcification in the Lumbar Aorta on X-Ray Images . . . Lars A. Conrad-Hansen, Marleen de Bruijne, François Lauze, László B. Tankó, Paola C. Pettersen, Qing He, Jianghong Chen, Claus Christiansen, and Mads Nielsen
Physically Motivated Enhancement of Color Images for Fiber Endoscopy . . . Christian Winter, Thorsten Zerfaß, Matthias Elter, Stephan Rupp, and Thomas Wittenberg
Signal LMMSE Estimation from Multiple Samples in MRI and DT-MRI . . . Santiago Aja-Fernández, Carlos Alberola-López, and Carl-Fredrik Westin
Quantifying Heterogeneity in Dynamic Contrast-Enhanced MRI Parameter Maps . . . C.J. Rose, S. Mills, J.P.B. O'Connor, G.A. Buonaccorsi, C. Roberts, Y. Watson, B. Whitcher, G. Jayson, A. Jackson, and G.J.M. Parker
Improving Temporal Fidelity in k-t BLAST MRI Reconstruction . . . Andreas Sigfridsson, Mats Andersson, Lars Wigström, John-Peder Escobar Kvitting, and Hans Knutsson
Segmentation and Classification of Breast Tumor Using Dynamic Contrast-Enhanced MR Images . . . Yuanjie Zheng, Sajjad Baloch, Sarah Englander, Mitchell D. Schnall, and Dinggang Shen
Automatic Whole Heart Segmentation in Static Magnetic Resonance Image Volumes . . . Jochen Peters, Olivier Ecabert, Carsten Meyer, Hauke Schramm, Reinhard Kneser, Alexandra Groth, and Jürgen Weese
336
344
352
360
368
376
385
393
402
PCA-Based Magnetic Field Modeling: Application for On-Line MR Temperature Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G. Maclair, B. Denis de Senneville, M. Ries, B. Quesson, P. Desbarats, J. Benois-Pineau, and C.T.W. Moonen
411
A Probabilistic Model for Haustral Curvatures with Applications to Colon CAD . . . John Melonakos, Paulo Mendonça, Rahul Bhotka, and Saad Sirohey
420
LV Motion Tracking from 3D Echocardiography Using Textural and Structural Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Andriy Myronenko, Xubo Song, and David J. Sahn
428
A Novel 3D Multi-scale Lineness Filter for Vessel Detection . . . H.E. Bennink, H.C. van Assen, G.J. Streekstra, R. ter Wee, J.A.E. Spaan, and Bart M. ter Haar Romeny
Live-Vessel: Extending Livewire for Simultaneous Extraction of Optimal Medial and Boundary Paths in Vascular Images . . . Kelvin Poon, Ghassan Hamarneh, and Rafeef Abugharbieh
A Point-Wise Quantification of Asymmetry Using Deformation Fields: Application to the Study of the Crouzon Mouse Model . . . Hildur Ólafsdóttir, Stephanie Lanche, Tron A. Darvann, Nuno V. Hermann, Rasmus Larsen, Bjarne K. Ersbøll, Estanislao Oubel, Alejandro F. Frangi, Per Larsen, Chad A. Perlyn, Gillian M. Morriss-Kay, and Sven Kreiborg
Object Localization Based on Markov Random Fields and Symmetry Interest Points . . . René Donner, Branislav Micusik, Georg Langs, Lech Szumilas, Philipp Peloschek, Klaus Friedrich, and Horst Bischof
436
444
452
460
2D Motion Analysis of Long Axis Cardiac Tagged MRI . . . . . . . . . . . . . . . Ting Chen, Sohae Chung, and Leon Axel
469
MCMC Curve Sampling for Image Segmentation . . . . . . . . . . . . . . . . . . . . . Ayres C. Fan, John W. Fisher III, William M. Wells III, James J. Levitt, and Alan S. Willsky
477
Automatic Centerline Extraction of Irregular Tubular Structures Using Probability Volumes from Multiphoton Imaging . . . . . . . . . . . . . . . . . . . . . . A. Santamar´ıa-Pang, C.M. Colbert, P. Saggau, and Ioannis A. Kakadiaris Γ -Convergence Approximation to Piecewise Smooth Medical Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jungha An, Mikael Rousson, and Chenyang Xu
486
495
Is a Single Energy Functional Sufficient? Adaptive Energy Functionals and Automatic Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chris McIntosh and Ghassan Hamarneh A Duality Based Algorithm for TV-L1 -Optical-Flow Image Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Thomas Pock, Martin Urschler, Christopher Zach, Reinhard Beichel, and Horst Bischof Deformable 2D-3D Registration of the Pelvis with a Limited Field of View, Using Shape Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ofri Sadowsky, Gouthami Chintalapani, and Russell H. Taylor Segmentation-driven 2D-3D Registration for Abdominal Catheter Interventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Martin Groher, Frederik Bender, Ralf-Thorsten Hoffmann, and Nassir Navab Primal/Dual Linear Programming and Statistical Atlases for Cartilage Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ben Glocker, Nikos Komodakis, Nikos Paragios, Christian Glaser, Georgios Tziritas, and Nassir Navab Similarity Metrics for Groupwise Non-rigid Registration . . . . . . . . . . . . . . . Kanwal K. Bhatia, Joseph V. Hajnal, Alexander Hammers, and Daniel Rueckert A Comprehensive System for Intraoperative 3D Brain Deformation Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Christine DeLorenzo, Xenophon Papademetris, Kenneth P. Vives, Dennis D. Spencer, and James S. Duncan Bayesian Tracking of Tubular Structures and Its Application to Carotid Arteries in CTA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Michiel Schaap, Rashindra Manniesing, Ihor Smal, Theo van Walsum, Aad van der Lugt, and Wiro Niessen Automatic Fetal Measurements in Ultrasound Using Constrained Probabilistic Boosting Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gustavo Carneiro, Bogdan Georgescu, Sara Good, and Dorin Comaniciu
503
511
519
527
536
544
553
562
571
Quantifying Effect-Specific Mammographic Density . . . . . . . . . . . . . . . . . . Jakob Raundahl, Marco Loog, Paola C. Pettersen, and Mads Nielsen
580
Revisiting the Evaluation of Segmentation Results: Introducing Confidence Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Christophe Restif
588
Error Analysis of Calibration Materials on Dual-Energy Mammography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xuanqin Mou and Xi Chen
596
Computer Assisted Intervention and Robotics - III
A MR Compatible Mechatronic System to Facilitate Magic Angle Experiments in Vivo . . . Haytham Elhawary, Aleksandar Zivanovic, Marc Rea, Zion Tsz Ho Tse, Donald McRobbie, Ian Young, Martyn Paley, Brian Davies, and Michael Lampérth
604
Variational Guidewire Tracking Using Phase Congruency . . . . . . . . . . . . . Greg Slabaugh, Koon Kong, Gozde Unal, and Tong Fang
612
Endoscopic Navigation for Minimally Invasive Suturing . . . Christian Wengert, Lukas Bossard, Armin Häberling, Charles Baur, Gábor Székely, and Philippe C. Cattin
620
On Fiducial Target Registration Error in the Presence of Anisotropic Noise . . . Burton Ma, Mehdi Hedjazi Moghari, Randy E. Ellis, and Purang Abolmaesumi
Rotational Roadmapping: A New Image-Based Navigation Technique for the Interventional Room . . . Markus Kukuk and Sandy Napel
Bronchoscope Tracking Without Fiducial Markers Using Ultra-tiny Electromagnetic Tracking System and Its Evaluation in Different Environments . . . Kensaku Mori, Daisuke Deguchi, Kazuyoshi Ishitani, Takayuki Kitasaka, Yasuhito Suenaga, Yoshinori Hasegawa, Kazuyoshi Imaizumi, and Hirotsugu Takabatake
Online Estimation of the Target Registration Error for n-Ocular Optical Tracking Systems . . . Tobias Sielhorst, Martin Bauer, Oliver Wenisch, Gudrun Klinker, and Nassir Navab
Assessment of Perceptual Quality for Gaze-Contingent Motion Stabilization in Robotic Assisted Minimally Invasive Surgery . . . George P. Mylonas, Danail Stoyanov, Ara Darzi, and Guang-Zhong Yang
Prediction of Respiratory Motion with Wavelet-Based Multiscale Autoregression . . . Floris Ernst, Alexander Schlaefer, and Achim Schweikard
628
636
644
652
660
668
Multi-criteria Trajectory Planning for Hepatic Radiofrequency Ablation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Claire Baegert, Caroline Villard, Pascal Schreck, and Luc Soler
676
General Biological Imaging Computing
A Bayesian 3D Volume Reconstruction for Confocal Micro-rotation Cell Imaging . . . Yong Yu, Alain Trouvé, and Bernard Chalemond
685
Bias Image Correction Via Stationarity Maximization . . . . . . . . . . . . . . . . T. Dorval, A. Ogier, and A. Genovesio
693
Toward Optimal Matching for 3D Reconstruction of Brachytherapy Seeds . . . Christian Labat, Ameet K. Jain, Gabor Fichtinger, and Jerry L. Prince
Alignment of Large Image Series Using Cubic B-Splines Tessellation: Application to Transmission Electron Microscopy Data . . . Julien Dauguet, Davi Bock, R. Clay Reid, and Simon K. Warfield
Quality-Based Registration and Reconstruction of Optical Tomography Volumes . . . Wolfgang Wein, Moritz Blume, Ulrich Leischner, Hans-Ulrich Dodt, and Nassir Navab
Simultaneous Segmentation, Kinetic Parameter Estimation, and Uncertainty Visualization of Dynamic PET Images . . . Ahmed Saad, Ben Smith, Ghassan Hamarneh, and Torsten Möller
701
710
718
726
Neuroscience Image Computing - II Nonlinear Analysis of BOLD Signal: Biophysical Modeling, Physiological States, and Functional Activation . . . . . . . . . . . . . . . . . . . . . Zhenghui Hu and Pengcheng Shi
734
Effectiveness of the Finite Impulse Response Model in Content-Based fMRI Image Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bing Bai, Paul Kantor, and Ali Shokoufandeh
742
Sources of Variability in MEG . . . Wanmei Ou, Polina Golland, and Matti Hämäläinen
751
Customised Cytoarchitectonic Probability Maps Using Deformable Registration: Primary Auditory Cortex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lara Bailey, Purang Abolmaesumi, Julian Tam, Patricia Morosan, Rhodri Cusack, Katrin Amunts, and Ingrid Johnsrude
760
Segmentation of Q-Ball Images Using Statistical Surface Evolution . . . Maxime Descoteaux and Rachid Deriche
Evaluation of Shape-Based Normalization in the Corpus Callosum for White Matter Connectivity Analysis . . . Hui Sun, Paul A. Yushkevich, Hui Zhang, Philip A. Cook, Jeffrey T. Duda, Tony J. Simon, and James C. Gee
Accuracy Assessment of Global and Local Atrophy Measurement Techniques with Realistic Simulated Longitudinal Data . . . Oscar Camara, Rachael I. Scahill, Julia A. Schnabel, William R. Crum, Gerard R. Ridgway, Derek L.G. Hill, and Nick C. Fox
Combinatorial Optimization for Electrode Labeling of EEG Caps . . . Mickaël Péchaud, Renaud Keriven, Théo Papadopoulo, and Jean-Michel Badier
769
777
785
793
Computational Anatomy - III Analysis of Deformation of the Human Ear and Canal Caused by Mandibular Movement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sune Darkner, Rasmus Larsen, and Rasmus R. Paulsen
801
Shape Registration by Simultaneously Optimizing Representation and Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yifeng Jiang, Jun Xie, Deqing Sun, and Hungtat Tsui
809
Landmark Correspondence Optimization for Coupled Surfaces . . . Lin Shi, Defeng Wang, Pheng Ann Heng, Tien-Tsin Wong, Winnie C.W. Chu, Benson H.Y. Yeung, and Jack C.Y. Cheng
Mean Template for Tensor-Based Morphometry Using Deformation Tensors . . . Natasha Leporé, Caroline Brun, Xavier Pennec, Yi-Yu Chou, Oscar L. Lopez, Howard J. Aizenstein, James T. Becker, Arthur W. Toga, and Paul M. Thompson
Shape-Based Myocardial Contractility Analysis Using Multivariate Outlier Detection . . . Karim Lekadir, Niall Keenan, Dudley Pennell, and Guang-Zhong Yang
818
826
834
Computational Physiology - II Orthopedics Surgery Trainer with PPU-Accelerated Blood and Tissue Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wai-Man Pang, Jing Qin, Yim-Pan Chui, Tien-Tsin Wong, Kwok-Sui Leung, and Pheng Ann Heng
842
Interactive Contacts Resolution Using Smooth Surface Representation . . . Jérémie Dequidt, Julien Lenoir, and Stéphane Cotin
850
Using Statistical Shape Analysis for the Determination of Uterine Deformation States During Hydrometra . . . M. Harders and Gábor Székely
858
Predictive K-PLSR Myocardial Contractility Modeling with Phase Contrast MR Velocity Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Su-Lin Lee, Qian Wu, Andrew Huntbatch, and Guang-Zhong Yang
866
A Coupled Finite Element Model of Tumor Growth and Vascularization . . . Bryn A. Lloyd, Dominik Szczerba, and Gábor Székely
874
Innovative Clinical and Biological Applications - III Autism Diagnostics by 3D Texture Analysis of Cerebral White Matter Gyrifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ayman S. El-Baz, Manuel F. Casanova, Georgy Gimel’farb, Meghan Mott, and Andrew E. Switala
882
3-D Analysis of Cortical Morphometry in Differential Diagnosis of Parkinson's Plus Syndromes: Mapping Frontal Lobe Cortical Atrophy in Progressive Supranuclear Palsy Patients . . . Duygu Tosun, Simon Duchesne, Yan Rolland, Arthur W. Toga, Marc Vérin, and Christian Barillot
891
Tissue Characterization Using Fractal Dimension of High Frequency Ultrasound RF Time Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mehdi Moradi, Parvin Mousavi, and Purang Abolmaesumi
900
Towards Intra-operative 3D Nuclear Imaging: Reconstruction of 3D Radioactive Distributions Using Tracked Gamma Probes . . . . . . . . . . . . . . Thomas Wendler, Alexander Hartl, Tobias Lasser, Joerg Traub, Farhad Daghighian, Sibylle I. Ziegler, and Nassir Navab Instrumentation for Epidural Anesthesia . . . . . . . . . . . . . . . . . . . . . . . . . . . . King-wei Hor, Denis Tran, Allaudin Kamani, Vickie Lessoway, and Robert Rohling Small Animal Radiation Research Platform: Imaging, Mechanics, Control and Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mohammad Matinfar, Owen Gray, Iulian I. Iordachita, Chris Kennedy, Eric Ford, John Wong, Russell H. Taylor, and Peter Kazanzides
909
918
926
Proof of Concept of a Simple Computer–Assisted Technique for Correcting Bone Deformities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Burton Ma, Amber L. Simpson, and Randy E. Ellis
935
Global Registration of Multiple Point Sets: Feasibility and Applications in Multi-fragment Fracture Fixation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mehdi Hedjazi Moghari and Purang Abolmaesumi
943
Precise Estimation of Postoperative Cup Alignment from Single Standard X-Ray Radiograph with Gonadal Shielding . . . . . . . . . . . . . . . . . Guoyan Zheng, Simon Steppacher, Xuan Zhang, and Moritz Tannast
951
Fully Automated and Adaptive Detection of Amyloid Plaques in Stained Brain Sections of Alzheimer Transgenic Mice . . . . . . . . . . . . . . . . . Abdelmonem Feki, Olivier Teboul, Albertine Dubois, Bruno Bozon, Alexis Faure, Philippe Hantraye, Marc Dhenain, Benoit Delatour, and Thierry Delzescaux
960
Non-rigid Registration of Pre-procedural MR Images with Intra-procedural Unenhanced CT Images for Improved Targeting of Tumors During Liver Radiofrequency Ablations . . . . . . . . . . . . . . . . . . . . . . N. Archip, S. Tatli, P. Morrison, Ferenc A. Jolesz, Simon K. Warfield, and S. Silverman
969
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
979
Real-Time Tissue Tracking with B-Mode Ultrasound Using Speckle and Visual Servoing

Alexandre Krupa¹, Gabor Fichtinger², and Gregory D. Hager²

¹ IRISA - INRIA Rennes, France
  [email protected]
² Engineering Research Center, Johns Hopkins University, USA
  {gabor,hager}@cs.jhu.edu
Abstract. We present a method for real-time tracking of moving soft tissue with B-mode ultrasound (US). The method makes use of the speckle information contained in the US images to estimate the in-plane and out-of-plane motion of a fixed target relative to the ultrasound scan plane. The motion information is then used as closed-loop feedback to a robot which corrects for the target motion. The concept is demonstrated for translation motions in an experimental setup consisting of an ultrasound speckle phantom, a robot for simulating tissue motion, and a robot that performs motion stabilization from US images. This concept shows promise for US-guided procedures that require real-time motion tracking and compensation.
1 Introduction
Quantitative ultrasound (US) guidance has great potential for aiding a wide range of diagnostic and minimally invasive surgical applications. However, one of the barriers to wider application is the challenge of locating and maintaining targets of interest within the US scan-plane, particularly when the underlying tissue is in motion. This problem can be alleviated, to some degree, through the use of recently developed 3D ultrasound systems. However, a more practical solution is to create a means of stabilizing a traditional B-mode ultrasound imager relative to a target. This capability can be exploited in many applications, for example to automatically move the US probe to maintain an appropriate view of moving soft tissues during US scanning, or to synchronize the insertion of a needle into a moving target during biopsy or local therapy.

In this paper, we present a system that is capable of fully automatic, real-time tracking and motion compensation of a moving soft tissue target using a sequence of B-mode ultrasound images. Contrary to prior work in this area, which has relied on segmenting structures of interest [1,2], we make direct use of the speckle information contained in the US images. While US speckle is usually considered to be noise from an imaging point of view, it in fact results from the coherent reflection of microscopic structures contained in soft tissue. As such, it is spatially coherent. Furthermore, an US beam is several mm wide. As a result, there is substantial overlap between US scan planes with small lateral displacements and, therefore, substantial correlation of the speckle information between successive images. Speckle correlation occurs for both in-plane and out-of-plane motion, thereby making it possible to track both in-plane and out-of-plane motion, and raising the possibility of calculating the full 6-DOF relative pose of speckle patches.

The authors acknowledge the support of the National Science Foundation under Engineering Research Center grant EEC-9731748.

Fig. 1. Experimental decorrelation curves (left), obtained by measuring the correlation value between 25 patches of B-scan I1 and their corresponding patches in B-scan I2 along the elevation distance d (right)

Initially, speckle information was used to estimate multi-dimensional flow in 2D ultrasound images [3]. Recently, several authors [4,5] have published speckle decorrelation techniques that allow freehand 3D US scanning without a position sensor on the US probe. Their techniques depend on experimentally calibrating speckle decorrelation curves from real soft tissues and/or speckle-simulating phantoms. These curves (Fig. 1) are obtained by capturing B-mode images at known distances d along the elevation direction (i.e., orthogonal to the image plane) and measuring the normalized correlation coefficients for a finite number of rectangular patches fixed in the images. The imaging procedure then entails capturing an US stream by moving the probe in a given direction. The relative in-plane and out-of-plane position between each image is then estimated, off-line, from the estimated elevation distances of at least 3 non-collinear patches in the image plane. These distances are computed from the calibrated decorrelation curves using the measured inter-patch correlation value for each image patch. In our experimental scenario, we also perform an offline calibration procedure to relate speckle decorrelation to elevation motion. However, we subsequently servo the US probe to track a user-selected B-scan target in a fully automatic, online manner. The 6-DOF motion of the target B-scan is extracted by an estimation method using the speckle information and an image region tracking algorithm based on grey level intensity. A visual servoing scheme is then used
to control the probe displacement. Section 2 presents the methods used to extract the 6-DOF rigid motion of the target B-scan image. The visual servoing control laws are developed in section 3, and section 4 presents first results obtained from ex-vivo experiments where only translation motions are considered.
2 Motion Extraction
The overall tracking problem is to minimize the relative position between the current B-scan (denoted by a Cartesian frame {c}) and a target B-scan (denoted by a Cartesian frame {i}). The full 6-DOF target plane position can be decomposed into two successive homogeneous transformations ${}^{c}H_{i} = {}^{c}H_{p}\,{}^{p}H_{i}$, where ${}^{c}H_{p}$ and ${}^{p}H_{i}$ describe the in-plane and out-of-plane displacement of the target, respectively. Note that {p} corresponds to an intermediate "virtual" plane. The in-plane displacement corresponds to the translations x and y along the X and Y axes of the current image plane and the angular rotation γ around the Z axis (orthogonal to the image), such that:

$$
{}^{c}H_{p} = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 & x \\ \sin\gamma & \cos\gamma & 0 & y \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad (1)
$$

We use a classical template tracking technique [6] to extract the in-plane motion parameters x, y, γ. This information is then used to relate the image coordinates of patches in the two images for the purpose of estimating the out-of-plane motion using speckle decorrelation.

To extract the out-of-plane motion, we use the Gaussian model introduced in [4]. From experimental observations (Fig. 1), we found that the elevation distance between a patch in the target plane and the corresponding patch in the current image can be estimated by $\hat{d} = \sqrt{-2\hat{\sigma}^{2}\ln(\rho)}$, where ρ is the measured inter-patch correlation and $\hat{\sigma} = 0.72$ mm is the mean resolution cell width (identified from the experimental decorrelation curves). To compute the full out-of-plane motion, we compute the elevation distance for a grid of patches (25 in our current system), and fit a plane to this data. However, the Gaussian model does not provide the sign of the elevation distance for a given patch. Thus, we employ the following algorithm to estimate the out-of-plane position of the target plane with respect to the virtual plane {p}. We first set a random sign on each inter-patch distance and estimate (with a least-squares algorithm) an initial position of the target plane using these signs. We then use the iterative algorithm we presented in [7] to determine the correct signed distances and the associated plane. This algorithm, which minimizes the least-squares error of the estimated target plane, converges to two stable solutions that are symmetrical around plane {p}. The two solutions correspond to the positive and negative elevation distances z, respectively. Note that from one solution we can easily determine the second. By formulating the out-of-plane relative position as a combination of a translation z along the Z axis of plane {p} and
two successive rotations α, β around the Y and X axes of {p}, we obtain the following homogeneous transformation matrix for the out-of-plane motion:

$$
{}^{p}H_{i} = \begin{pmatrix} \cos\alpha & \sin\alpha\sin\beta & \sin\alpha\cos\beta & 0 \\ 0 & \cos\beta & -\sin\beta & 0 \\ -\sin\alpha & \cos\alpha\sin\beta & \cos\alpha\cos\beta & z \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad (2)
$$

The two symmetrical solutions for the 6-DOF motion are then given by the estimates:
$$
{}^{c}\hat{H}_{i}(+) = {}^{c}H_{p}\,{}^{p}\hat{H}_{i}(+) \quad \text{and} \quad {}^{c}\hat{H}_{i}(-) = {}^{c}H_{p}\,{}^{p}\hat{H}_{i}(-) \qquad (3)
$$
where (+) indicates the solution obtained for $\hat{z} > 0$, with $\hat{\alpha} = \mathrm{atan}(\hat{a}/\hat{c})$ and $\hat{\beta} = -\mathrm{asin}(\hat{b})$, and (−) indicates the solution corresponding to $\hat{z} < 0$, with $\hat{\alpha} = \mathrm{atan}(-\hat{a}/\hat{c})$ and $\hat{\beta} = -\mathrm{asin}(-\hat{b})$. Here $(\hat{a}, \hat{b}, \hat{c})$ is the normal vector of the estimated target plane obtained for the solution $\hat{z} > 0$. The hat denotes values provided by the template tracking and plane estimation methods; it will be purposely dropped in the rest of the paper for clarity of presentation.

This method works only locally about the target region due to the rapid rate of speckle decorrelation with out-of-plane motion. Therefore, in order to increase the range of convergence, we augment the basic algorithm with a FIFO buffer of intermediate planes {i} between the target {t} and current plane {c}. These planes, which are acquired online as the probe moves, are chosen to be close enough to be well "speckle correlated" and thus provide a "path" of ultrasound images that can be traced back to the target. The complete algorithm summarizing our method for extracting the target plane position is described in Fig. 2 (for positive elevation distances) and Fig. 3 (for negative elevation distances). At initialization, the target plane is captured in the initial B-scan image and stored in a FIFO buffer (plane) starting with index i = 0. The current image is also stored as the target image (imagereference = currentplane). A small negative elevation displacement is then applied to the probe in order to obtain an initial positive elevation distance z[0] ≥ s > 0 of plane[0] with respect to the current B-scan plane. Here s is a small threshold distance fixed to guarantee speckle correlation between US images. The algorithm goes to the case of positive elevation distance. The array index is then incremented and an intermediate plane is stored (plane[i] = currentplane), along with the homogeneous matrix ${}^{i}H_{i-1} = {}^{c}H_{i-1}(+)$ describing the position of plane[i−1] with respect to plane[i], given by (3). Each time an intermediate plane is added, the target image used by the in-plane motion tracker is also updated (imagereference = currentplane). After initialization, the configuration of planes corresponds to case 1 in Fig. 2, where the target plane position is

$$
{}^{c}H_{t} = {}^{c}H_{i}(+) \prod_{k=i}^{1} {}^{k}H_{k-1}.
$$

Now, suppose that the target plane moves for some reason. By computing (3) for ${}^{c}H_{i}$ and ${}^{c}H_{i-1}$, we can: 1) determine the consistent pair of solutions that express the current plane relative to plane[i] and plane[i−1]; 2) determine which of cases 1, 2 or 3 is valid; and 3) compute the target elevation position ${}^{c}H_{t}$ accordingly. As shown, the three cases are:
Fig. 2. (top) Possible plane configurations and (bottom) the process used to manage the intermediate planes when the target elevation distance is positive
1) if the current plane moves a distance s beyond the top of the FIFO array, then a new intermediate plane is added; 2) if the current plane is between the top two planes of the FIFO array, then no change occurs; or 3) if the elevation distance decreases, then the last intermediate plane is removed from the FIFO array. In the latter case, a special situation arises when there are only two planes (i = 1) in the array. In this case, if the absolute value of the target elevation distance reaches the threshold s, then the algorithm switches to the second mode, described in Fig. 3, which is the symmetric logic for negative elevations. For this mode, the possible configurations of planes are illustrated by cases 4 to 6 in Fig. 3. The algorithm switches back to the first mode when the target plane elevation position becomes positive again.

Fig. 3. (top) Possible plane configurations and (bottom) the process used to manage the intermediate planes when the target elevation distance is negative
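As an illustration of the out-of-plane estimation described in this section, the following is a minimal sketch (ours, not the authors' implementation) of its two building blocks: inverting the Gaussian decorrelation model to obtain unsigned elevation distances for the 25-patch grid, and least-squares fitting of a plane to the signed distances. The random sign initialization stands in for the iterative sign-resolution algorithm of [7]; all function names and sample values are illustrative.

```python
import numpy as np

SIGMA_HAT = 0.72  # mm, mean resolution cell width from the calibrated curves

def elevation_distance(rho, sigma=SIGMA_HAT):
    """Invert the Gaussian speckle model rho = exp(-d^2 / (2 sigma^2)):
    d_hat = sqrt(-2 sigma^2 ln(rho)). Unsigned; the sign is resolved later."""
    rho = np.clip(rho, 1e-6, 1.0)
    return np.sqrt(-2.0 * sigma ** 2 * np.log(rho))

def fit_plane(xy, signed_d):
    """Least-squares plane d = p0*x + p1*y + p2 through the signed elevation
    distances of the patch grid; returns the unit normal and the offset z."""
    A = np.column_stack([xy[:, 0], xy[:, 1], np.ones(len(xy))])
    p, *_ = np.linalg.lstsq(A, signed_d, rcond=None)
    normal = np.array([-p[0], -p[1], 1.0])
    return normal / np.linalg.norm(normal), p[2]

# 25 patch centres on a 5x5 grid (mm) and their measured correlations with
# the target B-scan after in-plane alignment (stand-in values for illustration)
grid = np.stack(np.meshgrid(np.linspace(-10, 10, 5),
                            np.linspace(-10, 10, 5)), -1).reshape(-1, 2)
rho = np.random.uniform(0.7, 0.99, len(grid))
d = elevation_distance(rho)
signs = np.random.choice([-1.0, 1.0], len(grid))   # random initial signs
normal, z = fit_plane(grid, signs * d)             # refined iteratively in [7]
```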
3 Visual Servoing
Now that the position of the B-scan target with respect to the current plane has been estimated, we move the robot (holding the probe) in order to follow the target plane. In our approach, a 3D visual servoing control scheme is used to minimize the relative position between the current and target planes. The error vector is the 6-dimensional pose vector $x = ({}^{t}P_{c}^{\top},\, \theta u^{\top})^{\top}$ describing the position of the current plane frame {c} with respect to the target plane frame {t}. Here ${}^{t}P_{c}$ is the translation vector obtained directly from the 4th column of ${}^{t}H_{c} = ({}^{c}H_{t})^{-1}$, and θu is the angle-axis representation of the rotation ${}^{t}R_{c}$ [8]. The variation of x is related to the velocity screw $v = (v_x, v_y, v_z, \omega_x, \omega_y, \omega_z)^{\top}$ of the ultrasound probe by $\dot{x} = L_s v$. In visual servoing, $L_s$ is called the interaction matrix and is given in this case by (cf. [9]):

$$
L_{s} = \begin{pmatrix} {}^{t}R_{c} & 0_{3} \\ 0_{3} & I_{3} - \dfrac{\theta}{2}[u]_{\times} + \left(1 - \dfrac{\operatorname{sinc}\theta}{\operatorname{sinc}^{2}\frac{\theta}{2}}\right)[u]_{\times}^{2} \end{pmatrix} \qquad (4)
$$

where $I_3$ is the 3 × 3 identity matrix and $[u]_{\times}$ is the skew-symmetric matrix associated with the cross product by u. The visual servoing task (cf. [9]) can then be expressed as a regulation to zero of the pose x and is performed by applying the following control law: $v = -\lambda L_{s}^{-1} x$, where λ is the proportional gain that sets the exponential convergence rate.
4 Experiments and Results
We have tested the motion stabilization method on 2-DOF motions combining a translation along the image X axis (in-plane translation) and elevation Z axis
Fig. 4. (top) Experimental setup; (bottom left) evolution of the two robot positions (mm) over time (s); (bottom right) tracking error (mm) for the in-plane (x) and out-of-plane (z) translations
(out-of-plane translation). The experimental setup, shown in Fig. 4, consists of two X-Z Cartesian robots fixed and aligned on an optical table. The first robot provides a ground truth displacement for an US speckle phantom. The second robot holds a transrectal 6.5 MHz US transducer and is controlled as described above to track a target plane. The US image is 440 × 320 pixels with a resolution of 0.125 mm/pixel. A laptop computer (Pentium IV 2 GHz) captures the US stream at 10 fps, extracts the target plane position using a grid of 25 patches, and computes the velocity control vector applied to the probe-holding robot. The plots in Fig. 4 show the evolution of the robot positions and the tracking error when sinusoidal motions (magnitude of 30 mm on each axis) were applied to the phantom. The dynamic tracking error was below 3 mm for the in-plane translation and 3.5 mm for the elevation translation. This error is attributed to the dynamics of the target motion, time delays in the control scheme, and the dynamics of the probe-holding robot. These errors could be reduced if a prediction of the motion variation were introduced into the control law by some method such as a Kalman filter or a generalized predictive controller [10]. Adopting recent
methods [11] for more accurate and efficient identification of fully developed speckle patches should also improve tracking performance and may allow estimation of relative motion between different soft tissue elements. In order to determine the static accuracy of the robotic tracking task, we applied a set of 140 random positions to the phantom using ramp trajectories while tracking the target plane with the robotized probe. When the probe stabilized at a position, the phantom was held motionless for 2 seconds and the locations of the two robots were recorded. We recorded a static error of 0.0219 ± 0.05 mm (mean ± standard deviation) for the in-plane tracking and 0.0233 ± 0.05 mm for the out-of-plane tracking, which is close to the positioning accuracy of the robots (±0.05 mm). In conclusion, the results obtained for 2-DOF in-plane and out-of-plane motions demonstrate the potential of our approach. We are presently adding rotational stages to the robots to experimentally validate the full 6-DOF motion tracking and visual servoing capabilities of the algorithm described in this paper.
References

1. Abolmaesumi, P., Salcudean, S.E., Zhu, W.H., Sirouspour, M., DiMaio, S.: Image-guided control of a robot for medical ultrasound. IEEE Trans. Robotics and Automation 18, 11–23 (2002)
2. Hong, J., Dohi, T., Hashizume, M., Konishi, K., Hata, N.: An ultrasound-driven needle insertion robot for percutaneous cholecystostomy. Physics in Medicine and Biology 49(3), 441–455 (2004)
3. Bohs, L.N., Geiman, B.J., Anderson, M.E., Gebhart, S.C., Trahey, G.E.: Speckle tracking for multi-dimensional flow estimation. Ultrasonics 28(1-8), 369–375 (2000)
4. Gee, A.H., Housden, R.J., Hassenpflug, P., Treece, G.M., Prager, R.W.: Sensorless freehand 3D ultrasound in real tissues: Speckle decorrelation without fully developed speckle. Medical Image Analysis 10(2), 137–149 (2006)
5. Chang, R.-F., Wu, W.-J., Chen, D.-R., Chen, W.-M., Shu, W., Lee, J.-H., Jeng, L.-B.: 3-D US frame positioning using speckle decorrelation and image registration. Ultrasound in Med. & Biol. 29(6), 801–812 (2003)
6. Hager, G.D., Belhumeur, P.N.: Efficient region tracking with parametric models of geometry and illumination. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(10), 1025–1039 (1998)
7. Krupa, A., Fichtinger, G., Hager, G.D.: Full motion tracking in ultrasound using image speckle information and visual servoing. In: ICRA 2007: IEEE Int. Conf. on Robotics and Automation, Roma, Italy (2007)
8. Craig, J.J.: Introduction to Robotics: Mechanics and Control, 2nd edn. Addison-Wesley, London, UK (1989)
9. Chaumette, F., Hutchinson, S.: Visual servo control, Part I: Basic approaches. IEEE Robotics and Automation Magazine 13(4), 82–90 (2006)
10. Ginhoux, R., Gangloff, J., de Mathelin, M., Soler, L., Sanchez, M.M.A., Marescaux, J.: Active filtering of physiological motion in robotized surgery using predictive control. IEEE Transactions on Robotics 21(1), 67–79 (2005)
11. Rivaz, H., Boctor, E., Fichtinger, G.: Ultrasound speckle detection using low order moments. In: IEEE International Ultrasonics Symposium, Vancouver, Canada (2006)
Intra-operative 3D Guidance in Prostate Brachytherapy Using a Non-isocentric C-arm

A. Jain¹,³, A. Deguet¹, I. Iordachita¹, G. Chintalapani¹, J. Blevins², Y. Le¹, E. Armour¹, C. Burdette², D. Song¹, and G. Fichtinger¹

¹ Johns Hopkins University
² Acoustic MedSystems Inc.
³ Philips Research North America
Abstract. Intra-operative guidance in Transrectal Ultrasound (TRUS) guided prostate brachytherapy requires localization of the inserted radioactive seeds relative to the prostate. Seeds were reconstructed using a typical C-arm, and exported to a commercial brachytherapy system for dosimetry analysis. Technical obstacles for 3D reconstruction on a non-isocentric C-arm included pose-dependent C-arm calibration; distortion correction; pose estimation of C-arm images; seed reconstruction; and C-arm to TRUS registration. In precision-machined hard phantoms with 40–100 seeds, we correctly reconstructed 99.8% of the seeds with a mean 3D accuracy of 0.68 mm. In soft tissue phantoms with 45–87 seeds and clinically realistic 15° C-arm motion, we correctly reconstructed 100% of the seeds with an accuracy of 1.3 mm. The reconstructed 3D seed positions were then registered to the prostate segmented from TRUS. In a Phase-1 clinical trial, so far on 4 patients with 66–84 seeds, we achieved intra-operative monitoring of seed distribution and dosimetry. We optimized the 100% prescribed iso-dose contour by inserting an average of 3.75 additional seeds, making intra-operative dosimetry possible on a typical C-arm, at negligible additional cost to the existing clinical installation.
1 Introduction
With an approximate annual incidence of 220,000 new cases and 33,000 deaths (United States), prostate cancer continues to be the most common cancer in men. Transrectal Ultrasound (TRUS) guided permanent low-dose-rate brachytherapy (insertion of radioactive seeds into the prostate) has emerged as a common & effective treatment modality for early stage, low risk prostate cancer, with an expected 50,000 surgeries every year. The success of brachytherapy (i.e., maximizing its efficacy while minimizing its co-morbidity) chiefly depends on our ability to tailor the therapeutic dose to the patient's individual anatomy. The main limitation in contemporary brachytherapy is intra-operative tissue expansion (edema), causing incorrect seed placement, which may potentially lead to insufficient dose to the cancer and/or excessive radiation to the rectum, urethra, or bladder. The former might permit the cancer to relapse, while the latter
Supported by DoD PC050170, DoD PC050042 and NIH 2R44CA099374.
causes adverse side effects like rectal ulceration. According to a comprehensive review by the American Brachytherapy Society [1], the pre-planned technique used for permanent prostate brachytherapy has limitations that may be overcome by intra-operative planning. Prostate brachytherapy is almost exclusively performed under TRUS guidance. Various researchers have tried to segment the seeds from TRUS images by linking seeds with spacers, using X-rays to initialize segmentation, using vibro-acoustography or transurethral ultrasound as a new imaging modality, or segmenting them directly in TRUS images by using corrugated seeds that are better visible than conventional ones [2]. But even when meticulously hand-segmented, up to 25% of the seeds may remain hidden in ultrasound. C-arms are also ubiquitous, though used only for gross visual assessment of the implanted seed positions (approximately 60% of practitioners use them in the operating room [3]). In spite of the significant efforts that have been made towards computational fluoroscopic guidance in general surgery [4], C-arms cannot yet be used for intra-operative brachytherapy guidance due to a plethora of technical limitations. While several groups have published protocols and clinical outcomes favorably supporting C-arm fluoroscopy for intra-operative dosimetric analysis [5,6,7], this technique is yet to become a standard of care across hospitals. In this paper we report a system to reconstruct 3D seed positions (visible in X-ray) and spatially register them to the prostate (visible in TRUS). Our primary contribution is the ability to use any typical non-isocentric, uncalibrated C-arm present in most hospitals, in contrast to the use of calibrated isocentric machines [5,6] or an approximate reconstruction [7], as reported in the literature.
2 Methods and Materials
The system is designed to easily integrate with commercial brachytherapy installations. We employ a regular clinical brachytherapy setup, without alteration, including a treatment planning workstation & stabilizer/stepper (Interplant, CMS, St. Louis), TRUS (B&K Medical Pro Focus) and a C-arm (GE OEC
Fig. 1. Overview of the proposed solution. The FTRAC fiducial tracks the C-arm, and also registers TRUS to the C-arm images, making quantitative brachytherapy possible.
98/9900). The C-arm is interfaced with a laptop through an NTSC video line and frame grabber, making the image capture independent of the C-arm model.

Workflow: The clinical workflow (Fig. 1) is identical to the standard procedure until the clinician decides to run a reconstruction and optimization. A set of C-arm images is collected with a separation as wide as clinically possible (10–15° around the AP-axis) and synchronously transferred to the laptop. After processing the images, the seeds are reconstructed and exported to the Interplant system. The physician uses standard Interplant tools to analyze, optimize and modify the remainder of the plan. The procedure concludes when the exit dosimetry shows no cold spots (under-radiated locations).

Numerous technical obstacles have to be overcome to realize C-arm based intra-operative dosimetry: (a) C-arm calibration; (b) image distortion correction; (c) pose estimation of C-arm images; (d) seed reconstruction; (e) registration of C-arm to TRUS; (f) dosimetry analysis; and finally (g) implant optimization. We have developed a system that overcomes these limitations in providing quantitative intra-operative dosimetry. In what follows, we will describe briefly each component of the system, skipping the mathematical framework for lack of space.

C-arm Source Calibration and Image Distortion: Since both C-arm calibration and distortion are pose-dependent, contemporary fluoroscopy calibrates and distortion-corrects at each imaging pose using a cumbersome calibration fixture, which is a significant liability. Our approach is a complete departure. Using a mathematical & experimental framework, we demonstrated that calibration is not critical for prostate seed reconstruction; just an approximate pre-operative calibration suffices [8]. The central intuition is that object reconstruction using a mis-calibrated C-arm changes only the absolute positions of the objects, but not their relative ones (Fig. 2). Additionally, statistical analysis of the distortion in a 15° limited workspace around the AP-axis revealed that just a pre-operative correction can reduce the average distortion from 3.31 mm to 0.51 mm, sufficient for accurate 3D reconstruction. The numbers are expected to be similar for other C-arms too.

Fig. 2. Mis-calibration conserves the relative reconstruction between objects A and B (e.g., seeds)
Pose Estimation: The most critical component for 3D reconstruction is C-arm pose estimation. C-arms available in most hospitals do not have encoded rotational joints, making the amount of C-arm motion unavailable. C-arm tracking using auxiliary trackers is expensive, is inaccurate in the presence of metal (EM tracking), or intrudes into the operating room (optical tracking). There has been some work on fiducial-based tracking, wherein a fiducial (usually large, for accuracy) is introduced into the X-ray FOV and its projection in the image encodes the
6-DOF pose of the C-arm. We proposed a new fluoroscope tracking (FTRAC) fiducial design [9] that uses an ellipse (a key contribution), allowing for a small (3×3×5 cm) yet accurate fiducial. In particular, the small size makes it easier to keep the fiducial in the FOV at all times & makes it robust to image distortion. Extensive phantom experiments indicated a mean tracking accuracy on distorted C-arms of 0.56 mm in translation and 0.25° in rotation, an accuracy comparable to expensive external trackers.

Seed Segmentation: We developed an automated seed segmentation algorithm that employs the morphological top-hat transform to perform the basic seed segmentation, followed by thresholding, region labeling, and finally a two-phase classification to segment both single seeds & clusters. The result of the segmentation is verified on the screen to allow for a manual bypass by the surgeon.
Fig. 3. The FTRAC fiducial mounted over the seed-insertion needle template using a mechanical connector, and an X-ray image of the same.
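The segmentation pipeline described above lends itself to a compact implementation. Below is a minimal Python/OpenCV sketch of its first stages (top-hat filtering, thresholding, region labeling); the kernel size, thresholds and area filter are illustrative values of our own, not those of the deployed system, and the two-phase single-seed/cluster classifier is omitted.

```python
import cv2
import numpy as np

def segment_seed_candidates(xray, kernel_size=15, min_area=50):
    """Candidate seed extraction: morphological top-hat to suppress the
    background, Otsu thresholding, then connected-component labeling."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                       (kernel_size, kernel_size))
    # seeds appear dark in X-ray, so the black-hat variant highlights them
    # (an assumption; the polarity depends on the image convention)
    filtered = cv2.morphologyEx(xray, cv2.MORPH_BLACKHAT, kernel)
    _, binary = cv2.threshold(filtered, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    n, _, stats, centroids = cv2.connectedComponentsWithStats(binary)
    # keep components large enough to be a seed or a cluster of seeds
    return [tuple(centroids[i]) for i in range(1, n)
            if stats[i, cv2.CC_STAT_AREA] >= min_area]

# usage: candidates = segment_seed_candidates(cv2.imread("frame.png", 0))
```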
Seed Correspondence & Reconstruction: The 3D coordinates of the implanted seeds can now be triangulated by resolving the correspondence of seeds across the multiple X-ray images. We formalized seed correspondence as a network-flow-based combinatorial optimization, wherein the desired solution is the flow with minimum cost. Using this abstraction, we proposed an algorithm (MARSHAL [10]) that runs in cubic time using any number of images. In comparison, previous solutions have predominantly been heuristic explorations of the large search space (10^300). In addition, the framework robustly resolves all the seeds that are hidden in the images (typically 4–7%, due to the high seed density). MARSHAL typically reconstructs 99.8% of the seeds and runs in under 5 s in MATLAB (a 95% minimum detection rate is usually deemed sufficient [11]).

Registration of C-arm to TRUS: The FTRAC is attached to the needle-insertion template by a precisely machined mechanical connector (Fig. 4) in a known relative way (pre-calibration). The template has already been calibrated to TRUS as per the usual clinical protocol. Thus, a simple application of the various known frame transformations registers the 3D seeds (FTRAC) to the prostate (TRUS).
Fig. 4. FTRAC & template pre-calibration using a rigid mount
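As a rough illustration of the correspondence step above (not MARSHAL itself): for the special case of two images, the minimum-cost network flow reduces to a bipartite assignment, which the sketch below solves with SciPy, using the reprojection error of a tentative triangulation as the matching cost. The 3×4 projection matrices P1, P2 would come from the FTRAC pose estimates; hidden-seed handling and the n-image flow formulation are omitted.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two projections."""
    A = np.array([x1[0] * P1[2] - P1[0], x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0], x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]

def reproj_error(P, X, x):
    p = P @ np.append(X, 1.0)
    return np.linalg.norm(p[:2] / p[2] - np.asarray(x))

def match_seeds(P1, P2, seeds1, seeds2):
    """Pairing cost = total reprojection error of the tentative
    triangulation; solve the resulting bipartite assignment."""
    C = np.zeros((len(seeds1), len(seeds2)))
    for i, x1 in enumerate(seeds1):
        for j, x2 in enumerate(seeds2):
            X = triangulate(P1, P2, x1, x2)
            C[i, j] = reproj_error(P1, X, x1) + reproj_error(P2, X, x2)
    rows, cols = linear_sum_assignment(C)
    return [(i, j, triangulate(P1, P2, seeds1[i], seeds2[j]))
            for i, j in zip(rows, cols)]
```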
System Implementation, Dosimetry Analysis and Implant Optimization: We have integrated all of the above functions into a MATLAB program with a GUI. The package runs on a laptop that sends the reconstructed seed positions (in template coordinates) to the Interplant system. In order not to require a new FDA approval, we maintain the integrity of the FDA-approved Interplant system by not modifying the commercial software, instead using a text file to export the 3D seed locations.
The physician uses standard Interplant tools (isodose coverage, etc.) for dose analysis, and if needed, modifies the residual plan to avoid hot spots or fill in cold spots. This process can be repeated multiple times during the surgery.
3 Phantom Experiments and Results
We have extensively tested the system and its components in various phantoms and in an ongoing Phase-1 clinical trial. To do so, we introduce the terms absolute and relative reconstruction error. Using the X-ray images, the seeds are reconstructed with respect to (w.r.t.) the FTRAC frame. In experiments where the ground truth location of the seeds w.r.t. the FTRAC is known, the comparative analysis is called absolute accuracy. Sometimes (e.g., in patients), the true seed locations w.r.t. the FTRAC are not available and the reconstruction can only be compared to the seeds extracted from post-op data (using a rigid point-cloud transform), in which case the evaluation is called relative accuracy.

Solid Seed Phantom: An acetol (Delrin) phantom consisting of ten slabs (5 mm each) was fabricated (Fig. 5(a)). This phantom provides a multitude of implants with sub-mm ground truth accuracy. The fiducial was rigidly attached to the phantom in a known way, establishing the accurate ground truth 3D location of each seed. Realistic prostate implants (1.56 seeds/cc, 40–100 seeds) were imaged within a 30° cone around the AP-axis. The true correspondence was manually established using the 3D locations known from the precise fabrication. Averaged results indicate that we correctly match 98.5% & 99.8% of the seeds using 3 & 4 images (100 & 75 total trials), respectively. The mean 3D absolute reconstruction accuracy was 0.65 mm (STD 0.27 mm), while the relative accuracy was 0.35 mm. Furthermore, using 4 images yielded only one poorly mis-matched seed from the 75 datasets, suggesting the use of 4 images for better clinical guidance.

Fig. 5. (a) An image of the solid seed phantom attached to the fiducial, with a typical X-ray image of the combination. (b) An annotated image of the experimental setup for the training phantom experiment. (c) The clinical setup from the Phase-I clinical trial.

Soft Training Phantoms: We fully seeded three standard prostate brachytherapy phantoms (Fig. 5(b)) with realistic implant plans (45, 49, 87 seeds). Seed
locations reconstructed from fluoro using realistic (maximum available clinically) image separation (about 15°) were compared to their corresponding ground truth locations segmented manually in CT (1 mm slice thickness). Additionally, the 45- & 87-seed phantoms were rigidly attached to the FTRAC, providing the absolute ground truth (from CT). The 49-seed phantom was used to conduct a full-scale practice surgery, in which case the 3D reconstruction could be compared only to the seed cloud from post-op CT (without FTRAC), providing just relative accuracy. Note that our reconstruction accuracy (as evident from the previous experiments) is better than the CT resolution. The absolute reconstruction errors for the 45- and 87-seed phantoms were 1.64 mm & 0.95 mm (STD 0.17 mm), while the relative reconstruction errors for the 45-, 49- and 87-seed phantoms were 0.22 mm, 0.29 mm and 0.20 mm (STD 0.13 mm). A mean translation shift of 1.32 mm was observed in the 3D reconstructions, predominantly due to the limited C-arm workspace (solid-phantom experiments with 30° motion have 0.65 mm accuracy). It was observed that the shift was mostly random & not in any particular direction. Nevertheless, the accuracy is sufficient for brachytherapy, especially since a small shift still detects the cold spots.

Patients: A total of 11 batches of reconstructions were carried out on 4 patients, with 2–3 batches/patient & 22–84 seeds/batch. Since the seeds migrate by the time a post-operative CT is taken, there is no easy ground truth for real patients. Hence, for each reconstruction, 5–6 additional X-ray images were taken. The reconstructed 3D seed locations were projected on these additional images and compared to their segmented corresponding 2D locations. The results from 55 projections gave a 2D mean error of 1.59 mm (STD 0.33 mm, max 2.44 mm), indicating sub-mm 3D accuracy (errors get magnified when projected).

Registration Accuracy: To measure the accuracy of the fiducial-to-template registration, three batches of five needles each were inserted randomly at random depths into the template. Their reconstructed tip locations were then compared to their true measured locations (both in template coordinates). The limited-angle image-capture protocol was kept similar to that used in the clinic. The average absolute error (reconstruction together with registration) was 1.03 mm (STD 0.20 mm), while the average relative error was 0.36 mm (STD 0.31 mm), with an average translation shift of 0.97 mm.

System Accuracy: To measure the full system error, 5 needles (tips) were inserted into a prostate brachytherapy training phantom, reconstructed in 3D and exported to the Interplant software. Manual segmentation of the needles in TRUS images (sagittal for depth and transverse for X-Y) provided the ground truth. The mean absolute error for the 5 needle tips was 4 mm (STD 0.53 mm), with a translation shift of 3.94 mm. In comparison, the relative error for the complete system was only 0.83 mm (STD 0.18 mm). The shift can mainly be attributed to a bias in the Template-TRUS pre-calibration (∼3 mm), done as part of current clinical practice, & in the 3D reconstruction (∼1 mm). Nevertheless, we
removed this shift in the clinical cases by applying a translation offset to the reconstructed X-ray seed coordinates. This offset was intra-operatively estimated by comparing the centroid of the reconstructed seeds with that of the planned seed locations, and by aligning the two together. Note that the centroid is a first-order statistic and robust to any spatially symmetric noise/displacement model. Though a heuristic, it provided excellent qualitative results according to the surgeon, who read the visual cues at the reconstructed seed locations in the TRUS images. Based on the experiments so far and the surgeon's feedback, the overall accuracy of the system is expected to be 1–2 mm during clinical use.
Fig. 6. (a) The system is able to detect cold spots. The 100% iso-dose contours (pink) as assumed by the planning system (top) and as computed by the proposed system (bottom), discovering 2 cold spots. Red marks the prostate boundary. The green squares delineate the seed coordinates, detecting 4 seeds that had drifted out of slice. (b) The system can visualize intra-operative edema (mean 4.6 mm, STD 2.4 mm, max 12.3 mm). The 'planned' (red) versus the 'reconstructed' (blue) seed positions as seen in the template view. A trend of outward radiation from the initial locations is observed.
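The relative errors quoted throughout this section are computed after the rigid point-cloud transform between the reconstructed and ground-truth seed clouds. A minimal sketch of the standard SVD-based (Kabsch) alignment, assuming the two clouds are given in corresponding order (the function names are ours):

```python
import numpy as np

def rigid_align(A, B):
    """Least-squares rigid transform (R, t) mapping point set A onto B.
    A, B: (N, 3) arrays with row-wise correspondence."""
    ca, cb = A.mean(0), B.mean(0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # avoid a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cb - R @ ca
    return R, t

def relative_error(reconstructed, ground_truth):
    """Mean residual distance after optimal rigid alignment."""
    R, t = rigid_align(reconstructed, ground_truth)
    aligned = reconstructed @ R.T + t
    return np.linalg.norm(aligned - ground_truth, axis=1).mean()
```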
Phase-I Clinical Trial: We have treated 4 patients so far (Fig. 5(c)), out of a total of 6 that will be enrolled. Intra-operative dosimetry was performed halfway through the surgery, at the end, and after additional seed insertions. The current protocol adds 15 minutes of OR time for each reconstruction, including the capture of extra images (validation), reconstruction, and dosimetry optimization. In regular clinical practice, we anticipate the need for only a single exit-dosimetry reconstruction, increasing the OR time by about 10 minutes. In all the patients the final dosimetry detected cold spots (Fig. 6(a)). The clinician quickly grew to trust the system in detecting cold spots, and instead concentrated on minimizing potential hot spots during the surgery. All patients were released from the operating room with satisfactory outcomes. Intra-operative visualization of edema (prostate swelling) was also possible (Fig. 6(b)); it was found to be 0.73, 4.64, 4.59, 4.05 mm (STD 1.1, 2.2, 2.34, 2.37 mm). The seeds (and hence the prostate) showed a clear tendency for outward migration from their drop positions (with maximums up to 15 mm). Edema is the single largest factor that makes the perfect delivery
of the pre-planned dose nearly impossible. In almost all the patients, towards the end of the surgery, it was found that the apex of the prostate (the surgeon's end) was under-dosed. The medical team found the intra-operative visualization of under-dosed regions valuable, inserting an additional 1, 2, 3, 9 seeds to make the 100% prescribed iso-dose contour cover the prostate. A further comparison of the exit implant to Day-0 CTs (2 mm slices) showed mean errors of 5.43, 6.16, 3.13, 5.15 mm (STD 2.46, 2.96, 2.02, 2.71 mm), indicating further post-operative seed migration. Though post-operative seed migration is an inherent limitation in brachytherapy, surgeons usually accommodate for it by slightly over-dosing the patient (note that sub-mm seed placement is non-critical). A study with 40 patients is currently being planned to make a statistically relevant evaluation of the medical benefit of the system using clinical indicators.
4 Conclusion, Shortcomings and Future Work
A system for brachytherapy seed reconstruction has been presented, with extensive phantom and clinical trials. The absolute seed reconstruction accuracy from phantom trials is 1.3 mm using 15° C-arm motion, sufficient for the detection of any cold spots. The system shows usefulness and great potential in the limited Phase-1 patient trials. The system (a) requires no significant hardware; (b) does not alter the current clinical workflow; (c) can be used with any C-arm; & (d) integrates easily with any pre-existing brachytherapy installation, making it economically sustainable and scalable. There is some added radiation to the patient, though insignificant when compared to that from the seeds. Though not critical, the primary shortcomings include (a) 15 minutes of additional OR time; (b) supervision during segmentation; & (c) a small translation bias. Furthermore, a TRUS-based quantitative methodology is necessary to evaluate both the final system performance and clinical outcomes. Research is currently underway to remove these limitations, and to conduct a more detailed study using clinical indicators.
References

1. Nag, et al.: Intraoperative planning and evaluation of permanent prostate brachytherapy: Report of the American Brachytherapy Society. IJROBP 51 (2001)
2. Tornes, A., Eriksen, M.: A new brachytherapy seed design for improved ultrasound visualization. In: IEEE Symposium on Ultrasonics, pp. 1278–1283 (2003)
3. Prestidge, et al.: A survey of current clinical practice of permanent prostate brachytherapy in the United States. IJROBP 40(2), 461–465 (1998)
4. Hofstetter, et al.: Fluoroscopy as an imaging means for computer-assisted surgical navigation. CAS 4(2), 65–76 (1999)
5. Reed, et al.: Intraoperative fluoroscopic dose assessment in prostate brachytherapy patients. Int. J. Radiat. Oncol. Biol. Phys. 63, 301–307 (2005)
6. Todor, et al.: Intraoperative dynamic dosimetry for prostate implants. Phys. Med. Biol. 48(9), 1153–1171 (2003)
7. French, et al.: Computing intraoperative dosimetry for prostate brachytherapy using TRUS and fluoroscopy. Acad. Rad. 12, 1262–1272 (2005)
8. Jain, et al.: C-arm calibration - is it really necessary? In: SPIE Medical Imaging: Visualization, Image-Guided Procedures, and Display (2007)
9. Jain, et al.: A robust fluoroscope tracking fiducial. Med. Phys. 32, 3185–3198 (2005)
10. Kon, R., Jain, A., Fichtinger, G.: Hidden seed reconstruction from C-arm images in brachytherapy. In: IEEE ISBI, pp. 526–529 (2006)
11. Su, et al.: Examination of dosimetry accuracy as a function of seed detection rate in permanent prostate brachytherapy. Med. Phys. 32, 3049–3056 (2005)
A Multi-view Opto-Xray Imaging System: Development and First Application in Trauma Surgery

Joerg Traub¹, Tim Hauke Heibel¹, Philipp Dressel¹, Sandro Michael Heining², Rainer Graumann³, and Nassir Navab¹

¹ Computer Aided Medical Procedures (CAMP), TUM, Munich, Germany
² Trauma Surgery Department, Klinikum Innenstadt, LMU Munich, Germany
³ Siemens SP, Siemens Medical, Erlangen, Germany
Abstract. The success of minimally invasive trauma and orthopedic surgery procedures has resulted in an increase in the use of fluoroscopic imaging. A system aiming to reduce the amount of radiation has been introduced by Navab et al. [1]. It uses an optical imaging system rigidly attached to the gantry such that the optical and X-ray imaging geometry is identical. As an extension to their solution, we developed a multi-view system which offers 3D navigation during trauma surgery and orthopedic procedures. We use an additional video camera in an orthogonal arrangement to the first video camera and a minimum of two X-ray images. Furthermore, tools such as a surgical drill are extended by optical markers and tracked with the same optical cameras. Exploiting the fact that the cross-ratio is invariant in projective geometry, we can estimate the tip of the instrument in the X-ray image without external tracking systems. This paper thus introduces the first multi-view Opto-Xray system for computer aided surgery. First tests have proven the accuracy of the calibration and the instrument tracking. Phantom and cadaver experiments were conducted for pedicle screw placement in spinal surgery. Using a postoperative CT, we evaluate the quality of the placement of the pedicle screws in 3D.
1 Introduction
Mobile C-arm systems are established in everyday routines in orthopedic and trauma surgery. The trend toward minimally invasive applications increases the use of fluoroscopic images within surgery and thus the radiation dose [2,3]. Nowadays the combined use of mobile C-arms capable of 3D reconstruction and a tracking system provides navigation information during surgery, e.g., [4]. Systems using this technique employ so-called registration-free navigation methods, based on a mobile C-arm with 3D reconstruction capabilities tracked by an external optical tracking system. The imaging device is tracked and the volume is reconstructed in the same reference frame in which the instruments and the patient are tracked. Hayashibe et al. [5] combined the registration-free navigation approach, using an intra-operatively tracked C-arm with reconstruction capabilities, with in-situ visualization by volume-rendered views from any arbitrary position of a swivel-arm-mounted monitor.
Augmenting interventional imaging data using mirror constructions was proposed by Stetten et al. [6] for tomographic reflection in ultrasound, and by Fichtinger et al. for navigated needle insertion based on CT [7] and MR [8]. Another approach for the augmentation of intra-operative image data is the physical attachment of an optical camera to an X-ray source, as proposed by Navab et al. [1]. It uses a single optical camera rigidly attached to the gantry such that the optical and X-ray imaging geometry is aligned. This enabled a real-time video and X-ray image overlay that was registered by construction. No registration of the patient was required in their approach. This provided accurate positioning and guidance of instruments in 2D. However, no depth control was possible. Thus their system was limited to applications where depth did not matter, like the intramedullary-nail locking proposed by their group [9]. As an extension to their system, we developed a system that is also capable of depth control during trauma surgery and orthopedic procedures, using only one additional X-ray image and a second video camera rigidly attached to the C-arm. Furthermore, we implemented a system to track an instrument in 2D. Using the cross-ratio, we estimate the position of the tip in the image. After a one-time calibration of the newly attached second video camera, we are able to show the instrument tip in the orthogonal X-ray view. The feasibility of the system has been validated through cadaver studies, where we successfully identified all six pedicle screws placed using the procedure. The accuracy of the placement was validated using a postoperative CT.
2 System Setup

2.1 System Components

The system consists of an Iso3D C-arm (Siemens Medical, Erlangen, Germany) with two attached Flea video color cameras (Point Grey Research Inc., Vancouver, BC, Canada) (see figure 1). The first camera is attached as proposed earlier by Navab et al., using a double mirror construction with X-ray transparent mirrors [1]. The second camera is attached orthogonal to the gantry such that its view is aligned with the X-ray image after a 90 degree orbital rotation of the C-arm (see figure 1). Furthermore, the system includes a standard PC with a framegrabber card to access the analog images of the C-arm. A custom developed visualization and navigation software is used (see section 2.3).

2.2 System Calibration
For both cameras the calibration process can be divided into two consecutive steps. In the first step the cameras are physically attached such that the optical center and axis virtually coincide with those of the X-ray imaging system: at all times for the gantry-mounted camera, and at particular C-arm positions for the orthogonally mounted camera. The second step is to compute the homographies that align the video images with the X-ray images. For both video cameras the distortion is computed using the Matlab camera calibration toolbox and the images
are undistorted using the Intel OpenCV library. The use of flat panel displays or standard distortion correction methods is recommended.¹

Fig. 1. The C-arm with two attached optical cameras. The first camera is attached to the gantry with a double mirror construction. The second camera is attached in an orthogonal direction with a single mirror construction.

Notation. Since we have images at different positions of the C-arm, superscript 0 denotes cameras and images at a 0 degree orbital rotation, and superscript 90 denotes cameras and images acquired by the C-arm after an orbital rotation of 90 degrees. Furthermore, subscript x is used for X-ray, g for the gantry-mounted camera, and o for the orthogonally mounted camera.

X-ray to Gantry Mounted Camera Calibration. Initially the gantry-mounted camera is physically placed such that its optical center and axis are aligned with those of the X-ray source. This alignment is achieved with a bi-planar calibration phantom and by mounting the camera using a double mirror construction. To superimpose the X-ray image onto the video image, a homography $H_{I_g^0 \leftarrow I_x^0}$ is computed. Thanks to this homography, the effects of X-ray distortion close to the image center are diminished. The procedure for the gantry-mounted camera calibration is described in detail by Navab et al. [1].

X-ray to Orthogonal Mounted Camera Calibration. We constrained the attachment of the second camera to be orthogonal with respect to the gantry. This arrangement provides the best results for depth navigation, assuming the instruments are always used down the beam, as in the single-camera navigation system [1]. The physical attachment and calibration of the second camera at alternative positions is also possible with the procedure described in this section.
¹ See http://campar.in.tum.de/Events/VideoCamCSquare for a video demonstration of the calibration and navigation procedure.
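For the distortion correction step mentioned above, the following is a minimal sketch of how a pre-computed camera model can be applied to each video frame with OpenCV; the intrinsic and distortion values are illustrative placeholders of our own, not calibration results from this system (the paper obtains them with the Matlab toolbox).

```python
import cv2
import numpy as np

# intrinsics K and distortion coefficients from an offline calibration
# (illustrative placeholder values)
K = np.array([[1.2e3, 0.0, 320.0],
              [0.0, 1.2e3, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.28, 0.12, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3

def undistort_frame(frame):
    """Undistort one video frame with the pre-computed camera model."""
    return cv2.undistort(frame, K, dist)
```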
To acquire an X-ray image $I_x^{90}$ corresponding to the view $I_o^0$ of the orthogonally mounted camera, we have to ensure that after an orbital rotation the optical center and axis of the X-ray gantry and the orthogonal camera are aligned. Since the gantry-mounted camera is already physically aligned with the X-ray device, the problem can be reduced to physically aligning the gantry-mounted and orthogonally mounted cameras after rotation. This alignment is achieved with a bi-planar calibration pattern. A set of markers is placed on each plane such that subsets of two markers, one on each plane, are aligned in the image of the gantry-mounted camera $I_g^0$ at the initial position of the C-arm (see figure 2(e)). In the next step, the C-arm is rotated by −90 degrees in the orbital direction (see figure 2(c)). Now the orthogonally mounted camera is physically adjusted in six degrees of freedom until all marker tuples from the calibration pattern are lined up in image $I_o^{-90}$ in exactly the same way as they were for the gantry-mounted camera (see figure 2(f)). Note that this calibration only has to be performed once, when the system is constructed; once the system is built, the alignment is preserved by construction. For a proper alignment of the X-ray image $I_x^{90}$ at 90 degree orbital rotation with the image $I_o^0$ of the orthogonal camera at no rotation, two homographies remain to be computed: a first homography $H_{I_g^{90} \leftarrow I_x^{90}}$ that maps the X-ray image $I_x^{90}$ to the gantry-mounted camera image $I_g^{90}$, and a second homography $H_{I_o^0 \leftarrow I_g^{90}}$ mapping the image of the gantry-mounted camera to the image of the orthogonal camera. The final mapping of the X-ray image $I_x^{90}$ onto the orthogonal camera image $I_o^0$ uses the homography $H_{I_o^0 \leftarrow I_x^{90}} = H_{I_o^0 \leftarrow I_g^{90}} \cdot H_{I_g^{90} \leftarrow I_x^{90}}$, the composition of the two homographies computed earlier. Both homographies are computed using corresponding points in the images. Even though the gantry camera is rigidly mounted, a second estimation of the homography $H_{I_g^{90} \leftarrow I_x^{90}}$ is determined to approximately compensate distortion effects of the X-ray image after rotation.
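A brief sketch of how the two homographies and their composition could be estimated and applied with OpenCV; the marker correspondences are assumed to be given as pixel coordinates, and the function names are ours rather than the system's.

```python
import cv2
import numpy as np

def estimate_homography(src_pts, dst_pts):
    """Homography mapping src to dst from >= 4 corresponding markers."""
    H, _ = cv2.findHomography(np.asarray(src_pts, np.float32),
                              np.asarray(dst_pts, np.float32), cv2.RANSAC)
    return H

def overlay_xray(xray_90, cam_o0, H_g90_from_x90, H_o0_from_g90):
    """Warp the 90-degree X-ray into the orthogonal camera view using the
    composed homography H(o0 <- x90) = H(o0 <- g90) . H(g90 <- x90)."""
    H_o0_from_x90 = H_o0_from_g90 @ H_g90_from_x90
    h, w = cam_o0.shape[:2]
    warped = cv2.warpPerspective(xray_90, H_o0_from_x90, (w, h))
    return cv2.addWeighted(cam_o0, 0.5, warped, 0.5, 0.0)
```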
2.3 Navigation System
The navigation does not require any further calibration or registration procedure. The calibration routine described previously has to be performed only once, while the system is built, and it remains valid as long as the cameras do not move with respect to the gantry. For navigation, the acquisition of two X-ray images $I_x^0$ and $I_x^{90}$ with a precise orbital rotation, such that the acquired images correspond to the images of the gantry-attached camera $I_g^0$ and the orthogonal camera $I_o^0$, has to be ensured. Therefore an image $I_o^0$ of the second camera is captured before the rotation. Using an image overlay of this captured image $I_o^0$ and a live video image $I_g^{0\rightarrow90}$ during the orbital rotation, with the homography $H_{I_o^0 \leftarrow I_g^{90}}$ applied, we ensure that the first camera reaches the same position and orientation that the second camera had before the rotation. Thus the X-ray image $I_x^{90}$ we take from this position corresponds to the captured image $I_o^0$. Furthermore, after precise rotation of the C-arm back to its original position, the orthogonally taken X-ray image $I_x^{90}$ can be overlaid on the live image $I_o^0$ of the orthogonal camera
by applying the computed homography $H_{I_o^0 \leftarrow I_x^{90}}$. The rotation back is ensured using combined X-ray and optical markers attached to the side of the surgical object, visible both in the X-ray image $I_x^{90}$ and in the image $I_o^0$ of the orthogonal camera. The acquisition of a second X-ray image $I_x^0$ at position zero and the use of the homography $H_{I_g^0 \leftarrow I_x^0}$ enables lateral control using the gantry-mounted camera (see figure 3(a)). The image $I_o^0$ of the orthogonal camera is used by an instrument tracking module (see section 2.4). The estimated distal end of the instrument in the orthogonal image is superimposed on the X-ray image $I_x^{90}$ taken at 90 degree rotation (see figure 3(c)).
2.4 Instrument Tracking
The surgical tool is extended by three markers arranged collinearly on the instrument axis. We use retro-reflective circular markers that are illuminated by an additional light source attached to the orthogonal camera. This setup results in the markers being seen by the orthogonal camera as bright ellipses, which can easily be detected by image thresholding. From the binary image all contours are extracted using the Intel OpenCV library. In a post-processing step we filter out those contours having a low compactness value and those having an area smaller than a threshold (default values are 0.6 for the compactness and 50 pixels for the area). For all contours retained by the filtering routine, the sub-pixel centroids are computed based on grayscale image moments. Finally, the three contours yielding an optimal least-squares line fit are assumed to be the ones corresponding to our circular markers. Having three collinear markers detected in the 2D image plane, we are able to compute the position of the instrument tip. Given the 3D geometry of our instrument, i.e., the position of the distal end of the instrument with respect to the other three markers, we compute the tip of the instrument in the image based on the cross-ratio:

$$
\mathrm{cross} = \frac{d_{12}\, d_{23}}{d_{13}\, d_{24}} = \frac{\Delta x_{12}\, \Delta x_{23}}{\Delta x_{13}\, \Delta x_{24}} = \frac{\Delta y_{12}\, \Delta y_{23}}{\Delta y_{13}\, \Delta y_{24}} \qquad (1)
$$
Here $d_{ij}$ are the distances between markers i and j, or between a marker and the tool tip. Investigating the distances in the x- and y-directions separately gives us $\Delta x_{ij}$ and $\Delta y_{ij}$, where $\Delta x_{24} = |x_2 - x_4|$ and $\Delta y_{24} = |y_2 - y_4|$ contain the unknown coordinates $x_4$ and $y_4$ of the instrument tip. Since the X-ray image $I_x^{90}$ is registered with the live video image $I_o^0$ of the second camera by $H_{I_o^0 \leftarrow I_g^{90}}$, we know exactly the position of the tip in the X-ray image $I_x^{90}$ taken at 90 degree rotation (see figure 3(c)).
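A small sketch of the tip computation implied by Eq. (1), assuming the pairing of distances as reconstructed above and markers ordered 1, 2, 3 from the proximal end with the tip as point 4; the per-axis differences are handled here by projecting onto the marker line, which preserves the same ratios for collinear points, and the helper names are ours.

```python
import numpy as np

def cross_from_3d(d12, d23, d13, d24):
    """Cross value of Eq. (1) from the known 3D marker/tip distances (mm)."""
    return (d12 * d23) / (d13 * d24)

def tip_from_markers(p1, p2, p3, cross):
    """2D tip position from three detected collinear marker centroids.
    Solves Eq. (1) for the unknown image distance between marker 2 and
    the tip, and places the tip on the distal side of marker 3."""
    p1, p2, p3 = (np.asarray(p, float) for p in (p1, p2, p3))
    u = (p3 - p1) / np.linalg.norm(p3 - p1)   # unit tool-axis direction
    s12 = abs(np.dot(p2 - p1, u))
    s23 = abs(np.dot(p3 - p2, u))
    s13 = abs(np.dot(p3 - p1, u))
    s24 = (s12 * s23) / (cross * s13)         # Eq. (1) solved for d24
    return p2 + s24 * u                       # assumes the tip lies beyond marker 3

# hypothetical tool geometry (mm) and detected centroids (pixels):
# cross = cross_from_3d(20.0, 20.0, 40.0, 60.0)
# tip = tip_from_markers((100, 50), (140, 52), (180, 54), cross)
```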
3 Experiments and Results
First the feasibility of the system was tested on a spine phantom. We used a tracked awl, a pedicle probe and a T-handle to place pedicle screws. Using an
Fig. 2. The calibration phantom in the different imaging systems: (a) X-ray, (b) camera 1, and (c) camera 2 with misaligned markers; (d) X-ray, (e) camera 1, and (f) camera 2 with aligned markers
Fig. 3. The navigation interface including the lateral positioning of the instrument and the depth control using the cross-ratio: (a) first camera for lateral positioning, (b) second camera for depth tracking, (c) superimposition of depth tracking onto the X-ray image.
orthogonal control X-ray image, we could visually verify the accuracy of the depth navigation. In a cadaver experiment, we placed eight pedicle screws (Universal Spine System USS, Synthes, Umkirch) with a diameter of 6.2 mm in four vertebrae of the thoracic and lumbar spine (Th12–L3). The surgical procedure was carried out
in three steps, using a pedicle awl to open the cortical bone, a pedicle probe to penetrate the pedicle, and a T-handle for screw implantation. For the guided procedure, both augmented views, one for 2D positioning (see figure 4(a)) and one for depth control (see figure 4(b)), were used simultaneously. After aligning the optical axis of the C-arm imaging system with the desired direction of the pedicle screws, only two X-ray images had to be acquired for each pedicle screw. This is a considerable reduction in radiation compared to the standard fluoroscopy-based procedure. The accuracy of the pedicle screw placement was verified by a post-interventional CT scan (see figures 4(c) and 4(d)) using the clinical scale proposed by Arand et al. [10]. Five pedicle screws were classified by a medical expert as group A, i.e., central screw position without perforation. The other three screws were classified as group B, i.e., lateral screw perforation within the thread diameter. None of the eight pedicle screws showed a medial perforation toward the spinal canal.
Fig. 4. Evaluation of the developed system using a control X-ray Ix90 and post-interventional CT data: (a) tracking, (b) control X-ray, (c) sagittal CT, (d) transversal CT.
4 Discussion
We have extended a real-time video-augmented X-ray system to a multi-view Opto-Xray imaging system. The previously proposed single-camera augmentation system has proven efficient for trauma surgery and orthopedic applications in which 3D information is not required, e.g., intramedullary-nail locking. The original system was extended by a second camera mounted orthogonally to the first one. Thanks to a calibration and navigation protocol, the second camera enables applications in trauma surgery that are currently only possible with permanent fluoroscopic imaging or with a C-arm offering 3D reconstruction capabilities and an external tracking system, both of which considerably increase the radiation dose. Our newly developed system demonstrated that these procedures can be performed with only two X-ray images, under the assumption that the object does not move after X-ray acquisition. If the object moves, another pair of X-ray images simply has to be acquired. With our proposed system and calibration procedure, we are limited neither to exactly two cameras nor to a specific physical arrangement. First cadaver experiments demonstrated that the new system can be easily integrated into the clinical workflow while reducing the
radiation dose compared to other methods. The accuracy observed during the experiments is clinically acceptable. Further work will compare quantified results with CT-based and C-arm-based standard navigation techniques. The invention and implementation of a system for real-time augmentation of orthogonal X-ray views of a surgery opens the way for the development of new C-arms with integrated 3D navigation capabilities and no further need for online calibration. Acknowledgments. Special thanks to Siemens Medical SP and Benjamin Ockert.
References
1. Navab, N., Mitschke, M., Bani-Hashemi, A.: Merging visible and invisible: Two camera-augmented mobile C-arm (CAMC) applications. In: IWAR, pp. 134–141 (1999)
2. Boszczyk, B.M., Bierschneider, M., Panzer, S., Panzer, W., Harstall, R., Schmid, K., Jaksche, H.: Fluoroscopic radiation exposure of the kyphoplasty patient. European Spine Journal 15, 347–355 (2006)
3. Synowitz, M., Kiwit, J.: Surgeon's radiation exposure during percutaneous vertebroplasty. J. Neurosurg. Spine 4, 106–109 (2006)
4. Siewerdsen, J.H., Moseley, D.J., Burch, S., Bisland, S.K., Bogaards, A., Wilson, B.C., Jaffray, D.A.: Volume CT with a flat-panel detector on a mobile, isocentric C-arm: Pre-clinical investigation in guidance of minimally invasive surgery. Medical Physics 32(1), 241–254 (2005)
5. Hayashibe, M., Suzuki, N., Hattori, A., Otake, Y., Suzuki, S., Nakata, N.: Surgical navigation display system using volume rendering of intraoperatively scanned CT images. Computer Aided Surgery 11(5), 240–246 (2006)
6. Stetten, G.D., Chib, V.: Magnified real-time tomographic reflection. In: Niessen, W.J., Viergever, M.A. (eds.) MICCAI 2001. LNCS, vol. 2208, Springer, Heidelberg (2001)
7. Fichtinger, G., Deguet, A., Masamune, K., Balogh, E., Fischer, G.S., Mathieu, H., Taylor, R.H., Zinreich, S.J., Fayad, L.M.: Image overlay guidance for needle insertion in CT scanner. IEEE Transactions on Biomedical Engineering 52(8), 1415–1424 (2005)
8. Fischer, G.S., Deguet, A., Schlattman, D., Taylor, R., Fayad, L., Zinreich, S.J., Fichtinger, G.: MRI image overlay: Applications to arthrography needle insertion. In: Medicine Meets Virtual Reality (MMVR), vol. 14 (2006)
9. Heining, S.M., Wiesner, S., Euler, E., Mutschler, W., Navab, N.: Locking of intramedullary nails under video-augmented fluoroscopic control: first clinical application in a cadaver study. In: Proceedings of CAOS, Montreal, Canada (2006)
10. Arand, M., Schempf, M., Fleiter, T., Kinzl, L., Gebhard, F.: Qualitative and quantitative accuracy of CAOS in a standardized in vitro spine model. Clin. Orthop. Relat. Res. 450, 118–128 (2006)
Towards 3D Ultrasound Image Based Soft Tissue Tracking: A Transrectal Ultrasound Prostate Image Alignment System

Michael Baumann1,2, Pierre Mozer1,3, Vincent Daanen2, and Jocelyne Troccaz1

1 Université J. Fourier, Laboratoire TIMC, Grenoble, France; CNRS, UMR 5525; Institut National Polytechnique de Grenoble
2 Koelis SAS, 5 av. du Grand Sablon, 38700 La Tronche, France
3 La Pitié-Salpêtrière Hospital, Urology Department, 75651 Paris Cedex 13, France
[email protected]
Abstract. The emergence of real-time 3D ultrasound (US) makes it possible to consider image-based tracking of subcutaneous soft tissue targets for computer-guided diagnosis and therapy. We propose a 3D transrectal US based tracking system for precise prostate biopsy sample localisation. The aim is to improve sample distribution, to enable targeting of unsampled regions for repeated biopsies, and to make post-interventional quality controls possible. Since the patient is not immobilized, the prostate is mobile, and probe movements are constrained only by the rectum during biopsy acquisition, the tracking system must be able to estimate rigid transformations that are beyond the capture range of common image similarity measures. We propose a fast and robust multi-resolution attribute-vector registration approach that combines global and local optimization methods to solve this problem. Global optimization is performed on a probe movement model that reduces the dimensionality of the search space and thus renders optimization efficient. The method was tested on 237 prostate volumes acquired from 14 different patients for 3D to 3D and 3D to orthogonal 2D slices registration. The 3D-3D version of the algorithm converged correctly in 96.7% of all cases in 6.5s with an accuracy of 1.41mm (r.m.s.) and 3.84mm (max). The 3D to slices method yielded a success rate of 88.9% in 2.3s with an accuracy of 1.37mm (r.m.s.) and 4.3mm (max).
1 Introduction

Computer guidance for medical interventions on subcutaneous soft tissue targets is a challenging subject, since the target tracking problem is still not satisfactorily solved. The main difficulties are caused by the elasticity, mobility and inaccessibility of soft tissues. With 3D US, a real-time volume imaging technology became available that provides enough spatial tissue information to make image-based tracking possible. Image-based tracking is essentially a mono-modal image registration problem with a real-time constraint. The primary task is to find the physical transformation T in a transformation space 𝒯 between two images of the same object. The choice of 𝒯 depends on the underlying physical transformation (e.g. rigid, affine or elastic) and the requirements of the target application. An extensive review of registration methods is given in [1].
Nowadays, research on mono-modal 3D US registration of soft tissue images focuses on rapid deformation estimation. Most studies in this domain, however, make the implicit assumption that the rigid part of the transformation to estimate is either small or known. Confronted with combinations of large rigid transformations and elastic deformations, the proposed solutions fail without rigid pre-registration. For many clinical applications, large rigid transformations can be avoided by immobilizing both the patient and the US probe. For interventions without total anesthesia, however, this causes considerable patient discomfort. Moreover, it is sometimes impossible to fix the US probe, e.g. when the probe serves as a guide for surgical instruments. The respiratory and cardiac cycles can be additional sources of tissue displacement. In all these cases it is necessary to identify the rigid part of the transformation before carrying out image-based deformation estimation. Estimation of large rigid transformations is basically a global optimization problem, since common similarity measures exhibit search-friendly characteristics (e.g. convexity) only in a small region near the global solution. The computational burden of global optimization in a 6-D rigid transformation space is prohibitive for tracking tasks. [2, 3] propose to reduce the intra-interventional computation time of global searches by precomputing a feature-based index hash table. During the intervention, similarity evaluation is replaced by computation of the geometric index followed by a fast database lookup. In the context of US image tracking, this approach has the disadvantage of relying on feature extraction, which often lacks robustness when confronted with partial target images, speckle and US shadows. Also, it cannot reduce the complexity of the optimization problem, and the pre-computation time is not negligible. Relatively few investigations involving 3D US image based tracking of soft tissues have been reported. In the context of respiratory-gated radiation treatment, [4] acquire a localized 3D US reference image of the liver or the pancreas in breath-hold state and register it rigidly with the treatment planning CT volume. During therapy, localized US slices of the organ are continuously compared with the reference volume using image correlation to retrieve the planning position of the organ. In [5], real-time 3D US images of the beating heart are registered multimodally with a set of 4-D MR images covering the entire cardiac cycle. A localizer is used to initialize the spatial registration process, while the ECG signal serves for temporal alignment. The authors achieve precise rigid registration in an overall computation time of 1 second with a mutual information based rigid registration algorithm. In both studies, relative rigid movements between probe and target organ are limited to movements caused by the respiratory or cardiac cycles, which are predictable and repeatable to a certain extent. The target application of this work is 3D transrectal ultrasound (TRUS) prostate biopsy trajectory tracking. Today, prostate biopsies are carried out using 2D TRUS probes equipped with a guide for spring needle guns. With the current standard biopsy protocol, typically consisting of 12 regularly distributed samples, it is impossible to know the exact biopsy locations after acquisition, which makes precise biopsy-based tumor localization, quality control and targeted repeated biopsies impossible.
A TRUS-based prostate tracking system would make it possible to project all sample locations into a reference image of the prostate and thus to identify the exact sampling locations.
Image-based prostate biopsy tracking is, however, challenging: (i) the gland moves and gets deformed under the pressure of the TRUS probe. (ii) The patient is neither immobilized nor under total anesthesia; most patients move significantly during the biopsy procedure. (iii) Since the probe also serves to guide the rigidly attached needle, probe movements are large. Rotations around the principal probe axis of more than 180° and tilting of up to 40° are frequent. Also, the probe head wanders over the gland surface during needle placement, which leads to relative displacements of up to 3cm. The global search problem thus fully applies to prostate alignment: tracking a reference on a calibrated TRUS probe cannot solve the problem due to (i) and (ii), and minimizing similarity measures on biopsy images using only fast downhill optimizers is unlikely to succeed because of (iii). In this study we propose a solution to the global search problem for TRUS prostate image tracking, which consists in a search space reduction using a probe movement model. We further identify an efficient intensity-based similarity measure for TRUS prostate images and describe a fast multi-resolution optimization framework. Finally, the robustness, accuracy, precision and performance of the method are evaluated on 237 prostate volumes from 14 patients.
2 Methods

2.1 A Framework for US Image-Based Tracking

The purpose of a tracking system is to provide the transformation between an object in reference space and the same object in tracking space at a given moment. In the case of image-based tracking, the reference space is determined by the choice of a reference image to which all subsequently acquired images will be registered. In the case of 3D TRUS prostate biopsies, it is convenient to acquire a 3D US volume as reference just minutes before the intervention. Unfortunately, most currently available 3D US systems do not provide real-time access to volume data. They can, however, visualize two or three orthogonal 2D (o2D) slices inside the field of view of the probe in real time. These slices can be captured using a frame-grabber and used for registration with a previously acquired reference volume [4, 5]. Note that compared to 2D US images, o2D planes deliver considerably more spatial information, which potentially makes 3D to o2D registration more robust than 3D to 2D registration. In this work we evaluate both 3D to 3D and 3D to o2D registration for image-based tracking. Registration algorithms can be separated into two main classes: intensity-based and feature-based algorithms. As it is challenging to define robust and fast feature extraction algorithms for US images of the prostate, due to the low SNR of US images and the absence of clearly identifiable geometric features in the prostate, this study focuses on intensity-based approaches. Intensity-based measures are known for their robustness in the presence of noise and partial image overlaps [1]. Image registration can be modeled as a minimization process of an image similarity measure that depends on a transformation T. Robust and fast algorithms exist for local minimization of image similarity measures. The condition for convergence to the target transformation T̂ is that the optimizer starts from a point inside the capture range
of T̂ [6]. However, the capture range of common intensity measures (e.g. the Pearson correlation coefficient (CC) or normalized mutual information (NMI)) is relatively small compared to the transformation space that can be observed for TRUS prostate biopsies. This problem can be attacked from two sides: the first approach is to extend the capture range by improving the similarity measure, while the second consists in finding a point inside the capture range using a priori knowledge of the probe position. Several parts of the registration approach require information about the prostate location in the reference image. For our purpose it is sufficient to set an axis-aligned bounding box on the prostate boundaries in the reference image. The bounding box has to be defined by the clinician. No bounding box is needed for the tracking images.

2.2 Extending the Capture Range

Similarity Measure: We chose CC as similarity measure since it yields a larger capture range than NMI for mono-modal US registration. Compared to sums of squared distances (SSD), it is insensitive to linear intensity transformations and is capable of detecting inverse correlations. Intensity shifts can occur due to probe pressure variation, while inverse correlations can be observed when evaluating transformations far from the physical solution, in particular for gradient magnitude images.

Multi-resolution pyramid: Optimizing on coarse resolution levels of a Gaussian pyramid yields some important advantages: coarse levels are statistical aggregates of the original image which are free of high-frequency noise, in particular speckle noise. Once the optimization on the coarsest level is terminated, the solution is refined on denser levels, but from a considerably better starting point. This approach not only improves the characteristics of the similarity measure by reducing noise, but also considerably speeds up registration, as most of the optimization can be performed on low-resolution images.

Attribute-vector approach: The capture range can be extended by combining measures of different aspects of the images to be compared [7, 8]. Since there is a strong probability that the similarity measure produces a significant minimum near the correct solution for every aspect, combining the measures can amplify and widen the capture range of the solution. Also, it is less likely that noise-related local minima are produced at identical locations, which makes it possible to flatten them out in a combined measure. For this study we chose to evaluate the image intensity and its gradient magnitude (I and J are the images to be compared):

E_n^IJ(T) := (1 − CC(I, J ◦ T)) · (1 − CC(||∇I||, ||∇J ◦ T||))    (1)
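A direct sketch of Eq. (1) in NumPy is shown below. The reslicing of J under the candidate transform T is assumed to have happened already (e.g., via a separate resampling routine); only the similarity evaluation itself is illustrated.

```python
import numpy as np

def cc(a, b):
    """Pearson correlation coefficient of two equally shaped images."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def attribute_vector_energy(fixed, moving_resliced):
    """Eq. (1): product of an intensity term and a gradient-magnitude term.
    `moving_resliced` is J already resampled under the candidate T."""
    gf = np.sqrt(sum(g ** 2 for g in np.gradient(fixed.astype(float))))
    gm = np.sqrt(sum(g ** 2 for g in np.gradient(moving_resliced.astype(float))))
    return (1.0 - cc(fixed, moving_resliced)) * (1.0 - cc(gf, gm))
```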
To improve performance, and since gradient intensities are highly random on noisy high-resolution images, attribute vectors are only used on low-resolution levels of the image pyramid.

Panorama images: The pyramid-like form of the US beam and the fact that the probe also serves to guide the biopsy needle make it unavoidable that the gland is often only partially imaged. Hence at least the reference image should contain the entire prostate;
otherwise the similarity measure may yield random results when the image overlap gets too small during registration. We therefore acquire three partial prostate volumes using the following protocol: the operator first acquires one image with the prostate centered in the US beam, and then takes two additional images with rotations of 60° around the principal axis of the probe. Care is taken to avoid deformation and US shadows. The panorama image resulting from compounding these acquisitions finally serves as reference.

2.3 Finding a Point in the Capture Range

Mechanical probe movement model. To estimate large transformations between images, it is necessary to find a point inside the capture range of the similarity measure. Regular sampling of a 6-D rigid transformation space using a very sparse grid size of 10 already requires 10^6 function evaluations, which results in an unacceptable computational burden. The physical constraints exerted by the rectum on probe movements, and the fact that the probe head always remains in contact with the thin rectal wall at the prostate location, lead to the following assumptions: 1) the probe head is always in contact with the prostate membrane, 2) the most important rotations occur around the principal axis of the probe, and 3) all other rotations have a rotation point that can be approximated by a unique fixed point FP_Rect in the rectum. With these assumptions it is possible to define a probe movement model based on a prostate surface approximation, the probe position in the US image (which is known) and a rotational fixed point in the rectum. As shown in Fig. 1(a), the prostate surface is approximated by a bounding-box-aligned ellipsoid. The ellipsoid is modeled using a 2D polar parameterization PR_Surf(α, β). The origin PR_Surf(0, 0) of the parameterization corresponds to the intersection of the ellipsoid surface with the line from the prostate center C_Pro to FP_Rect. As illustrated in Fig. 1(b), PR_Surf(α, β) implements assumption 1) by determining plausible US transducer positions on the prostate surface. Assumption 3) is satisfied by requiring that the principal probe axis always pass through FP_Rect. Finally, a rotation about the principal probe axis implements assumption 2) and thus adds a third DOF (see Fig. 1(c)).
Fig. 1. Mechanical probe movement model in 2D: (a) shows the computation of the search model surface origin PR_Surf(0, 0) from the prostate center C_Pro and the (hypothetical) rectal probe fixed point FP_Rect. In (b), a 2D polar parameterization is used to determine a surface point PR_Surf(α, β). The probe is then rotated and translated such that its US origin O_US coincides with PR_Surf(α, β). In (c), the probe is rotated around its principal axis by an angle λ.
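Below is a sketch of how candidate probe poses might be enumerated from the three model DOFs (α, β, λ). The ellipsoid parameterization, the canonical probe axis, and the pose construction are illustrative assumptions made for this sketch; the antiparallel-axis degenerate case is not handled.

```python
import numpy as np
from itertools import product
from scipy.spatial.transform import Rotation as R

def ellipsoid_point(center, radii, alpha, beta):
    """Point on the axis-aligned bounding-box ellipsoid, polar angles (alpha, beta)."""
    return center + radii * np.array([np.sin(alpha) * np.cos(beta),
                                      np.sin(alpha) * np.sin(beta),
                                      np.cos(alpha)])

def pose_matrix(origin, axis, lam, probe_axis=np.array([0.0, 0.0, 1.0])):
    """4x4 pose: roll by lam about the canonical probe axis, then rotate
    that axis onto `axis` (antiparallel degenerate case ignored)."""
    v = np.cross(probe_axis, axis)
    s, c = np.linalg.norm(v), float(np.dot(probe_axis, axis))
    align = (R.from_quat([0, 0, 0, 1]) if s < 1e-12
             else R.from_rotvec(v / s * np.arctan2(s, c)))
    rot = align * R.from_rotvec(probe_axis * lam)
    T = np.eye(4)
    T[:3, :3] = rot.as_matrix()
    T[:3, 3] = origin
    return T

def candidate_probe_poses(c_pro, radii, fp_rect, alphas, betas, lambdas):
    """Assumption 1: US origin on the prostate surface; assumption 3:
    principal axis through FP_Rect; assumption 2: roll lambda about it."""
    for alpha, beta, lam in product(alphas, betas, lambdas):
        origin = ellipsoid_point(c_pro, radii, alpha, beta)
        axis = origin - fp_rect
        axis = axis / np.linalg.norm(axis)
        yield pose_matrix(origin, axis, lam)
```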
Systematic Exploration. The 3-D subspace defined by the probe movement model is systematically explored using equidistant steps. To minimize the computational burden, systematic exploration is performed on the coarsest resolution level. Since the exploration grid points do not change during an intervention, it is possible to precompute and store all reslices of the panoramic image necessary for the evaluation of the intensity measure. The rotational space around the principal axis of the probe is unconstrained (360°), while tilting ranges are limited to the maximum value determined on test data, plus a security margin. The number of steps per dimension is also experimentally determined. The five best results of the systematic exploration are stored, with the constraint that all transformations respect a minimum distance between each other. If two results are too close, only the better one is stored. Next, a local search using the Powell-Brent algorithm is performed, only on the coarsest pyramid level, for each of the five results. The best result of the five local searches is finally used as the start point for a multi-level local optimization. The last level of the final search can be chosen as a function of the desired precision and computation time. Note that compared to a single multi-level local search, five local optimizations on the coarsest level are negligible in terms of computation time.
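The search strategy can be summarized as in the following sketch. The energy callables are assumed to evaluate Eq. (1) on a given pyramid level for a parameter vector; the exact parameterization of the local searches in the original system (model DOFs versus full rigid space) is an assumption here, and this sketch simply reuses one parameter vector throughout.

```python
import numpy as np
from scipy.optimize import minimize

def global_then_local(energy_coarse, energy_pyramid, grid_params,
                      n_seeds=5, min_sep=5.0):
    """Systematic exploration on the coarsest level, then five separated
    Powell refinements, then a multi-level coarse-to-fine local search."""
    # 1. Systematic exploration of the precomputed grid.
    scored = sorted((energy_coarse(np.array(p)), tuple(p)) for p in grid_params)
    seeds = []
    for e, p in scored:                 # greedy, distance-separated selection
        if all(np.linalg.norm(np.subtract(p, q)) >= min_sep for _, q in seeds):
            seeds.append((e, p))
        if len(seeds) == n_seeds:
            break
    # 2. One Powell local search per seed, still on the coarsest level.
    refined = [minimize(energy_coarse, np.array(p), method="Powell")
               for _, p in seeds]
    best = min(refined, key=lambda r: r.fun).x
    # 3. Multi-level refinement, ordered coarse -> fine.
    for energy in energy_pyramid:
        best = minimize(energy, best, method="Powell").x
    return best
```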
3 Experiments and Results

The presented method was validated on 237 3D images of the prostate acquired during biopsies of 14 different patients. The imaging device was a GE Voluson 730 3D US system equipped with a volume-swept transrectal probe (GE RIC5-9). All images, except those used for panorama image creation, were acquired immediately after a biopsy shot. Both 3D to 3D and 3D to o2D registration were evaluated. All registrations were carried out in a post-processing step. The o2D images used in the tests were not frame-grabbed but reconstructed from 3D images. The image resolution was 200³ voxels, with voxel side lengths varying from 0.33mm to 0.47mm. A five-level resolution pyramid was used for 3D to 3D registration; for 3D to o2D only four levels were used. The final multi-level search was carried out from the coarsest to the third-finest level for 3D to 3D, and to the second-finest level for 3D to o2D registration. A total of 12960 grid points on the movement model were explored during a search run. Registration was carried out on a Pentium 4 with 3GHz. To measure reproducibility and registration success, 10 registrations were carried out for each volume pair from slightly perturbed start points, obtained by adding noise of 2mm and 2°. This yielded 10 transformations T_i that approximate the unknown rigid transformation between the prostate in both volumes. The average transformation T̄ of the T_i was computed with the method presented in [9]. The Euclidean distance error ε_E^i = ||T_i · C − T̄ · C||, with C being the image center, and the angular error ε_A^i, which corresponds to the rotation angle of T_i^{-1} · T̄, were used to compute the root mean square (r.m.s.) errors ε_E and ε_A (a sketch of this computation follows Table 1). A registration was considered successful if ε_E < 2.0mm and ε_A < 5 degrees, and if the result T̄ was visually satisfactory when superimposing both volumes in a composite image (see Fig. 2(c)). Reconstruction accuracy evaluation was more difficult to implement since there is no straightforward gold standard. In some images, the needle trajectories from previous biopsies were still visible. In these cases, the trajectories were manually segmented, and
the angular error between corresponding needle trajectories was used to evaluate rotational accuracy. Also, some patients had significant and clearly visible calcifications inside the prostate. The distances between segmented calcifications were used to determine the translational accuracy. Tab. 1 and Fig. 2 show the results of the evaluations.

Table 1. Test results (numbers in brackets indicate the number of evaluated registrations)

                                                  | 3D-3D        | 3D-o2D
Registration success                              | 96.7% (237)  | 87.7% (237)
Average computation time                          | 6.5s (237)   | 2.3s (237)
Angular precision ε_A (reproducibility, r.m.s.)   | 1.75° (229)  | 1.71° (208)
Euclidean precision ε_E (reproducibility, r.m.s.) | 0.62mm (229) | 0.47mm (208)
Needle trajectory reconstruction (r.m.s.)         | 4.72° (10)   | 4.74° (9)
Needle trajectory reconstruction (max)            | 10.04° (10)  | 10.5° (9)
Calcification reconstruction (r.m.s.)             | 1.41mm (189) | 1.37mm (181)
Calcification reconstruction (max)                | 3.84mm (189) | 4.30mm (181)
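The reproducibility measures ε_E and ε_A defined above can be computed as in the following sketch. The rotation average used here is a simple quaternion (chordal) mean, a stand-in for the averaging method of [9].

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def precision_errors(transforms, image_center):
    """r.m.s. reproducibility errors eps_E (mm) and eps_A (degrees) of
    repeated registrations T_i against their average transform T_bar."""
    rots = R.from_matrix([T[:3, :3] for T in transforms])
    T_bar = np.eye(4)
    T_bar[:3, :3] = rots.mean().as_matrix()        # chordal L2 rotation mean
    T_bar[:3, 3] = np.mean([T[:3, 3] for T in transforms], axis=0)
    c = np.append(np.asarray(image_center, float), 1.0)
    eps_e = [np.linalg.norm(T @ c - T_bar @ c) for T in transforms]
    eps_a = [R.from_matrix((np.linalg.inv(T) @ T_bar)[:3, :3]).magnitude()
             for T in transforms]                  # rotation angle, radians
    rms = lambda v: float(np.sqrt(np.mean(np.square(v))))
    return rms(eps_e), np.degrees(rms(eps_a))
```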
The overhead introduced by the systematic model-based exploration accounts for about 25% of the 3D-3D and for 35% of the 3D-o2D registration time. The five optimizations on the coarsest level account for about 10% in 3D-3D and for 20% in 3D-o2D. Panorama image pre-processing and pre-computation of the images for systematic exploration are performed before the intervention and require about one minute of computation time.
Fig. 2. Registration accuracy: (a) shows the target image, and (b) the aligned panorama image. In (c) both volumes are superimposed to illustrate registration accuracy for the urethra (arrow), and (d) illustrates the registration accuracy in the upper gland.
4 Discussion

This study presents a fast and robust rigid registration framework for TRUS prostate images in the context of unconstrained patient movements, probe movements constrained only by the anatomy, and probe-induced prostate displacements. The algorithm yields reproducible results and acceptable accuracy for both 3D-3D and 3D-o2D registration. The success rate of 3D-3D registration is very satisfactory, since all failures were due either to significant US shadows, caused by only partial contact of the probe head with the rectal wall or by air bubbles in the US contact gel, or to an insufficient US depth
with the result that parts of the gland membrane are not visible in the images. In these cases the similarity measure fails because of missing information in the image, and an algorithmic remedy probably does not exist. Additional failures can be observed for 3D-o2D registration, in particular for very small prostates, for which the coronal plane does not contain any prostatic tissue. 3D-o2D registration is also more sensitive to poor image quality (e.g. low contrast), to large deformations and to partial prostate images (for which often only one plane contains prostatic tissue). Note that the presented algorithm is not very sensitive to the precision of the bounding box placement. The computation time of local searches could be reduced by using the GPU for image reslicing (which accounts for approximately 95% of the computational burden of a similarity measure evaluation), while further optimization of the systematic exploration would require parallelization of the evaluations. The presented algorithm registers particularly accurately the prostate membranes that are distant from the probe head, and the urethra. The relatively high angular r.m.s. error observed in the needle reconstruction study can be explained by probe-related local deformations that are particularly strong at the needle entry point. We are currently working on a biomechanical gland deformation model that allows for estimation of deformations to improve the accuracy of tissue registration near the probe head. Acknowledgements. This work was supported by grants from the Agence Nationale de la Recherche (TecSan program, SMI project), from the French Ministry of Industry (ANRT agency), from the French Ministry of Health (PHRC program, Prostate-echo project) and from Koelis S.A.S., France. The clinical data were acquired at the urology department of the Pitié-Salpêtrière hospital, Paris.
References
1. Zitova, B., Flusser, J.: Image registration methods: a survey. Image and Vision Computing 21, 977–1000 (2003)
2. Guéziec, A.P., Pennec, X., Ayache, N.: Medical image registration using geometric hashing. IEEE Comput. Sci. Eng. 4(4), 29–41 (1997)
3. Eadie, L.H., de Cunha, D., Davidson, R.B., Seifalian, A.M.: A real time pointer to a preoperative surgical planning index block of ultrasound images for image guided surgery. In: SPIE 2004, San Jose, California, USA, vol. 5295, pp. 14–23 (2004)
4. Sawada, A., Yoda, K., Kokubo, M., Kunieda, T., Nagata, Y., Hiraoka, M.: A technique for noninvasive respiratory gated radiation treatment based on a real time 3D ultrasound image correlation: A phantom study. Medical Physics 31(2), 245–250 (2004)
5. Huang, X., Hill, N.A., Ren, J., Guiraudon, G., Boughner, D.R., Peters, T.M.: Dynamic 3D ultrasound and MR image registration of the beating heart. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3750, pp. 171–178. Springer, Heidelberg (2005)
6. Shekhar, R., Zagrodsky, V.: Mutual information-based rigid and nonrigid registration of ultrasound volumes. IEEE Trans. Med. Imag. 21(1), 9–22 (2002)
7. Shen, D., Davatzikos, C.: HAMMER: hierarchical attribute matching mechanism for elastic registration. IEEE Trans. Med. Imag. 21(11), 1421–1439 (2002)
8. Foroughi, P., Abolmaesumi, P.: Elastic registration of 3D ultrasound images. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 83–90. Springer, Heidelberg (2005)
9. Gramkow, C.: On averaging rotations. Journal of Mathematical Imaging and Vision 15, 7–16 (2001)
A Probabilistic Framework for Tracking Deformable Soft Tissue in Minimally Invasive Surgery

Peter Mountney1,2, Benny Lo1,2, Surapa Thiemjarus1, Danail Stoyanov2, and Guang-Zhong Yang1,2

1 Department of Computing, 2 Institute of Biomedical Engineering, Imperial College, London SW7 2BZ, UK
Abstract. The use of vision based algorithms in minimally invasive surgery has attracted significant attention in recent years due to its potential in providing in situ 3D tissue deformation recovery for intra-operative surgical guidance and robotic navigation. Thus far, a large number of feature descriptors have been proposed in computer vision but direct application of these techniques to minimally invasive surgery has shown significant problems due to free-form tissue deformation and varying visual appearances of surgical scenes. This paper evaluates the current state-of-the-art feature descriptors in computer vision and outlines their respective performance issues when used for deformation tracking. A novel probabilistic framework for selecting the most discriminative descriptors is presented and a Bayesian fusion method is used to boost the accuracy and temporal persistency of soft-tissue deformation tracking. The performance of the proposed method is evaluated with both simulated data with known ground truth, as well as in vivo video sequences recorded from robotic assisted MIS procedures. Keywords: feature selection, descriptors, features, Minimally Invasive Surgery.
1 Introduction

Minimally Invasive Surgery (MIS) represents one of the major advances in modern healthcare. This approach has a number of well-known advantages for the patient, including shorter hospitalization, reduced post-surgical trauma and morbidity. However, MIS procedures also have a number of limitations, such as reduced instrument control, difficult hand-eye coordination and poor operating field localization. These impose significant demands on the surgeon and require extensive skills in manual dexterity and 3D visuomotor control. With the recent introduction of MIS surgical robots, dexterity is enhanced by microprocessor-controlled mechanical wrists, allowing motion scaling for reducing gross hand movements and the performance of micro-scale tasks that are otherwise not possible. In order to perform MIS with improved precision and repeatability, intra-operative surgical guidance is essential for complex surgical tasks. In prostatectomy, for example, 3D visualization of the surrounding anatomy can result in improved neurovascular bundle preservation
and enhanced continence and potency rates. The effectiveness and clinical benefit of intra-operative guidance have been well recognized in neuro and orthopedic surgeries. Its application to cardiothoracic or gastrointestinal surgery, however, remains problematic as the complexity of tissue deformation imposes a significant challenge. The major difficulty involved is in the accurate reconstruction of dynamic deformation of the soft-tissue in vivo so that patient-specific preoperative/intraoperative data can be registered to the changing surgical field-of-views. This is also the prerequisite of providing augmented reality or advanced robotic control with dynamic active constraints and motion stabilization. Existing imaging modalities, such as intra-operative ultrasound, potentially offer detailed morphological information of the soft-tissue. However, there are recognised difficulties in integrating these imaging techniques for complex MIS procedures. Recent research has shown that it is more practical to rely on optical based techniques by using the existing laparoscopic camera to avoid further complication of the current MIS setup. It has been demonstrated that by introducing fiducial markers onto the exposed tissue surface, it is possible to obtain dynamic characteristics of the tissue in real-time [1]. Less invasive methods using optical flow and image derived features have also been attempted to infer tissue deformation [2]. These methods, however, impose strong geometrical constraints on the underlying tissue surface. They are generally not able to cater for large tissue deformation as experienced in cardiothoracic and gastrointestinal procedures. Existing research has shown that the major difficulty of using vision based techniques for inferring tissue deformation is in the accurate identification and tracking of surface features. They need to be robust to tissue deformation, specular highlights, and inter-reflecting lighting conditions. In computer vision, the issue of reliable feature tracking is a well researched topic for disparity analysis and depth reconstruction. Existing techniques, however, are mainly tailored for rigid man-made environments. Thus far, a large number of feature descriptors have been proposed and many of them are only invariant to perspective transformation due to camera motion [3]. Direct application of these techniques to MIS has shown significant problems due to free-form tissue deformation and contrastingly different visual appearances of changing surgical scenes. The purpose of this paper is to evaluate existing feature descriptors in computer vision and outline their respective performance issues when applied to MIS deformation tracking. A novel probabilistic framework for selecting the most discriminative descriptors is presented and a Bayesian fusion method is used to boost the accuracy and temporal persistency of soft-tissue deformation tracking. The performance of the proposed method is evaluated with both simulated data with known ground truth, as well as in vivo video sequences recorded from robotic assisted MIS procedures.
2 Methods

2.1 Feature Descriptors and Matching

In computer vision, feature descriptors are successfully used in many applications in rigid man-made environments for robotic navigation, object recognition, video data mining and tracking. For tissue deformation tracking, however, the effectiveness of
existing techniques has not been studied in detail. To determine their respective quality for MIS, we evaluated a total of 21 descriptors, including seven descriptors extended to work in color invariant space using techniques outlined in [4]. Color invariant descriptors are identified by a 'C' prefix. Subsequently, a machine learning method for inferring the most informative descriptors is proposed for Bayesian fusion. Table 1 provides a summary of all the descriptors used in this study. For clarity of terminology, we define a feature as a visual cue in an image. A detector is a low-level feature extractor applied to all image pixels (such as edges and corners), whereas a descriptor provides a high-level signature that describes the visual characteristics around a detected feature.

Table 1. A summary of the feature descriptors evaluated in this study

ID               | Descriptor
-----------------|--------------------------------------------------------------
SIFT, CSIFT [4]  | Scale Invariant Feature Transform, robust to scale and rotation changes.
GLOH, CGLOH      | Gradient Location Orientation Histogram, SIFT with a log-polar location grid.
SURF [5], CSURF  | Speeded Up Robust Features, robust to scale and rotation changes.
Spin, CSpin      | Spin images, a 2D histogram of pixel intensity measured by the distance from the centre of the feature.
MOM, CMOM        | Moment invariants computed up to the 2nd order and 2nd degree.
CC, CCC          | Cross correlation, a 9×9 uniform sample template of the smoothed feature.
SF, CSF          | Steerable Filters, Gaussian derivatives are computed up to the 4th order.
DI, CDI          | Differential Invariants, Gaussian derivatives are computed up to the 4th order.
GIH [6]          | Geodesic-Intensity Histogram, a 2D surface embedded in 3D space is used to create a descriptor which is robust to deformation.
CCCI [7]         | Color Constant Color Indexing, a color-based descriptor invariant to illumination which uses histograms of color angle.
BR-CCCI          | Sensitivity of CCCI to blur is reduced using the approach in [8].
CBOR [9]         | Color Based Object Recognition, a similar approach to CCCI using an alternative color angle.
BR-CBOR          | Sensitivity of CBOR to blur is reduced using the approach in [8].
For tissue deformation tracking and surface reconstruction, it is important to identify which features detected in an image sequence represent material correspondences. This process is known as matching; depending on the feature descriptor used, matching can be performed in different ways, e.g., using normalized cross-correlation over image regions or by measuring the Euclidean or Mahalanobis distance between descriptors.

2.2 Descriptor Selection and Descriptor Fusion

With the availability of a set of possible descriptors, it is important to establish their respective discriminative power in representing salient visual features that are suitable for subsequent feature tracking. To this end, a BFFS algorithm is used. It is a machine
learning approach formulated as a filter algorithm for reducing the complexity of multiple descriptors while maintaining the overall inferencing accuracy. The advantage of this method is that the selection of descriptors is purely based on the data distribution, and thus is unbiased towards a specific model. The criteria for descriptor selection are based on the expected Area Under the Receiver Operating Characteristic (ROC) Curve (AUC); the selected descriptors therefore yield the best classification performance in terms of the ROC curve, or sensitivity and specificity, for an ideal classifier. Under this framework, the expected AUC is interpreted as a metric which describes the intrinsic discriminability of the descriptors in classification. The basic principle of the algorithm is described in [13]. There are three major challenges related to the selection of the optimal set of descriptors: 1) the presence of irrelevant descriptors, 2) the presence of correlated or redundant descriptors and 3) the presence of descriptor interaction. Thus far, BFFS has been implemented using both forward and backward search strategies, and it has been observed that backward elimination suffers less from interaction [10,11,13]. In each step of the backward selection approach, the descriptor d_i which minimizes the objective function D(d_i) is eliminated from the descriptor set G^(k), resulting in a new set G^(k) − {d_i}. To maximize the performance of the model, the standard BFFS prefers the descriptor set that maximizes the expected AUC. This is equivalent to discarding, at each step, the descriptor that contributes the smallest change to the expected AUC:

D(d_i) = E_AUC(G^(k)) − E_AUC(G^(k) − {d_i})    (1)
where G^(k) = {d_j, 1 ≤ j ≤ n − k + 1} denotes the descriptor set at the beginning of iteration k, and E_AUC(·) is a function which returns the expected AUC of its argument. Since the discriminability of the descriptor set before elimination, E_AUC(G^(k)), is constant regardless of d_i, omitting this term in general does not affect the ranking of the features. While irrelevant descriptors are uninformative, redundant descriptors are often useful, despite the fact that their presence may not necessarily increase the expected AUC. With the evaluation function described in Eq. (1), irrelevant and redundant descriptors are treated in the same manner, since both contribute little to the overall model performance. In order to discard irrelevant descriptors before removing redundant ones, the following objective function has been proposed:
D_r(d_i) = −(1 − ω_1) × E_AUC(G^(k) − {d_i}) + ω_1 × E_AUC(d_i)    (2)
where ω_1 is a weighting factor ranging between 0 and 1. This function attempts to maximize the discriminability of the selected descriptor set while minimizing the discriminability of the eliminated descriptors. Once the relevant descriptors are derived using BFFS, a Naïve Bayesian Network (NBN) is used in this study to provide a probabilistic fusion of the selected descriptors. The result can subsequently be used for feature matching, where two features are classified as either matching or not matching by fusing the similarity measurements between descriptors to estimate the posterior probabilities. The NBN was trained on a subset of data with ground truth.
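A compact sketch of this backward elimination is given below. Here X holds one similarity score per descriptor for each candidate feature pair and y marks true correspondences (0/1 labels); approximating the expected AUC by the training AUC of a Naive Bayes fusion over the subset is an assumption of this sketch, not part of the BFFS formulation itself.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.naive_bayes import GaussianNB

def auc_of(X, y, cols):
    """Expected AUC of a descriptor subset, approximated here by the
    training AUC of a Naive Bayes classifier fused over the subset."""
    clf = GaussianNB().fit(X[:, cols], y)
    return roc_auc_score(y, clf.predict_proba(X[:, cols])[:, 1])

def backward_eliminate(X, y, omega1=0.5, keep=6):
    """Backward elimination following Eq. (2): repeatedly drop the
    descriptor minimizing -(1-w1)*E_AUC(G - {d_i}) + w1*E_AUC(d_i),
    so individually weak (irrelevant) descriptors go first while the
    remaining set stays discriminative."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > keep:
        def objective(i):
            rest = [j for j in remaining if j != i]
            return (-(1.0 - omega1) * auc_of(X, y, rest)
                    + omega1 * auc_of(X, y, [i]))
        remaining.remove(min(remaining, key=objective))
    return remaining
```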
3 Experiments and Results

To evaluate the proposed framework for feature descriptor selection, two MIS image sequences with large tissue deformation were used. The first, shown in Fig. 1a-e, is a simulated dataset with known ground truth, where tissue deformation is modeled by sequentially warping a textured 3D mesh using a Gaussian mixture model. The second sequence, shown in Fig. 2a-d, is an in vivo sequence from a laparoscopic cholecystectomy procedure, where the ground truth data was defined manually. Both sequences involve significant tissue deformation due to instrument-tissue interaction near the cystic duct. Low-level features for these images were detected using the Difference of Gaussian (DoG) and the Maximally Stable Extremal Regions (MSER) detectors. Descriptor performance is quantitatively evaluated with respect to deformation using two metrics: sensitivity, the ratio of correctly matched features to the total number of corresponding features between two images, and 1-specificity, the ratio of incorrectly matched features to the total number of non-corresponding features. Results are presented in the form of ROC curves in Fig. 1 and Fig. 2. A good descriptor should be able to correctly identify matching features whilst producing a minimum number of mismatches. Individual descriptors use a manually defined threshold on the Euclidean distance between descriptors to determine matching features; this threshold is varied to obtain the curves on the graphs. Our fusion approach has no manually defined threshold and is shown as a point on the graph. Ground truth data was acquired for quantitative analysis. On the simulated data, feature detection was performed on the first frame to provide an initial set of feature positions. These positions were identified on the 3D mesh, enabling ground truth to be generated for subsequent images by projecting the deformed mesh positions back into the image plane. To acquire ground truth for the in vivo data, feature detection was performed on each frame and corresponding features were matched manually. The AUC graph shown in Fig. 1 illustrates that by effective fusion of descriptor responses, the overall discriminability of the system is improved, which allows better matching of feature landmarks under large tissue deformation. The derived AUC curve (bottom left) indicates the IDs of the top performing descriptors in descending order. It is evident that after CGLOH, the addition of further feature descriptors does not provide additional performance enhancement to the combined descriptor. The ROC graph (bottom right) shows the performance of the fused descriptor when the top n descriptors are used (represented as Fn). Ideal descriptors will have high sensitivity and low 1-specificity. It is evident from these graphs that descriptor fusion can obtain a higher level of sensitivity than individual descriptors at an acceptable specificity. This enables the fusion technique to match more features and remain robust. The best performing individual descriptor is Spin, and its sensitivity is 11.96% lower than that of the fusion method at the specificity achieved with fusion. To obtain the same level of sensitivity using only the Spin descriptor, specificity has to be compromised, resulting in a 19.16% increase and a drop in robustness of feature matching.
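The per-descriptor ROC curves described above can be traced by sweeping the matching threshold, as in this small sketch (function and variable names are illustrative):

```python
import numpy as np

def roc_points(distances, is_match, thresholds):
    """Sweep a Euclidean-distance threshold to trace an ROC curve.
    sensitivity   = correct matches / corresponding pairs
    1-specificity = incorrect matches / non-corresponding pairs"""
    d = np.asarray(distances, float)
    m = np.asarray(is_match, bool)
    points = []
    for t in thresholds:
        accepted = d <= t
        sensitivity = (accepted & m).sum() / max(m.sum(), 1)
        one_minus_spec = (accepted & ~m).sum() / max((~m).sum(), 1)
        points.append((one_minus_spec, sensitivity))
    return points
```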
Fig. 1. (a-e) Example images showing the simulated data for evaluating the performance of different feature descriptors. The two graphs represent the AUC and the ROC (sensitivity vs. 1-specificity) curves of the descriptors used. For clarity, only the six best performing descriptors are shown in the ROC graph.
Fig. 2. (a-d) Images from an in vivo laparoscopic cholecystectomy procedure showing instrument-tissue interaction. The two graphs illustrate the AUC and the ROC (sensitivity vs. 1-specificity) curves of the descriptors used. As in Fig. 1, only the six best performing descriptors are shown in the ROC graph for clarity.
For in vivo validation, a total of 40 matched ground truth features were used. Detailed analysis results are shown in Fig. 2. It is evident that descriptor fusion enhances the discriminative power of feature description. The fused method obtains a specificity of 0.235, which gives a 30.63% improvement in sensitivity over the best performing descriptor, GIH, at the given specificity. This demonstrates that the fused descriptor is capable of matching considerably more features than any individual descriptor on deforming tissue. Detailed performance analysis has shown that for
MIS images, the best performing individual descriptors are Spin, SIFT, SURF, DIH and GLOH. Computing the descriptors in color invariant space has no apparent effect on discriminability, but the process is more computationally intensive. By using the proposed Bayesian fusion method, however, we are able to reliably match significantly more features than by using individual descriptors.

Fig. 3. 3D deformation tracking and depth reconstruction based on computational stereo by using the proposed descriptor fusion and SIFT methods for a robotic assisted lung lobectomy procedure under tissue deformation. SIFT was identified by the BFFS as the most discriminative descriptor for this image sequence. Improved feature persistence is achieved by using the proposed fusion method, leading to improved 3D deformation recovery.
To further illustrate the practical value of the proposed framework, the fused descriptor was applied to 3D stereo deformation recovery for an in vivo stereoscopic sequence from a lung lobectomy procedure performed by using a daVinci® robot. The representative 3D reconstruction results by using the proposed matching scheme are shown in Fig. 3. Visual features as detected in the first video frame were matched across the entire image sequence for temporal deformation recovery. Features that were successfully tracked both in time and space were used for 3D depth reconstruction. The overlay of dense and sparse reconstructions with the proposed method indicates the persistence of features by using the descriptor fusion scheme. The robustness of the derived features in persistently matching through time is an important prerequisite of all vision-based 3D tissue deformation techniques. The results obtained in this study indicate the practical value of the proposed method in underpinning the development of accurate in vivo 3D deformation reconstruction techniques.
4 Discussion and Conclusions

In conclusion, we have presented a method for systematic descriptor selection for MIS feature tracking and deformation recovery. Experimental results have shown that the proposed framework performs favorably compared to existing techniques, and the method is capable of matching a greater number of features in the presence of large tissue deformation. To our knowledge, this paper represents the first comprehensive study of feature descriptors in MIS images. It is an important step towards more effective use of visual cues in developing vision-based deformation recovery techniques. This work has also highlighted the importance of adaptively selecting viable image characteristics that can cater for surgical scene variations.
Acknowledgments. The authors would like to thank Adam James for acquiring the in vivo data and Andrew Davison for constructive discussions.
References
1. Ginhoux, R., Gangloff, J.A., Mathelin, M.F.: Beating heart tracking in robotic surgery using 500 Hz visual servoing, model predictive control and an adaptive observer. In: Proc. ICRA, pp. 274–279 (2004)
2. Stoyanov, D., Mylonas, G.P., Deligianni, F., Darzi, A., Yang, G.Z.: Soft-tissue motion tracking and structure estimation for robotic assisted MIS procedures. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 139–146. Springer, Heidelberg (2005)
3. Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(10), 1615–1630 (2005)
4. Abdel-Hakim, A.E., Farag, A.A.: CSIFT: A SIFT Descriptor with Color Invariant Characteristics. In: Proc. CVPR, pp. 1978–1983 (2006)
5. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: Speeded Up Robust Features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, Springer, Heidelberg (2006)
6. Ling, H., Jacobs, D.W.: Deformation invariant image matching. In: Proc. ICCV, pp. 1466–1473 (2005)
7. Funt, B.V., Finlayson, G.D.: Color constant color indexing. IEEE Transactions on Pattern Analysis and Machine Intelligence 17(5), 522–529 (1995)
8. van de Weijer, J., Schmid, C.: Blur Robust and Color Constant Image Description. In: Proc. ICIP, pp. 993–996 (2006)
9. Gevers, T., Smeulders, A.W.M.: Color Based Object Recognition. Pattern Recognition 32, 453–464 (1999)
10. Koller, D., Sahami, M.: Towards optimal feature selection. In: Proc. ICML, pp. 284–292 (1996)
11. Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artificial Intelligence 97, 273–324 (1997)
12. Hu, X.P.: Feature selection and extraction of visual search strategies with eye tracking (2005)
13. Yang, G.Z., Hu, X.P.: Multi-Sensor Fusion. In: Body Sensor Networks, pp. 239–286 (2006)
14. Thiemjarus, S., Lo, B.P.L., Laerhoven, K.V., Yang, G.Z.: Feature Selection for Wireless Sensor Networks. In: Proceedings of the 1st International Workshop on Wearable and Implantable Body Sensor Networks (2004)
Precision Targeting of Liver Lesions with a Needle-Based Soft Tissue Navigation System

L. Maier-Hein1, F. Pianka2, A. Seitel1, S.A. Müller2, A. Tekbas2, M. Seitel1, I. Wolf1, B.M. Schmied2, and H.-P. Meinzer1

1 German Cancer Research Center, Div. Medical and Biological Informatics, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
[email protected]
2 University of Heidelberg, Dept. of General, Abdominal and Transplant Surgery, Im Neuenheimer Feld 110, 69120 Heidelberg, Germany
Abstract. In this study, we assessed the targeting precision of a previously reported needle-based soft tissue navigation system. For this purpose, we implanted 10 2-ml agar nodules into three pig livers as tumor models, and two of the authors used the navigation system to target the center of gravity of each nodule. In order to obtain a realistic setting, we mounted the livers onto a respiratory liver motion simulator that models the human body. For each targeting procedure, we simulated the liver biopsy workflow, consisting of four steps: preparation, trajectory planning, registration, and navigation. The lesions were successfully hit in all 20 trials. The final distance between the applicator tip and the center of gravity of the lesion was determined from control computed tomography (CT) scans and was 3.5 ± 1.1 mm on average. Robust targeting precision of this order of magnitude would significantly improve the clinical treatment standard for various CT-guided minimally invasive interventions in the liver.
1 Introduction
Computed tomography (CT) guided minimally invasive procedures in the liver, such as tumor biopsy and thermal ablation therapy, frequently require the targeting of hepatic structures that are subject to breathing motion. Unfortunately, commercially available navigation systems are still restricted to applications for rigid structures, such as the skull and the spine. To allow application of existing navigation techniques to the liver, several research groups (e.g. [1,2,3,4,5,6,7]) are investigating methods for compensating organ motion during soft tissue interventions; however, a common approach for assessing the accuracy of the navigation systems developed in this context has not yet been established. Zhang et al. [5] implanted tumor models containing radio-opaque CT contrast medium into a silicon liver model mounted on
The present study was conducted within the setting of “Research training group 1126: Intelligent Surgery” funded by the German Research Foundation (DFG).
Fig. 1. Our soft tissue navigation concept. i) The navigation aids are inserted in the vicinity of the target. ii) A planning computed tomography (CT) scan is acquired. iii) The navigation aids are registered with the planning CT image, and the tracking coordinate system is registered with the CT coordinate system. iv) The navigation target point is chosen, and a trajectory is planned. v) A real-time deformation model is used to continuously estimate the position of the target point from the current positions of the optically tracked navigation aids, and a navigation display supports the targeting process accordingly.
a motion simulator and also conducted experiments in swine with agar injections as nodular targets. Kahn et al. [6] evaluated their navigation system in human cadavers, with three different targets: a predefined position within the ascending aorta, a calcified plaque in an artery, and the tip of a port catheter. Fichtinger et al. [8] conducted experiments in ventilated swine cadavers and used stainless-steel staples as targets. Several other studies were performed with rigid phantoms and did not incorporate organ shift or deformation (e.g. [3,4]). In a previous report [7], we introduced a needle-based navigation system for minimally invasive interventions in the liver, in which a real-time deformation model is used to estimate the position of a navigation target point continuously from a set of optically tracked navigation aids (Fig. 1). The accuracy of tracking, CT registration, and target position estimation throughout the breathing cycle have already been evaluated [7,9]. We have also investigated suitable visualization schemes to support soft tissue targeting procedures in cooperation with clinicians [10]. In this study, we assessed the overall targeting accuracy of the system and present a general workflow for evaluating the performance of a liver navigation system in a realistic setting.
2 Material and Methods
Our approach for assessing the targeting precision of our liver navigation system is based on simulation of the clinical liver biopsy workflow for porcine livers mounted onto a respiratory motion simulator. We used injected agar nodules as tumor models and determined the targeting error from control CT scans. The following sections describe the workflow in detail and present the experimental conditions used in this study.
Fig. 2. Agar nodule in a porcine liver (a) and in a control CT image (b)
2.1 Workflow
Each targeting procedure comprises four steps: preparation, trajectory planning, registration, and navigation, followed by a post-processing procedure. While the preparation step is conducted only once for each liver, the trajectory must be planned separately for each lesion, and the remaining steps have to be repeated for each trial. The evaluation procedure was designed specifically for our navigation system but could readily be adapted for other navigation methods. The detailed workflow used for this study was as follows:
1. Preparation: We prepared each porcine liver according to the following procedure:
(a) Based on the method proposed by Zhang et al. [5], a 5% agar dilution was prepared and mixed with contrast agent (1:15 v/v dilution).
(b) Three to four agar nodules of volume 2 ml were then injected into the liver (Fig. 2a). In the case of a spherical lesion, a volume of 2 ml corresponds to a diameter of approximately 1.5 cm (see the check below).
(c) The liver was sewn to the diaphragm model (i.e., the Plexiglas® plate) of the motion simulator introduced in [9] (Fig. 3).
(d) Two 5-degrees-of-freedom (5DoF) navigation aids [7] were inserted into the liver ("diagonal arrangement", Fig. 4b).
(e) A planning CT scan of the motion simulator with the integrated porcine liver was acquired (Somatom Sensation 16 multidetector row scanner; Siemens, Erlangen, Germany). A fine resolution (0.75 mm slices) was necessary because our evaluation relies on accurate computation of the center of gravity of the agar nodule in both the planning CT and the control CT.
(f) The motion simulator was used to simulate several breathing cycles (cranio-caudal displacement of the liver ≈ 15 mm [9]), reflecting the fact that patients cannot hold their breath between acquisition of the planning CT and registration.
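As a quick check of the diameter quoted in step 1(b): a sphere of volume V satisfies V = (π/6)d³, so

\[
d = \sqrt[3]{\frac{6V}{\pi}} = \sqrt[3]{\frac{6 \cdot 2\,\mathrm{cm}^{3}}{\pi}} \approx 1.56\,\mathrm{cm},
\]

consistent with the stated value of approximately 1.5 cm.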
Fig. 3. Schematic view of the respiratory liver motion simulator
2. Trajectory planning: For each lesion, we planned a trajectory in the CT image as follows:
(a) The tumor was segmented semi-automatically on the basis of the graph-cut algorithm [11].
(b) The navigation target point was set to the center of gravity of the segmented tumor.
(c) An insertion point was chosen on the skin.
3. Registration: On the basis of the planned trajectory, we performed the initial registration:
(a) The navigation aid models were registered with the planning CT image by the semi-automatic algorithm described in [9].
(b) The tracking coordinate system was registered with the CT coordinate system. For this purpose, we used the optical markers on the navigation aids as fiducials to compute a landmark-based rigid transformation as described in [7] (see the sketch below).
4. Navigation: We used an optically tracked applicator to target a given agar nodule with the navigation system. The targeting procedure was conducted at end-expiration because it represents the natural state of the motion simulator (with the artificial lungs relaxed). As we performed gated experiments and only two navigation aids were utilized for motion compensation, we chose a rigid deformation model [9]. A navigation monitor provided the visualization for the targeting process:
(a) A two-dimensional projection view and a tool tip camera guided the user through the three steps of tip positioning, needle alignment, and needle insertion, as described in [10].
(b) Once the target was reached, the current position of the applicator was recorded. Then, the tool was released and its position was recorded again. The resulting tip "offset" was stored in image coordinates. This step was necessary because of the lack of tissue between the skin of the motion simulator (the foam) and the liver (Fig. 4b); once the applicator was released, the elastic skin relaxed and potentially pulled the tool several millimeters out of the liver.
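A landmark-based rigid registration of the kind used in step 3(b) is commonly computed in closed form from corresponding point pairs via a singular value decomposition (the classical Arun-style least-squares solution). The sketch below is our own minimal illustration under that assumption, not the authors' implementation; all names are ours.

```python
import numpy as np

def rigid_landmark_registration(src, dst):
    """Least-squares rigid transform (R, t) mapping src points onto dst.

    src, dst: (N, 3) arrays of corresponding fiducial positions, e.g.
    optical marker centers in tracking vs. CT coordinates.
    Returns R (3x3 rotation) and t (3,) such that dst ~ R @ src + t.
    """
    src_c = src - src.mean(axis=0)           # center both point sets
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c                      # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                       # reflection-safe rotation
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t
```

The fiducial registration error reported in Sect. 2.2 can then be obtained from the residuals of this fit.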
Fig. 4. Navigation scenario: (a) experimental setup for the targeting procedure, and (b) reconstructed three-dimensional view showing the liver (brown) with four injected agar nodules (yellow), the inserted applicator (green), the two navigation aids (blue and turquoise), the Plexiglas® plate as diaphragm model (light blue), the artificial skin (beige), the insertion point on the skin (white), and the target point (red)
5. Post-processing: The targeting accuracy was determined with a control CT:
(a) A CT scan was acquired with the same settings as for the planning CT.
(b) The tumor in the control CT image was segmented semi-automatically with the graph-cut algorithm [11].
(c) The navigation target point was set to the center of gravity of the segmented tumor as reference.
(d) The applicator model was registered with the control CT image by the semi-automatic algorithm described in [7].
(e) The position of the applicator was corrected by the offset computed in the navigation step.
(f) The distance between the computed target point and the (corrected) position of the applicator tip was recorded as the CT targeting error εCT.
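In code, steps (e) and (f) amount to applying the stored tip offset and taking a Euclidean distance. A minimal sketch in our own notation (not the authors' software); in particular, the sign convention for the offset is our assumption:

```python
import numpy as np

def ct_targeting_error(target_ct, tip_ct, offset):
    """CT targeting error εCT of steps (e)-(f): distance between the
    segmented target (tumor center of gravity) and the offset-corrected
    applicator tip, all in image coordinates.

    `offset` is assumed to be (released tip - held tip), as recorded in
    navigation step 4(b); the correction therefore subtracts it.
    """
    tip_corrected = tip_ct - offset
    return float(np.linalg.norm(target_ct - tip_corrected))
```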
2.2 Experimental Conditions
In order to determine the overall targeting error of our navigation system, one technician (S1) and one fourth-year medical student (S2) conducted 20 targeting procedures on 10 tumor lesions following the workflow described above. Each participant simulated one biopsy from each lesion (Fig. 4a), and we recorded the following errors:
– the fiducial registration error (FRE), which is the mean distance between the optical markers in image coordinates and the transformed optical markers originally located in tracking coordinates, as described in [9] (a sketch of this computation follows the list);
– the virtual targeting error εvirtual, which is defined as the final distance between the applicator tip (given by the tracking system) and the estimated target point position (according to the deformation model). This error results
primarily from inaccurate instrument insertion and depends crucially on the experience of the user;
– the CT targeting error εCT, defined in Sect. 2.1 (post-processing). It includes the registration error, the target position estimation error of the system, the tracking error, and the instrument insertion error. In addition, it is sensitive to changes in the applicator position between the instrument insertion step and the CT acquisition, as discussed below.
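For completeness, the FRE referenced above is simply the mean residual of the rigid fit; a minimal sketch reusing the (R, t) produced by the registration function in Sect. 2.1 (our notation, not the authors' code):

```python
import numpy as np

def fiducial_registration_error(markers_tracking, markers_image, R, t):
    """Mean distance between the markers in image coordinates and the
    rigidly transformed markers from tracking coordinates."""
    transformed = markers_tracking @ R.T + t      # apply (R, t) to each row
    residuals = np.linalg.norm(markers_image - transformed, axis=1)
    return float(residuals.mean())
```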
3 Results
Our navigation system was successfully applied for simulating 20 liver biopsies according to the workflow described above. The applicator trajectory was generally non-parallel to the CT scanning plane, and the mean distance between the insertion point and the target point (±SD) was 11.6 ± 1.0 cm.

Table 1. Virtual targeting error εvirtual and CT targeting error εCT for participant S1, participant S2, and both participants (S1,S2), in mm. The mean error (μ), the standard deviation (σ), the root-mean-square error (RMS), the median error, and the maximum error (max) over the entire set of lesions are listed.

          εvirtual(S1)  εvirtual(S2)  εvirtual(S1,S2)  εCT(S1)    εCT(S2)    εCT(S1,S2)
μ±σ       0.5 ± 0.3     1.1 ± 1.1     0.8 ± 0.8        2.8 ± 0.6  4.1 ± 1.1  3.5 ± 1.1
RMS       0.6           1.5           1.1              2.9        4.3        3.6
median    0.4           0.7           0.6              3.0        4.2        3.3
max       1.3           4.0           4.0              3.8        5.4        5.4
The lesions were successfully hit in all trials, with a mean fiducial registration error (±SD) of 0.6 ± 0.2 mm for computation of the coordinate transformation. The mean final distance εCT(S1,S2) between the applicator tip and the center of gravity of the segmented agar nodule was 3.5 ± 1.1 mm averaged over all trials (Table 1). If we regard the first trial of subject S2 as an outlier (εvirtual: 4.0 mm) and exclude it from consideration, the mean virtual targeting error was of the same order of magnitude for both participants (< 1 mm). The mean CT targeting error was, however, significantly larger for S2 (4.1 ± 1.1 mm) than for S1 (2.8 ± 0.6 mm). In addition, the virtual targeting error εvirtual estimated with our navigation system was generally significantly smaller than εCT, averaging only 0.8 ± 0.8 mm.
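For reference, the summary statistics listed in Table 1 can be computed as follows (a minimal sketch in our own notation, not the authors' evaluation code):

```python
import numpy as np

def error_statistics(errors):
    """Summary statistics as listed in Table 1 for a set of per-trial errors."""
    e = np.asarray(errors, dtype=float)
    return {
        "mean": float(e.mean()),
        "std": float(e.std(ddof=1)),           # sample standard deviation
        "rms": float(np.sqrt(np.mean(e ** 2))),
        "median": float(np.median(e)),
        "max": float(e.max()),
    }
```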
4 Discussion
We assessed the targeting precision of a novel soft tissue navigation system and obtained a mean error of 3.5 ± 1.1 mm. The proposed evaluation approach has three key features. First, we use agar nodules mixed with contrast agent as targets, as they are clearly distinguishable
from the surrounding liver tissue and can thus be segmented easily. In addition, they can be prepared such that they resemble real tumors in terms of shape and size. A second key feature is the utilization of the motion simulator as a body model, allowing us to model organ movement due to respiration, the most challenging problem in soft tissue interventions. Finally, the evaluation is performed in-vitro, allowing us to perform experiments in moving organs without recourse to animal experiments, which are time-consuming and expensive. To our knowledge, we are the first to combine in-vitro experiments with simulation of respiratory motion. The main drawback of our evaluation approach is the suboptimal fixation of the applicator in the body model. In our experience, small movements of the tool can occur relatively easily once it has been released, because it is held in position only by a layer of foam, several millimeters of (elastic) liver tissue, and the relatively soft agar nodule itself (Fig. 4b). In other words, there is no assurance that the applicator will not shift further after the offset correction, which potentially leads to inaccurate determination of the final applicator position and hence to an inaccurate error calculation. We consider that the large deviation between the mean virtual targeting error εvirtual and the mean CT targeting error εCT can be attributed to this phenomenon. Similarly, we consider that the relatively large difference between the two observers with regard to εCT was due to inaccurate determination of the applicator tip offset. The technician (S1), who was more experienced in use of the system, released the applicator very carefully after each targeting and calculated the offset correction only after ensuring that the applicator had assumed its final position and showed no more movement. We assume that the other participant (S2) conducted the process less carefully, causing a less accurate offset computation. In order to overcome these limitations, we propose use of a real biopsy needle as the applicator and marking of the final tip position with injected material. It is worth noting that the navigation aids were better affixed within the tissue than the instrument, because they were generally inserted considerably deeper into the liver (Fig. 4) and were less affected by the resilience of the foam. Since the same planning CT scan was used for all trials in one liver and the axes of the needles were nonparallel to each other, a shift of the navigation aids during one targeting procedure would have increased the registration error of the next trial. We obtained a very low FRE of only 0.6 mm on average, which suggests that the fixation of the navigation aids was sufficient. Moreover, the CT targeting error did not increase over time. To avoid problems related to this issue, however, we propose attaching the navigation aids to the skin. Despite the technical problems discussed above, our accuracy is higher than that published in related work. Zhang et al. [5] reported a success rate of 87.5% (n = 16) in a silicon liver mounted on a motion simulator and a median targeting error of 8.3 ± 3.7 mm (n = 32) in swine. Other groups obtained mean errors of 8.4 ± 1.8 mm (n = 42) in human cadavers [6] and 6.4 ± 1.8 mm (n = 22) in ventilated swine [8]. We evaluated the targeting precision of our needle-based soft tissue navigation system in-vitro and obtained a mean error of 3.5 ± 1.1 mm. Our clinical
colleagues have commented that a robust targeting precision of this order of magnitude would dramatically improve the treatment standard for CT-guided minimally invasive interventions in the liver. In order to advance clinical application of our navigation method, we are currently planning experiments in swine.
References
1. Schweikard, A., Glosser, G., Bodduluri, M., Murphy, M.J., Adler, J.R.: Robotic motion compensation for respiratory movement during radiosurgery. Comp. Aid. Surg. 5, 263–277 (2000)
2. Khamene, A., Warzelhan, J.K., Vogt, S., Elgort, D., Chefd'Hotel, C., Duerk, J.L., Lewin, J.S., Wacker, F.K., Sauer, F.: Characterization of internal organ motion using skin marker positions. In: Barillot, C., Haynor, D.R., Hellier, P. (eds.) MICCAI 2004. LNCS, vol. 3217, pp. 526–533. Springer, Heidelberg (2004)
3. Nagel, M., Schmidt, G., Petzold, R., Kalender, W.A.: A navigation system for minimally invasive CT-guided interventions. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3750, pp. 33–40. Springer, Heidelberg (2005)
4. Nicolau, S., Garcia, A., Pennec, X., Soler, L., Ayache, N.: An augmented reality system to guide radio-frequency tumour ablation. Comput. Animat. Virt. W 16, 1–10 (2005)
5. Zhang, H., Banovac, F., Lin, R., Glossop, N., Wood, B.J., Lindisch, D., Levy, E., Cleary, K.: Electromagnetic tracking for abdominal interventions in computer aided surgery. Comp. Aid. Surg. 11(3), 127–136 (2006)
6. Khan, M.F., Dogan, S., Maataoui, A., Wesarg, S., Gurung, J., Ackermann, H., Schiemann, M., Wimmer-Greinecker, G., Vogl, T.J.: Navigation-based needle puncture of a cadaver using a hybrid tracking navigational system. Invest. Radiol. 41(10), 713–720 (2006)
7. Maier-Hein, L., Maleike, D., Neuhaus, J., Franz, A., Wolf, I., Meinzer, H.P.: Soft tissue navigation using needle-shaped markers: evaluation of navigation aid tracking accuracy and CT registration. In: SPIE Medical Imaging 2007: Visualization, Image-Guided Procedures, and Display, vol. 6509, p. 650926 (2007)
8. Fichtinger, G., Deguet, A., Fischer, G., Iordachita, I., Balogh, E., Masamune, K., Taylor, R.H., Fayad, L.M., de Oliveira, M., Zinreich, S.J.: Image overlay for CT-guided needle insertions. Comp. Aid. Surg. 10(4), 241–255 (2005)
9. Maier-Hein, L., Müller, S.A., Pianka, F., Müller-Stich, B.P., Gutt, C.N., Seitel, A., Rietdorf, U., Meinzer, H.P., Richter, G., Schmied, B.M., Wolf, I.: In-vitro evaluation of a novel needle-based soft tissue navigation system with a respiratory liver motion simulator. In: SPIE Medical Imaging 2007: Visualization, Image-Guided Procedures, and Display, vol. 6509, p. 650916 (2007)
10. Seitel, A., Maier-Hein, L., Schawo, S., Radeleff, B.A., Mueller, S.A., Pianka, F., Schmied, B.M., Wolf, I., Meinzer, H.P.: In-vitro evaluation of different visualization approaches for computer assisted targeting in soft tissue. In: CARS: Computer Assisted Radiology and Surgery (to appear, 2007)
11. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell. 26(9), 1124–1137 (2004)
Dynamic MRI Scan Plane Control for Passive Tracking of Instruments and Devices

S.P. DiMaio1, E. Samset1,2, G. Fischer3, I. Iordachita3, G. Fichtinger3, F. Jolesz1, and C.M. Tempany1
1 Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
2 Oslo University, Norway
3 Johns Hopkins University, Baltimore, MD, USA

Abstract. This paper describes a novel image-based method for tracking robotic mechanisms and interventional devices during Magnetic Resonance Image (MRI)-guided procedures. It takes advantage of the multi-planar imaging capabilities of MRI to optimally image a set of localizing fiducials for passive motion tracking in the image coordinate frame. The imaging system is servoed to adaptively position the scan plane based on automatic detection and localization of fiducial artifacts directly from the acquired image stream. This closed-loop control system has been implemented using an open-source software framework and currently operates with GE MRI scanners. Accuracy and performance were evaluated in experiments, the results of which are presented here.
1 Introduction
Magnetic Resonance Imaging (MRI) is finding increased application for guiding clinical interventions, particularly percutaneous needle- and catheter-based procedures, due to its high soft-tissue contrast and multi-parametric imaging capabilities. In particular, applications of targeted ablation, biopsy and brachytherapy have been demonstrated for the management of breast and prostate cancer [1]. A variety of positioning devices and stereotactic templates have been developed for image-guided needle placement and efforts are currently underway to develop robotic assistants and focused ultrasound delivery systems for precise in-bore targeted therapy. Accurate calibration, tracking and navigation of such devices—as well as needles and catheters—are essential. This paper describes a novel image-based method for instrument tracking that makes use of the multi-planar imaging capabilities of MRI to dynamically servo the scan plane for optimal device localization and visualization1 . In prior work, device tracking in the MRI environment has been achieved using either active or passive markers. A variety of active tracking approaches have been presented in the past [2,3,4,5]. While typically fast and accurate, such methods can have drawbacks such as line-of-sight limitations, heating, sensitive 1
This publication was made possible by NIH grants R01-CA111288 and U41RR019703. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.
tuning, complex calibration and expense. A well known active approach tracks small receiver coils using the MRI scanner’s readout gradients aligned along the coordinate axes [3,4]. Krieger et al. discuss their use of such active tracking coils for navigating a robotic device in [6]. Passive tracking approaches, in which devices (e.g., needles, catheters, and robotic guidance mechanisms) are detected and tracked directly from the images, provide an alternative solution [7, 8, 6]. The advantages of an image-based passive tracking approach are that needles and devices do not require expensive instrumentation, and that both the interventional device and the patient’s anatomy are observed together in the same image space, thus eliminating a critical calibration step. There is, however, a compromise between imaging speed and quality that can degrade localization accuracy and reliability. In addition, MRI systems have been designed primarily for diagnostic imaging and are typically not equipped for closed-loop adaptive imaging that is often required for interventional navigation and guidance. Contemporary MRI hardware and software designs are optimised for sequential batch imaging prescriptions, which create awkward interventional workflows. As a result, most clinical MRI-guided procedures follow an iterative imaging approach in which the patient is moved in and out of the scanner for imaging and intervention (e.g., see [6] and references). In this work we demonstrate a general-purpose image-based approach for localizing devices in the bore of the magnet in order to enable simultaneous imaging and navigation for true image-guided intervention. This technology has been implemented using an open-source software framework and is currently available for use in GE MRI scanners. It is currently being used to develop a system for robot-assisted navigation of MRI-guided prostate biopsy and brachytherapy [9], as described in greater detail in our companion paper [10].
2 Methods
The concept of closed-loop scan-plane control for device localization is demonstrated here using a fiducial frame constructed from acrylic plastic, with seven embedded glass cylinders filled with MR-visible fluid (Beekley, Bristol, CT). Each of the seven cylinders forms a 3 mm diameter, 60 mm long MR-visible line fiducial, with the entire Z-frame arranged as shown in Figure 1. The position and orientation of the Z-frame can be computed from a single intersecting 2D image—based on the coordinates of the seven fiducial points observed in the image—as described in [11], where a similar fiducial frame was used in CT. The Z-frame was placed in an MRI scanner (GE Signa EXCITE 3T), on a rotating platform with marked angle gradations, initially aligned at the isocentre. A continuous real-time pulse sequence was used to image a cross-section of the frame (Fast Gradient Recalled Echo, TR=14.1 ms, TE=5.7 ms, flip angle=45°, bandwidth=31.25 kHz, matrix=256×256, NEX=1, FOV=16 cm, slice thickness=2 mm). The intersection points of the seven line fiducials—visible as seven bright disks (see Figure 1)—were automatically detected by a fast k-space template matching algorithm and used to compute the position and orientation
of the Z-frame relative to the scan plane. The frame was then manually rotated on the platform, while a closed-loop control system continuously and automatically adjusted the position and orientation of the imaging plane to align with the centre of the fiducial frame. This is illustrated in the series of images shown in Figure 2.
Fig. 1. (a) The Z-frame with 7 MR-visible line fiducials. (b) A sample MR image of a cross section of the Z-frame.
Fig. 2. The imaging plane is adapted to automatically follow the motion of the fiducial frame in the scanner
System Architecture

The software architecture for this system is shown in Figure 3. The MR scanner acquires 2D images continuously and transfers k-space data to a Raw Data Server that allows us to access image data in real-time. The raw data is passed through an Image Reconstruction algorithm (at present, the image reconstruction algorithm does not account for gradient warping) before being processed by
the Image Controller, which consists of algorithms for automatic fiducial detection, frame localization, and scan plane control, as described below. The Image Controller passes images to a user interface for visualization and also closes the loop with the MRI scanner via the RSP Server, which provides the means to dynamically update pulse sequence parameters (RSPs), including those that determine scan plane position and orientation.
Fig. 3. System architecture
Data interfaces between the tracking application and the imaging system were developed using an extension of the OpenTracker framework [12] (these interfaces are indicated by "OT" annotations in Figure 3). This framework provides mechanisms for dynamic event passing between distributed computing systems over the MRI host's TCP/IP network. The image visualization and graphical user interface was implemented using the Slicer Image Guided Navigator (SIGN), developed by Samset et al. [13]. Both OpenTracker and The SIGN are open-source software packages.

Fiducial Detection and Localization

The closed-loop fiducial detection and localization algorithm is detailed in the block diagram shown in Figure 4. Fiducials are detected by fast template matching (similar to that used in [7]), where the template mask m(u, v) is convolved with the latest MR image ii(u, v). In the spatial frequency domain (i.e., k-space) this corresponds to multiplication of M(ku, kv) and Ii(ku, kv), computed by Fast Fourier Transform of m and ii, respectively. Fiducial matches are detected as local maxima in the resulting matching response f(u, v), with subpixel interpolation of peak coordinates (quadratic interpolation). The resulting fiducial pattern is validated against known geometric constraints of the Z-frame, and the seven fiducial point matches are ordered as shown in Figure 1. The ordered set of fiducial point coordinates P is then used to compute the 6-DOF pose of the Z-frame with respect to the plane of image ii (for details of a closed-form solution, see [11, 14]). Finally, the computed frame position/orientation is used to compute the new scan plane (i.e., for image ii+1) that passes through the centre of the Z-frame.
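A minimal sketch of FFT-based template matching with quadratic sub-pixel peak refinement, in the spirit of the detection step above: this is our own illustration, not the authors' implementation, and it uses plain cross-correlation rather than the phase-only variant of [7]. All names are ours.

```python
import numpy as np

def match_template_fft(image, template):
    """Cross-correlate `template` with `image` via the FFT and return the
    sub-pixel location (row, col) of the strongest matching peak."""
    F_img = np.fft.fft2(image)
    F_tpl = np.fft.fft2(template, s=image.shape)   # zero-pad to image size
    # Correlation corresponds to multiplication by the conjugate in k-space.
    response = np.fft.ifft2(F_img * np.conj(F_tpl)).real
    r, c = np.unravel_index(np.argmax(response), response.shape)

    def quad_peak(ym, y0, yp):
        # Vertex offset of the parabola through three equally spaced samples.
        denom = ym - 2.0 * y0 + yp
        return 0.0 if denom == 0.0 else 0.5 * (ym - yp) / denom

    h, w = response.shape
    dr = quad_peak(response[(r - 1) % h, c], response[r, c],
                   response[(r + 1) % h, c])
    dc = quad_peak(response[r, (c - 1) % w], response[r, c],
                   response[r, (c + 1) % w])
    return r + dr, c + dc
```

In practice one would keep the seven strongest local maxima rather than a single global one, and validate the resulting pattern against the Z-frame geometry as described above.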
Tracking accuracy and performance were measured in two sets of experiments: (a) tracking of freehand motion and (b) a calibrated accuracy study.

Tracking of Freehand Motion

The Z-frame was placed off-center on the rotating platform inside the scanner bore. With the closed-loop tracking algorithm running, as shown in Figures 3 and 4, the platform was manually rotated in approximately 10° increments from 0–70°.

Accuracy Study

In order to measure tracking accuracy, the Z-frame was held stationary within the scanner while the imaging position and orientation were varied over three axes, namely x, z, and θ, as defined in Figure 1. For x- and z-axis displacements, the images were aligned axially (i.e., with the x-axis in-plane and the z-axis normal to the imaging plane). In-plane motion (along the x-axis) was measured at approximately 1 mm increments; out-of-plane motion (along the z-axis) was measured at approximately 2 mm increments; rotational motion θ was measured at roughly 2° increments. For each axis, ten distinct positions/orientations were imaged, each ten times, for a total of one hundred samples per axis. For each sample, the known image position/orientation was compared against the estimated Z-frame position/orientation, computed with respect to the image. The position of the Z-frame was initialized from an axial baseline image, such that all subsequent displacements were expressed with respect to this baseline.

Fig. 4. Algorithm for detecting and localizing the fiducial frame
3 Results
The closed-loop scan plane control system was able to follow continuous motion of the Z-frame, provided that it did not move out of the imaging plane or cause significant motion artifact during each 2–3 s image acquisition period. Figure 5 shows the rotational motion component (θ) measured during the freehand motion experiment. Tracking performance was noticeably degraded for angles greater than 40–50°.
Fig. 5. Dynamically controlled image plane orientation θ during freehand manipulation of the Z-frame. Closed-loop imaging does not currently include GradWarp correction.
Results of the accuracy study are shown in Figure 6. The detection of in-plane motion (along the x-axis) and rotational motion (θ) are shown in plots (a) and (b). In each case, the scan plane position/orientation and the estimated displacement/rotation are superimposed. Error statistics are summarized in Table 1.
Fig. 6. Accuracy study results: (a) detection of out-of-plane motion along the z-axis, (b) detection of rotational motion about θ. GradWarp correction included; results summarized in Table 1.

Table 1. Z-frame Localization Accuracy

Axis              Average Error   Standard Deviation   RMS Error   Samples
In-plane (x)      0.017 mm        0.026 mm             0.031 mm    N = 100
Out-of-plane (z)  0.089 mm        0.11 mm              0.14 mm     N = 100
Rotation (θ)      0.28°           0.23°                0.37°       N = 90
4 Discussion and Conclusions
Experimental results demonstrate surprisingly good sub-millimeter and sub-degree accuracy when tracking the Z-frame from a single 2D image. While it is not quantified in this study, localization accuracy depends upon the pixel size (field of view / image dimension), which in our experiments is 160 mm / 256 = 0.625 mm. Real-time tracking was noticeably degraded for large scan plane angles with respect to the axial plane, presumably due to the absence of gradient warp (GradWarp) correction. This limitation will be addressed in future work. The results of the accuracy study, listed in Table 1, were measured using a non-real-time pulse sequence in order to include GradWarp correction. This highlights one of the major challenges experienced in such research work, namely the absence of MRI pulse sequences and data flow mechanisms optimized for closed-loop interventional navigation.

GE product pulse sequences were used without modification; however, custom interfaces were designed to interact with the raw data server and RSP modification mechanism, neither of which are supported GE products. The interfaces implemented in this work make use of open-source software architectures and are now publicly available (http://www.ncigt.org/sign/download). At the time of publication, this interface is available only for GE MRI scanners; however, due to the modular architecture of its design, interface drivers for other imaging systems can be integrated without significantly affecting the overall control architecture. The OpenTracker interfaces shown in Figure 3 constitute a complete abstraction of hardware; therefore, this software framework can easily be adapted to MRI systems from other vendors. Plans are already underway for this extension.

A localization approach that does not rely upon additional instrumentation, and that is intrinsically registered to the imaging coordinate frame, is highly desirable for navigating instruments in MRI-guided interventions. This work demonstrates that it is possible to use passive fiducial detection in 3T MRI images for dynamically locating and navigating targeted interventional devices with sufficient accuracy. The approach is primarily feasible for tracking relatively slow motion, as is the case with most clinical robotic assistants. In such applications [10], we are able to control motion in order to synchronize with image-based localization and tracking. However, the approach is not yet suitable for tracking rapid motions, such as may be found in free-hand applications. We are working to accelerate the image update rate, thereby reducing the effect of motion artifact, by means of parallel imaging techniques. In future work, we will develop custom pulse sequences that are further optimized for real-time tracking of fiducials and needles, by taking advantage of parallel imaging methods. This will help to reduce the effect of motion artifact and to increase the field of view. In this work, we did not explore whether localization accuracy is consistent throughout the imaging field of view. This may be an issue when imaging fiducials relatively far from the iso-center of the magnet, and needs to be studied further. The fiducial frame will be reduced in size and integrated with a robotic needle driver for targeted MRI-guided needle biopsy and brachytherapy applications
[10]. The minimum size of the fiducial frame is governed by image resolution, signal-to-noise requirements, the maximum tolerable motion between imaging frames, and the number of degrees of freedom to be measured. For the application described in [10], the current fiducial frame design is conservative and will be made more compact. In addition, we are extending the approach for the tracking and visualization of needle artifacts [8]. Finally, new standards and open interfaces for scanner control and adaptive real-time imaging are required to move MRI beyond its standing as a largely diagnostic imaging modality, in order to enable promising new interventional applications.
References
1. D'Amico, A.V., Tempany, C.M., Cormack, R., Hata, N., Jinzaki, M., Tuncali, K., Weinstein, M., Richie, J.P.: Transperineal magnetic resonance image guided prostate biopsy. Journal of Urology 164(2), 385–387 (2000)
2. Silverman, S.G., Collick, B.D., Figueira, M.R., Khorasani, R., Adams, D.F., Newman, R.W., Topulos, G.P., Jolesz, F.A.: Interactive MR-guided biopsy in an open-configuration MR imaging system. Radiology 197(1), 175–181 (1995)
3. Dumoulin, C.L., Souza, S.P., Darrow, R.D.: Real-time position monitoring of invasive devices using magnetic resonance. Magnetic Resonance in Medicine 29, 411–415 (1993)
4. Derbyshire, J.A., Wright, G.A., Henkelman, R.M., Hinks, R.S.: Dynamic scan-plane tracking using MRI position monitoring. J. Mag. Res. Imag. 8(4), 924–932 (1998)
5. Hushek, S.G., Fetics, B., Moser, R.M., Hoerter, N.F., Russell, L.J., Roth, A., Polenur, D., Nevo, E.: Initial clinical experience with a passive electromagnetic 3D locator system. In: 5th Interventional MRI Symp., Boston, MA, pp. 73–74 (2004)
6. Krieger, A., Fichtinger, G., Metzger, G., Atalar, E., Whitcomb, L.L.: A hybrid method for 6-DOF tracking of MRI-compatible robotic interventional devices. In: Proceedings of the IEEE Int. Conf. on Rob. and Auto., Florida. IEEE Computer Society Press, Los Alamitos (2006)
7. de Oliveira, A., Rauschenberg, J., Beyersdorff, D., Bock, W.S.M.: Automatic detection of passive marker systems using phase-only cross correlation. In: The 6th Interventional MRI Symposium, Leipzig, Germany (2006)
8. DiMaio, S., Kacher, D., Ellis, R., Fichtinger, G., Hata, N., Zientara, G., Panych, L., Kikinis, R., Jolesz, F.: Needle artifact localization in 3T MR images. In: Studies in Health Technologies and Informatics (MMVR), vol. 119, pp. 120–125 (2005)
9. DiMaio, S., Fischer, G., Haker, S., Hata, N., Iordachita, I., Tempany, C., Kikinis, R., Fichtinger, G.: A system for MRI-guided prostate interventions. In: Int. Conf. on Biomed. Rob. and Biomechatronics, Pisa, Italy. IEEE/RAS-EMBS (2006)
10. Fischer, G., et al.: Development of a robotic assistant for needle-based transperineal prostate interventions in MRI. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. LNCS, Springer, Heidelberg (2007)
11. Susil, R., Anderson, J., Taylor, R.: A single image registration method for CT-guided interventions. In: Taylor, C., Colchester, A. (eds.) MICCAI 1999. LNCS, vol. 1679, pp. 798–808. Springer, Heidelberg (1999)
12. Reitmayr, G., Schmalstieg, D.: OpenTracker: an open software architecture for reconfigurable tracking based on XML. In: IEEE Virtual Reality Conference. IEEE Computer Society Press, Los Alamitos (2001)
13. Samset, E., Hans, A., von Spiczak, J., DiMaio, S., Ellis, R., Hata, N., Jolesz, F.: The SIGN: a dynamic and extensible software framework for Image-Guided Therapy. In: Workshop on Open Source and Data for MICCAI, Insight-Journal (2006), http://hdl.handle.net/1926/207
14. Lee, S., Fichtinger, G., Chirikjian, G.S.: Numerical algorithms for spatial registration of line fiducials from cross-sectional images. Medical Physics 29(8), 1881–1891 (2002)
Design and Preliminary Accuracy Studies of an MRI-Guided Transrectal Prostate Intervention System

Axel Krieger1, Csaba Csoma1, Iulian I. Iordachita1, Peter Guion2, Anurag K. Singh2, Gabor Fichtinger1, and Louis L. Whitcomb1
1 Department of Mechanical Engineering, Johns Hopkins University, Baltimore
2 Radiation Oncology Branch, NCI - NIH-DHHS, Bethesda
Abstract. This paper reports a novel system for magnetic resonance imaging (MRI) guided transrectal prostate interventions, such as needle biopsy, fiducial marker placement, and therapy delivery. The system utilizes a hybrid tracking method, comprised of passive fiducial tracking for initial registration and subsequent incremental motion measurement along the degrees of freedom using fiber-optical encoders and mechanical scales. Targeting accuracy of the system is evaluated in prostate phantom experiments. Achieved targeting accuracy and procedure times were found to compare favorably with existing systems using passive and active tracking methods. Moreover, the portable design of the system using only standard MRI image sequences and minimal custom scanner interfacing allows the system to be easily used on different MRI scanners.
1 Introduction
Background and Motivation: Prostate cancer is the most common noncutaneous cancer in American men. For 2007, Jemal et al. [1] estimate 218,890 new cases of prostate cancer and 27,050 deaths caused by prostate cancer in the United States. The current standard of care for verifying the existence of prostate cancer is transrectal ultrasound (TRUS) guided biopsy. TRUS provides limited diagnostic accuracy and image resolution. In [2] the authors conclude that TRUS is not accurate for tumor localization and therefore the precise identification and sampling of individual cancerous tumor sites is limited. As a result, the sensitivity of TRUS biopsy is only between 60% and 85% [3,4]. Magnetic Resonance Imaging (MRI) with an endorectal coil affords images with higher anatomical resolution and contrast than can be obtained using TRUS [2]. Targeted biopsies of suspicious areas identified and guided by MRI could potentially increase the sensitivity of prostate biopsies. Moreover, once a lesion is confirmed as cancerous, MR-guided targeted treatment of the lesion with injections of therapeutic agents, cryoablation, or radio frequency (RF) ablation could be used.
The authors gratefully acknowledge support under grants NIH 1R01EB002963 and NSF EEC-9731748.
Previous Work in MRI Guided Prostate Interventions: MRI-guided transperineal prostate biopsy has been demonstrated inside an open MRI scanner [5] and a conventional closed-configuration MRI scanner [6]. The transrectal approach is generally well tolerated by patients and is considered the standard approach for biopsies. The alternative, transperineal access, requires a longer needle path, which may increase patient discomfort. It also generally requires the patient to be sedated for procedures. Beyersdorff et al. report an MRI-guided transrectal needle biopsy system, which employs a passive fiducial marker sleeve coaxial with the biopsy needle [7]. In this system, the needle position is manually adjusted while the passive marker is imaged. This approach requires repeated high-resolution volume imaging, which takes considerable time to acquire. An endorectal imaging coil cannot be used with this system, which compromises the quality of the MR images. Krieger et al. report an MRI-compatible manipulator for transrectal needle biopsy, using an active tracking method comprised of three micro-tracking coils rigidly attached to the end-effector of the manipulator, providing real-time tracking [8]. The manipulator contains an endorectal imaging coil and uses two fixed-angle needle channels for biopsies of distal and proximal parts of the prostate. However, Krieger et al. identified three disadvantages of this tracking method [9]: (a) the method requires custom scanner programming and interfacing, which limits its portability to different scanners; (b) each tracking coil occupies one receiving scanner channel, limiting the number of imaging coils that can be used simultaneously; (c) frequent failures in the micro-coils and electrical circuit significantly degrade the reliability of the tracking method. In contrast to these approaches, we have developed an MR-guided transrectal prostate interventional system, which employs (a) novel manipulator mechanics with a steerable needle channel in combination with an endorectal imaging coil, and (b) a hybrid tracking method, with the goals of shortened procedure time and significantly simplified deployment of the system on different scanners, while achieving millimeter needle placement accuracy.
2 System Design
Manipulator Design: The needle manipulator assists the physician in inserting a needle to a predetermined target. A manually actuated design for the manipulator was chosen over an automated design, since manual actuation reduces development time and approval time for clinical trials. There is a strong current need for an MRI-guided prostate intervention system as a research validation tool. In particular, MR spectroscopy (MRS) and dynamic contrast enhanced (DCE) MRI are two promising developing MR imaging modalities, whose capabilities in finding cancerous lesions in the prostate can be tested using this intervention system. Moreover, manual actuation for insertion of the needle is preferable to many physicians to obtain visual confirmation of the needle alignment before insertion and haptic feedback during the insertion of the needle. Automated insertion of the needle inside the MRI scanner could potentially allow for real-time visualization of the needle insertion and enable detection of
Fig. 1. Left: Photograph of the MRI-guided transrectal manipulator with the endorectal imaging coil placed in a prostate phantom. Biopsy gun, surface imaging coil and mounting arm are also visible. Right: Close-up photograph of the manipulator. Turning the knobs on the left rotates the endorectal sheath with hinge and needle channel and changes the angle of the steerable needle channel, respectively. An endorectal, single-loop imaging coil is integrated into the sheath.
prostate deformation, misalignment, and deflection of the needle. However, the design for a fully automated manipulator for prostate biopsy would be further complicated by the fact that the tissue specimen has to be removed after each biopsy, which is hard to achieve automatically. In our design, the patient is pulled out of the MRI scanner on the scanner table for the physician to manually actuate the manipulator and insert the biopsy needle. Figure 1 on the left shows the manipulator with its endorectal imaging coil placed in a prostate phantom (CIRS Inc, Norfolk, VA). The position of the manipulator is secured using a mounting arm. The manipulator guides the needle tip of a standard MR compatible biopsy gun (Invivo Germany GmbH, Schwerin, Germany) to a predetermined target in the prostate. A surface imaging coil is placed under the phantom to enhance the MRI signal. Figure 1 on the right shows a close up photograph of the manipulator. The endorectal sheath is inserted in the rectum, such that the hinge is placed close to the anus of the patient. The endorectal sheath contains a single loop imaging coil, which is glued into a machined groove on the sheath. A steerable needle channel is integrated into the sheath. The three degrees of freedom (DOF) to reach a target in the prostate are rotation of the sheath, angulation change of the steerable needle channel, and insertion of the needle. Rotation of the sheath is achieved by turning the larger diameter knob on the left of the manipulator, which directly rotates the sheath with hinge and needle channel. An internal spring washer applies sufficient axial pre-load to avoid unintentional rotation of the sheath. The sheath can be rotated 360 degrees, thus allowing for a variety of patient positions including prone, supine and decubitus. This is further supported by the cone shape of the manipulator, which precludes obstructions for the biopsy gun at all rotational angles (except for the attachment to the mounting arm). In Figure 1, an outline of a prostate is sketched below the sheath, indicating prone positioning of the patient. Needle angle adjustment of the steerable needle channel is controlled by turning the smaller diameter knob on the left of the manipulator. Turning the knob causes an internal rod in the center of the
manipulator axis to be translated forward and backward via a two-stage cumulative screw mechanism. The push rod is connected to the steerable needle channel, thus rotating the needle channel about the hinge axis and controlling the needle angle. A narrow slot on the bottom of the endorectal sheath allows the needle to exit the sheath at an angle between 17.5 and 40 degrees. The mounting arm consists of two parts: a slide and rail assembly (Igus Inc., E. Province, RI) for linear motion into and out of the scanner bore with an integrated locking mechanism, and a custom designed passive arm. The passive arm is comprised of a rigid plastic rod connected with spherical joints to the slide and the manipulator, respectively. A locking mechanism is built into the rod to simultaneously immobilize both joints once the manipulator is placed at its desired location. The mounting arm is designed to be sufficiently strong and rigid to practically preclude deflection of the manipulator, thus allowing an initial registration of the manipulator position to hold during an interventional procedure. The endorectal sheath with the hinge and needle channel is cleaned and sterilized before every procedure. Medical grade heat shrink (Tyco Electronics Corporation, Menlo Park, CA) is fitted around the sheath to keep it cleaner during a procedure. A click-in mechanism, comprised of a flat spring and a small nylon ball, provides fast and easy assembly of the sterilized parts to the manipulator prior to a procedure. The presence of a strong magnetic field inside an MRI scanner precludes the use of any ferromagnetic materials. Nonmagnetic metals can create imaging artifact, caused by a disturbance of the magnetic field due to the difference in susceptibility between the metal and surrounding objects, and their use needs to be minimized. The manipulator is constructed mostly of plastic materials, foremost of Ultem (GE Plastics, Pittsfield, MA), selected for its structural stability, machinability and low cost. The endorectal sheath is built out of medical grade Ultem, since it may contact patient tissue. Only very small nonmagnetic metallic components are placed close to the field of view (FOV) of the prostate: a brass needle channel, a phosphor bronze flat spring for the click-in mechanism of the sheath, and an aluminum hinge axle. Additional brass and aluminum parts located in the mounting arm are spaced sufficiently far from the FOV. Imaging studies revealed that the device did not cause visual artifacts in the FOV.

Hybrid Tracking Method: The hybrid tracking method is comprised of a combination of passive tracking and joint encoders, an approach similar to that reported in [9]. At the beginning of an interventional procedure, the initial position of the device in scanner coordinates is obtained by automatically segmenting fiducial markers placed on the device in MRI images. From this initial position, motion of the device along its DOFs is encoded with fiber-optical and manual encoders. For initial registration, an attachment is placed concentrically over the needle channel of the manipulator (Figure 2, left). The attachment contains two tubular MRI markers (Beekley Corp., Bristol, CT). Two additional markers are placed into the main axis of the manipulator.
Fig. 2. Left: Photograph of manipulator during initial registration. An attachment is placed concentrically over the needle channel. The tube contains two tubular markers. Two additional markers are placed into the main axis of the manipulator. Right: Example of two binary reformatted MR images axial to a fiducial marker. The segmentation algorithm finds the best fitting circle center indicated by a big cross on both images. The algorithm is able to find the center, even when air bubbles in the marker on the left contaminate the image. Small crosses indicate the border of the marker.
Instead of acquiring axial image sets along the axes, which would take several minutes, a thin slab of 1 mm × 1 mm × 1 mm isotropic sagittal turbo spin echo (TSE) proton density images in the plane of the markers is obtained. This reduces the imaged volume significantly and therefore reduces the scan time of this high-resolution image set to 2.5 minutes. In order to aid automatic segmentation of the markers, the sagittal images are reformatted using a custom targeting program as axial images along the main axis of the device and along the needle axis. The tubular markers appear on the reformatted axial images as circles. An algorithm was written based on the Hough transformation, which finds on each binary reformatted image the best-fitting center of a circle with the known diameter of the marker (Figure 2, right; see the sketch below). This segmentation is very robust, even on images containing air bubbles in the marker. Once both axes are calculated from the circle centers using a least-squares minimization, the 6-DOF position of the manipulator is defined. Rotation and needle angle change are redundantly encoded by MRI-compatible fiber-optic encoders and mechanical scales placed on the actuation knobs of the manipulator. The needle insertion depth is read manually using the scale on the needle. Although not present in our current design, it is possible to incorporate a translational optical encoder for the needle insertion. The fiber-optic joint encoders consist of photoelectric sensors (Banner Engineering Corp., Minneapolis, Minnesota) placed in a box in the control room, adjacent to the shielded MRI scanner room. Each sensor is connected to two plastic optical fibers: one for sending the optical signal, and one for receiving it. Only the plastic optical fibers are passed into the scanner room. Full MR compatibility of the joint encoding is achieved, since no electrical signal or power cables are passed into the scanner room. The optical fiber ends of each sensor are placed opposing each other through a code wheel for encoding rotation of the manipulator, and through a code strip for encoding translation of the push rod, thus indirectly encoding needle angle change. A two-channel quadrature design with a third channel as index pulse is used for both encoders. Each sensor provides one channel, so six sensors are necessary to build the two encoders.
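A fixed-radius Hough-style circle-centre search of the kind described above can be sketched as follows. This is our own minimal illustration (hypothetical names), not the authors' code, and it assumes a binary image of the circle boundary (e.g., after an edge-detection step): each boundary pixel votes for all candidate centres one marker radius away, and the accumulator maximum gives the best-fitting centre.

```python
import numpy as np

def hough_circle_center(edge_img, radius):
    """Best-fitting center of a circle of known `radius` (in pixels) in a
    2D binary edge image, via a fixed-radius Hough accumulator."""
    acc = np.zeros_like(edge_img, dtype=np.int32)
    h, w = edge_img.shape
    # Precompute the ring of integer offsets at the given radius.
    angles = np.linspace(0.0, 2.0 * np.pi, int(8 * radius), endpoint=False)
    dy = np.round(radius * np.sin(angles)).astype(int)
    dx = np.round(radius * np.cos(angles)).astype(int)
    ys, xs = np.nonzero(edge_img)               # circle boundary pixels
    for y, x in zip(ys, xs):
        cy, cx = y + dy, x + dx                 # candidate centers radius away
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        np.add.at(acc, (cy[ok], cx[ok]), 1)     # cast one vote per candidate
    return np.unravel_index(np.argmax(acc), acc.shape)
```

Because every boundary pixel votes, a few missing or spurious pixels (such as those caused by air bubbles in the marker) merely lower the peak without moving it, which is consistent with the robustness noted above.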
Encoder resolution for rotation of the manipulator is 0.25 degrees, and for the needle angle it is less than 0.1 degrees at all needle angles. In our present design, the resolution of the encoders is limited by the size of the core diameter (0.25 mm) of the plastic fiber, since the light is not collimated before passing through the code wheel.

Targeting Program: The targeting program runs on a laptop computer located in the control room. The only data transfer between the laptop and the scanner computer is DICOM image transfer. The fiber-optic encoders interface via a USB counter (USDigital, Vancouver, Washington) to the laptop computer. The targeting software displays the acquired MR images, provides the automatic segmentation for the initial registration of the manipulator, allows the physician to select targets for needle placements, provides targeting parameters for the placement of the needle, and tracks rotation and needle angle change provided by the encoders while the manipulator is moved on target.
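For illustration of the two-channel quadrature design with index pulse described above, a decoder is typically a small state table over the A/B channels; the following is our own hypothetical sketch, not the vendor's counter firmware. Note that a resolution of 0.25 degrees per count corresponds to 360/0.25 = 1440 counts per revolution.

```python
# Transition table for quadrature decoding: maps (previous AB state,
# current AB state) to a count delta of +1 or -1; anything else is 0.
_DELTA = {
    (0, 1): +1, (1, 3): +1, (3, 2): +1, (2, 0): +1,   # forward sequence
    (0, 2): -1, (2, 3): -1, (3, 1): -1, (1, 0): -1,   # reverse sequence
}

def decode_quadrature(samples):
    """Decode a sequence of (a, b, index) bit triples into a signed count.

    `a` and `b` are the two quadrature channels; `index` re-zeros the
    count once per revolution, as with a third index-pulse channel.
    """
    count, prev = 0, None
    for a, b, idx in samples:
        state = (a << 1) | b
        if idx:                                        # index pulse: re-zero
            count = 0
        elif prev is not None:
            count += _DELTA.get((prev, state), 0)      # 0 = no/invalid change
        prev = state
    return count
```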
3 Experiments, Results, and Discussion
The system for MRI-guided transrectal prostate interventions was tested in a phantom experiment on a 3T Philips Intera MRI scanner (Philips Medical Systems, Best, NL) using standard MR-compatible biopsy needles and non-artifact-producing glass needles. The experimental setup is shown in Figure 1.

Biopsy Needle Accuracies: The manipulator was placed in a prostate phantom and its initial position was registered. Twelve targets were selected within all areas of the prostate, from base to mid gland to apex, on T2-weighted axial TSE images (Figure 3, first row). For each target, the targeting program calculated the necessary targeting parameters for the needle placement. Rotation, needle angle, and insertion depth to reach the target were displayed in a window, which was projected onto a screen located next to the scanner. The phantom was pulled out of the MRI scanner on the scanner table, and the physician rotated the manipulator, adjusted the needle angle, and inserted the biopsy needle according to the displayed parameters. Since the fiber-optic cables of our present prototype were too short to reach all the way from the control room into the scanner, this experiment was performed without the use of the optical encoders. Instead, solely the mechanical scales on the manipulator were used to encode the rotation and needle angle. Compared to the respective resolutions of the optical encoders of 0.25 degrees and 0.1 degrees, the mechanical scales feature slightly lower resolutions of 1.0 degrees and 0.5 degrees. The phantom was rolled back into the scanner to confirm the location of the needle on axial TSE proton density images, which show the void created by the biopsy needle tip close to the target point (Figure 3, second row). The in-plane error for each of the twelve biopsies, defined as the distance of the target to the biopsy needle line, was subsequently calculated to assess the accuracy of the system (sketched below). The needle line was defined by finding the first and the last slice of the acquired confirmation volume where the needle void is clearly visible; the center of the needle void on the first slice and the center of the void on the last slice define the needle line.
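The in-plane error defined above is the point-to-line distance between the selected target and the needle line through the two void centres. A minimal sketch in our own notation (not the authors' evaluation code):

```python
import numpy as np

def in_plane_error(target, void_first, void_last):
    """Distance from `target` to the needle line through the needle-void
    centers on the first and last confirmation slices (all (3,) arrays,
    in image coordinates)."""
    d = void_last - void_first                  # needle line direction
    d = d / np.linalg.norm(d)
    v = target - void_first
    # Remove the along-needle component; the residual is the distance.
    perp = v - np.dot(v, d) * d
    return float(np.linalg.norm(perp))
```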
Fig. 3. Targeting images, biopsy needle confirmation images, glass needle confirmation images, and in-plane errors for twelve biopsies of a prostate phantom. First and fourth row: Two targets (cross hairs) per image are selected on axial TSE T2-weighted images. The dark cross hair represents the active target. Second and fifth row: The biopsy needle tip void is visualized in an axial TSE proton density image. The desired target approximately matches the actual position of the needle. Third and sixth row: The glass needle tip void is visualized in an axial TSE proton density image. The void for the glass needle is much smaller than for the biopsy needle and closer to the selected target. Numbers indicate the in-plane needle targeting error for the needle placement.
The out-of-plane error is not critical in biopsy procedures due to the length of the biopsy core and was not calculated. Hence, from the standpoint of accuracy, there is no need for a more precise motorized needle insertion. The average in-plane error for the biopsy needles was 2.1 mm, with a maximum error of 2.9 mm.

Glass Needle Accuracies: The void created by the biopsy needle is mostly due to the susceptibility artifact caused by the metallic needle. The void is not concentric around the biopsy needle and depends on the orientation of the needle relative to the direction of the main magnetic field in the scanner (B0) and to the direction of the spatially encoding magnetic field gradients [10]. Consequently, centers of needle voids do not necessarily correspond to actual needle centers. And since the same imaging sequence and a similar needle orientation are used for all targets in a procedure, a systematic shift between needle void and actual needle might occur, which introduces a bias in the accuracy calculations. To explore this theory, every biopsy needle placement in the prostate phantom was followed by a placement of a glass needle to the same depth. The void created by the glass needle is
purely caused by a lack of protons in the glass compared to the surrounding tissue, and is thus artifact-free and concentric to the needle. The location of the glass needle was again confirmed by acquiring axial TSE proton density images (Figure 3, third row). The average in-plane error for the glass needles was 1.3 mm, with a maximum error of 1.7 mm, compared to 2.1 mm and 2.9 mm for the biopsy needles; this is sufficient to target the minimal clinically significant focus size of 1/2 cc [11]. Analyzing the error reveals an average shift between glass needle void location and biopsy needle void location of only 0.1 mm in the L-R direction, but 0.9 mm in the A-P direction. This corresponds to the direction of the frequency encoding gradient of the TSE imaging sequence and is consistent with the findings of [10]. The procedure time for six needle biopsies, not including the glass needle insertions, was measured at 45 minutes. In summary, we reported the results of preliminary phantom experiments to evaluate the feasibility of performing prostate interventions with the proposed system. The phantom experiments show adequate coverage of the prostate gland and demonstrate accurate and fast needle targeting of the complete clinical target volume. The errors and procedure time compare favorably to reported results (average error 1.8 mm and average procedure time of 76 minutes) that Krieger et al. achieved with the active tracking method in initial clinical trials [4]. The hybrid tracking method allows this system to be used on any MRI scanner without extensive systems integration and calibration. The two connections required are the connection of the endorectal imaging coil to a scanner receiver channel and the DICOM image transfer between the scanner computer and the laptop computer running the targeting program. The rigid construction of the mounting arm and manipulator, the optimized manipulator mechanics, and the use of fast actuated biopsy guns suggest that the reported phantom accuracies of the proposed system will translate well to real anatomical accuracies in clinical studies. Institutional review board (IRB) approvals were granted at two clinical sites. Initial clinical results will be reported at the conference.
References
1. Jemal, A., Siegel, R., Ward, E., Murray, T., Xu, J., Thun, M.J.: Cancer statistics, 2007. CA Cancer J. Clin. 57(1), 43–66 (2007)
2. Yu, K.K., Hricak, H.: Imaging prostate cancer. Radiol. Clin. North. Am. 38(1), 59–85 (2000)
3. Norberg, M., Egevad, L., Holmberg, L., Sparén, P., Norlén, B.J., Busch, C.: The sextant protocol for ultrasound-guided core biopsies of the prostate underestimates the presence of cancer. Urology 50(4), 562–566 (1997)
4. Terris, M.K.: Sensitivity and specificity of sextant biopsies in the detection of prostate cancer: preliminary report. Urology 54(3), 486–489 (1999)
5. Hata, N., Jinzaki, M., Kacher, D., Cormak, R., Gering, D., Nabavi, A., Silverman, S.G., D'Amico, A.V., Kikinis, R., Jolesz, F.A., Tempany, C.M.: MR imaging-guided prostate biopsy with surgical navigation software: device validation and feasibility. Radiology 220(1), 263–268 (2001)
6. Susil, R.C., Camphausen, K., Choyke, P., McVeigh, E.R., Gustafson, G.S., Ning, H., Miller, R.W., Atalar, E., Coleman, C.N., Ménard, C.: System for prostate brachytherapy and biopsy in a standard 1.5 T MRI scanner. Magn. Reson. Med. 52(3), 683–687 (2004)
7. Beyersdorff, D., Winkel, A., Hamm, B., Lenk, S., Loening, S.A., Taupitz, M.: MR imaging-guided prostate biopsy with a closed MR unit at 1.5 T: initial results. Radiology 234(2), 576–581 (2005)
8. Krieger, A., Susil, R.C., Menard, C., Coleman, J.A., Fichtinger, G., Atalar, E., Whitcomb, L.L.: Design of a novel MRI compatible manipulator for image guided prostate interventions. IEEE Transactions on Biomedical Engineering 52(2), 306–313 (2005)
9. Krieger, A., Metzger, G., Fichtinger, G., Atalar, E., Whitcomb, L.L.: A hybrid method for 6-DOF tracking of MRI-compatible robotic interventional devices. In: Proceedings - IEEE International Conference on Robotics and Automation, Orlando, FL, United States, vol. 2006, pp. 3844–3849. IEEE Computer Society Press, Los Alamitos (2006)
10. DiMaio, S.P., Kacher, D.F., Ellis, R.E., Fichtinger, G., Hata, N., Zientara, G.P., Panych, L.P., Kikinis, R., Jolesz, F.A.: Needle artifact localization in 3T MR images. Stud. Health. Technol. Inform. 119, 120–125 (2006)
11. Bak, J.B., Landas, S.K., Haas, G.P.: Characterization of prostate cancer missed by sextant biopsy. Clin. Prostate. Cancer 2(2), 115–118 (2003)
Thoracoscopic Surgical Navigation System for Cancer Localization in Collapsed Lung Based on Estimation of Lung Deformation

Masahiko Nakamoto1, Naoki Aburaya1, Yoshinobu Sato1, Kozo Konishi2, Ichiro Yoshino2, Makoto Hashizume2, and Shinichi Tamura1

1 Division of Image Analysis, Graduate School of Medicine, Osaka University, Japan
2 Graduate School of Medical Sciences, Kyushu University, Japan
Abstract. We have developed a thoracoscopic surgical navigation system for lung cancer localization. In our system, the thoracic cage and mediastinum are localized using rigid registration between intraoperatively digitized surface points and a preoperative CT surface model, and the lung deformation field is then estimated using nonrigid registration between the registered and digitized point datasets on the collapsed lung surface and the preoperative CT lung surface model to predict cancer locations. In this paper, improved methods for key components of the system are investigated to achieve clinically acceptable usability and accuracy. Firstly, we implement a non-contact surface digitizer under thoracoscopic control using an optically tracked laser pointer. Secondly, we establish a rigid registration protocol that minimizes the influence of deformation caused by different patient positions by analyzing MR images of volunteers. These techniques were evaluated by in vitro and clinical experiments.
1 Introduction
The detection rate of small early-stage lung cancers has been improved by CT screening, and resection of such small cancers by thoracoscopic surgery has recently become common as a minimally invasive technique. However, one problem is that localization of a small cancer often takes a long time and sometimes even fails under the thoracoscopic view. The lung is greatly deformed by lung collapse, induced by air suction to create a sufficient amount of workspace for the surgical operation (Fig. 1). Thus the cancer position may shift substantially from its original position in a preoperative CT image. Furthermore, the weak tactile feedback of surgical instruments makes cancer localization difficult. Therefore, in order to narrow the region in which a cancer may lie in the collapsed lung, a system which predicts and indicates the cancer position during the surgery is highly desirable. To assist cancer localization, Shimada et al. developed a magnetic cancer tracking system [1]. In this system, a small magnetic marker is embedded near the tumor by CT-guided bronchoscopy just before the surgery, and the tumor can then be localized by tracking the embedded marker during the surgery.
Fig. 1. Axial section of lung after collapse in the lateral position (labeled: collapsed lung, visible surface, workspace, mediastinum surface, vertebral pleural surface, and the invisible surface of deeper anatomy)
Fig. 2. Results of preliminary clinical experiment. (a) Estimated lung shape and cancer positions. Blue points: actual positions (gold standard). Red points: estimated positions. (b) AR thoracoscopic image. Yellow circles: superimposed cancers.
However, this approach requires an additional intervention procedure before the surgery. As an alternative approach that does not involve additional intervention, we have developed a surgical navigation system for cancer localization which estimates the collapsed lung deformation during the surgery [2]. In this system, the thoracic cage and mediastinum are localized using rigid registration between the digitized chest skin surface points and the preoperative CT skin surface model, and the lung deformation is then estimated using nonrigid registration between the digitized and registered collapsed lung surface points and the preoperative CT lung surface model. In our previous work [2], however, there were the following problems. (1) Physical contact of a long digitizing probe with the collapsed lung surface was necessary to acquire the 3D position data of the lung surface. (2) The difference in the patient's position (supine during CT scanning and lateral during the surgery) was not considered in the rigid registration used to localize the thoracic cage and mediastinum during the surgery. The former problem risks damaging the lung and degrades positional accuracy due to surface deformation at the contact point or slight bending of the long probe. The latter may cause significant localization errors for the thoracic cage and mediastinum during the surgery due to skin deformation between the different patient positions, which ultimately affects the cancer localization accuracy. In this paper, we describe improved methods which address the above-mentioned problems. In order to solve them, we aim at the following.
(1) Implementation and test of a non-contact surface digitizer that is compatible with thoracoscopy. (2) Establishment of a rigid registration protocol that minimizes the influence of the different patient positions. Previous studies investigated the use of laser systems for non-contact organ surface digitizing [3][4][5]. However, a dedicated endoscope with an embedded laser pointer, as described in [3], is not widely used, and the laser scanner used in [5] is incompatible with thoracoscopy. In contrast, we combine a conventional laser pointer and a thoracoscope, both of which are tracked by an optical tracker, so as to be widely available and compatible with thoracoscopy. To develop the rigid registration protocol, MR images of several volunteers in the supine and lateral positions are analyzed, and the protocol is derived from the analysis results.
2 Methods

2.1 System Overview
The thoracoscopic surgical navigation system consists of an optical 3D tracker (Polaris, Northern Digital Inc., Canada), an oblique-viewing thoracoscope, a 3D position digitizer (i.e., a stylus probe tracked by the optical 3D tracker), a laser pointer (KOKUYO, IC-GREEN, Japan), and a PC (Xeon 3.0 GHz × 2, 2 GB memory). Optical markers are attached to the thoracoscope, 3D position digitizer, and laser pointer so that their positions and orientations are measured by the optical tracker. We denote their positions and orientations as Tscope, Tdigitizer and Tlaser, respectively, where T is a 4 × 4 matrix representing a rigid transformation defined by a 3 × 3 rotation matrix R and a translational 3D vector t. The oblique-viewing thoracoscope camera is calibrated beforehand by the method described in [6].

In this system [2], the thoracic cage and mediastinum are localized using ICP rigid registration between the digitized chest skin surface points and the preoperative CT skin surface model, and the deformation field due to lung collapse is then estimated by point-based nonrigid registration [7]. The preoperative CT image is deformed by the obtained deformation field, which is continuous and smooth in the whole CT space, and the collapsed lung shape and cancer position are thereby estimated (Fig. 2(a)). The estimated cancer position is superimposed onto live thoracoscopic images, so that a surgeon can find a cancer in the neighborhood of the indicated position (Fig. 2(b)). In the preliminary clinical experiments, the estimation accuracy of the cancer positions was around 10 mm. The acquisition time of the digitized points was 5 minutes, and the computation time for estimation of the lung deformation was also 5 minutes. These results showed the potential usefulness of the system for narrowing the region where the cancer may lie. Although the feasibility of our approach was confirmed, we also found that the accuracy was sometimes unstable (for example, 20 mm or more), which clarified the problems described in the previous section.
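As a concrete illustration of the T = (R, t) notation used throughout this section, the following minimal sketch (ours, not from the paper) assembles a 4 × 4 rigid transform and maps a tracked point between frames:

```python
import numpy as np

def rigid(R, t):
    """Assemble a 4x4 homogeneous rigid transform from R (3x3) and t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def apply(T, p):
    """Map a 3D point p through the rigid transform T."""
    return T[:3, :3] @ p + T[:3, 3]

# e.g., a point measured relative to the laser pointer's markers,
# expressed in the optical tracker's frame:
# p_tracker = apply(T_laser, p_marker)
```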
2.2 Digitizing Surface Points by Laser Pointer
To realize non-contact surface digitizing, we employed a commercially available laser pointer. While the target surface is scanned freehand with the laser pointer, the laser points on the surface are imaged by the thoracoscope. Thoracoscopic images, Tlaser, and Tscope are recorded simultaneously during scanning, and the image processing for 3D coordinate measurement described below is performed in real time. Figure 3 shows the appearance of the laser pointer and an example of laser point detection.

Fig. 3. Detection of laser point from thoracoscopic image. (a) Laser pointer. (b) Original image. (c) Detected point.

We employ a green laser to obtain high contrast between the laser point and the organ surface. To detect the laser point region in the thoracoscopic image, color space conversion is performed and the candidate regions are detected by p-tile thresholding. The center of gravity of each region is calculated, and the region closest to the epipolar line is selected as the laser point. The 3D line passing through the focal point and the detected point, lview, is written as

lview = s Rscope ((px − cx)/fx, (py − cy)/fy, 1.0)^T + tscope,

where s is an arbitrary scalar and (px, py) is the position of the detected point; (cx, cy) and (fx, fy) are the image center and focal length, respectively. The 3D line of the laser, llaser, is written as

llaser = s Rlaser vlaser + Tlaser qlaser,

where vlaser and qlaser are respectively the direction and position defining the laser line relative to the attached optical markers, both obtained during the calibration stage. The 3D position of the laser point is defined as the intersection point of lview and llaser.
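In the presence of measurement noise the two lines are generally skew, so the "intersection" is usually taken as the midpoint of their common perpendicular. A minimal sketch with hypothetical names, assuming both lines are given as a point p and direction d in the tracker frame:

```python
import numpy as np

def triangulate(p_view, d_view, p_laser, d_laser):
    """Midpoint of the common perpendicular of two skew 3D lines."""
    d1 = d_view / np.linalg.norm(d_view)
    d2 = d_laser / np.linalg.norm(d_laser)
    w = p_view - p_laser
    b = d1 @ d2
    d, e = d1 @ w, d2 @ w
    denom = 1.0 - b * b          # ~0 when the lines are nearly parallel
    s = (b * e - d) / denom      # parameter on the viewing ray
    t = (e - b * d) / denom      # parameter on the laser axis
    return 0.5 * ((p_view + s * d1) + (p_laser + t * d2))
```

The residual distance between the two closest points also provides a natural quality check for rejecting low-parallax measurements, such as those discussed in Sect. 3.1.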
2.3 Rigid Registration of Preoperative CT Model
Rigidity evaluation under patient position change using MR images of volunteers. While the visible surface of the collapsed lung is digitized during surgery using the laser pointing system, the subsequent nonrigid registration will be unstable without positional information on the invisible surface of deeper anatomy (invisible surface) attached to the mediastinum and the thoracic cage around the vertebra (see Fig. 1). In our previous system [2], the invisible surface was localized using rigid registration between the actual patient and the preoperative skin surface model, assuming a rigid relation between the invisible and skin surfaces. However, the patient's position changes from supine during preoperative CT scanning to lateral during surgery, and the rigidity between the invisible and skin surfaces may not be preserved after this position change.
We analyzed MR images of 6 volunteers to evaluate the rigidity. MR images of the same subject taken in the lateral and supine positions were registered using regions around the vertebral column, supposing that the thoracic cage is rigid and stable around the vertebral column in spite of the position change. Figure 4 shows axial and coronal sections of the registered images. Although the misalignment over the skin, diaphragm, and mediastinal surfaces was around 10 mm or more due to deformation, the misalignment over a wide range of the backside of the thoracic cage as well as around the vertebral column (i.e., the region enclosed by the rectangle in Fig. 4(a)) was around 2 mm. Observation of the 6 subjects yielded the following findings: (1) A wide range of the backside of the thoracic cage kept its rigidity with the invisible surface around the vertebral column (hereafter called the vertebral pleural surface). (2) The chest skin deformed largely due to the position change and did not keep rigidity with the invisible surface. (3) The median line on the sternum moved largely due to the position change, but only along the posterior-anterior direction in the median plane.
Fig. 4. Deformation caused by position change. (a) Axial section. (b) Coronal section. Green: lateral position. Red: supine position.
Proposed registration protocol. Based on the findings described above, we derive the following rigid registration protocol. (1) Points on the backside of the thoracic cage, acquired intraoperatively using the laser pointing system under the thoracoscope, are registered to the preoperative CT lung surface. (2) Points on the skin along the median line, acquired using a 3D position digitizer, are registered to the preoperative CT median plane. The surface model of the thoracic cage Stc and the median plane Smp are reconstructed from the patient's CT image. The points on the skin along the median line {qj} are acquired first, and the surface points on the thoracic cage {pi} are then digitized after collapse under the thoracoscope. The ICP algorithm is performed by minimizing the following cost function:
E(Trigid) = Σ_{i=1}^{M} |d(Stc, Trigid pi)|^2 + Σ_{j=1}^{N} |d(Smp, Trigid qj)|^2,    (1)
where Trigid is a rigid transformation from the patient space to the CT space, and d(S, p) is a function representing the closest distance between a surface S and a 3D position p. Using the estimated Trigid, the vertebral pleural surface is located.
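For concreteness, a minimal sketch of evaluating the cost of Eq. (1) for one candidate Trigid. It approximates the closest-surface distance d(S, p) by a nearest-neighbor query against dense point samplings of Stc and Smp, which is our own assumption rather than a detail given in the paper:

```python
import numpy as np
from scipy.spatial import cKDTree

def cost(T, p_cage, q_median, tree_tc, tree_mp):
    """Eq. (1): squared closest distances of transformed points to the
    CT thoracic-cage surface and median plane.

    T: 4x4 rigid transform; p_cage, q_median: (M, 3) and (N, 3) arrays
    of digitized points; tree_tc/tree_mp: cKDTree over surface samples.
    """
    def xform(pts):
        return pts @ T[:3, :3].T + T[:3, 3]
    d_tc, _ = tree_tc.query(xform(p_cage))
    d_mp, _ = tree_mp.query(xform(q_median))
    return float(np.sum(d_tc ** 2) + np.sum(d_mp ** 2))
```

In an ICP loop, this cost would be minimized by alternating closest-point matching with a closed-form rigid update (e.g., Horn's method).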
3 Experimental Results

3.1 Evaluation of Laser Surface Digitizing
Laboratory experiments. We compared the accuracy of surface points digitized by the laser pointer with that of the conventional digitizing probe. A phantom was fixed in the training box, and its surface points were acquired by the laser pointer and the digitizing probe through a trocar under thoracoscopic control. The error was defined as the average distance between the digitized surface points and the gold standard surface. The gold standard surface was determined by ICP registration between densely acquired surface points and the surface model reconstructed from CT images of the phantom. The errors of the laser and probe digitizing were 3.6 ± 1.4 mm and 4.6 ± 2.0 mm, respectively; the laser pointer was thus more accurate than the probe.

Clinical experiments. We tested the laser pointing surface digitizer under thoracoscopic control on a real patient according to an IRB approved protocol. The conventional long probe was also employed for comparison with the laser pointing digitizer. Using the non-contact digitizer, 90 points were acquired from 136 measurements: 21 measurements failed to detect the laser point, and 15 measurements were rejected due to small parallax (less than 20 degrees) between the optical axis of the laser and the line of sight to the laser point. The average parallax was 34 degrees and the acquisition time was around 4 minutes. Using the long probe, 53 points were acquired and the acquisition time was around 3 minutes. Figure 5 shows the distribution of the acquired points. The extents of the point datasets acquired by the two digitizers were comparable.
3.2 Simulation Experiments Using Volunteer MR Images to Validate Rigid Registration
The proposed rigid registration protocol was compared with intensity-based rigid registration and the conventional ICP algorithm using the skin surface. The result of the intensity-based method is regarded as ideal, since it uses rich information inside the body that cannot be acquired without intraoperative MR or CT. Rigid registration between MR images in the supine position at inspiration and in the lateral position at expiration was performed for six pairs of MR images using the three protocols. Around 10 and 40 points were acquired from the median line and the backside of the thoracic cage, respectively. The registration error Ereg of the estimated rigid transform Trigid was defined as

Ereg = (1/N) Σ_{i=1}^{N} |d(S, Trigid qi)|,

where {qi, i = 1, ..., N} is a point set acquired from the vertebral pleural surface in the lateral position, and S is the vertebral pleural surface in the supine position.
Fig. 5. Results of clinical experiment for lung surface digitizing. (a) Lateral view. (b) Posterior-anterior view. Red points: acquired by laser pointer. Blue points: acquired by digitizing probe.

Fig. 6. Registration error (mm) of the vertebral pleural surface for the intensity-based method, ICP (chest skin), and ICP (proposed protocol).

Fig. 7. Results on accuracy evaluation for rigid registration. Blue and green points for rigid registration in the upper images ((a) cranial-caudal view, (b) lateral view) are acquired from the median line and the backside of the thoracic cage, respectively. Lower images ((c) intensity-based, (d) ICP (chest skin), (e) ICP (proposed protocol)) are axial sections of registered images in the lateral position. Green: lateral position. Red: supine position.
As a result, the error of the proposed method was 2.3 ± 1.6 mm, comparable with the error of the intensity-based method (Fig. 6). The error of the ICP algorithm using the skin was around 7 mm due to the large deformation caused by the position change. Figure 7 shows axial sections of the registered MR images. In the cases of the intensity-based and the proposed methods, it was confirmed that the misalignment on the vertebral pleural surface was sufficiently small.
4 Discussion and Conclusions
We have described improved methods for key components of our thoracoscopic surgical navigation system for cancer localization. We implemented a non-contact surface digitizer using a conventional laser pointer and a thoracoscope. In the experiments, the accuracy of the non-contact digitizer was better than that of the conventional long digitizing probe, and its clinical feasibility was confirmed: the extent of the points acquired by the non-contact digitizer was comparable to that of the conventional probe. According to the surgeon's opinion, the non-contact digitizer was preferred, since it allowed him to digitize the lung surface without the special care needed to avoid damaging the lung with the conventional probe. We also established a rigid registration protocol based on the evaluation using MR images of volunteers. In the simulation experiment, the registration error of the proposed method was around 2 mm, comparable with that of the intensity-based method. Since the error of rigid registration affects the accuracy of the lung deformation estimation, the proposed method should improve the overall accuracy of the system. Although the system incorporating the non-contact digitizer and the proposed registration protocol has not yet been evaluated as a whole, improvements in accuracy and usability can be expected. Future work includes a validation study of the whole system incorporating the proposed methods and integration with a biomechanical deformation model of the collapsed lung. Point datasets on the collapsed lung surface acquired by our techniques will be utilized as constraints to estimate the biomechanical lung deformation. Our system can serve as a platform for the development and application of such biomechanical techniques.
References
1. Shimada, J., et al.: Intraoperative magnetic navigation system for thoracoscopic surgery and its application to partial resection of the pig lung. In: CARS 2004. Computer Assisted Radiology and Surgery: 18th International Congress and Exhibition, pp. 437–442 (2004)
2. Nakamoto, M., et al.: Estimation of intraoperative lung deformation for computer assisted thoracoscopic surgery. Int. J. Computer Assisted Radiology and Surgery 1(suppl. 1), 273–275 (2006)
3. Nakamura, Y., et al.: Laser-pointing endoscope system for natural 3D interface between robotic equipments and surgeons. Studies in Health Technology and Informatics 81, 348–354 (2001)
4. Krupa, A., et al.: Autonomous 3-D positioning of surgical instruments in robotized laparoscopic surgery using visual servoing. IEEE Transactions on Robotics and Automation 19, 842–853 (2003)
5. Sinha, T.K., et al.: A method to track cortical surface deformations using a laser range scanner. IEEE Transactions on Medical Imaging 24, 767–781 (2005)
6. Yamaguchi, T., et al.: Development of camera model and calibration procedure for oblique-viewing endoscopes. Computer Aided Surgery 9, 203–214 (2004)
7. Chui, H., et al.: A unified non-rigid feature registration method for brain mapping. Medical Image Analysis 7, 113–130 (2003)
Clinical Evaluation of a Respiratory Gated Guidance System for Liver Punctures

S.A. Nicolau1, X. Pennec2, L. Soler1, and N. Ayache2

1 IRCAD-Hopital Civil, Virtual-surg, 1 Place de l'Hopital, 67091 Strasbourg Cedex
{stephane.nicolau, luc.soler}@ircad.u-strasbg.fr
2 INRIA Sophia, Epidaure, 2004 Rte des Lucioles, F-06902 Sophia-Antipolis Cedex
Abstract. We have previously proposed a computer guidance system for liver punctures designed for intubated (free breathing) patients. The limited accuracy reported (1 cm) was mostly due to the breathing motion, which was not taken into account. In this paper, we modify our system to synchronize the guidance information with the expiratory phases of the patient and present an evaluation of our respiratory gated system on 6 patients. Firstly, we show how a specific choice of patients allows us to rigorously and passively evaluate the system accuracy. Secondly, we demonstrate that our system can provide guidance information with an error below 5 mm during expiratory phases.
1 Introduction
CT/MRI guided liver puncture is a difficult procedure that can benefit dramatically from a computer guidance system [3,14,10,5]. Indeed, such systems can reduce the repeated CT/MRI images needed for needle adjustment and the reinsertion attempts that lengthen the intervention duration and increase radiation exposure (when CT-guided). Moreover, they can improve the insertion accuracy, which currently depends on the practitioner's experience. In a previous work [9], we introduced in the operating room a guiding system for radio-frequency thermal ablation (RFA) and showed that this system meets the sterility and cumbersomeness requirements. The system accuracy was then evaluated on patients with a passive protocol neglecting the breathing influence. The accuracy results, around 1 cm, were much larger than those obtained on a phantom (2 mm) [8] (Wacker et al. obtained an equivalent result on a freely breathing pig [14]). Indeed, liver displacement reaches 1 cm during shallow breathing [1,16]. A recent report shows that RFA has to be performed on tumors whose diameter is between 1 and 3 cm [11]. Thus, our radiologists consider that a guidance system has to provide an accuracy better than 5 mm to avoid destroying too many healthy cells when the needle tip is not perfectly centered in the tumor. Consequently, to provide useful guidance information to the practitioner, we cannot neglect the breathing deformations. Several approaches are possible to take the breathing into account. Firstly, we can use the predictive model of
organ positions with respect to the breathing proposed by [4]. Unfortunately, it is not accurate enough for our application (prediction error above 5 mm for the liver). Secondly, we can synchronize the guidance system on a particular point of the breathing cycle, i.e., the preoperative image and the guidance information are respectively acquired and provided at the same point of the respiratory cycle. This approach is motivated by several studies that evaluate the repositioning error of the liver between 1 and 2 mm [1,16,2,12,15]. Therefore, the cumulated error of the system components (≈3 mm) and the repositioning error (≈2 mm) should remain below 5 mm. This reasonable assumption has not yet been demonstrated on either animals or patients: validation has only been performed on cadavers [3] or on living pigs without taking breathing into account [14]. In this paper, we report an in vivo accuracy evaluation of our system with a respiratory gating technique. After a presentation of the system principles, we explain how the choice of specific patients allows us to develop a riskless protocol to rigorously evaluate the system accuracy. Finally, we present the experimental results obtained on 6 patients and demonstrate that the system accuracy fits the clinical requirements when the guidance information is provided during expiratory phases.
2 System Components
In our setup, two jointly calibrated cameras view the patient, who lies on the CT table under general anesthesia and ventilation (70% of RFA procedures are performed under general anesthesia in our local hospital). Ring-shaped radio-opaque markers are stuck on the abdominal skin, and a black dot is printed inside each marker. A preoperative CT acquisition is then performed during an expiratory phase, the markers are removed, and a 3D model of the patient (including the skin, liver, tumors and markers) is automatically obtained from the CT image (cf. top left of Fig. 1) [13]. This patient model is then rigidly registered in the camera frame using the radio-opaque markers, whose positions are extracted in both the CT and video images. Marker extraction and matching are performed automatically, and the registration is performed by minimization of the Extended Projective Point Criterion (EPPC) (algorithmic details are further explained and validated in [7,6]). The needle position is also tracked in real time by the cameras so that we can display on a screen its relative position with respect to the patient model to guide the practitioner (cf. right of Fig. 1). The guidance information is provided only during the expiratory phases. These phases are automatically detected by the system using the reprojection error of the CT markers in the video images: this error, computed in real time, is roughly sinusoidal and minimal during expiration. We recall here the four error sources when the system is used by the practitioner. Three of them are due to the system only: needle tracking, patient registration and organ repositioning. The last error source is the ability of the practitioner to follow the guiding information provided by the system (we call it the guidance error).
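The paper does not give the detection rule beyond noting that the reprojection error is roughly sinusoidal and minimal during expiration. The following is a plausible sketch of such a gate; the tolerance parameter is our own assumption:

```python
import numpy as np

def expiration_gate(err, tol=0.15):
    """Flag frames near the cyclic minimum of the marker reprojection
    error, i.e., the expiratory plateaus.

    err: 1D array of per-frame RMS reprojection errors (pixels).
    tol: fraction of the peak-to-peak amplitude tolerated above the
         minimum before a frame stops counting as expiration.
    """
    err = np.asarray(err, dtype=float)
    lo, hi = err.min(), err.max()
    return err <= lo + tol * (hi - lo)

# Guidance would then be displayed only on frames where
# expiration_gate(...) is True.
```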
Fig. 1. Illustration of the system principles
3 A Safe Evaluation Protocol with Specific Patients
Hepatic tumors sometimes need a contrast agent to be injected into the patient to be visible in the CT modality. For these patients, the clinical protocol to target tumors in interventional CT is slightly different from the standard one. A preoperative CT acquisition of the abdomen is performed with contrast agent. To guide the needle, the practitioner performs a mental registration of the interventional CT slices with the preoperative CT image (in which the tumors are visible). When he thinks the needle is correctly positioned, a second CT acquisition of the patient's abdomen with contrast agent is performed. This second CT acquisition allows the practitioner to check the needle position with respect to the targeted tumor. The additional images available for these patients allow us to perform a passive evaluation of our system using the following data acquisition protocol.

Experimental protocol. Firstly, we stick radio-opaque markers homogeneously on the patient's abdomen, with a black dot printed inside each. Then, a preoperative acquisition CT1 is performed at full expiration (it includes all the markers and the liver). The practitioner removes the markers and attaches to the needle a sterile pattern that allows its tracking. He then inserts the needle until he thinks he has correctly targeted the tumor. After the needle positioning, a stereoscopic video of the patient's abdomen and needle is recorded during several breathing cycles. Finally, a second CT acquisition CT2 is performed at full expiration, with the needle remaining inside the patient (CT2 also includes the whole liver). This protocol does not change the information used by the practitioner to guide the needle: he performs the intervention with his usual means (CT slices) without any advice or instruction from our system. From the acquired experimental data, we can not only evaluate the system accuracy but also check that the needle remains straight during the insertion and that the repositioning error of abdominal structures at expiratory phases is negligible. To perform these three studies, we carry out the following three evaluation processes.

Evaluation of the liver repositioning error (cf. Fig. 2). We extract the spine, liver and skin in both CT1 and CT2. Then, the spine from CT2 is rigidly registered on the spine in CT1 with the Iterative Closest Point
algorithm, and the computed transformation is applied to the liver and skin from CT2. This registration allows us to compare the relative movement of the liver and skin with respect to a common rigid structure. To quantify these movements, we compute the distance between the liver (resp. skin) surface in CT1 and the liver (resp. skin) surface extracted from CT2 and registered in CT1.
Fig. 2. To evaluate the repositioning error of liver and skin, we first register the spines from CT1 and CT2. Then we apply the found rigid transformation to the liver and skin surfaces and measure the distance between both surfaces.

Evaluation of the needle curvature (Fig. 3). The needle in CT2 is extracted and we estimate the orientations of the first and second halves of its length. Then, we compare both orientations using the angular deviation α and the needle deflection.

Fig. 3. Evaluation of the needle curvature
Evaluation of the system accuracy (cf. Fig. 4). Liver and needle surfaces are extracted from CT2. The liver surface in CT2 is rigidly registered (using ICP) on the liver surface in CT1, and the computed transformation is applied to the needle extracted in CT2. This registration provides the final needle position in the CT1 image. Then, we register the patient model from CT1 (with the needle) in the camera reference frame using the video image of the patient at full expiration. Finally, we evaluate the Euclidean distance between the needle tip tracked by the cameras (at the expiration phase) and the needle tip in CT1 registered in the camera frame. We call this distance the system accuracy and emphasize that it is an evaluation of the cumulated errors of the needle tracking, the patient model registration and the organ repositioning. It does not include the guidance error (defined in Sec. 2). Consequently, our experimental protocol allows us to evaluate all the error sources that depend only on the system and not on the practitioner's ability.¹ Alternatively, the measured error corresponds to the final system error if the needle insertion is robotized (in that case the guidance error is negligible).
¹ In fact, we measure a slight over-estimation of the system error: the needle registration from CT2 to CT1 is not perfect (we verify its high accuracy in Sec. 4).
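The error measurement itself is a short chain of rigid transforms. A minimal sketch under the notation of this section (T_21: CT2-to-CT1 liver registration; T_c1: CT1-to-camera registration; all names are ours):

```python
import numpy as np

def apply(T, p):
    """Map a 3D point p through a 4x4 rigid transform T."""
    return T[:3, :3] @ p + T[:3, 3]

def system_accuracy(T_21, T_c1, tip_ct2, tip_tracked_cam):
    """Euclidean distance between the needle tip registered from CT2
    into the camera frame and the tip tracked by the cameras at the
    same expiratory phase."""
    tip_ct1 = apply(T_21, tip_ct2)    # CT2 needle into CT1
    tip_cam = apply(T_c1, tip_ct1)    # CT1 model into camera frame
    return float(np.linalg.norm(tip_cam - tip_tracked_cam))
```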
Fig. 4. Illustration of the passive protocol to evaluate the system accuracy
4 Evaluation of the System on Six Clinical Cases
Six patients (5 males and 1 female, aged between 50 and 60) participated in our experiments (they signed an agreement form). They all had tumors whose diagnosis led to an RF thermal ablation. The resolution of the CT images was 1 × 1 × 2 mm³. Below are presented the results obtained for the three experimental evaluations described in the previous section.

Verification of the needle rigidity assumption. One can see in Tab. 1 that the needle deflection is not negligible in 30% of cases, as it can reach 2.5 mm. Since the system assumes that the needle remains straight, the needle tip position provided by the system is systematically biased when there is an important deflection. Visual illustrations of a deflection are provided in Fig. 5. We are aware that the practitioner sometimes bends the needle on purpose to avoid a critical structure. For all the reported cases, the practitioner estimated that this was not the case. Consequently, we have measured here the uncontrollable bending of the needle.

Fig. 5. Lateral and axial views of the needle (patient 2)

Table 1. Left table: Evaluation of the needle curvature after its positioning in the patient. Right table: Distance between the registered surfaces of spine, liver and skin.
            angular deviation α (°)   needle deflection (mm)
Patient 1          1.0                       0.85
Patient 2          2.8                       2.5
Patient 3          0.5                       0.4
Patient 4          0.6                       0.5
Patient 5          1.1                       1.0
Patient 6          1.8                       1.82

d(S1, S2) in mm   Spine   Liver   Skin
Patient 1          0.8     1.5     1.6
Patient 2          0.8     1.2     1.8
Patient 3          0.9     1.4     1.9
Patient 4          1.1     1.5     3.2
Patient 5          0.9     1.7     1.8
Patient 6          1.2     1.82    1.7
Average            0.95    1.6     2.0
Evaluation of the organ repositioning error. To quantify the distance between two registered surfaces S1 and S2, we compute the average of the distances between each point on Si and the surface Sj:

d(S1, S2) = sqrt( ( Σ_{Mi∈S1} d(Mi, S2)^2 + Σ_{Pi∈S2} d(Pi, S1)^2 ) / ( 2 · (card(S1) + card(S2)) ) ),

where the distance d(Mi, S) between a point Mi and a surface S is interpolated from the 3 closest points of Mi belonging to S. One can see in Tab. 1 (right) that the distance between liver surfaces is within 2 mm for each patient, which is of the same magnitude as the segmentation uncertainty. To check that the measured distances are not due to a pure translation, we display the relative position of both surfaces. Fig. 6 (left columns) clearly shows that the surfaces are closely interlaced for all patients. This means that the observed distance is essentially due to the segmentation error in the CT acquisitions and that the repositioning error of the liver is about 1 mm.
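A sketch of this symmetric measure, simplifying the paper's 3-closest-point interpolation to a nearest-vertex query (our shortcut):

```python
import numpy as np
from scipy.spatial import cKDTree

def surface_distance(S1, S2):
    """Symmetric distance between two surfaces given as (N, 3) vertex
    arrays, following the formula above with d(M, S) approximated by
    the nearest vertex of S."""
    d12, _ = cKDTree(S2).query(S1)   # each point of S1 to S2
    d21, _ = cKDTree(S1).query(S2)   # each point of S2 to S1
    num = np.sum(d12 ** 2) + np.sum(d21 ** 2)
    return np.sqrt(num / (2.0 * (len(S1) + len(S2))))
```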
Fig. 6. Visual check of liver and skin repositioning errors on 3 patients (resp. left and right columns). Two opposite views of registered surfaces are provided for each patient.
Oddly, the distances between skin surfaces are not low for every patient. A visual check of the registered surfaces (see right columns in Fig. 6) shows that for these patients the skin of the lower part of the abdomen moved between the two CT acquisitions. An inspection of both CTs indicates that a gas movement in the stomach and bowels was responsible for this deformation. We highlight that this skin deformation severely disturbs the system if we take the radio-opaque markers on the deformed zone into account when computing the patient model registration. Indeed, the system implicitly assumes that the relative position of the liver w.r.t. the markers remains rigid during the intervention. Consequently, the skin deformation can lead to a wrong estimation of the liver position.
We notice that this phenomenon of gas movement happened mainly when the practitioner used the US probe, which means that the system should be used with care if a US probe is manipulated. To avoid this problem, we can position the radio-opaque markers on the upper part of the abdomen only, a zone not influenced by gas movements. In the following, we did not use radio-opaque markers on a deformed zone to evaluate the system accuracy.

Influence of breathing on the system accuracy. During the 6 interventions, the needle and the patient were video tracked over several breathing cycles. In Tab. 2, one can read for each patient the system accuracy, the 3D/2D reprojection error of the CT markers in the video images, and the 3D/3D registration error between the CT markers and the markers reconstructed from the video images. These values were averaged over the expiratory phases that were video recorded. Additionally, we report in Fig. 7 a sample of the system accuracy and the 3D/2D and 3D/3D registration errors for patients 1 and 2 during 4 breathing cycles.
Fig. 7. Sample of system accuracy and registration errors reported during several breathing cycles with patients 1 and 2
The results in Tab. 2 clearly indicate that for all patients the system accuracy during expiratory phases is between 4 and 5 mm. The two worst results were obtained for patients whose abdominal zone was deformed between the preoperative and control CT acquisitions; in those cases, fewer markers could be used to compute the patient model registration. Note that including markers that had moved between the CT acquisitions in the registration computation leads to much worse accuracy (above 1 cm). For patients whose needle was bent, we evaluated the system accuracy after taking the observed curvature into account. This showed that if the rigidity assumption of the needle held, the system accuracy would be slightly better (by about 0.5 mm). One can see in Fig. 7 that the RMS errors evolve cyclically, as expected, and are always minimal in expiration phases. The system accuracy also evolves cyclically but is not always minimal in expiration phases. Indeed, since the patient model registration is not perfect, the system can register the needle extracted from the CT at a position that corresponds to an intermediate phase of the breathing cycle (whereas it should be registered at the expiratory position).
Table 2. Average for each patient of the system error and of the 3D/2D and 3D/3D registration errors during expiration phases. The system provides guiding information during expiratory phases with an average accuracy below 5 mm. Values in parentheses correspond to the results obtained when the markers on an abdominal zone deformed by gas motion are used for the patient model registration. Values in square brackets correspond to the system accuracy re-evaluated after compensation of the important needle curvature (only for patients 2 and 6).
            Number of      RMS 3D/2D   RMS 3D/3D   System
            markers used   (pixel)     (mm)        accuracy (mm)
Patient 1   15             1.3         1.5         4.0
Patient 2   13             1.0         1.7         4.2 [3.5]
Patient 3   6 (15)         1.2 (2.2)   1.4 (2.5)   5.2 (14.5)
Patient 4   12             1.5         1.2         4.1
Patient 5   8 (13)         0.9 (2.0)   1.5 (2.4)   4.8 (12.3)
Patient 6   14             1.2         1.2         4.3 [3.9]
Average     11.5           1.18        1.44        4.3

5 Conclusion
We have developed a computer system to guide percutaneous liver punctures in interventional radiology. To tackle the breathing motion issue, which induces a liver movement above 1 cm, we propose a respiratory gating technique: we synchronize the preoperative CT acquisition and the guidance step with the expiratory phases of the patient, assume pseudo-static conditions, and rigidly register the patient model. To rigorously assess the system accuracy and the pseudo-static assumption on real patients, we propose a passive protocol on carefully chosen patients that allows us to obtain a ground truth CT at the end of the needle insertion. Experimental results show firstly that the liver repositioning error is about 1 mm, whereas it is sometimes much larger for the skin because of gas movement in the bowels. This phenomenon can dramatically decrease the system accuracy if markers on the deformed zone are used to compute the patient model registration; to avoid this problem, markers have to be positioned only around the ribs. Secondly, we have evaluated that the needle curvature can cause a needle tracking error above 2 mm (although the practitioner thought the needle was not bent). Despite these uncertainties, we have finally shown that our system accuracy during the patient's expiratory phases is about 4.5 mm, which fits the medical requirements. We are now investigating the integration of an electromagnetic tracker into the current system so that we can directly track the needle tip (although this remains a challenge due to the presence of ferromagnetic objects in the operating room). Last but not least, a validation study including the needle manipulation by the practitioner is planned for next year.
References
1. Balter, J.M., et al.: Improvement of CT-based treatment-planning models of abdominal targets using static exhale imaging. IJROBP 41(4), 939–943 (1998)
2. Dawson, L., et al.: The reproducibility of organ position using active breathing control (ABC) during liver radiotherapy. IJROBP 51, 1410–1421 (2001)
3. Fichtinger, G., et al.: Image overlay guidance for needle insertion in CT scanner. IEEE Transactions on Biomedical Engineering 52(8), 1415–1424 (2005)
4. Hostettler, A., Nicolau, S.A., Forest, C., Soler, L., Remond, Y.: Real time simulation of organ motions induced by breathing: First evaluation on patient data. In: Harders, M., Székely, G. (eds.) ISBMS 2006. LNCS, vol. 4072, pp. 9–18. Springer, Heidelberg (2006)
5. Mitschke, M., et al.: Interventions under video-augmented X-ray guidance: Application to needle placement. In: Delp, S.L., DiGoia, A.M., Jaramaz, B. (eds.) MICCAI 2000. LNCS, vol. 1935, pp. 858–868. Springer, Heidelberg (2000)
6. Nicolau, S., et al.: An augmented reality system to guide radio-frequency tumor ablation. Jour. of Computer Animation and Virtual World 16(1), 1–10 (2005)
7. Nicolau, S., Pennec, X., Soler, L., Ayache, N.: An accuracy certified augmented reality system for therapy guidance. In: Pajdla, T., Matas, J. (eds.) ECCV 2004. LNCS, vol. 3023, pp. 79–91. Springer, Heidelberg (2004)
8. Nicolau, S., Schmid, J., Pennec, X., Soler, L., Ayache, N.: An augmented reality & virtuality interface for a puncture guidance system: Design and validation on an abdominal phantom. In: Yang, G.-Z., Jiang, T. (eds.) MIAR 2004. LNCS, vol. 3150, pp. 302–310. Springer, Heidelberg (2004)
9. Nicolau, S.A., Pennec, X., Soler, L., Ayache, N.: A complete augmented reality guidance system for liver punctures: First clinical evaluation. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 539–547. Springer, Heidelberg (2005)
10. Patriciu, A., et al.: Robotic assisted radio-frequency ablation of liver tumors: a randomized patient study. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3750, pp. 526–533. Springer, Heidelberg (2005)
11. Pereira, P.L.: Actual role of radiofrequency ablation of liver metastases. In: European Radiology (February 15, 2007)
12. Remouchamps, V., et al.: Significant reductions in heart and lung doses using deep inspiration breath hold with active breathing control and intensity-modulated radiation therapy for patients treated with locoregional breast irradiation. IJROBP 55, 392–406 (2003)
13. Soler, L., et al.: Fully automatic anatomical, pathological, and functional segmentation from CT scans for hepatic surgery. Computer Aided Surgery 6(3) (2001)
14. Wacker, F., et al.: An augmented reality system for MR image-guided needle biopsy: Initial results in a swine model. Radiology 238(2), 497–504 (2006)
15. Wagman, R., et al.: Respiratory gating for liver tumors: use in dose escalation. Int. J. Radiation Oncology Biol. Phys. 55(3), 659–668 (2003)
16. Wong, J., et al.: The use of active breathing control (ABC) to reduce margin for breathing motion. Int. J. Radiation Oncology Biol. Phys. 44(4), 911–919 (1999)
Rapid Voxel Classification Methodology for Interactive 3D Medical Image Visualization

Qi Zhang, Roy Eagleson, and Terry M. Peters

Imaging Research Laboratories, Robarts Research Institute, Biomedical Engineering, University of Western Ontario, London, Ontario, N6A 5K8, Canada
{qzhang, eagleson, tpeters}@imaging.robarts.ca

Abstract. In many medical imaging scenarios, real-time high-quality anatomical data visualization and interaction is important for the physician to meaningfully diagnose 3D medical data and obtain timely feedback. Unfortunately, it is still difficult to achieve an optimized balance between real-time artifact-free medical image volume rendering and interactive data classification. In this paper, we present a new segment-based post color-attenuated classification algorithm to address this problem. In addition, we apply an efficient numerical integration technique and take advantage of the symmetric storage format of the color lookup table generation matrix. When implemented within our GPU-based volume raycasting system, the new classification technique is about 100 times faster than the unaccelerated pre-integrated classification approach, while achieving similar or even superior volume-rendered image quality. In addition, we propose an objective measure of artifacts in rendered medical images based on high-frequency spatial image content.
1 Introduction
The dramatically increased capabilities of computerized radiology equipment, such as CT or MRI, have made 3D and 4D (space and time) medical images ubiquitous in surgical planning, diagnosis and therapy. Direct volume rendering has proven to be an effective and flexible method for clinical dataset visualization [1]. However, in practice, the lack of an efficient strategy to map scalar voxel values to appropriate optical properties limits its wide application in medicine [2]. The transfer function (TF) provides the mapping from scalar values to emitted radiant colors and extinction coefficients, thus making the scalar data visible and able to isolate specific features in the rendered medical data. According to the sampling theorem, a continuous signal can be correctly reconstructed from its values at discrete sampling points with a sampling rate higher than the Nyquist frequency. The TF can either be applied directly to the discrete voxel points before the data interpolation step, or alternatively to the sampling points derived from the interpolation of the nearby voxel points, i.e., after the data interpolation. These processes are called pre- and post-classification, respectively [3]. However, both algorithms introduce high spatial frequency artifacts
Fig. 1. The schematic of two numerical integration computation methods and the corresponding volume rendered clipped cardiac images: (a) two numerical integration calculation approaches with one marked approximating step; (b) volume rendered image with undersampling artifacts whose pre-integrated classification LUT is approximated using the middle Riemann sum with k set to 8; (c) artifact-free volume rendered image whose post color-attenuated classification LUT is approximated using the trapezoidal rule based recursive formula with the sampling step set to 1
into the original signal. Pre-classification (PreC) simply cuts off the high frequencies, making the rendered image fuzzy, while post-classification (PosC) takes them into account, resulting in observable undersampling artifacts in the final images. To address this problem, Röttger et al. [4] and Engel et al. [5] proposed a pre-integrated classification (PIC) algorithm, which renders the volume slab-by-slab, instead of slice-by-slice. Because of the high computational cost of the color lookup table (LUT) generation, Engel et al. also proposed a hardware acceleration technique for texture-mapping-based volume rendering [5]. However, this dependent-texture-based acceleration approach is not suitable for our raycasting system implemented on a graphics processing unit (GPU) [6], which usually produces superior images to those generated by texture mapping. Moreland and Angel [7] presented a partial PIC method to accelerate unstructured data classification. However, this algorithm needs to load a very large integration table into graphics memory during the raycasting process, which seriously degrades the rendering efficiency.
2 Method

2.1 Pre- and Post-classification
In our application, the classification is implemented through a TF editor, which creates a LUT. The editor is designed graphically, allowing the user to interact with four one-dimensional piecewise curves, where the range [0, L] (L = 255) on the x-axis represents the scalar values and [0, L] on the y-axis describes the mapped optical properties. In the process of PreC, each scalar value in the 3D volume is first cast to the range [0, L], and then mapped to color and opacity by the LUT. Next, the classified data are loaded into the GPU fragment shader as a 3D texture, which is utilized in the raycasting computation. In the PosC application, the generated
LUT is first loaded into the GPU as a 1D texture, and then the volumetric data are transformed to texture space and also loaded into the GPU as a 3D texture with a single alpha component storing the scalar value. At each sampling point along the casting ray on the fragment shader, a 3D texture lookup is employed to fetch the interpolated scalar value, and then a 1D texture lookup is applied to map it to color and opacity.
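The distinction between the two orders of operation is easiest to see on a single sample between two voxels. A minimal 1D sketch (illustrative only, not the GPU shader code):

```python
import numpy as np

# lut: (256, 4) RGBA transfer-function table; s0, s1: integer scalar
# values of the two voxels bracketing a sample; w in [0, 1].

def pre_classification(lut, s0, s1, w):
    """Classify each voxel first, then interpolate the RGBA values."""
    return (1.0 - w) * lut[s0] + w * lut[s1]

def post_classification(lut, s0, s1, w):
    """Interpolate the scalars first, then classify the result."""
    s = (1.0 - w) * s0 + w * s1
    return lut[int(round(s))]
```

Pre-classification filters away the high frequencies the TF introduces (a fuzzy result), while post-classification keeps them (sharper, but prone to the undersampling artifacts discussed above).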
2.2 Pre-integrated Classification
In order to form an objective comparison to our new algorithm, we have characterized a generic pre-integrated classification method as follows. For two integers sf and sb (sf, sb ∈ [0, L]) on the x-axis of the TF editor, we calculate the integrated opacity and opacity-associated color using Eqs. (1) and (2), where c(x) (x ∈ [sf, sb]) is the non-associated color derived from the TF, and τ(x) is the corresponding extinction coefficient:

α(sf, sb) ≈ 1 − exp( − ∫_0^1 τ((1 − ζ)sf + ζsb) dζ ),    (1)

C̃(sf, sb) ≈ ∫_0^1 τ((1 − ζ)sf + ζsb) c((1 − ζ)sf + ζsb) exp( − ∫_0^ζ τ((1 − ζ′)sf + ζ′sb) dζ′ ) dζ.    (2)

All the possible combinations of sf and sb are calculated, resulting in a 2D LUT. As shown in the left part of Fig. 1(a), the middle Riemann sum is used to approximate the integrals expressed by Eqs. (1) and (2), and the sampling step d is set to |sb − sf|/n, where n = k + |sb − sf|. Based on our experiments, k should usually be chosen larger than 15 to ensure sufficient accuracy and to avoid the undersampling artifacts shown in Fig. 1(b).
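A brute-force sketch of this baseline LUT construction (ours, for illustration): for each (sf, sb) pair, the integrals of Eqs. (1) and (2) are approximated with the middle Riemann sum, which is exactly what makes the table O(L²·n) to build and hence noninteractive:

```python
import numpy as np

def preintegrated_lut(color, tau, k=16):
    """color: (L+1, 3) non-associated RGB TF; tau: (L+1,) extinction.
    Returns an (L+1, L+1, 4) table of associated color and opacity."""
    L = len(tau) - 1
    xs = np.arange(L + 1)
    lut = np.zeros((L + 1, L + 1, 4))
    for sf in range(L + 1):
        for sb in range(L + 1):
            n = k + abs(sb - sf)
            zeta = (np.arange(n) + 0.5) / n           # midpoint samples
            s = (1.0 - zeta) * sf + zeta * sb
            t = np.interp(s, xs, tau) / n             # tau * dzeta per step
            c = np.stack([np.interp(s, xs, color[:, ch])
                          for ch in range(3)], axis=1)
            trans = np.exp(-(np.cumsum(t) - t))       # exp(-int_0^zeta tau)
            lut[sf, sb, :3] = np.sum(t[:, None] * c * trans[:, None], axis=0)
            lut[sf, sb, 3] = 1.0 - np.exp(-t.sum())
    return lut
```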
The generated LUT is then loaded into the GPU fragment shader as a 2D texture for classification. Because no high frequency components are introduced by the nonlinear TF, the PIC has fewer undersampling artifacts and higher image quality than the PreC and PosC algorithms. However, since we need to reconstruct the LUT whenever the TF changes, it is difficult to perform classification interactively.
2.3 Post Color-Attenuated Classification
Considering the noninteractivity of the pre-integrated LUT generation, we propose a new post color-attenuated classification (PCAC) algorithm to evaluate the segment optical properties rapidly. We first calculate the non-associated color integration for integers sf and sb on the x-axis of the TF editor using Eq. (3), where c(x) is the non-associated color as described in subsection 2.2:

C(sf, sb) ≈ ∫_0^1 c((1 − ζ)sf + ζsb) dζ = (1/(sb − sf)) ( ∫_0^{sb} c(ω) dω − ∫_0^{sf} c(ω) dω ).    (3)
Fig. 2. Image quality comparisons of four volume classification algorithms: (a) pre-classification, (b) post-classification, (c) pre-integrated, (d) post color-attenuated. The visualization engine is a raycasting volume renderer implemented on a GPU, and the data is a 3D cardiac MR image.
Next we calculate the corresponding opacity using an alternative form of α:

α(sf, sb) ≈ 1 − exp( − (1/(sb − sf)) ( ∫_0^{sb} τ(ω) dω − ∫_0^{sf} τ(ω) dω ) ).    (4)

Using the procedures described in the following subsections 2.4 and 2.5, we can build a 2D LUT from Eqs. (3) and (4), and then load it into the GPU fragment shader as a 2D texture. During the raycasting process on the fragment shader, for every two consecutive sampling points x(i) and x(i + 1) on the casting ray x(t), we first fetch the scalar values sf = s(x(i)) and sb = s(x(i + 1)) through a 3D texture lookup; these are then utilized as texture coordinates to address the 2D LUT and obtain the non-associated color Ci = C(sf, sb) and opacity αi = α(sf, sb) of the segment x(i)x(i + 1). According to the model of optical light emission and absorption, when colored light passes through a segment, its intensity is attenuated by the segment's opacity. On the casting ray, we therefore compute the color intensity attenuation segment by segment. The opacity-associated color of the segment x(i)x(i + 1) is calculated by the formula C̃i = C̃(sf, sb) = αi(sf, sb) Ci(sf, sb), and the volume rendering equation Σ_{i=0}^{n} C̃i ∏_{j=0}^{i−1} (1 − αj) is utilized to compute the final rendered pixel. Because we calculate the color attenuation of the whole ray segment after determining the color accumulation within it, we refer to our algorithm as segment-based post color-attenuated classification.
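A CPU-side sketch of this segment-based compositing (the actual implementation runs in a GLSL fragment shader; the names and structure here are ours):

```python
import numpy as np

def composite_ray(samples, lut):
    """Front-to-back compositing with a 2D segment LUT.

    samples: integer scalar values s(x(0)), s(x(1)), ... along one ray.
    lut: (L+1, L+1, 4) table; lut[sf, sb] = (C_r, C_g, C_b, alpha),
         with C the non-associated segment color of Eq. (3).
    """
    rgb = np.zeros(3)
    transparency = 1.0                       # prod_j (1 - alpha_j)
    for sf, sb in zip(samples[:-1], samples[1:]):
        C, a = lut[sf, sb, :3], lut[sf, sb, 3]
        rgb += transparency * a * C          # C~_i = alpha_i * C_i
        transparency *= 1.0 - a
    return rgb, 1.0 - transparency
```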
Our PCAC algorithm is different from the accelerating method proposed in [5], which substitutes c(ω)τ(ω) for c(ω) in Eq. (3) to calculate the opacity-associated color C̃(sf, sb) of the segment x(i)x(i + 1), or directly employs the voxel's associated color c̃(ω) to calculate the segment's opacity-associated color C̃(sf, sb). Here, c(ω) is the non-associated color and τ(ω) is its extinction coefficient, both derived from the scalar value ω (ω ∈ [sf, sb]) through the TF. That algorithm therefore computes the color attenuation voxel by voxel within the ray segment. However, this voxel-based associated color of the segment x(i)x(i + 1) cannot produce a correct rendering result when used in the volume rendering equation described above.
2.4 Efficient Numerical Integration Computation
From subsection 2.2, we note that the most time-consuming part of the pre-integrated LUT generation is the numerical integration calculation, since the sampling step changes for every different segment; when we compute a new integration, the previous results therefore cannot be reused. As demonstrated by Fig. 1(a), when estimating a nonlinear TF, a piecewise linear function is usually more accurate than a piecewise constant function. Here, we propose a recursive formula for efficient and accurate calculation of the numerical integrals. Each channel of our TF can be considered as a continuous function f(x) passing through the evenly separated points f(0), f(1), · · ·, f(L). We use the trapezoidal rule, a linear approximation of Simpson's rule, to approximate the definite integral from i to i + 1 (i = 0, 1, · · ·, L − 1):
$$\int_i^{i+1} f(x)\,dx \approx \frac{1}{6}\left[f(i) + 4f\!\left(i+\tfrac{1}{2}\right) + f(i+1)\right] \approx \frac{1}{2}f(i) + \frac{1}{2}f(i+1) \quad (5)$$

When $|f''(x)| \le K$ for all $x \in [i, i+1]$, the approximation error is less than $K/12$. Since each channel of our TF is designed to be smooth between every two consecutive sampling points, we can ensure that $K$ is sufficiently small. The integral $\Phi(N) = \int_0^N f(x)\,dx$ ($N = 0, 1, \cdots, L$) can then be evaluated with the following recursive formula:
$$\Phi(N) \approx \sum_{i=0}^{N} \frac{1}{2}\left[f(i) + f(i+1)\right] = \Phi(N-1) + \frac{1}{2}\left[f(N) + f(N+1)\right] \quad (6)$$

Here, $f(L+1) = 0$, and when $N = 0$, $\Phi(N-1)$ is equal to zero. We can see that in our integral calculation, only $L+1$ calculations are needed to get all the integrals required by Eqs. (3) and (4) for the LUT generation.
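As a sketch, the recursion of Eq. (6) can be transcribed directly into code; the array layout below is an assumption made for illustration.

```python
import numpy as np

def prefix_integrals(f):
    """Running integrals Phi(0..L) of one TF channel via the recursion of Eq. (6).

    f : array of L+1 TF samples f(0), ..., f(L); f(L+1) is taken as 0.
    """
    f = np.append(np.asarray(f, float), 0.0)   # f(L+1) = 0, as in the text
    phi = np.zeros(len(f) - 1)
    prev = 0.0                                 # Phi(-1) = 0
    for n in range(len(phi)):
        prev += 0.5 * (f[n] + f[n + 1])        # Phi(N) = Phi(N-1) + (f(N)+f(N+1))/2
        phi[n] = prev
    return phi
```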
2.5 Symmetric Matrix Filling
Since the two endpoints of every segment are symmetric, we only need to compute the numerical integrals of $C(s_f, s_b)$ and $\alpha(s_f, s_b)$ for $s_f \le s_b$ using Eqs. (3) and (4), in which the color and opacity integrals $\int_0^s c(x)\,dx$ and $\int_0^s \tau(x)\,dx$ are calculated using Eq. (6). This approach creates a symmetric matrix $M_{i,j,k}$ ($i, j = 0, \cdots, L$; $k = 0, \cdots, 3$) for color and opacity mapping, and we fill the entries $M(i,j,k)$ and $M(j,i,k)$ synchronously with the same value, reducing the matrix generation time. This matrix is then loaded into the GPU fragment shader as a 2D texture LUT for scalar value classification.
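Combining Eq. (6) with the symmetric filling, the full LUT generation might look as follows. This sketch reuses the prefix_integrals helper above; the handling of the $s_f = s_b$ diagonal is an assumption, since the source does not spell it out.

```python
import numpy as np

def build_lut(color_tf, tau_tf):
    """Fill the symmetric (L+1) x (L+1) pre-integration LUT from Eqs. (3)-(4).

    color_tf : (L+1, 3) non-associated TF colors; tau_tf : (L+1,) extinction."""
    L = len(tau_tf) - 1
    c_int = np.stack([prefix_integrals(color_tf[:, k]) for k in range(3)], axis=1)
    t_int = prefix_integrals(tau_tf)
    C = np.zeros((L + 1, L + 1, 3))
    A = np.zeros((L + 1, L + 1))
    for sf in range(L + 1):
        for sb in range(sf, L + 1):            # only sf <= sb, then mirror
            if sf == sb:                       # diagonal: limiting case of Eqs. (3)-(4)
                c = color_tf[sf]
                a = 1.0 - np.exp(-tau_tf[sf])
            else:
                d = sb - sf
                c = (c_int[sb] - c_int[sf]) / d                   # Eq. (3)
                a = 1.0 - np.exp(-(t_int[sb] - t_int[sf]) / d)    # Eq. (4)
            C[sf, sb] = C[sb, sf] = c          # symmetric fill (Sect. 2.5)
            A[sf, sb] = A[sb, sf] = a
    return C, A
```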
Table 1. Performance comparisons of volume rendering (Rend., frames per second) and classification (Clas., milliseconds per classification) using five medical datasets.

| Vol. No. | Dimension | Voxel Size (mm) | Pre-Class. Rend. | Pre-Class. Clas. | Post-Class. Rend. | Post-Class. Clas. | Pre-Integ. Rend. | Pre-Integ. Clas. | P. Col. Att. Rend. | P. Col. Att. Clas. |
|---|---|---|---|---|---|---|---|---|---|---|
| D1 | (115, 170, 75) | (1.48, 1.48, 1.50) | 54.2 | 60 | 50.0 | 0.05 | 54.8 | 420 | 55.5 | 4.1 |
| D2 | (160, 144, 208) | (0.12, 0.12, 0.08) | 17.3 | 132 | 16.3 | 0.05 | 18.1 | 420 | 17.9 | 4.1 |
| D3 | (200, 200, 281) | (1.00, 1.00, 1.00) | 38.2 | 280 | 37.1 | 0.05 | 39.1 | 420 | 39.5 | 4.1 |
| D4 | (181, 217, 181) | (1.00, 1.00, 1.00) | 26.8 | 195 | 25.0 | 0.05 | 28.4 | 420 | 28.6 | 4.1 |
| D5 | (394, 394, 181) | (0.50, 0.50, 0.50) | 43.0 | 685 | 49.5 | 0.05 | 54.2 | 420 | 54.5 | 4.1 |
3 Experimental Results
We have implemented the new classification algorithm in our medical image visualization system, which comprises a Pentium IV 3.2 GHz CPU, an Nvidia GeForce 7900 GTO GPU, and 2 GB of main memory, using C++, OpenGL, the OpenGL Shading Language (GLSL), and Qt.
3.1 Performance Comparisons
We employed five datasets from three different imaging modalities to test the algorithm performance (Table 1). D1 is an MR volume of a human heart (Fig. 2), D2 is a cardiac 3D US of the same subject as D1, D3 is a CT of a human heart within the thorax, D4 is a CT brain, and D5 is a CT of a pig heart (Fig. 3). As illustrated in Table 1, our PCAC algorithm takes 4.1 ms per volume classification, about 100 times faster than the PIC method. The PosC takes the shortest time, since it only needs to calculate a 1D LUT and transfer it to the GPU as a 1D texture. Unlike the other algorithms, which take the same time to classify different datasets, the PreC requires a classification time that is almost directly proportional to the volume size, since it needs to classify each voxel and upload the whole volume to the GPU. Table 1 also compares the performance of the different classification algorithms on the same volume renderer, from which we note that our PCAC algorithm has almost the same efficiency as the PIC, and is about 10% faster than the PreC and PosC. The average speed difference among the four algorithms is usually less than 15%, depending on the rendered dataset. We believe that this is because some classifications map the same scalar value to a higher opacity than the others, and the application of early ray termination in our system causes the casting rays to terminate earlier in higher opacity volumes than in lower opacity ones.
3.2 Image Quality Evaluation
We evaluate the quality of the rendered images both subjectively and objectively. The image qualities are compared in an image-guided cardiac surgery planning
Fig. 3. Volumetric medical images rendered with the GPU-based raycasting system, utilizing the post color-attenuated classification algorithm

[Fig. 4 plots: relative image power vs. spatial frequency (cycles/pixel) for the pre-classification, post-classification, pre-integration, and post color attenuation algorithms; panel (I) covers roughly 0.03 to 0.1 cycles/pixel, panel (II) roughly 0.02 to 0.12 cycles/pixel]
Fig. 4. Radial plots of the noise power spectra of four sets (20 images each) of images: (I) specific frequency range; (II) large frequency range
process, during which one experienced cardiologist and two cardiac imaging specialists interactively adjusted the transfer function to capture the target anatomical structure separately. As shown in Fig. 2, they gave the images in columns (c) and (d) the highest rank, based on the facility, clarity and accuracy of the target structure navigation, while rating the images in column (a) the lowest. To quantitatively assess the visual performance, we employed noise power spectrum (NPS) analysis [8] to quantify the artifacts. Using a standard sphere as a test object, we acquired four image sets, each generated with a different classification algorithm and comprising twenty random views of the volume-rendered image. We then computed the magnitude of the mean image power spectrum (IPS) of each image set. Since the structure in the ideal image is contained mainly in the low spatial frequency (SF) components (< 0.01 cycles/pixel), components in the higher frequency region reveal the SF of the introduced artifacts. In Fig. 4, the 2D spectral images have been reduced to radial plots of the average IPS in an annulus at a specific SF (distance) from the center (zero frequency). When the SF is lower than 0.052 cycles/pixel, the PreC has the highest noise power, while the PosC has the second highest. When the SF is higher than this threshold, the PosC has the highest artifact energy. Throughout the entire SF range, our classification algorithm and the PIC have similarly low noise performance, with our algorithm generating ∼10% lower artifact levels than the PIC. Therefore, when applied in the same raycasting based medical image volume rendering system,
the PreC introduces significant low frequency noise, while the PosC creates high frequency noise, and our PCAC algorithm introduces the lowest noise along the entire frequency range, producing the highest rendered image quality.
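For reference, the radial reduction of a 2D image power spectrum into the curves of Fig. 4 can be sketched as follows; the bin layout and normalization are our assumptions rather than details taken from [8].

```python
import numpy as np

def radial_power_spectrum(img, n_bins=64):
    """Radially averaged power spectrum of one rendered view (2D float array)."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = spec.shape
    y, x = np.indices((h, w))
    r = np.hypot(x - w / 2, y - h / 2)
    freq = r / max(h, w)                       # spatial frequency in cycles/pixel
    bins = np.linspace(0.0, 0.5, n_bins + 1)
    which = np.clip(np.digitize(freq.ravel(), bins) - 1, 0, n_bins - 1)
    power = np.bincount(which, weights=spec.ravel(), minlength=n_bins)
    counts = np.bincount(which, minlength=n_bins)
    return bins[:-1], power / np.maximum(counts, 1)   # mean power per annulus
```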
4 Conclusions
In this paper, we present a new post color-attenuated volume classification algorithm, which calculates the color attenuation on a segment basis, and has been implemented in our GPU-based medical image real-time 3D visualization system for interactive volume classification. We also introduce an efficient numerical integration approximation technique to accelerate the LUT generation, during which the symmetric data storage format of the optical mapping matrix is exploited. Besides high rendered image quality, our algorithm takes approximately 4 milliseconds per classification, about 100 times faster than an unaccelerated pre-integrated classification approach, making this technique compatible with screen-rate real-time constraints. We believe our new interactive classification and real-time volume rendering techniques will facilitate the interactive visualization of medical image datasets in both diagnostic and therapeutic applications.
References

1. Bühler, K., Neubauer, A., Hadwiger, M., Wolfsberger, S., Wegenkittl, R.: Interactive 3D techniques for computer aided diagnosis and surgery simulation tools. In: Hruby, W. (ed.) Digital Revolution in Radiology – Bridging the Future of Health Care, 2nd edn. Springer, Heidelberg (2005)
2. Kniss, J., Kindlmann, G., Hansen, C.: Multidimensional transfer functions for interactive volume rendering. IEEE Transactions on Visualization and Computer Graphics 8(3), 270–285 (2002)
3. Rezk-Salama, C.: Volume Rendering Techniques for General Purpose Graphics Hardware. PhD thesis, University of Siegen, Germany (2001)
4. Röttger, S., Kraus, M., Ertl, T.: Hardware-accelerated volume and isosurface rendering based on cell-projection. In: Proceedings of IEEE Visualization 2000, pp. 109–116. IEEE Computer Society Press, Los Alamitos (2000)
5. Engel, K., Kraus, M., Ertl, T.: High-quality pre-integrated volume rendering using hardware-accelerated pixel shading. In: HWWS 2001: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, pp. 9–16. ACM Press, New York (2001)
6. Zhang, Q., Eagleson, R., Peters, T.M.: Real-time visualization of 4D cardiac MR images using graphics processing units. In: Biomedical Imaging: Macro to Nano, ISBI 2006, 3rd IEEE International Symposium, pp. 343–346. IEEE Computer Society Press, Los Alamitos (2006)
7. Moreland, K., Angel, E.: A fast high accuracy volume renderer for unstructured data. In: VolVis 2004, pp. 9–16 (2004)
8. Siewerdsen, J., Antonuk, L., el-Mohri, Y., Yorkston, J., Huang, W., Cunningham, I.: Signal, noise power spectrum, and detective quantum efficiency of indirect-detection flat-panel imagers for diagnostic radiology. Medical Physics 25(5), 614–628 (1998)
Towards Subject-Specific Models of the Dynamic Heart for Image-Guided Mitral Valve Surgery

Cristian A. Linte (1,2), Marcin Wierzbicki (2), John Moore (2), Stephen H. Little (3,4), Gérard M. Guiraudon (1,4), and Terry M. Peters (1,2)

(1) Biomedical Engineering Graduate Program, University of Western Ontario; (2) Imaging Research Laboratories, Robarts Research Institute; (3) Division of Cardiology, University of Western Ontario; (4) Canadian Surgical Technologies & Advanced Robotics, London, ON, Canada
{clinte,mwierz,jmoore,tpeters}@imaging.robarts.ca
Abstract. Surgeons need a robust interventional system capable of providing reliable, real-time information regarding the position and orientation of the surgical targets and tools to compensate for the lack of direct vision and to enhance manipulation of intracardiac targets during minimally-invasive, off-pump cardiac interventions. In this paper, we describe a novel method for creating dynamic, pre-operative, subject-specific cardiac models containing the surgical targets and surrounding anatomy, and how they are used to augment the intra-operative virtual environment for guidance of valvular interventions. The accuracy of these pre-operative models was established by comparing the target registration error between the mitral valve annulus characterized in the pre-operative images and its equivalent structure manually extracted from 3D US data. On average, the mitral valve annulus was extracted with a 3.1 mm error across all cardiac phases. In addition, we also propose a method for registering the pre-operative models into the intra-operative virtual environment.
1 Introduction
Minimally-invasive cardiac procedures can potentially reduce complications arising from surgical interventions by minimizing the size of the incision required to access the heart, while employing medical imaging to visualize intracardiac targets without direct vision [1,2]. Although less invasive, these procedures often involve myocardial arrest and use of cardiopulmonary bypass, which contribute to patient morbidity [3]. To further reduce invasiveness, techniques have been developed to perform interventions on the beating-heart [4,5]. This paper focuses specifically on mitral valve surgery. As part of an ongoing project, we have developed a robust interventional system to allow the surgeon to safely access cardiac chambers in the beating heart [6], as well as to visualize and deliver therapy to the mitral valve without compromising the quality of the procedure. The navigation system integrates trans-esophageal echocardiography (TEE) for real-time intracardiac visualization, a virtual reality (VR)
environment [7] that augments the TEE with pre-operative subject images and geometric models, and surgical instruments tracked in real time using a magnetic tracking system (MTS) [8]. The guidance of beating-heart mitral valve replacement requires detailed information regarding the dynamic behaviour of the valve annulus and surrounding anatomy. This information is not clearly portrayed by the TEE images, as 2D ultrasound (US) images lack anatomical context and adequate representation of the surgical tools. Although 3D TEE US might become a potential future solution to this problem, its narrow field of view may impose further challenges in visualizing the tools and target in the same volume. To address these limitations, we include pre-operative, patient-specific models derived from MR images, which incorporate a dynamic representation of the gross cardiac anatomy (e.g., the myocardium) and surgical target (mitral valve annulus, MVA) within the intra-operative VR environment. As a result, the intra-operative TEE information can be interpreted within a rich, high-quality 3D context [9] for improved procedure planning and navigation of surgical tools, while on-target positioning and detailed manipulations are performed under real-time US guidance. To better illustrate our research goals, we follow the patient through the proposed mitral valve procedure work-flow. First, pre-operative cine MR images of the subject are acquired. During procedure planning, a prior high-quality heart model containing the MVA and surrounding anatomy is registered to the subject-specific image dataset, to obtain a pre-operative, subject-specific cardiac model. After accessing the heart, the intra-operative MVA is defined interactively using tracked 2D TEE, and displayed within the intra-operative subject space. To facilitate tool navigation and improve spatial orientation, the 2D US images are complemented by 3D anatomical models obtained from pre-operative MR images, and combined with virtual representations of the tracked surgical tools. The pre-operative cardiac models are incorporated within the intra-operative space by aligning the pre-operative and intra-operative annuli using registration.
2 Methodology
Our objective was to assess the accuracy of integrating pre-operative, subject-specific cardiac models as part of the clinical routine. We used a prior high-resolution heart model constructed from multiple-subject 4D MRI datasets [10] to segment the surgical target (MVA) and other relevant cardiac anatomy, by registering the prior model to a subject-specific pre-operative MR image dataset. Note that this latter image dataset was not employed in the construction of the prior model. The resulting subject-specific MVA was then compared to its true location, obtained by manual segmentation of 3D full-volume US images acquired throughout the cardiac cycle. The accuracy with which the MVA could be identified using the prior model was assessed by quantifying the target registration error (TRE) between the model-predicted annuli and those extracted manually from US images.
2.1 Prior High-Resolution Heart Model
MRI is often considered the gold-standard modality for cardiac imaging, given its superior soft tissue contrast and 4D imaging capabilities. However, clinical MR images may exhibit low spatial resolution, low signal-to-noise ratio (SNR), and motion artifacts. Consequently, surgical targets extracted directly from these clinical images may not be sufficiently accurate for the planning and guidance of the proposed mitral valve surgery. To address this concern, we used a high-quality prior heart model to characterize the surgical targets in the low-quality subject images. This model consisted of image and geometry components, and was built from low-resolution MR images of 10 subjects (6 mm slice thickness). Various anatomical features (i.e., left ventricular myocardium, right ventricle and atrium, etc.) were manually segmented from each image, and the resulting data were then co-registered into a common high-resolution reference image (1.5 mm slice thickness) [11]. The image component of the prior model represents a measure of the MR appearance of the heart, and was obtained by performing a principal component analysis on the co-registered data. The geometry component consists of the previously described anatomical features segmented in the reference image, and corrected to fit the average shape of the population [10]. The prior model is used to segment anatomical features from clinical-quality MR data by fitting the image and geometry components simultaneously to a subject-specific dynamic image dataset. A similar approach was undertaken by Lorenzo-Valdés et al. [12], who constructed and segmented an average heart model based on population images, and registered it to target images to automate segmentation. Our final models specific to the left ventricular myocardium (LV), left atrium (LA), and right atrium and ventricle (RAV) were previously shown to be accurate within 5.4 ± 0.8 mm [10], despite the low resolution of the subject data. For this work, we expanded the prior model by including the MVA as part of its geometry (Fig. 1a).
2.2 Image Acquisition
MR Imaging. Clinically feasible coronal images of one healthy subject were acquired using a 1.5T CVi scanner (GE Medical Systems, Milwaukee, USA). The imaging protocol employed an ECG-gated gradient echo pulse sequence, a 256 × 128 image matrix, two signal averages (NEX), 20° flip angle, 7.6 ms TR, and 4.2 ms TE. The dataset consisted of 20 3D images throughout the cardiac cycle, with an in-plane resolution of 1.5 × 1.5 mm² and 6.0 mm slice thickness. To minimize breathing artifacts, 20 sec breath-holds were employed during the acquisition of each slice, for a total scan duration of ~20 min. US Imaging. 3D US images of the same subject were acquired throughout the cardiac cycle on a Philips SONOS 7500 scanner. Full-volume apical images of the heart were acquired with a 19 Hz frame-rate and a 14 cm depth-of-focus, with the subject in the left lateral decubitus position. Breath-holds of 5-10 sec were employed to minimize artifacts due to respiratory motion.
2.3 Anatomical Feature Extraction
MR Image Segmentation. In addition to the anatomical features already present, we augmented the prior model introduced in section 2.1 with a representation of our surgical target (MVA). The annulus was segmented manually under the assistance of an experienced cardiologist, by interactively selecting points on the 3D image of the prior model depicting the heart at mid-diastole (MD). We employed a custom-developed spline-based segmentation tool similar to that available for clinical application within the TomTec 4D MV Assessment Software (Unterschleissheim, Germany). The new prior model consisted of a high-quality image component, gross cardiac anatomy, and MVA (Fig. 1a).
Fig. 1. a) Prior high-resolution cardiac model at MD, containing segmented LV, LA, RAV, and MVA; b) Clinical quality subject MR image at MD; c) Clinical quality subject MR image segmented using the prior model; d) Subject US image at MD showing the manually segmented MVA
This model was then registered to the low-quality, mid-diastole MR image of the subject (Fig. 1b). The initial model-to-subject registration was performed using an affine transformation, which was then refined using a non-rigid transformation, to account for the remaining morphological differences between the source and target images. Registration was achieved by maximizing the mutual information (MI) between the model and subject image, while ensuring the prior geometry remained consistent with user-selected points in the subject image [10]. The resulting subject-specific model at MD (Fig. 1c) was then animated over the cardiac cycle by non-rigidly registering the MD frame to the remaining frames in the 4D dataset, and using these transformations to deform the MD subject-specific model throughout the cardiac cycle [13]. US Image Segmentation. The segmentation of the 3D US volumes was performed manually under the guidance of an experienced echocardiographer. The same spline-based technique as for the MR model was employed to outline the MVA contour at various time points throughout the cardiac cycle (Fig. 1d). In addition, the LV geometry was manually segmented from the MD image, and used to drive the registration of US data to the MR image space. This latter
US to MR transform consisted of aligning geometric features (i.e. left ventricular surface) predefined in both the source and target image using an affine transformation, followed by further refinement using a non-rigid transformation.
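The model-to-subject registration above is driven by maximizing mutual information (MI). A minimal, histogram-based MI estimate is sketched below; the bin count and the absence of interpolation or regularization are simplifying assumptions, not details from the source.

```python
import numpy as np

def mutual_information(fixed, moving, bins=32):
    """Histogram-based mutual information between two (pre-aligned) images."""
    joint, _, _ = np.histogram2d(fixed.ravel(), moving.ravel(), bins=bins)
    pxy = joint / joint.sum()                   # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)         # marginal of the fixed image
    py = pxy.sum(axis=0, keepdims=True)         # marginal of the moving image
    nz = pxy > 0                                # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
```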
3 Results
Our main goal was to determine the accuracy of our model-based segmentation approach when predicting the location of dynamic surgical targets. The accuracy was assessed by computing the root-mean-squared (RMS) TRE between the model-based MVA and the MVA extracted manually from 3D US images and registered to the MR image space. The TRE was quantified at four different time points in the cardiac cycle: end-diastole (ED), mid-systole (MS), end-systole (ES), and mid-diastole (MD). In addition, we also estimated the perimeter for both the model-extracted and gold-standard annuli at each of these cardiac frames. A summary of these results is presented in Table 1.

Table 1. Mitral valve annulus perimeter and TRE values of the model-predicted MVA geometry and the gold-standard MVA, quantified at four phases throughout the cardiac cycle, using motion information extracted from both MR and US image datasets

| Cardiac Phase | MR Motion: Model Perimeter (mm) | Gold-Std. (US) Perimeter (mm) | TRE (mm) | US Motion: Model Perimeter (mm) | Gold-Std. (US) Perimeter (mm) | TRE (mm) |
|---|---|---|---|---|---|---|
| ED | 119.9 | 111.4 | 2.6 | 114.4 | 111.4 | 2.6 |
| MS | 121.3 | 118.0 | 7.9 | 113.0 | 118.0 | 3.5 |
| ES | 118.5 | 117.4 | 10.5 | 113.5 | 117.4 | 3.3 |
| MD | 114.3 | 109.7 | 2.9 | 114.3 | 109.7 | 2.9 |
According to these results, the perimeter of the model-based MVA was consistently within 4.8% of that extracted from 3D US images, throughout all cardiac frames. Moreover, it was also observed that the annulus perimeter did not change significantly throughout the cardiac cycle. Nevertheless, a poor TRE between the two annuli sets was observed during the systolic phases when the dynamic pre-operative surgical targets were obtained using motion information extracted from the 4D MR image dataset. These inconsistencies occurred predominantly on the mitral-aortic valve boundary. The main motion observed in this region of the MVA was caused by the systolic thrust, which is physiologically counteracted by the tension generated within the chordae tendineae by the papillary muscles. However, due to the limited information provided in this region by the thick-slice MR data, these intricate MVA motion patterns could not be correctly reconstructed from the MR images. On the other hand, 3D US images possess a much higher resolution, especially in the valve region, allowing a clear identification of the valve leaflets. Therefore, we also employed the 4D US images to extract valvular motion using the same non-rigid image registration approach. As a result, the systolic TRE was significantly improved (Table 1 and Fig. 2).
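The RMS TRE used above reduces to a short computation once point correspondence between the model-predicted and gold-standard annuli is established; the following sketch assumes corresponding N x 3 point arrays already expressed in the common MR image space, with the correspondence step itself not shown.

```python
import numpy as np

def rms_tre(model_pts, gold_pts):
    """Root-mean-squared target registration error (mm) between
    corresponding annulus points in the same coordinate space."""
    d = np.asarray(model_pts, float) - np.asarray(gold_pts, float)
    return float(np.sqrt(np.mean(np.sum(d * d, axis=1))))
```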
Fig. 2. Subject-specific MVA extracted manually from US images (black), model-based MVA animated using MR motion extraction (grey), and model-based MVA animated using US motion extraction (white). All annuli were registered to the MR space and displayed at four cardiac phases: a) ED; b) MS; c) ES; d) MD. Note how systolic inaccuracies caused by MR motion extraction (grey vs. black annuli) were significantly improved using US motion extraction (white vs. black annuli).
4 Discussion
This work constitutes the first steps in investigating the feasibility of employing pre-operative, subject-specific, dynamic models for enhanced planning and navigation of valvular procedures using a VR environment. Specifically, we determined the location of the surgical targets predicted by the pre-operative, subject-specific, dynamic models to be accurate within 3.1 mm with respect to their gold-standard location identified from 3D US images. This study clearly identified weaknesses regarding the extraction of accurate valvular motion patterns from clinically feasible MR images, and highlighted the need to acquire a set of full-volume 3D US images of the subject, to assist in building accurate pre-operative dynamic heart models. Our results were successful, despite the small variations in anatomical structures identified in the MR and US images, as well as any subject-specific physiological variations between the times at which the images were acquired. These cardiac models will be used to augment the surgical virtual environment during image-guided mitral valve interventions. While enhancing procedure visualization by complementing the intra-operative space with 3D anatomical context, these models constitute a significant navigation aid. According to our collaborating cardiac surgeon, a misalignment on the order of 5 mm is tolerable, as these models will be used to facilitate the navigation of instruments towards the surgical targets, whereas on-target positioning and fine tuning will be performed under real-time US guidance. In addition, the relative accuracy of the tracked surgical tools is on the order of 1-2 mm [14], leading to accurate virtual-tool-to-US navigation. To demonstrate the usefulness of the pre-operative models, we performed a preliminary experiment in which we used these results to augment the intra-operative VR environment (Fig. 3). The registration consisted of aligning the
Fig. 3. a) Surgical VR environment consisting of 2D US probe and image, valveguiding tool, and valve-fastening tool; b) Pre-operative subject-specific model displaying LV, LA, and RAV surfaces registered to the intra-operative space for visualization and navigation enhancement (note alignment of ventricular septum)
pre-operatively defined aortic valve annulus (AVA) and MVA with those identified intra-operatively, using a two-stage approach. First, we determined unit vectors normal to the pre- and intra-operative AVA and MVA. An initial alignment between the pre-operative and intra-operative annuli was obtained by minimizing the distance between their centroids, as well as between the tips of their corresponding unit vectors. The downhill simplex optimizer [13] was then used to further minimize the distance between the two sets of annuli. Although the registration targets (AVA and MVA) are remote from the majority of the cardiac anatomy, this method nevertheless registers the pre-operative model to the patient in a manner sufficient to provide anatomical context for interpreting the US image.
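A simplified sketch of this two-stage alignment is given below. It collapses the centroid-and-normal-tip criterion into a corresponding-point distance and refines a six-parameter rigid transform with the downhill simplex (Nelder-Mead) optimizer, so the cost function and parameterization are illustrative assumptions rather than the exact procedure.

```python
import numpy as np
from scipy.optimize import minimize

def rot(rx, ry, rz):
    """Rotation matrix from Euler angles (radians)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def register_annuli(pre_pts, intra_pts):
    """Align pre-operative AVA+MVA points (N x 3) to their
    intra-operative counterparts (assumed to be in correspondence)."""
    # stage 1: translate the pre-operative centroid onto the intra-operative one
    t0 = intra_pts.mean(axis=0) - pre_pts.mean(axis=0)

    # stage 2: downhill simplex over (rx, ry, rz, tx, ty, tz)
    def cost(p):
        moved = pre_pts @ rot(*p[:3]).T + p[3:]
        return np.mean(np.sum((moved - intra_pts) ** 2, axis=1))

    res = minimize(cost, x0=np.r_[0.0, 0.0, 0.0, t0], method="Nelder-Mead")
    return res.x  # optimized rigid-transform parameters
```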
5 Conclusions
In this paper we showed that accurate, pre-operative, dynamic representations of the surgical targets and surrounding anatomy can be generated and imported into our virtual surgical environment. We have demonstrated that our model-based segmentation approach can successfully extract subject-specific dynamic representations of the mitral valve annulus throughout the cardiac cycle with a 3.1 mm accuracy. As part of our future research, we plan to include a larger number of subjects in this study, and to ultimately extend this work towards in vivo animal studies, to show how employing models to augment the intra-operative VR environment can enhance surgical navigation. Acknowledgments. The authors thank Dr. Daniel Bainbridge and Dr. Doug Jones for clinical consultation, and Dr. Usaf Aladl, Louis Estey, and Chris Wedlake for technical support. We also acknowledge funding for this work provided by the following Canadian agencies: NSERC, CIHR, ORDCF, OIT and CFI.
References

1. Kypson, A.P., Felger, J.E., Nifong, L.W., Chitwood, W.R.: Robotics in valvular surgery: 2003 and beyond. Curr. Opin. Cardiol. 19, 128–133 (2004)
2. Vahanian, A., Acar, C.: Percutaneous valve procedures: what is the future? Curr. Opin. Cardiol. 20, 100–106 (2005)
3. Edmunds, L.H.: Why cardiopulmonary bypass makes patients sick: strategies to control the blood-synthetic surface interface. Adv. Card. Surg. 6, 131–167 (1995)
4. McVeigh, E.R., Guttman, M.A., Lederman, R.J., Li, M., Hunt, T., Kozlov, S., Horvath, K.A.: Real-time interactive MRI-guided cardiac surgery: Aortic valve replacement using a direct apical approach. Magn. Reson. Med. 56, 958–964 (2006)
5. Suematsu, Y., Marx, G.R., Stoll, J.A., Dupont, P.E., Cleveland, R.O., Howe, R.D., Triedman, J.K., Mihaljevic, T., Mora, B.N., Savord, B.J., Salgo, I.S., del Nido, P.J.: Three-dimensional echo-guided beating-heart surgery without cardiopulmonary bypass: a feasibility study. J. Thorac. Cardiovasc. Surg. 128, 579–587 (2004)
6. Guiraudon, G.M.: Universal cardiac introducer. Patent Application, US 2005/0137609 A1, Appl. No. 10/736,786 (2005)
7. Vosburgh, K.G., San José Estépar, R.: Natural orifice transluminal endoscopic surgery (NOTES): An opportunity for augmented reality guidance. In: Stud. Health Technol. Inform., vol. 125, pp. 485–490 (2007)
8. Linte, C.A., Wiles, A.D., Moore, J., Wedlake, C., Guiraudon, G.M., Jones, D.L., Bainbridge, D., Peters, T.M.: An augmented reality environment for image-guidance of off-pump mitral valve implantation. In: SPIE Medical Imaging: Visualization and Image-Guided Procedures, vol. 6509, pp. 65090N–12 (2007)
9. Sauer, F.: Image registration: Enabling technology for image-guided surgery and therapy. In: 27th Annual Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), pp. 7242–7245. IEEE Computer Society Press, Los Alamitos (2005)
10. Wierzbicki, M.: Subject-specific models of the heart from 4D images. PhD Dissertation, Medical Biophysics, The University of Western Ontario, Canada (2006)
11. Moore, J., Drangova, M., Wierzbicki, M., Barron, J., Peters, T.M.: A high-resolution dynamic heart model based on averaged MRI data. In: Ellis, R.E., Peters, T.M. (eds.) MICCAI 2003. LNCS, vol. 2878, pp. 549–555. Springer, Heidelberg (2003)
12. Lorenzo-Valdés, M., Sanchez-Ortiz, G.I., Mohiaddin, D., Rueckert, D.: Atlas-based segmentation and tracking of 3D cardiac MR images using non-rigid registration. In: Dohi, T., Kikinis, R. (eds.) MICCAI 2002. LNCS, vol. 2488, pp. 642–650. Springer, Heidelberg (2002)
13. Wierzbicki, M., Drangova, M., Guiraudon, G.M., Peters, T.M.: Validation of dynamic heart models obtained using non-linear registration for virtual reality training, planning, and guidance of minimally invasive cardiac surgeries. Med. Image Anal. 8, 387–401 (2004)
14. Wiles, A.D., Guiraudon, G.M., Moore, J., Wedlake, C., Linte, C.A., Jones, D.L., Bainbridge, D., Peters, T.M.: Navigation accuracy for an intracardiac procedure using virtual reality-enhanced ultrasound. In: SPIE Medical Imaging: Visualization and Image-Guided Procedures, vol. 6509, pp. 61410W–10 (2007)
pq-space Based Non-Photorealistic Rendering for Augmented Reality

Mirna Lerotic, Adrian J. Chung, George Mylonas, and Guang-Zhong Yang

Institute of Biomedical Engineering, Imperial College, London SW7 2AZ, UK
{mirna.lerotic, a.chung, george.mylonas, g.z.yang}@imperial.ac.uk
Abstract. The increasing use of robotic assisted minimally invasive surgery (MIS) provides an ideal environment for using Augmented Reality (AR) for performing image guided surgery. Seamless synthesis of AR depends on a number of factors relating to the way in which virtual objects appear and visually interact with a real environment. Traditional overlaid AR approaches generally suffer from a loss of depth perception. This paper presents a new AR method for robotic assisted MIS, which uses a novel pq-space based nonphotorealistic rendering technique for providing see-through vision of the embedded virtual object whilst maintaining salient anatomical details of the exposed anatomical surface. Experimental results with both phantom and in vivo lung lobectomy data demonstrate the visual realism achieved for the proposed method and its accuracy in providing high fidelity AR depth perception.
1 Introduction

Augmented reality (AR) is becoming a valuable tool in surgical procedures [1-3]. Providing real-time registered preoperative data during a surgical task removes the need to refer to off-line images and aids the registration of these to the real tissue. The visualization of the objects of interest becomes accessible through the see-through vision that AR provides. In recent years, medical robots are increasingly being used in Minimally Invasive Surgery (MIS). With robotic assisted MIS, dexterity is enhanced by microprocessor controlled mechanical wrists, allowing motion scaling for reducing gross hand movements and the performance of micro-scale tasks that are otherwise not possible. The unique operational setting of the surgical robot provides an ideal platform for enhancing the visual field with pre-operative/intra-operative images or computer generated graphics. The effectiveness and clinical benefit of AR has been well recognized in neuro and orthopedic surgeries. Its application to cardiothoracic or gastrointestinal surgery, however, remains limited as the complexity of tissue deformation imposes significant challenges to the AR display. Seamless synthesis of AR depends on a number of factors relating to the way in which virtual objects appear and visually interact with a real scene. One of the major problems in AR is the correct handling of occlusion. Although the handling of partial occlusion of the virtual and real environment can be achieved by accurate 3D reconstruction of the surgical scene, particularly with the advent of recent techniques for real-time 3D tissue deformation recovery [4], most surgical AR applications involve
the superimposition of anatomical structures behind the exposed tissue surface. This, for example, is important for coronary bypass, for which improved anatomical and functional visualization permits more accurate intra-operative navigation and vessel excision. In prostatectomy, 3D visualization of the surrounding anatomy can result in improved neurovascular bundle preservation and enhanced continence and potency rates. Whilst providing a good in-plane reference in stereo vision environments, traditional overlaid AR suffers from inaccurate depth perception. Even if the object is rendered at the correct depth, the brain perceives the object as floating above the surface [5, 6]. For objects to be perceived as embedded in the tissue, our brains expect some degree of occlusion. To address the problem of depth perception in AR, a number of rendering techniques and display strategies have been developed to allow for accurate perception of the 3D depth of virtual structures with respect to the exposed tissue surface. A recent study has also compared the perceptual fidelity of different visualization techniques designed to overcome misleading depth perception cues [7]. The purpose of this paper is to investigate a new AR method for robotic assisted MIS. The method is based on pq-space based Non-Photorealistic Rendering (NPR) for providing a see-through vision of the embedded virtual object whilst maintaining salient anatomical details of the exposed anatomical surface. Detailed user experiments demonstrate the effectiveness of the technique in making objects appear embedded in the tissue with accurate depth perception due to accentuated ridges that "occlude" the object. This inversion of realism due to NPR has been shown to correctly bring the perceptual focus to the object whilst maintaining the relative depth to the real environment.
2 Method

The key task of the proposed technique is to render the exposed anatomical surface as a translucent layer while keeping sufficient detail to aid navigation and depth cueing. To this end, surface geometry based on a pq-space representation is first derived, where $p$ and $q$ represent the slope of the surface along the $x$ and $y$ axes, respectively. These can be solved with photometric stereo by introducing multiple lighting conditions. For deforming tissue, however, the problem is ill posed, and the introduction of multiple light sources in vivo is not feasible. Nevertheless, the problem can be simplified for cases where both the camera and the light source are near the surface [8, 9], such as in bronchoscopes and endoscopes. In such cases, the value of the image intensity at coordinates $(x, y)$ for a near point light source is given by
$$E(x, y) = \frac{s_0\,\rho(x, y)\cos\theta}{r^2} \quad (1)$$
where $s_0$ is the light source intensity constant, $\rho(x, y)$ is the albedo or reflection coefficient, $r$ is the distance between the light source and the surface point $(x, y, z)$, and $\theta$ is the angle between the incident light ray and the normal to the surface $\hat{n}$. In gradient space, the normal vector to the surface is equal to
$$\hat{n} = \frac{(p,\, q,\, -1)}{\sqrt{1 + p^2 + q^2}} \quad (2)$$
where $p$ and $q$ represent surface slopes in the $x$ and $y$ directions, respectively. For a smooth Lambertian surface in the scene, the image irradiance given by Eq. (1) can be reduced to
$$E(x, y) = \frac{s_0\,\rho_{\mathrm{average}}\,(1 - p_0 x - q_0 y)^3}{Z_0^2\,(1 - p_0 x_0 - q_0 y_0)^2\,(1 + p_0^2 + q_0^2)^{1/2}\,(1 + x^2 + y^2)^{3/2}} \quad (3)$$
and it defines the relationship between the image irradiance $E(x, y)$ at the point $(x, y)$ and the scene radiance at the corresponding surface point $(x_0 Z_0, y_0 Z_0, Z_0)$ with surface normal $(p_0, q_0, -1)$, where $\rho_{\mathrm{average}}$ denotes the average albedo in a small neighborhood of the surface. By utilizing partial derivatives of the irradiance of Eq. (3), the following set of linear equations can be derived [9]:
$$\frac{1}{E}\frac{\partial E}{\partial x} = -3\left(\frac{p_0}{1 - p_0 x - q_0 y} + \frac{x}{1 + x^2 + y^2}\right), \qquad \frac{1}{E}\frac{\partial E}{\partial y} = -3\left(\frac{q_0}{1 - p_0 x - q_0 y} + \frac{y}{1 + x^2 + y^2}\right) \quad (4)$$
These can be solved for the surface gradients $(p_0, q_0)$ at each point $(x, y)$ of the image. This technique generally provides normal estimation errors smaller than 0.1 radians [10]. In this study, pq-values of the surface provide 3D details of the exposed anatomical structure and are used to accentuate salient features while making the smoothly varying background semi-transparent. To create the desired visual cues, surfaces of the scene that are parallel to the viewing plane are rendered as transparent whilst the sloped structures are accentuated. A measure of the surface slope is generated from the pq-values for each image point $(x, y)$ by
$$S(x, y) = \log\left(\left|p_0\right| + \left|q_0\right|\right) \quad (5)$$
where high values of $S(x, y)$ correspond to large gradients. The smooth background image $B$ is created by applying a wide $6 \times 6$ Gaussian filter ($\sigma = 7$) to $S$. The two textures are then combined using a mask, $mask(x, y)$, defined by a Catmull-Rom spline.
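Putting Eqs. (4) and (5) together, a per-pixel estimate of $(p_0, q_0)$ and the slope map $S$ can be sketched as follows; the normalized image coordinates, finite-difference gradients, and numerical guards are illustrative assumptions, and the Gaussian blur stands in for the $6 \times 6$ filter of the text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pq_slope_map(E, eps=1e-6):
    """Recover (p0, q0) per pixel via Eq. (4) and build S(x, y) of Eq. (5).

    E : image irradiance; pixel positions are mapped to normalized (x, y)."""
    h, w = E.shape
    y, x = np.mgrid[-1:1:h*1j, -1:1:w*1j]
    Ey, Ex = np.gradient(E)                    # finite-difference derivatives
    r2 = 1.0 + x**2 + y**2
    # Eq. (4): (1/E) dE/dx = -3 [ p0/(1 - p0 x - q0 y) + x/r2 ], likewise for y
    a = -Ex / (3.0 * np.maximum(E, eps)) - x / r2   # equals p0 / (1 - p0 x - q0 y)
    b = -Ey / (3.0 * np.maximum(E, eps)) - y / r2   # equals q0 / (1 - p0 x - q0 y)
    denom = 1.0 + a * x + b * y                # solving the 2x2 linear system
    p0 = a / np.maximum(denom, eps)
    q0 = b / np.maximum(denom, eps)
    S = np.log(np.abs(p0) + np.abs(q0) + eps)  # Eq. (5)
    B = gaussian_filter(S, sigma=7)            # smooth background (sigma = 7)
    return p0, q0, S, B
```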
3 Experimental Setup and Results
In order to assess the accuracy of the proposed method in providing AR stereo-depth perception, a user experiment was designed to compare the traditional AR overlay with the new NPR AR rendering. To provide 3D stereo vision, a da Vinci robotic
surgical console (Intuitive Surgical, Inc., Mountain View, USA) was used. Ten subjects with experience in surgical AR were recruited for the study. After a short training period, the subjects were asked to locate prescribed virtual targets with an Omni™ Phantom® device (SensAble Technologies, Woburn, MA, USA). After positioning the tool on the target, subjects recorded the position of the Omni™ Phantom® device by pressing a button. In this study, two experiments were conducted. The first experiment required the users to locate eight small virtual spheres positioned at different depths of a real thoracic model viewed from the stereo laparoscopic camera of the da Vinci system. Both the standard AR overlay and the proposed NPR AR rendering were provided. In the second experiment, the subjects were asked to define the tumor margin with the Omni™ Phantom® device by using a recorded da Vinci lung lobectomy procedure. Lung lobectomy is the most common surgery performed to treat lung cancer, and Video-Assisted Thoracoscopic Lobectomy (VATS) has significant advantages over conventional surgery due to reduced blood loss and shorter recovery time [11]. In all experiments, the subjects had to rely on their perceived depth of the virtual object in relation to the real anatomical structure to navigate the instrument. No tactile feedback was given to the subjects about the relative position of the instrument tip to the target. Fig. 1 illustrates three examples of the rendering results obtained using the proposed pq-space based NPR scheme for the thoracic model and the robotic assisted lung lobectomy procedure used for the two user experiments. It is evident that salient surface details are preserved whilst the overall surface now appears semi-transparent. Under stereo vision, the overall 3D structure of the exposed anatomical site appeared extremely realistic.
Fig. 1. Example AR rendering results using the proposed pq-space based NPR method for robotic assisted lung lobectomy (a-b) and the thoracic phantom model used for the user experiments; the renderings corresponding to (c) and (f) are shown in (d-e), respectively
Fig. 2. Error analysis of the perceived depth for the ten subjects studied by using the traditional AR (gray bars) and new NPR AR (black bars) methods, as well as the corresponding time required to locate the targets for the thoracic phantom model. In both graphs, median values and 95% confidence intervals (CI) are given.
Fig. 2 summarizes the results of the first experiment for the ten subjects studied. In Fig. 2(a), the perceived versus the actual depths of the object for the standard and the proposed NPR AR environments are provided. It is evident that there is a marked improvement in the accuracy of the depth perception when NPR AR is used. A nonparametric Wilcoxon signed-rank test gave a p-value < 0.001 (the data were not normally distributed, as determined by a Kolmogorov-Smirnov test), confirming that the error in perceived depth is significantly smaller in NPR AR compared to traditional AR. The overall time taken by the subjects in reaching the virtual spheres for the first experiment is shown in Fig. 2(b), which shows a marginal improvement in performance speed. The Wilcoxon signed-rank test gives a p-value of 0.16, which highlights the fact that the significant impact of the proposed NPR AR scheme is in depth perception accuracy rather than the speed of instrument navigation. In Fig. 3, the left and right stereo views of the lung lobectomy procedure are provided, where the original da Vinci images are shown in Figs. 3(a-b), the traditional AR overlay in Figs. 3(c-d), pq-space NPR AR rendering in Figs. 3(e-f), and finally pq-space NPR AR fused with the original laparoscopic view in Figs. 3(g-h). It is apparent that in the traditional AR view, the object appears as if floating above the surface due to the effect of mixed occlusion. The pq-space based NPR AR rendering provided an effective "see-through" vision of the exposed anatomical surface, and under stereo viewing, the virtual object appeared correctly embedded into the lung parenchyma. The fused views in Figs. 3(g-h) provided a strikingly realistic viewing experience in which the achieved visual realism and depth correspondence are of high quality. The errors in perceived depths of this experiment for the ten subjects studied are summarized in Table 1. As in the first experiment, the subjects were able to judge the depth more accurately by using the proposed NPR AR rendering when compared to the traditional AR overlay. Similar results were observed for the time it took for each subject to reach the target. Tool trajectories of a representative subject are shown in Fig. 4 to demonstrate the improved depth perception and instrument targeting by using the proposed NPR AR rendering technique. After the experiment, the users were asked to provide a subjective measure of the quality of the two AR environments based on a Likert scale (1-poor, 5-excellent).
They were also asked to score the overall visual impression and depth perception of the object. The mean visual score for the traditional AR was 2.5, as compared to 4.4 for the proposed pq-space based NPR AR method. The mean depth perception score was 2.6 for the traditional AR, as compared to 4.7 for the proposed technique. Another AR technique based on a "virtual window" [7] uses occlusion as a depth cue to achieve improved depth perception.
Fig. 3. (a-b) Stereo views of the original robotic assisted lung lobectomy, (c-d) traditional AR overlay, (e-f) pq-space NPR AR rendering, and (g-h) fused NPR AR with the original video. The virtual object is rendered at the same depth in all cases.
[Fig. 4 plot: distance from target (mm) versus time (s) for Subject 5; the overlaid AR trajectory is shown on the positive axis and the NPR-AR trajectory on the negative axis]
Fig. 4. Tool trajectories of a representative subject of the study showing improved depth perception and instrument targeting by using the proposed NPR AR rendering scheme. For comparison, overlaid AR is shown on the positive axis and NPR AR on the negative axis. More oscillation and variation can be seen when the subject was provided with the traditional AR overlay.

Table 1. Comparison of median errors in depth perception (in mm) and median task completion times (in seconds) for the ten subjects studied by using the overlaid AR and the new NPR AR methods. The 95% confidence interval of each measure is shown in parentheses.

| Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Overlaid AR depth error | 2.3 (1) | 19.3 (5) | 29.7 (4) | 8.4 (6) | 4.1 (5) | 10.6 (3) | 5.4 (2) | 4.9 (4) | 10.1 (2) | 10.2 (3) |
| NPR AR depth error | 1.6 (1) | 19.1 (5) | 27.6 (5) | 6.3 (8) | 1.7 (2) | 7.7 (4) | 3.4 (2) | 5.1 (3) | 2.6 (1) | 7.1 (2) |
| Overlaid AR task timing | 3.4 (4) | 3.4 (10) | 6.0 (4) | 5.8 (5) | 5.2 (12) | 9.4 (5) | 4.4 (3) | 9.0 (11) | 13.6 (6) | 13.2 (5) |
| NPR AR task timing | 4.2 (3) | 4.2 (4) | 3.6 (4) | 2.4 (4) | 4.2 (2) | 4.4 (4) | 4.6 (8) | 5.0 (5) | 2.8 (3) | 9.6 (5) |
4 Conclusion

In this paper, we have demonstrated the benefits of pq-space based non-photorealistic rendering for augmented reality. Experiments with both phantom and in vivo lung lobectomy data were used to evaluate the accuracy of the proposed method for AR depth perception. One of the most promising advances in surgical technology in recent years is the introduction of robotic assisted MIS, which allows the performance of procedures that are otherwise prohibited by the confines of the operating environment. The use of robotic assisted MIS provides an ideal environment for integrating pre-operative data of the patient for performing image guided surgery. In this regard, the proposed technique is expected to have a significant role in the future deployment of AR to robotic assisted MIS, as it is perceptually accurate and computationally efficient due to the linear form of the pq-vector derivation and the light computational/graphics load on the rendering engine.
Acknowledgments

The authors would like to thank Danail Stoyanov for his help with stereo camera calibration.
References

[1] Bajura, M., Fuchs, H., Ohbuchi, R.: Merging virtual objects with the real world: seeing ultrasound imagery within the patient. In: 19th Annual Conference on Computer Graphics and Interactive Techniques, pp. 203–210 (1992)
[2] Edwards, P.J., King, A.P., Maurer Jr., C.R., De Cunha, D.A., Hawkes, D.J., Hill, D.L.G., et al.: Design and evaluation of a system for microscope-assisted guided interventions (MAGI). IEEE Transactions on Medical Imaging 19(11), 1082–1093 (2000)
[3] Sauer, F., Khamene, A., Vogt, S.: An Augmented Reality Navigation System with a Single-Camera Tracker: System Design and Needle Biopsy Phantom Trial. In: Dohi, T., Kikinis, R. (eds.) MICCAI 2002. LNCS, vol. 2488, pp. 116–124. Springer, Heidelberg (2002)
[4] Stoyanov, D., Darzi, A., Yang, G.-Z.: A Practical Approach Towards Accurate Dense 3D Depth Recovery for Robotic Laparoscopic Surgery. Computer Aided Surgery 10(4), 199–208 (2005)
[5] Johnson, L.G., Edwards, P., Hawkes, D.: Surface transparency makes stereo overlays unpredictable: the implications for augmented reality. Studies in Health Technology and Informatics 94, 131–136 (2003)
[6] Swan II, J.E., Jones, A., Kolstad, E., Livingston, M.A., Smallman, H.S.: Egocentric Depth Judgments in Optical, See-Through Augmented Reality. IEEE Transactions on Visualization and Computer Graphics 13(3), 429–442 (2007)
[7] Sielhorst, T., Bichlmeier, C., Heining, S.M., Navab, N.: Depth Perception – A Major Issue in Medical AR: Evaluation Study by Twenty Surgeons. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 364–372. Springer, Heidelberg (2006)
[8] Okatani, T., Deguchi, K.: Shape reconstruction from an endoscope image by shape from shading technique for a point light source at the projection center. Computer Vision and Image Understanding 66(2), 119–131 (1997)
[9] Rashid, H.U., Burger, P.: Differential algorithm for the determination of shape from shading using a point light source. Image and Vision Computing 10(2), 119–127 (1992)
[10] Deligianni, F., Chung, A., Yang, G.-Z.: pq-Space Based 2D/3D Registration for Endoscope Tracking. In: Ellis, R.E., Peters, T.M. (eds.) MICCAI 2003. LNCS, vol. 2878, pp. 308–311. Springer, Heidelberg (2003)
[11] Shigemura, N., Akashi, A., Funaki, S., Nakagiri, T., Inoue, M., Sawabata, N., et al.: Long-term outcomes after a variety of video-assisted thoracoscopic lobectomy approaches for clinical stage IA lung cancer: A multi-institutional study. J. Thorac. Cardiovasc. Surg. 132(3), 507–512 (2006)
Eye-Gaze Driven Surgical Workflow Segmentation

A. James, D. Vieira, B. Lo, A. Darzi, and G.-Z. Yang

Royal Society/Wolfson Medical Image Computing Laboratory & Department of Biosurgery and Surgical Technology, Imperial College London, London, United Kingdom
{a.james, d.vieira, benny.lo, a.darzi, g.z.yang}@imperial.ac.uk
Abstract. In today's climate of clinical governance there is growing pressure on surgeons to demonstrate their competence, improve standards and reduce surgical errors. This paper presents a study on developing a novel eye-gaze driven technique for surgical assessment and workflow recovery. The proposed technique investigates the use of a Parallel Layer Perceptron (PLP) to automate the recognition of a key surgical step in a porcine laparoscopic cholecystectomy model. The classifier is eye-gaze contingent but combined with image-based visual feature detection for improved system performance. Experimental results show that by fusing image instrument likelihood measures, an overall classification accuracy of 75% is achieved. Keywords: Eye-tracking, surgical workflow, neural networks, minimally invasive surgery, surgical simulation.
1 Introduction

Over the past decade, the operating room has undergone significant transformation to evolve into a highly complex and technologically rich environment. Increasingly, surgeons find themselves immersed amidst the sophistication of surgical workflow and procedures. Minimally Invasive Surgery (MIS) is an important advance in surgical technology with proven benefits to the patient and care providers. They include reduced hospitalization, shorter rehabilitation, less pain and decreased hospital costs. However, as the procedures and devices used have become more specialized, the need for a greater range of supporting equipment has arisen. The increase in equipment not only obstructs the area around the surgical field but also increases the burden on the surgeon who has to adapt to and integrate with these systems during the operation. For a surgeon to assimilate all of the components during a surgical procedure, it adds further pressure above and beyond the clinical demands encountered during MIS. In addition, there is continual pressure for surgeons to demonstrate their surgical competence so as to improve standards and patient safety, in part by recognizing and reducing errors [1]. Thus far, the existing training paradigm has not adapted to the dynamic, technologically rich environment [2]. Existing work in the field has focused on improving training through the use of virtual reality trainers and simulated operating rooms [3,4]. Subsequent work has focused on recovering instrument metrics
in an attempt to classify the underlying surgical repertoire and evaluate surgical skill levels [5,6]. Time series analysis has also been used to classify the overall surgical workflow [7,8]. Current work has made progress in classifying and automating the recognition of surgical performance and surgical workflow. However, no one framework has the means of objectively assessing surgical skills and recovering surgical workflow with sufficient levels of detail to tease out subtle behaviors that lead to poor outcome. Existing research has shown that the main focus in MIS should be to establish quantitative methods of assessing not only manual dexterity [9,10] but also the cognitive process that leads to potential surgical errors. In practice, the understanding of the cognitive and perceptual process that governs the surgical decision is a challenging problem. Existing research [11] has shown that by the use of eye-gaze tracking, it is possible to elucidate certain aspects of this process. Eye movements can provide a number of clues to the underlying cognitive processes and their interactions with the outside world [12]. During MIS, surgeons process a wide range of sensory information and certain aspects are derived through overt changes in gaze direction. Even for simple procedures, the myriad of eye movements in response to the underlying visual attention is so complex that most of them are sub-consciously applied. In this case, post recall of the exact process involved is difficult, and this represents the most challenging part of eliciting surgical workflow and the surgical sequence in MIS.
Fig. 1. Eye-gaze data captured from an in-house mobile eye-tracker system during a laparoscopic cholecystectomy. The eye movement amplitude has a syntactic structure that is indicative of underlying surgical behavior. Section A = operating theatre interaction, B = primary surgical maneuvers (B1 = Grasp Hartmann’s pouch, B2 = dissect cystic artery/duct, B3 = Clip cystic artery/duct, B4 = Cut cystic artery/duct and B5 = dissect gall bladder from liver) and C = tool transition periods.
To demonstrate this effect, Fig. 1 shows an example eye tracking trace for a laparoscopic cholecystectomy procedure performed in the operating theatre, demonstrating how visual attention changes throughout different stages of the procedure, including the interaction in the operating theatre, primary surgical steps and instrument transitions. It is apparent that basic characteristics of the eye
movement are different. Sequences of focused attention with prolonged fixation and smooth pursuit movements are purposeful but may not be consciously registered by the surgeon. Current research has shown that spatio-temporal recognition of eye movements can provide a means of quantifying attention selection. The identification of characteristic maneuvers, however, is a prerequisite for quantifying surgical skills and identifying the underlying idiosyncrasy of different operators. The purpose of this paper is to propose an eye-gaze driven MIS workflow segmentation scheme. We will demonstrate the accurate segmentation of a primary surgical step from within a porcine laparoscopic cholecystectomy model. By combining this with traditional video sequence analysis techniques, the proposed method has been shown to give significantly improved performance in workflow segmentation and the potential to accurately segment surgical sequences.
2 Materials and Methods

2.1 Experiment Setup

Three senior specialist registrars were recruited to participate in the study and perform a full laparoscopic cholecystectomy (gall bladder removal) procedure on a porcine model. Each of the three registrars performed five procedures, forming a total data set of fifteen procedures. It is well documented that approximately 95% of all gall bladder removals are performed laparoscopically. The frequency of the procedure in practice lends itself to implementation as a model for skill and workflow analysis [8]. In a typical acute patient with normal anatomy, there are six primary steps to the surgical workflow once the surgeon has entered the peritoneal cavity. Owing to the nature of the porcine anatomy, only five of the six steps are achievable, due to the absence of the cystic artery. The five surgical steps of the porcine cholecystectomy are illustrated in Fig. 2.
Fig. 2. The five main steps: (a) 1 = Grasp the Hartmann's pouch; (b) 2 = Dissect the cystic duct; (c) 3 = Clip the cystic duct; (d) 4 = Transect the cystic duct; (e) 5 = Dissect the gallbladder from the liver bed. (f) The primary surgical step classified in the experiment, in which each participant has to navigate with an endoclip instrument and clip the cystic duct in three places, one clip placed proximally and two distally.
The focus of the study was to develop a framework to classify and automate the recovery of the critical surgical step, step 3 (clipping of the cystic duct). The selection of this step is based on the fact that most iatrogenic injuries occur during this step, as the common bile duct (CBD) is often mistaken for the cystic duct, resulting in transection of the CBD and occasionally excision of the CBD and most of the hepatic duct. During the procedure, eye-gaze was recorded using an eye tracker (Tobii ET 1750). This is an infra-red video-based binocular eye-tracking system recording the position of gaze in the work plane (screen) at up to 38 Hz. With this device, a fixed infra-red light source is beamed towards the eye whilst a camera records the position of the reflection on the cornea surface relative to the pupil centre. The infra-red images are digitized and processed in real time. Following a calibration procedure, the point of regard can then be determined with an accuracy of one degree across the work plane [13].
Fig. 3. (a) Tobii ET 1750 eye-tracker integrated into laparoscopic stack. (b) Video frame captured from the ET 1750 demonstrating the gaze transition (blue point). (c) Box trainer used to simulate laparoscopic conditions and house the porcine model.
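The mapping from the measured pupil–corneal-reflection offset to the point of regard is established by the calibration procedure mentioned above. As a rough illustration, the sketch below fits a least-squares quadratic mapping from pupil–glint vectors to screen coordinates; this is a generic reading of video-based gaze estimation, not the proprietary calibration used by the Tobii system, and all names are hypothetical.

```python
import numpy as np

def fit_calibration(pg_vectors, screen_pts):
    """Least-squares quadratic map from pupil-glint vectors to screen points.

    pg_vectors : (n, 2) pupil-minus-corneal-reflection offsets
    screen_pts : (n, 2) known calibration target positions
    """
    x, y = pg_vectors[:, 0], pg_vectors[:, 1]
    # Design matrix with quadratic terms: [1, x, y, xy, x^2, y^2]
    A = np.column_stack([np.ones_like(x), x, y, x * y, x ** 2, y ** 2])
    coeffs, *_ = np.linalg.lstsq(A, screen_pts, rcond=None)
    return coeffs

def point_of_regard(coeffs, pg):
    x, y = pg
    a = np.array([1.0, x, y, x * y, x ** 2, y ** 2])
    return a @ coeffs          # estimated (x, y) gaze position on the work plane

# Example with synthetic calibration fixations
rng = np.random.default_rng(0)
pg = rng.normal(size=(9, 2))                 # nine calibration fixations
screen = 500 * pg + 10 * pg ** 2             # an unknown "true" mapping
C = fit_calibration(pg, screen)
print(point_of_regard(C, pg[0]))
```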
Workflow segmentation was performed using eye-gaze vectors and image features. It is well known that video processing is computationally expensive, so our approach employs eye-gaze to drive image feature selection, guiding the search in the image space towards regions containing salient information. The outline of the classifier is illustrated in Fig. 4.
Fig. 4. Eye-gaze driven segmentation framework that calls on selected image features to classify surgical step 3 (clip cystic duct) in the procedure
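To make the idea of gaze-driven feature selection concrete, the sketch below crops a small region of the video frame around the current fixation, so that subsequent image features are computed only on gaze-salient pixels. This is an illustrative reading of the framework in Fig. 4, not the authors' implementation; the window size is an arbitrary assumption.

```python
import numpy as np

def gaze_roi(frame, gaze_xy, half_size=32):
    """Crop a (2*half_size)^2 patch of `frame` centred on the gaze point.

    frame   : H x W (x C) image array
    gaze_xy : (x, y) fixation position in pixel coordinates
    The patch is clipped at the image borders.
    """
    h, w = frame.shape[:2]
    x, y = int(round(gaze_xy[0])), int(round(gaze_xy[1]))
    x0, x1 = max(0, x - half_size), min(w, x + half_size)
    y0, y1 = max(0, y - half_size), min(h, y + half_size)
    return frame[y0:y1, x0:x1]

# Example: restrict feature extraction to the fixated region only
frame = np.zeros((576, 720), dtype=np.uint8)    # one laparoscopic video frame
roi = gaze_roi(frame, gaze_xy=(350.2, 280.7))
feature = roi.mean()                            # any image feature, computed on the ROI
```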
Instrument transitions were selected to define the time window to segment the surgical step from the workflow. It is typical for the surgeon to exchange instruments between the primary steps of the procedure, and the underlying gaze behavior is
indicative of the transitions. Gaze deviates away from the work plane (monitor), introducing a perturbation in the eye signal that can be detected and classified. Gaze off-screen periods can also be brought about by other factors. However, our work classifies the off-screen periods to determine their likelihood of being instrument transitions, thus automatically defining the recognition window for surgical step 3. The eye-gaze data was processed and a classification likelihood was assigned based on the off-screen duration and the length of the time window between two off-screen periods, to define Classifier 1 in Fig. 4. For Classifiers 1 and 2, it is common to deal with a small data set, so a more robust learning machine, based on small-sample theory, is implemented.

2.2 Machine Learning with a Small Training Data Set

One of the challenges of MIS workflow analysis is the size of the training data set available for machine learning. Given a training set $S = \{(x_1, y_1), \ldots, (x_T, y_T)\}$, where $x_t$ is the input and $y_t$ is the corresponding response, the main problem of machine learning is to find the optimal machine parameter $w$ that maps $x_t$, i.e., $f(x_t, w)$, to the supervisor's response $y_t$. One possible way to find $w$ is to use empirical risk minimization (training error minimization):

$$R_{emp} = \sum_{t=1}^{T} L\big(y_t, f(x_t, w)\big). \qquad (1)$$
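As a concrete reading of Eq. (1), the following sketch evaluates the empirical risk of a fixed machine f(x, w) over a training set; the squared-error loss is an assumption, since the text leaves L generic.

```python
import numpy as np

def empirical_risk(f, w, X, y, loss=lambda yt, ft: (yt - ft) ** 2):
    """R_emp = sum_t L(y_t, f(x_t, w)) -- Eq. (1), squared loss assumed."""
    return sum(loss(yt, f(xt, w)) for xt, yt in zip(X, y))

# Example with a linear machine f(x, w) = w . x
f = lambda x, w: float(np.dot(w, x))
X = np.random.randn(20, 3)            # 20 samples, 3 features
y = X @ np.array([1.0, -2.0, 0.5])    # noiseless targets
print(empirical_risk(f, np.zeros(3), X, y))
```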
It was observed in [14] that empirical risk minimization is not consistent when the data sample is small. For such problems the minimization of the structural risk is more adequate. The minimization of structural risk is a bi-objective optimization problem in which two conflicting objectives should be minimized. They are related to the empirical training error (1) and the machine complexity [15]:
$$\min \begin{cases} f_1 = R_{emp} \\ f_2 = \Omega \end{cases} \qquad (2)$$
Mathematically, $f_1$ represents the minimization of the empirical risk function (1) and $f_2$ the minimization of the complexity $\Omega$, i.e., the VC dimension, function gradient and high-frequency energy. Using this formulation it can be theoretically proved that the upper bound of the learning machine error is minimized. In this study, a neural network called the Parallel Layer Perceptron (PLP) was used to classify the off-screen periods [16]. The PLP was trained using a bi-objective formulation that takes into account the minimization of the training error and the machine complexity [17]. Decreasing the machine complexity is fundamental to achieving reliable results when the training set is small and/or is composed of many outliers. The problem this paper proposes to solve is affected by both small sample sizes and outliers.

2.3 Video Analysis for Instrument Detection

In a laparoscopic cholecystectomy procedure, a select number of instruments are required for different surgical steps. As such, the segmentation of surgical sequences
from the workflow can be enhanced by identifying and detecting the specific instrument in use. In this study, we incorporated a simple vision-based instrument detection method to recognize the change of instrument. To maintain motor control fidelity, surgeons generally position the laparoscopic instruments in the left and right parts of the laparoscopic view. To detect the change of surgical instruments, a region was defined empirically on the laparoscopic video, and a Sobel edge segmentation technique was applied to segment the instrument from the predefined region. Instrument likelihood measures were determined by using threshold values related to the presence or absence of the instrument in the predefined target window. These likelihood values are incorporated into the classification process through Classifier 2.
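A minimal version of this detector is sketched below: the Sobel gradient magnitude is computed inside a predefined lateral window, and the fraction of strong edge pixels serves as the instrument-presence likelihood. The window coordinates and both thresholds are illustrative assumptions; the paper fixes the region empirically.

```python
import numpy as np
from scipy.ndimage import sobel

def instrument_likelihood(frame, roi, edge_thresh=50.0):
    """Fraction of strong-edge pixels inside `roi` = (y0, y1, x0, x1)."""
    y0, y1, x0, x1 = roi
    patch = frame[y0:y1, x0:x1].astype(float)
    # Sobel gradient magnitude over the window
    mag = np.hypot(sobel(patch, axis=0), sobel(patch, axis=1))
    return float((mag > edge_thresh).mean())

# Example: a window on the left of the laparoscopic view (assumed coordinates)
frame = np.random.randint(0, 255, (576, 720)).astype(np.uint8)
p_tool = instrument_likelihood(frame, roi=(200, 400, 0, 120))
present = p_tool > 0.1      # thresholded likelihood, fed to Classifier 2
```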
3 Results

For all experimental data collected for this paper, the off-screen periods were divided into two classes, where the first class (C1) contains the off-screen periods representing instrument transitions and the second class (C2) represents off-screen periods with no instrument transition. The kurtosis and the skewness were measured for the two classes. The kurtosis of a class C, k(C), measures the Gaussianity of the distribution, where k(C) = 0 for Gaussian distributions and k(C) > 0 indicates that the distribution of C is outlier-prone. The skewness measures the asymmetry of the data around the sample mean, where S(C) = 0 represents symmetrical data, S(C) > 0 means that the data is distributed to the left, and S(C) < 0 that it is distributed to the right. For the classes C1 and C2 the following was found: S(C1) = 0.9, k(C2) = 16.16, S(C2) = 3.4. These values discourage the use of Naïve Bayes classifiers, since the distributions cannot be well represented by Gaussians. One PLP, PLP 1, with one Gaussian non-linear neuron in parallel with a 1st-order polynomial, was trained using the Minimum Gradient Method (MGM) described in [17]. Reducing the machine complexity is fundamental to achieving a reliable result when the training set is small and/or it is composed of many outliers [17]. In the present case, both situations held, since C1 is composed of few samples and the skewness of the classes indicates that they are susceptible to outliers. The number of positive observations equals the number of procedures, fifteen. A second PLP, PLP 2, was trained on the time length between the two off-screen periods that define step 3. Here the class C1 is composed of fifteen positive examples and 223 negative observations; this distribution is also asymmetric and thus outlier-prone. A voting machine was used to consider the results of C1 and C2 for the final classification. The assembly of PLP 1 and PLP 2 defines Classifier 1 in Fig. 4. As the data set is small, the error was evaluated using the leave-one-out cross-validation method, which is an unbiased statistical measure recommended for small sample data. A recognition accuracy of 66% was obtained with eye-gaze alone and an accuracy of 75% with eye-gaze and instrument detection.
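The two descriptive statistics and the evaluation protocol used above can be reproduced as in the sketch below (scipy's kurtosis defaults to the excess-kurtosis convention, i.e. 0 for a Gaussian, matching the text); the nearest-mean classifier is a simple stand-in, not the PLP.

```python
import numpy as np
from scipy.stats import skew, kurtosis   # kurtosis(..., fisher=True): 0 for a Gaussian

def loo_accuracy(X, y, fit, predict):
    """Leave-one-out cross validation: train on all samples but one, test on it."""
    hits, n = 0, len(X)
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        model = fit(X[keep], y[keep])
        hits += int(predict(model, X[i]) == y[i])
    return hits / n

# Example with a trivial nearest-mean stand-in classifier
X = np.random.randn(30, 2); y = np.array([0, 1] * 15)
fit = lambda X, y: (X[y == 0].mean(0), X[y == 1].mean(0))
predict = lambda m, x: int(np.linalg.norm(x - m[1]) < np.linalg.norm(x - m[0]))
print(skew(X[:, 0]), kurtosis(X[:, 0]), loo_accuracy(X, y, fit, predict))
```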
Fig. 5. The bar chart outlines the segmentation of surgical step 3. Step 3 was automatically recovered, bounded by two off-screen instrument transition periods (blue). All other steps (1, 2, 4 and 5) were manually segmented. The black bars highlight false positive identifications of off-screen periods.
4 Discussion and Conclusions

Accurate segmentation of a surgical workflow is important for improving general surgical standards and reducing errors in MIS. Thus far, little research has been conducted into the cognitive processes that lead to surgical errors. In this study, we have demonstrated the promise of eye-gaze driven surgical workflow segmentation for a primary surgical step in a porcine laparoscopic cholecystectomy model. Experimental results have shown that the critical step can be recognized with an accuracy of 66% with eye-gaze alone and 75% with eye-gaze combined with image-based workflow analysis. The use of PLPs and other associated neural network techniques to elicit subtle behavior in inter- and intra-operative performance from eye-gaze has remained largely unexplored. The proposed method is novel in that it seamlessly integrates the perceptual behavior that is intrinsically linked to a fully developed workflow with the conventional video processing approach. This is essential for resolving surgical steps that are ambiguous in terms of visual feature representation but cognitively carry very different mental processes.

Acknowledgements. We would like to thank our colleagues George Mylonas and Marios Nicolaou, as well as Smith & Nephew and Tyco Healthcare, for their valuable support in enabling us to advance our work.
References

1. Rattner, W.D., Park, A.: Advanced devices for the operating room of the future. Seminars in Laparoscopic Surgery 10(2), 85–89 (2003)
2. Taffinder, N., Smith, S., Darzi, A.: Assessing operative skill. BMJ 318, 887–888 (1999)
3. Gallagher, A.G., Satava, R.M.: Virtual reality as a metric for the assessment of laparoscopic psychomotor skills. Surgical Endoscopy 16, 1746–1752 (2002)
4. Aggarwal, R., Undre, S., Moorthy, K., Vincent, C., Darzi, A.: The simulated operating theatre: comprehensive training for surgical teams. Qual. Saf. Health Care 13, 27–32 (2004)
5. Rosen, J., Macfarlane, M., Richards, C., Hannaford, B., Sinanan, M.: Surgeon-tool force/torque signatures – evaluation of surgical skills in minimally invasive surgery. Medicine Meets Virtual Reality – The Convergence of Physical & Informational Technologies: Options for a New Era in Healthcare 62, 290–296 (1999)
6. Rosen, J., Solazzo, M., Hannaford, B., Sinanan, M.: Task decomposition of laparoscopic surgery for objective evaluation of surgical residents' learning curve using hidden Markov model. Comput. Aided Surg. 7, 802–810 (2002)
7. Sielhorst, T., Blum, T., Navab, N.: Synchronizing 3D movements for quantitative comparison and simultaneous visualization of actions. In: Proc. IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR) (2005)
8. Ahmadi, S.-A., Sielhorst, T., Stauder, R., Horn, M., Feussner, H., Navab, N.: Recovery of surgical workflow without explicit models. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 420–428. Springer, Heidelberg (2006)
9. Martin, J., Regehr, G., Reznick, R., MacRae, H., Murnaghan, J., Hutchinson, C., Brown, M.: Objective structured assessment of technical skill (OSATS) for surgical residents. Br. J. Surg. 193(5), 479–485 (2001)
10. Datta, V., Mackay, S., Mandalia, M., Darzi, A.: The use of electromagnetic motion tracking analysis to objectively measure open surgical skill in the laboratory-based model. J. Am. Coll. Surg. 193(5), 479–485 (2001)
11. James, A., Tchalenko, J., Darzi, A., Yang, G.-Z.: Proceedings ECEM, vol. 12 (2003)
12. Schmid, R., Zambarieri, D.: Strategies of eye-head coordination. In: Schmid, R., Zambarieri, D. (eds.) Oculomotor Control and Cognitive Processes, pp. 229–248. North Holland, Amsterdam (1991)
13. Tobii Technology: User Manual (2003), http://www.tobii.se
14. Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
15. Caminhas, W.M., Vieira, D.A.G., Vasconcelos, J.A.: Parallel Layer Perceptron. Neurocomputing 55, 771–778 (2003)
16. Vieira, D.A.G., Vasconcelos, J.A., Caminhas, W.M.: Controlling the parallel layer perceptron complexity using multiobjective learning algorithm. Neural Computing & Applications 16(4/5), 317–325 (2007)
17. Vieira, D.A.G., Takahashi, R.H.C., Palade, V., Vasconcelos, J.A., Caminhas, W.M.: The Q-norm complexity measure and the Minimum Gradient Method: a novel approach to the machine learning structural risk minimization problem. IEEE Transactions on Neural Networks (2005)
Prior Knowledge Driven Multiscale Segmentation of Brain MRI

Ayelet Akselrod-Ballin1, Meirav Galun1, John Moshe Gomori2, Achi Brandt1, and Ronen Basri1

1 Dept. of Computer Science and Applied Math, Weizmann Institute of Science, Rehovot, Israel
2 Dept. of Radiology, Hadassah University Hospital, Jerusalem, Israel
Abstract. We present a novel automatic multiscale algorithm applied to segmentation of anatomical structures in brain MRI. The algorithm, which is derived from algebraic multigrid, uses a graph representation of the image and performs a coarsening process that produces a full hierarchy of segments. Our main contribution is the incorporation of prior knowledge information into the multiscale framework through a Bayesian formulation. The probabilistic information is based on an atlas prior and on a likelihood function estimated from a manually labeled training set. The significance of our new approach is that the constructed pyramid reflects the prior knowledge formulated. This leads to an accurate and efficient methodology for detection of various anatomical structures simultaneously. Quantitative validation results on gold-standard MRI show the benefit of our approach.
1 Introduction
Segmentation of anatomical structures in brain magnetic resonance images (MRI) is crucial for medical image analysis. It includes a wide range of applications such as therapy evaluation, image guided surgery and neuroimaging studies [1,2,3]. The challenge in brain MRI segmentation is due to issues such as noise, intensity non-uniformity (INU), partial volume effect, shape complexity and natural tissue intensity variations. Under such conditions, the incorporation of a priori medical knowledge, commonly represented in anatomical brain atlases by state-of-the-art studies [1,2,3,4], is essential for robust and accurate automatic segmentation. Automatic segmentation of brain structures in MRI has been extensively studied in the scientific literature (see [2,3]). A popular approach is to utilize deformable models in a variational formulation. In [5] the templates were initialized by nonlinear registration of an MRI atlas to the input and then modified to minimize an energy based on expected textural and shape properties. Alternatively,
Research was supported in part by the Israel Institute of Technology. Research was supported in part by the Binational Science foundation, Grant No. 2002/254. Research was supported in part by the European Commission Project IST-2002506766 Aim Shape. Research was conducted at the Moross Laboratory for Vision and Motor Control at the Weizmann Institute of Science.
[6] performed segmentation of several anatomical structures using a level set formulation. Numerous classification approaches have been proposed, including supervised techniques such as artificial neural networks [1] and k-nearest neighbors (kNN), and unsupervised clustering techniques such as k-means and fuzzy c-means. An adaptive fuzzy c-means (AFCM) algorithm was developed in [7] using a multigrid algorithm. Statistical-based methods have been widely used. These approaches typically model the intensity distribution of brain tissues by a Gaussian mixture model and classify voxels according to the intensity distribution of the data. Given the distribution, the optimal segmentation can be estimated by a maximum a posteriori (MAP) or a maximum likelihood (ML) formulation. The expectation maximization (EM) algorithm is a popular way of solving the estimation problem. It was pioneered by [8] to simultaneously perform brain segmentation and estimate the INU correction, and was further extended by many others to incorporate spatial considerations. The authors in [9] applied the EM approach to a hierarchical segmentation model. Their idea of combining a hierarchical graph pyramid with atlas information in a statistical model is closely related to our work. A Bayesian approach that uses manually labeled data for brain segmentation was presented by [10]. The core idea, of using a manual training set for incorporation of prior statistics and class conditional densities, resembles our approach. Yet, instead of using Markov random fields (MRFs), our novel multiscale approach allows us to capture both the local and global geometric relations between the structures. A recent multiscale segmentation and classification approach (ISCA) [11] integrated segmentation with a second classification stage and applied it to tissue classification. Our novel probabilistic multiscale segmentation avoids the second stage by incorporating prior information into the framework. The work reported here differs from past work in several aspects. First, to the best of our knowledge this is the first application of a multiscale algorithm derived from a fast multi-level solver called algebraic multigrid (AMG) to segmentation of deep brain structures. The benefits of using a multiscale approach include the reduction in computation time, the development of a framework that can adopt a rich set of multiscale measurements, and the hierarchical representation which, as noted by [9], is indeed a powerful and flexible representation useful for many applications. Second, the incorporation of prior information, represented by a probabilistic atlas and a likelihood function (based on a manually labeled training set), into the multiscale segmentation using the Bayes formulation. Finally, in contrast with other approaches that are typically tuned to a particular set of structures or tasks, we propose a general framework for segmenting the three main brain tissue classes and their substructures simultaneously, using the same parameter values for all structures. Our approach can be generalized to other tasks and modalities, because any probabilistic atlas and registration scheme can be used, and due to the ability to define a likelihood function based on any training set. The restriction is that the probability information of the training data (atlas and likelihood function) represents the specific population. The remainder of the paper is organized as follows. In Section 2 our probabilistic multiscale approach is introduced.
Section 3 presents the experiments
and compares our results to other approaches on a gold-standard database. Section 4 concludes the paper.
2 Methods
Our aim is to perform segmentation of anatomical structures by incorporating prior anatomical information into a multiscale segmentation framework through a Bayesian formulation. We utilize a probabilistic atlas in which each region of image space is assigned a prior probability of belonging to a variety of anatomical structures. This prior probability is aggregated and used throughout the segmentation process. The likelihood in the model is computed based on an automatic learning process derived from labeled training data. Consequently, the posterior probability that two neighboring aggregates reside in the same segment is estimated by integrating the Bayesian formulation into the multiscale segmentation algorithm. This section describes the segmentation framework together with the probability-based criteria computed for the aggregates.

2.1 Segmentation Methodology
Our method is based on Segmentation by Weighted Aggregation (SWA) [12], which was extended to handle 3D multi-channel and anisotropic data [11]. The algorithm uses a graph representation of the images and constructs a "pyramid", i.e., a sequence of progressively smaller ("coarser") graphs ("levels" or "scales"), which adaptively represents progressively larger aggregates of voxels of similar properties. The nodes and the couplings (edge values) of the initial graph are the voxels of the given images and similarity measures between neighboring voxels, respectively. The algorithm recursively coarsens the graph, level after level, by softly aggregating several similar nodes of a finer level into a single node of the next coarser level. The couplings of the coarser graph are based on tunable statistical measures, called aggregative features, which are scale-dependent properties computed along with the segmentation process. Features obtained for small aggregates at a fine level affect the aggregation formation of larger aggregates at coarser levels, according to feature similarity. This work employs as aggregative features both the average intensity of voxels at an aggregate $i$, denoted by $\bar{I}(i)$, and the average atlas prior probability of an atlas structure at an aggregate $i$, denoted for example by $P(i \in WM)$ for finding the white matter (WM) in an aggregate $i$. The scheme provides a recursive mechanism for calculating the aggregative features (see [12]).

2.2 Incorporating the Probabilistic Model into the Segmentation
Let $L = \{l_1, \ldots, l_\nu, \ldots, l_K\}$ be a collection of $K$ anatomical structures. In our experiment we use twelve structure classes ($K = 12$) referring to white matter (WM), gray matter (GM), cerebrospinal fluid (CSF), Caudate (CN), Putamen (Pu), Thalamus (Th), Pallidum (GP), Brainstem (Bs), Ventral Diencephalon
(VDC), Hippocampus (H), Amygdala (Am) and "Other". The prior probability $P(i \in l_\nu)^{[s=0]}$ of a voxel $i$ at the finest graph level ($s = 0$) is defined by the spatial distribution of the probability atlas aligned with the test data set, where the atlas construction is performed based on the training set (see Sec. 3.1). At coarser levels of the segmentation pyramid the prior probability $P(i \in l_\nu)^{[s]}$ of an aggregate $i$ is accumulated as an aggregative feature, so that at level $s$ it is modelled by the average prior probabilities of its sub-aggregates obtained at level $s-1$. The posterior probability of a structure being present at an aggregate $i$ can be obtained using Bayes formula as follows ($Z$ is a normalization factor):

$$P(i \in l_\nu \mid \bar{I}(i))^{[s]} = \frac{1}{Z}\left[P(i \in l_\nu)^{[s]}\, P(\bar{I}(i) \mid i \in l_\nu)^{[s]}\right], \qquad (1)$$

The likelihood $P(\bar{I}(i) \mid i \in l_\nu)^{[s]}$, represented by the second term on the right-hand side of Eq. 1, reflects the conditional intensity probability for an aggregate $i$ given that the structure class is $l_\nu$. It is computed for each structure $l_\nu$, based on the voxel intensity and the ground-truth structure, as determined by the manual labeling of the training data. The histograms for each structure are then averaged over all training subjects and used as the likelihood for the test set. We decided to model the voxel intensity probability with a nonparametric distribution, since the real distribution can differ from the commonly used Gaussian distribution model. In addition, the ability to define a likelihood function based on the training set allows generalization to other tasks and clinical populations. The average intensity of voxels in aggregate $i$ is accumulated as an aggregative feature. Thus, at every level of the pyramid, we find the histogram bin corresponding to $\bar{I}(i)$ and determine the likelihood value for each aggregate and structure accordingly. The role of the posterior probabilities (Eq. 1) in the segmentation is to determine whether two neighboring aggregates at level $s-1$ reside in the same aggregate at level $s$. The algorithm's goal is to accurately segment a set of anatomical structures which consist of both clearly and weakly defined boundaries. Subcortical structures (e.g., the thalamus) are commonly defined by weakly visible boundaries, since their intensity pattern is often similar to that of their neighbors. In such cases, the atlas prior is critical. However, in the case of the main tissue classes, the borders are more visible and the smooth atlas prior can impede the delineation accuracy. The multiscale framework allows us to adjust the influence of the aggregative features on the coarse graph couplings across scales during the pyramid construction. Once an intermediate scale is reached, the aggregates have gathered sufficient statistics, and therefore the probability criteria can fully control the segmentation process. Yet, at finer scales ($s \leq 4$), we have found that for the three main tissues the probability criteria need to be regulated by intensity. Accordingly, the coupling weights of aggregates $i, j$ are modified based on their aggregative feature similarity, denoted by $\exp(-\zeta a_{ij})$ and defined as follows:

$$a_{ij} = \begin{cases} \lambda\, (\Delta P_{ij})^{\frac{1}{2}} + (1-\lambda)\, |\bar{I}(i) - \bar{I}(j)| & \text{if } s \leq 4 \\ (\Delta P_{ij})^{\frac{1}{2}} & \text{otherwise,} \end{cases} \qquad (2)$$
where $\lambda = (\Delta P_{ij})^{\frac{1}{\eta}}$ ($\eta = 10$, $\zeta = 7$ in our implementation), and
$$\Delta P_{ij} = \sum_{\nu=4}^{K-1} \left(P(i \in l_\nu \mid \bar{I}(i)) - P(j \in l_\nu \mid \bar{I}(j))\right)^2. \qquad (3)$$
The expression for the finer scales ($s \leq 4$) combines an intensity term and a posterior probability term based on the internal structures. $\lambda$ is derived from the posterior probability itself and controls the relative weight of the two terms. When the posterior difference between two aggregates is high, the first term controls the expression; otherwise the average intensity controls the process. The algorithm's solution in terms of voxels is computed as follows. First, voxel occupancy of the aggregates is determined by projection of all aggregates onto the data voxels using the interpolation matrix (see [12]). A voxel is associated with the aggregate for which it has the maximal interpolation weight. Then, each aggregate in the coarsest scale of the pyramid is matched to one structure based on its maximal a posteriori probability. The computational cost is linear in the size of the data. Our implementation on a standard Xeon 1.7 GHz PC takes 5 min for the segmentation of all 12 structures on a 150 × 150 × 60 region of interest. This does not include the atlas construction, which takes about 8 min using 5 brains, or the likelihood histogram computation, which is done in advance. The efficiency of our approach is superior to previously reported results; for instance, [13] requires 5 min for segmentation of the caudate structure on a Pentium 4, 2 GHz, and [5] takes 6 min for four structures on a Pentium 3, 1 GHz (the training phase took about 20 hours).
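As a concrete reading of Eqs. (2)-(3) and the voxel-assignment step just described, the sketch below computes the coupling weight exp(-ζ a_ij) between two aggregates from their posterior vectors and mean intensities, and performs the final labeling via two argmax operations. The 0-based slice used for the internal structures in Eq. (3) and all array shapes are assumptions for illustration.

```python
import numpy as np

def coupling_weight(post_i, post_j, I_i, I_j, s, eta=10.0, zeta=7.0):
    """Coupling exp(-zeta * a_ij) between neighboring aggregates, Eqs. (2)-(3).

    post_i, post_j : length-K posterior vectors P(. in l_nu | I_bar)
    I_i, I_j       : average aggregate intensities; s : pyramid scale
    """
    d = post_i[3:-1] - post_j[3:-1]   # internal structures l_4..l_{K-1}; 0-based convention assumed
    dP = float(np.dot(d, d))          # Eq. (3)
    lam = dP ** (1.0 / eta)           # lambda = (dP)^(1/eta)
    if s <= 4:                        # Eq. (2), fine scales: blend posterior and intensity terms
        a = lam * np.sqrt(dP) + (1.0 - lam) * abs(I_i - I_j)
    else:
        a = np.sqrt(dP)
    return float(np.exp(-zeta * a))

def label_voxels(interp, posteriors):
    """Final labeling: voxel -> aggregate by maximal interpolation weight,
    aggregate -> structure by maximal a posteriori probability."""
    voxel_to_agg = interp.argmax(axis=1)       # (n_voxels,)
    agg_to_label = posteriors.argmax(axis=1)   # (n_aggregates,)
    return agg_to_label[voxel_to_agg]

# Toy example: similar aggregates couple strongly; labels come from posteriors
K = 12
p = np.full(K, 1.0 / K)
print(coupling_weight(p, p, I_i=0.50, I_j=0.52, s=3))
labels = label_voxels(np.random.rand(1000, 8), np.random.rand(8, K))
```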
3 Experiments
The method's performance was assessed on a gold-standard database. The MR brain data sets and their manual segmentations were provided by the Center for Morphometric Analysis at Massachusetts General Hospital and are available at http://www.cma.mgh.harvard.edu/ibsr/. The data set contains 18 real T1-weighted normal MR brain scans and the manual segmentation of 43 structures, performed by a trained expert. The selection of the 12 structures that commonly appear in the literature was motivated by memory considerations. The MR scans are 256 × 256 × 128 volumes acquired with 1.5 mm coronal slice thickness and pixel dimensions ranging from 0.84 mm to 1 mm on each slice. The approach was tested on the same 150 × 150 × 60 region of interest in all data sets, which contains the entire volume of all internal structures. Since we are not interested in the skull or the scalp, the images were skull-stripped using the automatic BET procedure [14], and we present validation results for the brain structures only.

3.1 Atlas Construction
Similarly to other MRI brain segmentation methods [1,2,3,4], we employ probabilistic anatomical atlases to determine the prior information. The atlas is constructed based on affine co-registrations of the manually labeled training set to the test subject, implemented using the publicly available AIR5.0 [15] registration algorithm with a 12-parameter affine transformation. First, the training sets
are aligned with the test set and then the atlas is created by voxel-wise averaging of the neuroanatomical structures over the manually labeled training data sets. The atlas prior is formed by the frequency with which each structure occurs at a voxel across the training sets. Thus, the atlas represents the prior probability of each voxel in the test set belonging to a particular structure.

3.2 Results
We applied the algorithm to the data set and performed a "leave-one-out" learning strategy, leading to eighteen separate experiments. In each experiment one subject is removed from the training set and considered as the test set. The probabilistic atlas is constructed from five data sets randomly selected from the training sets with K = 12 labels. Table 1 presents our segmentation results on all structures in comparison to other approaches. Only the first four rows in the table report on the same data set used here. The first and second rows present our novel approach with a baseline comparison based on the atlas priors. The third row presents ISCA [11] applied to the same data set, whereas the lower rows are based on published results. Denote by S the set of voxels automatically segmented as a specific structure, and by R the set of voxels labeled as the same structure in the "ground truth" reference. The similarity between S and R is evaluated using the following validation measures (a computational sketch is given after this list):

– Dice similarity statistic $\kappa$: $2|S \cap R|/(|S| + |R|)$.
– Hausdorff distance $H_f$: the maximum over S of the minimum Euclidean distance to R, $H(S, R) = \max_{v \in S}\left(\min_{w \in R} d_{Euclidean}(v, w)\right)$. The distance is symmetrized by taking the maximum of the two directed measures. Due to outliers, $H_{95}$ measures the $f = 95\%$ percentile of the Hausdorff distance.
– Mean distance M: the mean distance between the boundaries of S and R.
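Under the assumption that S and R are available as sets of voxel coordinates, the three measures can be computed as in the sketch below; distances are taken between all labeled voxels rather than extracted boundary voxels, which is a simplification of the mean-distance definition.

```python
import numpy as np
from scipy.spatial.distance import cdist

def dice(S, R):
    return 2.0 * len(S & R) / (len(S) + len(R))

def h95_and_mean(S, R):
    """Symmetrized 95%-percentile Hausdorff distance and mean distance.

    S, R : sets of voxel coordinates (tuples); for real volumes these would
    be boundary voxels in physical (mm) coordinates.
    """
    a, b = np.array(sorted(S), float), np.array(sorted(R), float)
    d = cdist(a, b)                              # pairwise Euclidean distances
    d_sr, d_rs = d.min(axis=1), d.min(axis=0)    # directed minimum distances
    h95 = max(np.percentile(d_sr, 95), np.percentile(d_rs, 95))
    mean = 0.5 * (d_sr.mean() + d_rs.mean())
    return h95, mean

S = {(0, 0, 0), (0, 1, 0)}; R = {(0, 0, 0), (1, 1, 0)}
print(dice(S, R), h95_and_mean(S, R))
```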
Quantitative analysis shows that the algorithm can be considered a promising platform for segmentation of anatomical structures. Figure 1 demonstrates our results, comparing the manual and automatic segmentations in 2D and 3D views. It is important to note that the same parameter values were used for all structures. Our results compare favorably with results reported on the same gold-standard data set. They are better than [6] on all structures and validation measures, except for the M measure on Pu. The significant difference (p ≤ 0.05) between [11] and our novel approach on all structures except Pu shows that the latter, which is faster and less complex, is also more accurate. A significant difference to the naive approach was found on 8 of the 11 structures. The results reported in [10] are better. However, the analysis in Fischl et al. is based on 7 subjects, whereas our work presents results on 18 subjects of a gold-standard database. Comparison to other approaches applied on different data sets shows that our H95 results for the Hippocampus and Caudate (CN) are inferior compared to [5], but our M is better, indicating that the error is due to a small percentage of outliers. The results for CN in terms of κ are lower than [9,10,16], yet either M or H95 is better than [5,6,13]. This may be explained by the
Table 1. Segmentation scores for brain structures by various algorithms
Method                       | CN    | Pu   | Th    | GP   | H    | Am   | Bs   | VDC  | WM   | GM   | CSF
-----------------------------|-------|------|-------|------|------|------|------|------|------|------|-----
Multiscale Bayesian – κ      | 0.80  | 0.79 | 0.84  | 0.74 | 0.69 | 0.63 | 0.84 | 0.76 | 0.87 | 0.86 | 0.83
Multiscale Bayesian – M      | 1.44  | 1.6  | 1.44  | 1.43 | 1.88 | 1.67 | 1.62 | 1.43 | -    | -    | -
Multiscale Bayesian – H95    | 3.07  | 3.36 | 2.9   | 2.75 | 4.57 | 3.38 | 3.42 | 2.84 | -    | -    | -
Naive Prior – κ              | 0.65  | 0.77 | 0.83  | 0.72 | 0.62 | 0.65 | 0.81 | 0.77 | 0.69 | 0.72 | 0.53
ISCA [11] – κ                | 0.74  | 0.78 | 0.80  | 0.70 | 0.64 | 0.58 | 0.62 | 0.66 | 0.84 | 0.84 | 0.79
ISCA [11] – M                | 1.84  | 1.72 | 1.59  | 1.55 | 1.91 | 1.78 | 2.18 | 1.7  | -    | -    | -
ISCA [11] – H95              | 4.46  | 3.89 | 3.41  | 3.2  | 4.44 | 3.89 | 5.26 | 3.75 | -    | -    | -
C. Ciofolo et al. [6] – κ    | 0.65  | 0.70 | 0.77  | 0.62 | -    | -    | -    | -    | -    | -    | -
C. Ciofolo et al. [6] – M    | 1.71  | 1.46 | 1.70  | 1.51 | -    | -    | -    | -    | -    | -    | -
B. Fischl et al. [10] – κ    | 0.88  | 0.86 | 0.87  | 0.78 | 0.80 | 0.67 | 0.89 | -    | -    | -    | -
K. Pohl et al. [9] – κ       | 0.866 | -    | 0.894 | -    | -    | -    | -    | -    | 0.87 | 0.9  | 0.7
A. Pitiot et al. [5] – M     | 1.6   | -    | -     | -    | 2.1  | -    | -    | -    | -    | -    | -
A. Pitiot et al. [5] – H95   | 2     | -    | -     | -    | 3    | -    | -    | -    | -    | -    | -
B.M. Dawant et al. [16] – κ  | 0.86  | -    | -     | -    | -    | -    | -    | -    | -    | -    | -
D. Nain et al. [13] – H95    | 3.16  | -    | -     | -    | -    | -    | -    | -    | -    | -    | -
small volume of the CN, since in such small volumes, small differences in the placement of boundaries between S and R can have a large effect on κ. In sum, comparing the results obtained on several deep cortical structures to other approaches, we conclude that the results are not as high as those reported by [9,10], but comparable or superior to the results reported by other algorithms. We believe that our results are lower than those reported by [9,10], which are the only studies reporting results on both the tissues and their substructures, mainly due to the need to include additional features in the framework.
Fig. 1. Manual and automatic segmentation (upper and lower rows, respectively), presented in a 3D view (a) and superimposed on two coronal 2D slices (c, e) corresponding to their original inputs (b, d). Panels: (a) 3D structures, (b) slice #1, (c) segmentation, (d) slice #2, (e) segmentation.
4 Discussion
An automatic multiscale probabilistic method for segmentation of anatomical structures in MRI is introduced. The inclusion of prior information is a critical feature of the algorithm. Additional imaging contrast protocols and multiscale features can easily be incorporated into the framework. Future work will extend this approach by enforcing shape, texture, spatial and neighborhood relations between the structures. Such constraints can be modelled implicitly and explicitly during the pyramid construction [9,5,10]. We also plan to further improve our approach by using more sophisticated registration techniques. In sum, the method's strength is demonstrated on gold-standard real MRI by performing accurate and efficient segmentation of many structures simultaneously, including both subcortical structures and brain tissues.
References

1. Zijdenbos, A., Forghani, R., Evans, A.: Automatic pipeline analysis of 3D MRI data for clinical trials: application to MS. IEEE TMI 21(10), 1280–1291 (2002)
2. Pham, D., Xu, C., Prince, J.: Current methods in medical image segmentation. Annual Review of Biomedical Engineering 2, 315–337 (2000)
3. Sonka, M.M., Fitzpatrick, J.M. (eds.): Handbook of Medical Imaging. SPIE (2000)
4. Van Leemput, K.: Probabilistic brain atlas encoding using Bayesian inference. MICCAI 1, 704–711 (2006)
5. Pitiot, A., Delingette, H., Thompson, P.M., Ayache, N.: Expert knowledge guided segmentation system for brain MRI. NeuroImage 23(1), S85–S96 (2004)
6. Ciofolo, C., Barillot, C.: Brain segmentation with competitive level sets and fuzzy control. In: Christensen, G.E., Sonka, M. (eds.) IPMI 2005. LNCS, vol. 3565, pp. 333–344. Springer, Heidelberg (2005)
7. Pham, D., Prince, J.: Adaptive fuzzy segmentation of magnetic resonance images. IEEE TMI 18, 737–752 (1999)
8. Wells, W.M., Grimson, W., Kikinis, R., Jolesz, F.A.: Adaptive segmentation of MRI data. IEEE TMI 15, 429–442 (1996)
9. Pohl, K., Bouix, S., Kikinis, R., Grimson, W.: Anatomical guided segmentation with non-stationary tissue class distributions in an expectation-maximization framework. In: ISBI, pp. 564–572 (2004)
10. Fischl, B., Salat, D., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., van der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S.: Whole Brain Segmentation: Automated Labeling of Neuroanatomical Structures in the Human Brain. Neuron 33(3), 341–355 (2002)
11. Akselrod-Ballin, A., Galun, M., Gomori, J.M., Basri, R., Brandt, A.: Atlas guided identification of brain structures by combining 3D segmentation and SVM classification. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, Springer, Heidelberg (2006)
12. Galun, M., Sharon, E., Basri, R., Brandt, A.: Texture segmentation by multiscale aggregation of filter responses and shape elements. In: ICCV, pp. 716–723 (2003)
13. Nain, D., Haker, S., Bobick, A., Tannenbaum, A.: Shape-driven 3D segmentation using spherical wavelets. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, Springer, Heidelberg (2006)
14. Smith, S.: Fast robust automated brain extraction. Human Brain Mapping 17(3), 143–155 (2002)
15. Woods, R., Grafton, S., Holmes, C., Cherry, S., Mazziotta, J.: Automated image registration: I. General methods and intrasubject, intramodality validation. Journal of Computer Assisted Tomography 22, 139–152 (1998)
16. Dawant, B.M., Hartmann, S.L., Thirion, J., Maes, F., Vandermeulen, D., Demaerel, P.: Automatic 3D segmentation of internal structures of the head in MRI using a combination of similarity and free-form transformations. IEEE TMI 18 (1999)
Longitudinal Cortical Registration for Developing Neonates

Hui Xue1,2, Latha Srinivasan1,2,3, Shuzhou Jiang1, Mary Rutherford1, A. David Edwards1,3, Daniel Rueckert2, and Joseph V. Hajnal1

1 Imaging Sciences Department, Imperial College London, Du Cane Road, W12 0NN, UK
2 Department of Computing, Imperial College London, 180 Queen's Gate, SW7 2BZ, UK
3 Department of Paediatrics, Imperial College London, Du Cane Road, W12 0NN, UK
{hui.xue, l.srinivasan, shuzhou.jiang, mary.rutherford, david.edwards, d.rueckert, jo.hajnal}@imperial.ac.uk
Abstract. Understanding the rapid evolution of cerebral cortical surfaces in developing neonates is essential in order to understand normal human brain development and to study anatomical abnormalities in preterm infants. Several methods to model and align cortical surfaces for cross-sectional studies have been developed. However, the registration of cortical surfaces extracted from neonates across different gestational ages for longitudinal studies remains difficult because of significant cerebral growth. In this paper, we present an automatic cortex registration algorithm, based on surface relaxation followed by non-rigid surface registration. This technique aims to establish the longitudinal spatial correspondence of cerebral cortices for the developing brain in neonates. The algorithm has been tested on 5 neonates. Each infant has been scanned at three different time points. Quantitative results are obtained by propagating sulci across multiple gestational ages and computing the overlap ratios with manually established ground-truth.
1 Introduction

Clinical studies have shown delayed cortical folding and white matter (WM) related macro- and micro-structural changes in preterm infants at term-equivalent age [1]. By analyzing changes in the neonatal cortex during early phases of brain development, it may be possible to detect precursors of cerebral abnormalities prior to term-equivalent age, which would allow treatment options to be tested during the neonatal period. Also, the rapid anatomical and functional evolution of the neonatal cortex itself presents a major mystery for evolutionary biologists and neuroscientists. Cortical development during the third trimester of pregnancy is extensive, with a noticeable increase in cortical folding. In addition, there is significant cortical variability across infants. Thus, the precise localization and tracking of principal anatomical features, i.e. the central sulcus and Sylvian fissure, is difficult. Several researchers have presented algorithms to unfold and align the cerebral cortex in cross-sectional studies in adulthood [2, 3, 4, 5]. Methods based on cortex unfolding aim to inflate the highly folded surfaces and map the whole cortex, or a piece of it, to a standard representation such as a flat surface or a sphere. The inflation
process is normally regularized by ensuring that several constraints are satisfied, e.g. that rigidity between neighboring points, or the local area and angle, are minimally distorted during the unfolding procedure [3]. The alignment of corresponding anatomical features is partly achieved by identifying these features manually and then normalizing the sphere into a standard coordinate space [4, 5]. This requirement of maintaining strict point correspondences was recently relaxed by Tosun et al. [6]. They applied a rigid surface registration to remove global misalignment between two cortical surfaces before applying a conformal mapping to transform them to a spherical representation. They also showed that, using a normalized spherical coordinate system, the four main sulci can be aligned across individuals. Although some measurements, such as surface area and distance, can be computed within a spherical coordinate system normalized by a rigid-body transformation, this representation tends to smooth out fine-grain details of complex cortical anatomy. Moreover, in the developing brain, specific sulci can undergo significant morphometric development during the third trimester of pregnancy, and a global rigid-body transformation is clearly not able to capture local non-rigid deformation. Instead, a non-rigid registration is required to follow the growth of specific sulci across gestational ages (GAs). The aim of this study is to develop a methodology which is able to track and quantify cortical development in neonates longitudinally and to evaluate its ability to identify cerebral abnormalities related to preterm birth. To enable the tracking of cortical development we have developed a cortical registration algorithm based on two stages. In the first stage, the more mature cortex (i.e. from the later time point) is progressively smoothed; this smoothing process is repeated until the more mature cortex is maximally similar to the less mature cortex. In the second stage, any residual misalignment of the cortex is corrected by performing a non-rigid surface registration using free-form deformations (FFDs). A quantitative evaluation of the cortical registration is performed by propagating sulci across multiple gestational ages and computing the overlap ratios with manually established ground-truth.
2 Methods

2.1 Cortical Segmentation from Neonatal MR Images

The automatic segmentation of cortical grey matter in neonatal MRI is more challenging than the segmentation of adult brains. A particular confounding factor is the inverted white matter (WM) and gray matter (GM) contrast compared to the adult pattern. This leads to mislabeled voxels at the interface between the cerebrospinal fluid (CSF) and GM. Because CSF has the highest intensity in neonatal T2-weighted images and the image resolution of neonatal MRI is usually no more than 0.9 mm³, many voxels between CSF and GM will have intensities similar to WM, which is brighter than GM and darker than CSF (Fig. 1a). These voxels can be incorrectly classified as WM by conventional intensity-based segmentation approaches (Fig. 1b). We have developed an automated cortical segmentation algorithm addressing these difficulties [7]. Specifically, a modified expectation-maximization (EM) scheme is used in combination with a Markov Random Field (MRF) model to ensure spatial homogeneity in the tissue classification. The detection and removal of mislabeled partial volume voxels (MLPV) is based on a knowledge-based rule. The MLPV are
Fig. 1. An illustration of neonatal cortex segmentation. (a) An enlarged neonatal T2w image; note that the WM is brighter than GM. (b) Partial volume voxels (highlighted by arrows) on the CSF–GM and CSF–non-brain boundaries are incorrectly classified by the original EM method. (c) The segmentation results after the 4th iteration. (d) The final results after 14 iterations.
identified after every EM iteration and the MRF priors are adjusted to favor the correct classification classes. Once the modified EM algorithm converges, most of the MLPV are eliminated (Fig. 1c).

2.2 Cortical Surface Reconstruction

Starting from a probabilistic tissue classification generated by the automated segmentation algorithm [7], we reconstructed the inner, central and outer cortical surfaces for neonates using the reconstruction algorithm proposed in [8]. Specifically, the cortical surfaces are implicitly represented by the zero level-set and surface evolution is driven by solving the standard level-set partial differential equation.

2.3 Cortical Surface Registration

Each reconstructed cortex is represented as a polygon mesh consisting of between 35,000 and 90,000 triangles. We do not introduce an explicit unfolding step to map the cortical surfaces to a spherical representation. Instead, the registration method is designed to establish point correspondences on the original surfaces. As a first step, we remove any global affine misalignment between the two cortical surfaces. We have found that direct alignment of the cortical surfaces by the simple iterative closest point (ICP) method often converges to local minima, whereas voxel-based image registration [9] is much more robust and can deal with significant global affine differences between the neonatal brains. Thus, the corresponding anatomical T2-weighted images are first registered and the optimal affine transformation obtained is then applied to the polygon meshes. The neonatal cerebral cortex normally undergoes rapid changes, and the complexity of cortical folding increases noticeably between gestational ages of ~27 and 45 weeks. This rapid development process makes it difficult to establish correspondences directly via non-rigid surface registration. We found that smoothing the more mature cortex by reducing the complexity of the folding patterns generates cortical surfaces which are much more similar to the cortex obtained from earlier scans during cortical development. Thus, our assumption is that the performance of a cortical surface registration algorithm for aligning longitudinal data can be improved by exploiting this fact. We therefore perform an adaptive surface relaxation step
before performing the non-rigid surface registration. It is also inspired by the observation in [6], where it was reported that partially smoothed surfaces can lead to more consistent spherical maps.

Adaptive surface relaxation. Surface relaxation was originally designed to smooth reconstructed polygon surfaces and to reduce artifacts which often appear as abrupt or stair-step meshes. The relaxation process is helpful for improving visualization [10, 11]. To facilitate the non-rigid surface registration, we here employ surface relaxation prior to the surface registration to inflate the more mature cortex. One iteration step of this relaxation process is defined as follows [11, 12, 6]:

$$v_i^{t+1} = (1 - \lambda)\, v_i^t + \lambda\, \bar{v}_i^t \qquad (1)$$
where $v_i^t$ is the position of vertex $i$ at iteration $t$, $\lambda \in [0, 1]$ is a pre-defined smoothing factor, and $\bar{v}_i^t$ is the average vertex position over all polygons sharing vertex $i$:

$$\bar{v}_i^t = \frac{1}{\sum_{j \in N_i} A_j} \sum_{j \in N_i} A_j \cdot C_j \qquad (2)$$
where $N_i$ is the set of polygons using the vertex $i$, and $A_j$ and $C_j$ are the surface area and centre of polygon $j$. It is necessary to define a stopping criterion for the surface relaxation, so that the cortical folding complexity of the more mature surface is comparable to the folding complexity seen at the earlier gestational age. We have tested various cortical folding measures. In this paper we have decided to use a criterion that is based on the computation of the intrinsic curvature index (ICI) and the mean curvature L2 norm (MLN). Both measures are dimensionless and measure different aspects of cortical folding complexity. The former was originally defined in [13] and measures the local intrinsic convexity of the surface. The MLN is the L2 norm of the mean curvature of the cortical surface, which takes a minimum value for a sphere and is called the bending energy [14]:

$$ICI = \int_S K^+ \, dA, \qquad MLN = \int_S H^2 \, dA$$
where $S$ is the whole cortical surface, and $K$ and $H$ are the Gaussian and mean curvature. $K^+$ equals $K$ if $K > 0$ and is zero otherwise. Both measures are integrated over the whole cortical surface. The relaxation stops when both the ICI and MLN of the inflated surface fall below the corresponding values for the less mature cortex. An illustration of this surface relaxation is given in Figure 2.

Non-rigid surface registration based on free-form deformations (FFDs). The output of the adaptive surface relaxation is finally registered to the less mature cortical surface using a non-rigid surface registration algorithm. We use an algorithm based on free-form deformations (FFDs), which are a powerful tool for modeling 3-D deformable objects [15]. The basic idea of FFDs is to deform an object by manipulating an underlying mesh of control points. The resulting deformation controls the shape of the 3-D object and remains a C2-continuous transformation, which smoothly deforms the cortical surfaces.
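Combining Eqs. (1)-(2) with the ICI/MLN stopping rule, the relaxation stage could be sketched as below. The mesh is given as a vertex array plus triangle index list; the curvature-based folding measures are delegated to a user-supplied function, since the discrete ICI/MLN computation depends on the chosen curvature estimator.

```python
import numpy as np

def relax_step(V, F, lam=0.5):
    """One iteration of Eqs. (1)-(2): area-weighted neighbourhood averaging.

    V : (n, 3) vertex positions; F : list of triangles (index triples).
    Every vertex is assumed to belong to at least one triangle.
    """
    num, den = np.zeros_like(V), np.zeros(len(V))
    for i0, i1, i2 in F:                 # accumulate A_j * C_j per vertex
        a, b, c = V[i0], V[i1], V[i2]
        area = 0.5 * np.linalg.norm(np.cross(b - a, c - a))
        centre = (a + b + c) / 3.0
        for v in (i0, i1, i2):
            num[v] += area * centre
            den[v] += area
    V_bar = num / den[:, None]           # Eq. (2)
    return (1.0 - lam) * V + lam * V_bar # Eq. (1)

def relax_until(V, F, ici_target, mln_target, folding_measures, max_iter=500):
    """Relax the more mature surface until its ICI and MLN drop below the
    values measured on the less mature cortex (the stopping rule above)."""
    for _ in range(max_iter):
        ici, mln = folding_measures(V, F)    # user-supplied curvature code
        if ici < ici_target and mln < mln_target:
            break
        V = relax_step(V, F)
    return V

# Toy example: relax a unit tetrahedron by one step
V = np.array([[0., 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
F = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
V = relax_step(V, F)
```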
Fig. 2. An illustration of cortex surface relaxation and non-rigid registration. A neonate was scanned three times. The inner cortical surface of the first scan (GA: 29.86 weeks) is shown in (c). The cerebral cortex has undergone noticeable development by the scan at term-equivalent age (GA: 39.86 weeks), as shown in (a). The inflated surface after adaptive relaxation is shown in (b), where the cortical folding complexity is substantially decreased. Non-rigid surface registration is performed to align the less mature cortex (c) and the inflated surface (b). The deformed version of surface (c) is shown in (d). (e) renders (b) and (d) together; the zigzag pattern shows that these two surfaces are spatially very close.
To define an FFD for a cortical surface $S$, we define the spatial domain occupied by this surface as $\Omega_S = \{(x, y, z) \mid 0 \leq x \leq X, 0 \leq y \leq Y, 0 \leq z \leq Z\}$, and $\phi_S$ denotes an $n_x \times n_y \times n_z$ grid of control points $\phi_{i,j,k}$. The spacing between adjacent control points is
uniform in all coordinate directions. The deformation of a vertex $v_i = (x, y, z)$ is represented as the 3D tensor product of 1-D cubic B-splines [15]:

$$T_{local}(v_i) = \sum_{l=0}^{3} \sum_{m=0}^{3} \sum_{n=0}^{3} B_l(u)\, B_m(v)\, B_n(w)\, \phi_{i+l,\, j+m,\, k+n} \qquad (3)$$
where $i = \lfloor x/n_x \rfloor - 1$, $j = \lfloor y/n_y \rfloor - 1$, $k = \lfloor z/n_z \rfloor - 1$, $u = x/n_x - \lfloor x/n_x \rfloor$, $v = y/n_y - \lfloor y/n_y \rfloor$ and $w = z/n_z - \lfloor z/n_z \rfloor$. $B_l$ represents the $l$-th basis function of the B-spline. The basis functions of cubic B-splines have limited support; therefore, changing a control point in the grid affects only a 4×4×4 region around that control point. Surface registration is achieved by moving the control points to minimize a surface similarity measure. The measure which we optimize is the average symmetric spatial distance $f$:

$$f(S, W, T_{local}) = \frac{1}{N_S} \sum_{i=1}^{N_S} \left\| v_i - l(v_i, T_{local}(W)) \right\|_2 + \frac{1}{N_W} \sum_{j=1}^{N_W} \left\| w_j - l(w_j, S) \right\|_2 \qquad (4)$$

where $S$ and $W$ are the two cortical surfaces being registered, $N_S$ is the number of vertices in surface $S$, and $N_W$ is the number of vertices in surface $W$. For
every vertex $v_i \in S$, $l(v_i, T_{local}(W))$ defines the closest vertex to $v_i$ on the transformed surface $T_{local}(W)$. Similarly, for every vertex $w_j \in T_{local}(W)$, $l(w_j, S)$ defines the closest vertex to $w_j$ on the surface $S$. The purpose of adding the second term is to force the registration of deep sulci. To ensure that the spatial transformation defined by the FFD is smooth, a standard second-order regularization penalty should be minimized [16]. This penalty is added to the surface similarity measure to produce the final cost function.
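A literal transcription of Eqs. (3) and (4) might look as follows; the cubic B-spline basis functions are the standard uniform ones, the control lattice is assumed padded so that all indices stay in range, and the optimization over control points (with the regularization penalty) is omitted.

```python
import numpy as np
from scipy.spatial import cKDTree

def bspline_basis(u):
    """The four cubic B-spline basis functions B_0..B_3 evaluated at u in [0,1)."""
    return np.array([(1 - u) ** 3,
                     3 * u ** 3 - 6 * u ** 2 + 4,
                     -3 * u ** 3 + 3 * u ** 2 + 3 * u + 1,
                     u ** 3]) / 6.0

def ffd_deform(v, phi, spacing):
    """Eq. (3): contribution of the control lattice phi (nx, ny, nz, 3) at vertex v.

    `spacing` is the control-point spacing (the paper writes n_x, n_y, n_z);
    indices idx+l are assumed to stay inside the (padded) lattice.
    """
    idx = np.floor(v / spacing).astype(int) - 1
    frac = v / spacing - np.floor(v / spacing)
    Bx, By, Bz = (bspline_basis(f) for f in frac)
    out = np.zeros(3)
    for l in range(4):
        for m in range(4):
            for n in range(4):
                out += Bx[l] * By[m] * Bz[n] * phi[idx[0] + l, idx[1] + m, idx[2] + n]
    return out

def symmetric_distance(S, W_def):
    """Eq. (4): average symmetric closest-vertex distance between two surfaces."""
    dS = cKDTree(W_def).query(S)[0]   # for each v in S, nearest vertex of T(W)
    dW = cKDTree(S).query(W_def)[0]   # for each w in T(W), nearest vertex of S
    return dS.mean() + dW.mean()

# Toy usage with a small lattice and two nearby point clouds
phi = np.zeros((8, 8, 8, 3))          # control-point values (toy lattice)
print(ffd_deform(np.array([2.3, 3.1, 4.7]), phi, spacing=1.0))
S = np.random.rand(200, 3); W_def = S + 0.01
print(symmetric_distance(S, W_def))
```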
3 Results and Evaluation

We applied our method to 15 images acquired from 5 neonates. These infants were selected from a longitudinal MR study of cerebral development in premature neonates. Every subject had three longitudinal scans. The initial scans were performed between 27 and 33 weeks gestational age, the second scans at a mean GA of 35 weeks, and the final images were acquired at term-equivalent age (mean of 41 weeks). MR images were acquired on a 3T Philips Intera system (Best, Holland). The MR sequence parameters were as follows: T2-weighted fast spin echo pseudo volumes, TR = 1712 ms, TE = 160 ms, FOV = 220 mm, matrix 224 × 224, flip angle 90°, voxel size 0.86 × 0.86 × 2 mm with 50% slice overlap. After acquisition, the T2 images were segmented using the algorithm described in Section 2.1, and the inner cortical surfaces were reconstructed and used for registration. Fig. 2 shows an example of the surface relaxation and cortical registration results between longitudinal scans. The more mature cortex is relaxed until its folding complexity is comparable to that of the less mature cortex. When working with the registration between the two later scans (at ~35 and ~41 weeks), where folding patterns are becoming more complex, we found that inflating the ~35-week surfaces can improve the registration. Those cortices were therefore smoothed until their ICI and MLN decreased by 15%. The ~41-week surfaces were then adaptively inflated to match the folding measurements from the previous time points. We performed cortical registration between subsequent time points for all subjects. Also, we registered the cortical surface at the first time point directly to the cortical surface at the last time point (term-equivalent age), which is more challenging due to the significant cortical development during this time interval. To quantify the ability of the proposed registration method to localize and track the main anatomical features of the cortex, an experienced neonatologist was asked to manually label the central sulci (CS) on all 15 cortical surfaces. The manual labeling of one cortex can then be mapped to its target surface to generate an automatic segmentation via the non-rigid deformation. Fig. 3 gives an illustration of this automatic sulcus labeling. Note that a significant amount of non-rigid deformation is required to map the central sulcus extracted from less mature cortices to later scans. This deformation itself can be used to describe the local evolution of the neonatal cortex, which may not be explicitly represented using spherical mapping. The overlap between the automated results and the manually established ground-truth was computed as a quantitative measure of registration performance. Both true positive (TP) and false positive (FP) errors are estimated. TP is computed as the percentage area of the manual labeling that is accurately labeled by the automatic
mapping. FP is defined as the percentage area of the automatically labeled sulcus that is not labeled manually. Table 1 summarizes the results. In all cases, the cortical registration with surface inflation shows the best performance. It is also clear that performing just a global affine transformation is not sufficient for automated sulcus mapping. Direct non-rigid surface registration shows higher error rates, possibly because the more folds a cortical surface presents, the more local optima the surface similarity measure can have.

Table 1. Mean overlap ratios of automatic central sulcus labelling
     | 1st to 2nd                | 2nd to 3rd                | 1st to 3rd
     | Affine  IM    NR    NR+I  | Affine  IM    NR    NR+I  | Affine  IM    NR    NR+I
TP   | 0.16    0.81  0.73  0.97  | 0.20    0.77  0.71  0.91  | 0.07    0.31  0.41  0.72
FP   | 0.76    0.15  0.14  0.12  | 0.71    0.25  0.08  0.11  | 0.87    0.71  0.41  0.27

• 1st to 2nd: mapping the central sulcus from the first scan to the second scan; 2nd to 3rd and 1st to 3rd are similarly defined.
• Affine: global affine transformation; IM: intensity-based non-rigid registration; NR: non-rigid surface registration only; NR+I: non-rigid surface registration with adaptive surface inflation.
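If the manual and automatic sulcus labels are available as sets of surface vertices, the two overlap ratios reduce to simple set ratios, as in the sketch below; equal per-vertex areas are assumed for brevity, whereas the paper measures percentage surface area.

```python
def tp_fp(manual, auto):
    """TP: fraction of the manual label covered by the automatic one.
       FP: fraction of the automatic label outside the manual one."""
    tp = len(manual & auto) / len(manual)
    fp = len(auto - manual) / len(auto)
    return tp, fp

manual = {1, 2, 3, 4}; auto = {2, 3, 4, 5}
print(tp_fp(manual, auto))    # (0.75, 0.25)
```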
4 Discussion and Conclusions

One aim of cortical unfolding techniques developed for adult brains is to normalize an intermediate representation, i.e. a sphere or flat map, so that primary anatomical features appear at the same coordinates in the mapped space. This allows comparison of different patient groups. However, neonatal cortical evolution presents a somewhat different challenge because of the evolving complexity with gestational age, and it provides a unique opportunity to understand normal human brain development at both an individual and an evolutionary level. Therefore we developed a non-rigid surface registration technique and tested its applicability for use on MR images of developing neonates. Here we have shown that using this technique we can capture local deformations, and track and localize the central sulcus across different gestational ages. The preliminary evaluation in this paper shows that the non-rigid registration achieves better performance if the cortical surfaces are partially inflated. We hypothesize that partial inflation reduces the likelihood of the algorithm stopping in local optima of the surface similarity measure, which aids registration performance. However, the inflation can smooth out the secondary sulci and other smaller features, which can limit the non-rigid surface deformation to capturing only changes in the main sulci. This side-effect might be reduced by designing a knowledge-based surface similarity measure based on sulcus shape. Improving the surface initialization method may also reduce the degree of inflation needed for effective non-rigid registration. In the future we would like to apply this type of registration technique to both fetal and neonatal brains at different gestational ages to aid the development of atlases of normal cortical growth patterns, so that temporal events in the altered cortical development of preterm infants can be identified.
Fig. 3. Automatic sulcus labelling via non-rigid cortical registration. A neonate was scanned three times. (a) The inner cortex of the second scan (GA: 33.86 weeks); (b) the cortex at term-equivalent age (GA: 39.86 weeks); their central sulci are manually labelled. (c) A global affine transformation cannot deform the central sulcus across time points, while (d) the non-rigid surface registration captures the local evolution.
Regional Homogeneity and Anatomical Parcellation for fMRI Image Classification: Application to Schizophrenia and Normal Controls

Feng Shi1, Yong Liu1, Tianzi Jiang1, Yuan Zhou1, Wanlin Zhu1, Jiefeng Jiang1, Haihong Liu2, and Zhening Liu2

1 National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China
2 Institute of Mental Health, Second Xiangya Hospital, Central South University, Changsha 410011, Hunan, China
Abstract. This paper presents a discriminative model for multivariate pattern classification, based on functional magnetic resonance imaging (fMRI) and an anatomical template. As a measure of brain function, regional homogeneity (ReHo) is calculated voxel by voxel, and a widely used anatomical template is then applied to the ReHo map to parcellate it into 116 brain regions. The mean and standard deviation of the ReHo values in each region are extracted as features. Pseudo-Fisher Linear Discriminant Analysis (PFLDA) is performed on training samples to generate the discriminative model. Classification experiments were carried out on 48 schizophrenia patients and 35 normal controls. Under a full leave-one-out (LOO) cross-validation, a correct prediction rate of 80% is achieved. A control experiment shows that the anatomical parcellation step is useful for improving the classification rate. The discriminative model shows its ability to reveal abnormal brain functional activity and identify people with schizophrenia.
1 Introduction

Schizophrenia is a chronic, severe, and disabling mental disorder that affects about 1 percent of the general population. The disorder usually appears in the late teens or early twenties and then persists for a lifetime. Schizophrenia is characterized by diverse clinical presentations, such as hallucinations, delusions, anhedonia, avolition and impaired cognitive functions [1]. Current schizophrenia diagnosis is mainly based on clinical symptoms and medical history. More objective approaches are needed to help diagnose schizophrenia, and further to be extended to identify its subtypes or other psychotic disorders, which sometimes have symptoms similar to schizophrenia. Functional neuroimaging studies have suggested the neural correlates of these clinical presentations. Given that brain functional activity not only exists when people perform specific tasks but is also maintained in the resting state, it is reasonable to hypothesize that abnormal brain activity also appears in schizophrenia patients at rest. Most current classification studies trying to distinguish psychiatric diseases from controls have focused on structural images [2][3][4], but some recent studies have also attempted to extract features from functional images [5][6].
As a mapping of spontaneous brain activity, regional homogeneity (ReHo) was proposed to measure the temporal similarity of fMRI signals in the resting state [7]. In previous studies, ReHo was successfully employed to verify the default mode network and to locate regions of interest (ROIs) without prior knowledge in a brain functional connectivity study [8][9]. A decreased ReHo pattern was also reported in patients with schizophrenia [10]. In this paper, ReHo is used as a measure of brain activity, and the resulting map is subdivided into 116 regions according to an anatomical template [12]. The spontaneous activity differences between schizophrenia patients and normal controls lead to different ReHo values in specific brain regions, which is what this classification algorithm is based on. Features were extracted in each region and a classifier was then generated based on Pseudo-Fisher Linear Discriminant Analysis (PFLDA). The performance of the classifier was evaluated using a leave-one-out (LOO) cross-validation approach.

The remainder of this paper is organized as follows. Materials are presented in Section 2, a general description of our classification approach is given in Section 3, and experiments on schizophrenia and a discussion of the results are given in Section 4. We summarize the paper in Section 5.
2 Materials

2.1 Subjects

55 patients with schizophrenia and 36 controls participated in this study; 7 patients and 1 control were then excluded according to the following analysis. The remaining 48 patients (26 males and 22 females, age 23.5 ± 6.6 years) were recruited from the inpatient unit at the Institute of Mental Health, Second Xiangya Hospital of Central South University. Confirmation of diagnosis was made for all patients by clinical psychiatrists, using the Structured Clinical Interview for DSM-IV, Patient version [11]. The duration of illness was 27.5 ± 38.6 months. The majority of subjects (33 of 48) were receiving atypical antipsychotic medications, and the chlorpromazine-equivalent dose was 467.4 ± 215.5 mg. Patients were free of any concurrent psychiatric disorders and had no history of major neurological or physical disorders leading to an altered mental state. 35 healthy subjects (20 males and 15 females, age 27.1 ± 6.2 years) were recruited by advertisement as the control group. All subjects were right-handed and gave written informed consent prior to taking part in the study, which was approved by the Medical Research Ethics Committee of the Second Xiangya Hospital, Central South University.

2.2 Image Acquisition and Preprocessing

Imaging was performed on a 1.5-T GE scanner. Foam pads were used to limit head motion and reduce scanner noise. The fMRI scanning was carried out in darkness, and the participants were explicitly instructed to keep their eyes closed, relax, and move as little as possible. Functional images were collected using a gradient-echo echo-planar sequence sensitive to BOLD contrast (TR/TE = 2000/40 ms, FA = 90°, FOV = 24 cm). Whole-brain volumes were acquired with 20 contiguous 5 mm thick transverse slices,
with a 1 mm gap and 3.75 × 3.75 mm in-plane resolution. For each participant, the fMRI scanning lasted 6 minutes. Subjects with more than 1.5 mm maximum displacement in any of the x, y, or z directions, or more than 1.5° of angular rotation, were excluded. Image preprocessing was then performed using a statistical parametric mapping software package (SPM2, Wellcome Department of Imaging Neuroscience, London, UK). The first 10 volumes of each functional time series were discarded, and the remaining 170 volumes were corrected for the acquisition delay between slices and for head motion. To further reduce the effects of other possible sources of artifacts, such as the six motion parameters, linear drift, and the mean time series of all voxels in the whole brain, a linear regression was performed after the fMRI images were normalized to the standard echo-planar imaging template and resampled to 3 × 3 × 3 mm³. The fMRI data were temporally band-pass filtered (0.01–0.08 Hz) [13][14][15].
3 Methods

3.1 Regional Homogeneity

As a measurement of the regional coherence of spontaneous brain activity, regional homogeneity (ReHo) was defined as the temporal similarity of the low-frequency fluctuations (LFF) in fMRI data [7]. The method is described briefly as follows. ReHo was calculated with Kendall's coefficient of concordance (KCC) [16], which was assigned to each voxel by calculating the KCC of the time series of this voxel with its neighbors:

W = \frac{\sum_{i=1}^{N} (R_i)^2 - N(\bar{R})^2}{\frac{1}{12} K^2 (N^3 - N)}    (1)

R_i = \sum_{j=1}^{K} r_{ij}    (2)

\bar{R} = \frac{(N+1)K}{2}    (3)

where W is the KCC, which ranges from 0 to 1; R_i is the sum rank of the i-th time point and r_{ij} is the rank of the i-th time point in the j-th voxel; \bar{R} is the mean of the R_i; N is the number of time points of the fMRI time series (here N = 170); and K is the number of voxels in a cluster, i.e. one given voxel plus its neighbors (here K = 27). An individual W map (i.e. ReHo map) is then obtained on a voxel-by-voxel basis for each subject.
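As an illustration of Eqs. (1)-(3), a minimal NumPy sketch of the per-voxel KCC computation follows. This is not code from the paper; the (N, K) array layout, with the K = 27 time series of a voxel's 3 × 3 × 3 neighbourhood gathered into columns, is an assumption.

import numpy as np
from scipy.stats import rankdata

def kendalls_w(ts):
    """Kendall's coefficient of concordance for an (N, K) array:
    N time points (rows), K voxels (one voxel plus its neighbours)."""
    n, k = ts.shape
    ranks = np.apply_along_axis(rankdata, 0, ts)  # rank each voxel's series over time
    r_i = ranks.sum(axis=1)                       # sum rank per time point, Eq. (2)
    r_bar = (n + 1) * k / 2.0                     # mean of the R_i, Eq. (3)
    # Eq. (1); note sum((R_i - R_bar)^2) = sum(R_i^2) - N * R_bar^2
    return np.sum((r_i - r_bar) ** 2) / (k ** 2 * (n ** 3 - n) / 12.0)

Sliding this over every in-brain voxel's neighbourhood yields the individual W (ReHo) map.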
3.2 Anatomical Parcellation

After registration to standard stereotaxic space in the preprocessing step, the fMRI volumes were segmented into 116 regions by masking with the Automated Anatomical Labeling (AAL) map [12]. This template has been validated and widely used in many previous studies [5][17][18][19][20]. The parcellation divides the cerebrum into 90 regions (45 in each hemisphere) and the cerebellum into 26 regions (9 in each cerebellar hemisphere and 8 in the vermis), as listed in Table 1.
Table 1. Cortical and subcortical regions defined in the AAL template in standard stereotaxic space

Superior frontal gyrus, dorsolateral; Superior frontal gyrus, orbital; Superior frontal gyrus, medial; Superior frontal gyrus, medial orbital; Middle frontal gyrus; Middle frontal gyrus, orbital; Inferior frontal gyrus, opercular; Inferior frontal gyrus, triangular; Inferior frontal gyrus, orbital; Olfactory cortex; Gyrus rectus; Anterior cingulate; Precentral gyrus; Supplementary motor area; Median cingulate; Rolandic operculum; Calcarine fissure; Cuneus; Lingual gyrus; Superior occipital gyrus; Middle occipital gyrus; Inferior occipital gyrus; Superior temporal gyrus; Temporal pole: superior; Middle temporal gyrus; Temporal pole: middle; Inferior temporal gyrus; Heschl gyrus; Fusiform gyrus; Hippocampus; Parahippocampal gyrus; Amygdala; Postcentral gyrus; Superior parietal lobule; Inferior parietal lobule; Supramarginal gyrus; Angular gyrus; Precuneus; Paracentral lobule; Posterior cingulate gyrus; Caudate nucleus; Putamen; Pallidum; Thalamus; Insula; Cerebellum hemisphere; Vermis
3.3 Feature Extraction

To distinguish the abnormal brain activity in schizophrenia from normal controls, a feature extraction method was proposed. Kendall's coefficient of concordance was first calculated voxel by voxel on the fMRI volumes. The resulting ReHo values were then normalized to zero mean and unit variance within each subject to reduce the total variance across subjects. The scaled ReHo map was then parcellated according to the AAL template. The mean and standard deviation of the ReHo values were calculated in each of the 116 regions, resulting in 232 measurements per brain. Because the feature dimension was much higher than the training sample size, these measurements were processed with principal component analysis (PCA) and projected into a lower-dimensional space, and the PC coefficients were taken as features.

3.4 Pseudo-Fisher Linear Discriminant Analysis

Fisher Linear Discriminant Analysis (FLDA), a widely used technique for pattern classification, is designed to project data from D dimensions onto an appropriate line on which the projected samples are well separated [21][22]. Suppose that we have a set of n D-dimensional samples x,

z = \omega^t x, \quad x = (x_1, x_2, \ldots, x_n)^t    (4)

where z is the scalar dot product on the line and \omega is the projection direction. Theoretically, this line can be found by maximizing the ratio of between-class separability to within-class variability. To this purpose, FLDA maximizes the following objective function:

J(\omega) = \frac{\omega^t S_b \omega}{\omega^t S_w \omega}    (5)
where S_b is the between-class scatter matrix and S_w is the within-class scatter matrix. The definitions of the scatter matrices are:

S_b = (m_1 - m_2)(m_1 - m_2)^t    (6)

S_w = \sum_{i=1}^{N_1} (x_i^1 - m_1)(x_i^1 - m_1)^t + \sum_{i=1}^{N_2} (x_i^2 - m_2)(x_i^2 - m_2)^t    (7)

where m_1 and m_2 are the mean feature vectors of each group, and N_1 and N_2 are the sample sizes of each group. Finally, the optimal \omega can be determined by:

\omega = S_w^{-1}(m_1 - m_2)    (8)

However, the number of features is always much higher than the number of training samples in neuroimaging research (N_1 + N_2 \ll D, where D is the dimension of the feature space). Computing the inverse of S_w then leads to an ill-posed problem, and FLDA yields unreliable results in this condition. Dimension reduction is therefore needed to preprocess the features. In Pseudo-Fisher Linear Discriminant Analysis, principal component analysis (PCA) is first applied to the sample features x \in R^n; the samples are then projected to a lower-dimensional space and new features x' \in R^{n'} (n' \leq N_1 + N_2 - 1) are generated. In this study, we performed this PCA step to solve the ill-posed problem in Section 3.3. After this, the classical FLDA procedure can be performed and the projection direction \omega obtained. After projecting the data from D dimensions onto that line, the last step is to define a threshold on it, determined by:

z_0 = \frac{N_1 m_{z1} + N_2 m_{z2}}{N_1 + N_2}    (9)

where m_{z1} and m_{z2} are the means of the projected scores of the two classes, respectively.
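A compact sketch of the PFLDA training and prediction steps (Eqs. (4)-(9)) is given below. It is an illustration only: the pseudo-inverse stands in for S_w^{-1} when the scatter matrix is singular (hence "pseudo"-Fisher), and all function names are assumptions.

import numpy as np

def train_pflda(X, y, m):
    """PCA to the first m components, then Fisher LDA with the Eq. (9) threshold.
    X: (n_samples, n_features); y: binary labels (0/1); m <= n_samples - 1."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)  # PCA step (Sec. 3.3)
    pcs = vt[:m]
    Z = (X - mean) @ pcs.T
    Z1, Z2 = Z[y == 0], Z[y == 1]
    m1, m2 = Z1.mean(axis=0), Z2.mean(axis=0)
    Sw = (Z1 - m1).T @ (Z1 - m1) + (Z2 - m2).T @ (Z2 - m2)   # Eq. (7)
    w = np.linalg.pinv(Sw) @ (m1 - m2)    # Eq. (8); pinv handles singular Sw
    z0 = (len(Z1) * (Z1 @ w).mean() + len(Z2) * (Z2 @ w).mean()) / len(Z)  # Eq. (9)
    return mean, pcs, w, z0

def predict_pflda(x, mean, pcs, w, z0):
    """Class 0 projects above the threshold because w points from m2 towards m1."""
    return 0 if ((x - mean) @ pcs.T) @ w > z0 else 1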
4 Experimental Results

Our approach was applied to 48 schizophrenia patients and 35 normal controls. To evaluate the performance of the proposed discriminative model, a full leave-one-out (LOO) cross-validation was performed. One subject was first selected for testing, and the remaining subjects were used to train the classification model. Pseudo-Fisher Linear Discriminant Analysis (PFLDA) was employed as the classifier. By repeatedly leaving each subject out for testing, the average classification rate was obtained. Finally, the whole experiment was repeated choosing the first m (m \leq N_1 + N_2 - 1) PC coefficients as features to test the stability of the results. Excluding the test subject, we had 82 subjects in total, so m was chosen from 1 to 81. Classification results are shown in Fig. 1: the best correct prediction rates on patients and controls were 83% and 74% when using 81 features, and the total correct prediction rate reached 80%.
Fig. 1. Classification results (correct rate vs. feature number). Circle line: correct prediction rate of schizophrenia. Triangle line: correct prediction rate of normal controls. Square line: average correct prediction rate of total subjects.

Table 2. Top 10 discriminative regions

Index  Region                                  Feature
1      Superior parietal lobule, right         Mean
2      Pallidum, right                         Mean
3      Hippocampus, left                       SD
4      Amygdala, right                         Mean
5      Olfactory cortex, right                 Mean
6      Cerebellum hemisphere, right            Mean
7      Inferior parietal lobule, right         Mean
8      Fusiform gyrus, right                   SD
9      Calcarine fissure, left                 SD
10     Inferior frontal gyrus, orbital, left   Mean
Table 3. Classification results under leave-one-out

Discriminative model   Schizophrenia   Normal controls   Total
Proposed method        83%             74%               80%
Control method         81%             66%               74%
The top 10 regions were obtained by tracing the features with the largest weights in the Fisher classifier and are listed in Table 2. Among them, the inferior frontal gyrus, the superior and inferior parietal lobules, the pallidum, and some regions in the limbic system, including the hippocampus, amygdala and olfactory cortex, have been widely reported to be related to schizophrenia [23][24]. The cerebellar hemisphere, although not widely reported, has also been considered to be involved in schizophrenia [18].

To evaluate the effect of anatomical parcellation on the classification process, a control experiment was designed. After obtaining the ReHo map, PCA was directly performed for each subject and the PC coefficients were taken as features. In contrast to the previous method, the parcellation step was removed in the control method, which is also the traditional way to extract features. Classification results are shown in Table 3: the best correct prediction rates on patients and controls were 81% and 66%, and the total correct prediction rate only reached 74%, which is clearly lower
than with the previous method. Therefore, the parcellation information proved important for improving the classification accuracy.
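The LOO protocol itself is then a short loop. The sketch below reuses the hypothetical train_pflda/predict_pflda helpers from the earlier illustration and is likewise only an assumption about how such an evaluation might be wired up.

import numpy as np

def loo_correct_rate(X, y, m):
    """Full leave-one-out cross-validation for a fixed number m of PC features."""
    hits = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i            # train on all subjects but subject i
        model = train_pflda(X[mask], y[mask], m)
        hits += predict_pflda(X[i], *model) == y[i]
    return hits / len(X)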
5 Conclusion

In this paper, a supervised multivariate classification method was proposed for distinguishing schizophrenia patients from normal controls by using features containing both functional and anatomical information. The regional homogeneity of fMRI signals provided brain function information, and anatomical prior knowledge was given by the AAL template. The experimental results indicated that the proposed method achieved a satisfactory classification rate in this schizophrenia study. Compared with the control experiment, the anatomical parcellation process, which takes the anatomical distribution of the brain into consideration, was shown to contribute to improving the discriminative power. The best prediction rate is 80%; however, this may not be enough to meet the requirements of clinical applications. Nonlinear approaches and feature selection methods could be investigated in further research to improve the classification performance. Combining other types of features could also be considered to further improve the efficiency of the proposed discriminative model.
Acknowledgement This work was partially supported by the Natural Science Foundation of China, Grant Nos. 30425004, 60121302 and 30670752, and the National Key Basic Research and Development Program (973), Grant No. 2003CB716100.
References

1. Schultz, S.K., Andreasen, N.C.: Schizophrenia. Lancet 353(9162), 1425–1430 (1999)
2. Fan, Y., Shen, D.G., Gur, R.C., et al.: COMPARE: Classification Of Morphological Patterns using Adaptive Regional Elements. IEEE Trans. on Medical Imaging 26, 95–105 (2007)
3. Lao, Z.Q., Shen, D.G., Xue, Z., et al.: Morphological classification of brains via high-dimensional shape transformations and machine learning methods. NeuroImage 21, 46–57 (2004)
4. Liu, Y.X., Teverovskiy, L., Carmichael, O., et al.: Discriminative MR Image Feature Analysis for Automatic Schizophrenia and Alzheimer's Disease Classification. In: Barillot, C., Haynor, D.R., Hellier, P. (eds.) MICCAI 2004. LNCS, vol. 3216, pp. 393–401. Springer, Heidelberg (2004)
5. Wang, K., Jiang, T.Z., Liang, M., et al.: Discriminative Analysis of Early Alzheimer's Disease Based on Two Intrinsically Anti-correlated Networks with Resting-State fMRI. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4191, pp. 340–347. Springer, Heidelberg (2006)
6. Zhu, C.Z., Zang, Y.F., Liang, M., et al.: Discriminative Analysis of Brain Function at Resting-state for Attention-Deficit/Hyperactivity Disorder. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3750, pp. 468–475. Springer, Heidelberg (2005)
7. Zang, Y.F., Jiang, T.Z., Lu, Y.L., et al.: Regional homogeneity approach to fMRI data analysis. NeuroImage 22, 394–400 (2004)
8. He, Y., Zang, Y.F., Jiang, T.Z., et al.: Detecting Functional Connectivity of the Cerebellum Using Low Frequency Fluctuations (LFFs). In: Barillot, C., Haynor, D.R., Hellier, P. (eds.) MICCAI 2004. LNCS, vol. 3217, pp. 907–915. Springer, Heidelberg (2004)
9. He, Y., Wang, L., Zang, Y.F., et al.: Regional coherence changes in the early stages of Alzheimer's disease: a combined structural and resting-state functional MRI study. NeuroImage 35, 488–500 (2007)
10. Liu, H., Liu, Z., Liang, M., et al.: Decreased regional homogeneity in schizophrenia: a resting state functional magnetic resonance imaging study. NeuroReport 17, 19–22 (2006)
11. First, M.B., Spitzer, R.L., Gibbon, M., et al.: Structured Clinical Interview for DSM-IV Axis I Disorders, Patient Edition (SCID-I/P). Biometrics Research Department, New York State Psychiatric Institute, New York (1995)
12. Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., et al.: Automated anatomical labelling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage 15, 273–289 (2002)
13. Fox, M.D., Snyder, A.Z., Vincent, J.L., et al.: The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proc. Natl. Acad. Sci. 102, 9673–9678 (2005)
14. Lowe, M.J., Phillips, M.D., Lurito, J.T., et al.: Multiple sclerosis: low-frequency temporal blood oxygen level dependent fluctuations indicate reduced functional connectivity: initial results. Radiology 224, 184–192 (2002)
15. Greicius, M.D., Krasnow, B., Reiss, A.L., et al.: Functional connectivity in the resting brain: a network analysis of the default mode hypothesis. Proc. Natl. Acad. Sci. 100, 253–258 (2003)
16. Kendall, M., Gibbons, J.D.: Rank Correlation Methods. Oxford Univ. Press, Oxford (1990)
17. Achard, S., Salvador, R., Whitcher, B., et al.: A resilient, low-frequency, small-world human brain functional network with highly connected association cortical hubs. J. Neurosci. 26, 63–72 (2006)
18. Liang, M., Zhou, Y., Jiang, T.Z., et al.: Widespread functional disconnectivity in schizophrenia with resting-state fMRI. NeuroReport 17, 209–213 (2006)
19. Salvador, R., Suckling, J., Coleman, M.R., et al.: Neurophysiological architecture of functional magnetic resonance images of human brain. Cereb. Cortex 15, 1332–1342 (2005)
20. Salvador, R., Suckling, J., Schwarzbauer, C., et al.: Undirected graphs of frequency-dependent functional connectivity in whole brain networks. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360, 937–946 (2005)
21. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. John Wiley & Sons, New York (2001)
22. Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J.: Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Trans. on PAMI 19, 711–720 (1997)
23. Shenton, M.E., Dickey, C.C., Frumin, M., et al.: A review of MRI findings in schizophrenia. Schizophrenia Research 49, 1–52 (2001)
24. Fallon, J.H., Opole, I.O., Potkin, S.G.: The neuroanatomy of schizophrenia: circuitry and neurotransmitter systems 3, 77–107 (2003)
Probabilistic Fiber Tracking Using Particle Filtering

Fan Zhang1, Casey Goodlett2, Edwin Hancock1, and Guido Gerig2

1 Dept. of Computer Science, University of York, York, YO10 5DD, UK
2 Dept. of Computer Science, University of North Carolina at Chapel Hill, USA
Abstract. This paper presents a novel and fast probabilistic method for white matter fiber tracking from diffusion weighted MRI (DWI). We formulate fiber tracking on a nonlinear state space model which is able to capture both smoothness regularity of fibers and uncertainties of the local fiber orientations due to noise and partial volume effects. The global tracking model is implemented using particle filtering, which allows us to recursively compute the posterior distribution of the potential fibers. The fiber orientation distribution is theoretically formulated for prolate and oblate tensors separately. Fast and efficient sampling is realised using the von Mises-Fisher distribution on unit spheres. Given a seed point, the method is able to rapidly locate the global optimal fiber and also provide a connectivity map. The proposed method is demonstrated on a brain dataset.
1 Introduction

White matter fiber tracking, or "tractography", estimates possible fiber paths by tracing the local fiber orientations. However, the local fiber orientations measured by DTI are not completely reliable due to noise, partial volume effects and crossing fibers. To deal with this uncertainty, probabilistic fiber tracking methods have received considerable interest recently [1,2,3]. Instead of reconstructing the fiber pathways, they aim to measure the probability of connectivity between brain regions. These methods can be described in terms of two steps. First, they model the uncertainty in the fiber orientation measurements at each voxel using a probability density function (PDF) [1,3]. Second, the probabilistic tracking algorithms simply repeat a streamline propagation process 1000 ∼ 10000 times with propagation directions randomly sampled from the PDF. The fraction of the streamlines that pass through a voxel provides an index of the strength of connectivity between that voxel and the starting point. Most previous methods estimate the connectivity map by sampling directly from the PDF of fiber orientations. The sampling process is challenging, so it is necessary to resort to MCMC methods [1] or to evaluate the PDF discretely with low angular resolution [3]. The intrinsic drawback of the previous methods is their computational complexity (often more than several hours on a modern PC [1,3]), which is unacceptable in practice.

In this paper, we propose a new probabilistic method for white matter fiber tracking. Our contributions are threefold. First, we formulate fiber tracking using a nonlinear state space model and recursively compute the posterior distribution using particle filtering [4]. The proposed model can capture both the smoothness regularity of the fibers and the uncertainties of the local fiber orientations. Our second contribution concerns the PDF modeling of local fiber orientations from DTI. To do so, we build PDFs for prolate and
oblate tensors separately. For prolate tensors, we characterise the uncertainty using the axially symmetric model [5]. For oblate tensors, we model the PDF using the normal distribution of the angle with the smallest eigenvector of the tensor. Finally, we use the von Mises-Fisher distribution to model the prior and the importance density for sampling particles. This spherical distribution provides a natural way to model noisy directional data, and it can be efficiently sampled using Wood's simulation algorithm [6].
2 Tracking Algorithm

The problem of fiber tracking is to extract the best possible fiber pathway from a predefined seed point. We formulate the problem using a nonlinear state space model.

2.1 Global Tracking Model

A white matter fiber path can be modeled as a sequence of unit vectors P_{n+1} = \hat{v}_{0:n} = \{\hat{v}_0, ..., \hat{v}_n\}. Let Y be the set of image data, and the data observed at \hat{v}_i is y_i = Y(\hat{v}_i) = Y(x_i). Our goal is to propagate a sequence of unit vectors that best estimates the true fiber based on the prior density p(\hat{v}_{i+1}|\hat{v}_{0:i}) and the observation model p(Y|\hat{v}_{0:i}). We assume that the tracking dynamics forms a Markov chain, so that p(\hat{v}_{i+1}|\hat{v}_{0:i}) = p(\hat{v}_{i+1}|\hat{v}_i). Thus, the prior of the fiber path is p(\hat{v}_{0:n}) = p(\hat{v}_0) \prod_{i=1}^{n} p(\hat{v}_i|\hat{v}_{i-1}). Another assumption is that the diffusion measurements are conditionally independent given \hat{v}_{0:n}, i.e. p(Y|\hat{v}_{0:n}) = \prod_{r \in \Omega} p(Y(r)|\hat{v}_{0:n}). We also assume that the measurement at a point does not depend on any points in the history of the path, i.e. p(y_i|\hat{v}_{0:i}) = p(y_i|\hat{v}_i). Using the prior p(\hat{v}_{0:n}), the posterior distribution p(\hat{v}_{0:n}|Y) can be expanded to

p(\hat{v}_{0:n}|Y) = p(\hat{v}_0|Y) \prod_{i=1}^{n} p(\hat{v}_i|\hat{v}_{i-1}, Y).    (1)

Applying Bayes theorem, we have

p(\hat{v}_i|\hat{v}_{i-1}, Y) = \frac{p(y_i|\hat{v}_i)\, p(\hat{v}_i|\hat{v}_{i-1})}{p(y_i)},    (2)

where p(y_i) is a constant regularity factor, i.e. p(y_i) = \sum_{\hat{v}_i} p(y_i|\hat{v}_i)\, p(\hat{v}_i|\hat{v}_{i-1}). Most previous probabilistic methods [2,1] estimate the posterior p(\hat{v}_{0:n}|Y) by sampling streamline paths from p(y_i|\hat{v}_i). The sampling is difficult and time consuming. Moreover, it often does not take into account the smoothness constraint for fibers. In contrast, Friman et al. [3] estimate the posterior by sampling from p(\hat{v}_i|\hat{v}_{i-1}, Y). To avoid sampling difficulties, they discretise the problem using a finite set of directions. In addition to introducing errors, this discretised sampling is still very time consuming. Moreover, some simple sampling methods may degenerate as the path becomes longer [4].
2.2 Recursive Posterior Using Particle Filtering

We wish to estimate the posterior distribution iteratively in time. By inserting Equation (2) into Equation (1), we have

p(\hat{v}_{0:n}|Y) = p(\hat{v}_0|Y) \prod_{i=1}^{n} p(\hat{v}_i|\hat{v}_{i-1}) \prod_{i=1}^{n} \frac{p(y_i|\hat{v}_i)}{p(y_i)},    (3)
where p(\hat{v}_0|Y) is predefined. The modeling of the transition probability p(\hat{v}_i|\hat{v}_{i-1}) and the distribution p(y_i|\hat{v}_i) will be detailed in the next section. We recast the problem of tracking the expected fiber path as that of approximating the MAP path from the posterior distribution. It is straightforward to obtain the following recursive formula for the posterior from Equation (3):

p(\hat{v}_{0:i+1}|Y) = p(\hat{v}_{0:i}|Y) \frac{p(\hat{v}_{i+1}|\hat{v}_i)\, p(y_{i+1}|\hat{v}_{i+1})}{p(y_{i+1})}.    (4)
Since the denominator contains a complex high-dimensional integral, it is not feasible to locate the maximum likelihood path analytically. Like the methods discussed above, we evaluate the posterior using a large number of samples which efficiently characterise it. Statistical quantities, such as the mean, variance and maximum likelihood, can then be approximated from the sample set. Since it is seldom possible to obtain samples from the posterior directly, we use particle filtering to recursively compute a finite set of sample paths from the posterior based on Equation (4).

To sample a set of K paths, we set K particles at the starting location and allow them to propagate as time progresses. Given the states of the set of particles \{\hat{v}_{0:i}^{(k)}, k = 1, ..., K\} at time i, the process of sequentially propagating the particles to the next time step i + 1 can be described in three stages, referred to as prediction, weighting and selection. Let \pi(\hat{v}_{0:i}|Y) be a so-called importance function whose support includes that of the posterior p(\hat{v}_{0:i}|Y). For our sequential importance sampling, suppose that we choose an importance function of the form [4] \pi(\hat{v}_{0:n}|Y) = \pi(\hat{v}_0|Y) \prod_{i=1}^{n} \pi(\hat{v}_i|\hat{v}_{i-1}, Y). In the first prediction stage, each simulated path \hat{v}_{0:i}^{(k)} with index k is grown by one step to \hat{v}_{0:i+1}^{(k)} by sampling from the importance function \pi(\hat{v}_{i+1}^{(k)}|\hat{v}_i^{(k)}, Y). The new set of paths is generally not an efficient approximation of the posterior distribution at time i + 1. Thus, in the second weighting stage, we measure the reliability of the approximation using a ratio between the truth and the approximation, referred to as the importance weight: w_{i+1}^{(k)} = p(\hat{v}_{0:i+1}^{(k)}|Y) / (\pi(\hat{v}_{0:i}^{(k)}|Y)\, \pi(\hat{v}_{i+1}^{(k)}|\hat{v}_i^{(k)}, Y)). We are more interested in the normalised weights \tilde{w}_{i+1}^{(k)} = w_{i+1}^{(k)} / \sum_{l=1}^{K} w_{i+1}^{(l)}. Inserting Equation (4) and the expression for w_{i+1}^{(k)} into the expression for \tilde{w}_{i+1}^{(k)} gives

\tilde{w}_{i+1}^{(k)} \propto \tilde{w}_i^{(k)} \frac{p(\hat{v}_{i+1}^{(k)}|\hat{v}_i^{(k)})\, p(y_{i+1}|\hat{v}_{i+1}^{(k)})}{\pi(\hat{v}_{i+1}^{(k)}|\hat{v}_i^{(k)}, Y)}.    (5)

The choice of importance function will be detailed in the next section. At this point the resulting weighted set of paths provides an approximation of the target posterior. However, the distribution of the weights \tilde{w}_{i+1}^{(k)} may become more and more skewed as time increases. The purpose of the last selection stage is to avoid this degeneracy. We measure the degeneracy of the algorithm using the effective sample size N_{eff} [4], i.e. N_{eff} = 1 / \sum_{k=1}^{K} (\tilde{w}_{i+1}^{(k)})^2. When N_{eff} is below a fixed threshold N_s, a resampling procedure is used. The key idea here is to eliminate the paths or particles with low weights \tilde{w}_{i+1}^{(k)} and to multiply offspring particles with high weights. We obtain the surviving particles by resampling K times from the discrete approximating distribution according to the importance weight set \{\tilde{w}_{i+1}^{(k)}, k = 1, .., K\}.
Both fiber reconstruction and the connectivity map can be easily obtained from the discrete distribution of the posterior. The MAP estimate of the true fiber is the path with the maximal importance weight. The probability of connectivity between x_0 and a specific voxel is computed as the fraction of particles that pass through that voxel.
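A possible read-out of these two quantities from the final particle set is sketched below. This is an illustration, not the authors' interface; the path/weight containers and the world_to_voxel helper are assumptions.

import numpy as np

def map_path_and_connectivity(paths, weights, shape, world_to_voxel):
    """Return the MAP fiber and the connectivity map from the final particle set.
    paths: list of (n_k, 3) arrays of particle positions (world coordinates);
    weights: final importance weights; world_to_voxel: callable mapping points
    to integer in-volume voxel indices (e.g. the inverse of the image affine)."""
    map_path = paths[int(np.argmax(weights))]        # maximal-weight path
    visits = np.zeros(shape)
    for p in paths:
        vox = np.unique(world_to_voxel(p), axis=0)   # count each particle once per voxel
        visits[vox[:, 0], vox[:, 1], vox[:, 2]] += 1
    return map_path, visits / len(paths)             # fraction of particles per voxel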
3 Algorithm Ingredients

In this section, we give the details of the local ingredients of the global tracking model.

3.1 Observation Density

Let \lambda_1 \geq \lambda_2 \geq \lambda_3 \geq 0 be the decreasing eigenvalues of D and \hat{e}_1, \hat{e}_2, \hat{e}_3 be the corresponding eigenvectors. We can classify the diffusion tensors into prolate tensors (\lambda_1 > (\lambda_2 \approx \lambda_3)) and oblate tensors ((\lambda_1 \approx \lambda_2) > \lambda_3) by using fractional anisotropy (FA) [7] and the metric proposed by Westin et al. [8], i.e. c_l = (\lambda_1 - \lambda_2) / \sqrt{\lambda_1^2 + \lambda_2^2 + \lambda_3^2}.

In the case of prolate tensors, we assume that a single dominant diffusion direction, \hat{e}_1, is collinear with the true underlying fiber orientation \hat{v}. Borrowing ideas from [5], we suppose the prolate tensor is axially symmetric. Let \lambda_\perp be the diffusivity in directions perpendicular to \hat{v}. Then, the diffusion along a gradient direction \hat{g}_j can be written as [5] \hat{g}_j^T D \hat{g}_j = \lambda_\perp + 3(\hat{v} \cdot \hat{g}_j)^2 (\bar{\lambda} - \lambda_\perp), where \bar{\lambda} = tr(D)/3. By inserting this expression into the Stejskal-Tanner equation [7], we have s_j = s_0 e^{-b_j (\lambda_\perp + 3(\hat{v} \cdot \hat{g}_j)^2 (\bar{\lambda} - \lambda_\perp))}, where b_j is the scanner parameter and s_0 is the intensity of the baseline image. Due to noise, the intensity u_j measured by the scanner is a noisy observation of the true intensity s_j. In [9], Salvador et al. showed \epsilon_j = \log(u_j) - \log(s_j) \sim N(0, \varsigma_j^{-2}), where \varsigma_j = s_j / \sigma_j is the signal-to-noise ratio (SNR). Let the intensities observed at i be y_i = \{u_0, u_1, ..., u_M\}. Then the observation density for prolate tensors is given as

p(y_i|\hat{v}_i) = \prod_{j=1}^{M} \frac{\varsigma_j}{\sqrt{2\pi}} \exp\left(-\frac{\varsigma_j^2 (\log u_j - \log s_j)^2}{2}\right).    (6)
Panels (a) and (b) of Fig. 1 show two examples. The figure shows that the orientation distribution is very concentrated when FA and c_l are relatively large.

Fig. 1. Examples of observation density. (a) a prolate tensor, FA = 0.9299, c_l = 0.9193; (b) a prolate tensor, FA = 0.3737, c_l = 0.3297; (c) an oblate tensor, FA = 0.7115, c_l = 0.2157.

In the case of oblate tensors, the dominant direction of diffusion is ambiguous and Equation (6) is inappropriate. It is possible that the plane defined by \hat{e}_1 and \hat{e}_2 contains
several crossing fiber tracts. In this case, we represent the fiber orientation \hat{v} in spherical coordinates. Let \theta be the polar angle from the \hat{e}_3-axis, i.e. \theta = \arccos(\hat{v} \cdot \hat{e}_3), and \psi be the azimuth angle. The vector \hat{v} is mainly distributed on the plane spanned by \hat{e}_1 and \hat{e}_2. Hence, we choose the distribution of \theta to be normal with mean \pi/2 and standard deviation \sigma. The azimuth \psi is assumed to have a uniform distribution over [0, 2\pi]. Thus, our fiber orientation distribution for oblate tensors is given by

p(y_i|\hat{v}) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(\arccos(\hat{v} \cdot \hat{e}_3) - \pi/2)^2}{2\sigma^2}\right) \cdot \frac{1}{2\pi}.    (7)

Panel (c) of Fig. 1 shows an example of the density of an oblate tensor in white matter.
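The two observation densities might be evaluated as follows. This is a sketch under the stated model, not the authors' code; the argument layout and the per-gradient SNR estimates are assumptions.

import numpy as np

def obs_density_prolate(v, u, s0, b, G, D, snr):
    """Eq. (6) sketch: likelihood of measured intensities u (M,) given fiber
    direction v, gradient directions G (M, 3), b-values b (M,) and SNRs snr (M,)."""
    lam = np.linalg.eigvalsh(D)             # ascending eigenvalues of the tensor
    lam_bar = lam.mean()                    # mean diffusivity tr(D)/3
    lam_perp = 0.5 * (lam[0] + lam[1])      # perpendicular diffusivity (axial symmetry)
    adc = lam_perp + 3.0 * (G @ v) ** 2 * (lam_bar - lam_perp)
    s = s0 * np.exp(-b * adc)               # predicted Stejskal-Tanner signals s_j
    resid = np.log(u) - np.log(s)
    return float(np.prod(snr / np.sqrt(2.0 * np.pi)
                         * np.exp(-0.5 * snr ** 2 * resid ** 2)))

def obs_density_oblate(v, e3, sigma):
    """Eq. (7) sketch: normal in the polar angle from e3, uniform in azimuth."""
    theta = np.arccos(np.clip(np.dot(v, e3), -1.0, 1.0))
    return (np.exp(-0.5 * ((theta - np.pi / 2.0) / sigma) ** 2)
            / (sigma * np.sqrt(2.0 * np.pi)) / (2.0 * np.pi))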
3.2 Prior Density

The transition density p(\hat{v}_{i+1}|\hat{v}_i) specifies a prior distribution for the change in fiber direction between two successive steps. Here, we adopt a model of the prior density based on the von Mises-Fisher (vMF) distribution [10] over a unit sphere. For a d-dimensional unit random vector x, the vMF distribution is given by

f_d(x; \mu, \kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2} I_{d/2-1}(\kappa)} \exp(\kappa \mu^T x),    (8)

where \kappa \geq 0, \|\mu\| = 1, and I_{d/2-1}(\cdot) denotes the modified Bessel function of the first kind and order d/2 - 1.
The density f_d(x; \mu, \kappa) is parameterised by the mean direction vector \mu and the concentration parameter \kappa. The greater the value of \kappa, the higher the concentration of the distribution around the mean direction \mu. The distribution is rotationally symmetric around the mean \mu, and is unimodal for \kappa > 0. In our case, the directions are defined on a two-dimensional unit sphere in R^3, i.e. d = 3. Thus, we choose our prior density as the vMF distribution with mean \hat{v}_i and concentration parameter \kappa, i.e.

p(\hat{v}_{i+1}|\hat{v}_i) = f_3(\hat{v}_{i+1}; \hat{v}_i, \kappa).    (9)

The value of the concentration parameter \kappa here controls the smoothness regularity of the tracked paths. It is set manually to optimally balance the prior constraints on smoothness against the evidence for v_{i+1} observed from the image data.

3.3 Importance Density Function

As discussed in Doucet et al. [4], the optimal importance density is p(\hat{v}_{i+1}|\hat{v}_i, Y). However, it is difficult to sample from. Thus, our aim is to devise a suboptimal importance function that best represents p(y_{i+1}|\hat{v}_{i+1})\, p(\hat{v}_{i+1}|\hat{v}_i) subject to the constraint that it can be sampled from. A popular choice is to use the prior distribution as the importance function, i.e.

\pi(\hat{v}_{i+1}|\hat{v}_i, Y) = f_3(\hat{v}_{i+1}; \hat{v}_i, \kappa).    (10)

The vMF distribution in Equation (9) can be efficiently sampled using the simulation algorithm developed by Wood [6]. However, the prior importance function is
not very efficient. Since no observation information is used, the generated particles are often outliers of the posterior distribution. Indeed, if the diffusion tensor at \hat{v}_i is prolate, then the movement to the state v_{i+1} is mainly attributable to the fiber orientation distribution, which is difficult to sample from. To overcome this problem, we model the observation density in Equation (6) using the vMF distribution. Since we use an axially symmetric tensor model, the distribution in Equation (6) is also rotationally symmetric around the direction of largest probability (see Fig. 1). We thus use the leading eigenvector, \hat{e}_1^i, of the tensor D_i as the mean direction. We have found experimentally that \hat{e}_1^i is almost identical to the direction of maximum probability in Equation (6): the average difference between them was less than 2° in a test on 1000 prolate tensors from brain MRI data. The concentration parameter \nu_i at each state \hat{v}_i is set to \nu_i = 90 \times c_l(D_i). This choice is based on empirical trial and error. A better way would be to fit the vMF distribution to Equation (6) using the algorithm presented in [11]; however, that would need more computation time. Moreover, particle filtering requires an importance density that is close but not necessarily identical to the observation density. Therefore, for prolate tensors we set the importance density as

\pi(\hat{v}_{i+1}|\hat{v}_i, Y) = f_3(\hat{v}_{i+1}; \hat{e}_1^i, \nu_i).    (11)
For oblate tensors, since the observation density in Equation (7) is wide, we can still use the prior as the importance density, as given in Equation (10).
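The paper samples the vMF distribution with Wood's rejection algorithm [6]. As a substitute illustration for the d = 3 case used here, the component along the mean direction can be drawn by closed-form CDF inversion, which avoids rejection altogether. A sketch, not the authors' implementation:

import numpy as np

def sample_vmf3(mu, kappa, size=1, rng=None):
    """Draw unit vectors on S^2 from f_3(x; mu, kappa): inverse-CDF sampling of
    w = cos(theta) (exact for d = 3), then rotation of the pole onto mu."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    mu = mu / np.linalg.norm(mu)
    u = rng.random(size)
    w = 1.0 + np.log(u + (1.0 - u) * np.exp(-2.0 * kappa)) / kappa  # density exp(kappa*w)
    phi = 2.0 * np.pi * rng.random(size)                            # uniform azimuth
    s = np.sqrt(np.clip(1.0 - w ** 2, 0.0, None))
    x = np.stack([s * np.cos(phi), s * np.sin(phi), w], axis=-1)    # around +e_z
    v = np.array([0.0, 0.0, 1.0]) + mu                              # reflection axis
    if v @ v < 1e-12:                       # mu is the south pole: just flip
        return -x
    H = 2.0 * np.outer(v, v) / (v @ v) - np.eye(3)                  # maps e_z to mu
    return x @ H.T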
3.4 Algorithm Outline

Given K particles at step i, \hat{v}_{0:i}^{(k)}, k = 1, ..., K, the iteration steps are summarised as:

– Compute the diffusion tensor D_i^{(k)} for each particle k.
– Prediction: for k = 1, ..., K
  • if D_i^{(k)} is a prolate tensor, sample \hat{v}_{i+1}^{*(k)} using Equation (11)
  • if D_i^{(k)} is an oblate tensor, sample \hat{v}_{i+1}^{*(k)} using Equation (10)
– Weighting: for k = 1, ..., K
  • if prolate, compute \tilde{w}_{i+1}^{(k)} from Equation (5) using Equations (6), (9) and (11)
  • if oblate, compute \tilde{w}_{i+1}^{(k)} from Equation (5) using Equations (7), (9) and (10)
– Selection: normalise all the weights and evaluate N_{eff}
  • if N_{eff} \geq N_s, then for k = 1, ..., K, set \hat{v}_{i+1}^{(k)} = \hat{v}_{i+1}^{*(k)}
  • if N_{eff} < N_s, then for k = 1, ..., K, sample an index z(k) from the discrete distribution \{\tilde{w}_{i+1}^{(k)}\}_{k=1,..,K}, and set \hat{v}_{i+1}^{(k)} = \hat{v}_{i+1}^{*(z(k))}, \tilde{w}_{i+1}^{(k)} = 1/K
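One iteration of this outline could look as follows. This is a sketch only: the three probability callables are assumed to wrap Equations (6)/(7), (9) and (10)/(11) for the tensor at each particle, and multinomial resampling stands in for the unspecified resampling scheme.

import numpy as np

def pf_step(dirs, weights, propose, prior_pdf, obs_pdf, imp_pdf, n_s, rng=None):
    """One prediction/weighting/selection iteration for K particle directions.
    dirs: (K, 3) current directions; weights: (K,) normalised weights."""
    rng = np.random.default_rng() if rng is None else rng
    K = len(dirs)
    new_dirs = np.array([propose(v) for v in dirs])               # prediction
    for k in range(K):                                            # weighting, Eq. (5)
        weights[k] *= (prior_pdf(new_dirs[k], dirs[k]) * obs_pdf(new_dirs[k])
                       / imp_pdf(new_dirs[k], dirs[k]))
    weights /= weights.sum()
    if 1.0 / np.sum(weights ** 2) < n_s:                          # N_eff below threshold
        idx = rng.choice(K, size=K, p=weights)                    # selection stage
        new_dirs, weights = new_dirs[idx], np.full(K, 1.0 / K)
    return new_dirs, weights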
4 Experimental Results

We have tested the algorithm on a brain dataset of size 128 × 128 × 58 with 2 × 2 × 2 mm resolution. A six-direction gradient scheme was used with b = 1000 s/mm². Since our particles propagate in a continuous domain, we chose the trilinear method in [12] for interpolation. A step length of 1 mm and 5000 particles were used for all examples. The propagation of a particle is stopped when it exits white matter (FA < 0.2). We distinguish prolate tensors (c_l > \tau) and oblate tensors (c_l \leq \tau) using a threshold \tau = 0.27.
Fig. 2(a) shows the trajectories of 1000 particles seeded from a point in the Corpus callosum. The figure shows that the sampled paths provide a robust delineation of the expected fiber bundle. Fig. 2(b) gives another example with two seed points in the Cingulum bundles. This example reveals how our probabilistic algorithm is able to handle splitting fibers and ambiguous neighborhoods. Fig. 2(c) shows the global optimal MAP paths for the examples in Fig. 2(a) and Fig. 2(b). We also compared our result with that of Friman's method using the same seed points in the Cingulum bundles, as shown in Fig. 2(d). In our method, particles with low probability of existence are eliminated during the resampling stage, and the sampled paths are concentrated around the final optimal fiber. In contrast, the sampled paths of Friman's method are more dispersed, with a number of paths having low probabilities. Thus, our method samples more representative paths surrounding the optimal candidate. Moreover, our algorithm runs
Fig. 2. (a): 1000 particle traces from a seed point in Corpus callosum. (b): from two seed points in the left and right Cingulum bundles. (c): Optimal MAP paths of (a) and (b). (d): 1000 path samples using Friman’s method [3] from the same seed points as in (b). (e): Zoomed particle traces of two seed points from the MAP path of example (a). (f): Optimal MAP paths of (e).
much faster than Friman's algorithm (more than 30 times faster in our MATLAB implementation). To further evaluate the algorithm, we set two seed points on the MAP path of the example in Fig. 2(a) and let the algorithm track from them toward each other. Fig. 2(e) shows 1000 sample paths from each seed point. The figure shows that the sampled paths from the two seed points almost completely overlap. Fig. 2(f) gives their two optimal MAP paths, which are very close to each other. Thus, the second seed point can successfully track back to the first one along the MAP path. This example shows that the performance of our algorithm is robust and stable. On the other hand, based on the particle traces, we can calculate the probability of connection between the seed voxel and a specific voxel by computing the fraction of particles passing through that voxel. We can thus produce a probability map of connections between the seed and all other voxels. In Fig. 3(a), we show the probability map computed from
Fig. 3. Probability maps of our algorithm from (a) a seed point in the Corpus callosum and (b) two seed points in the Cingulum bundles. (c) Probability map of Friman's method from the same seed points as in (b).
a seed point in the Corpus callosum. The coloring shows the belief of our algorithm that a fiber initiated at the seed voxel reaches the respective voxel. Fig. 3(b) gives a probability map of longer fiber tracts seeded from the Cingulum bundles. The result is compared to that of Friman's method, shown in Fig. 3(c), which gives a wider distribution.
5 Conclusion We have presented a new method for probabilistic white matter fiber tracking. The global tracking model is formulated using a state space framework, which is implemented by applying particle filtering to recursively estimate the posterior distribution of fibers and to locate the global optimal fiber path. Each ingredient of the tracking algorithm is detailed. Fiber orientation distribution is formulated in a theoretical way for both prolate and oblate tensors. Fast and efficient sampling is realised using the vMF distribution. As a consequence, there is no need to apply MCMC sampling [1] or to discretise the fiber orientation distribution [3] for sampling paths. Unlike previous methods [1,3] which are computationally expensive, our method is able to rapidly locate the global optimal fiber and compute the connectivity map for a given seed point.
References 1. Behrens, T., Woolrich, M., Jenkinson, M., Johansen-Berg, H., Nunes, R., Clare, S., Matthews, P., Brady, J., Smith, S.: Characterization and Propagation of Uncertainty in Diffusion-Weighted MR Imaging. Magn. Reson. Med. 50, 1077–1088 (2003) 2. Bjornemo, M., Brun, A., Kikinis, R., Westin, C.: Regularized Stochastic White Matter Tractography Using Diffusion Tensor MRI. In: Dohi, T., Kikinis, R. (eds.) MICCAI 2002. LNCS, vol. 2488, pp. 435–442. Springer, Heidelberg (2002) 3. Friman, O., Farneback, G., Westin, C.: A Bayesian Approach for Stochastic White Matter Tractography. IEEE Trans. on Med. Imag. 25(8), 965–978 (2006) 4. Doucet, A., de Freitas, N., Gordon, N. (eds.): Sequential Monte Carlo Methods in Practice. Springer, Heidelberg (2001) 5. Anderson, A.: Measurement of Fiber Orientation Distributions Using High Angular Resolution Diffusion Imaging. Magn. Reson. Med. 54, 1194–1206 (2005)
6. Wood, A.T.A.: Simulation of the von Mises-Fisher distribution. Communications in Statistics. Simulation and Computation 23, 157–164 (1994)
7. Basser, P., Mattiello, J., LeBihan, D.: MR diffusion tensor spectroscopy and imaging. Biophysical Journal 66, 259–267 (1994)
8. Westin, C., Maier, S., Mamata, H., Nabavi, A., Jolesz, F., Kikinis, R.: Processing and visualization for diffusion tensor MRI. Medical Image Analysis 6, 93–108 (2002)
9. Salvador, R., Pena, A., Menon, D., Carpenter, T., Pickard, J., Bullmore, E.: Formal Characterization and Extension of the Linearized Diffusion Tensor Model. HBM 24, 144–155 (2005)
10. McGraw, T., Vemuri, B., Yezierski, R., Mareci, T.: Segmentation of High Angular Resolution Diffusion MRI Modeled as a Field of von Mises-Fisher Mixtures. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3953, pp. 463–475. Springer, Heidelberg (2006)
11. Hill, G.: Algorithm 571: Statistics for von Mises' and Fisher's Distributions of Directions. ACM Trans. on Math. Software 7, 233–238 (1981)
12. Zhukov, L., Barr, A.: Oriented Tensor Reconstruction: Tracing Neural Pathways from Diffusion Tensor MRI. In: Proc. IEEE Visualization, pp. 387–394. IEEE Computer Society Press, Los Alamitos (2002)
SMT: Split and Merge Tractography for DT-MRI

Uğur Bozkaya and Burak Acar

Boğaziçi University, Electrical & Electronics Eng. Dept., VAVlab, İstanbul, Turkey
[email protected],
[email protected] www.vavlab.ee.boun.edu.tr
Abstract. Diffusion tensor magnetic resonance imaging (DT-MRI) based fiber tractography aims at reconstructing the fiber network of the brain. The most commonly employed techniques for fiber tractography are based on the numerical integration of the principal diffusion directions. Although these approaches generate intuitive and easy-to-interpret results, they are prone to cumulative errors and mostly discard the stochastic nature of DT-MRI data. The proposed Split & Merge Tractography (SMT) technique aims at overcoming the drawbacks of fiber tractography by combining it with Markov Chain Monte Carlo techniques. SMT is based on clustering diversely distributed short fiber tracts according to their inter-connectivity. SMT also provides real-time interaction to adjust a user-defined confidence level for the clustering.
1 Introduction
Diffusion Tensor Magnetic Resonance Imaging (DT-MRI) is the unique modality that allows in-vivo imaging of the nervous network in the brain. The data is a symmetric, positive semi-definite second-order tensor field that is a second-order approximation of the local physical diffusion process. The principal eigenvectors of the tensors have been shown to be aligned with the underlying fibers in regions of anisotropic diffusion. However, the DT-MRI data is derived from a set of Diffusion Weighted Magnetic Resonance Imaging (DWI) data. DWI data is acquired using diffusion-weighting gradient magnetic fields, G, in addition to the constant field, B_0. G causes the MR signal to attenuate due to the diffusion along this magnetic field gradient [1, 2, 3].

It is of utmost importance to understand what the DT-MRI data represents in order to develop adequate analysis and visualization methods. DT-MRI represents a macro view of the diffusion process within a finite voxel volume as observed by DWI. Neither the accuracy of the second-order approximation nor its spatial resolution is adequate to represent individual fibers. A novel approach to increase the accuracy of this approximation by using higher-order tensors was proposed by Liu et al. [4]. The so-called Generalized Diffusion Tensor Imaging (GDTI) has not been put into practice and will not be discussed here.

The two major approaches to DT-MRI analysis and visualization are fiber tractography and connectivity mapping. The former approach solely
relies on the numerical integration of the principal diffusion direction (PDD, the major eigenvector of the diffusion tensor) and attempts to reconstruct the fiber that passes through a given point [5]. The most popular method is 4th-order Runge-Kutta integration [6]. These approaches are prone to cumulative errors and most of them overlook the stochastic nature of the underlying data [7, 8]. The latter approach attempts to utilize the true nature of the DT-MRI data, i.e. the second-order approximation of the physical diffusion process, by estimating a connectivity map. It considers each and every possible connection between neighbouring voxels, with weights set by the dataset. Several approaches in this group are based on Monte-Carlo simulations of the random walk model [9, 10, 11]. Lenglet et al., on the other hand, recast the connectivity problem in the Riemannian differential geometry framework, where they defined their local metric tensor using the DTI data and solved for geodesics [12].

The most important point differentiating these two approaches is their response at problematic regions such as crossing, kissing and branching fibers. The tractography methods either pretend to follow a single fiber by choosing a direction to proceed at such points, or stop tracking. These methods do not provide a user interface to set a confidence level despite the nature of the data. The connectivity mapping methods, on the other hand, let the results be interpreted with respect to some definition of confidence. They also allow for branching. Although single fibers do not branch, fiber bundles do, and this makes branching while tracking a necessity, given the low spatial resolution of DT-MRI data. Thus, connectivity mapping is a more direct way of communicating the stochastic and structural information embedded in the data than conventional fiber tractography. Yet the computational cost of connectivity mapping is high and its interpretation is not straightforward.

The Split & Merge Tractography (SMT) method proposes a compromise between tractography, which greatly disregards the stochastic nature of the data and accumulates error, and connectivity mapping, which is computationally costly and hard to interpret. SMT is based on clustering short fiber tracts using a Markov Chain Monte Carlo (MCMC) approach. Using short tracts prevents error accumulation, while the MCMC provides a stochastic framework in which we can define a confidence level for the clusters, allowing the user to investigate the data in detail.
2 Method
Split & Merge Tractography (SMT) [13, 14] is an MCMC technique used to estimate the unknown distribution of fiber tracts. However, unlike previously proposed methods that exploit the stochastic nature of DT-MRI data for fiber tractography [15, 16], the output of SMT is not full tracts but rather clusters of short tracts. The underlying rationale is to avoid error accumulation. The short tracts are computed by numerical integration of the PDD field using the 4th-order Runge-Kutta method [6]. SMT avoids such error accumulation
Fig. 1. A bridge is built from the current short tract S_i (initially a seed tract) to S_j, which is selected based on the Gaussian PDF described by D_i^{(1)}
in PDD tracking by using short tracts. PDD tracking is started from each voxel, unless that voxel is on a previously computed short tract. The maximum length of the short tracts is set to 2.8 mm; tracking is terminated when the fractional anisotropy is below 0.25 or the curvature exceeds 20° per step. This is the splitting step, where the whole brain is populated by short tracts. The merging step consists of estimating a co-occurrence matrix, M, for this abundant set of short tracts. A single element of M, namely M_{ij}, represents the probability of having the short tracts S_i and S_j in the same cluster. The MCMC techniques come into play at this stage of SMT. Let S_i be a short tract. Let \Gamma_i be a cluster of short tracts that includes S_i, i.e. a set of short tracts that are on the same fiber. Then, SMT aims at estimating

M_{ji} = P(S_j \in \Gamma_i \,|\, S_i \in \Gamma_i), \quad M_{ij} = M_{ji}, \quad i, j = 1, \cdots, N    (1)

where N is the total number of short tracts that populate the complete brain. Let S_i^{(k)}, i = 1, \cdots, N, k = 1, 2, represent the k-th endpoint of S_i, without any specific ordering of endpoints. For a short tract S_i, a bridge is built between S_i^{(1)} and the S_j^{(k)} with the highest probability of being connected to S_i^{(1)}. If we denote the position of S_i^{(k)} with r_i^{(k)} and the diffusion tensor at that position with D_i^{(k)}, then the probability of bridging r_i^{(1)} and r_j^{(k)} can be approximated by

c_{i \to j} = P(r_i^{(1)}, r_j^{(k)}) = \left(\frac{1}{4\pi\tilde{D}}\right)^{3/2} \exp\left(-\frac{\|r_i^{(1)} - r_j^{(k)}\|^2}{4\pi\tilde{D}}\right)    (2)

\tilde{D} = \frac{(r_i^{(1)} - r_j^{(k)})^T D_i^{(1)} (r_i^{(1)} - r_j^{(k)})}{\|r_i^{(1)} - r_j^{(k)}\|^2}

This is the Gaussian distribution represented by D_i^{(1)}. Without loss of generality, let all bridges originate from the first endpoint (k = 1) and terminate in the second endpoint (k = 2). We repeat the whole process starting from r_j^{(1)}, until no bridge with probability above an arbitrarily small threshold \epsilon can be built. The whole process is repeated starting from the second endpoint of S_i, namely S_i^{(2)} (backward clustering). Finally, we get the initial cluster for a given S_i. Figure 1 depicts bridging from S_i to S_j.
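For illustration, Equation (2) is a one-liner once the endpoint positions and the tensor at the bridge origin are available. This is a sketch; the argument names are assumptions.

import numpy as np

def bridge_probability(r_from, r_to, D_from):
    """Eq. (2): Gaussian bridging probability between two short-tract endpoints;
    D_from is the diffusion tensor at the bridge origin r_from."""
    d = np.asarray(r_to, float) - np.asarray(r_from, float)
    gap2 = d @ d
    d_tilde = (d @ D_from @ d) / gap2        # diffusivity projected along the gap
    return (1.0 / (4.0 * np.pi * d_tilde)) ** 1.5 \
        * np.exp(-gap2 / (4.0 * np.pi * d_tilde))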
This initial cluster is a sample from the distribution of all clusters that include S_i. Let us denote this cluster by Γ_i^0. We then increment M_ij and M_ji by one for all j such that S_j ∈ Γ_i^0. The whole process is iterated K times, generating {Γ_i^0, ..., Γ_i^(K−1)}. Consecutive iterations are performed by breaking the weakest bridge, building a new one at that location, and completing the rest of the clustering as explained above. Our goal is to estimate the distribution of such clusters, or equivalently, to approximate the probability distribution function (PDF) of the connectivity of S_i to all other short tracts. Connectivity between S_i and S_j is proportional to the probability of the existence of a cluster that includes them both. An approximation to this PDF is the histogram represented by M_ij, j = {1, ..., N}. We used the Metropolis-Hastings algorithm (MHA) to populate this histogram [17]. The principal components of the MHA are i) a sampling strategy, ii) a sample fitness function, f(·), and iii) a candidate generating density, q(·,·), which is the probability of generating a new sample from a given sample. Let Γ_i^(m) denote the m-th sample selected from all clusters that include S_i. The corresponding SMT components are as follows:
1. Sampling Strategy: Given a cluster of short tracts, Γ_i^(m), the weakest bridge is identified. The strength of a bridge between r_p^(1) and r_q^(2) is represented by the Fractional Anisotropy (FA) of D_p^(1), because the reliability of PDD tracking decreases with decreasing FA. Let us denote the FA at D_p^(1) by F_p^(1). Upon removing the weakest bridge, the section of Γ_i^(m) that includes S_i is retained. A new bridge between r_p^(1) and one of its neighbours is built at random, and a new cluster is formed beyond the new bridge. Let the new bridge be built between r_p^(1) and r_w^(2).
2. Sample (Short Tract Cluster) Fitness: The fitness of a sample Γ_i^(m), i.e., f(Γ_i^(m)), is chosen to be the minimum of the strengths of its bridges, because a cluster's reliability is dominated by its weakest bridge.
3. Candidate Generating Density: The probability of generating a new candidate cluster from a given one is formulated as the product of the probability of removing the weakest bridge and that of building a new one. It is given as

$$q(\Gamma_i^{(m)}, \Gamma_i^{(m+1)}) = \underbrace{\frac{1/F_p^{(1)}}{\sum_{j \in A} 1/F_j^{(1)}}}_{\text{Prob. of removing a bridge}} \times \underbrace{\frac{c_{p \to w}}{\sum_{z \in B} c_{p \to z}}}_{\text{Prob. of building a bridge}} \qquad (3)$$

where F_p^(1) is the fitness of the removed bridge, c_{p→w} is the probability of the newly built bridge originating from r_p^(1), A is the set of short tract indices that belong to Γ_i^(m), and B is the set of short tract indices in the neighbourhood of r_p^(1).
For a given seed tract S_i, the MHA is iterated. The newly generated sample at each iteration is accepted with a probability given as

$$\alpha(\Gamma_i^{(m)}, \Gamma_i^{(m+1)}) = \min\left(1, \frac{f(\Gamma_i^{(m+1)})\, q(\Gamma_i^{(m+1)}, \Gamma_i^{(m)})}{f(\Gamma_i^{(m)})\, q(\Gamma_i^{(m)}, \Gamma_i^{(m+1)})}\right) \qquad (4)$$

If Γ_i^(m+1) is accepted, then we increment M_in, M_ni by one for all S_n ∈ Γ_i^(m+1); otherwise, we increment M_in, M_ni by one for all S_n ∈ Γ_i^(m). The number of iterations, K, is empirically determined to be 100. The whole process is repeated to build M by taking each short tract as the seed tract. The co-occurrence matrix M is computed and saved off-line; it represents the whole brain connectivity. The user is required to select a volume of interest to mark a set of seed tracts and a confidence threshold, τ. For each seed tract S_i in the volume of interest, all S_j's with M_ij ≥ τ × K are selected and displayed. The interface is similar to the dynamic queries interface proposed in [18].
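Schematically, one merging iteration then looks as follows. The sketch below is our own illustration: `propose`, `fitness` and `q` stand for the sampling strategy, fitness function and candidate generating density defined above, and are placeholders, not functions from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_iteration(cluster, M, i, fitness, q, propose):
    """One Metropolis-Hastings step (Eq. 4) for seed tract index i,
    updating the co-occurrence counts M in place."""
    candidate = propose(cluster)          # break weakest bridge, rebuild at random
    ratio = (fitness(candidate) * q(candidate, cluster)) / \
            (fitness(cluster) * q(cluster, candidate))
    kept = candidate if rng.random() < min(1.0, ratio) else cluster
    for n in kept:                        # increment M_in and M_ni by one
        M[i, n] += 1
        M[n, i] += 1
    return kept

# After K = 100 iterations per seed tract, the short tracts S_j with
# M[i, j] >= tau * K are the ones selected and displayed.
```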
3 Results
We used real patient DT-MRI data for the initial validation of the SMT method. The scans were single-shot EPI scans with diffusion encoding along 12 non-collinear directions plus one reference without diffusion weighting. The FOV was 25-26 cm and the TE was the minimum with partial k-space acquisition; the TR was ∼10 s and the b-value was ∼850 s/mm². Seed tracts were selected with a spherical volume of interest (VOI) on the left side of the corpus callosum / optic radiation of a healthy individual, as marked with circles in Figures 2a and 2b. Seed volumes are identical for both images. The confidence threshold, τ, for Figures 2a and 2b is 0.0 and 0.1, respectively. Note the decrease in the number of short tracts with increasing confidence. A second set of seed tracts was selected with a spherical VOI at the inferior part of the cortico-spinal tracts of the same healthy individual, as shown in Figures 2c and 2d. The VOI covers both the left and the right sides. The cortico-spinal tracts are known to spread as they extend to the superior regions. The confidence threshold, τ, for Figures 2c and 2d is 0.0 and 0.3, respectively. In addition to the effect of τ, we can also observe the branching that SMT allows. The final set of seed tracts was selected with a spherical VOI in the inferior longitudinal fasciculus region, close to the uncinate fasciculus, as marked with circles in Figures 2e and 2f. The confidence threshold, τ, for Figures 2e and 2f is 0.0 and 0.6, respectively. Computation of the short tracts (13540 short tracts for the current dataset) throughout the brain and of the co-occurrence matrix M takes approximately 3 hours on a PC with a Pentium 4 (2.4 GHz) and 1.5 GB RAM. This computation is performed once for each dataset in batch mode and M is saved. Visualization and analysis of the data based on the computed M is a real-time application.
Fig. 2. Three sets of seed tracts, one for each row, in different regions of the brain of a healthy human are selected and the corresponding short tract clusters with different confidence levels (low for the left images, high for the right images) are displayed
4 Discussion
The SMT method combines the intuitive interpretation of conventional fiber tractography with the stochastic approach of connectivity analysis using a Markov Chain Monte Carlo (MCMC) framework. It is based on estimating the PDF of the cluster of short tracts connected to a given seed tract using the Metropolis-Hastings algorithm [17]. The advantages of using short tracts are the intuitive user interface that they provide and the negligible accumulation of tracking error. SMT displays all short tracts connected to a given seed tract with a probability higher than the user-set confidence threshold. The interpolation of the complete tracts is left to the human visual system, and the efficiency of this interpolation increases with the density of short tracts. This approach provides a direct way to present the information content of DT-MRI data by explicitly displaying the possibilities at problematic regions, such as kissing and crossing fibers. The MCMC framework, on the other hand, exploits the stochastic nature of DT-MRI data. The data is based on the second order approximation of the total diffusion of water molecules within a finite subvolume (the voxel) in a given direction (as determined by the diffusion weighting gradient fields), observed through an attenuation of the received MR signal. In other words, the computed diffusion tensors represent the probabilistic spatial distribution of diffusing molecules in a given voxel within a given time period. Consequently, it is more accurate to consider this distribution, as done in connectivity analysis, than to consider the principal diffusion direction only, as mostly done in conventional fiber tractography. The Metropolis-Hastings algorithm utilizes this information in estimating the clusters of short tracts; it thus allows for branching, merging and crossing pathways. Although the computational cost of computing the co-occurrence matrix (M) is high, the computation is performed once for each dataset, in batch mode, independent of any VOI. A single M matrix describes the connectivity throughout the brain. We have used 100 iterations of the Metropolis-Hastings algorithm. Increasing the number of iterations would increase the accuracy, yet we have not observed significant differences in the results when the number of iterations is increased beyond 100. The examination of DT-MRI data is based on dynamic queries that define VOIs and runs in real time [18]. The SMT method proposes a framework based on clustering short fiber tracts with Markov Chain Monte Carlo techniques, specifically with the Metropolis-Hastings algorithm. We have presented the underlying model and preliminary results. Neither the proposed sampling strategy, nor the sample fitness function, nor the candidate generating density is claimed to be the optimal choice; different tractography methods can be developed within the SMT framework simply by using different models for these components. Research on variations of SMT, its performance with high b-value data, and a thorough clinical evaluation is left as future work.
Acknowledgments

This work was supported in part by the TÜBİTAK KARIYER-DRESS (104E035) project, the Boğaziçi University B.A.P. DTIsuite (07A203) project, and the EU 6th FP SIMILAR NoE. We thank Prof. Roland Bammer of Stanford University for his comments and for providing the DT-MRI data.
References

1. Basser, P., Mattiello, J., LeBihan, D.: Estimation of the effective self-diffusion tensor from the NMR spin echo. J. Magn. Reson. B 103, 247–254 (1994)
2. Basser, P.: Inferring microstructural features and the physiological state of tissues from diffusion-weighted images. NMR Biomed. 8, 333–344 (1995)
3. Basser, P., Pierpaoli, C.: Microstructural and physiological features of tissues elucidated by quantitative diffusion tensor MRI. J. Magn. Reson. B 111, 209–219 (1996)
4. Liu, C., Bammer, R., Acar, B., Moseley, M.: Characterizing non-Gaussian diffusion by using generalized diffusion tensors. Magnetic Resonance in Medicine 51, 924–937 (2004)
5. Basser, P., Pajevic, S., Pierpaoli, C., Duda, J., Aldroubi, A.: In vivo fiber tractography using DT-MRI data. Magn. Reson. Med. 44, 625–632 (2000)
6. Tench, C., Morgan, P., Wilson, M., Blumhardt, L.: White matter mapping using diffusion tensor MRI. Magnetic Resonance in Medicine 47, 967–972 (2002)
7. Bammer, R., Acar, B., Moseley, M.: In vivo MR tractography using diffusion imaging. European J. Radiology 45, 223–234 (2003)
8. Lazar, M., Alexander, A.: An error analysis of white matter tractography methods: synthetic diffusion tensor field simulations. Neuroimage 20, 1140–1153 (2003)
9. Koch, M., Norris, D., Hund-Georgiadis, M.: An investigation of functional and anatomical connectivity using magnetic resonance imaging. Neuroimage 16, 241–250 (2002)
10. Hagmann, P., Thiran, J., Vandergheynst, P., Clarke, S., Meuli, R.: Statistical fiber tracking on DT-MRI data as a potential tool for morphological brain studies. In: ISMRM Workshop on Diffusion MRI: Biophysical Issues (2000)
11. Chung, M., Lazar, M., Alexander, A., Lu, Y., Davidson, R.: Probabilistic connectivity measure in diffusion tensor imaging via anisotropic kernel smoothing. Technical Report 1081, University of Wisconsin (2003)
12. Lenglet, C., Deriche, R., Faugeras, O.: Diffusion tensor magnetic resonance imaging: brain connectivity mapping. Technical Report 4983, INRIA, Sophia-Antipolis, France (2003)
13. Bozkaya, U.: SMT: Split/merge fiber tractography for MR-DTI. Master's thesis, Bogazici University, Biomedical Engineering Institute, Istanbul, Turkey (2006)
14. Bozkaya, U., Acar, B.: SMT: Split/merge fiber tractography for MR-DTI. In: ESMRMB 2006, Warsaw, Poland (2006)
15. Hagmann, P., Thiran, J., Jonasson, L., Vandergheynst, P., Clarke, S., Maeder, P., Meuli, R.: DTI mapping of human brain connectivity: statistical fibre tracking and virtual dissection. Neuroimage 19, 545–554 (2003)
16. Lazar, M., Alexander, A.: Bootstrap white matter tractography (BOOT-TRAC). Neuroimage 24, 524–532 (2005)
17. Chib, S., Greenberg, E.: Understanding the Metropolis-Hastings algorithm. The American Statistician 49, 327–335 (1995)
18. Sherbondy, A., Akers, D., Mackenzie, R., Dougherty, R., Wandell, B.: Exploring connectivity of the brain's white matter with dynamic queries. IEEE Trans. Vis. Comput. Graph. 11, 419–430 (2005)
Tract-Based Morphometry

Lauren J. O'Donnell¹,², Carl-Fredrik Westin², and Alexandra J. Golby¹

¹ Golby Surgical Brain Mapping Laboratory, Department of Neurosurgery
² Lab for Mathematics in Imaging, Department of Radiology
Brigham and Women's Hospital, Harvard Medical School, Boston MA, USA
[email protected]

Abstract. Multisubject statistical analyses of diffusion tensor images in regions of specific white matter tracts have commonly measured only the mean value of a scalar invariant such as the fractional anisotropy (FA), ignoring the spatial variation of FA along the length of fiber tracts. We propose to instead perform tract-based morphometry (TBM), or the statistical analysis of diffusion MRI data in an anatomical tract-based coordinate system. We present a method for automatic generation of white matter tract arc length parameterizations, based on learning a fiber bundle model from tractography from multiple subjects. Our tract-based coordinate system enables TBM for the detection of white matter differences in groups of subjects. We present example TBM results from a study of interhemispheric differences in FA.
1 Introduction
Diffusion tensor magnetic resonance imaging (DTI) provides directional measurements of water diffusion in the brain. DTI tractography [1] follows directions of maximal water diffusion to estimate the trajectories of fiber tracts (major neural connections in the white matter). Clinical and neuroscientific questions regarding white matter pathways may potentially be addressed by analyzing DTI data in regions of specific white matter tracts. Many analyses of tracts [2,3] have calculated the mean (in the entire tract) of a scalar such as the fractional anisotropy (FA). However, due to anatomical factors such as crossing fibers or nearness to cerebrospinal fluid or grey matter, as well as tissue microstructural factors such as packing densities and axon diameters [4], FA varies spatially along tract trajectories. Thus analysis of its mean value may not be optimal for localization of group differences. The spatial patterns of FA along fiber tracts have not yet been studied in detail due to the difficulty of tractography segmentation and of finding pointwise correspondences along the lengths of fibers from multiple subjects. In this paper we propose a general method for the calculation of multisubject fiber arc length coordinate systems, enabling tract-based morphometry (TBM), the group statistical analysis of tensors or scalar invariants along the length of fiber tracts. Our method first analyzes tractography from multiple subjects, producing a bundle model that gives a prototype fiber. Next, by finding robust pointwise correspondences of all other fibers to the prototype, common arc length coordinates are produced for all subjects. Finally, statistical analysis of the diffusion MRI data (TBM) is performed in the tract-based coordinate system. We illustrate the method with TBM results from an experiment comparing right- and left-hemisphere FA in normal right-handed subjects, showing that significant differences are found in the cingulum bundle and arcuate fasciculus. By mapping these results onto the respective group mean fibers, we demonstrate the anatomical locations of the interhemispheric differences.

This work has been supported by NIH grants U54EB005149, R01MH074794, P41RR13218, and U41RR019703. Thank you to Susumu Mori at JHU for the diffusion MRI data (RO1 AG20012-01 / P41 RR15241-01A1). Thanks to Lilla Zollei for the congealing registration code.
2 Related Work
Other groups have described methods for tract-based analysis of DTI. One approach to fiber tract parameterization by arc length required human interaction to define a corresponding point on all fibers [5]. Another approach, demonstrated on the corona radiata and cingulum of one subject, estimated three level sets to produce a fiber coordinate system [6]. A principled statistical bundle model with arc length coordinates was created in [7]. In [8], a manually segmented arc angle coordinate system specific to the cingulum bundle was applied to a study of interhemispheric FA differences in multiple subjects. A similar method was applied to the pyramidal tract [9]. An approach called tract-based spatial statistics performed voxel-based morphometry of FA data after aligning local high FA values to a group FA skeleton [10]. However, by working in a voxel coordinate system, such a method must ignore any tract-derived information. To our knowledge, no existing method for tract-based DTI analysis has both automatically applied a coordinate system to fibers from multiple subjects and demonstrated statistical analysis in this coordinate system.
3 Method: Tract-Based Morphometry
We propose a method for tract-based morphometry, the statistical analysis of diffusion data in a white matter tract coordinate system. We automatically determine a group fiber arc length coordinate system by learning a fiber bundle model using tractography from multiple subjects. The five steps in our method are: fiber bundle definition, prototype fiber calculation, arc length parameterization, measurement of descriptive statistics, and statistical analysis in the group.

3.1 Fiber Bundle Definition
As a preprocessing step, fiber bundle(s) of interest must be defined in all subjects, either automatically [7,11] or interactively [12]. Here we have used congealing normalization [13] and automatic group fiber clustering [11].
3.2 Bundle Modeling and Prototype Fiber Calculation
A distinction between our approach and a straightforward 3D approach such as finding a bundle skeleton or centerline is that our method finds a prototype fiber that is the most representative of the bundle's trajectory across all subjects, according to a fiber affinity metric. For each bundle of interest, a fiber bundle model is generated by spectrally embedding all fibers using the affinity metric [11]. Once embedded, each fiber is represented as a point. The embedding of all fibers produces a point cloud whose mean would correspond to a fiber that is most representative of the fiber bundle's trajectory across all subjects. Because we have a high fiber sample size (the total number of fibers in the bundle from all subjects together), we can estimate the mean using the fiber closest to the mean. We refer to this fiber as the prototype fiber.

3.3 Arc Length Parameterization by Matching to Prototype Fiber
We propose a simple pointwise matching method (Fig. 1) to produce arc length coordinates for all fibers across all subjects. The basic idea is that the prototype fiber defines the arc length coordinates for the bundle, and these coordinates are propagated to all other fibers by matching each point on the prototype fiber to the closest point(s) on the other fibers.
Fig. 1. Arc length coordinates are produced for each fiber “x” in the bundle via matching to the prototype fiber. First, the matched region (a) is found by matching each point on the prototype with the closest point on fiber x. Dashed lines indicate matches. Non-matched regions at either end of fiber x (gray x’s) are excluded from further analysis. Next, points in the matched region that were skipped are matched to the closest point on the prototype. An example prototype fiber from the uncinate fasciculus is shown in (b) followed by one subject’s matching result (c), with nonmatched areas in gray.
The following features of the matching method make it robust to fiber trajectory variation across subjects. First, the method allows partial matching if a fiber is shorter than the prototype fiber. It also rejects any outlier fibers whose matched arc length coordinates are not strictly nonincreasing or nondecreasing. Finally, the TBM analysis is limited to the region that corresponds across subjects: any arc length coordinates that have not matched to fibers from all subjects are excluded.
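A minimal version of this matching can be written with a KD-tree. The sketch below assumes fibers are (n_points, 3) arrays of ordered points and uses a non-strict monotonicity test; it is our own simplified reconstruction, not the authors' implementation:

```python
import numpy as np
from scipy.spatial import cKDTree

def match_to_prototype(prototype, fiber):
    """Assign an arc length coordinate (prototype point index) to each
    point of `fiber` in the matched region, as in Fig. 1."""
    # Step 1: match each prototype point to its closest point on the fiber.
    # The span of matched fiber indices defines the matched region; the
    # fiber's ends outside this span are excluded from further analysis.
    on_fiber = cKDTree(fiber).query(prototype)[1]
    lo, hi = on_fiber.min(), on_fiber.max()
    # Step 2: points in the matched region that were skipped are matched
    # back to their closest point on the prototype.
    coords = cKDTree(prototype).query(fiber[lo:hi + 1])[1]
    # Reject outlier fibers whose coordinates are not monotonic.
    d = np.diff(coords)
    if not (np.all(d >= 0) or np.all(d <= 0)):
        return None
    return np.arange(lo, hi + 1), coords  # fiber indices, arc coordinates
```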
3.4 Measurement of Descriptive Statistics
For each subject, the statistic of interest (mean FA) is measured in the fiber arc length coordinate system.¹ First, for each fiber x, a vector of FA values vs. arc length is calculated using the matching result from the previous section: for each arc length coordinate, the point(s) matching to it on fiber x contribute the average of their FA values. Next, for each subject's fibers, the mean and standard deviation of the FA are calculated for each arc length coordinate.² Using the above method we also compute the mean (x,y,z) coordinate for each arc length, giving a per-subject mean fiber.

¹ We discuss measurement of FA, though the measurement would work identically for any other scalar invariant or other local scalar/tensor information.
² All subjects' tensor orientation information is taken into account by using the fiber coordinate system; thus, to study FA along fibers we average FA, not tensors.
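As an illustration, the per-fiber FA profile and the per-subject statistics can be computed as below. The array layout is an assumption of ours (one arc coordinate per prototype point, NaN where a coordinate is unmatched):

```python
import numpy as np

def fa_profile(arc_coords, fa_values, n_arc):
    """FA vs. arc length for one fiber: points matched to the same arc
    length coordinate contribute the average of their FA values."""
    profile = np.full(n_arc, np.nan)
    for a in np.unique(arc_coords):
        profile[a] = fa_values[arc_coords == a].mean()
    return profile

# Per-subject descriptive statistics over all fibers of one bundle:
# profiles = np.vstack([fa_profile(c, f, n_arc) for (c, f) in subject_fibers])
# subject_mean = np.nanmean(profiles, axis=0)   # mean FA per arc length
# subject_std = np.nanstd(profiles, axis=0)     # std of FA per arc length
```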
3.5 Statistical Analysis in the Group
We now have a per-subject vector of mean FA values versus arc length for the bundle(s) of interest. For each arc length coordinate, statistics appropriate to the study may be calculated, and multiple comparisons can be corrected for via permutation testing, as described for fMRI data [14]. By averaging the per-subject mean fibers we produce a group mean fiber for visualization of results.
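A sketch of the group analysis with max-statistic permutation correction [14] follows; the array shapes and function names are our own assumptions:

```python
import numpy as np
from scipy import stats

def tbm_group_test(left, right, n_perm=10000, seed=0):
    """Paired t-test per arc length coordinate, corrected for multiple
    comparisons by permuting left/right labels within each subject.

    left, right : (n_subjects, n_arc) per-subject mean FA profiles.
    """
    rng = np.random.default_rng(seed)
    t_obs = stats.ttest_rel(left, right, axis=0).statistic
    max_null = np.empty(n_perm)
    for p in range(n_perm):
        flip = rng.random(left.shape[0]) < 0.5          # exchange labels
        l = np.where(flip[:, None], right, left)
        r = np.where(flip[:, None], left, right)
        max_null[p] = np.abs(stats.ttest_rel(l, r, axis=0).statistic).max()
    # Corrected p-value: fraction of permutations whose maximal statistic
    # exceeds the observed statistic at each coordinate.
    pvals = np.mean(max_null[None, :] >= np.abs(t_obs)[:, None], axis=1)
    return t_obs, pvals
```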
4 Methods for Example TBM Study
We applied TBM to analyze FA differences in the right and left hemispheres in the cingulum bundle and the arcuate fasciculus.

Data. DTI data (EPI, 30 directions + 5 B0 images, 2.5 mm isotropic voxels) from 35 normal right-handed subjects was seeded with tractography (where c_L ≥ 0.2, on a 1.5 mm grid). FA data was affinely group registered using congealing [13] (no shear terms) and the registration was applied to the fibers.

Fiber Bundle Definition. Tractography clusters, corresponding across hemispheres and across subjects, were automatically identified by group spectral clustering [11]. Bilateral clusters in the arcuate fasciculus (AF) and cingulum regions (CB) were selected to give bundles for further analysis. 10 subjects (AF) and 3 subjects (CB) were excluded due to missing or very short fibers. The total numbers of fibers per bundle were: 1523 (AF L), 1702 (AF R), 4308 (CB L), and 2679 (CB R). The means and standard deviations of the numbers of fibers per subject were: 61 ± 36 (AF L), 68 ± 51 (AF R), 135 ± 47 (CB L), and 84 ± 36 (CB R).

Bundle Modeling and Prototype Fiber Calculation. For each bilateral fiber bundle, one prototype fiber was calculated using the embedding from the clustering step.

Arc Length Parameterization by Matching to Prototype Fiber. Each fiber was matched to the prototype fiber or to its reflection across the midsagittal plane, depending on whether the fiber was from the right or the left hemisphere. This gave a consistent arc length parameterization for all fibers, to allow comparisons across subjects and across hemispheres.

Measurement of Descriptive Statistics. The mean FA for each arc length was measured separately for bundles in the right and left hemisphere of each subject. In addition, the mean FA was measured in the entire bundles.

Statistical Analysis in the Group. First, significant interhemispheric FA differences in whole bundles were investigated with a two-tailed paired t-test. Then, in the TBM analysis, a two-tailed paired t-test was employed at each arc length coordinate to investigate local differences in right and left hemisphere FA. To correct for multiple comparisons, permutation testing [14] was performed by exchanging the labels for right and left within each subject, and 10000 permutations were used to estimate the distribution of the maximal test statistic. Using this distribution, p-values were calculated for each arc length coordinate and overlaid on the group mean fiber for anatomical localization of group differences.
5 Results
We present example results from each step in the method. Input fiber data, fiber embedding, and the resulting prototype fibers from the bundle modeling step are pictured (Fig. 2). Next, arc length parameterization results are shown from the matching to prototype fiber step (Fig. 3). Then, from the measurement of descriptive statistics step, example FA measurements from all fibers for a single subject are plotted (Fig. 3). Means and standard deviations of FA measurements in entire bundles were: 0.3992 ± 0.0190 (AF L), 0.3830 ± 0.0252 (AF R), 0.4231 ± 0.0297 (CB L), and 0.4016 ± 0.0269 (CB R). In entire bundles, mean left arcuate FA was greater than right (p = 0.0197) and mean left cingulum FA was greater than right (p < .0001). TBM results from statistical analysis in the group are presented in Figure 4, showing locations of significant interhemispheric differences in the cingulum bundle and arcuate fasciculus.

Fig. 2. Group fiber bundle modeling in the arcuate fasciculus (N = 25 subjects, top) and cingulum (N = 32 subjects, bottom). Input fiber bundles (a) with fibers colored by subject. Fiber embedding according to trajectory similarity (b): each fiber is represented as a point and colored according to subject. The embedding produces a prototype fiber for the bundle model (c). In (a) and (b), for visual clarity a random subset of the data is shown. In (b), the original 10D embedding was reduced to 2D for display using multidimensional scaling.
Fig. 3. Example arc length parameterizations (a-c) for arcuate fasciculus (top) and cingulum bundle (bottom). Unmatched variable regions (gray) are excluded from analysis. Example FA measurements from one subject versus arc length (d). Each curve gives the measurements from one fiber.
6 Discussion
The success of the method is demonstrated by the results in the cingulum bundle region which reproduce published findings (laterality differences measured using a cingulum-specific arc angle coordinate system [8]). As in [8], differences are localized to the anterior cingulum. The results in the region of the arcuate fasciculus are (to our knowledge) the first demonstration of FA measurements in a tract-based coordinate system in this region. An interesting feature of the mean FA result (Fig. 4 (b)) in the arcuate is that there is significantly lower mean FA in the right hemisphere where the fibers curve from the anteroposterior orientation to the superoinferior orientation. We hypothesize that fiber tractography termination (“broken trajectories”) in this region may be the cause of the lower number of tractography fibers often reported in the arcuate fasciculus in the right hemisphere.
(Fig. 4 plots: arcuate fasciculus, N = 25 subjects, top; cingulum bundle, N = 32 subjects, bottom. Axes: mean FA and std. error vs. arc length; panels (a) left/right mean FA, (b) mean FA on mean fiber, (c) p-value on mean fiber.)
Fig. 4. Left and right hemisphere FA measurements and significant differences for the arcuate fasciculus (top) and cingulum bundle (bottom). For each arc length coordinate, each subject’s mean FA value was computed for each hemisphere. The (group) mean and standard error of these per-subject means are shown vs. arc length in mm (a). The left and right group mean FA (b) and p-value for interhemispheric FA difference (c) are overlaid on the group mean fiber. In all plots, anterior is to the left.
There are some potential areas of improvement of the method that are under investigation. First, it is possible that the prototype fiber is a short fiber, and it is of interest to select a longer (and non-outlier) fiber in order to analyze as much of the bundle's length as possible. Second, the appropriate resolution or scale at which to perform statistical comparisons should be determined. Third, a per-hemisphere prototype (followed by an alignment of the two hemispheres' fibers) might better represent the shape of some structures. Fourth, a non-rigid registration method could improve the pointwise correspondence quality. Finally, a feature of the presented method is that it excludes the ends of the tracts that are not present in some subjects. However, to locate possibly clinically relevant changes in those regions, analysis would be needed at arc length coordinates where fibers from all subjects are not present.
7 Conclusion
We have presented a new method for tract-based morphometry that learns a bundle model from multiple subjects, generates a prototype fiber and an arc length parameterization for all fibers, and enables statistical analysis along fiber bundles of diffusion data from multiple subjects.
References

1. Basser, P., Pajevic, S., Pierpaoli, C., Duda, J., Aldroubi, A.: In vivo fiber tractography using DT-MRI data. Magnetic Resonance in Medicine 44, 625–632 (2000)
2. Heiervang, E., Behrens, T., Mackay, C., Robson, M., Johansen-Berg, H.: Between session reproducibility and between subject variability of diffusion MR and tractography measures. NeuroImage 33(3), 867–877 (2006)
3. Jones, D.K., Catani, M., Pierpaoli, C., Reeves, S.J., Shergill, S.S., O'Sullivan, M., Golesworthy, P., McGuire, P., Horsfield, M.A., Simmons, A., Williams, S.C., Howard, R.J.: Age effects on diffusion tensor magnetic resonance imaging tractography measures of frontal cortex connections in schizophrenia. Human Brain Mapping 27, 230–238 (2006)
4. Pierpaoli, C., Jezzard, P., Basser, P.J., Barnett, A., Chiro, G.D.: Diffusion tensor MR imaging of the human brain. Radiology 201, 637 (1996)
5. Corouge, I., Fletcher, P., Joshi, S., Gouttard, S., Gerig, G.: Fiber tract-oriented statistics for quantitative diffusion tensor MRI analysis. Medical Image Analysis 10(5), 786–798 (2006)
6. Niethammer, M., Bouix, S., Westin, C.F., Shenton, M.E.: Fiber bundle estimation and parameterization. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 252–259. Springer, Heidelberg (2006)
7. Maddah, M., Wells, W.M., Warfield, S.K., Westin, C.F., Grimson, W.: Probabilistic clustering and quantitative analysis of white matter fiber tracts. In: Int'l Conf. Information Processing in Medical Imaging (2007)
8. Gong, G., Jiang, T., Zhu, C., Zang, Y., Wang, F., Xie, S., Xiao, J., Guo, X.: Asymmetry analysis of cingulum based on scale-invariant parameterization by diffusion tensor imaging. Human Brain Mapping 24(2), 92–98 (2005)
9. Lin, F., Yu, C., Jiang, T., Li, K., Li, X., Qin, W., Sun, H., Chan, P.: Quantitative analysis along the pyramidal tract by length-normalized parameterization based on diffusion tensor tractography: application to patients with relapsing neuromyelitis optica. NeuroImage 33(1), 154–160 (2006)
10. Smith, S., Jenkinson, M., Johansen-Berg, H., Rueckert, D., Nichols, T., Mackay, C., Watkins, K., Ciccarelli, O., Cader, M., Matthews, P., Behrens, T.: Tract-based spatial statistics: voxelwise analysis of multi-subject diffusion data. NeuroImage 31, 1487–1505 (2006)
11. O'Donnell, L., Westin, C.F.: High-dimensional white matter atlas generation and group analysis. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, Springer, Heidelberg (2006)
12. Mori, S., Wakana, S., Nagae-Poetscher, L.M., van Zijl, P.C.M.: MRI Atlas of Human White Matter. Elsevier, Amsterdam (2005)
13. Zollei, L., Learned-Miller, E., Grimson, W.E.L., Wells III, W.M.: Efficient population registration of 3D data. In: ICCV 2005, Computer Vision for Biomedical Image Applications (2005)
14. Nichols, T., Holmes, A.: Nonparametric permutation tests for functional neuroimaging: a primer with examples. Human Brain Mapping 15, 1–25 (2002)
Towards Whole Brain Segmentation by a Hybrid Model

Zhuowen Tu and Arthur W. Toga

Lab of Neuro Imaging, School of Medicine, University of California, Los Angeles, USA
Abstract. Segmenting cortical and sub-cortical structures from 3D brain images is of significant practical importance. However, various anatomical structures have similar intensity patterns in MRI, making their automatic segmentation a challenging task. In this paper, we present a new brain segmentation algorithm using a hybrid model. (1) A multi-class classifier, PBT.M2, is proposed for learning/computing multi-class discriminative models. PBT.M2 handles multi-class patterns more easily than the original probabilistic boosting tree (PBT) [11], and it facilitates the process toward, eventually, whole brain segmentation. (2) We use a learned edge field to constrain the region boundaries. We show the improvements due to the two new aspects both numerically and visually, and also compare the results with those by FreeSurfer [2]. Our algorithm is general and easy to use, and the results obtained are encouraging.
1 Introduction
Segmenting cortical and sub-cortical structures from 3D brain images is a very important task. There has been considerable recent work on 3D segmentation in medical imaging, of which we cite several representative examples due to space limits [13,2,9]. The problem is usually tackled in a MAP (maximum a posteriori) framework in which appearance models are defined and shape priors are used to capture the underlying shape regularity. Existing approaches can be roughly categorized into two groups: (1) those which put strong effort into the shape priors [13,9]; and (2) those which classify pixels/voxels using various features [6,8]. In [1], a hybrid model for brain anatomical structure segmentation was presented. The system [1] adopted a PBT [11] multi-class classifier to select and combine hundreds of cues such as intensity, gradients, curvatures, and locations to model ambiguous appearance patterns locally. However, the PBT multi-class classifier performs 2-way splits only, and it is not efficient to use in performing whole brain segmentation, in which there are many cortical and sub-cortical structures. In this paper, we present a new brain segmentation algorithm using a hybrid model. (1) A multi-class classifier, PBT.M2, is proposed for learning/computing multi-class discriminative models. PBT.M2 handles multi-class patterns more easily than PBT [11], and it facilitates the process toward, eventually, whole brain segmentation. (2) We use a learned edge field to constrain the structure boundaries. The edges are also learned and computed by fusing many local photometric and geometric features across different scales.
2 Hybrid Discriminative/Generative Model
We give the basic problem formulation in this section. Brain imaging mostly deals with 3D images, which are referred to as volumes for the rest of the paper. For an input volume, V, the task of brain segmentation is to obtain the full partition into each anatomical structure of interest. A solution W can be denoted as W = {R_k, k = 0, ..., T}, where R_0 is the background region and R_k, k = 1, ..., T, denote the anatomical structures. We have $\bigcup_{k=0}^{T} R_k = \Lambda$, where Λ defines the 3D lattice of the input V, and $R_i \cap R_j = \emptyset$, ∀i ≠ j. Let the optimal solution W* be the one which minimizes the energy

$$E(W, V) = E_{AP}(W, V) + \alpha_1 E_{edg}(W, V) + \alpha_2 E_{PCA}(W) + \alpha_3 E_{SM}(W). \qquad (1)$$

The first term, E_AP(W, V), corresponds to the discriminative model p(y|V(N(s))) modeling the joint appearances:

$$E_{AP}(W, V) = -\sum_{k=0}^{T} \sum_{s \in R_k} \log p(y = k \mid V(N(s))), \qquad (2)$$

where N(s) includes all the voxels in the sub-volume and y ∈ {0, ..., T} is the label/class for each voxel. p(y = k|V(N(s))) essentially computes the classification probability of voxel s belonging to structure k. E_edg(W, V), on the other hand, focuses on the region boundaries:

$$E_{edg}(W, V) = -\sum_{k=0}^{T} \sum_{s \in \partial R_k} \log p(EG(s) = \text{on} \mid V(N(s))). \qquad (3)$$

∂R_k denotes the surface of R_k and p(EG(s) = on|V(N(s))) computes the probability of voxel s being on the boundary of a region. The discriminative model, E_AP(W, V), captures complex appearances as well as the local geometry by looking at a sub-volume; it also provides context information. If a very accurate E_AP(W, V) could be learned, then E_edg(W, V) would not be needed. However, due to the large intra-class variability, it is often hard to perfectly classify all the voxels, and E_edg(W, V) is more robust against global intensity pattern change than E_AP(W, V). E_PCA(W) and E_SM(W) represent the generative models of the shape prior p(R). E_PCA(W) is defined on the global shape model of each structure and E_SM(W) encourages the region boundaries to be smooth. α_1, α_2, and α_3 are weights balancing how much we rely on the discriminative model, the global shape regularity, and the local smoothness. These weights are learned automatically and the details can be found in [1].
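For instance, given a per-class probability volume from the discriminative model, the appearance term of Eq. (2) is a sum of negative log probabilities over each region. A minimal sketch follows; the array layout is our own assumption:

```python
import numpy as np

def appearance_energy(prob, labels):
    """E_AP of Eq. (2).

    prob   : (T+1, X, Y, Z) array, p(y = k | V(N(s))) for every voxel s.
    labels : (X, Y, Z) integer array, the current solution W.
    """
    eps = 1e-12  # guard against log(0)
    # Probability assigned by the model to each voxel's current label.
    p = np.take_along_axis(prob, labels[None], axis=0)[0]
    return -np.sum(np.log(p + eps))
```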
3 Learning Discriminative and Generative Models
This section gives more details about how the discriminative and generative models are learned and computed.

3.1 Learning Discriminative Models
To compute E_AP, our task is to learn and compute the discriminative model p(y = k|V(N(s))). Each input sample is a sub-volume and the output is the probability of the center voxel s belonging to region R_k, k = 0..T. This is not an easy task due to the complex appearance patterns of V(N(s)). [1] adopted a probabilistic boosting tree (PBT) approach to learn and compute a multi-class classifier. However, the original PBT performs 2-way splits only, which is not always efficient. For example, one has to compute at least two strong classifiers in order to classify 4 different classes. On the other hand, it is desirable to do a 2-way split when there are many available classes, since learning a single node classifier to perform, say, 25-class classification is both difficult and time-consuming. Let the training set be S = {(V_a, y_a), a = 1..n}, where V_a is an 11 × 11 × 11 volume sample, y_a ∈ {0..T} denotes its class label, and n is the total sample number. Let p(S, j) be the proportion of samples in S that belong to the j-th class. The entropy of set S can be defined as

$$Info(S) = -\sum_{j=0}^{T} p(S, j) \log_2 p(S, j).$$
Fig. 1. Classification results of a volume on left hippocampus. The first row shows three slices of part of a volume with the left hippocampus highlighted. The three figures in the second row display the soft map, p(y = 1|V(N (s))) (left hippocampus) at three typical slices by the original PBT algorithm, and the third row shows the result by the PBT.M2.
Clearly, the smaller the entropy is, the sparser the classes are in the set. We want to recursively construct a decision tree in which each tree node is either an AdaBoost [3] (2-class) strong classifier or an AdaBoost.MH [4] (m-class) strong classifier. A strong classifier H ∈ {H_2, H_m} (either 2-class or m-class) splits S into t sub-groups (S_1, ..., S_t). We choose the H which obtains the biggest information gain

$$G(S, H) = -\sum_{i=1}^{t} \frac{|S_i|}{|S|} Info(S_i) - cost(H), \qquad (4)$$

where the first term is similar to that in the well-known C4.5 algorithm [10] and cost(H) computes the total computational cost for strong classifier H. Therefore, the choice of H balances how well the current set is separated against the computational cost. Fig. 2 outlines the basic PBT.M2 algorithm.

Given: labeled training examples S = {(V_a, y_a), a = 1..n} with each y_a ∈ {0..T}
• Train a 2-class AdaBoost [3] classifier, H_2, and an m-class (m = T+1) AdaBoost.MH [4] classifier, H_m.
• Choose the strong classifier, H ∈ {H_2, H_m}, which maximizes the information gain G(S, H).
• Split the training set S using H and recursively train a sub-tree.
• Stop the tree node expansion if the error is smaller than a threshold.

Fig. 2. PBT.M2 algorithm
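The node-selection criterion of Eq. (4) is simple to compute given the label sets produced by each candidate split; the following sketch (our own, with placeholder costs) illustrates it:

```python
import numpy as np

def info(labels, n_classes):
    """Entropy Info(S) of a set of class labels."""
    p = np.bincount(labels, minlength=n_classes) / len(labels)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def gain(split_groups, n_total, n_classes, cost):
    """Information gain G(S, H) of Eq. (4) for one strong classifier H
    that splits S into `split_groups` (a list of label arrays)."""
    weighted = sum(len(g) / n_total * info(g, n_classes) for g in split_groups)
    return -weighted - cost

# The node keeps whichever of H_2 (AdaBoost) and H_m (AdaBoost.MH)
# yields the larger gain on the training samples reaching it.
```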
PBT.M2 learns and computes an overall multi-class discriminative probability, like PBT [11], by

$$p(y|V) = \sum_{l_1} \tilde{p}(y|l_1, V)\, q(l_1|V) = \sum_{l_1,...,l_n} \tilde{p}(y|l_n, ..., l_1) \cdots q(l_2|l_1, V)\, q(l_1|V), \qquad (5)$$

where l_i represents the i-th layer in the tree and q(l_i) computes the discriminative probability at each boosting node in the tree. Fig. 1a illustrates an abstract version of the PBT.M2. The key to the PBT.M2 classifier is that it is capable of hierarchically fusing a set of informative features automatically selected from a large pool of candidate features (5,000). These features carry both the intensity and the local geometric properties of each voxel in a sub-volume. They include the x, y, z coordinates of the position of each voxel of interest, intensity value, gradients, curvatures, and various 3D Haar features computed in the sub-volume. Fig. 1b shows the classification results by PBT and PBT.M2, and we can clearly see the improvement. In our experiments, some features selected by the PBT.M2 are: (1) coordinate x of the center voxel s; (2) a Haar filter of size 9 × 7 × 7; (3) the gradient at s.

Learning E_edg(W, V). To further constrain the region boundary, we use an explicit boundary term E_edg(W, V). Existing work using explicit edge terms [5] requires specifying parameters, e.g., scale, and is not adaptive. We use PBT to learn boundary voxels, and the learning/computing process is nearly the same as that for learning the voxel label classification. The only difference is that the training annotations are different. In Fig. 5 we show how the results are affected by using the learned edge field.

3.2 Learning Generative Shape Models
In this paper, a simple PCA model, similar to [13,1], is adopted based on the signed distance function of the shape. Training images are registered first, and we align all the anatomical structures according to their centers when learning the shape prior. For each manually labeled anatomical structure, e.g., the left hippocampus R_1, its corresponding signed distance map (SDM) S_1 is computed, in which the value of each voxel s represents its distance to the surface. We can learn a PCA model by S = Uβ + S̄, Q = UΣV^T, where S̄ is the mean of the SDMs of the training shapes and Q is a matrix with each row vector being a training sample S_i − S̄. The third energy in Eqn. (1) becomes

$$E_{PCA} = \sum_{k=1}^{T} \frac{1}{2} \beta_k^T \Sigma_k \beta_k + \alpha_4 \left\| U_k^T U_k (S_k - \bar{S}_k) \right\|^2. \qquad (6)$$

Another energy term, $E_{SM} = \sum_{k=0}^{T} \int_{\partial R_k} dA$, is added to encourage smooth surfaces; $\int_{\partial R_k} dA$ is the area of the surface of region R_k. When the total energy is minimized in a variational approach, this term corresponds to a force that encourages each boundary point to have small mean curvature, resulting in smooth surfaces.
4 Segmenting 3D Brain Volumes
The goal of the segmentation stage is to find the optimal segmentation/solution which minimizes the energy in Eqn. (1). In our problem, the number of anatomical structures and their approximate positions are known. Therefore, we can apply a PDE approach to perform energy minimization. We use steepest descent to minimize the energy E(W, V) in Eqn. (1). We can derive the motion equations for E_AP, E_edg, E_PCA, and E_SM similarly as in [1] (see the details in the supplementary document).

4.1 The Outline of the Algorithm
Our algorithm is summarized in this section.

Training: (1) For a set of training volumes with the anatomical structures manually delineated, we train a PBT.M2 to learn the discriminative model p(y|V(N(s))). (2) For a set of training volumes with annotated boundary voxels, we train a PBT to learn the discriminative model for the edges. (3) For a set of training shapes for each anatomical structure, we learn its PCA shape model as discussed in Sect. 3.2. (4) We learn α_1, α_2 and α_3 to combine the discriminative and generative models.

Testing: (1) Given an input volume V, we compute p(y|V(N(s))) for each voxel and assign it the label of highest probability, to obtain a classification map. (2) Based on the classification map, we obtain an initial segmentation in which all the anatomical structures are topologically connected. (3) We perform boundary evolution to minimize the total energy E shown in Eqn. (1). (4) We stop the algorithm after several iterations.
Fig. 3. Error measures on the testing volumes of the two datasets: (a) 8 structures on 15 testing volumes; (b) 25 structures on 25 testing volumes. The new algorithm obtains the best results, which illustrates the effectiveness of using PBT.M2 and the learned edge field (see text for more explanation). Fig. 5 visually shows the comparison on a testing volume for dataset 1. Fig. 4 shows the result on a volume in dataset 2. The 25 structures include: hippocampus, putamen, caudate, superior temporal, superior occipital, middle frontal, ...
5 Experiments
High-resolution 3D SPGR T1-weighted MR images were acquired on a GE Signa 1.5T scanner. All the volumes shown in this paper are registered by [12]. The first dataset consists of 28 volumes annotated by neuroanatomists for 8 sub-cortical structures; 14 volumes are randomly selected for training and the remaining 14 are used for testing. We apply the algorithm stated in Sect. 4.1 to segment the eight anatomical structures on both the training and testing volumes. The training and testing processes were repeated twice and we observed the same performance. For the second dataset, 40 volumes with 25 sub-cortical and cortical structures manually delineated, 15 volumes were used for training and 25 for testing. To quantitatively measure the effectiveness of our algorithm, errors are measured using several criteria, which all appear consistent. We report the precision and recall measures here. Let R be the set of voxels annotated by an expert and R̂ be the set of voxels segmented by the algorithm. The precision and recall are measured as

$$\mathrm{Precision} = \frac{|R \cap \hat{R}|}{|\hat{R}|}, \qquad \mathrm{Recall} = \frac{|R \cap \hat{R}|}{|R|},$$

and the average is (Precision + Recall)/2. Fig. 3 shows the errors measured on the two datasets. To test the effectiveness of the learned edge field and PBT.M2, we conducted several experiments: (1) the original hybrid model [1]; (2) the hybrid model in [1] + 3D Canny edges; (3) the hybrid model in [1] + edges learned by our algorithm; (4) the overall model reported in this paper. It is observed that the new algorithm performs the best, and the model using learned edges outperforms the one with 3D Canny edges [7]. To directly compare our algorithm to an existing state-of-the-art algorithm, we tested the MRI data using FreeSurfer [2], and our results are better. Fig. 5 visually shows the comparison in different trials. Fig. 4 shows a result on dataset 2 with manual delineations.

Fig. 4. Results on a typical testing volume with 25 cortical and sub-cortical structures delineated. The second row shows the result by the algorithm reported in this paper. We are in the process of getting more (50 ∼ 100) structures manually annotated for training our algorithm.
(Fig. 5 columns: horizontal, sagittal, coronal, and 3D view.)

Fig. 5. Results on a typical testing volume. Three planes are shown, overlaid with the boundaries of the segmented anatomical structures. The first row shows results manually labeled by an expert. The second row displays the result of [1]. The third row shows the result of the hybrid model [1] + 3D Canny. The fourth row shows the result of the hybrid model [1] + learned edges. The fifth row displays the result of the new algorithm. The last row shows the result of FreeSurfer [2].
6 Conclusions
In this paper, a system for brain anatomical structure segmentation is proposed. The algorithm is very general, and easy to train and test. It has nearly no parameters to tune (a couple of very general ones are specified, e.g., the number of weak classifiers for each boosting node). The system makes use of the training data annotated by experts and learns the rules implicitly from examples. We are in the process of getting more (50 ∼ 100) structures manually annotated for training our algorithm. Our goal is to allow the system to eventually perform full brain segmentation, and the algorithm reported in this paper facilitates this. Also, more thorough experiments comparing with other existing algorithms, e.g., [9], will be conducted.
Acknowledgment This work was funded by the NIH through the NIH Roadmap for Medical Research, Grant U54 RR021813 entitled Center for Computational Biology (CCB).
References

1. Tu, Z., Narr, C., Dollar, P., Thompson, P., Toga, A.: Brain anatomical structure parsing by hybrid discriminative/generative models. In review, IEEE Trans. on Medical Imaging
2. Fischl, B., Salat, D.H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., van der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S., Montillo, A., Makris, N., Rosen, B., Dale, A.M.: Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron 33, 341–355 (2002)
3. Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. of Comp. and Sys. Sci. 55(1) (1997)
4. Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. Dept. of Statistics, Stanford Univ. Technical Report (1998)
5. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. Int'l. J. Computer Vision 1(4), 321–332 (1988)
6. Lao, Z., Shen, D., Jawad, A., Karacali, B., Liu, D., Melhem, E., Bryan, N., Davatzikos, C.: Automated segmentation of white matter lesions in 3D brain MR images, using multivariate pattern classification. In: Proc. of 3rd IEEE Int'l Symp. on Biomedical Imaging, Arlington, VA, USA, April 6-9. IEEE Computer Society Press, Los Alamitos (2006)
7. Monga, O., Deriche, R., Malandain, G., Cocquerez, J.P.: Recursive filtering and edge closing: two primary tools for 3D edge detection. Image and Vision Computing 9(4), 203–214 (1991)
8. Rohlfing, T., Russakoff, D.B., Maurer Jr., C.R.: Performance-based classifier combination in atlas-based image segmentation using expectation-maximization parameter estimation. IEEE Trans. on Medical Imaging 23(8) (2004)
9. Pohl, K.M., Fisher, J., Kikinis, R., Grimson, W.E.L., Wells, W.M.: A Bayesian model for joint segmentation and registration. NeuroImage 31, 228–239 (2006)
10. Quinlan, J.R.: Improved use of continuous attributes in C4.5. J. of Art. Intell. Res. 4, 77–90 (1996)
11. Tu, Z.: Probabilistic boosting tree: learning discriminative models for classification, recognition, and clustering. In: Proc. of ICCV (2005)
12. Woods, R.P., Mazziotta, J.C., Cherry, S.R.: MRI-PET registration with automated algorithm. Journal of Computer Assisted Tomography 17, 536–546 (1993)
13. Yang, J., Staib, L.H., Duncan, J.S.: Neighbor-constrained segmentation with level set based 3D deformable models. IEEE Trans. on Medical Imaging 23(8) (2004)
A Family of Principal Component Analyses for Dealing with Outliers

J. Eugenio Iglesias¹, Marleen de Bruijne¹,², Marco Loog¹,², François Lauze², and Mads Nielsen¹,²

¹ Department of Computer Science, University of Copenhagen, Denmark
² Nordic Bioscience A/S, Herlev, Denmark
Abstract. Principal Component Analysis (PCA) has been widely used for dimensionality reduction in shape and appearance modeling. There have been several attempts at making PCA robust against outliers. However, there are cases in which a small subset of samples may appear as outliers and still correspond to plausible data. The example of shapes corresponding to fractures when building a vertebra shape model is addressed in this study. In this case, the modeling of "outliers" is important, and it may be desirable not only not to disregard them, but even to enhance their importance. A variation on PCA that deals naturally with the importance of outliers is presented in this paper. The technique is utilized for building a shape model of a vertebra, aiming at segmenting the spine out of lateral X-ray images. The results show that the algorithm can implement both an outlier-enhancing and a robust PCA. The former improves the segmentation performance for fractured vertebrae, while the latter does so for the unfractured ones.
1 Introduction

Principal Component Analysis (PCA) is a technique that simplifies data sets by reducing their dimensionality. It is an orthogonal linear transformation that spans a subspace which approximates the data optimally in a least-squares sense (Jolliffe 1986). This is accomplished by maximizing the variance of the transformed coordinates. If the dimensionality of the data is to be reduced to N, an equivalent formulation of PCA is to find the set of N orthonormal vectors, grouped in the P matrix, which minimizes the error made when reconstructing the original data points in the data set. The error is measured in an L2 norm fashion:

$$C = \sum_{i=1}^{N} \left\| PP^t x_i - x_i \right\|^2 \qquad (1)$$

where C is the cost, N is the number of training cases, and x_i are the centered data vectors to approximate.

Least-squares is not robust when outliers are present in the dataset, as they can skew the result away from the desired solution, leading to inflated error rates and distortions in statistical estimates (Hampel et al. 1986). Many authors, especially in the neural networks literature, have tried to reduce the impact of outliers on PCA by modifying the cost in Equation 1. Xu and Yuille 1991, for example, introduced a binary variable that is zero when a data sample is considered to be an outlier, and one otherwise:

$$C_{Xu} = \sum_{i=1}^{N} V_i \left\| PP^t x_i - x_i \right\|^2 + \eta(1 - V_i)$$

where V_i is the set of binary variables. The term η(1 − V_i) prevents the optimization from converging to the trivial solution V_i = 0, ∀i. The main disadvantage of this method is that it either completely rejects or includes a sample. Moreover, a single noisy component in a sample vector can cause it to be discarded completely. Gabriel and Zamir 1979 proposed a similar method in which every single component of each data point is controlled by a coefficient, instead of having one binary weight per vector. Outliers are still considered, but have lower importance. Furthermore, undesired components (known as intra-sample outliers) can be downweighted without discarding the whole sample vector. Several other weighting and penalty terms have more recently been proposed (see for example De la Torre and Black 2003), but the formulation remains essentially the same. All these approaches aim at reducing the effects of outliers in the model. In this paper, a family of PC analyses, capable of both increasing and decreasing the contribution of outliers in the model, is proposed. The algorithm was tested on a shape model applied to the segmentation of the vertebrae from lateral x-ray images of the spine. In this case, the fractured vertebrae may appear as outliers, but they are the most important cases and should be enhanced rather than disregarded.
2 Methods

2.1 Φ-PCA and α-PCA

In contrast to directly minimizing the squared data reconstruction error as in normal PCA (Equation 1), the presented Φ-PCA algorithm minimizes:

$$C = \sum_{i=1}^{N} \Phi\left[\left\| PP^t x_i - x_i \right\|^2\right] \qquad (2)$$
where Φ is a twice-differentiable function such that Φ(x²) is convex. The fact that Φ is twice-differentiable makes it possible to use Hessian-based methods in the optimization, providing quadratic convergence. The convexity requirement ensures the existence of just one minimum for C. A simple and at the same time powerful form of the function is Φ(x) = x^α, with α > 0.5 in order to satisfy the convexity condition. This special case will be called α-PCA. Large values of α (α > 1, in general) will enhance the outliers, as they become more expensive compared to normal cases. In particular, α = ∞ would lead to minimizing the L∞ norm, and hence the maximum reconstruction error over shapes measured in an L2 norm fashion. On the other hand, smaller values (0.5 < α < 1) will have the opposite effect, leading to a more robust PCA. The case α = 0.5 minimizes the L1 norm. Finally, α = 1 amounts to standard PCA.
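As a concrete illustration, the α-PCA cost is a one-liner on top of a basis matrix. The sketch below uses our own data layout (samples as rows) and naming, not the authors':

```python
import numpy as np

def alpha_pca_cost(P, X, alpha):
    """alpha-PCA cost (Eq. 2) with Phi(x) = x**alpha.

    P : (M, N) matrix of N basis vectors; X : (n_samples, M) centered data.
    """
    residual2 = np.sum((X @ P @ P.T - X) ** 2, axis=1)  # ||P P^t x_i - x_i||^2
    return float(np.sum(residual2 ** alpha))
```

With alpha = 1 this reduces to the standard PCA reconstruction error; alpha > 1 penalizes, and therefore emphasizes, the outlying shapes.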
The data points $x_i$ must be centered, which means that their mean must be subtracted from them: $x_i = s_i - \mu$, where $s_i$ represents the original, non-zero-mean data samples. In the proposed algorithm, the "mean" is no longer the component-wise arithmetic mean of the data points as in standard PCA, but the vector which minimizes (assuming $M$ dimensions for the data points):

$$C_{\mu^\Phi} = \sum_{i=1}^{N} \Phi\left[ \left\| \mu^\Phi - s_i \right\|^2 \right] = \sum_{i=1}^{N} \Phi\left[ \sum_{t=1}^{M} \left( \mu^\Phi_t - s_{it} \right)^2 \right] \qquad (3)$$
Once the $x_i$ vectors have been calculated, the Φ-PCA, which consists of searching for the basis vectors $P$ that minimize the cost function in Equation 2, can be performed. Numerical methods will be required in minimizing both $C$ and $C_{\mu^\Phi}$, as there is no closed-form expression for $\mu^\Phi$ or $P$. An important difference between standard and Φ-PCA is that, in the latter, the principal components have to be recalculated if the desired dimensionality changes. In standard PCA, on the other hand, the first $N_1$ principal components are common to two analyses with $N_1$ and $N_2$ components, assuming that $N_2 > N_1$.

Optimization of the Mean: The expressions for the gradient and the Hessian of the cost $C_{\mu^\Phi}$ in Equation 3 are quite simple and fast to calculate. Using the component-wise arithmetic mean as initialization, Newton's method converges rapidly to the solution:

$$\mu^\Phi_{n+1} = \mu^\Phi_n - \left[ H_{C_{\mu^\Phi}}(\mu^\Phi_n) \right]^{-1} \nabla C_{\mu^\Phi}(\mu^\Phi_n),$$

where the gradient $\nabla C_{\mu^\Phi}$ is a column vector consisting of the first-order derivatives:

$$\frac{\partial C_{\mu^\Phi}}{\partial \mu^\Phi_k} = 2 \sum_{i=1}^{N} \Phi'\left[ \sum_{l=1}^{M} \left( \mu^\Phi_l - s_{il} \right)^2 \right] \left( \mu^\Phi_k - s_{ik} \right),$$

and the Hessian matrix $H$ consists of the second-order derivatives:

$$H_{kk} = \frac{\partial^2 C_{\mu^\Phi}}{\partial (\mu^\Phi_k)^2} = 2 \sum_{i=1}^{N} \left[ 2\, \Phi''\left[ \sum_{l=1}^{M} \left( \mu^\Phi_l - s_{il} \right)^2 \right] \left( \mu^\Phi_k - s_{ik} \right)^2 + \Phi'\left[ \sum_{m=1}^{M} \left( \mu^\Phi_m - s_{im} \right)^2 \right] \right]$$

$$H_{ku} = H_{uk} = \frac{\partial^2 C_{\mu^\Phi}}{\partial \mu^\Phi_k \, \partial \mu^\Phi_u} = 4 \sum_{i=1}^{N} \Phi''\left[ \sum_{l=1}^{M} \left( \mu^\Phi_l - s_{il} \right)^2 \right] \left( \mu^\Phi_k - s_{ik} \right) \left( \mu^\Phi_u - s_{iu} \right)$$
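For Φ(x) = x^α these derivatives are immediate (Φ'(x) = αx^(α-1), Φ''(x) = α(α-1)x^(α-2)), and the Newton iteration can be written compactly. The sketch below is an illustration under that assumption, not the authors' code:

```python
import numpy as np

def phi_mean(S, alpha, n_iter=50, tol=1e-8):
    """Newton's method for the Phi-mean of Equation 3 with Phi(x) = x**alpha.
    S: (N, M) array of original (non-centered) samples, one per row."""
    mu = S.mean(axis=0)                                  # arithmetic-mean init
    for _ in range(n_iter):
        D = mu - S                                       # rows: mu - s_i
        r = np.maximum(np.sum(D * D, axis=1), 1e-12)     # ||mu - s_i||^2
        phi1 = alpha * r ** (alpha - 1)                  # Phi'(r_i)
        phi2 = alpha * (alpha - 1) * r ** (alpha - 2)    # Phi''(r_i)
        grad = 2 * (phi1[:, None] * D).sum(axis=0)
        H = 4 * (phi2[:, None] * D).T @ D + 2 * phi1.sum() * np.eye(S.shape[1])
        step = np.linalg.solve(H, grad)
        mu = mu - step
        if np.linalg.norm(step) < tol:
            break
    return mu
```

For α = 1 the Hessian reduces to 2N·I and the first step lands exactly on the arithmetic mean, as expected.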
Optimization of the Basis: Once the mean has been subtracted from the data points, the cost $C$ in Equation 2 must be minimized. The function has the interesting property that it reaches its global minimum for an orthonormal $P$ matrix, such that $P^t P = I$. This makes it unnecessary to constrain $P$ to satisfy this condition during the optimization, even though it implies that, in general, $P P^t$ will not represent a projection matrix, and hence $\| P P^t x_i - x_i \|$ no longer expresses the reconstruction error. In this minimization problem, only the expression for the gradient is implemented, as the one for the Hessian matrix is too complex and its computation too expensive. Using matrix calculus, all the partial derivatives can be calculated simultaneously:
$$\frac{dC}{dP} = \frac{d}{dP} \sum_{i=1}^{N} \Phi\left[ \left\| P P^t x_i - x_i \right\|^2 \right] = \frac{d}{dP} \sum_{i=1}^{N} \Phi\left[ (P P^t x_i - x_i)^t (P P^t x_i - x_i) \right] = \ldots$$

$$= \sum_{i=1}^{N} \Phi'\left[ x_i^t P P^t P P^t x_i + x_i^t x_i - 2\, x_i^t P P^t x_i \right] \left[ -4\, x_i x_i^t P + 2 \left( x_i x_i^t P P^t + P P^t x_i x_i^t \right) P \right]$$
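Translated directly into code, again assuming Φ(x) = x^α and continuing the NumPy notation above (our sketch, not the published implementation), the gradient is:

```python
import numpy as np

def alpha_pca_grad(P, X, alpha):
    """Gradient of the alpha-PCA cost with respect to the basis matrix P,
    following the matrix-calculus expression above."""
    G = np.zeros_like(P)
    PPt = P @ P.T
    for x in X:                                            # one sample at a time
        r = x @ PPt @ PPt @ x + x @ x - 2 * (x @ PPt @ x)  # ||P P^t x - x||^2
        phi1 = alpha * max(r, 1e-12) ** (alpha - 1)        # Phi'(r)
        xxT = np.outer(x, x)
        G += phi1 * (-4 * xxT @ P + 2 * (xxT @ PPt + PPt @ xxT) @ P)
    return G
```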
Once the gradient is known, different standard techniques can be used to update $P$. In a simple gradient descent scheme, for example:

$$P_{n+1} = P_n - k\, \frac{dC}{dP}$$

where $k$ is the step size. Line search, with a normal PCA as initialization, can then be used to quickly find the optimal $P$: in this algorithm, different step sizes are probed at each iteration, keeping the one that leads to the minimum value of the cost function $C$. It is important to mention that the orthonormality condition, which would simplify the expressions of the cost and the gradient, cannot be assumed throughout the process, as the $P$ matrix is modified without constraints (even though it converges to an orthonormal matrix).

2.2 Shape Models Based on Φ-PCA

In shape models (Cootes et al. 1995), a set of landmarks is defined on a set of previously aligned shapes. One data vector $s_i$ is built per shape by stacking the x and y coordinates of its landmarks. Next, the mean is subtracted and PCA is performed on the resulting $x_i$ data vectors, aiming at representing the shapes with a lower dimensionality and a higher specificity than the explicit Cartesian coordinates, at the expense of a certain approximation error. The differences between shape models based on standard and Φ-PCA are described below. First, the shapes are aligned with the Procrustes method (Goodall 1991) and their mean calculated. Rotation, translation and scaling are allowed when aligning the shapes. The alignment parameters and the mean are optimized simultaneously, minimizing:
$$C_{align} = \sum_{i=1}^{N} \Phi\left[ \left\| T_i(z_i, \theta_i) - \mu^\Phi \right\|^2 \right] = \sum_{i=1}^{N} \Phi\left[ \left\| s_i - \mu^\Phi \right\|^2 \right] = \sum_{i=1}^{N} \Phi\left[ \left\| x_i \right\|^2 \right]$$
where $T_i(z_i, \theta_i)$ represents the aligned shape $s_i$ according to the set of parameters $\theta_i$. The constraint $\mu^t \mu = 1$ prevents the shapes from shrinking towards zero. The iterative algorithm described in Cootes et al. 1995 was used for solving the problem (a minimal sketch of this loop is given after the list):

1. Normalize the size of the first shape and use it as a first estimate of the mean.
2. Align all the shapes to the current estimate of the mean.
3. Update the estimate of the mean by finding the mean of the aligned shapes.
4. Normalize the size of the new estimate of the mean.
5. Go to step 2 until convergence.
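The following NumPy sketch implements the loop with shapes stored as complex vectors (one x + iy entry per landmark). For brevity, step 3 uses the arithmetic mean, whereas the Φ-PCA model replaces it with the numerically optimized Φ-mean of Equation 3, and convergence is replaced by a fixed iteration count:

```python
import numpy as np

def align_similarity(z, m):
    """Least-squares similarity alignment (rotation, scale, translation) of
    shape z onto reference m; shapes are complex vectors, x + iy per landmark."""
    zc, mc = z - z.mean(), m - m.mean()
    a = np.vdot(zc, mc) / np.vdot(zc, zc)     # optimal complex scale/rotation
    return a * zc + m.mean()

def iterative_mean_shape(shapes, n_iter=20):
    """Steps 1-4 above, iterated a fixed number of times instead of step 5."""
    mean = shapes[0] - shapes[0].mean()
    mean = mean / np.linalg.norm(mean)                        # step 1
    for _ in range(n_iter):
        aligned = [align_similarity(z, mean) for z in shapes]  # step 2
        mean = np.mean(aligned, axis=0)                        # step 3
        mean = mean - mean.mean()
        mean = mean / np.linalg.norm(mean)                     # step 4
    return mean, aligned
```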
The mean in the third step must be found by numerically minimizing the cost in Equation 3, as already explained. However, as minimizing $\Phi(t^2)$ is equivalent to minimizing $t^2 = \| P P^t x - x \|^2$, the alignments in the second step can be easily calculated by minimizing the sum of squared distances in the standard way (see Cootes et al. 1995 once more for a simple solution). Another consequence of this property is that the PCA coordinates $b_i$ of a shape can still be calculated in the same way as in normal PCA:

$$s_i \approx \mu^\Phi + P b_i \qquad (4)$$

$$b_i = P^t (s_i - \mu^\Phi) \qquad (5)$$
3 Experiments

This study is based on a dataset which consists of lateral X-rays of the spine of 141 patients. Vertebrae L1 through L4 were outlined by three different expert radiologists, providing the ground truth of the study. 65 landmarks were extracted for each vertebra using the MDL algorithm described in Thodberg 2003. The same radiologists also provided information regarding the fracture type (wedge, biconcave, crush) and grade (mild, medium, severe) for the vertebrae (see Genant et al. 1993). In addition, they also annotated the six landmarks used in standard six-point morphometry (Black et al. 1995, Genant et al. 1993), located on the corners and at the middle point of both vertebra endplates. These points define the anterior, middle and posterior heights, which are used to estimate the fracture grade and type. Both normal PCA and α-PCA (for different values of α) were applied to the dataset, keeping 7 (α-)PCA coordinates, capable of preserving approximately 95% of the total variance in the data in all the cases. For both algorithms, the mean and maximum squared reconstruction errors were calculated. The dependence of the error on the number of fractures in the training set was also studied. It should be noted that a higher number of components would achieve better precision and still provide a good trade-off with respect to the specificity of the model, but a smaller number was kept in this experiment in order to better illustrate the difference between PCA and α-PCA. Finally, PCA and α-PCA were tested in an active shape model (Cootes et al. 1995) for segmenting the L1-L4 vertebrae in the images. Two shape models were built, one for the six landmarks and the other for the full contour, and the relationship between the (α-)PCA coordinates of both models was fitted to a conditional Gaussian distribution. In order to allow for more flexibility in the model, a higher number of principal components was utilized: seven for the six landmarks and eleven for the complete contour, keeping approximately 98% of the total variance in both cases. The mean of the conditional distribution was used as initialization for the segmentation of the full contour. At each iteration, the gray-level information along a profile perpendicular to the contour was used to calculate a desired position for each point at the following iteration. The new contour can then be calculated by fitting the model to the new points using Equations 4 and 5. The conditional covariance was used to measure the Mahalanobis distance from the new (α-)PCA coordinates $b$ to the conditional mean. In case it is larger than a certain threshold $D_{max}$, the vector is scaled down, $b = b\,(D_{max}/D(b))$, to ensure $D(b) \leq D_{max}$. This way, the solution is constrained to stay close to the six landmarks. The process is repeated until convergence.
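A sketch of the Mahalanobis constraint, assuming (as one reading of the scaling rule above) that the coordinates are scaled relative to the conditional mean, could look as follows; the function and argument names are ours:

```python
import numpy as np

def constrain_to_conditional(b, mu_c, cov_c, d_max):
    """Scale the (alpha-)PCA coordinates b towards the conditional mean mu_c so
    that their Mahalanobis distance under covariance cov_c stays <= d_max."""
    diff = b - mu_c
    d = np.sqrt(diff @ np.linalg.solve(cov_c, diff))   # Mahalanobis distance D(b)
    if d > d_max:
        b = mu_c + diff * (d_max / d)                  # b <- scaled-down coordinates
    return b
```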
4 Results

4.1 Mean and Maximum Reconstruction Error

Figures 1-a and 1-b show the dependence on α of the sum of squared errors when fitting the model to labelled points. The maximum error decreases with α, as expected, and does so faster for the fractures. The mean error shows how values of α lower than one tend to increase the error in fractures, as they are no longer important in the model, and to decrease it slightly in unfractured vertebrae. It should be noted that unfractured vertebrae are in general quite well modelled already. Values larger than one initially improve the results for fractures, at the expense of making them slightly worse for unfractured vertebrae. Finally, if α increases too much, the model tends to fit merely the most unlikely cases, making the average results worse both for unfractured vertebrae and mild fractures.

4.2 Influence of the Number of Training Fractures

In this experiment, the model was built with all the unfractured vertebrae and different fractions of the total amount of available fractures: from 12.5% to 100% in 12.5% increments. α was set equal to 1.75, providing a good trade-off between the maximum and mean errors in fractured and unfractured vertebrae, according to the results presented above. Figure 1-c shows that α-PCA is especially useful, clearly outperforming normal PCA, when the number of fractures in the training set is relatively small.
Fig. 1. a) Dependence of the mean sum of squared errors on α. b) Dependence of the maximum sum of squared errors on α. c) Dependence of the mean sum of squared errors on the number of fractures present in the training set, for α = 1.75.
4.3 Active Shape Model

Vertebrae L1 through L4 were segmented from the available images using a shape model conditioned on the six landmarks annotated by the radiologists, using both standard and α-PCA (α = 1.75). The experiments were performed in a leave-one-out fashion: the model used for segmenting a certain image is built upon all the other ones.
Table 1. Mean point-to-line error (in mm) for the different analyses, and double-sided p-values for a t-test and a signed rank test with 95% confidence interval

                   No. shapes  Error PCA (mm)  Error α-PCA (mm)  p t-test     p signed rank
Unfractured        500         0.44            0.47              1.79·10^-4   6.05·10^-4
Mild fractures     15          0.62            0.56              1.14·10^-2   1.25·10^-2
Medium fractures   38          0.66            0.57              7.87·10^-3   1.40·10^-2
Severe fractures   11          0.97            0.60              4.78·10^-3   9.77·10^-3
All fractures      64          0.70            0.57              3.10·10^-11  3.53·10^-12
The point-to-line errors from the true contour to the output of the algorithm are displayed in Table 1, along with the p-values resulting from a paired, double-sided t-test and a paired, double-sided Wilcoxon signed rank test. The results show that standard PCA leads to a lower mean error in unfractured vertebrae, but α-PCA provides more uniform results across the different grades of fracture severity, at the expense of a slight increase in the total mean error. Moreover, α-PCA significantly outperforms standard PCA on fractures, especially the severe ones. It also has the property of assigning different importance to each case in a continuous manner, without requiring fracture information for the training data. If this information were available, it would be possible to build two different models, but then a large number of training fractures would be required. Besides, if two models are fitted, a mistake in the decision about which result to keep could lead to a very bad fit. Regarding the p-values, both tests indicate that the difference in the means between the two setups is significant. Finally, two radiographs which have been segmented with standard and α-PCA (α = 1.75) are displayed in figure 2, along with the contour provided by the radiologists. They both correspond to severe fractures. α-PCA provides a better approximation of the real shape, especially around the points where the contour changes direction rapidly.
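Such paired comparisons are straightforward to reproduce, e.g., with SciPy; the arrays below are hypothetical placeholders for the paired per-vertebra errors, not data from the paper:

```python
import numpy as np
from scipy import stats

errs_pca = np.array([0.61, 0.55, 0.72, 0.48, 0.90])    # hypothetical paired errors
errs_apca = np.array([0.54, 0.52, 0.60, 0.49, 0.62])

p_ttest = stats.ttest_rel(errs_pca, errs_apca).pvalue      # paired two-sided t-test
p_wilcoxon = stats.wilcoxon(errs_pca, errs_apca).pvalue    # signed rank test
```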
Fig. 2. Segmentation examples. For each pair, the image on the left corresponds to the ground truth and the image on the right to the shape model-based segmentation, both for standard PCA (white) and α-PCA (black).
5 Discussion and Conclusion

A family of modified PC analyses has been presented in this paper. The family deals with outliers in the data set in an optimal way according to a predefined function, whose shape determines whether the importance of outliers increases or decreases compared with normal PCA. The family Φ(x) = x^α is proposed, but others could be used. Compared to other methods in the literature, the one presented here has the ability to enhance or disregard outliers with just one compact and simple formulation.
In most applications it is desirable to reduce the influence of abnormal cases on the principal components. However, if PCA is utilized in a segmentation method, it is essential to be able to adapt to such cases, whose correct processing might be even more important than that of the normal ones. In this paper, α-PCA was tested in the creation of a vertebra shape model, giving a higher importance to abnormal cases without requiring prior knowledge of which shapes are fractured or present other abnormalities, such as osteophytes. The segmentation accuracy was improved in such cases. It should finally be noted that the conditional model used in the segmentation algorithm assumes a Gaussian distribution for the PCA coordinates of the shapes. Using α-PCA instead of standard PCA makes the distribution less Gaussian, indirectly affecting the segmentation results. This fact may also affect further statistical analyses if the PCA coordinates are, for example, used to estimate the fracture grade.
References

Jolliffe, I.: Principal Component Analysis. Springer, New York (1986)
Hampel, F., Ronchetti, E., Rousseeuw, P., Stahel, W.: Robust Statistics: The Approach Based on Influence Functions. Wiley, New York (1986)
Xu, L., Yuille, A.: Robust principal component analysis by self-organizing rules based on statistical physics approach. IEEE Trans. Neural Networks 6, 71–86 (1991)
Gabriel, K., Zamir, S.: Lower rank approximation of matrices by least squares with any choice of weights. Technometrics 21(4), 489–498 (1979)
De la Torre, F., Black, M.J.: A framework for robust subspace learning. International Journal of Computer Vision 54, 117–142 (2003)
Goodall, C.R.: Procrustes methods in the statistical analysis of shape. J. R. Stat. Soc. Ser. B 53, 285–339 (1991)
Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models - their training and application. Comput. Vis. Image Underst. 61(1), 38–59 (1995)
Thodberg, H.H.: Minimum Description Length Shape and Appearance Models. In: Proceedings of Information Processing in Medical Imaging. Springer, Heidelberg (2003)
Black, D.M., Palermo, L., Nevitt, M.C., Genant, H.K., Epstein, R., San Valentín, R., Cummings, S.R.: Comparison of methods for defining prevalent vertebral deformities: the Study of Osteoporotic Fractures. J. Bone Miner. Res. 10, 890–902 (1995)
Genant, H.K., Wu, C.Y., van Kuijk, C., Nevitt, M.C.: Vertebral Fracture Assessment Using a Semiquantitative Technique. J. Bone Miner. Res. 8(9), 1137–1148 (1993)
Automatic Segmentation of Articular Cartilage in Magnetic Resonance Images of the Knee

Jurgen Fripp1,2, Stuart Crozier2, Simon K. Warfield3, and Sébastien Ourselin1

1 BioMedIA Lab, e-Health Research Centre, CSIRO ICT Centre, Australia
{jurgen.fripp, sebastien.ourselin}@csiro.au
2 School of ITEE, University of Queensland, Australia
[email protected]
3 Computational Radiology Laboratory, Harvard Medical School, Children's Hospital Boston, USA
[email protected]
Abstract. Performing quantitative analysis of cartilage requires the accurate segmentation of each individual cartilage. In this paper we present a model-based scheme that can automatically and accurately segment each individual cartilage in healthy knees from a clinical MR sequence (fat-suppressed spoiled gradient recall). This scheme consists of three stages: the automatic segmentation of the bones, the extraction of the bone-cartilage interfaces (BCI), and the segmentation of the cartilages. The bone segmentation is performed using three-dimensional active shape models. The BCI is extracted using image information and prior knowledge about the likelihood of each point belonging to the interface. A cartilage thickness model then provides constraints and regularizes the cartilage segmentation performed from the BCI. The accuracy and robustness of the approach was experimentally validated, with the (patellar, tibial, femoral) cartilage segmentations having a median DSC of (0.870, 0.855, 0.870), performing significantly better than non-rigid registration (0.787, 0.814, 0.795). The total cartilage segmentation had an average DSC of (0.891), close to the (0.896) obtained using a semi-automatic watershed algorithm. The error in quantitative volume and thickness measures was (8.29, 4.94, 5.56)% and (0.19, 0.33, 0.10) mm respectively.
1 Introduction
MR imaging allows the non-invasive assessment of cartilage tissue, which is required for clinical studies, surgical treatments and drug trials. To obtain statistical significance, this assessment is usually performed on each cartilage separately (or in subregions) using morphological measures (volume, thickness, surface area or curvature) [1]. These measures require the cartilage to be segmented separately (or in subregions), a task that can significantly influence the error and reproducibility of the quantitative analysis, and has thus far proved difficult to automate, with current approaches manual [2] or semi-automatic (e.g. region growing [3], B-spline snakes [4] and live-wires [5]). These approaches take 30
minutes to several hours and require a skilled, trained operator to obtain reproducible results. T1-weighted fat-suppressed (FS) spoiled gradient recall (SPGR) images are the most commonly used MR sequence for cartilage assessment. They have poor cartilage-meniscus and cartilage-synovial fluid contrast and can exhibit significant imaging artifacts that obscure, and artificially create, defects [6]. These, along with image resolution, magic angle and partial volume effects, cause the problems that make accurate cartilage segmentation difficult. A few recently developed sequences reduce many of these problems [7]. In diseased knees, osteophytes, lesions and cracks are commonly observed and make segmentation more difficult. The development of an automatic approach is desirable and has been pursued by several groups. With healthy knees, accurate segmentation results (Dice Similarity Coefficient (DSC) ≈ 0.9) were obtained by Grau using a modified semi-automatic watershed transform that utilizes prior information and requires around 10 minutes of user seeding [8]. Unfortunately, it segments all the individual cartilages as a single object, which is not sensitive enough for use in quantitative analysis. Using low-field non-FS T1-weighted MR images, a tissue classifier has been found to obtain reasonably accurate results on a large database of healthy and OA patients (DSC ≈ 0.8) [9]. In [10], an almost fully automatic approach has been used to segment ankle cartilages by first segmenting the bones and creating a surface mesh from which a local graph is built; two (bone and cartilage) surfaces are then extracted simultaneously using two separate cost functions. In this paper we present a model-based scheme that uses localization obtained from a segmentation hierarchy, as well as prior knowledge of the cartilage position and thickness variation, to accurately segment each individual cartilage (patellar, tibial and femoral) in healthy knees. This scheme was validated and compared with FFD-based NRR and the modified watershed approach of Grau [8]. The influence of repositioning and partial voluming effects was evaluated. Finally, we demonstrate the promise of this approach using volume and thickness quantitative measures.
2 Method

2.1 MR Acquisition
An inhomogeneous database of N = 20 healthy volunteers who were not known to have knee problems was imaged using a FS SPGR MR sequence; 14 were scanned at 1.5 T (Tesla) and 6 at 3 T. No demographic information was available; however, no exclusion criteria based on age or gender were used. The following acquisition parameters were used: echo time - 5, 7 or 12 ms; repetition time - 60 ms; flip angle - 40°; field of view - 120 mm; matrix - 512 × 512 or 256 × 256; slice thickness - 1.5 mm; and 60, 64 or 70 sagittal slices. Each of the acquired images was manually segmented by an expert. Left knees were reflected across their axes and treated as right knees (note: this is not currently handled automatically).
2.2 Segmentation System Overview
The major problem with segmenting the knee bones and cartilages from clinically relevant sequences (e.g. FS SPGR) is that both types of anatomy have poor or missing boundary interfaces. The bone is difficult to distinguish from fat, tendons and background, while the cartilages have problems with synovial fluid, meniscus, and even ligament and muscle tissue. In many applications statistical models [11] have proved to be useful at handling this type of problem. It should be noted that diseased knees are more difficult to segment, as osteophytes, lesions, intensity inhomogeneities and cracks are often found; however, these are not present in our current database and our approach has not been validated on diseased knees. The overall approach is illustrated in Figure 1 and is summarized as follows. A point distribution model represents the shape of the bones, whose variability across the database is modelled using 3D statistical shape models (SSM). A hybrid segmentation scheme based around 3D active shape models is used to segment the bones. Using the bone segmentation, the BCI is extracted and the local tissue properties of the bones and cartilages are estimated and represented using a Gaussian distribution. The initial tissue properties are then re-estimated using a 3-class expectation-maximization-based Gaussian mixture model applied in the local (mask) region around the BCI. This information, combined with constraints provided by a model (principal component analysis) of the thickness variation (observed in the database), is used to estimate the cartilage thickness at each point in the point distribution model. The outer cartilage surface is then extracted and this coupled bone-cartilage segmentation is voxelised. The SSM creation and bone segmentation were previously presented in [12]; this paper provides a more detailed explanation of the cartilage segmentation.
Fig. 1. Flow diagram of the segmentation scheme used to automatically segment the bones, extract the BCIs and segment the cartilages. The surface rendering results presented are of case 12 with case 6 used as the atlas.
Automatic Segmentation of Articular Cartilage
189
Extraction of the BCI. The BCI is the region of the bone that has articular cartilage above it. This region is determined automatically by first extracting the points on the bones which have a high probability of being on the BCI (determined from the training database). Using this initial BCI we then:

1. Estimate the cartilage tissue parameters, which are modeled as a Gaussian distribution. The distribution is estimated from samples extracted along a 1D profile g, normal to the surface at each point in the BCI, up to the strongest negative gradient within 6.5 mm (supersampled at twice the in-plane resolution using cubic B-spline interpolation).
2. Consider all points that have two neighbors on the BCI (via triangulation). If at least 30% of a point's samples are cartilage tissue, add the point to the current BCI.
3. Convergence is reached when the number of points on the BCI remains unchanged; otherwise go to step 2.

Cartilage Segmentation. The first stage of the segmentation process refines the estimate of the tissue properties used in the BCI extraction and generates a distance image from each of the bones. This is achieved by first creating a binary mask of voxels that are above and within 6.5 mm of the BCI. Inside this masked region, an estimate of the tissue properties is obtained using an expectation-maximization Gaussian mixture model (3 classes, initialized using the previous estimates of bone, cartilage and other tissue (tissue between bone and cartilage intensities)). From the tissue properties, a cartilage tissue probability image is generated (Gaussian based, with values above the mean given a probability of 1). This is used as input to the cartilage segmentation algorithm, which operates as follows, repeating until the search range is 0:

1. For each point i on the BCI, find the thickness t_i. The position j along the profile g that maximizes the combination of the MR image gradient and the internal probabilities (p),
$$F(g_j, p_j) = \frac{|\delta g_j|}{\max(|\delta g|)} + \frac{1}{k} \sum_k p_{j-k},$$
is assumed to correspond to the correct outer cartilage edge (a code sketch of this criterion is given after the list).
2. Parameterize and reconstruct the likely thickness using the model of thickness variation (trained from normal patients).
3. Enforce BCI boundary constraints.
4. Set t_i to 0 for points that do not have an overall probability of 50% cartilage tissue.
5. Decrease the capture range by reducing the search range around t_i.
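A rough sketch of the edge criterion of step 1 is shown below; the window size k is an assumption of ours, not a value given in the paper:

```python
import numpy as np

def outer_cartilage_edge(g, p, k=3):
    """Pick the profile position maximizing |dg_j| / max|dg| plus the mean of
    the k preceding cartilage probabilities, per the criterion F(g_j, p_j)."""
    dg = np.abs(np.gradient(g))
    dg = dg / max(dg.max(), 1e-12)             # normalized gradient magnitude
    scores = np.full(len(g), -np.inf)
    for j in range(k, len(g)):
        scores[j] = dg[j] + p[j - k:j].mean()  # (1/k) * sum of p_{j-k} .. p_{j-1}
    return int(np.argmax(scores))
```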
After convergence, the coupled bone-cartilage model is voxelised and the distance map is used to relabel overlapping voxels to their nearest cartilage interface.

2.3 Non-Rigid Registration

There are many ways of performing NRR; a popular method, due to its general applicability, transparency and computational efficiency, is the FFD approach first proposed by Rueckert [13]. In this paper, normalized mutual information was used with a total of 6 hierarchical levels (including the affine), each level decreasing the spacing between the control points by half (20, 10, 5, 2.5 and 1 mm). The computational load was reduced by using a dilated mask to restrict the set of active control points.

2.4 Validation
The validation of the segmentations is presented using three volume-based measures: sensitivity ($= TP/(TP + FN)$), specificity ($= TN/(TN + FP)$) and DSC ($= 2TP/(2TP + FP + FN)$), where $TP$ is true positive, $TN$ is true negative, $FP$
is false positive and $FN$ is false negative, computed for the automatically obtained binary bone segmentation compared to the expert binary manual segmentation. The experiments were leave-one-out, with the robustness to initialization tested by segmenting each case 19 (N−1) times, each time using a different initialization (case as the atlas). Due to the high computational load, the NRR was only performed using two different atlases (cases 6 and 15).
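These overlap measures are simple to compute from two binary masks; the following sketch (ours, with hypothetical argument names) illustrates them:

```python
import numpy as np

def overlap_measures(auto_seg, manual_seg):
    """Sensitivity, specificity and DSC between two binary segmentation masks."""
    a = auto_seg.astype(bool)
    m = manual_seg.astype(bool)
    tp = np.sum(a & m)
    tn = np.sum(~a & ~m)
    fp = np.sum(a & ~m)
    fn = np.sum(~a & m)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    dsc = 2 * tp / (2 * tp + fp + fn)
    return sensitivity, specificity, dsc
```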
3 Results
As can be seen in Table 1 and Figure 2, our approach obtains much better sensitivity, specificity and DSC than NRR. The results obtained by our approach and NRR followed a similar trend across the cases (Figure 2), varying mainly with image quality. A qualitative illustration of the difference in quality between our approach, manual segmentation and NRR is given in Figure 3. At low resolutions (cases 16 to 20) the results were slightly lower; however, these images were much noisier, with much less contrast between the cartilages and the surrounding tissue. This was especially true for cases 17 and 19, where the very low contrast resulted in no or very poor delineation (Figure 4), with few distinctive gradients found on the outer cartilage interfaces.
Fig. 2. The DSC for each case (case 6 as an atlas), showing that NRR at 1 mm (left) has inferior accuracy compared to our approach (right). Note: the manual segmentations of the patellar cartilage for cases 4 and 17 appear to be incomplete.
Overall, the primary cause of segmentation error was regions affected by partial voluming and other signal decreases. This occurred primarily around the edges and thin regions of the cartilages; it was also observed in regions of high curvature (in the slice-thickness direction). This problem could be reduced by using certain MR sequences and acquisition parameters. In a few cases, small errors were also observed between the femoral and tibial cartilages (e.g. slice 48 in Figure 3). The approach was quite robust to initialization (atlas choice), with failures in cartilage segmentation only observed (with the exception of case 17) when the preceding bone segmentation had failed (bone failure rate of 3.60% [12]).
Table 1. Mean (standard deviation) (median) of volume-based validations

                            Sensitivity             Specificity             DSC
Affine       (Patellar)     0.450 (0.163) (0.502)   0.998 (0.001) (0.998)   0.422 (0.164) (0.491)
             (Tibial)       0.460 (0.170) (0.491)   0.998 (0.001) (0.998)   0.473 (0.166) (0.519)
             (Femoral)      0.418 (0.143) (0.474)   0.994 (0.002) (0.994)   0.427 (0.138) (0.478)
NRR (1 mm)   (Patellar)     0.803 (0.119) (0.848)   0.999 (0.001) (0.999)   0.732 (0.156) (0.787)
             (Tibial)       0.781 (0.156) (0.804)   0.999 (0.001) (0.999)   0.785 (0.095) (0.829)
             (Femoral)      0.795 (0.162) (0.836)   0.997 (0.002) (0.997)   0.758 (0.148) (0.795)
Our approach (Patellar)     0.821 (0.135) (0.849)   1.000 (0.000) (1.000)   0.833 (0.135) (0.870)
             (Tibial)       0.829 (0.207) (0.860)   0.999 (0.000) (0.999)   0.826 (0.083) (0.855)
             (Femoral)      0.837 (0.162) (0.865)   0.999 (0.000) (0.999)   0.848 (0.076) (0.870)
Fig. 3. Overlayed segmentations (gray contour on patellar and tibial cartilages) for case 9 (case 15 as atlas, slices 16 and 48). Top left: MR; top right: manual; bottom left: NRR (DSC = 0.82, 0.79, 0.82); bottom right: our approach (DSC = 0.87, 0.85, 0.86). Note: areas of interest shown at 1.5× zoom.
We further validated our approach using the data of the fourth subject used in [8]. This subject was scanned four times to evaluate the effect of partial voluming and repositioning, with the ground truth segmentation of each scan obtained using STAPLE from ten manual segmentations performed by two experts (each five times). As can be seen in Table 2, the values of sensitivity, specificity and DSC for the total cartilage (patellar, tibial and femoral) were similar, with the lower sensitivity and higher specificity of our approach indicating it slightly undersegments compared to the modified watershed algorithm. The primary advantage of our approach is that it does not require any user interaction and each
192
J. Fripp et al.
Fig. 4. Overlayed segmentation (gray contour on tibial cartilage) example of case 17 (case 6 used as atlas), slice 20. Left: MR; middle: manual; right: our approach. Note: the poor delineation between cartilage interfaces causes our approach (and NRR) to fail.

Table 2. Average (from 5 segmentations) of the total cartilage (patellar, tibial and femoral) results obtained using our algorithm, compared to the improved watershed approach of Grau [8]. Only the total cartilage was compared, as Grau's approach cannot obtain the individual cartilage segmentations that are necessary to perform statistically significant quantitative analysis.

        Improved Watershed          Our approach
Scan    Sens.   Spec.   DSC         Sens.   Spec.   DSC     CV_DSC %
(1)     0.8965  0.9987  0.8988      0.8410  0.9993  0.8897  0.06
(2)     0.8649  0.9990  0.8907      0.8402  0.9994  0.8898  0.08
(3)     0.8763  0.9990  0.8984      0.8490  0.9992  0.8902  0.1
(4)     0.8905  0.9988  0.8978      0.8591  0.9992  0.8959  0.09
cartilage is segmented and labelled separately, which is essential for quantitative analysis. The effectiveness of this approach for quantitative analysis was evaluated using two quantitative analysis measures (volume and thickness). The volume was estimated directly from which we found that the (manual, automatic) segmentations had an average volume of (4245, 3912), (6026, 6056) and (14703, 14463) mm3 and average absolute volume difference error of 8.29%, 4.94% and 5.56% for the patellar, tibial and femoral cartilages respectively (excluding case 17). The thickness was calculated from the whole BCI using an approach based on [14], from which obtained an average absolute thickness difference of (0.19, 0.33, 0.10 mm) for the (patellar, tibial, femoral) cartilage (excluding case 17).
4
Discussion and Conclusion
In this paper we have presented a segmentation scheme that automatically and accurately obtains cartilage segmentations from healthy volunteers. The
Automatic Segmentation of Articular Cartilage
193
complete scheme is fully automatic and has a reasonable computation time (slightly over an hour). Each cartilage is obtained as a separate object, which is essential for performing statistically significant quantitative analysis. We also found that the results did not vary significantly due to repositioning and partial voluming effects, which is critical as these effects are impossible to avoid. Using this scheme for quantitative analysis obtained an average absolute volume difference error of only 8.29%, 4.94% and 5.56%. Future work is focused on further improving the segmentation results by increasing the size of the training database and including additional information, including localized tissue and texture models and explicit curvature constraints. This is primarily aimed at improving the segmentation accuracy at the poorly delineated cartilage-cartilage interfaces and towards the thinner cartilage boundaries. Future research will be focused on validating this scheme on the more complex problem of diseased knees.
Acknowledgment The authors wish to thank Andrea U. J. Mewes and Johannes Pauser for help in acquiring and interactively segmenting the MR scans. This investigation was supported in part by NIH grants R21 MH067054 and R01 RR021885, a research grant from CIMIT and grant RG 3478A2/2 from the NMSS. The authors would also like to thank Mark Holden for help in using the VTK CISG Segmentation Propagation Tool [15] to perform the non-rigid registration.
References

1. Eckstein, F., et al.: Magnetic resonance imaging (MRI) of cartilage in knee osteoarthritis (OA): morphological assessment. Osteoarthritis and Cartilage 14, 46–75 (2006)
2. Cicuttini, F., et al.: Comparison of conventional standing knee radiographs and magnetic resonance imaging in assessing progression of tibiofemoral joint osteoarthritis. Osteoarthritis Cartilage 13(8), 722–727 (2005)
3. Waterton, J., et al.: Diurnal variation in the femoral articular cartilage of the knee in young adult humans. Magnetic Resonance in Medicine 43, 126–132 (2000)
4. Stammberger, T., et al.: Determination of 3D cartilage thickness data from MR imaging: computational method and reproducibility in the living. Magnetic Resonance in Medicine 41(3), 529–536 (1999)
5. Gougoutas, A., et al.: Cartilage volume quantification via live wire segmentation. Academic Radiology 11(12), 1389–1395 (2004)
6. Yoshioka, H., et al.: Articular cartilage of knee: Normal patterns at MR imaging that mimic disease in healthy subjects and patients with osteoarthritis. Radiology 231, 31–38 (2004)
7. Lang, P., et al.: MR Imaging of Articular Cartilage: Current State and Recent Developments. Radiologic Clinics of North America 43(4), 629–639 (2005)
8. Grau, V., et al.: Improved watershed transform for medical image segmentation using prior information. IEEE Trans. Medical Imaging 23(4), 447–458 (2004)
9. Folkesson, J., et al.: Segmenting articular cartilage automatically using a voxel classification approach. IEEE Trans. Medical Imaging 26(1), 106–115 (2007)
10. Li, K., et al.: Simultaneous segmentation of multiple closed surfaces using optimal graph searching. In: Christensen, G.E., Sonka, M. (eds.) IPMI 2005. LNCS, vol. 3565, pp. 406–417. Springer, Heidelberg (2005)
11. Cootes, T., et al.: Active shape models - their training and application. Computer Vision and Image Understanding 61(1), 38–59 (1995)
12. Fripp, J., et al.: Automatic segmentation of the bone and extraction of the bone-cartilage interface from magnetic resonance images of the knee. Physics in Medicine and Biology (2007)
13. Rueckert, D., et al.: Nonrigid Registration Using Free-Form Deformations: Application to Breast MR Images. IEEE Transactions on Medical Imaging 18(8), 712–721 (1999)
14. Jones, S., et al.: Three-dimensional mapping of cortical thickness using Laplace's equation. HBM 11, 12–32 (2000)
15. Hartkens, T.: VTK CISG Registration Toolkit (2006), Software available at http://freshmeat.net/projects/vtkcisg/
Automated Model-Based Rib Cage Segmentation and Labeling in CT Images

Tobias Klinder1,2, Cristian Lorenz2, Jens von Berg2, Sebastian P.M. Dries2, Thomas Bülow2, and Jörn Ostermann1

1 Institut für Informationsverarbeitung, University of Hannover, Germany
[email protected]
2 Philips Research Europe - Hamburg, Sector Medical Imaging Systems, Germany
Abstract. We present a new model-based approach for the automated labeling and segmentation of the rib cage in chest CT scans. A mean rib cage model including a complete vertebral column was created from 29 data sets. We developed a ray-search-based procedure for rib cage detection and initial model pose. After positioning the model, it was adapted to 18 unseen CT data sets. In 16 out of 18 data sets, detection, labeling, and segmentation succeeded with a mean segmentation error of less than 1.3 mm between true and detected object surface. In one case the rib cage detection failed, in another the automated labeling.
1 Introduction
Although bony structures show high contrast in Computed Tomography (CT) images, their detection, identification, and correct segmentation still remain a challenge today. The image analysis process is complicated by, e.g., the similarity of adjacent structures to the object to be segmented, partial volume effects resulting in no clear object boundary, or the fact that scans frequently contain pathology. One attempt to overcome these difficulties is to include prior knowledge in the form of anatomical models. The rib cage shapes the human chest, protecting all inner soft-tissue organs. For that reason, a geometric model of the osseous thorax is of special interest, because it can serve as a reference for the location of soft-tissue organs. A model of the vertebral column is in itself of clinical relevance since it may support orthopedic and neurological applications. In this context, a precise segmentation of the vertebral column, including an identification of the individual vertebrae, is essential. Model-based segmentation is known to depend on a good initialization. Once adaptation is misled, it can hardly recover. Especially in the case of the rib cage, with its number of similar adjacent structures, careful positioning is an important point.
We would like to thank Katrina Read from Philips Medical Systems, Cleveland (USA) as well as the University of Maryland Medical Center, Baltimore (USA) for all image data.
To our knowledge, not many publications address the segmentation and labeling of the rib cage or the complete vertebral column. In contrast to semi-automatic region-based approaches, Shen et al. present a fully automatic tracing-based algorithm extracting and identifying rib centerlines [1]. A different work, also providing a segmentation and labeling of individual ribs, is that of Staal et al. [2]. Compared to our method, these approaches are specifically tuned to ribs and do not offer an additional vertebra segmentation. From the modeling perspective, some work has been done in modeling particularly complex anatomical objects, such as the human heart [3]. The modeling of individual vertebrae has also been addressed earlier [4]. However, to our knowledge this is the first time that a complete osseous thorax has been modeled and adapted to CT data scans. With its number of particular structures, the thorax thereby represents an extension from individual-object to object-group modeling. The remainder of this text is organized as follows. Section 2.1 describes the setup of the initial rib cage model. For positioning of the complete model in given CT data, a ray-search-based approach was developed, which is introduced in Sect. 2.2. As an alternative, Sect. 2.3 describes an iterative positioning of vertebra models. Using these approaches, the relevant structures were segmented in 29 CT data sets, from which a mean model was built as described in Sect. 2.4. The results of the positioning and adaptation of the mean model to unseen data are given in Sect. 3.
2 Methods

2.1 Initial Model Setup
Triangulated surface models of all 24 ribs and all 24 presacral vertebrae were initially created. Vertebra model generation was based on the scanning of commercially available plastic phantoms with a Philips Brilliance40 CT scanner. For rib model generation, an interactive segmentation from patient CT data was performed, similar to [3]. By adapting all created surface models to their corresponding anatomical objects in reference patient CT data, the individual models were assembled to form an initial rib cage model, including a complete vertebral column, on the basis of patient data. Adaptation was done using a shape-constrained deformable surface model approach [4], which minimizes an energy term consisting of internal Eint (shape similarity) and external Eext (image features) energies. Due to the characteristic shape of the ribs, we establish a centerline-based description as an alternative to rib surface meshes. We calculate a rib's centerline by iteratively cutting a given rib mesh with planes perpendicular to the surface. The center points of the obtained cut contours correspond to the desired centerline points. Shape information is added to the centerline by calculating an ellipse fit [5] to each cut contour. Between centerline points and ellipse parameters, we interpolate using B-splines. Compared to a simple surface mesh description, this alternative enables automatic rib surface generation once the location of a rib's centerline is given and the centerline is identified.
2.2 Global Model Positioning
Before positioning the model in a data set, at least parts of the corresponding structures have to be found. Since the ribs provide a framework for the entire chest, a proper positioning of the complete model can be determined from a correct rib detection. For object detection, we apply a ray-search-based approach [6]. By sending rays through the data set searching for a typical gray-value profile, rib candidates can be detected. Due to the characteristic shape of the thorax, rib detection is divided into two symmetrical problems - one search for the left and one for the right ribs. In each case, a radial cylindrical ray search is applied in every image slice with an angular sampling of n = 180 and the cylinder axes pointing in the head-foot direction. The locations of the cylinder axes are determined by the top points of the lungs.
Fig. 1. Defined search profile for rib detection. The individual lengths can vary between zero and empirically determined maximal values. The thresholds Tcm and Tlung separate the particular crossings. While Tlung is constantly set to Tlung = −100 HU, the exact value of Tcm has to be determined automatically for each patient.
With the settings of the cylindrical ray searches, the profile for rib detection can be defined. A ray crossing a rib bone shows a pattern of a high gray value when entering the rib through cortical bone, followed by a lower one when traversing the bone marrow, and again a high gray value when exiting the rib through cortical bone. Before crossing a rib, a ray passes through lung tissue and a certain length of soft tissue. The entire profile is shown in Fig. 1. If a ray matches the defined profile, the middle point of the two positions where the ray enters and exits the rib is saved as a rib candidate. After detection, left and right rib centerlines are extracted separately out of the candidates, using the coordinate system of the radial cylindrical ray searches as a reference. At first, the center of mass of candidates of successive slices obtained from rays with the same angle in the cylinder coordinate system is calculated. Afterwards, all combined candidates are grouped into individual rib clusters. Neighbouring candidates belong to the same cluster if they have approximately the same distance to the origin of the cylinder coordinate system and do not vary significantly in their z-coordinate.
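A heavily simplified sketch of the profile test along one ray might look as follows; the length limits of Fig. 1 and the explicit marrow dip are omitted, and the thresholds are passed in rather than estimated per patient:

```python
def find_rib_crossing(hu, t_cm, t_lung=-100.0):
    """Scan Hounsfield samples along one ray for the lung -> soft tissue ->
    cortical -> marrow -> cortical pattern. Returns the middle index of the
    rib crossing (the rib candidate), or None if the profile is not matched."""
    seen_lung = False
    i, n = 0, len(hu)
    while i < n:
        if hu[i] < t_lung:                    # inside lung tissue
            seen_lung = True
        elif seen_lung and hu[i] > t_cm:      # entering the rib (cortical bone)
            entry = i
            while i < n and hu[i] > t_lung:   # bone and marrow stay above t_lung
                i += 1
            return (entry + i - 1) // 2       # midpoint of entry and exit
        i += 1
    return None
```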
Model centerlines and extracted centerlines are registered with an iterative closest point (ICP) algorithm [7], allowing an affine transformation. The extracted centerlines are identified by registering all possible model combinations. If, for instance, seven rib pairs are extracted, the combinations 1-7, 2-8, ..., 6-12 are registered. The configuration with the minimal residual error is assumed to correspond to the true configuration. After identifying the detected ribs and iteratively registering model centerlines to candidates, we apply a thin-plate spline approximation [8] of the centerline points in order to cope with inter-patient variability of rib centerlines and to provide an improved positioning for the subsequent segmentation. To take possible outliers into account, approximation is preferred to pure interpolation. An example of the registration is given in Fig. 2. With the found location of the rib centerlines and their identification, rib surface models are automatically generated as mentioned in Sect. 2.1. The initial positioning of the vertebrae is given by applying the transformation obtained from the rib cage model registration. However, the vertebrae can be segmented more precisely when translating the vertebra models in the direction of the mean difference vectors calculated from the locations of the first centerline points of corresponding ribs after global model registration and thin-plate spline approximation.
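The combinatorial identification step is simple to express in code. In the sketch below, `icp_residual` is an assumed helper running the affine ICP of [7] and returning its final mean residual; it is not part of the paper:

```python
import numpy as np

def identify_rib_levels(model_centerlines, extracted_centerlines, icp_residual):
    """Register every contiguous block of model rib levels against the
    extracted centerlines and keep the one with the minimal residual error.
    Returns the index of the first detected rib level (0 -> rib pair 1)."""
    n = len(extracted_centerlines)
    residuals = [icp_residual(model_centerlines[s:s + n], extracted_centerlines)
                 for s in range(len(model_centerlines) - n + 1)]
    return int(np.argmin(residuals))
```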
Fig. 2. An example for the entire registration process. The black dots correspond with the extracted centerlines of the rib candidates. The blue lines are the registered centerlines of the model using ICP. The red lines indicate the difference vectors of candidates and closest model centerline point of the corresponding rib. After thinplate spline approximation, the final positioning of the model is found (yellow). See electronic version for color figure.
2.3 Iterative Vertebra Model Positioning
Global model positioning requires thoracic CT scans covering the entire chest. Since the approach determines the initial model pose by detecting ribs, it cannot be applied to, e.g., lumbar or head-neck scans. For this reason, we developed an alternative model positioning for the vertebra models based on object relations. In order to express relations between the individual vertebrae, we have defined a local vertebra coordinate system (VCS). Since the shape of the vertebrae changes significantly down the spine, the VCS had to be derived from typical invariant object characteristics. The definition of the VCS is based on the automatic calculation of three object-related simplified representations: a cylinder fit to the vertebral foramen,
the middle plane of the upper and lower vertebral body surfaces, and the vertebra's sagittal symmetry plane (see Fig. 3). Out of the three representations, the VCS was defined. The intersection point of the axis of the fitted cylinder with the middle plane defines the origin of the VCS. The normal vector of the middle plane defines the z_vcs-axis. The x_vcs-axis is defined as the orthogonal component of the normal vector of the symmetry plane with respect to the z_vcs-axis. The y_vcs-axis is defined as the cross-product of the z_vcs- and the x_vcs-axis. By using the derived object relations expressed in the form of VCSs, the vertebra models can be iteratively positioned. Starting from one adapted vertebra model, neighboring models can be positioned in the data set by applying the transformation between corresponding vertebrae obtained from the vertebral column model. From this initial position, the models are automatically adapted, again using [4], and their patient-specific VCSs are calculated. An iterative repetition of this process provides a segmentation of the portion of the vertebral column shown in the scan.
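The VCS construction follows directly from the three fitted representations. A minimal sketch, with our own function and parameter names, is:

```python
import numpy as np

def vertebra_coordinate_system(q, u, p0, n_mid, n_sym):
    """Local VCS from the three representations.
    q, u   : point and direction of the cylinder axis (vertebral foramen fit)
    p0     : a point on the vertebral body middle plane, with normal n_mid
    n_sym  : normal of the sagittal symmetry plane
    Returns (origin, x, y, z) of the VCS."""
    t = np.dot(n_mid, p0 - q) / np.dot(n_mid, u)
    origin = q + t * u                       # axis / middle-plane intersection
    z = n_mid / np.linalg.norm(n_mid)        # z_vcs: middle-plane normal
    x = n_sym - np.dot(n_sym, z) * z         # component of n_sym orthogonal to z
    x = x / np.linalg.norm(x)                # x_vcs
    y = np.cross(z, x)                       # y_vcs = z x x
    return origin, x, y, z
```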
Fig. 3. Different vertebra representations. An example for a cylinder fit to the vertebral foramen is given in (a). Figure (b) shows the middle plane of the vertebral body upper and lower surfaces. In each case, the corresponding automatically detected triangles are shown as filled. The symmetry plane is shown in (c).
2.4 Mean Model Building
For mean model generation, we adapted the initial model to a sample of 29 CT data sets showing different portions of the rib cage and the vertebral column. In 17 data sets, all being chest CT scans, the model was positioned with the approach described in Sect. 2.2 and adapted using [4]. In the other twelve data sets, being whole spine, head-neck, thoracic, and lumbar scans, only the vertebrae were segmented using the iterative positioning from Sect. 2.3. In each case, the automatic adaptation was inspected by the first author and local misadaptations were manually corrected. A clinical expert verified the results. By adapting the same particular surface models to different patient data sets, point correspondences between the models were preserved. Usually, mean model building of individual objects is based on initially registering all corresponding meshes to one reference by a single global transformation
200
T. Klinder et al.
and then averaging all vertex positions. However, in the case of a constellation of objects such as the osseous thorax, not only the mean shape of the individual objects but also their mean location relative to the others has to be found. For that reason, we apply an iterative registration and averaging process. We start with the selection of one reference vertebra, register all corresponding vertebrae from other samples to that chosen reference using a rigid transformation (6 degrees of freedom), and finally average vertex positions. Starting from the calculated mean vertebra model, we register corresponding vertebrae to the mean shape and apply the transformation first to the upper neighboring vertebrae. By averaging again the vertex positions of the transformed neighboring vertebrae, we obtain the corresponding mean shapes and also their mean location relative to the starting mean vertebra model. An iterative execution of this process provides the upper part of the mean vertebral column. In the same manner, the lower part can be obtained. One advantage of this procedure is the ability to cope with sample data containing different portions of the vertebral column. In the case of the ribs, the corresponding neighboring vertebrae are registered with the mean vertebra models. The obtained transformation is then applied to the rib models, and finally vertex positions are again averaged.
3 Results
For adaptation of the mean rib cage model to unseen data, we followed the two different approaches for model positioning from Sect. 2.2 and Sect. 2.3. Both approaches were performed on 18 chest CT data sets that were part of the ensemble of 29 data sets. In order to simulate model adaptation to unseen data, the mean rib cage model was generated by leaving out, in each case, the data set under consideration. All 18 data sets have a resolution of 0.85-0.97 mm in the x- and y-directions and 2.5 mm in the z-direction. The results of the automatic adaptation were inspected by the first author and manually corrected where necessary. Again, a physician verified the corrections. In each case, the mean and maximal distances between adapted and corrected rib and vertebra models were calculated for all meshes of an entire data set and afterwards averaged, resulting in mean and maximal distance values, dmean and dmax. As distance measure, we calculated the Euclidean distance between corresponding vertices. Automatic rib detection and subsequent extraction of rib centerlines following Sect. 2.2 was successful in 17 data sets. In the remaining case, the ray-search-based approach could only find very few candidates, so that the centerline extraction failed and the rib cage model could not be positioned. In 16 out of the remaining 17 cases the subsequent identification succeeded; one data set did not show a significant minimum after ICP registration. After finding an initial position, all particular surface models were adapted individually to their corresponding anatomical structures using [4]. The complete procedure, including detection, identification, positioning, and adaptation, took less than 5 min on a 2.16 GHz workstation. The results of the segmentation are shown in Table 1. With a
Table 1. Summary of model adaptation to 16 data sets

              vertebra adaptation        rib adaptation
              min    mean   max          min    mean   max
dmean [mm]    0.30   1.27   1.8          0.21   0.36   0.92
dmax [mm]     4.35   6.27   9.41         3.31   7.01   11.43
mean distance error over all data sets of 1.27 mm for the vertebrae and 0.36 mm for the ribs, we achieved an adequate level of accuracy on average. However, local misadaptations, e.g., at the vertebra-rib articulation, cause significant local maximal errors. In one case, most parts of the model adapted to neighboring structures, caused by an imprecise object detection. In order to give a visual impression, Fig. 4 (a) shows the segmentation result in one image slice of one arbitrarily chosen data set, and Fig. 4 (b) the corresponding adapted model. Compared to the positioning and adaptation of the complete model, we obtained similar results for the segmentation of the vertebral column using the iterative adaptation from Sect. 2.3. In each case, we chose the lowest vertebra shown in the data set as the start vertebra. The mean values for dmean and dmax were 1.12 and 5.56 mm. However, in one data set the approach failed completely: due to a vertebra fracture, one vertebra could not be segmented correctly, so that the positioning of all subsequent models was incorrect, resulting in misadaptation.
Fig. 4. Adapted mean model to unseen image data. Figure (a) shows the result in one image slice. Color coding illustrates labeling (see electronic version). Each label corresponds to the correct anatomical object. The arrow points to a local misadaptation. The adapted model is shown in (b).
4 Conclusion
Model-based segmentation requires careful positioning. Especially in the case of the rib cage, with its number of similar neighboring structures, misadaptation to neighboring structures is a crucial point. However, with the developed
positioning, the individual models adapted in almost all cases correctly to their corresponding image objects. In 16 out of 18 data sets, detection, labeling, and segmentation succeeded with a mean distance between the true and the detected object surface of 1.27 mm in case of the vertebrae and 0.36 mm in case of the ribs. An alternative approach for spine segmentation was performed that uses object relations derived from the mean model. This approach was successful in 17 out of 18 cases with a mean distance of 1.12 mm. We believe that we achieved an acceptable level of accuracy for some applications. If a higher accuracy is needed, the results may serve as a good basis for a locally detailed delineation.
References

1. Shen, H., Liang, L., Shao, M., Qing, S.: Tracing Based Segmentation for the Labeling of Individual Rib Structures in Chest CT Volume Data. In: Barillot, C., Haynor, D.R., Hellier, P. (eds.) MICCAI 2004. LNCS, vol. 3217, pp. 967–974. Springer, Heidelberg (2004)
2. Staal, J., van Ginneken, B., Viergever, M.A.: Automatic rib segmentation and labeling in computed tomography scans using a general framework for detection, recognition and segmentation of objects in volumetric data. Medical Image Analysis 11, 35–46 (2006)
3. Lorenz, C., von Berg, J.: A comprehensive shape model of the heart. Medical Image Analysis 10, 657–670 (2006)
4. Weese, J., Kaus, M., Lorenz, C., Lobregt, S., Truyen, R., Pekar, V.: Shape constrained deformable models for 3D medical image segmentation. In: Insana, M.F., Leahy, R.M. (eds.) IPMI 2001. LNCS, vol. 2082, pp. 380–387. Springer, Heidelberg (2001)
5. Fitzgibbon, A.W., Pilu, M., Fisher, R.B.: Direct Least Squares Fitting of Ellipses. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(5), 476–480 (1999)
6. Lorenz, C., von Berg, J.: Fast automated object detection by recursive casting of search rays. Computer Assisted Radiology and Surgery, 230–235 (2005)
7. Besl, P.J., McKay, N.D.: A Method for Registration of 3-D Shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 14(2), 239–256 (1992)
8. Donato, G., Belongie, S.: Approximate Thin Plate Spline Mappings. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2352, pp. 21–31. Springer, Heidelberg (2002)
Efficient Selection of the Most Similar Image in a Database for Critical Structures Segmentation

Olivier Commowick¹,² and Grégoire Malandain¹

¹ INRIA Sophia Antipolis - Asclepios Team, 2004 Rte des Lucioles, BP 93, 06902 Sophia Antipolis Cedex, France
[email protected]
² DOSIsoft S.A., 45-47 Avenue Carnot, 94230 Cachan, France
Abstract. Radiotherapy planning needs accurate delineations of the critical structures. Atlas-based segmentation has been shown to be very efficient for delineating brain structures [1]. However, the construction of an atlas from a dataset of images [2], particularly for the head and neck region, is very difficult due to the high variability of the images and can generate over-segmented structures in the atlas. To overcome this drawback, we present in this paper an alternative method that selects as a template the image in a database that is the most similar to the patient to be segmented. This similarity is based on a distance between transformations. A major contribution is that we do not compute every patient-to-sample registration to find the most similar template, but only the registration of the patient towards an average image. The method is therefore computationally very efficient. We present a qualitative and quantitative comparison between the proposed method and a classical atlas-based segmentation method. This evaluation is performed on a subset of 45 patients using a Leave-One-Out method and shows a great improvement in the specificity of the results.
1 Introduction
Conformal radiotherapy requires very accurate planning in order to optimize the irradiation dose on the tumor while controlling the doses delivered to critical structures. This task requires their precise delineation. However, it is still performed manually and is therefore time consuming and tedious. Recently, the creation and use of an anatomical atlas (an image of the anatomy, typically MRI or CT, associated with its segmentation) has been explored to delineate brain critical structures [1]. This atlas-based segmentation is decomposed into two steps: first, the patient image is registered on the atlas image; then, the transformation is applied to the atlas structures to obtain the segmentation. This approach has shown good results on the brain. It would therefore be of great interest to use a similar approach on the head and neck region, where 7% of cancers arise.

Two main approaches can be used to build an anatomical atlas. The first one (i) consists in using a single image delineated by an expert. For example, it is possible to use a symmetric atlas built from the BrainWEB¹ and manually delineated, as presented in [1]. The second method (ii) is to build an average image and its segmentation from a dataset of manually delineated images. We chose this second solution (ii) in [2] to build an atlas of the head and neck region from images delineated following the guidelines provided in [3]. However, anatomical variability is very high in this region, particularly in the spinal cord and lymph node areas. The generated mean contours can thus be too large in the atlas, yielding over-segmentations. The atlas can also be very different from the patient to segment. Registration discrepancies can therefore appear, leading to segmentation errors. To overcome these drawbacks, a previous work in the literature [4] presented an interesting approach that clusters the database of delineated images to build several atlases representing homogeneous sub-populations. However, the selection of the most adequate atlas for a given patient is not addressed. Moreover, a large number of samples is required in the database to build meaningful clusters.

In this paper, we aim at segmenting the critical structures in a patient image P using a database of manually delineated images. Since our database has too few samples to build meaningful clusters, we propose to find the database image most similar to the patient image to be segmented, and to register it non-linearly on this image. The key point is then to efficiently select from the database the most similar image to a given patient image, up to an affine transformation. This most similar sample is defined as the one that needs the smallest local deformations to be registered on the patient image to be segmented.

The remainder of the article is organized as follows. We first present our approach to select the image that is most similar to the patient to delineate. This method requires building a mean atlas from a database of delineated images. We then focus on the method used to build an average image and its average segmentations from the database. Finally, we show qualitative and quantitative results on a database of 45 head and neck CT images, using a Leave-One-Out evaluation methodology.
2 Method
2.1 Selection of the Most Similar Image
In this paper, we aim at selecting the most similar image in the database and using it as the atlas. As mentioned above, the most similar image is defined as the one that is "least" deformed when non-linearly registered on the patient. The selection is then based on a comparison of the non-linear transformations $T_{P \to I_k}$ bringing $P$ onto each image $I_k$, i.e., the most similar image is defined as:

$$ \tilde{I} = \arg\min_{I_k} d(I_k, P) = \arg\min_{I_k} \| T_{P \to I_k} - \mathrm{Id} \| \qquad (1) $$

¹ http://www.bic.mni.mcgill.ca/brainweb/
However, this comparison is computationally very expensive, as it requires performing all the registrations between P and the images $I_k$. To reduce this computation time, we suppose that we are using an average image M built from the images of the database (see Section 2.2). From the atlas construction, we indeed obtain for each image $I_k$ a non-linear transformation $T_{M \to I_k}$ bringing it onto the average image. When registering P on M, we also obtain a non-linear transformation $T_{M \to P}$. The key hypothesis of our work is then to assume that $T_{P \to I_k}$ can be approximated by $T_{P \to I_k} \approx T_{M \to I_k} \circ T_{M \to P}^{-1}$. Using our hypothesis, the similarity between P and $I_k$ can then be evaluated, up to an affine transformation, using the following equation:

$$ d(I_k, P) = \| T_{M \to I_k} \circ T_{M \to P}^{-1} - \mathrm{Id} \| = \sum_i \| T_{M \to I_k} \circ T_{M \to P}^{-1}(i) - \mathrm{Id} \| \qquad (2) $$

where $i$ ranges over the voxels of the dense transformation. As $T_{M \to I_k} \circ T_{M \to P}^{-1}$ is close to the identity, the use of the Euclidean norm is reasonable. This measure could also be computed using a Log-Euclidean distance [5]. From the average image construction, the non-linear transformations $T_{M \to I_k}$ have already been computed. Using our hypothesis, we now only need to perform one non-linear registration between M and P to select the most similar image to the patient to delineate. Finally, the patient P is delineated by registering it on $\tilde{I}$ and applying the transformation to the structures of $\tilde{I}$.
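To make the selection step concrete, here is a minimal sketch in Python/NumPy of how Eq. (2) could be evaluated once the dense fields are available. The displacement-field representation, the nearest-neighbor composition and all function names are our assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def compose_displacements(disp_a, disp_b):
    """Displacement field of the composition a o b, i.e. x -> a(b(x)).

    disp_a, disp_b: (X, Y, Z, 3) displacement fields in voxel units.
    disp_a is sampled at x + disp_b(x) with nearest-neighbor lookup for
    brevity; trilinear interpolation would be more accurate.
    """
    shape = disp_b.shape[:3]
    grid = np.stack(np.meshgrid(*map(np.arange, shape), indexing="ij"), axis=-1)
    warped = np.rint(grid + disp_b).astype(int)
    for d, n in enumerate(shape):                    # clamp to the volume
        warped[..., d] = np.clip(warped[..., d], 0, n - 1)
    sampled = disp_a[warped[..., 0], warped[..., 1], warped[..., 2]]
    return sampled + disp_b

def distance_to_identity(disp):
    """||T - Id|| of Eq. (2): displacement magnitudes summed over voxels."""
    return np.linalg.norm(disp.reshape(-1, 3), axis=1).sum()

def most_similar_index(disps_M_to_Ik, inv_disp_M_to_P):
    """Index k minimizing Eq. (2), given the precomputed fields T_{M->Ik}
    (one per database image) and the inverted field T_{M->P}^{-1}."""
    costs = [distance_to_identity(compose_displacements(d, inv_disp_M_to_P))
             for d in disps_M_to_Ik]
    return int(np.argmin(costs))
```

Since the fields $T_{M \to I_k}$ are byproducts of the atlas construction, the per-patient cost reduces to a single non-linear registration plus these inexpensive compositions.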
2.2 Construction of an Average Image
To select the most similar image, we need to build an atlas from the database of images. Many methods have been explored to perform this task. Guimond et al. [6] introduced a framework to create an unbiased mean image from a database of patients. Lorenzen et al. [7] improved this framework to cope with large deformations. De Craene et al. [8] proposed a coupled estimation of the average segmentations and the average image. Finally, Grabner et al. [9] presented an interesting method to compute an average symmetric image directly.

We use in this paper a method that was already presented in [2]. As in that paper, we did not use the method of [8] because of the variability of the manual segmentations in the dataset. The construction is composed of three main steps. First, we build an average image using the method developed in [6], which has the advantage of being faster and simpler than the method proposed in [7]. This first step produces for each image $I_k$ a transformation $\tilde{T}_{M \to I_k}$ bringing it onto the average image. Then, the atlas image is symmetrized in two steps: the method of Prima et al. [10] is first used to find the inter-hemispheric plane in the average image, which is then symmetrized with respect to this plane. In addition, as in [2], we compute average structures from the manual delineations. These are first brought onto the average image using the $\tilde{T}_{M \to I_k}$ transformations. Then, we use the STAPLE algorithm [11] in its multi-label implementation to compute non-overlapping most probable average segmentations. These delineations are then symmetrized in the same way as the average image.
3 Evaluation Methodology
To evaluate our method, we have chosen to use a Leave-One-Out methodology. It consists in leaving one patient out of the dataset of delineated images. The atlas is then built from the remaining images using the method presented in Section 2.2. Two automatic segmentation methods were compared. First, the classical atlas-based segmentation method was used to segment the left-out image automatically: this image is registered on the average atlas and the transformation is applied to the structures to get the segmentation. Then, we used our most-similar-image method, i.e., the most similar image among the remaining samples of the database is registered on the left-out patient, using the same registration algorithms and parameters, to get its segmentation. The results obtained by each of these methods can then be compared to the manual delineations of the left-out patient using quantitative measures. In this paper, we have chosen to compute the sensitivity and specificity measures to evaluate the quality of the automatic segmentation methods.

The evaluation has been performed by following this Leave-One-Out methodology on a database of 45 CT images of patients delineated for head and neck radiotherapy following the guidelines provided in [3]. As they were delineated for radiotherapy, some structures were not delineated or were fused with other structures, and are therefore not available for some patients. We have consequently used for the Leave-One-Out evaluation a subset of only 12 patients that were completely delineated manually.
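For reference, the two quality measures can be computed from binary masks as in the following sketch (Python/NumPy; this is our own formulation, as the paper does not detail the computation, and the specificity value in particular depends on the bounding region chosen for the true negatives):

```python
import numpy as np

def sensitivity_specificity(auto_mask, manual_mask):
    """Voxelwise sensitivity and specificity of an automatic segmentation.

    auto_mask, manual_mask: boolean arrays of identical shape, True inside
    the structure. Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP).
    """
    tp = np.logical_and(auto_mask, manual_mask).sum()
    tn = np.logical_and(~auto_mask, ~manual_mask).sum()
    fp = np.logical_and(auto_mask, ~manual_mask).sum()
    fn = np.logical_and(~auto_mask, manual_mask).sum()
    return tp / (tp + fn), tn / (tn + fp)
```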
4 Results
In this section, we first focus on qualitative results of the most similar image selection and of the segmentation of one patient left out of the dataset. Then, we present quantitative results using the evaluation method presented above.

4.1 Qualitative Results
We first show in Fig. 1 a qualitative view of the image selected as the most similar, compared to the atlas and to the patient to segment. Our method is designed to select the most similar image up to a global affine transformation, and we therefore present in this figure all the images registered on the patient to segment using a global affine transformation. First, the patient and the atlas are very different: there is an important difference between their morphologies. Such differences often lead to registration discrepancies, which can cause segmentation errors. On the contrary, the most similar image is much closer to the patient than the atlas and should therefore be much easier to register. The second part of our qualitative validation was to compare the results of classical atlas-based segmentation and of our most-similar-image-based delineation. We therefore show in Fig. 2 coronal and axial views of the segmentation of another patient left out of the dataset.
Fig. 1. Example of selected most similar image. Comparison between the atlas, most similar image and the patient. These images are affinely registered on the patient (see text). (a), (d): Slices of the patient. (b), (e): Corresponding atlas slices. (c), (f): Most similar image slices. Upper line: Coronal slices. Bottom line: Axial slices.
We first see in this figure (arrows on (b)) that the results obtained from the atlas are over-segmented when compared to the manual segmentations. This is due, as mentioned in the introduction, to the variability between contours when creating the atlas structures. This variability indeed results in over-segmented mean structures in the atlas itself. In the most-similar-image-based segmentation, the results are no longer over-segmented, as the atlas is now made from one single image, and are much closer to the real structures, particularly for the lymph nodes and parotids. However, some regions such as the lymph node areas IV (arrows at the bottom of the images) and the right parotid (arrows at the upper left of the images) still exhibit some errors. These are mainly due to the intra-expert, inter-patient variability in the manual delineation of the structures. We show in Fig. 3 an example illustrating these segmentation differences on the images after a global affine registration. This figure clearly shows that the manual delineations of the most similar image used for Fig. 2 are different from those of the patient to segment. First, the right parotid (arrows on axial and coronal views) incorporates a small region near the spinal column in the most similar image but not in the patient to segment. Moreover, the lymph node areas (arrows on the coronal views) also incorporate extra regions at the bottom, which can be found again in images (c) and (f) of Fig. 2. Finally, there is also a smaller variability in the spinal cord delineation.
Fig. 2. Qualitative atlas and most similar image based segmentation results. (a), (d): Manual segmentation of the patient. (b), (e): Atlas-based segmentation of the patient. (c), (f): Segmentation using the most similar image in the database. Upper line: Coronal slices. Bottom line: Axial slices.
Fig. 3. Contours variability between the dataset images. Example showing views of the manual delineations on the patient presented in Fig. 2 (a), (c) compared to the manual contours on the most similar sample (b), (d). These images are affinely registered on the patient image (see text).
4.2 Quantitative Evaluation
We have seen above that the qualitative results were good, even if there were some errors due to contour variability in the database. We now present the quantitative evaluation described in Section 3 over a subset of 12 fully delineated patients. We show in Table 1 the mean quantitative measures computed over the structures of the 12 patients. We have added in this table a third measure, called distance, which corresponds to the distance to the best achievable sensitivity and specificity, i.e., d = (1 − Sens., 1 − Spec.).
Table 1. Mean quantitative measures using the two different segmentation methods. Columns stand for the quality measures. Distance corresponds to the distance to the best achievable measure (Sensitivity = 1, Specificity = 1). The rows stand for the method used to delineate the images (see text).

Method                      Sensitivity   Specificity   Distance
Atlas-based segmentation    0.827         0.684         0.389
Most similar image          0.675         0.849         0.380
This last measure is computed to give a simplified idea of the overall quality of the result. The table first shows a great improvement in specificity when using our method, which confirms our qualitative observations in the preceding results. However, the sensitivity is lower than that of the atlas-based segmentation. This is again due to the high variability of the contours in the dataset, as we have seen in Fig. 3. Nevertheless, the results are still very promising, and the global measure for the most-similar-image-based segmentation is slightly better than the one obtained with the atlas.
5 Conclusion
We have presented in this paper a method to select, up to a global affine transformation, the most similar image to the patient to be delineated. This method is based on a distance between the transformations used to build the atlas and the transformation computed to register the patient on the atlas. As the atlas is computed once and for all patients to be delineated, the method is very easy to implement and very efficient: it only requires one additional registration compared to classical atlas-based segmentation.

This method has been validated on a subset of 12 patients from a database of 45 patients using a Leave-One-Out method. The obtained segmentations are no longer over-segmented. However, we have seen through our experiments that the results are corrupted by an important intra-expert, inter-patient variability of the manual segmentations. One way to cope with this problem would be to use a database with repeated segmentations made by several experts. This would help both to quantify this delineation variability (inter- and intra-expert) and to obtain better results. This variability also explains why we obtain contours that are too large when building the atlas, as mentioned in the introduction. The coupling could therefore also be done when creating the atlas: the contours and their variability could then be introduced in the building process to better constrain the atlas creation. Previous work in this direction includes [8], where the binary structures were included in the atlas formation process. To handle the variability of contours, the use of several atlases representing sub-populations as in [4] could also be very interesting. An important perspective of this work is then to use an extension of our approach to efficiently find
sub-populations in the database. The selection of the most similar average atlas would then also be solved using our selection method. Finally, more validation needs to be added to this work. The quantitative results shown here are indeed computed only on the structures present both in the most similar image and in the left-out patient. As the structures of the dataset were delineated for radiotherapy, some structures are missing, and the validation is therefore not done on all structures. More validation on more structures, and on images acquired in clinical conditions in different centers, would be a major step towards clinical validation and use.
Acknowledgments. This work was partially funded by the ECIP project MAESTRO (IP CE503564) and ANRT. The authors are grateful to Prof. V. Grégoire for providing his expertise, the image database and the manual delineations.
References

1. Bondiau, P.Y., Malandain, G., et al.: Atlas-based automatic segmentation of MR images: validation study on the brainstem in radiotherapy context. Int. J. Radiat. Oncol. Biol. Phys. 61(1), 289–298 (2005)
2. Commowick, O., Malandain, G.: Evaluation of atlas construction strategies in the context of radiotherapy planning. In: SA2PM Workshop (From Statistical Atlases to Personalized Models), held in conjunction with MICCAI 2006 (2006)
3. Grégoire, V., Levendag, P., et al.: CT-based delineation of lymph node levels and related CTVs in the node-negative neck: DAHANCA, EORTC, GORTEC, NCIC, RTOG consensus guidelines. Radiotherapy Oncology 69(3), 227–236 (2003)
4. Blezek, D.J., Miller, J.V.: Atlas stratification. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 712–719. Springer, Heidelberg (2006)
5. Arsigny, V., Commowick, O., Pennec, X., Ayache, N.: A Log-Euclidean framework for statistics on diffeomorphisms. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006 (I). LNCS, vol. 4190. Springer, Heidelberg (2006)
6. Guimond, A., Meunier, J., Thirion, J.P.: Average brain models: A convergence study. Computer Vision and Image Understanding 77(2), 192–210 (2000)
7. Lorenzen, P., Davis, B., Joshi, S.C.: Unbiased atlas formation via large deformations metric mapping. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3750, pp. 411–418. Springer, Heidelberg (2005)
8. De Craene, M., du Bois d'Aische, A., Macq, B., Warfield, S.: Multi-subject registration for unbiased statistical atlas construction. In: Barillot, C., Haynor, D.R., Hellier, P. (eds.) MICCAI 2004. LNCS, vol. 3216, pp. 655–662. Springer, Heidelberg (2004)
9. Grabner, G., Janke, A.L., Budge, M.M., Smith, D., Pruessner, J., Collins, D.L.: Symmetric atlasing and model based segmentation: An application to the hippocampus in older adults. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006 (II). LNCS, vol. 4191, pp. 58–66. Springer, Heidelberg (2006)
10. Prima, S., Ourselin, S., Ayache, N.: Computation of the mid-sagittal plane in 3D brain images. IEEE Transactions on Medical Imaging 21(2), 122–138 (2002)
11. Warfield, S.K., Zou, K.H., Wells, W.M.: Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE Transactions on Medical Imaging 23(7), 903–921 (2004)
Unbiased White Matter Atlas Construction Using Diffusion Tensor Images

Hui Zhang¹, Paul A. Yushkevich¹, Daniel Rueckert², and James C. Gee¹

¹ Penn Image Computing and Science Laboratory (PICSL), Department of Radiology, University of Pennsylvania, Philadelphia, USA
² Department of Computing, Imperial College, London, UK
Abstract. This paper describes an algorithm for unbiased construction of white matter (WM) atlases using the full information available in diffusion tensor (DT) images. The key component of the proposed algorithm is a novel DT image registration method that leverages metrics comparing tensors as a whole and optimizes tensor orientation explicitly. The problem of unbiased atlas construction is formulated using the approach proposed by Joshi et al., i.e., the unbiased WM atlas is determined by finding the mappings that best match the atlas to the images in the population while having the least amount of deformation. We show how the proposed registration algorithm can be adapted to approximately find the optimal atlas. The utility of the proposed approach is demonstrated by constructing a WM atlas of 13 subjects. The presented DT registration method is also compared to the approach of matching DT images by aligning their fractional anisotropy images using large-deformation image registration methods. Our results suggest that using full tensor information can better align the orientations of WM fiber bundles.
1 Introduction
Diffusion tensor imaging (DTI) is a unique imaging technique that probes microscopic tissue properties by measuring the local diffusion of water molecules [3]. Its demonstrated ability to depict in vivo the intricate architecture of white matter (WM) [4] has made it an invaluable tool for furthering our understanding of WM both in normal populations and in populations with brain disorders. Wakana et al. [5] provided a powerful illustration of this new insight into WM by building an annotated WM atlas delineated via semi-automatic tractography-based segmentation on a single-subject anatomy.

From the perspective of computational neuroanatomy, DTI also offers an exciting opportunity for the construction of an unbiased WM atlas, i.e., a standard coordinate system not biased by the anatomy of a particular subject but representing the complex variability within the whole population of interest. Such an atlas can serve as a deformable template, which enables detailed atlas information to be mapped to individual subject spaces [6]. Another important application of such an atlas is in identifying as well as localizing WM differences across populations using DTI. In this scenario, individual images are mapped
to the stereotactic space defined by the atlas, thereby removing shape differences among the individuals. On the one hand, any change in the microstructural tissue properties of shared anatomies, e.g., the mean rate of diffusion or the diffusion anisotropy, can then be examined. On the other hand, the shape differences or variabilities encoded in the transformations that define the mapping between the individuals and the atlas are essential for the understanding of volumetric changes in WM structures.

The key element in the construction of such a WM atlas is an effective image registration algorithm that establishes an accurate mapping of common WM structures between images. Goodlett et al. [7] demonstrated the advantage of using large-deformation registration over affine registration for this purpose. The authors aligned WM structures by registering scalar images derived from the fractional anisotropy (FA) images, which were in turn derived from the DT images. In this paper we describe the construction of such a WM atlas using a novel DT registration algorithm. Compared to [7], the registration algorithm proposed here takes full advantage of the relevant information encoded in DT images, particularly the tensor orientation, thus enabling a more faithful alignment of different WM tracts, as first demonstrated in [8].

The rest of the paper is organized as follows. Sec. 2 describes the proposed DT registration algorithm, while Sec. 3 gives details of its application to WM atlas construction. In Sec. 4, preliminary results of applying the proposed procedure to a large database are presented, and we report a quantitative analysis of the behavior and performance of the proposed DT registration algorithm. A comparison of aligning DT images using the proposed registration algorithm to the approach of aligning their FA images using large-deformation registration methods suggests that using full tensor information can better align the orientations of WM fiber bundles. Future directions are discussed in Sec. 5.
2 Diffusion Tensor Image Registration
The DT image registration algorithm proposed here is an extension of the deformable DT image registration method that we recently proposed in [1]. The algorithm described in [1] models transformations as piecewise affine and leverages full tensor-based similarity metrics while optimizing tensor orientation explicitly. In addition, the derivatives of the registration cost function are analytic, enabling both fast and accurate derivative-based optimization. However, the algorithm is, by design, most accurate when the deformation is not large, and can thus produce less optimal results when the deformation becomes large. The extension proposed here aims to address this limitation of the algorithm without forgoing its advantages. Below we first summarize the algorithm in [1] and then describe our specific enhancements.

2.1 Piecewise Affine Algorithm
The algorithm in [1] approximates smooth transformations using a dense piecewise affine parametrization which is sufficient when the required deformations
are not large. The dense piecewise affine parametrization effectively divides the template space into uniform regions and parametrizes the transformation within each region by an affine transformation. The registration cost function consists of an image matching term (the likelihood) and a transformation smoothness term (the prior). The image term is the summation of the region-wise tensor image difference determined via a particular choice of tensor metric $\|\cdot\|$. For a particular region $\Omega$, the linear part of the associated affine transformation is parametrized using the matrix polar decomposition, such that $x \mapsto (QS)x + T$, where $Q$ is a special orthogonal matrix representing the pure rotation, $S$ is a symmetric positive definite matrix representing the pure deformation, and $T$ is the translation vector. Using the finite strain tensor reorientation [9], the image term is formulated as

$$ \phi(p) = \int_{\Omega} \| I_s((QS)x + T) - Q I_t(x) Q^T \|^2 \, dx \,, \qquad (1) $$
where $I_t$ and $I_s$ are the template (fixed) and subject (moving) DT images respectively, and $p$ denotes the 12 affine parameters. The overall smoothness of the piecewise affine transformation is regularized by penalizing the discontinuities across region boundaries. For two adjacent regions $\Omega_i$ and $\Omega_j$ with associated affine transformations $F_i$ and $F_j$ respectively, the discontinuity is formulated as

$$ \int_{\Omega_i \cap \Omega_j} \| F_i(x) - F_j(x) \| \, dx \,, \qquad (2) $$

where $\|\cdot\|$ denotes the vector norm. The dense piecewise affine approximation to the underlying transformation is estimated and refined in a hierarchical framework, beginning with a coarse subdivision of the template space and continuing with finer subdivisions. The transformation estimated at the finest level is interpolated using the standard approach [10] to generate a smooth warp field, which is then used to deform the subject into the space of the template with the PPD reorientation strategy [9].

Although the algorithm is applicable to general tensor similarity measures, we chose to use the tensor metric, first described in [1], that measures the L2 distance between the anisotropic parts of the apparent diffusion profiles associated with the DTs. Under this metric, the distance between two DTs $D_1$ and $D_2$ is equal to

$$ \frac{8\pi}{15} \left( \| D_1 - D_2 \|_C^2 - \frac{1}{3} \mathrm{Tr}^2(D_1 - D_2) \right) \,, \qquad (3) $$

where $\| D_1 - D_2 \|_C$ is the Euclidean distance between the two tensors, equal to $\sqrt{\mathrm{Tr}((D_1 - D_2)^2)}$. This choice is consistent with the seminal observations by Pierpaoli et al. in [4] that in the human brain the isotropic part of diffusion takes similar values in grey and white matter regions. Hence, a metric focusing on identifying differences in anisotropic diffusion can be more effective by ignoring differences in isotropic diffusion that are likely a result of noise or partial volume contamination in the data.
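As a concrete illustration, Eq. (3) can be evaluated for a pair of tensors as in the sketch below (Python/NumPy; the function name and input conventions are ours):

```python
import numpy as np

def anisotropic_tensor_distance(d1, d2):
    """Eq. (3): (8*pi/15) * (||D1 - D2||_C^2 - Tr^2(D1 - D2) / 3),
    the squared L2 distance between the anisotropic parts of the apparent
    diffusion profiles. d1, d2: (3, 3) symmetric diffusion tensors.
    """
    diff = d1 - d2
    frob_sq = np.trace(diff @ diff)      # ||D1 - D2||_C^2 = Tr((D1 - D2)^2)
    return (8.0 * np.pi / 15.0) * (frob_sq - np.trace(diff) ** 2 / 3.0)
```

Subtracting the squared-trace term removes the isotropic component of the tensor difference, which is exactly the property that motivates this metric.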
2.2 Enhancements to the Piecewise Affine Algorithm
The proposed extension handles larger deformations both by enhancing the piecewise affine algorithm itself and by iteratively composing smaller incremental deformations estimated using the enhanced piecewise affine algorithm.

The enhancement to the piecewise affine algorithm aims to ensure that the resulting deformation fields have physically meaningful Jacobian (matrix) determinant values, i.e., positive and not close to singular. The affine parametrization proposed in [1] parametrizes the pure deformation $S$ using its six linearly independent components. This parametrization does not prevent $S$ from having a negative determinant, and this undesirable scenario did occur in practice in our experimentation. We propose to instead parametrize the pure deformation $S$ using its Cholesky decomposition, i.e., $S = LL^T$, where $L$ is a lower triangular matrix. In this scheme, $S$ is parametrized by the six non-zero elements of $L$ and is guaranteed to be positive semidefinite. To penalize Jacobian determinants close to singular, we add a Jacobian prior term adopted from [11]:

$$ \mathrm{Tr}(S^2 + S^{-2} - 2I) = \sum_{i=1}^{3} \left( s_i^2 + s_i^{-2} - 2 \right) \,, \qquad (4) $$
where $s_i$ is the $i$-th eigenvalue of $S$. Observe that this prior term is zero when $S$ is the identity matrix and increases when any of the eigenvalues $s_i$ deviates from 1.

Our second strategy for better handling large deformations is to use the enhanced piecewise affine algorithm to incrementally estimate the underlying true deformation. Given $N$ successively determined incremental deformations $\{P_i\}_{i=1}^N$, the final deformation is determined by their composition, i.e., $x \mapsto P_1 \circ P_2 \circ \dots \circ P_{N-1} \circ P_N(x)$. During each incremental estimation, we set the weightings of both prior terms to stringent values such that large deformation is penalized and the incremental dense piecewise affine transformation stays sufficiently close to the corresponding interpolated smooth deformation. Sufficient values for these weights can be determined empirically for a particular dataset. However, given that DTs take physical values and overall do not differ significantly across datasets, we have found in practice that these weights, once determined for one dataset, worked well for others. The weightings that worked well in practice are 0.08 for the smoothness prior and 0.2 for the Jacobian prior. The number of incremental estimation steps that we found sufficient is between 5 and 6. Each incremental estimate takes about 10 minutes or less on a modern 3.0 GHz Intel Xeon processor, and the total estimation usually takes less than one hour.
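The two ingredients of the enhancement, the Cholesky parametrization of S and the Jacobian prior of Eq. (4), can be sketched as follows (Python/NumPy; an illustration under our own naming conventions, not the authors' code):

```python
import numpy as np

def deformation_from_cholesky(l_params):
    """Pure deformation S = L L^T from the six free entries of the lower
    triangular L, so S is symmetric positive semidefinite by construction."""
    L = np.zeros((3, 3))
    L[np.tril_indices(3)] = l_params   # fills (0,0),(1,0),(1,1),(2,0),(2,1),(2,2)
    return L @ L.T

def jacobian_prior(S):
    """Eq. (4): Tr(S^2 + S^-2 - 2I) = sum_i (s_i^2 + s_i^-2 - 2).
    Zero for S = I and growing as any eigenvalue drifts from 1, so
    near-singular local Jacobians are heavily penalized."""
    s = np.linalg.eigvalsh(S)          # eigenvalues of the symmetric S
    return float(np.sum(s ** 2 + s ** -2.0 - 2.0))
```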
3 Unbiased White Matter Atlas Construction
We formulate the unbiased atlas construction problem according to the approach proposed by Joshi et al. in [2], who stated it as estimating an image that requires the minimum amount of
deformation to map onto every image in the population. In our context, given a population of N DT images $\{I_i\}_{i=1}^N$, the atlas estimation problem can be defined as

$$ \{\hat{H}_i, \hat{I}\} = \arg\min_{H_i, I} \sum_{i=1}^{N} \left( \int_{\mathbb{R}^3} \| I_i \circ H_i(x) - I(x) \|^2 \, dx + D(H_i) \right) \,, \qquad (5) $$

where $H_i$ is the deformation applied to the image $I_i$ and $D(H_i)$ is some appropriate metric quantifying the amount of deformation associated with $H_i$. It can be shown that, under the tensor metric of Eq. (3), the image that minimizes Eq. (5) when the transformations $\{H_i\}_{i=1}^N$ are fixed is simply

$$ \hat{I}(x) = \frac{1}{N} \sum_{i=1}^{N} I_i \circ H_i(x) \,. \qquad (6) $$

This result is completely analogous to the result in [2] for the sum-of-squared-error scalar image similarity metric, which allows us to use an algorithm similar to the iterative greedy method used to minimize Eq. (5) in [2]. Our iterative algorithm is as follows. At iteration m, m ≥ 0, the atlas estimate $\hat{I}^{(m)}$ is computed using Eq. (6) with $H_i = H_i^{(m)} = P_i^{(0)} \circ P_i^{(1)} \circ \dots \circ P_i^{(m)}$, where $P_i^{(m)}$ is the estimated incremental dense piecewise affine transformation for the i-th image at iteration m. When m = 0, the $\{P_i^{(0)}\}_{i=1}^N$ are initialized to the identity transformation and the initial atlas $\hat{I}^{(0)}$ is computed as an average of the original N subject DT images $\{I_i\}_{i=1}^N$. When m ≥ 1, the $\{P_i^{(m)}\}_{i=1}^N$ are estimated by registering the DT images $\{I_i \circ H_i^{(m-1)}\}_{i=1}^N$ to the atlas estimate $\hat{I}^{(m-1)}$ using the piecewise affine algorithm described in Sec. 2. In this implementation, the amount of deformation $D(H_i)$ is approximated via the two prior terms discussed in Sec. 2. For our current implementation, we have found that a sufficient number of incremental iterations for the estimated atlas to converge is around 6. Convergence is checked by evaluating $\| \hat{I}^{(m)} - \hat{I}^{(m-1)} \|$. In practice, this procedure is applied to the subject images after they have been corrected for global size and pose differences using affine registration.
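The iteration can be summarized by the following sketch (Python; `register_incremental`, `compose` and `warp` are placeholders for the piecewise affine registration, transform composition and tensor-reorienting warp of Sec. 2, and the convergence tolerance is our own choice):

```python
import numpy as np

def build_atlas(images, register_incremental, compose, warp,
                max_steps=6, tol=1e-3):
    """Greedy iterative atlas estimation following Eq. (6).

    images: list of DT images as arrays (e.g. shape (X, Y, Z, 3, 3)).
    Returns the atlas and the per-subject transforms H_i.
    """
    H = [None] * len(images)                       # None stands for identity
    atlas = np.mean(np.stack(images), axis=0)      # initial average, Eq. (6)
    for _ in range(max_steps):
        warped = []
        for i, img in enumerate(images):
            cur = img if H[i] is None else warp(img, H[i])
            P = register_incremental(cur, atlas)   # incremental P_i^(m)
            H[i] = P if H[i] is None else compose(H[i], P)
            warped.append(warp(img, H[i]))
        new_atlas = np.mean(np.stack(warped), axis=0)
        # stop once ||I^(m) - I^(m-1)|| is small relative to the atlas
        if np.linalg.norm(new_atlas - atlas) < tol * np.linalg.norm(atlas):
            atlas = new_atlas
            break
        atlas = new_atlas
    return atlas, H
```

The voxelwise tensor average is the minimizer of Eq. (5) under the metric of Eq. (3) when the transforms are held fixed, which is why a plain mean suffices in the update step.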
4 Experiments and Results
We demonstrate the performance of the proposed algorithm by applying it to construct a WM atlas from 13 DT images drawn from a large MR imaging database. MRI was performed on a Philips 3-Tesla system with a maximum gradient strength of 62 mT/m on each independent axis and a slew rate of 100 mT/m/ms on each axis, using a 6-channel phased array head coil. Diffusion-weighted images were acquired with a single-shot echo-planar diffusion-weighted sequence with 15 non-collinear gradient directions at b = 1000 s/mm², with a SENSE factor of 2. The additional imaging parameters are as follows: TR 12000 ms, TE 51 ms, slice thickness 2 mm, field of view 224 mm, matrix 128 × 128, resulting in a voxel size of 1.75 × 1.75 × 2 mm³.
Fig. 1. Comparison of the atlas constructed from affinely registered images (top row) to the atlas constructed from images registered using the proposed algorithm (bottom row). Regions with more pronounced differences are highlighted with arrows. The RGB image encodes the principal diffusion directions: red for left-right, green for anterior-posterior and blue for inferior-superior [12].
We applied the proposed algorithm to the DT images reconstructed from the diffusion-weighted images using standard linear regression [3]. The constructed atlas is shown in Fig. 1 along with the initial atlas built from affinely aligned images. Compared to the initial atlas (top row), the final atlas (bottom row) has considerably sharper edge features as well as much richer detail in the cortical regions.

To demonstrate the behavior and the performance of the enhanced piecewise affine registration algorithm, the algorithm was used to register the affinely registered images to the constructed unbiased atlas with 6 incremental steps. We quantitatively assessed the overall quality of spatial normalization after each incremental step using two voxelwise statistics: the normalized FA standard deviation σ̄_FA and the dyadic coherence κ. Since diffusion anisotropy and the dominant direction of diffusion are the two features that account for most of the variation in WM [4], misalignment that maps different WM structures onto one another should yield large voxelwise variations in one or both of these features. The two voxelwise statistics directly assess these variations and hence can be indicative of misalignment of WM structures. Given a set of DTs sampled at some voxel from the normalized images after a particular incremental step, σ̄_FA is defined as the ratio of the standard deviation to the mean of the FA values of these DTs, and κ [13] takes values that range from 0 to 1 (0 for randomly oriented directions and 1 for identically oriented directions). These statistics were computed for the voxels with FA > 0.2 in the atlas. The resulting statistical maps
from the different incremental steps were compared using their respective empirical cumulative distribution functions (CDFs). The method producing better spatial alignment should result in a larger reduction in σ̄_FA and a larger increase in κ, which in turn is reflected in its σ̄_FA and κ CDFs lying more to the left and to the right, respectively. The performance of the algorithm is compared against both the initial affine alignment and the alignment obtained with a large-deformation scalar registration method, which optimizes a cross-correlation metric under the constraints of a diffeomorphic transformation model in a multi-resolution and symmetric fashion [14]. The large-deformation algorithm is applied to normalize the FA images of the affine-aligned DT images to the FA image of the unbiased DT atlas. The results are shown in Fig. 2. They show that, by taking incremental steps, the proposed algorithm is able to gradually improve the quality of normalization with respect to both σ̄_FA and κ. With respect to σ̄_FA, the proposed algorithm performs almost on par with the large-deformation algorithm registering FA images. With respect to κ, the proposed algorithm performs substantially better than the large-deformation algorithm, reflecting the benefit of aligning DT images using full tensor metrics.
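The two voxelwise statistics can be computed from the normalized images as in this sketch (Python/NumPy; for κ we assume the standard definition of [13], κ = 1 − sqrt((β₂ + β₃)/(2β₁)) with β₁ ≥ β₂ ≥ β₃ the eigenvalues of the mean dyadic tensor of the principal eigenvectors):

```python
import numpy as np

def normalized_fa_std(fa_stack):
    """sigma_bar_FA per voxel: std / mean of FA across the N subjects.
    fa_stack: (N, X, Y, Z) FA values of the normalized images."""
    return fa_stack.std(axis=0) / fa_stack.mean(axis=0)

def dyadic_coherence(e1_stack):
    """kappa per voxel from unit principal eigenvectors, shape (N, X, Y, Z, 3):
    0 for randomly oriented directions, 1 for identically oriented ones."""
    n = e1_stack.shape[0]
    dyads = np.einsum("n...i,n...j->...ij", e1_stack, e1_stack) / n
    b = np.linalg.eigvalsh(dyads)        # ascending: b[..., 2] is beta_1
    return 1.0 - np.sqrt((b[..., 0] + b[..., 1]) / (2.0 * b[..., 2]))
```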
Fig. 2. The empirical CDFs of both σ̄_FA and κ derived from the initial affine-aligned images (Initial), the images aligned using the large-deformation FA registration (FA), and the images aligned using the proposed algorithm at 3 stages (Tensor Step 2, 4 and 6). [Plots omitted: left panel, empirical CDF vs. normalized FA standard deviation; right panel, empirical CDF vs. dyadic coherence.]
5 Discussion
In this paper we have described an algorithm for unbiased WM atlas construction that leverages a novel high-dimensional DT registration algorithm. The strength of the proposed algorithm lies in its ability to optimally align WM structures. The current approach, however, is limited in its ability to accurately assess the amount of deformation. We plan to address this issue by leveraging the recent work by Arsigny et al. [15], which enables the construction of diffeomorphic maps from piecewise affine transformations. The ability to create a diffeomorphic interpolation of the estimated piecewise affine transformations will afford us a
principled approach to estimating the amount of deformation in the metric space of diffeomorphisms.

Acknowledgment. The authors gratefully acknowledge support of this work by the NIH via grants EB006266, NS045839, HD046159, HD042974 and MH068066.
References

1. Zhang, H., Yushkevich, P.A., Alexander, D.C., Gee, J.C.: Deformable registration of diffusion tensor MR images with explicit orientation optimization. MIA 10 (2006)
2. Joshi, S., Davis, B., Jomier, M., Gerig, G.: Unbiased diffeomorphic atlas construction for computational anatomy. NeuroImage 23 (2004)
3. Basser, P.J., Mattiello, J., Bihan, D.L.: Estimation of the effective self-diffusion tensor from the NMR spin echo. JMR 103 (1994)
4. Pierpaoli, C., Jezzard, P., Basser, P.J., Barnett, A., Chiro, G.D.: Diffusion tensor MR imaging of the human brain. Radiology 201 (1996)
5. Wakana, S., Jiang, H., Nagae-Poetscher, L.M., van Zijl, P.C., Mori, S.: Fiber tract-based atlas of human white matter anatomy. Radiology 230 (2004)
6. Grenander, U.: General pattern theory. Oxford Univ. Press, Oxford (1994)
7. Goodlett, C., Davis, B., Jean, R., Gilmore, J., Gerig, G.: Improved correspondence for DTI population studies via unbiased atlas building. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190. Springer, Heidelberg (2006)
8. Park, H.J., Kubicki, M., Shenton, M.E., Guimond, A., McCarley, R.W., Maier, S.E., Kikinis, R., Jolesz, F.A., Westin, C.F.: Spatial normalization of diffusion tensor MRI using multiple channels. NeuroImage 20 (2003)
9. Alexander, D.C., Pierpaoli, C., Basser, P.J., Gee, J.C.: Spatial transformations of diffusion tensor magnetic resonance images. TMI 20 (2001)
10. Little, J.A., Hill, D.L.G., Hawkes, D.J.: Deformations incorporating rigid structures. CVIU 66 (1997)
11. Ashburner, J., Andersson, J.L.R., Friston, K.J.: Image registration using a symmetric prior — in three dimensions. HBM 9 (2000)
12. Pajevic, S., Pierpaoli, C.: Color schemes to represent the orientation of anisotropic tissues from diffusion tensor data: application to white matter fiber tract mapping in the human brain. MRM 42 (1999)
13. Jones, D.K., Griffin, L.D., Alexander, D.C., Catani, M., Horsfield, M.A., Howard, R., Williams, S.C.R.: Spatial normalization and averaging of diffusion tensor MRI data sets. NeuroImage 17 (2002)
14. Avants, B.B., Gee, J.C.: Geodesic estimation for large deformation anatomical shape averaging and interpolation. NeuroImage 23 (2004)
15. Arsigny, V., Commowick, O., Pennec, X., Ayache, N.: A log-euclidean polyaffine framework for locally rigid or affine registration. In: Pluim, J.P.W., Likar, B., Gerritsen, F.A. (eds.) WBIR 2006. LNCS, vol. 4057. Springer, Heidelberg (2006)
Real-Time SPECT and 2D Ultrasound Image Registration

Marek Bucki¹, Fabrice Chassat², Francisco Galdames³, Takeshi Asahi², Daniel Pizarro², and Gabriel Lobo⁴

¹ TIMC Laboratory, UMR CNRS 5225, University Joseph Fourier, 38706 La Tronche, France
[email protected]
² Center of Mathematical Modeling, UMI CNRS 2807, University of Chile, Blanco Encalada 2120, Santiago, Chile
³ Electrical Engineering Department, University of Chile, Tupper 2007, Santiago, Chile
⁴ Nuclear Medicine Service, San-Juan de Dios Hospital, Occidental Metropolitan Health Service, Huerfanos 3255, Santiago, Chile
Abstract. In this paper we present a technique for fully automatic, real-time 3D SPECT (Single Photon Emission Computed Tomography) and 2D ultrasound image registration. We use this technique in the context of kidney lesion diagnosis. Our registration algorithm allows a physician to perform an ultrasound exam after a SPECT image has been acquired and to see the registration of both modalities in real time. An automatic segmentation algorithm has been implemented in order to display in 3D the positions of the acquired US images with respect to the organs.
1 Introduction
Nowadays medical images are classically used to help physicians achieve a more accurate diagnosis. In the literature, Maintz and Viergever [1] have defined basic criteria to classify all types of medical image registration. We focus on 2D/3D intrasubject multimodality registration, which means that we consider two image modalities for the same patient at almost the same moment: functional 3D SPECT and anatomical 2D US images. This choice is motivated by the diagnosis we want to improve and by the fact that these low-cost technologies are available in most of the medical centers in Chile, where this study took place. The system is intended to help the surgeon differentiate acute lesions from scars in urinary tract infections. A discordant result between SPECT and US may occur because: 1) a lesion is seen in one but not in the other image (acute lesions), or 2) the US exploration simply missed a zone. Image registration provides a better correlation between SPECT and US results: the superimposition of both modalities helps overcome this issue by visualizing the data in a common referential.

A great amount of work deals with this registration problem; however, in the case of 3D SPECT and 2D US registration for diagnosis, most of it is not applicable. A major drawback of the technique proposed in [2] is that it only matches a few selected US images with the SPECT volume. The purpose of
our registration system is to locate on a US image an organ area that has been identified in the SPECT volume. This is totally incompatible with the selection of only a few ultrasound images. Extrinsic registration methods often need invasive markers or fiducials [3][4], which is also incompatible with non-invasive diagnosis. As for intrinsic registration methods, we face the problem of using two different image modalities, which makes it extremely difficult to find common landmarks. Some intrinsic registration can be done by a common segmentation of the same object and distance minimization between the segmented points or surfaces [5][6]. This method is quite inappropriate in the case of US, as automatic segmentation of the kidney is still an open issue. Mutual-information-based registration [7] could be of interest; nevertheless, we decided not to go further in this direction, as the common basis needed for this method seems hard to find in the case of US, most of the documented studies being based on MRI or CT [8].

In order to overcome the issues mentioned above, our registration strategy is based upon the calibration of both scanners (SPECT and US) using a calibration frame and a localizer. This method [9] allows us to easily obtain the rigid transformation between the SPECT and US referentials, but requires that the patient remain still during the whole procedure: the SPECT acquisition followed by the US exam. Our system is implemented on a Surgetics Station® provided by Praxim (France). This station is equipped with a Polaris infra-red optical passive localizer, manufactured by Northern Digital Inc. (Canada), which allows us to track so-called 'rigid bodies' in space. We fix a physical reference 'rigid body' on the gamma camera bed. Assuming that the patient does not move during the whole exam, this bed reference also represents the patient space. Another rigid body is fixed on the transducer in order to track the US images. Once both the US and SPECT images' positions have been converted into the common patient referential, we extract from the SPECT volume the slices corresponding to the US images. The resulting composite image, comprising both anatomical and functional information, is displayed on the station screen in real time during the US exam. In addition, our software implements an automatic segmentation algorithm for 3D SPECT images that lets the physician navigate the US image position with respect to the organ (see Section 4).
2 Registration Overview
In order to merge SPECT and US data, we need to compute for every US image pixel the corresponding SPECT value. This value is interpolated from the voxels closest to the physical US pixel position within the patient SPECT volume. We define the following referentials: US for the US image, PROBE for the US probe, LOC for the station localizer, BED for the gamma camera bed, MIRE for the calibration cube and SPECT for the SPECT volume. The matrix that gives the position of any US pixel $P$ within the SPECT volume is $M_{US \to SPECT}$, such that $(P)_{SPECT} = M_{US \to SPECT} \cdot (P)_{US}$. This
Fig. 1. (a) Referentials being tracked by the system during image registration. (b) Calibration cube with the 4 catheters labelled A, B, C, D and the corresponding ‘stains’ a, b, c and d on a SPECT Z-slice (grey square).
matrix can be written as: $M_{US \to SPECT} = M_{BED \to SPECT} \cdot M_{LOC \to BED} \cdot M_{PROBE \to LOC} \cdot M_{US \to PROBE}$. The time-consuming calibration procedures are carried out beforehand. In order to compute $M_{US \to PROBE}$ we use the well-tried freehand membrane scan method [10]. Section 3 describes the procedure used to obtain $M_{BED \to SPECT}$. These calibration matrices remain valid for all subsequent exams, as explained below. During the exam, the station localizer computes both $M_{LOC \to BED}$ and $M_{PROBE \to LOC}$ (see Fig. 1a). The $M_{US \to SPECT}$ matrix is assembled, and for each pixel in the US image the corresponding SPECT value is computed by trilinear interpolation within the SPECT volume. A color code is associated with the pixel, and the resulting color image is blended with the original US image. The physician can select the level of transparency to put emphasis on anatomical or physiological information.
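For illustration, the per-pixel mapping and interpolation could look as follows (Python/NumPy; homogeneous 4×4 matrices are assumed, with the pixel-to-millimeter scaling folded into $M_{US \to PROBE}$, and positions assumed to fall inside the volume):

```python
import numpy as np

def us_pixel_to_spect(pixel_uv, M_us_to_probe, M_probe_to_loc,
                      M_loc_to_bed, M_bed_to_spect):
    """Map a 2D US pixel (u, v) into SPECT voxel coordinates through the
    chain M_US->SPECT = M_BED->SPECT * M_LOC->BED * M_PROBE->LOC * M_US->PROBE."""
    M = M_bed_to_spect @ M_loc_to_bed @ M_probe_to_loc @ M_us_to_probe
    p = M @ np.array([pixel_uv[0], pixel_uv[1], 0.0, 1.0])
    return p[:3]

def trilinear(volume, xyz):
    """Trilinear interpolation of a 3D volume at a continuous position."""
    x0, y0, z0 = (int(np.floor(c)) for c in xyz)
    fx, fy, fz = (c - np.floor(c) for c in xyz)
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((fx if dx else 1.0 - fx) * (fy if dy else 1.0 - fy)
                     * (fz if dz else 1.0 - fz))
                value += w * volume[x0 + dx, y0 + dy, z0 + dz]
    return value
```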
3 SPECT Calibration
To compute $M_{BED \to SPECT}$ we use a calibration cube [9]. A rigid body is mounted on its front side, defining the MIRE referential. Inside the cube, 4 catheters filled with technetium run along the faces (see Fig. 1b). The cube is roughly positioned parallel to the $Z_{SPECT}$ axis, i.e., the bed axis, and scanned. The catheters appear clearly within the SPECT volume (see Fig. 2a).

The catheters are segmented as follows. An initial threshold level is chosen and all voxels with lower intensity are ignored. From each Z-slice of the volume we extract a set of connected areas of pixels having an appropriate aspect ratio, which we call 'stains'. We compute the gravity centers of the 4 stains with the highest maximal grey level. Ideally these gravity centers form a square, but since the cube can be misaligned with $Z_{SPECT}$, we admit some tolerance around the threshold aspect ratio used to accept or reject the 'square' formed by the 4 stain centers. Once all the slices have been processed, the 4 sets of valid stain centers
Fig. 2. (a) SPECT Z-slice of a cube. The catheters generated the 'stains' a, b, c and d (see also Fig. 1b). (b) Set-up for SPECT calibration. A: plate referential permanently fixed on the bed. B: BED rigid body fixed on the bed for the duration of the exam. C: MIRE rigid body fixed on the calibration cube.
are linked together using a proximity search to form 4 point clouds, one for each catheter. Then a Principal Component Analysis is carried out on each point cloud to find its main inertial axis (i.e., the catheter). After these geometrical features have been extracted, they are labelled A, B, C and D using a-priori knowledge of the SPECT referential orientation. The current segmentation error is computed as the maximum distance between all stain centers and their corresponding catheter. The initial threshold is increased and this procedure is repeated as long as a minimum of 40 slices can be properly segmented. The retained segmentation is the one with the minimal error.

Once the 4 catheters have been segmented in SPECT, we build an intermediary referential LOG. The center of LOG is the point $O_{LOG}$ that minimizes the distance to the 4 catheter axes; it is obtained by a simple geometrical construction. We use this point along with its projections on axes A and B (see Fig. 1b) to define the referential's $X_{LOG}$ and $Y_{LOG}$ unit vectors. The $Z_{LOG}$ vector is obtained by cross product. We proceed in the same way to build the LOG referential in MIRE. This time we use a pointer with a rigid body mounted on it to localize the tips of the 4 catheters in the physical space of MIRE. This operation need not be done each time we calibrate the gamma camera; we call it the 'calibration of the calibration'. We use the LOG referential to assemble $M_{MIRE \to LOG}$ and $M_{LOG \to SPECT}$. Then the localizer retrieves the position of MIRE with respect to BED, and finally the SPECT calibration matrix is computed as: $M_{BED \to SPECT} = M_{LOG \to SPECT} \cdot M_{MIRE \to LOG} \cdot M_{BED \to MIRE}$. This matrix is valid as long as the BED reference stays mounted on the bed, and it need not be recomputed for subsequent patients. For practical reasons though, instead of leaving a rigid body permanently fixed to the bed, we chose a more robust solution to store our calibration matrices: we mounted on the bed side a 20 × 10 × 1 cm plate having 5 calibrated holes on its surface.
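The two geometric steps of this construction, the PCA axis fit and the point minimizing the distance to the 4 axes, can be sketched as follows (Python/NumPy; the paper only speaks of a 'simple geometrical construction', so the least-squares formulation below is our assumption):

```python
import numpy as np

def catheter_axis(points):
    """Main inertial axis of a catheter point cloud via PCA: the eigenvector
    of the covariance matrix with the largest eigenvalue.
    points: (K, 3). Returns (a point on the axis, a unit direction)."""
    center = points.mean(axis=0)
    cov = np.cov((points - center).T)
    w, v = np.linalg.eigh(cov)               # ascending eigenvalues
    return center, v[:, -1]

def closest_point_to_lines(anchors, directions):
    """Point p minimizing the summed squared distance to lines (a_k, d_k),
    d_k unit vectors: solve sum_k (I - d_k d_k^T) p = sum_k (I - d_k d_k^T) a_k."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for a, d in zip(anchors, directions):
        proj = np.eye(3) - np.outer(d, d)    # projector orthogonal to the line
        A += proj
        b += proj @ a
    return np.linalg.solve(A, b)
```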
Unlike the BED rigid body, this plate does not interfere with the classical use of the gamma camera and can be left in place (cf. Fig. 2b). The reference system defined by this plate is recovered each time a new rigid body is fixed on the bed. To this end, a pointer is used to localize the 5 holes in space. The SPECT calibration matrix is then transferred from the plate coordinate system into the bed referential. Finally, we store in our system several calibrations made with distinct gamma camera parameters. At the moment of the exam, the physician chooses the calibration that fits the patient morphology.
4 Segmentation
To perform the segmentation we chose a deformable model method for its robustness and high noise immunity [11]. Our model is a simplex mesh that is iteratively adjusted to the shape of the kidney. Simplex meshes are appropriate for this type of modeling since the position of a vertex can be expressed as a function of the positions of its three neighboring vertices and the shape parameters [12][13], and because the deformation is controlled by discrete geometrical entities, allowing simple control. To obtain an initial simplex mesh, we first generate a triangle mesh using a 'marching cubes' algorithm [14]. By applying a topological dual operation to the triangle mesh of this initial isosurface of the kidneys, we obtain the initial simplex mesh of our model. The iterative deformation of the model is governed by a Newtonian law of motion, using internal and external forces [15]:

$$ m \frac{\partial^2 P_i}{\partial t^2} = -\gamma \frac{\partial P_i}{\partial t} + \vec{F}_{int} + \vec{F}_{ext} $$

where $m$ is the mass of a vertex (usually 1), $P_i$ is the position of vertex $i$, $\gamma$ is the damping factor, and $\vec{F}_{int}$, $\vec{F}_{ext}$ are the internal and external forces. The external forces push the vertices towards the edges of the kidney and are derived from the gradient $\nabla f(x, y, z)$ of the edge map $f = |\nabla \mathrm{Image}(x, y, z)|^2$ of the image, computed with a Sobel filter. The internal forces are given by the model [12][13] and their purpose is to control the smoothness of the deformation. The iterations stop when the mean deformation of the mesh is smaller than 0.1%.
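A minimal explicit integration of this law of motion could look as follows (Python/NumPy; `f_int` and `f_ext` stand in for the model's internal forces and the Sobel-based external forces, and the step size and damping value are our assumptions):

```python
import numpy as np

def evolve_mesh(P, f_int, f_ext, gamma=0.6, dt=1.0, m=1.0,
                max_iter=1000, tol=1e-3):
    """Integrate m d2P/dt2 = -gamma dP/dt + F_int + F_ext for the vertices.

    P: (V, 3) vertex positions; f_int, f_ext: callables (V, 3) -> (V, 3).
    Stops when the mean vertex displacement falls below tol (0.1%) of the
    mean vertex magnitude, mirroring the stopping criterion of the text.
    """
    V = np.zeros_like(P)                      # vertex velocities
    for _ in range(max_iter):
        F = f_int(P) + f_ext(P)
        V = V + dt * (F - gamma * V) / m      # acceleration step
        step = dt * V
        P = P + step
        if np.abs(step).mean() < tol * np.abs(P).mean():
            break
    return P
```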
5 Clinical Protocol
The registration protocol is carried out in the following way. First, a BED rigid body is mounted on the bed and its position is localized within the bed plate referential. Then the patient is positioned and prepared for the SPECT exam in the classical way, and the SPECT is acquired. The acquisition parameters, along with the data volume, are loaded into the Surgetics Station® and the volume segmentation is carried out. The patient stays on the bed without moving. A
Fig. 3. (a) 3D SPECT-2D US registration. (b) The US image is displayed in 3D with respect to the segmented kidneys.
physician does the US exam directly after the SPECT acquisition and the image registration is performed on the fly (cf. Fig. 3a). Each US image is also displayed in 3D with respect to the segmented surfaces of the kidneys (cf. Fig. 3b).
6 Results
We assessed the accuracy of the system using a plate comprising a grid of runnels. The runnels are filled with Technetium and the SPECT of the grid is acquired. Then, without moving the grid, we use a calibrated pointer equipped with a rigid body to locate in BED the centers of the runnels. The resulting point cloud is registered to the SPECT data using our algorithm. The error is computed by measuring the distance between the points and the reconsctructed runnels in SPECT. Five grid acquisitions were performed with different SPECT acquisition parameters. The overall registration error computed over the resulting 1092 points is mean = 1.5mm, max = 6.3mm and RM S = 2.0mm. The US calibration accuracy, on the other hand, is given by the software implementing the freehand membrane scan method. The calibration error, computed over 240 segmented points, is max = 2.6mm and RM S = 1.0mm. Although a global registration error could not be assessed due to the difficulty to design a SPECT and US compatible phantom, we think that the combined errors from SPECT calibration and optical localisation, on the one hand, along with the US calibration errors, on the other hand, are compatible with the intended use of the system i.e. use the functionnal information to guide the sonographical exploration of the patient’s kidney. Automatic kidney segmentations were successfully carried out on 5 patients and the registration proved to have, in all cases, clinically relevant accuracy throughout the exploration field. We rejected 1 exam where obvious patient
We rejected one exam in which obvious patient movements led to permanent (i.e., not breathing-related) registration misalignment.
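For readers who want to reproduce the error statistics above, a sketch of the point-to-runnel distance computation might look as follows. Representing runnels as 3D line segments, as well as all names, is an assumption for illustration, not the authors' code.

```python
import numpy as np

def point_to_segment(p, a, b):
    """Euclidean distance from point p to the line segment [a, b]."""
    ab, ap = b - a, p - a
    t = np.clip(ap @ ab / (ab @ ab), 0.0, 1.0)   # clamp projection to segment
    return np.linalg.norm(p - (a + t * ab))

def registration_error(points, runnel_segments):
    """Mean, max and RMS of each point's distance to its nearest runnel."""
    d = np.array([min(point_to_segment(p, a, b) for a, b in runnel_segments)
                  for p in points])
    return d.mean(), d.max(), np.sqrt((d ** 2).mean())
```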
7 Discussion
From a practical point of view, the registration procedure is fast and easy to perform. It is also low-cost, since it requires neither MRI imaging nor 3D sonography. The extra time required for the installation of the system is about 5 to 10 minutes. The only time-consuming calibration procedures are performed once and without any patient interaction. The fact that the SPECT and US exams are done at the same time in the same room might be considered a drawback. It requires extra organizational effort from the clinical point of view, although it saves a lot of time for the patient. The kidneys undergo deformations and displacements during breathing [16]. After a 30 min SPECT acquisition the patient might also feel the need to move. Any such movement breaks our hypothesis of a rigid relationship between patient and bed and introduces error. Nevertheless, due to the duration of the acquisition, kidney respiratory movements are also present within the SPECT volume, and although we might imagine strategies to stick to a reproducible position, such as deep inspiration or deep expiration as suggested in [16], we will not be able to overcome the problem from the SPECT side. An alternative would be to consider that the SPECT data contain information about the activity of the organ in some 'mean' position, which could be recovered with breath monitoring; we did not explore this possibility further. Another source of motion is the interaction between the physician and the patient, such as US probe pressure on the back of the patient. The physician must take care not to apply excessive pressure on the probe during the exam. The scope of the system can also be extended. For example, we can imagine renal puncture guidance using US enhanced with SPECT information. Still assuming rigid registration, which of course is not exactly the case, we can also imagine applying this registration strategy to sentinel node biopsy in patients with breast cancer, where the traditionally US-guided protocol could benefit from the SPECT registration.
Acknowledgements. The authors wish to thank César Jiménez and Andrés Pérez from the nuclear medicine service, as well as Dr. Reginesi from the radiology service, for their help and constant advice throughout the research process. This project was financially supported by FONDEF D01-I-1035 (Chile) and the Alfa IPECA European project, and was carried out in collaboration with Praxim-Medivision (France) and Koelis (France).
References
1. Maintz, J., Viergever, M.: A survey of medical image registration. Medical Image Analysis 2(1), 1–36 (1998)
2. Walimbe, V., Zagrodsky, V., Raja, S., Jaber, W.A., DiFilippo, F.P., Garcia, M.J., Brunken, R.C., Thomas, J.D., Shekhar, R.: Mutual information-based multimodality registration of cardiac ultrasound and SPECT images: a preliminary investigation. International Journal of Cardiovascular Imaging 19(6), 483–494 (2003)
3. Ellis, R.E., Toksvig-Larsen, S., Marcacci, M., Caramella, D., Fadda, M.: A biocompatible fiducial marker for evaluating the accuracy of CT image registration. In: Lemke, H.U., Vannier, M.W., Inamura, K., Farman, A.G. (eds.) Computer Assisted Radiology. Excerpta Medica International Congress Series, vol. 1124, pp. 693–698 (1996)
4. Peters, T., Davey, B., Munger, P., Comeau, R., Evans, A., Olivier, A.: Three-dimensional multimodal image guidance for neurosurgery. IEEE Transactions on Medical Imaging 15(2), 121–128 (1996)
5. Kagadis, G.C., Delibasis, K.K., Matsopoulos, G.K., Mouravliansky, N.A., Asvestas, P.A., Nikiforidis, G.C.: A comparative study of surface- and volume-based techniques for the automatic registration between CT and SPECT brain images. Med. Phys. 29, 201–213 (2002)
6. Wolfsberger, S., Rossler, K., Regatschnig, R., Ungersbock, K.: Anatomical landmarks for image registration in frameless stereotactic neuronavigation. Neurosurg. Rev. 25(1–2), 8–72 (2002)
7. Pluim, J.P., Maintz, J.B., Viergever, M.A.: Mutual-information-based registration of medical images: a survey. IEEE Transactions on Medical Imaging 22(8), 986–1004 (2003)
8. Studholme, C., Hill, D.L.G., Hawkes, D.J.: Automated 3-D registration of MR and CT images of the head. Medical Image Analysis 1(2), 163–175 (1996)
9. Peria, O., Chevalier, L., Francois-Joubert, A., Caravel, J.P., Dalsoglio, S., Lavallee, S., Cinquin, P.: Using a 3D position sensor for registration of SPECT and US images of the kidney. In: Ayache, N. (ed.) CVRMed 1995. LNCS, vol. 905, pp. 23–29. Springer, Heidelberg (1995)
10. Lango, T.: Ultrasound guided surgery: image processing and navigation. Thesis, Norwegian University of Science and Technology (2000)
11. McInerney, T., Terzopoulos, D.: Deformable models in medical image analysis: a survey. Medical Image Analysis 1(2), 91–108 (1996)
12. Delingette, H.: General object reconstruction based on simplex meshes. Technical Report 3111, INRIA, Sophia-Antipolis, France (1997)
13. Delingette, H.: Simplex meshes: a general representation for 3-D shape reconstruction. Technical Report 2214, INRIA, Sophia-Antipolis, France (1994)
14. Montani, C., Scateni, R., Scopigno, R.: Discretized marching cubes. In: Proceedings of the Conference on Visualization 1994, Washington, D.C., pp. 281–287. IEEE (1994)
15. Galdames, F.J., Perez, C.A., Estévez, P.A., Held, C.M.: Segmentation of renal SPECT images based on deformable models. In: SURGETICA 2005, Computer-Aided Medical Interventions: Tools and Applications, pp. 89–96 (2005)
16. Schwartz, L., Richaud, J., Buffat, L.: Kidney mobility during respiration. Radiother. Oncol. 32(1), 84–86 (1994)
A Multiphysics Simulation of a Healthy and a Diseased Abdominal Aorta

Robert H.P. McGregor¹, Dominik Szczerba¹, and Gábor Székely¹

¹ Computer Vision Laboratory, Sternwartstr. 7, 8092 Zürich, Switzerland
http://www.vision.ee.ethz.ch
Abstract. Abdominal aortic aneurysm is a potentially life-threatening disease if not treated adequately. Its pathogenesis is complex and multifactorial and is still not fully understood. Many biochemical and biomechanical mechanisms have been identified as playing a role in the formation of aneurysms, but it is as yet unclear what triggers the process. We investigated the role of the relevant biomechanical factors, in particular the wall shear stress and the intramural wall stress, by simulating the fluid-structure interaction between the blood flow and the deforming arterial wall in a healthy abdominal aortic bifurcation, the preferred location of the disease. We then extended this study by introducing a hypothetical weakening of the aortic wall. Intramural wall stress was considerably higher and wall shear stress considerably lower in this configuration, supporting the hypothesis that the biomechanical aneurysmal growth factors are self-sustaining.
1 Introduction

1.1 Background
Abdominal aortic aneurysm (AAA) is recognized as a major cause of mortality in developed countries. Fifteen thousand people die every year from AAA rupture in the United States alone, making it the 13th leading cause of death in that country [1]. It is characterized by a permanent and irreversible widening of the infrarenal abdominal aorta which, if left untreated, can dilate further and eventually rupture, leading to death in most cases. Although this disease is increasingly common due to the ageing population, its precise causes are still not exactly understood. It is generally believed that there is no single cause of its occurrence; rather, AAA results from a complex interaction of many biochemical and biomechanical processes in which genetic predispositions also play a part. The observation that it often occurs simultaneously with atherosclerosis led to the assumption that it is linked to this disease, although this theory has been challenged recently [2]. Nevertheless, from a biomechanical point of view, the study of both these pathologies requires precise knowledge of the local hemodynamic factors acting on the arterial wall as well as the stresses within it. Until recently such a holistic approach has been difficult to implement, due to limitations of imaging equipment and in particular of computing power. Thanks to
recent technological advances we are now able to perform full fluid-solid interaction (FSI) simulations of the living artery with commercially available hardware, thus gaining valuable insight into these processes.

1.2 Related Work
The pathogenesis of AAA is the subject of ongoing research. A recent recapitulative study in this area is presented by Ailawadi et al. [3], who point out the complexity of the disease and identify the main mechanisms participating in AAA formation. They agree with Grange et al. [2] that the wall structure and elasticity are changed in a diseased artery compared to a healthy one, mostly due to degradation of elastin and collagen, the two principal load-bearing fibers of the extracellular matrix, by biochemical processes. There is a large number of publications pertaining to the elasticity and deformation model of the arterial wall. Zhao et al. [4] present a good overview of this topic and show the large diversity of approaches. Every author seems to develop a new mathematical model of their own, making it very difficult to choose a particular one from the literature. Of special interest to us is the contribution by Raghavan et al. [5], who compare the elastic properties of an aneurysmal arterial wall with those of a healthy one and link the elastin and collagen contributions to an overall constitutive equation. Raghavan expands and uses this model in subsequent work [6] for simulating the intramural wall stress (IWS) in AAAs, using patient-specific geometries and static pressure loading. Much research is taking place in this direction, the main idea being to develop a useful clinical tool to assess an AAA's risk of rupture, as the current indicators (aneurysmal diameter or, more recently, volume) are considered inexact; see [7] for an overview. Leung et al. [8] went further and performed a fully coupled FSI simulation to evaluate the rupture risk, but concluded that, considering the amount of computational power and time involved and the small difference in results, static models were already sufficient for this kind of analysis. AAA is known to occur preferentially distal to the renal arteries and proximal to the arterial bifurcation. This has led to the hypothesis that the complex flow patterns occurring here are a possible cause; in particular, low time-averaged wall shear stress (WSS), flow recirculation and high temporal gradients of wall shear stress are thought to be important factors in atherosclerotic plaque formation and aneurysmal genesis. Several researchers have simulated blood flow at the abdominal bifurcation so as to quantify these effects, e.g. [9], but they do not account for the biomechanical effects inside the wall. In fact, there is little to be found in the literature on IWS in a healthy aorta. Zhao et al. [4] have reviewed the available tools for FSI simulations and presented their own method applied to model a carotid bifurcation. FSI simulations are now becoming more accessible due to the technological progress of computer hardware, in particular the availability of 64-bit architectures offering increased memory. We propose to use the opportunities opened by these improved conditions to investigate and analyze more closely the
biomechanical processes inherent to aneurysmal growth, using a holistic approach. The novelty of our study lies in the investigation of a healthy patient-specific artery using FSI and in modeling its long-term weakening based on biomechanical factors.
2 Methods
Workflow pipeline. We start the procedure by acquiring the MRI data, from which the arterial lumen is segmented. This geometry is then filled with a tetrahedral mesh and prisms are extruded to create the wall mesh. An FSI simulation is then performed to model the behavior of a healthy artery throughout a heart cycle. Finally, a wall weakening is introduced to study the long-term behavior of a diseased aorta. The MRI data were acquired on a Philips Achieva 1.5T using a 3D phase-contrast (PC) pulse sequence, which provided a time-resolved velocity field as well as bright-blood anatomy images of the lower abdomen of a healthy 35-year-old male volunteer. The slices are taken perpendicularly to the infrarenal abdominal aorta. The images are 224 × 207 pixels in size for 20 slices, with a slice thickness of 5 mm and an in-plane pixel spacing of 1.29 mm, over 24 phases of the cardiac cycle. These data were recorded in 18 min (pure measurement time) and the volunteer had to perform breath holds during the acquisition cycles. The velocity data were used as a reference for validation of the simulated blood flow. The arterial lumen was segmented from the anatomy images and smoothed so as to produce an initial surface mesh. This was then used as input to a novel meshing algorithm [10], resulting in a high-quality tetrahedral mesh with refinement close to the walls. Three layers of 0.5 mm thick prisms were then added to the surface to model the aortic wall, as this is typically ∼1.5 mm thick [8]. These prisms were then subdivided into tetrahedra. Figure 1(a) shows a cutaway of the final mesh. The refinement of the lumen mesh is necessary for two reasons: firstly, it speeds up computations enormously and, secondly, it ensures a sufficiently high resolution at the interface. This is desirable as the wall is thin but still needs at least three layers of cells for reliable predictions. In order to have wall elements which are not excessively thin radially, they need to have short edges in the circumferential direction. The final lumen mesh consists of 10,043 tetrahedra with an average quality of 0.74 (defined as the normalized radius ratio of circumscribed to inscribed sphere), while the wall mesh contains 25,686 tetrahedra with an average quality of 0.41. The simulations were performed by a finite element model (FEM) code, solving the incompressible Navier-Stokes equations coupled with the structural mechanics stress equations and using an Arbitrary Lagrangian-Eulerian (ALE) model for mesh displacement. All the mesh elements were chosen to be Lagrangian quadratic. The coupling is passive in the sense that we do not address momentum transfer from constricting walls. The geometry was fixed at both inlet and outlets, and a sinusoidal pressure wave mimicking the systolic and diastolic pressure distributions was prescribed as the boundary condition for the flow equations.
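The quality metric mentioned above can be computed per element; the sketch below uses one common normalization, 3·r_in/R_circ, which equals 1 for a regular tetrahedron. The paper does not spell out its exact formula, so this normalization is an assumption for illustration.

```python
import numpy as np

def tet_quality(a, b, c, d):
    """Normalized radius ratio 3 * r_in / R_circ of a tetrahedron
    (1 for a regular element, tending to 0 as it degenerates)."""
    v = np.array([a, b, c, d], dtype=float)
    vol = abs(np.linalg.det(v[1:] - v[0])) / 6.0          # tetrahedron volume
    faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
    area = sum(0.5 * np.linalg.norm(np.cross(v[j] - v[i], v[k] - v[i]))
               for i, j, k in faces)
    r_in = 3.0 * vol / area                               # inscribed-sphere radius
    # Circumcenter x solves 2 (v_i - v_0) . x = |v_i|^2 - |v_0|^2 for i = 1..3
    A = 2.0 * (v[1:] - v[0])
    rhs = (v[1:] ** 2).sum(axis=1) - (v[0] ** 2).sum()
    R_circ = np.linalg.norm(np.linalg.solve(A, rhs) - v[0])
    return 3.0 * r_in / R_circ
```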
The flow was assumed to be laminar, incompressible and Newtonian (an assumption which has been shown to be valid for large arteries [11]), with a density of 1020 kg/m³ and a dynamic viscosity of 0.003 Pa·s. The arterial wall was considered to be linear elastic, as this has been shown to be a valid approximation for pressure loading within normal systolic and diastolic ranges [12], with an elasticity modulus of 2.7 × 10⁶ Pa, a density of 2000 kg/m³ and a Poisson ratio of 0.49, i.e., almost incompressible. In aneurysmal tissue the extracellular matrix has been observed to be degraded. One hypothesis for this degradation, proposed by Vorp et al. [7], is that it is due to large stress over a period of time. From a biochemical point of view this corresponds to saying that the elastin and collagen, the main components of the extracellular matrix, are degraded when they are under load, leading to long-term plastic behavior of the aorta. We simulated this by locally decreasing the Young's modulus in the infrarenal abdominal aorta above the bifurcation, corresponding to the observed preferential localization of the disease. We incrementally weakened the arterial wall so as to observe the effect this would have on WSS and IWS. The localization was governed by an upside-down Gaussian bell multiplied by the weakening factor:

$$E = E_0 \left( 1 - w \, e^{-\frac{(z - z_0)^2}{2\sigma_z^2}} \right),$$
where E₀ is the original Young's modulus and w the weakening factor (w = 0.1 means the minimal E is 10% smaller than E₀); z₀ was set to the midplane between the renal arteries and the iliac bifurcation, and σ_z was chosen as 30% of the geometry's full length. The use of a Gaussian bell can be justified by the propagative nature of the weakening process, which is internally driven by failure of the strengthening fibers in the extracellular matrix. Fibers adjacent to a broken one will suffer from increased strain and thus have a high likelihood of failing as well. In a first-order approximation we model this propagation as diffusion, but a more sophisticated model could easily be integrated.
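A direct transcription of the weakening law, as reconstructed above, could look as follows. The default values mirror the text (E₀ = 2.7 × 10⁶ Pa, σ_z = 30% of the geometry length, maximal w = 0.96), while the function and argument names are illustrative.

```python
import numpy as np

def weakened_modulus(z, length, E0=2.7e6, w=0.96, z0=None, sigma_z=None):
    """E(z) = E0 * (1 - w * exp(-(z - z0)^2 / (2 * sigma_z^2))).

    Defaults follow the text: z0 at the domain midplane (assumption for the
    demo), sigma_z = 30% of the full length; w = 0 leaves the wall healthy.
    """
    if z0 is None:
        z0 = 0.5 * length
    if sigma_z is None:
        sigma_z = 0.3 * length
    return E0 * (1.0 - w * np.exp(-((z - z0) ** 2) / (2.0 * sigma_z ** 2)))
```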
3 Results
Simulation of a Healthy Aortic Bifurcation. The simulation of a healthy aortic bifurcation revealed that the time-averaged WSS was low on the outer wall of the left branch at the bifurcation (see figure 3(c)), which is the branch with the larger takeoff angle. This correlates well with the typical location of atherosclerotic plaques, but it is also interesting to observe that the upper portion of the geometry (proximal to the bifurcation), especially on the anterior side, also suffers from low time-averaged WSS (see figure 3(b)). It may also be observed that the aorta proximal to the bifurcation has a higher IWS than the iliac arteries (see figures 3(e) and 3(f)), and this holds throughout the cardiac cycle (see figure 1(b)). This region is typically prone to AAA and the causes may lie in a combination of low time-averaged WSS, causing atherosclerotic plaques which locally modify
Fig. 1. (a) Cutaway of the mesh used for simulation. (b) IWS along line s at different phases of the cardiac cycle: +: during systolic acceleration, ◦: at peak systole, ∗: during diastolic deceleration, and □: at peak flow reversal.
the wall’s mechanical properties and high time averaged IWS, causing fatigue in the elastin and collagen fibres which may eventually become degraded. The use of a coupled model also allows for the computation of displacements of the aortic walls during the heart cycle. The maximal displacements were observed at the bifurcation (see figure 3(a)), in the locations where the IWS is largest. The computed displacements are 4.3 times greater at these locations than the average wall displacement, being up to 1.02 mm at peak systole. This localized concentration of high IWS is very similar to what Thubrikar et al. [13] found at arterial branchings. Simulating Aneurysmal Growth. The weakening (w) was incrementally increased until reaching a maximum of 96 %. The results are shown in figure 4. Figure 4(b) shows the IWS in the weakened aorta (with maximal weakening) at peak systole, this is on average 1.54 times larger than in the healthy aorta, the most noticeable differences being on the posterior wall. Despite the smaller Young’s modulus, the stresses are clearly higher in a weakened artery, this is in agreement with Leung [8] and Fillinger et al. [14] who also find higher average stress in the aneurysmal area than in the healthy one. The WSS (see figure 4(b)) is on average 0.74
Fig. 2. The effect of weakening on the arterial wall: (a) increase in artery diameter, (b) average IWS and average WSS. In both cases the dotted line shows the effect of plaque formation and the solid one ignores it.
Fig. 3. Results of simulation: (a) absolute displacement of the wall at peak systole (in mm); (b) time-averaged WSS, anterior-posterior view (in N/m²); (c) time-averaged WSS, posterior-anterior view (in N/m²); (d) velocity profile at systolic peak (in m/s); (e) time-averaged IWS, anterior-posterior view (in N/m²); (f) time-averaged IWS, posterior-anterior view (in N/m²).
Fig. 4. Results of simulation using a weakened artery: (a) weakened artery deformation compared to the healthy one; (b) IWS at peak systole, posterior-anterior view; (c) WSS at peak systole, posterior-anterior view.
The WSS (see figure 4(c)) is on average 0.74 times that in a healthy aorta. Figure 2(b) shows the evolution of these factors as a function of the weakening of the aorta. Assuming WSS and IWS are factors which degrade the wall, they are also factors which worsen with wall decay, suggesting that aortic wall weakening will lead to a snowball effect. When considering the results shown in figure 2 (solid lines), one would expect an aneurysm to be doomed to rupture within a short time, due to the exponential nature of the process. However, this does not take into account the formation of atherosclerotic plaques in the intima, which is caused by low time-averaged WSS and contributes to stiffening the wall. We studied the effect this would have by adding a term to the Young's modulus which increased in areas of low WSS. As figure 2 (dotted lines) shows, this indeed stabilizes the weakening process to some extent.
4 Validation
Validation of such a simulation is always difficult, as there is no in vivo gold standard. We are, however, able to compare the simulated flow velocity to that measured by MRI, even though the latter also has an inherent error. We found that our computed velocity profile (see figure 3(d)) is qualitatively similar to the MRI profile, even though the latter is rather noisy, especially at low velocities (flow reversal). Quantitatively it is hard to compare the two, as we rely on a hypothetical pressure profile which corresponds to average physiological values but does not necessarily match that of the actual volunteer. We therefore prefer to compare our findings with generic results of other researchers active in this field. We find flow profiles similar to those of Long et al. [9] and Taylor et al. [15], and we are also in quantitative and qualitative agreement regarding the time-averaged WSS distributions. Zhao et al. [4] present the stress distribution in a healthy abdominal aortic bifurcation and, although they use a static model and a lower pressure load, the spatial variation they find is very similar to ours.
5 Conclusions
We have performed a full FSI simulation of a healthy aortic bifurcation with physiological loading. We have shown that both low WSS and high IWS are found preferentially in the aorta proximal to the bifurcation and distal to the renal arteries, which correlates well with the typical positioning of AAAs. We have also shown that, using the same model but with a locally weakened wall, the biomechanical factors contributing to aneurysm growth are worsened, thus causing a snowball effect which leads to continual growth of the aneurysm and ultimately to rupture. However, we were able to demonstrate that atherosclerotic plaque formation counteracts this effect and may help to reach an equilibrium in the process. In the future we hope that such a tool could be used to identify risk factors and eventually to predict the onset and the evolution of the disease.
Acknowledgments. This work was supported by the Indo-Swiss Joint Research Programme.
References
1. Gillum, R.F.: Epidemiology of aortic aneurysm in the United States. Journal of Clinical Epidemiology 48(11), 1289–1298 (1995)
2. Grange, J.J., Davis, V., Baxter, B.T.: Pathogenesis of abdominal aortic aneurysm: an update and look toward the future. Cardiovascular Surgery 5(3), 256–265 (1997)
3. Ailawadi, G., Eliason, J.L., Upchurch, G.R.: Current concepts in the pathogenesis of abdominal aortic aneurysm. Journal of Vascular Surgery 38(3), 584–588 (2003)
4. Zhao, S.Z., Xu, X.Y., Collins, M.W.: The numerical analysis of fluid-solid interactions for blood flow in arterial structures part 1: a review of models for arterial wall behaviour. Proceedings of the Institution of Mechanical Engineers, Part H: Journal of Engineering in Medicine 212(4), 229–240 (1998)
5. Raghavan, M.L., Webster, M.W., Vorp, D.A.: Ex vivo biomechanical behavior of abdominal aortic aneurysm: assessment using a new mathematical model. Ann. Biomed. Eng. 24(5), 573–582 (1996)
6. Raghavan, M.L., Vorp, D.A., Federle, M.P., Makaroun, M.S., Webster, M.W.: Wall stress distribution on three-dimensionally reconstructed models of human abdominal aortic aneurysm. Journal of Vascular Surgery 31(4), 760–769 (2000)
7. Vorp, D.A., Geest, J.P.V.: Biomechanical determinants of abdominal aortic aneurysm rupture. Arterioscler. Thromb. Vasc. Biol. 25(8), 1558–1566 (2005)
8. Leung, J.H., Wright, A.R., Cheshire, N., Crane, J., Thom, S.A., Hughes, A.D., Xu, Y.: Fluid structure interaction of patient specific abdominal aortic aneurysms: a comparison with solid stress models. BioMedical Engineering OnLine 5(33) (2006)
9. Long, Q., Xu, X.Y., Bourne, M., Griffith, T.M.: Numerical study of blood flow in an anatomically realistic aorto-iliac bifurcation generated from MRI data. Magnetic Resonance in Medicine 43(4), 565–576 (2000)
10. Szczerba, D., McGregor, R., Szekely, G.: High quality surface mesh generation for multi-physics bio-medical simulations. In: Simulation of Multiphysics Multiscale Systems, 4th International Workshop (2007)
11. Berger, S.A., Jou, L.D.: Flows in stenotic vessels. Annual Review of Fluid Mechanics 32(1), 347–382 (2000)
12. Leuprecht, A., Perktold, K., Prosi, M., Berk, T., Trubel, W., Schima, H.: Numerical study of hemodynamics and wall mechanics in distal end-to-side anastomoses of bypass grafts. Journal of Biomechanics 35(2), 225–236 (2002)
13. Thubrikar, M.J., Roskelley, S.K., Eppink, R.T.: Study of stress concentration in the walls of the bovine coronary arterial branch. Journal of Biomechanics 23(1), 15–17 (1990)
14. Fillinger, M.F., Marra, S.P., Raghavan, M.L., Kennedy, F.E.: Prediction of rupture risk in abdominal aortic aneurysm during observation: wall stress versus diameter. Journal of Vascular Surgery 37(4), 724–732 (2003)
15. Taylor, C.A., Hughes, T.J.R., Zarins, C.K.: Finite element modeling of three-dimensional pulsatile flow in the abdominal aorta: relevance to atherosclerosis. Annals of Biomedical Engineering 26(6), 975–987 (1998)
New Motion Correction Models for Automatic Identification of Renal Transplant Rejection

Ayman El-Baz¹, Georgy Gimel'farb², and Mohamed A. El-Ghar³

¹ Bioengineering Department, University of Louisville, Louisville, KY, USA
² Department of Computer Science, University of Auckland, New Zealand
³ Urology and Nephrology Department, University of Mansoura, Mansoura, Egypt
Abstract. Acute rejection is the most common reason for graft failure after kidney transplantation, and early detection is crucial to preserving the function of the transplanted kidney. In this paper, we introduce a new approach for the automatic classification of normal and acute rejection transplants from Dynamic Contrast Enhanced Magnetic Resonance Imaging (DCE-MRI). The proposed algorithm consists of three main steps: the first step isolates the kidney from the surrounding anatomical structures; in the second step, new motion correction models are employed to account for both the global and local motion of the kidney due to patient movement and breathing; finally, the perfusion curves that show the transport of the contrast agent into the tissue are obtained from the kidney and used in the classification of normal and acute rejection transplants. In this paper we focus on the second and third steps; the first step is described in detail in [1].
1 Introduction
In the United States, approximately 12,000 renal transplants are performed annually [2], and considering the limited supply of donor organs, every effort is made to salvage the transplanted kidney [3]. However, acute rejection, the immunological response of the human immune system to the foreign kidney, is the most important cause of graft failure after renal transplantation [4], and the differential diagnosis of acute transplant dysfunction remains a difficult clinical problem. Currently, the diagnosis of rejection is made via biopsy, which has the downside of subjecting patients to risks such as bleeding and infections. Moreover, the relatively small needle biopsies may lead to over- or underestimation of the extent of inflammation in the entire graft [5]. Therefore, a noninvasive and repeatable technique is not only helpful but also needed in the diagnosis of acute renal rejection. In DCE-MRI, a contrast agent called Gd-DTPA is injected into the bloodstream, and as it perfuses into the organ, the kidneys are imaged rapidly and repeatedly. During the perfusion, Gd-DTPA causes a change in the relaxation times of the tissue and creates a contrast change in the images. As a result, the patterns of the contrast change give functional information, while MRI provides good anatomical information which helps in distinguishing the
diseases that affect different parts of the kidneys. However, even with an imaging technique like DCE-MRI, there are several problems: (i) the spatial resolution of the dynamic MR images is low due to fast scanning; (ii) the images suffer from motion induced by patient breathing, which necessitates advanced registration techniques; and (iii) the intensity of the kidney changes non-uniformly as the contrast agent perfuses into the cortex, which complicates the segmentation procedures. To the best of our knowledge, there has been limited work on dynamic MRI to overcome the problems of registration and segmentation. For the registration problem, Gerig et al. [6] proposed using the Hough transform to register the edges in an image to the edges of a mask, and Giele et al. [8] introduced a phase-difference movement detection method to correct for kidney displacements. Both of these studies required building a mask manually by drawing the kidney contour on a 2D DCE-MRI image, followed by the registration of the time frames to this mask. For the segmentation problem, Boykov et al. [7] presented the use of graph cuts with Markov models, where the energy is minimized depending on manually placed seed points. Giele et al. [8] used image subtraction to obtain a mask, and closed the possible gaps by the use of a hull function. To further segment the medulla and cortex structures, repeated erosions were applied to the mask to obtain several rings; however, in such rings the medullary structures were intermixed with the cortical structures, so a correlation study had to be applied to better classify the cortical and medullary pixels. Following these studies, a multi-step registration approach was introduced by Sun et al. [9]. Initially, the edges are aligned using an image-gradient-based similarity measure, considering only translational motion. Once roughly aligned, a high-contrast image is subtracted from a pre-contrast image to obtain a kidney contour, which is then propagated over the other frames in search of the global registration parameters. For the segmentation of the cortex and medulla, a level-set-based approach was used. Most of these efforts used healthy transplants in the image analysis, for which edge detection algorithms were sufficient. However, in the case of acute rejection patients, the uptake of the contrast agent is decreased, so edge detection fails to give connected contours.
2 Methods
The objective of the proposed image analysis approach is to detect acute renal rejection from DCE-MRI images. To achieve this goal, an image analysis system consisting of three steps is proposed: (i) segmentation of the kidney from DCE-MRI images, (ii) correction of the motion artifacts caused by breathing and patient motion, and (iii) computation of the perfusion curves that show the transport of the contrast agent into the kidney tissue. In this paper we focus on the second and third steps; the first step is described in detail in [1].
3 Motion Correction Models
In this section, we introduce two models to correct both the global and the local motion of the kidney due to patient movement and breathing. The main idea of the two models is as follows: from two subsequent DCE-MRI images, we model the visual appearance of the kidney using a Markov-Gibbs random field with pairwise interaction. Our approach is based on finding the affine transformation that registers the target image to the reference image by maximizing a special Gibbs energy function using a gradient descent algorithm. To obtain an accurate appearance model, we developed a new approach to automatically select the most important cliques that describe the visual appearance of kidney images in DCE-MRI. To handle local deformations, we propose a new approach based on deforming each pixel over evolving closed and equi-spaced contours (iso-contours) to closely match the prototype (reference kidney object). The evolution of the iso-contours is guided by an exponential speed function that minimizes the distances between corresponding pixel pairs on the iso-contours in both images.

3.1 Global Motion Model
Basic notation. We denote by Q = {0, ..., Q − 1} a finite set of scalar image signals (e.g., gray levels), by R = [(x, y) : x = 0, ..., X − 1; y = 0, ..., Y − 1] a rectangular arithmetic lattice supporting digital images g : R → Q, and by R_p ⊂ R its arbitrarily shaped part occupied by the prototype (reference kidney object). A finite set N = {(ξ₁, η₁), ..., (ξ_n, η_n)} of (x, y)-coordinate offsets defines the neighbors {((x + ξ, y + η), (x − ξ, y − η)) : (ξ, η) ∈ N} ∩ R_p interacting with each pixel (x, y) ∈ R_p. The set N yields a neighborhood graph on R_p that specifies translation-invariant pairwise interactions with n families C_{ξ,η} of cliques c_{ξ,η}(x, y) = ((x, y), (x + ξ, y + η)). Interaction strengths are given by a vector V = [V_{ξ,η}^T : (ξ, η) ∈ N]^T of potentials V_{ξ,η} = [V_{ξ,η}(q, q′) : (q, q′) ∈ Q²]^T depending on signal co-occurrences; here T indicates transposition.

Image normalization. To account for monotone (order-preserving) changes of signals (e.g., due to different illumination or sensor characteristics), the prototype and object images are equalized using the cumulative empirical probability distributions of their signals on R_p.

Markov-Gibbs random field based appearance model. In line with a generic Markov-Gibbs random field with multiple pairwise interaction, the Gibbs probability P(g) ∝ exp(E(g)) of an object g aligned with the prototype g° on R_p is specified by the Gibbs energy E(g) = |R_p| V^T F(g), where F(g) is the vector of scaled empirical probability distributions of signal co-occurrences over each clique family: F^T(g) = [ρ_{ξ,η} F_{ξ,η}^T(g) : (ξ, η) ∈ N],
where ρ_{ξ,η} = |C_{ξ,η}| / |R_p| is the relative size of the family and F_{ξ,η}(g) = [f_{ξ,η}(q, q′|g) : (q, q′) ∈ Q²]^T; here, f_{ξ,η}(q, q′|g) = |C_{ξ,η;q,q′}(g)| / |C_{ξ,η}| are the empirical probabilities of signal co-occurrences, and C_{ξ,η;q,q′}(g) ⊆ C_{ξ,η} is the subfamily of cliques c_{ξ,η}(x, y) supporting the co-occurrence (g_{x,y} = q, g_{x+ξ,y+η} = q′) in g. The co-occurrence distributions and the Gibbs energy for the object are determined over R_p, i.e., within the prototype boundary after the object is affinely aligned with the prototype. To account for the affine transformation, the initial image is re-sampled to the back-projected R_p by interpolation. The appearance model consists of the neighborhood N and the potential V, to be learned from the prototype.

Learning the potentials. The maximum likelihood estimator of V is proportional, in the first approximation, to the scaled centered empirical co-occurrence distributions for the prototype [1]:

$$V_{\xi,\eta} = \lambda \rho_{\xi,\eta} \left( F_{\xi,\eta}(g^{\circ}) - \frac{1}{Q^2} U \right); \quad (\xi, \eta) \in N,$$

where U is the vector with unit components. The common scaling factor λ is also computed analytically; it is approximately equal to Q² if Q ≫ 1 and ρ_{ξ,η} ≈ 1 for all (ξ, η) ∈ N. In our case it can be set to λ = 1 because the registration uses only relative potential values and energies.

Learning the characteristic neighbors. To find the characteristic neighborhood set N, the relative energies E_{ξ,η}(g°) = ρ_{ξ,η} V_{ξ,η}^T F_{ξ,η}(g°) of the clique families, i.e., the scaled variances of the corresponding empirical co-occurrence distributions, are compared for a large number of possible candidates. Figure 1 shows a kidney prototype and its Gibbs energies E_{ξ,η}(g°) for 5000 clique families with the inter-pixel offsets |ξ| ≤ 50; 0 ≤ η ≤ 50. To automatically select the characteristic neighbors, we consider an empirical probability distribution of the energies as a mixture of a large 'non-characteristic' low-energy component and a considerably smaller characteristic high-energy component: P(E) = π P_lo(E) + (1 − π) P_hi(E). Because both components P_lo(E) and P_hi(E) can be of arbitrary shape, we closely approximate them with linear combinations of positive and negative Gaussians. For both the approximation and the estimation of π, we use the efficient EM-based algorithms introduced in [1]. The intersection of the approximate mixture components gives an energy threshold θ for selecting the characteristic neighbors: N = {(ξ, η) : E_{ξ,η}(g°) ≥ θ}, where P_hi(θ) ≥ P_lo(θ) π / (1 − π). The above example results in the threshold θ = 28, producing the 76 characteristic neighbors shown in Fig. 2(a),(b) together with the corresponding relative pixel-wise energies e_{x,y}(g°) over the prototype:

$$e_{x,y}(g^{\circ}) = \sum_{(\xi,\eta) \in N} V_{\xi,\eta}\left(g^{\circ}_{x,y}, g^{\circ}_{x+\xi,y+\eta}\right).$$
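To make the selection of characteristic cliques concrete, the following sketch scores candidate offset families on a prototype image. It assumes λ = 1 and a rectangular array standing in for R_p, and is an illustration of the formulas above rather than the authors' implementation.

```python
import numpy as np

def clique_family_energies(proto, offsets, Q=256):
    """Relative energies rho * V^T F for candidate clique families.

    proto   : 2D integer image with gray levels in [0, Q)
    offsets : iterable of (xi, eta) inter-pixel offsets
    """
    proto = np.asarray(proto)
    H, W = proto.shape
    energies = {}
    for (xi, eta) in offsets:
        # Paired pixel values g(x, y) and g(x + xi, y + eta)
        g1 = proto[max(0, -eta):H - max(0, eta), max(0, -xi):W - max(0, xi)]
        g2 = proto[max(0, eta):H + min(0, eta), max(0, xi):W + min(0, xi)]
        hist = np.bincount((g1.astype(np.int64) * Q + g2).ravel(),
                           minlength=Q * Q).astype(float)
        F = hist / hist.sum()               # empirical co-occurrence distribution
        V = F - 1.0 / (Q * Q)               # centered MLE potential, lambda = 1
        rho = hist.sum() / proto.size       # relative family size |C| / |R_p|
        energies[(xi, eta)] = rho * (V * F).sum()
    return energies
```

The returned dictionary can then be thresholded at θ, exactly as described above, to keep only the high-energy (characteristic) offsets.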
Fig. 1. Kidney image (a) and relative interaction energies (b) for the clique families as a function of the offsets (η, ξ). Note that this step is repeated to calculate the relative energy for n images in order to estimate the potentials and the neighborhood system.
Fig. 2. (a) The 76 most characteristic neighbors among the 5000 candidates (shown in white), (b) the pixel-wise Gibbs energies for the prototype under the estimated neighborhood, and (c) Gibbs energies for translations of the object with respect to the prototype
Appearance-based registration. The object g is affinely transformed to (locally) maximize its relative energy E(g_a) = V^T F(g_a) under the learned appearance model [N, V]. Here, g_a is the part of the object image reduced to R_p by the affine transformation a = [a₁₁, ..., a₂₃]: x′ = a₁₁x + a₁₂y + a₁₃; y′ = a₂₁x + a₂₂y + a₂₃. The initial transformation is a pure translation with a₁₁ = a₂₂ = 1 and a₁₂ = a₂₁ = 0, ensuring the most 'energetic' overlap between the object and the prototype. The energy for different translations (a₁₃, a₂₃) of the object relative to the prototype is shown in Fig. 2(c); the chosen initial position (a₁₃*, a₂₃*) in Fig. 3(a) maximizes this energy. Then the gradient search for the local energy maximum closest to the initialization selects the six parameters a; Fig. 3(b) shows the final transformation aligning the prototype contour to the reference kidney. It is clear from Fig. 3(b) that global alignment alone is not enough to achieve a perfect alignment, due to local deformation.
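A sketch of the exhaustive translation initialization follows: the Gibbs energy is evaluated for each integer shift and the most 'energetic' one is kept. The potential tables are assumed to be Q×Q arrays indexed by gray-level pairs, and the wrap-around boundary handling of np.roll is a simplification, not the authors' choice.

```python
import numpy as np

def gibbs_energy(img, potentials):
    """E(g): sum of learned potentials over all clique co-occurrences.

    potentials : dict mapping offset (xi, eta) -> Q x Q potential array
    """
    e = 0.0
    H, W = img.shape
    for (xi, eta), Vxe in potentials.items():
        g1 = img[max(0, -eta):H - max(0, eta), max(0, -xi):W - max(0, xi)]
        g2 = img[max(0, eta):H + min(0, eta), max(0, xi):W + min(0, xi)]
        e += Vxe[g1, g2].sum()              # look up V(q, q') per clique
    return e

def best_translation(target, potentials, search=20):
    """Exhaustive integer-translation search for the initial alignment."""
    best, best_e = (0, 0), -np.inf
    for ty in range(-search, search + 1):
        for tx in range(-search, search + 1):
            shifted = np.roll(target, (ty, tx), axis=(0, 1))   # wrap-around shift
            e = gibbs_energy(shifted, potentials)
            if e > best_e:
                best, best_e = (tx, ty), e
    return best
```

The six-parameter gradient search described in the text would then start from this translation and refine the full affine transformation.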
3.2 Local Deformation Model
In DCE-MRI sequences, the registration problem arises because of patient and breathing movements. To solve this problem, we propose a new approach to handle the kidney motion. The proposed approach is based on deforming the segmented kidney over evolving closed equi-spaced contours (iso-contours) to closely match the prototype. The evolution of the iso-contours is guided by an exponential speed function in the directions minimizing the distances between
corresponding pixel pairs on the iso-contours of both objects to be registered. The normalized cross-correlation is used as the image similarity measure, as it is insensitive to intensity changes (e.g., due to tissue motion in medical imagery and to the contrast agent). Unlike free-form deformation approaches based on B-splines, our technique is computationally less expensive. The first step of the proposed registration approach is to use the fast marching level set method [10] to generate the distance map inside the kidney regions, as shown in Fig. 4(a)-(b). The second step is to use this distance map to generate equally spaced contours (iso-contours), as shown in Fig. 4(c)-(d). Note that the number of iso-contours depends on the accuracy and speed required by the user. The third step of the proposed approach is to use normalized cross-correlation to find the correspondence between the iso-contours. Since we start with aligned images, we limit the search space to a small window (e.g., 10×10) to improve the speed of the proposed approach. The final step is the evolution of the iso-contours; here, our goal is to deform the iso-contours in the first image (target image) to match the iso-contours in the second image (reference image). Before discussing the details of the evolution algorithm, we define the following:

- b_{g1}^h = [p_k^h : k = 1, ..., K] is the set of K control points on surface h of the reference data, p_k = (x_k, y_k, z_k), forming a circularly connected chain of line segments (p_1, p_2), ..., (p_{K-1}, p_K), (p_K, p_1);
- b_{g2}^γ = [p_n^γ : n = 1, ..., N] is the set of N control points on surface γ of the target data, p_n = (x_n, y_n, z_n), forming a circularly connected chain of line segments (p_1, p_2), ..., (p_{N-1}, p_N), (p_N, p_1);
- S(p_k^h, p_n^γ) denotes the Euclidean distance between a point on surface h in image g_1 and its corresponding point on surface γ in image g_2;
- S(p_n^γ, p_n^{γ-1}) denotes the Euclidean distance between a point on surface γ in image g_1 and its nearest point on surface γ − 1 in image g_1;
- ν(·) is the propagation speed function.

The evolution b_τ → b_{τ+1} of the deformable boundary b in discrete time, τ = 0, 1, ..., is specified by the system of difference equations

$$p^{\gamma}_{n,\tau+1} = p^{\gamma}_{n,\tau} + \nu(p^{\gamma}_{n,\tau}) \, u_{n,\tau}; \quad n = 1, \ldots, N,$$

where u_{n,τ} is the unit vector along the ray between two corresponding points. The propagation speed function is selected so as to satisfy the following conditions: ν(p_{n,τ}^γ) = 0 if S(p_k^h, p_{n,τ}^γ) = 0; otherwise

$$\nu(p^{\gamma}_{n,\tau}) = \min\left( S(p^{h}_{k}, p^{\gamma}_{n,\tau}),\; S(p^{\gamma}_{n,\tau}, p^{\gamma-1}_{n,\tau}),\; S(p^{\gamma}_{n,\tau}, p^{\gamma+1}_{n,\tau}) \right).$$

The latter condition, known as the smoothness constraint, prevents the current point from cross-passing the closest neighbor surfaces. Note that the function

$$\nu(p^{\gamma}_{n,\tau}) = \exp\left( \beta(p^{\gamma}_{n,\tau}) \, S(p^{h}_{k}, p^{\gamma}_{n,\tau}) \right) - 1$$

satisfies the above conditions, where β(p_{n,τ}^γ) is the propagation term, calculated at each surface point from the following equation:

$$\beta(p^{\gamma}_{n,\tau}) = \frac{\ln\left( \min\left( S(p^{h}_{k}, p^{\gamma}_{n,\tau}), S(p^{\gamma}_{n,\tau}, p^{\gamma-1}_{n,\tau}), S(p^{\gamma}_{n,\tau}, p^{\gamma+1}_{n,\tau}) \right) + 1 \right)}{S(p^{h}_{k}, p^{\gamma}_{n,\tau})}.$$

Figure 3(c) shows the result of the alignment after applying the local deformation model.
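The speed function and one evolution step can be transcribed directly from the equations above. This is a sketch with illustrative names; the neighbor-contour distances S_prev and S_next are assumed to be precomputed per point, and note that, by construction, ν reduces to the minimum of the three distances.

```python
import numpy as np

def speed(s_match, s_prev, s_next):
    """nu = exp(beta * s_match) - 1, with beta = ln(min(...) + 1) / s_match;
    this equals min(s_match, s_prev, s_next) and is 0 when s_match = 0."""
    if s_match == 0.0:
        return 0.0
    beta = np.log(min(s_match, s_prev, s_next) + 1.0) / s_match
    return np.exp(beta * s_match) - 1.0

def evolve_step(points, targets, s_prev, s_next):
    """One step p <- p + nu * u of the iso-contour evolution."""
    out = []
    for p, q, sp, sn in zip(points, targets, s_prev, s_next):
        d = float(np.linalg.norm(q - p))     # S(p_h, p_gamma): distance to match
        if d == 0.0:
            out.append(p)
            continue
        u = (q - p) / d                      # unit vector along the ray
        out.append(p + speed(d, sp, sn) * u)
    return np.asarray(out)
```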
Fig. 3. (a) Initialization, (b) global alignment, and (c) local alignment
Fig. 4. The distance map of two kidneys (a, b) and the iso-contours (c, d)
4 Results and Conclusion
The ultimate goal of the proposed algorithms is to successfully construct a renogram (mean intensity signal curves) from the DCE-MRI sequences, showing the behavior of the kidney as the contrast agent perfuses into the transplant. In acute rejection patients, the DCE-MRI images show a delayed perfusion pattern and a reduced cortical enhancement. We tested the above algorithms on 100 patients, seven of whom are shown in Figure 5. The normal patient shows the expected abrupt increase to the higher signal intensities and a valley with a small slope. The acute rejection patients show a delay in reaching their peak signal intensities. From these observations, we concluded that the relative peak signal intensity, the time to peak signal intensity, the slope between the peak and the first minimum, and the slope between the peak and the signal measured from the last image in the sequence are the four major features in the renograms of the segmented kidney for classification. To highlight the advantage of the proposed motion correction models, we drew the perfusion curves of the segmented kidneys before applying the motion correction models, as shown in Fig. 5(a). It is clear from Fig. 5(a) that the measured signals are very noisy, which affects the values of the extracted features used in the classification step. To distinguish between normal and acute rejection, we use a Bayesian supervised classifier that learns the statistical characteristics of the normal and acute rejection classes from a training set. The density estimation required by the Bayes classifier is performed for each feature using a linear combination of Gaussians (LCG) with positive and negative components. The parameters of the LCG components are estimated using a modified EM algorithm [1]. In our approach, we used 50% of the data for training and the other 50% for testing. On the testing data, the Bayes classifier correctly classified 47 out of 50 cases (94%). On the training data the Bayes classifier classified all cases correctly, so the overall accuracy of the proposed approach is 97%.
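A sketch of extracting the four features from a renogram follows. Normalizing the peak by the first sample is an assumption, since the text does not define 'relative' precisely; all names are illustrative.

```python
import numpy as np

def renogram_features(signal, t):
    """Extracts the four classification features from a mean-intensity curve:
    relative peak intensity, time to peak, slope from the peak to the first
    post-peak minimum, and slope from the peak to the last sample."""
    signal = np.asarray(signal, dtype=float)
    t = np.asarray(t, dtype=float)
    k = int(np.argmax(signal))
    peak, t_peak = signal[k], t[k]
    rel_peak = peak / signal[0] if signal[0] != 0 else peak  # baseline-relative (assumption)
    m = k + int(np.argmin(signal[k:]))                       # first minimum after the peak
    slope_min = (signal[m] - peak) / (t[m] - t_peak) if m > k else 0.0
    slope_end = (signal[-1] - peak) / (t[-1] - t_peak) if len(t) - 1 > k else 0.0
    return rel_peak, t_peak, slope_min, slope_end
```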
Fig. 5. (a) Perfusion curves before applying the motion correction models and (b) perfusion curves after applying the motion correction models. Acute rejection is shown in red and normal transplants in blue.
The classification results before applying the motion correction models are as follows: 1) on the testing data, the Bayes classifier correctly classified 29 out of 50 cases (58%), and 2) on the training data it correctly classified 34 out of 50 cases (68%). These results highlight the advantages of our motion correction models. In this paper we presented a framework for the detection of acute renal rejection from DCE-MRI which includes global and local registration of the kidneys and Bayes classification. Our present implementation in Matlab, running on an Intel dual-core processor (3 GHz per core, 8 GB memory), takes about 3.49 min for DCE-MRI images of size 512×512 pixels. Our future work will include testing on more patients; the results of the proposed framework are promising and might replace the current nuclear imaging tests or the invasive biopsy techniques.
References
1. El-Baz, A., Yuksel, S., Shi, H., Farag, A., El-Ghar, M., Eldiasty, T., Ghoneim, M.: 2D and 3D shape based segmentation using deformable models. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3750, pp. 821–829. Springer, Heidelberg (2005)
2. U.S. Department of Health and Human Services: Annual report of the U.S. scientific registry of transplant recipients and the organ procurement and transplantation network: transplant data: 1990–1999. Bureau of Health Resources Department, Richmond, VA (2000)
3. Neimatallah, M., Dong, Q., Schoenberg, S., Cho, K., Prince, M.: Magnetic resonance imaging in renal transplantation. J. Magn. Reson. Imaging 10(3), 357–368 (1999)
4. Rigg, K.M.: Renal transplantation: current status, complications and prevention. J. Antimicrob. Chem. 36(suppl.), B51–B57 (1995)
5. Yang, D., et al.: USPIO-enhanced dynamic MRI: evaluation of normal and transplanted rat kidneys. Magn. Reson. in Medicine 46, 1152–1163 (2001)
6. Gerig, G., et al.: Semiautomated ROI analysis in dynamic MRI studies: Part I: image analysis tools for automatic correction of organ displacements. IEEE Trans. Image Processing 11(2), 221–232 (1992)
7. Boykov, Y., et al.: Segmentation of dynamic N-D data sets via graph cuts using Markov models. In: Niessen, W.J., Viergever, M.A. (eds.) MICCAI 2001. LNCS, vol. 2208, pp. 1058–1066. Springer, Heidelberg (2001)
8. Giele, E.: Computer methods for semi-automatic MR renogram determination. Ph.D. dissertation, Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven (2002)
9. Sun, Y., et al.: Integrated registration of dynamic renal perfusion MR images. In: Proc. of ICIP 2004, Singapore, October 24–27, pp. 1923–1926 (2004)
10. Sethian, J.A.: Fast marching level set method for monotonically advancing fronts. Proc. Nat. Acad. Sci. 93, 1591–1595 (1996)
Detecting Mechanical Abnormalities in Prostate Tissue Using FE-Based Image Registration

Patrick Courtis¹ and Abbas Samani¹,²

¹ Department of Electrical and Computer Engineering,
² Department of Medical Biophysics, University of Western Ontario
{pcourtis, asamani}@uwo.ca
Abstract. An image registration-based elastography algorithm is presented for assessing the stiffness of tissue regions inside the prostate for the purpose of detecting tumors. A 3D finite-element model of the prostate is built from ultrasound images and used to simulate the deformation of the prostate induced by a TRUS probe. To reconstruct the stiffness of tissues, their Young's moduli are varied using Powell's method so that the mutual information between the simulated and the deformed image volumes is maximized. The algorithm was validated using a gelatin prostate phantom embedded with a cylindrical inclusion that simulated a tumor. Results from the phantom study showed that the technique could detect the increased stiffness of the simulated tumor with reasonable accuracy.
1 Introduction

Early detection plays a key role in the prognosis of prostate cancer. A common clinical screening procedure for detecting prostate cancer is the Digital Rectal Exam (DRE). The purpose of the DRE is to detect changes in the stiffness of prostate tissue that may indicate the presence of tumors. Tumors originating in the posterior region of the prostate account for 70% of all diagnosed cases of prostate cancer and can be detected with the DRE depending on their size and location. The DRE is a qualitative method with low sensitivity, and tumors which are not directly adjacent to the prostatic capsule may not be palpable and must be detected using more sensitive means such as medical image analysis. Image-guided detection of tumors relies on the assumption that tissue pathology associated with cancer will correspond to changes in the appearance of medical image data. In the case of ultrasound (US) imaging, tumors can be detected based on the relative acoustic properties of normal and malignant tissue: tumors will appear as isolated hyperechoic (brighter) or hypoechoic (darker) regions in the US image. Although US can be a valuable tool for locating tumors, its use in the detection of prostate cancer is not well defined. The problem facing US-based detection is the presence of image artifacts or non-malignant diseases like benign prostatic hyperplasia (BPH) that resemble cancerous tumors. In the past, transrectal ultrasound (TRUS) guided biopsy protocols that directly targeted hypo- and hyperechoic regions of prostate tissue were introduced, but for the aforementioned reasons most of these
biopsies were histologically benign (cancer-free) [1]. As a result, systematic prostate biopsy protocols are now implemented which sample the entire prostate gland. The number of samples in a systematic prostate biopsy can exceed twenty core samples [1]. Currently, the primary role of TRUS in the detection of prostate cancer is to guide biopsy needles into the prostate during the systematic biopsy; specifically, TRUS is used to guide biopsy needles into the posterior region of the prostate, where 80% of prostate tumors originate [2]. Elastography is an imaging technique that quantifies changes in the mechanical properties of tissue associated with diseases such as prostate cancer [3]. This technique has proven to be very useful in detecting and locating tumors. Although elastography algorithms usually require pre-computed tissue displacement data [4], Miga et al. demonstrated that elastography can be performed using FE-based image registration [5], whereby tissue displacements and Young's moduli are reconstructed simultaneously. In this paper, we describe an image registration based elastography technique for recovering the Young's modulus of a visually suspicious region in a 3D TRUS volume of the prostate. Similar to the elastography techniques of Samani et al. [6] and Boctor et al. [7], it is assumed that the Young's modulus of each tissue type is constant throughout its volume. This assumption provides anatomical constraints which impose a discrete Young's modulus distribution in the reconstruction. Through the use of this constraint, it is possible to employ a simple optimization procedure (i.e., Powell's method) to perform the Young's modulus reconstruction. This contrasts with previous unconstrained iterative methods, which employ nonlinear least squares algorithms that are generally ill-posed [4,5]. The technique was validated using a gelatin-based prostate tissue mimicking phantom with a cylindrical inclusion. The inclusion is meant to simulate a non-palpable prostate tumor.
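The constrained reconstruction loop described above can be sketched as follows. Here `simulate` is a placeholder for the FE solver plus slice resampling, and the histogram-based MI estimator is one common choice rather than the authors' exact implementation.

```python
import numpy as np
from scipy.optimize import minimize

def mutual_information(a, b, bins=64):
    """Histogram-based MI estimate between two image slices."""
    h, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = h / h.sum()
    px, py = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

def reconstruct_moduli(E_init, simulate, deformed_slices):
    """Vary per-tissue Young's moduli with Powell's method so that the MI
    between simulated and acquired deformed slices is maximized.

    simulate(E): runs the FE deformation model with moduli E and returns
    the registered transverse/sagittal slices (placeholder for the solver).
    """
    def cost(E):
        sim = simulate(np.abs(E))           # keep moduli positive
        return -sum(mutual_information(s, d)
                    for s, d in zip(sim, deformed_slices))
    res = minimize(cost, x0=np.asarray(E_init, dtype=float), method="Powell")
    return np.abs(res.x)
```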
2 Methodology

The technique requires two 3D TRUS volumes: a baseline volume, and a second volume in which the US probe is displaced vertically to deform the prostate. A 3D 20-noded brick FE model of the prostate, tumor, rectum, TRUS probe, and surrounding tissue is then used to simulate the deformation induced by the probe. An algorithm for generating the FE model using iso-parametric 20-noded brick elements has been developed. This type of element tends to result in better performance and more accurate FE displacement solutions compared to FE models with a similar number of tetrahedral elements. This is highly advantageous for elastography applications, since the displacements computed from the FE model are used to reconstruct the mechanical properties of the soft tissue. Nodal displacements from the FE model are used to register the post-deformed and baseline image volumes. The mechanical properties of each tissue type in the FE model are then varied using Powell's optimization method [8] so that the mutual information (MI) between the transverse and sagittal slices of the registered image volumes is maximized.

2.1 FE Model Generation

The mesh generating procedure is based on the concept of transforming rectilinear grids into the shape of the prostate, tumor, and surrounding tissues. The procedure
involves four major steps. The first three steps, which construct the prostate and tumor, are illustrated in Fig. 1. The elements in the surrounding tissue region between the prostate and rectum are generated in the final, fourth step.
Fig. 1. An illustration of the prostate mesh generating procedure. An ellipsoid is mapped onto the surface of the prostate using a TPS transformation (A). The internal vertices of the mesh are generated using TFI (B). A subsection of the mesh (black) is warped into the shape of the tumor using a CSRBF transformation (C).
The first step, adapted from the prostate boundary segmentation algorithm developed by Hu et al. [9], uses six control points on the surface of the prostate. Four points are taken from a central transverse 2D slice (two points at the lateral extremes and two points at the posterior and anterior extremes). The other two points are taken from a central sagittal 2D slice at the base and apex of the prostate. The control points are used to generate an ellipsoid [10] that approximates the shape of the prostate. Nodes on the ellipsoid are generated by mapping the six sides of an N³ logical grid onto its surface. A thin-plate spline (TPS) transformation [11] warps these nodes so that the six ends of the semi-major axes of the ellipsoid coincide with the corresponding control points. The nodes inside the prostate are generated by mapping the nodes at the internal vertices of the rectilinear grid to a set of nodes distributed inside the surface of the warped ellipsoid (Fig. 1-B). Using trans-finite interpolation (TFI), the positions of the nodes on the six surfaces of the prostate are used to compute the positions of the nodes inside the prostate, according to the 3D vector-valued bilinear blended mapping defined in Knupp et al. [12]. One advantage of this approach is that determining element connectivity is straightforward, as it is done using the nodes of the rectilinear grid. In step three, a tumor is formed inside the prostate mesh. In this research, tumors are modeled as cylinders embedded inside the prostate tissue. This is done by replacing a subzone (black area) with a cylindrically shaped grid, as illustrated in Fig. 1-C. To avoid excessive distortion of the mesh resulting from warping the subzone to the tumor shape, the nodes surrounding the tumor are then repositioned using compact-support radial basis functions (CSRBF). With CSRBFs one can specify a distance r beyond which all the transformation parameters are zero. Here, r is taken as the shortest distance between the surface of the tumor and the surface of the prostate. This enables a smooth transformation of a group of elements into the shape of the tumor while maintaining the original boundary of the prostate mesh.
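A generic 3D thin-plate spline of the kind cited in [11] can be sketched as below. The r kernel is the standard 3D TPS choice; the function is illustrative rather than the authors' code.

```python
import numpy as np

def tps_warp_3d(src_pts, dst_pts, query):
    """3D thin-plate spline warp of `query` points given control-point pairs.

    Uses the standard 3D TPS kernel U(r) = r plus an affine part; solves one
    (n+4) x (n+4) linear system shared by the three output coordinates.
    """
    src_pts = np.asarray(src_pts, dtype=float)
    dst_pts = np.asarray(dst_pts, dtype=float)
    query = np.asarray(query, dtype=float)
    n = len(src_pts)
    K = np.linalg.norm(src_pts[:, None] - src_pts[None], axis=2)   # U(|pi - pj|)
    P = np.hstack([np.ones((n, 1)), src_pts])                      # affine basis
    A = np.zeros((n + 4, n + 4))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    rhs = np.zeros((n + 4, 3))
    rhs[:n] = dst_pts
    coef = np.linalg.solve(A, rhs)                                 # [weights; affine]
    Uq = np.linalg.norm(query[:, None] - src_pts[None], axis=2)
    Pq = np.hstack([np.ones((len(query), 1)), query])
    return Uq @ coef[:n] + Pq @ coef[n:]
```

With the six control points described above, the system is a small 10 × 10 solve, so the warp of the full node set is inexpensive.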
After the elements that form the prostate and tumor have been generated, the elements that form the tissue surrounding the prostate and rectum are generated. First, a rectilinear grid x(i, j, k) is defined, where i, j = 0, 1, …, N+2, k = 0, 1, …, 2N+2, and N is the dimensionality of the grid that was used to form the vertices of the prostate. The nodes on the surface of x are mapped onto the boundary of the external tissue region. The internal vertices x(i, j, k) with 1 ≤ i ≤ N+1, 1 ≤ j ≤ N+1, N-1 ≤ k ≤ 2N+1 are vertices of the prostate mesh that were generated in steps one through three. The surface formed by the vertices x(i, j, k) with 1 ≤ i ≤ N+1, 1 ≤ j ≤ N+1, 1 ≤ k ≤ N+1 is mapped onto the surface of the rectum, which takes the shape of a cylinder. After the rectum is defined, a set of additional vertices is generated in between the prostate, rectum, and surrounding tissue. The positions of these new vertices are determined using a tri-linear interpolation based on the positions of the eight nodes of the original grid surrounding each vertex.

2.2 Prostate Phantom Validation

A prostate tissue mimicking phantom was constructed from a mixture of gelatin, water, n-propanol and formaldehyde [13]. The mechanical and acoustic scattering properties of the phantom were varied by altering the respective concentrations of gelatin. The relative mechanical properties of the prostate, tumor, and surrounding region within the phantom were adjusted to reflect literature values [3]. To enhance image contrast, varying amounts of cellulose agent were added to the different parts of the phantom: 0.02 g/mL in the surrounding material and 0.01 g/mL in the prostate. No cellulose was added to the inclusion. The Young's moduli of the gelatin-based prostate, inclusion, and surrounding material were measured independently from uniaxial load test data on cylindrical gelatin samples [14]. The phantom was manufactured in a Plexiglas container whose front, left and right sides could be removed to facilitate axial deformation induced by the TRUS probe. A thin layer of acoustic damping rubber was placed on the Plexiglas above the prostate in order to identify the top of the container in the US image volumes. The phantom was manufactured with a cylindrical hole below the prostate, approximately the diameter of the TRUS probe, to simulate the rectum and allow for insertion of the probe. A diagram of the phantom assembly is shown in Fig. 2. B-mode US image volumes of the phantom assembly were acquired using a 3D TRUS system developed by Fenster et al. [15]. The reconstructed image volumes have a pixel spacing of 0.21 mm in the coronal and axial planes and 0.20 mm in the sagittal plane. Deformed phantom images were acquired by rescanning the phantom after an upward displacement was applied to the TRUS probe. A schematic of the phantom and a photograph of the 3D TRUS system are shown in Fig. 2. The baseline TRUS volume was acquired with the probe resting flush against the cylindrical rectum of the phantom, and a second volume was acquired after a vertical displacement was applied to the TRUS probe. The displacement of the TRUS probe was determined visually by manually aligning the echo signals on the upper surface of the phantom container in the axial image plane.
A frictionless contact FE formulation that modeled the deformation induced by the vertical displacement of the TRUS probe was implemented in ABAQUS 6.6 [16] and used to perform an initial FE analysis to estimate the displacement boundary conditions induced by the probe along the surface of the rectum. This initial analysis assumes that the phantom mesh has a single, homogeneous Young's modulus. Subsequent FE
calculations during the reconstruction algorithm used the displacements computed from this contact analysis as boundary conditions, eliminating the need to perform contact analysis throughout the optimization process. Using Powell's method, the mechanical properties of the prostate and inclusion were varied and new displacements from the FE model were computed, so that the MI between the two orthogonal slices in the registered image volumes was maximized. The Young's modulus of the surrounding tissue was assigned its experimental value. The MI was computed using two windowed and 3×3 median filtered regions of interest (ROIs) surrounding the prostate on axial and sagittal slices through the centre of the phantom. Median filtering the US images reduces the number of local maxima in the parameter search space. The two lateral extremes of the prostate were not included in the axial ROI due to the concern that poor lateral resolution at these points in the image could have an adverse effect on the MI calculation.
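A minimal sketch of the reconstruction loop is given below, assuming an external routine `fe_register` that warps the baseline ROI with an FE solve at the trial moduli (a hypothetical stand-in for the ABAQUS-based step); the histogram-based MI estimator, the bin count and the function names are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.optimize import minimize

def mutual_information(a, b, bins=64):
    """MI between two image regions estimated from their joint histogram."""
    h, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = h / h.sum()
    px, py = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (px[:, None] * py[None, :])[nz])))

def negative_mi(moduli, baseline_rois, deformed_rois, fe_register):
    """Objective for Powell's method: warp the baseline ROIs with an FE
    solve at the trial moduli (E_prostate, E_tumor), median filter with a
    3x3 kernel, and return the negated MI summed over the axial and
    sagittal ROIs so that minimisation maximises the MI."""
    total = 0.0
    for base, target in zip(baseline_rois, deformed_rois):
        warped = fe_register(base, moduli)
        total += mutual_information(median_filter(warped, size=3),
                                    median_filter(target, size=3))
    return -total

# result = minimize(negative_mi, x0=np.array([10.0, 20.0]),
#                   args=(baseline_rois, deformed_rois, fe_register),
#                   method='Powell')
```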
Fig. 2. A schematic diagram of the prostate phantom assembly (A) and a photograph of the TRUS apparatus (B)
3 Results and Discussion

Fig. 3 shows the results of the FE analysis within the axial and sagittal planes. The simulated post-deformed images that maximized the mutual information metric are shown in Fig. 4. The subtracted 8-bit images (Fig. 4, A3 and B3) were windowed and leveled so that the minimum and maximum pixel values correspond to 0 and 255, respectively. The reconstructed Young's modulus values are presented in Table 1. The subtracted images in Fig. 4 demonstrate that the optimal Young's modulus values result in well-registered image volumes. Table 1 shows that the reconstruction determined the cylindrical tumor to be stiffer than the surrounding phantom tissue, and that the reconstructed values are reasonably close to the Young's modulus values calculated from the uniaxial load test data. The reconstructed values of E2 and E3 differ by 23% and 19% from their respective independently measured values. The parameter of most interest, the relative stiffness of the simulated tumor with respect to the surrounding prostate-mimicking tissue (E3/E2), was reconstructed with reasonable accuracy, differing by 5.6% from its experimentally determined value.
Fig. 3. Tissue maximum principal strain induced by the TRUS probe in the axial (A) and sagittal (B) planes obtained from FE analysis

Table 1. A comparison of the Young's modulus values reconstructed by the image registration algorithm. E1, E2, E3 are the Young's modulus values of the surrounding, prostate, and cancer (tumor) mimicking tissue, respectively.

                E1 [kPa]  E2 [kPa]  E3 [kPa]  E3/E1  E3/E2  E2/E1
Experimental        8.5      12.9      22.6    2.7    1.8    1.5
Reconstructed       8.5       9.9      18.4    2.2    1.9    1.2
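The percentage differences quoted in the text follow directly from the values in Table 1, as the short check below shows (the 5.6% figure uses the tabulated, rounded ratios 1.9 and 1.8):

```python
exp = {'E2': 12.9, 'E3': 22.6, 'E3/E2': 1.8}
rec = {'E2': 9.9, 'E3': 18.4, 'E3/E2': 1.9}

for key in ('E2', 'E3', 'E3/E2'):
    err = abs(rec[key] - exp[key]) / exp[key] * 100.0
    print(f'{key}: {err:.1f}%')   # E2: 23.3%, E3: 18.6%, E3/E2: 5.6%
```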
Fig. 4. A1 and B1 are the simulated, and A2 and B2 the target, axial and sagittal post-deformed images, respectively. Images A3 and B3 are the windowed and leveled subtraction images of each pair within the windowed area indicated by the white boxes in B1 and B2.
One major source of error is the measurement of the boundary condition induced by the TRUS probe. In this research it was assumed that only translational displacements in the axial plane were applied to the probe; however, observing the tumor in the subtracted image in Fig. 4 (B3), it appears that there is some bending of the apparatus used to hold the TRUS probe while it is pressed against the cylindrical rectum of the phantom. Bending of the apparatus could be observed while the phantom was being deformed, and the probe had to be leveled manually using a leveling tool before each US volume was acquired. An additional source of error is the FE calculation: it is possible that the relative number of elements in each of the three material domains had an effect on the reconstructed values. Segmentation error along the surface of the prostate is a third source of error, since the warped ellipsoid used to approximate the boundary of the prostate may not pass exactly over the prostate surface.
4 Conclusions and Future Work

An image registration-based elastography algorithm for evaluating the mechanical properties of a visually suspicious region in the prostate was designed and validated using a gelatin-based tissue phantom. Unlike traditional elastography, which relies on displacement data as input, this technique used the MIE paradigm to formulate elastography as an image registration problem. The value of the reconstructed Young's modulus of the tumor-mimicking inclusion relative to the prostate phantom tissue (E3/E2) was recovered with an error of ~6%. The results are encouraging and show good potential for using this technique in prostate cancer detection. The error in the absolute values of E2 and E3 with respect to the surrounding tissue (~23%) is likely caused by mis-registration of the US volumes due to rotational motion of the apparatus that held the TRUS probe. Future work will involve either designing an apparatus that can accurately measure the transformation of the TRUS probe when it is used to deform the prostate, or incorporating the transformation parameters of the probe as additional variables in the Powell's optimization procedure. Measuring the probe transformation parameters directly is preferable, since adding parameters to the optimization will only increase the time required to perform the Young's modulus reconstruction. Extending the meshing algorithm to include accurate segmentation of real prostate boundaries and tumors will also be pursued.
References

[1] Raja, J., Ramachandran, N., Munneke, G., Patel, U.: Current status of transrectal ultrasound-guided prostate biopsy in the diagnosis of prostate cancer. Clin. Radiol. 61(2), 142–153 (2006)
[2] McNeal, J.E., Redwine, E.A., Freiha, F.S., Stamey, T.A.: Zonal distribution of prostatic adenocarcinoma. Correlation with histologic pattern and direction of spread. Am. J. Surg. Pathol. 12(12), 897–906 (1988)
[3] Krouskop, T.A., Wheeler, T.M., Kallel, F., Garra, B.S., Hall, T.: Elastic moduli of breast and prostate tissues under compression. Ultrason. Imaging 20(4), 260–274 (1998)
[4] Kallel, F., Bertrand, M.: Tissue elasticity reconstruction using linear perturbation method. IEEE Trans. Med. Imaging 15(3), 299–313 (1996)
[5] Washington, C.W., Miga, M.I.: Modality independent elastography (MIE): a new approach to elasticity imaging. IEEE Trans. Med. Imaging 23(9), 1117–1129 (2004)
[6] Samani, A., Bishop, J., Plewes, D.B.: A constrained modulus reconstruction technique for breast cancer assessment. IEEE Trans. Med. Imaging 20(9), 877–885 (2001)
[7] Boctor, E., de Oliveira, M., Choti, M., Ghanem, R., Taylor, R., Hager, G., Fichtinger, G.: Ultrasound monitoring of tissue ablation via deformation model and shape priors. In: Med. Image Comput. Comput. Assist. Interv. (MICCAI) 9(2), 405–412 (2006)
[8] Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical Recipes in C, 2nd edn., pp. 36–41. Press Syndicate of the University of Cambridge, Cambridge, England (2002)
[9] Hu, N., Downey, D., Fenster, A., Ladak, H.M.: Prostate boundary segmentation from 3D ultrasound images. Med. Phys. 30(7), 1648–1659 (2003)
[10] Tietze, H.: Famous Problems of Mathematics: Solved and Unsolved Mathematics Problems from Antiquity to Modern Times, pp. 28, 40–41. Graylock Press, New York (1965)
[11] Bookstein, F.L.: Principal warps: thin-plate splines and the decomposition of deformations. IEEE Trans. Pattern Anal. Mach. Intell. 11(6), 567–585 (1989)
[12] Knupp, P.M., Steinberg, S.: Fundamentals of Grid Generation. CRC Press, Boca Raton (1994)
[13] Hall, T.J., Bilgen, M., Insana, M.F., Krouskop, T.A.: Phantom materials for elastography. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 44(6), 1355–1365 (1997)
[14] Samani, A., Bishop, J., Luginbuhl, C., Plewes, D.B.: Measuring the elastic modulus of ex vivo small tissue samples. Phys. Med. Biol. 48(14), 2183–2198 (2003)
[15] Tong, S., Downey, D.B., Cardinal, H.N., Fenster, A.: A three-dimensional ultrasound prostate imaging system. Ultrasound in Medicine and Biology 22, 735–746 (1996)
[16] Hibbitt, Karlsson & Sorensen, Inc.: ABAQUS: Theory Manual. Pawtucket, RI (1998)
Real-Time Fusion of Ultrasound and Gamma Probe for Navigated Localization of Liver Metastases

Thomas Wendler1, Marco Feuerstein1, Joerg Traub1, Tobias Lasser1, Jakob Vogel1, Farhad Daghighian2, Sibylle I. Ziegler3, and Nassir Navab1

1 Computer Aided Medical Procedures (CAMP), TUM, Munich, Germany
2 IntraMedical Imaging LLC, Los Angeles, California, USA
3 Nuclear Medicine Department, Klinikum rechts der Isar, TUM, Munich, Germany
Abstract. Liver metastases are an advanced stage of several types of cancer, usually treated with surgery. Intra-operative localization of these lesions is currently facilitated by intra-operative ultrasound (IOUS) and palpation, yielding a high rate of false positives due to benign abnormal regions. In this paper we present the integration of functional nuclear information from a gamma probe with IOUS, providing a synchronized, real-time visualization that facilitates the detection of active metastases intra-operatively. We evaluate the system in an ex-vivo setup employing a group of physicians and medical technicians and show that the addition of functional imaging significantly improves the accuracy of localizing and identifying malignant and benign lesions. Furthermore, we demonstrate that the inclusion of an advanced, augmented visualization provides more reliability and confidence in classifying these lesions in the presented evaluation setup.
1 Motivation
Liver metastases are a common consequence of cancer cells spreading from primary tumors. Surgical resection is the indicated therapy where possible, as it results in a cure with high probability [1]. To facilitate extraction, intra-operative localization of the tumorous regions is achieved by a combination of palpation and intra-operative ultrasound (IOUS). This technique is considered the gold standard, as it has been in successful clinical practice for years with a proven high sensitivity [2, 3]. However, in the presence of benign abnormal structures, a considerable false-positive detection rate remains. These abnormalities may be cysts, hemangiomas, scar tissue, or even metastases that were previously diagnosed by e.g. PET/CT and treated successfully with chemotherapy or other neoadjuvant therapies [3]. This problem, although reduced, is still present when using contrast-enhanced ultrasound [2], which has promising potential for better image quality but remains a mostly anatomical imaging modality. To reduce the false-positive detection rate, the integration of a functional modality to complement the standard localization technique is a promising approach. A prime candidate for this is nuclear imaging, as there are tracers with
high specific uptake in liver metastases [4], and in general this modality features low false-positive detection rates [5] in pre-operative diagnostics. Herein we report on combining an intra-operative nuclear probe (a gamma probe) with ultrasound (IOUS) for accurate localization of liver metastases. We believe that this combination will compensate the weaknesses of both technologies, achieving both accurate localization and accurate classification. To take full advantage of this combination, the relative position and orientation of the ultrasound probe and the nuclear probe need to be known, to allow a merged visualization of functional gamma probe information and anatomical ultrasound images and thus a complete utilization of both technologies.
2 System Setup
The components of a combined IOUS / nuclear probe system for the detection of liver metastases, their calibration and a visualization to intuitively guide surgeons are reviewed briefly in this section.
Fig. 1. The system setup of the navigation system consists of a US system (A), a gamma probe system (B), and a tracking system (C). The laparoscopic camera (E), the US probe (F), and the gamma probe (G) are extended by targets for optical tracking. Radioactive and non-radioactive nodules have been implanted in the liver (H). The navigation system provides different modalities of visualization (D).
2.1 System Components
Tracking System. The determination of position and orientation of the ultrasound and nuclear probes in a common coordinate system can be achieved by an optical tracking system, as in our previous work, where we tracked a beta probe optically and generated the radioactivity surface distribution of the resection borders of a tumor [6]. A similar approach is used in this work. Both nuclear and IOUS probes are extended by infrared tracking targets (figure 1). The chosen optical tracking system (A.R.T. GmbH, Weilheim, Germany) consists of 4 ARTtrack2
infrared cameras and the DTrack software running on a desktop PC, which sends tracking data via Ethernet to the central workstation. A typical setup with four ART cameras has a root mean square (rms) error of 0.4 [mm] for the target position and 0.002 [rad] for the target orientation, and a maximum error of 1.4 [mm] for positional and 0.007 [rad] for orientation measurements.

Ultrasound. In order to obtain 2D anatomical images of the liver in real-time, an ultrasound probe of 3.5 [MHz] is connected to a Sonoline Omnia system by Siemens Medical Solutions (Mountain View, California, USA). Its B-scan images are captured by a frame-grabber card in the central workstation.

Gamma Probe. The equipment to detect the radioactive tracers is a standard gamma probe attached to a NodeSeeker control unit, both by IntraMedical Imaging LLC (Los Angeles, California). The gamma probe is equipped with a custom-made, external collimator to reduce its field of view (FOV) to a cone of 0.1 [rad] opening. This value was chosen to allow a small FOV while still preserving a reasonable sensitivity to the chosen radiation. The measured sensitivity is 0.75 [cps/kBq] for a point source of Tc-99m located on the axis of the probe at a distance of 5 [cm]. The energy window was set to 20% centered at 140 [keV], ideal for Tc-99m.
2.2 System Calibration
Ultrasound. For US calibration, i.e. the determination of the US plane's pixel size and the rigid transformation from the US plane to the tracking target, we adopted the single-wall calibration method of Prager et al. [7], using a nylon membrane as proposed by Langø [8]. Several positions and orientations of the probe's tracking target as well as corresponding images of the membrane inside a water bath are acquired synchronously. The lines corresponding to the planar membrane are automatically segmented and used for the computation of all calibration parameters. As suggested by Treece et al. [9], we determine the temporal offset between US acquisition and tracking for better data synchronization. Furthermore, we adopted their calibration protocol to ensure numeric stability for all degrees of freedom of the transformation. To determine the US calibration accuracy, a tracked pointer with tip coordinates given in the tracking coordinate system was submerged into the water bath. Its tip was segmented manually in 5 regions of the US plane, which was repeated for 4 poses of the probe differing from the ones used during calibration. The pointer tip's coordinates were transformed into the US plane coordinate system and compared to the segmented tip coordinates (scaled to millimeters). An rms error of 1.17 [mm] ± 0.40 [mm] and a maximum error of 1.58 [mm] were obtained.

Gamma Probe. The calibration of the nuclear probe consists of two steps. First, the rigid transformation from the tracking target to the sensor (the scintillator crystal) is determined. As the probe is symmetric about its central axis, this can be exploited by designing a tracking target along this axis (figure 1). Thus the
Fig. 2. (a) Schematic representation of the FOV of the gamma probe. The background color represents the measured sensitivity of the probe at the different positions (the darker the higher the sensitivity). The red and blue lines define the angle of aperture and the height of the cone respectively. (b) Liver used for experiments. The wax spheres were implanted before the experiment, some of them were radioactive. During the experiment test persons mark the detected hot nodules in the liver with needles.
problem reduces to determining the distance between the sensor and the origin of the tracking target, which can be fixed by construction. In a second step, the FOV of the probe is measured, which can be approximated by a cone. The main calibration task is to determine the height of the cone and its angle of aperture. To this end, a point source of radioactivity is placed at known relative distances and angles from the sensor and the mean count rate is stored. Hence both the maximum distance and the maximum angle at which a significant count rate is detected are determined (figure 2(a)). The criterion chosen was a count rate at least 2.5 times higher than the local background, which is commonly used to identify lesions in nuclear medicine [10].
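The second calibration step can be sketched as follows: given count-rate measurements over a grid of source positions (distance along and angle from the probe axis), the detectable region is taken as the positions whose rate is at least 2.5 times the background, and the cone height and angle of aperture are the largest such distance and angle. The measurement grid and the count-rate model used to generate it here are hypothetical.

```python
import numpy as np

def fit_fov_cone(distances, angles, rates, background, factor=2.5):
    """Cone height [mm] and angle of aperture [rad] of the region where
    the count rate exceeds `factor` times the local background."""
    detectable = rates >= factor * background
    if not np.any(detectable):
        return 0.0, 0.0
    return float(distances[detectable].max()), float(angles[detectable].max())

# Hypothetical calibration sweep of a point source in front of the probe
d, a = np.meshgrid(np.linspace(5.0, 100.0, 20),   # distance [mm]
                   np.linspace(0.0, 0.30, 16))    # angle off axis [rad]
rates = 50.0 * np.exp(-d / 40.0) * np.exp(-(a / 0.05) ** 2) + 1.0
height, aperture = fit_fov_cone(d.ravel(), a.ravel(), rates.ravel(),
                                background=1.0)
```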
2.3 Visualization
The visualization comprises an augmentation of nuclear information on the IOUS image (figure 3(a)) as well as a 3D visualization of the relative position and orientation of both probes in an augmented reality mode (figure 3(b)). In both cases proper synchronization is needed. For this purpose we have used and extended a framework capable of integrating and synchronizing several data streams for augmented reality applications [11]. For the overlay of nuclear information on the B-mode IOUS plane, the intersection of the plane with the FOV of the nuclear probe needs to be calculated. This can be achieved by intersecting the quadric that represents the FOV of the nuclear probe (in this case a cone) with the plane of the IOUS, resulting in a conic section. Its parameters are calculated, and whenever the conic section is closed (i.e. its eccentricity < 1), it is augmented on the IOUS image (figure 3(a)). This is done in real-time using the tracking system to acquire the pose of both the axis of the nuclear probe and the plane of the IOUS probe. The color of the ellipse of intersection depends on the nuclear readings.
Fig. 3. Different visualization modes of the navigation: (a) augmented IOUS plane; (b) augmented reality view
It is visualized in red when the reading is above 2.5 times the background; otherwise it is drawn in green. The 3D visualization is achieved by projecting the IOUS plane and the cone that represents the FOV of the probe onto the image of an augmented camera. For this, a tracked, calibrated camera (e.g. a laparoscope camera) is used (figure 3(b)). The color of the cone is also encoded in red and green depending on the nuclear activity, as described above for the conic section.
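The overlay geometry can be sketched as follows, assuming the probe poses come from the tracking system. A cone with apex a, unit axis n and half-angle θ satisfies ((x − a)·n)² = cos²θ·|x − a|², i.e. (x − a)ᵀM(x − a) = 0 with M = nnᵀ − cos²θ·I; substituting the plane parametrisation x = p0 + u·e1 + v·e2 yields a 2D conic in (u, v), which is closed (an ellipse) exactly when B² − 4AC < 0. The numeric example is illustrative.

```python
import numpy as np

def cone_plane_conic(apex, axis, half_angle, p0, e1, e2):
    """Coefficients (A, B, C, D, E, F) of the conic
    A u^2 + B u v + C v^2 + D u + E v + F = 0 obtained by intersecting
    the FOV cone with the plane x = p0 + u e1 + v e2."""
    n = axis / np.linalg.norm(axis)
    M = np.outer(n, n) - np.cos(half_angle) ** 2 * np.eye(3)
    w = p0 - apex
    return (e1 @ M @ e1, 2.0 * (e1 @ M @ e2), e2 @ M @ e2,
            2.0 * (w @ M @ e1), 2.0 * (w @ M @ e2), w @ M @ w)

def is_closed(A, B, C, *_):
    """Closed conic section (ellipse, eccentricity < 1) iff B^2 - 4AC < 0."""
    return B * B - 4.0 * A * C < 0.0

# Probe looking straight down onto the US plane z = 0 from 50 mm above;
# a half-angle of 0.05 rad corresponds to the 0.1 rad cone opening.
coeffs = cone_plane_conic(apex=np.array([0.0, 0.0, 50.0]),
                          axis=np.array([0.0, 0.0, -1.0]),
                          half_angle=0.05,
                          p0=np.zeros(3),
                          e1=np.array([1.0, 0.0, 0.0]),
                          e2=np.array([0.0, 1.0, 0.0]))
print(is_closed(*coeffs))   # True: augment the ellipse on the IOUS image
```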
3 Experiments
For the experimental evaluation of the system, a group of nuclear medicine doctors and technicians (4 each) was asked to find abnormalities in several cow livers using IOUS either exclusively or in combination with the nuclear probe. This setup simulates a clinical open liver metastasis resection. Radioactive and non-radioactive wax spheres [12] were implanted into the livers in advance, simulating malignant and benign metastases, respectively. The diameters of the wax 'tumors' varied from 7 to 9 [mm]. They were implanted from behind, so no scars were visible from the top (figure 2(b)). The concentration of the radioisotope Tc-99m used to label 'malignant tumors' was 50 [kBq/ml]. In each liver 10 'lesions' were implanted, half of them radioactive. Once implanted with the wax 'lesions', the livers were scanned in a PET/CT device (Biograph 16 PET/CT, Siemens Medical Solutions, Erlangen, Germany) to provide 'pre-operative' diagnostic data. Before the wax spheres were formed, F-18 at a concentration of 20 [kBq/ml] was added to the radioactive wax mixture in order to enable detection by PET. Three variants of lesion localization and classification were compared:
A) Use of pre-operative PET/CT images and IOUS.
B) Use of pre-operative PET/CT images and IOUS aided by gamma probe readings (no visualization).
C) Use of pre-operative PET/CT images and IOUS aided by gamma probe readings, employing the system described in section 2 for navigation of the probes and visualization.
The first method represents the current standard in clinical use, while the second and third methods add a gamma probe as a second modality to improve accuracy and classification results. Method C also adds navigation and visualization to guide the test person and further improve the quality of the results. After an explanation of how the modalities to be evaluated work and what was expected from them, the test persons were allowed to study the 'pre-operative' diagnostic PET/CT images. They were then asked to localize the metastases in 3 differently marked livers, using one of the variants A, B or C for each liver, respectively. Palpation and further evaluation of the pre-operative images during the procedure were allowed in all variants. The parameters observed were the total amount of time needed to localize the metastases and the success of the localization in terms of positional accuracy, detection of abnormalities, and their characterization. Furthermore, a questionnaire was filled out by each subject.
4 Results
On the one hand, a significant difference in the performance of variant A with respect to the gamma-aided variants B and C is shown (cf. table 1). On the other hand, the difference between the latter two is not statistically relevant. Both gamma-aided variants achieve high sensitivity and specificity values (97.5% to 100% for both), whereas variant A presents significant amounts of false positives and false negatives (sensitivity 80.5%, specificity 82.1%). This result holds for doctors and technicians, the latter achieving slightly better results, which is probably due to the lack of expertise with the devices and thus a more thorough use of them (mirrored in the time taken, see below).

Table 1. Performance evaluation (evaluated lesions)

                 A             B             C
            True  False   True  False   True  False
Positive      33      7     39      0     39      1
Negative      32      8     40      1     39      1
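The sensitivity and specificity figures quoted above follow directly from the counts in Table 1:

```python
counts = {  # variant: (true pos., false pos., true neg., false neg.)
    'A': (33, 7, 32, 8),
    'B': (39, 0, 40, 1),
    'C': (39, 1, 39, 1),
}
for variant, (tp, fp, tn, fn) in counts.items():
    sensitivity = tp / (tp + fn) * 100.0
    specificity = tn / (tn + fp) * 100.0
    print(f'{variant}: sensitivity {sensitivity:.1f}%, '
          f'specificity {specificity:.1f}%')
# A: 80.5% / 82.1%;  B: 97.5% / 100.0%;  C: 97.5% / 97.5%
```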
When analyzing the time needed for the task, the results show a clear trend: The non-integrated gamma-aided variant B required 30 ± 20% more time than the pure US variant A (13.6 ± 4.1 [min]), whereas the navigated variant took 70 ± 30% longer. Although both novel variants perform equally well statistically, the qualitative evaluation shows that the inclusion of navigation made the subjects feel more confident about their results (3.0 ± 0.7 vs. 4.0 ± 0.7 and 4.4 ± 0.5 for variants A,
B, and C, respectively, on a scale from 1 to 5 for the question 'Do you feel confident about the method?'; similar results were obtained for the question 'Do you feel confident about the technology?'). These results were equally distributed between physicians and technical personnel. Effects of a learning curve were not evaluated and are part of current research. A common opinion expressed by the test persons was that without the aid of PET/CT diagnostic data, both non-navigated variants A and B would not allow precise localization and classification, in contrast to the navigated one (C). This is of high interest, as in typical clinical cases the PET/CT images are often not recent but up to several weeks old. During this interval, the anatomy may change considerably, especially if the metastases undergo treatment. Thus a more realistic test setup should consider changes in the malignancy of the metastases (i.e. the radioactivity concentration) and anatomical changes like the appearance of new disease foci. In that sense, the performance for variants A and B has to be considered a best case.
5 Discussion
Standard localization of liver metastases does not use real-time functional information during the procedure (beyond palpation). Including intra-operative nuclear probes in the procedure would slightly change the current workflow, increase costs (radionuclides), and introduce a small dose of radioactivity into the patient (< 1 [mSv]). However, these drawbacks should be compensated by the reduction of false positives, as resection poses a bigger burden on patients. In this implementation, low-energy cancer tracers [3, 5] are assumed, since they require small and thus light collimators (for energies below ≈ 250 [keV], a couple of millimeters of lead are sufficient for shielding). Unfortunately, this excludes several successful cancer tracers like F-18-FDG, due to their high-energy nature (energies above 400 [keV], where some centimeters of lead are needed for effective shielding). However, new electronically collimated probes [13] are enabling the construction of light and handy devices with a small field of view for high-energy gamma detection, which will allow the use of F-18-FDG and others in the near future. The choice of patients who would qualify for this treatment is also an issue. Our current clinical studies show that patients who present hot spots in pre-operative images also present them intra-operatively. The converse, however, is not valid; if it were, the system could be used in more patients. Better selection criteria are not yet available. Rigid mechanical coupling of the IOUS and gamma probes might serve as a simpler solution for the determination of the relative pose of both probes. The tracked variant, however, is more flexible and easier to handle, as it allows the surgeon to independently position the probes as desired and to have closer access to the patient's anatomy, which improves the quality of both the ultrasound images and the nuclear readings. Furthermore, as a next development step the proposed
system will also be extended to laparoscopic probes, where mechanical coupling of both probes is almost impossible due to space restrictions. The superior performance of the gamma-aided methods over the gold standard in this work promises further developments and suggests their evaluation in animal trials. The reliance of the non-navigated approaches on the PET/CT images highlights a strength of the navigated approach using both modalities, namely robustness and flexibility in cases where pre-operative diagnostic images are outdated or not easily accessible, as is often the case in the clinical setting. Moreover, due to the real-time data acquisition and visualization, dynamic changes are displayed immediately, negating the impact of deformations. Finally, the proposed system can easily be extended to also display therapeutic instruments such as biopsy needles, RF ablation devices and others in the visualization. By including this feature the system can become a therapeutic tool.
6 Conclusions
A combination of navigation, IOUS imaging, and nuclear labeling and detection is feasible. It will certainly be a strong triplet in the future of localization of liver metastases. Furthermore, it may serve as an example toward intra-operative real-time navigation combining both functional and anatomical imaging, and thus toward a therapy in which all available information is used intelligently for the benefit of the patient.
References

1. Fong, Y., Fortner, J., Sun, R.L., Brennan, M.F., Blumgart, L.H.: Clinical score for predicting recurrence after hepatic resection for metastatic colorectal cancer: analysis of 1001 consecutive cases. Ann. Surg. 230(3), 309–321 (1999)
2. Konopke, R., Kersting, S., Bergert, H., Bloomenthal, A., Gastmeier, J., Saeger, H.D., Bunk, A.: Contrast-enhanced ultrasonography to detect liver metastases: A prospective trial to compare transcutaneous unenhanced and contrast-enhanced ultrasonography in patients undergoing laparotomy. Int. J. Colorectal Dis. 22(2), 201–207 (2007)
3. Robinson, P.J.: Imaging liver metastases: current limitations and future prospects. Br. J. Radiol. 73(867), 234–241 (2000)
4. Froehlich, A., Diederichs, C.G., Staib, L., Vogel, J., Beger, H.G., Reske, S.N.: Detection of liver metastases from pancreatic cancer using FDG PET. J. Nucl. Med. 40(2), 250–255 (1999)
5. Schillaci, O., Spanu, A., Scopinaro, F., Falchi, A., Danieli, R., Marongiu, P., Pisu, N., Madeddu, G., Fave, G.D., Madeddu, G.: Somatostatin receptor scintigraphy in liver metastasis detection from gastroenteropancreatic neuroendocrine tumors. J. Nucl. Med. 44(3), 359–368 (2003)
6. Wendler, T., Traub, J., Ziegler, S., Navab, N.: Navigated three dimensional beta probe for optimal cancer resection. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 561–569. Springer, Heidelberg (2006)
7. Prager, R., Rohling, R., Gee, A., Berman, L.: Rapid calibration for 3-D freehand ultrasound. Ultrasound in Medicine and Biology 24(6), 855–869 (1998)
8. Lango, T.: Ultrasound Guided Surgery: Image Processing and Navigation. PhD thesis, Norwegian University of Science and Technology (2000)
9. Treece, G.M., Gee, A.H., Prager, R.W., Cash, C.J.C., Berman, L.H.: High-definition freehand 3-D ultrasound. Ultrasound in Medicine and Biology 29(4), 529–546 (2003)
10. Shreve, P.D., Anzai, Y., Wahl, R.L.: Pitfalls in oncologic diagnosis with FDG PET imaging: physiologic and benign variants. Radiographics 19(1), 61–77 (1999)
11. Sielhorst, T., Feuerstein, M., Traub, J., Kutter, O., Navab, N.: CAMPAR: A software framework guaranteeing quality for medical augmented reality. International Journal of Computer Assisted Radiology and Surgery 1(suppl. 1), 29–30 (2006)
12. Bazanez-Borgert, M., Bundschuh, R.A., Herz, M., Martinez, M.J., Schwaiger, M., Ziegler, S.I.: Radioactive spheres without inactive wall for lesion simulation in PET. Z. Med. Phys. (in press)
13. Meller, B., Sommer, K., Gerl, J., von Hof, K., Surowiec, A., Richter, E., Wollenberg, B., Baehre, M.: High energy probe for detecting lymph node metastases with 18F-FDG in patients with head and neck cancer. Nuklearmedizin 45(4), 153–159 (2006)
Fast and Robust Analysis of Dynamic Contrast Enhanced MRI Datasets

Olga Kubassova1, Mikael Boesen2, Roger D. Boyle1, Marco A. Cimmino3, Karl E. Jensen4, Henning Bliddal2, and Alexandra Radjenovic5

1 School of Computing, University of Leeds, UK
2 The Parker Institute, Frederiksberg Hospital, Frederiksberg, Denmark
3 University of Genoa, Genoa, Italy
4 Rigshospitalet, Department of Radiology, MRI Division, Copenhagen, Denmark
5 Academic Unit of Medical Physics, University of Leeds, Leeds General Infirmary, Leeds, UK
Abstract. A fully automated method for quantitative analysis of dynamic contrast-enhanced MRI data acquired with low and high field scanners, using spin echo and gradient echo sequences, depicting various joints is presented. The method incorporates efficient pre-processing techniques and a robust algorithm for quantitative assessment of dynamic signal intensity vs. time curves. It provides differentiated information to the reader regarding areas with the most active perfusion and permits depiction of different disease activity in separate compartments of a joint. Additionally, it provides information on the speed of contrast agent uptake by various tissues. The method delivers objective and easily reproducible results, which have been favourably viewed by a number of medical experts.
1 Introduction
Rheumatoid arthritis (RA) is an inflammatory disease associated with pathological alteration of the microcirculation within the synovial lining of the diarthrodial joints. It is an extremely painful condition that can result in severe disability. More than 1 per cent of the adult population suffers from RA [1], corresponding to several million people in the USA and about 0.5 million in the UK. Dynamic Contrast Enhanced Magnetic Resonance Imaging (DCE-MRI) is a technique that provides information about tissue vascularity, perfusion and capillary permeability [2, 3] and has evolved as an important method for evaluating various diseases of the musculoskeletal system. Temporal changes of the DCE-MRI signal intensity during and immediately after a bolus injection of a contrast agent reflect underlying changes in the local concentration of the contrast agent, which are proportional to the extent of tissue inflammation.
Quantitative analysis of DCE-MRI data is needed for disease diagnosis and for evaluating progress after treatment. The analysis can be performed using pharmacokinetic [4] or heuristic methods [5, 6]. The latter quantify disease progression in terms of heuristic parameters, such as the maximum rate of enhancement (ME), the initial rate of enhancement (IRE), and the time of onset of enhancement (Tonset). These parameters are extracted by examining individual signal intensity vs. time curves derived from user-defined regions of interest (ROI) or on a voxel-by-voxel basis. Currently, dynamic curves are derived from an approximately 2-3mm2 ROI positioned in the area of maximal visual enhancement [7]. Measurements of IRE and ME contain both spatial and temporal information, making the results vulnerable to the size and position of the ROI [8]. Voxel-by-voxel analysis does not depend on ROI position and is therefore objective and suitable for automation; however, such methods [9] do not fully utilise the available information and cannot cope with the high degree of noise in dynamic frames. In this paper we propose a method for automated and efficient extraction of heuristics from signal intensity vs. time curves, making their estimation robust to the subjective opinion of the operator and to noise effects. This method has been tested on a number of DCE-MRI studies acquired by high and low field scanners and has shown robust and efficient performance.
2 Materials and Methods

2.1 Patients
The following abbreviations will be used: TR – repetition time; TE – echo time; FOV – field of view; GRE – gradient echo sequence; SE – spin echo sequence; TS – total scanning time after contrast; Gd-DTPA – gadolinium diethylene triamine pentacetic acid.

High field MRI: 10 patients with active RA in the metacarpophalangeal joints (MCPJs) were evaluated. Data was acquired in the axial direction (Fig. 1, left) with a high field 1.5T MRI scanner (Gyroscan ACS NT, Philips Medical Systems, Best, The Netherlands), using a T1 weighted spoiled GRE sequence, TR/TE: 14/3.8; FOV/imaging matrix: 100 × 200mm / 128 × 256, slice thickness 3mm. Dynamic sequences of 20 GRE images pre-positioned in 6 planes were acquired every 7.1 s; TS = 142 s. Patients were positioned prone, with the arm extended in front of the head and a linear circular 11cm diameter surface coil placed on the dorsum of the hand.

Low field MRI: 23 patients with active RA, 4 controls, and 1 patient with no RA but occult wrist pain were scanned with a 0.2T musculoskeletal dedicated extremity scanner (E-scan, Esaote Biomedica, Genoa, Italy) (Fig. 1, mid., right). The patients were examined in the supine position with the hand along the side of the body. Following the Gd-DTPA injection (0.2mmol/kg of body weight), the dynamic sequence was performed, resulting in 22-30 consecutive fast SE (TR/TE 100/16, FOV/imaging matrix 150×150/ 160×128), or GRE
Fig. 1. From the left: post-contrast MR image of MCPJs (high field scanner), post-contrast MR images acquired by the low field scanner from the wrist in the axial direction through the first carpal row (SE sequence) and in the coronal direction through the wrist joint (GRE sequence)
images (TR/TE 60/6, FOV/imaging matrix 160×160mm/ 256×128) in three pre-positioned planes every 10-15 s, slice thickness 4mm in the coronal plane and 5mm in the axial plane; TS = 300 s.

2.2 Quantitative Data Analysis
The behaviour of signal intensity vs. time curves may be explained by the underlying phases of the data acquisition. Starting from a baseline, the perfused tissues absorb the contrast agent and their intensity climbs (wash-in phase); it usually increases up to a certain point and then exhibits a plateau (of variable width) followed by a wash-out phase (gradual signal intensity decrease). We propose to use this knowledge of the underlying temporal pattern of the contrast agent uptake to classify and approximate signal intensity vs. time curves normalised over a baseline (Î), as an aid to noise reduction. We apply linear piece-wise approximation; based on the phases of data acquisition observed for DCE-MRI studies involved in RA assessment, we distinguish 4 models (M0–M3).

M0 – negligible enhancement. Some tissues within cortical and trabecular bone, inactive joints, skin and disease-unaffected areas do not absorb Gd-DTPA and are not expected to show intensity enhancement in the later frames of temporal slices.

M1 – base/wash-in. There is often a proportion of curves for which the maximal intensity has not been reached by the end of the scanning procedure, indicating constant leakage into the locally available extra-cellular space. The Gd-DTPA absorption and the enhancement of the signal intensity vs. time curves continue after the scanning has been completed.

M2 – base/wash-in/plateau. Full absorption of the Gd-DTPA by the tissues.

M3 – base/wash-in/plateau/wash-out. The wash-out phase is observed at the end of the scanning procedure.

To decide on the 'best' model for a particular voxel, we first estimate a noise distribution in the temporal slice (the in-slice noise distribution) by approximating
the first 3 values of Î (unaffected by any enhancement) and the signals corresponding to voxels within the bone interiors by a constant (the local signal mean), with the variations being explicit noise measurements. We have illustrated that the aggregation of these two noise distributions can be taken as an in-slice noise model [10]. Then, individual Î are approximated by each of the M0–M3 shapes in a least-squares sense; each such 'fit' implies a number of noise measurements, which are compared, using the Kolmogorov-Smirnov statistic, to the in-slice noise distribution. The model in which we have the highest confidence (according to the statistical test) is chosen. Note that we are interested in matching noise distributions and not in minimising the noise observations; the latter would always preclude the simpler models M0–M2 in favour of M3. Î curves derived from the temporal slices acquired with the low field scanner and the corresponding models are shown in Fig. 2.
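A simplified sketch of the model selection is given below; the brute-force breakpoint search, the use of independent linear pieces and the choice of the model with the largest Kolmogorov-Smirnov p-value are illustrative simplifications of the procedure described above (M2 and M3 would enumerate pairs and triples of breakpoints analogously).

```python
import numpy as np
from scipy.stats import ks_2samp

def fit_segments(t, y, breaks):
    """Residuals of a least-squares fit of independent linear pieces
    between the given interior breakpoints."""
    res = np.empty(len(y), dtype=float)
    edges = [0, *breaks, len(y)]
    for lo, hi in zip(edges[:-1], edges[1:]):
        deg = 1 if hi - lo > 1 else 0
        coef = np.polyfit(t[lo:hi], y[lo:hi], deg)
        res[lo:hi] = y[lo:hi] - np.polyval(coef, t[lo:hi])
    return res

def best_model(t, y, noise_samples):
    """Pick the model whose fit residuals best match the in-slice noise
    distribution under the two-sample Kolmogorov-Smirnov test."""
    candidates = {'M0': y - y.mean()}
    one_break = [fit_segments(t, y, [b]) for b in range(2, len(y) - 2)]
    candidates['M1'] = min(one_break, key=lambda r: float((r ** 2).sum()))
    # 'M2'/'M3': search pairs/triples of breakpoints in the same way.
    return max(candidates,
               key=lambda m: ks_2samp(candidates[m], noise_samples).pvalue)
```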
Fig. 2. Estimation of heuristics for each approximation model M0–M3. ME has not been reached for model M1. ξ is the slope of Î that can be taken as IRE. ME, Tonset and IRE are not defined for M0.
When each voxel is approximated by the 'best' model, we estimate the heuristics from the approximations rather than from the raw data (as shown in Fig. 2) and plot parametric maps (PMs) that depict activation events. PMs of ME constructed for the temporal slices corresponding to the images in Fig. 1 are shown in Fig. 4; each PM is accompanied by a Gd-DTPA uptake map (each voxel is colour-coded according to the model it has assumed: M0 – no colour, M1 – red, M2 – green, M3 – blue). Due to the pre-processing techniques applied [11, 12], artefactual enhancement has been minimised in PMs constructed with this technique; the PMs show preferable characteristics such as sharpness, permitting easier differentiation of the structures of interest. Voxels allocated within blood vessels usually assume M3, indicating the presence of the wash-out phase; an affected area is split into several clusters of blue and green, and some areas are coloured red, which allows identification of tissues at which the signal intensity did not peak during the acquisition of the DCE-MRI data. This image allows assessment of the Gd-DTPA kinetics and classification of tissue behaviour based on its temporal pattern of contrast agent uptake. The enhanced voxel count Ntotal is used as a statistic for quantification of inflammation.
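A sketch of the map construction is given below, assuming each voxel has already been assigned a fitted curve and a model label; the heuristic definitions (ME as the peak of the approximated curve, IRE as the wash-in slope, Tonset as the first breakpoint) and the simplified Ntotal denominator are illustrative assumptions.

```python
import numpy as np

MODEL_COLOURS = {'M0': None, 'M1': 'red', 'M2': 'green', 'M3': 'blue'}

def heuristics(t, y_fit, onset_idx):
    """ME, IRE and Tonset from an approximated (noise-free) curve; note
    that for M1 curves the true ME lies beyond the scanned interval."""
    peak_idx = int(np.argmax(y_fit))
    me = float(y_fit[peak_idx])
    t_onset = float(t[onset_idx])
    ire = (y_fit[peak_idx] - y_fit[onset_idx]) / max(t[peak_idx] - t_onset,
                                                     1e-9)
    return me, float(ire), t_onset

def parametric_maps(fits, shape):
    """fits: {(row, col): (t, y_fit, onset_idx, model)}. Returns the ME
    map, the Gd-DTPA uptake (model) map, and Ntotal, here simplified to
    the enhanced fraction of the whole map rather than of the joint
    exterior used in the paper."""
    me_map = np.zeros(shape)
    model_map = np.empty(shape, dtype=object)
    for (r, c), (t, y_fit, onset_idx, model) in fits.items():
        model_map[r, c] = MODEL_COLOURS[model]
        if model != 'M0':
            me_map[r, c], _, _ = heuristics(t, y_fit, onset_idx)
    n_total = np.count_nonzero(me_map) / me_map.size
    return me_map, model_map, n_total
```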
3 Results and Discussion
The algorithm for quantitative analysis of DCE-MRI data has been tested on datasets acquired by high and low field scanners, using SE and GRE sequences, depicting different joints. The disease activity was scored according to the OMERACT RAMRIS [13] evaluation standard for synovitis, bone marrow edema and erosions. Patients scanned with the low field MRI scanner had a reference fluid in a 10ml Plexiglas container, attached along the ulnar long axis of the hand. This was used as a reference for the dynamic calculation. We have considered 33 patients with active RA (129 temporal slices), 4 healthy controls (12 temporal slices) and 1 patient with no RA (3 temporal slices) but suffering from wrist pain. The heuristics ME and IRE and the pixel count Ntotal computed with the new technique are significantly higher in the patients with active RA compared to the controls (Fig. 3). In a temporal slice acquired from the
Fig. 3. Box-and-whisker plots of ME (left) and Ntotal (right) for patients with active RA and healthy controls; N is the number of patients
Fig. 4. 1st row: PMs of ME for patients with active RA; data from the high field scanner (a); low field scanner, SE sequence (b); low field scanner, GRE sequence (c); PM of ME for a healthy control (d). The colour coding here considers ME and plots lower values in red, moving to yellow then white as the values increase. 2nd row: corresponding Gd-DTPA uptake maps.
Fig. 5. Parametric map of ME constructed for a patient re-scanned 3 times after injection of the steroid
Fig. 6. Pre- and post-contrast images; PM of ME and Gd-DTPA uptake map for the patient with no RA
patients with active RA by the high and low field scanners, ME is on average between 2.5 and 4, with the maximum reaching 14, and Ntotal (computed as the ratio between the number of active voxels and the total number of voxels within the joints' exterior) is between 0.3 and 0.4 (depending on the degree of inflammation); for healthy controls, ME is on average lower than 1.2 with a maximum at 2, and Ntotal ∈ [0, 0.05]. These numbers correlate highly with the number of swollen joints. We have analysed two patients with follow-up scans two weeks after intra-articular glucocorticoid injection; one scanned in the axial plane and the other in the coronal plane. PMs constructed for one of the patients suggest diminished perfusion in the visible pannus (an expected treatment effect). Such information is not available with conventional T1 weighted images after Gd-DTPA contrast or in ultrasound (US) examination, where the patient had high and unchanged MRI synovitis and US colour fraction (inflammation) scores at follow-up. Another patient had short clinical relief, but relapsed a few days after each injection with pain and discomfort in the wrist. US as well as static MRI showed no treatment response. This information is reflected by the PMs constructed for the follow-up examinations (Fig. 5).
A frequent problem is disease assessment in healthy controls. Usually post- and pre-contrast images are compared and the ROI is chosen presumptively based on anatomical landmarks. One of the subjects participating in our experiments had no RA, but suffered from occult wrist pain – possibly due to a ganglion in the wrist joint, which we could not find in the post-contrast sequences. Pre- and post-contrast images and the PMs of ME and Gd-DTPA uptake constructed for this subject are shown in Fig. 6. Visual and quantitative inspection of the results obtained with this method allows us to conclude that the patient does not have an inflammatory arthritis, although some tissues exhibit a reaction to the contrast agent (Ntotal = 0.06, ME is on average 1.3 with a maximum at 2.4); healthy controls normally have less than 0.02 enhanced pixels. This corresponds well to the fact that ultrasound Doppler showed mild colour Doppler activity in the wrist, leading to the conclusion that the patient suffered from a mild unspecific irritation of the wrist. Our results show that the new method permits depiction of different disease activity in separate compartments of a joint. The method delivers more differentiated information to the reader regarding the areas with the most active perfusion. This information is not currently achievable from static T1 weighted images, but can to some degree be obtained from US Doppler images. US, on the other hand, has the disadvantage of inter-observer variations as well as problems with spatial orientation between examinations. The distribution of the coloured pixels in the PMs gives reliable information in the relevant and expected areas of high inflammatory activity. Based on our experience these areas are, in hierarchical order from most to least affected: the radial and ulnar areas of the proximal part of the wrist, around the ulnar head, and to a lesser degree within the inter-carpal and carpo-metacarpal areas. Thus the PMs give the reader information on the areas where the patient has the most perfusion, and provide more comprehensive information on localised disease activity. This is valuable for disease diagnosis and for guidance of intra-articular therapy. By analysing several corresponding slices, the reader gets a 3D impression of the inflammatory distribution in the imaged area. It remains to be tested which image plane (axial or coronal) and sequence type (SE or GRE) provide the best platform to evaluate the patient's treatment response in the short and long term. The proposed method seems to be well suited for such judgements.
4 Conclusion
Among the randomly chosen patient cohort in this pilot study, PMs are noticeably different, corresponding to our clinical and imaging experiences. Reduction in joints’ inflammation has been detected accurately, even when it was not clinically present. Our results demonstrate that the method is sensitive and may be useful in the diagnosis and follow-up examinations of patients who are receiving disease-controlling treatment.
The method provides a numeric result upon which clinical and research decisions can be confidently made. It is objective and user-independent, which permits generation of easily reproducible and reliable results. Moreover, it allows visual assessment of inflammation and Gd-DTPA uptake previously unavailable. The possibility of acquisition of the data from low field MRI further extends the usability of the method as low field scanners are more patient friendly and costs are far lower compared to high field scans. Standardisation of DCE-MRI assessment techniques is crucial and further research will focus on finding correlation between the treatment response seen with different modalities (US, static MRI and dynamic MRI (ROI approach)) and this technique.
References

1. Silman, A.J., Pearson, J.E.: Epidemiology and genetics of rheumatoid arthritis. Arthritis Research 4(3), S265–S272 (2002)
2. Verstraete, K.L., Deene, Y.D., Roels, H., Dierick, A., Uyttendaele, D., Kunnen, M.: Benign and malignant musculoskeletal lesions: dynamic contrast-enhanced MR imaging – parametric 'first-pass' images depict tissue vascularization and perfusion. Radiology 192(3), 835–843 (1994)
3. Panting, J.R., Gatehouse, P.D., Yang, Z.G., Jerosch-Herold, M., Wilke, N., Firmin, D.N., Pennell, D.J.: Echo-planar magnetic resonance myocardial perfusion imaging: Parametric map analysis and comparison with thallium SPECT. Magnetic Resonance Imaging 13(4), 192–200 (2001)
4. Tofts, P.S.: Modelling tracer kinetics in dynamic Gd-DTPA MR imaging. Journal of Magnetic Resonance Imaging 7(1), 91–101 (1997)
5. d'Arcy, J.A., Collins, D.J., Rowland, I.J., Padhani, A.R., Leach, M.O.: Applications of sliding window reconstruction with Cartesian sampling for dynamic contrast enhanced MRI. NMR in Biomedicine 15(2), 174–183 (2002)
6. Kuhl, C.K., Mielcareck, P., Klaschik, S., Leutner, C., Wardelmann, E., Gieseke, J., Schild, H.H.: Dynamic breast MR imaging: Are signal intensity time course data useful for differential diagnosis of enhancing lesions? Radiology 211(1), 101–110 (1999)
7. Cimmino, M.A., Innocenti, S., Livrone, F., Magnaguagno, F., Silvesti, E., Garlaschi, G.: Dynamic gadolinium-enhanced MRI of the wrist in patients with rheumatoid arthritis. Arthritis and Rheumatism 48(5), 674–680 (2003)
8. McQueen, F.: Comments on the article by Cimmino et al.: Dynamic gadolinium-enhanced MRI of the wrist in patients with rheumatoid arthritis. Arthritis and Rheumatism 50(2), 674–680 (2004)
9. Radjenovic, A.: Measurement of physiological variables by dynamic Gd-DTPA enhanced MRI. PhD thesis, School of Medicine, University of Leeds (2003)
10. Kubassova, O., Boyle, R.D., Radjenovic, A.: Novel method for quantitative evaluation of segmentation outputs for dynamic contrast-enhanced MRI data in RA studies. In: Joint Disease Workshop, Medical Image Computing and Computer Assisted Intervention, vol. 1, pp. 72–79 (2006)
11. Kubassova, O., Boyle, R.D., Pyatnizkiy, M.: Bone segmentation in metacarpophalangeal MR data. In: 3rd International Conference on Advances in Pattern Recognition, Bath, UK, vol. 2, pp. 726–735 (2005)
12. Periaswamy, S., Farid, H.: Medical image registration with partial data. Medical Image Analysis 10(3), 452–464 (2006)
13. Østergaard, M., Peterfy, C., Conaghan, P., McQueen, F., Bird, P., Ejbjerg, B., Shnier, R., O'Connor, P., Klarlund, M., Emery, P., Genant, H., Lassere, M., Edmonds, J.: OMERACT rheumatoid arthritis magnetic resonance imaging studies. Core set of MRI acquisitions, joint pathology definitions, and the OMERACT RAMRIS scoring system. Rheumatology 30(6), 1385–1386 (2003)
Functional Near Infrared Spectroscopy in Novice and Expert Surgeons – A Manifold Embedding Approach

Daniel Richard Leff, Felipe Orihuela-Espina, Louis Atallah, Ara Darzi, and Guang-Zhong Yang

Royal Society/Wolfson Medical Image Computing Laboratory & Department of Biosurgery and Surgical Technology, Imperial College London, United Kingdom
{d.leff, f.orihuela-espina, l.atallah, a.darzi, g.z.yang}@imperial.ac.uk
Abstract. Monitoring expertise development in surgery is likely to benefit from evaluations of cortical brain function. Brain behaviour is dynamic and nonlinear. The aim of this paper is to evaluate the application of a nonlinear dimensionality reduction technique to enhance visualisation of multidimensional functional Near Infrared Spectroscopy (fNIRS) data. Manifold embedding is applied to prefrontal haemodynamic signals obtained during a surgical knot tying task from a group of 62 healthy subjects with varying surgical expertise. The proposed method makes no assumption about the functionality of the data set and is shown to be capable of recovering the intrinsic low dimensional structure of in vivo brain data. After manifold embedding, Earth Mover’s Distance (EMD) is used to quantify different patterns of cortical behaviour associated with surgical expertise and analyse the degree of inter-hemispheric channel pair symmetry.
1 Introduction

Research evaluating technical skills in surgery is in a current state of flux. Traditional assessments of technical skills based on isolated evaluations of dexterity are slowly being replaced by combined analyses of visuo-motor strategies [1], hand-eye coordination [1] and, more recently, cortical brain function [2]. The motivation for the latter is to elucidate longitudinal performance variations, which may assist the selection and assessment of future surgeons. Cortical regions should be probed to determine whether excitation patterns vary according to technical expertise. This will inform assessments of neuroplasticity obtained through serial experiments. The prefrontal cortex, owing to its role in supporting the acquisition of novel motor skills, is likely to be preferentially recruited in surgical novices [3]. Functional Near Infrared Spectroscopy (fNIRS) is a non-invasive neuroimaging technique that permits evaluation of cortical function in ambulant subjects, enabling complex motor tasks to be evaluated in realistic settings and in real-time. NIR light (700-1000nm) emitted to the scalp is detectable as attenuated light following absorption and scattering on the cortical surface [4]. Attenuated light levels are interpreted as relative changes in oxygenated haemoglobin (HbO2) and deoxygenated haemoglobin (HHb), the two dominant chromophores in cortical vasculature.
typical haemodynamic response to cortical excitation consists of task-induced HbO2 increases coupled to HHb decreases [5]. Multi-channel fNIRS data, with the configuration illustrated in Figure 1 for example, is complex, and its dimensionality d is given by d = c · u · h · s, where c is the number of channels, u the number of users or subjects, h the number of haemoglobin components, and s the number of observations. Dimensionality reduction techniques may assist visualisation of fNIRS data, but brain signal behaviour is nonlinear and dynamic [6], and Principal Component Analysis (PCA) and Multi-dimensional Scaling (MDS) are problematic in capturing the nonlinearities of in vivo data. The purpose of this study is to improve the visualisation of prefrontal haemodynamic signals obtained from novice and expert surgeons through the use of manifold embedding. The method is based on isometric feature mapping, or Isomap, for recovering the intrinsic dimensionality and nonlinear structure of a dataset [7]. The most important feature of the method is that it preserves the geodesic distances between local data points in the input space, which may better reflect their intrinsic similarity than the Euclidean distance. As opposed to studies in which the intrinsic structure of the data is known a priori [8], this study demonstrates the ability of the proposed method to recover meaningful global coordinates of data of unknown dimensionality. Projecting the data into a low dimensional space enables cortical behaviour to be quantitatively compared between groups of observers differing extensively in surgical expertise.
2 Methods

2.1 Subjects and Task Paradigm

Sixty-two healthy, right-handed male subjects were recruited for this study (19 consultant surgeons, 21 surgical trainees, and 22 medical students). Medical students were trained in a single session to perform hand-tied surgical reef knots. The paradigm investigated consists of four throws of a surgical reef knot. Following a period of baseline motor rest (30 seconds) prior to the start of the first stimulus, five experimental blocks were conducted, each consisting of a 'trial' (self-paced surgical reef knot) and a 'post-trial' rest period (20-30 seconds). A previous feasibility study [2] demonstrated an appropriate signal to noise ratio, experiment duration and subject comfort following five task repetitions.

2.2 Functional Neuroimaging and Dexterity Analysis

Optical measurements were acquired using a 24-channel Optical Topography System (ETG-4000, Hitachi Medical Co., Japan). A configuration of two 3×3 probe arrays was used, comprising ten optical fibre sources emitting NIR light (from laser diodes) at 690 and 830 nm and eight optical fibre receivers coupled to avalanche photodiode detectors [9]. Optodes were held within a plastic helmet (inter-optode distance 30 mm), affixed to the participant's head with a surgical bandage. Optodes were placed bilaterally over the prefrontal cortices, positioned according to the International 10-20 system of electrode placement [10], as illustrated in Fig. 1. Detected light intensity data was sampled at 10 Hz. Technical performance was monitored using a surgical assessment device [11], which translates positional data into dexterity measures.
Fig. 1. Schematic illustration of the positions of NIR light sources (dotted circles), detectors (interrupted squares) and locations of corresponding channels numerically labelled (shaded boxes). The approximate 10/10 topographic location of each array is illustrated.
All optical data was processed using the functional Optical Signal Analysis program (fOSA, University College London, UK [12]). Relative changes in light intensity were converted to changes in haemoglobin (HbO2, HHb and their sum, total haemoglobin HbT) by applying the Modified Beer-Lambert Law. Data was baseline corrected, decimated to 1 Hz, and detrended to remove system drift and unrelated physiological signals, then averaged across experimental blocks. To overcome temporal variations in knot-tying durations, data was resampled applying a low pass filter. Final datasets comprised 20 baseline values, 37 trial values, and 20 post-trial rest values for each NIR channel. Data was cleaned by visual assessment by two investigators blinded to each other's results, followed by matching and agreement to eliminate noisy channels. A 74-D feature space F was constructed using the resampled HbO2 and HHb trial values.

2.3 Non-linear Dimensionality Reduction

For manifold embedding, the Isomap algorithm [7] was used, which combines the algorithmic strengths of other dimensionality reduction techniques (global optimality, computational efficiency and guaranteed asymptotic convergence) with the ability to learn a broad class of non-linear manifolds. Unlike other locally linear techniques, Isomap is able to represent the global structure of a dataset within a single coordinate system, which is particularly useful for group-wise comparison. The Isomap embedding proceeds in three stages, beginning with the construction of a nearest-neighbour graph in F space based on Euclidean distances. The two standard approaches are to connect each point to its K nearest neighbours, or to connect all points within a radius ε. These neighbourhood relations are represented in a weighted graph G over the data points, with edges of weight d_G(i, j) between neighbouring points. In the second step, Isomap estimates the geodesic distances d_M(i, j) between all pairs of points on the manifold M by computing their shortest path distances in the graph G; the matrix of the final values, D_G, contains the shortest path distances between all pairs of points in G. Finally, classical MDS is applied to the matrix D_G to construct an embedding of the data in an n-dimensional Euclidean space P that best preserves the intrinsic geometry. MDS uses a function
minimization algorithm which evaluates different possible arrangements of points in the P space, aiming to maximize the goodness of fit. The cost function E used by Isomap is given by the L2 norm of the difference, expressed as:
E = ‖τ(D_G) − τ(D_P)‖    (1)
where D_P is the matrix of Euclidean distances in P space, and τ is an operator that converts the distance matrix to scalar products.

2.4 Earth Mover's Distance (EMD)

For assessing the cortical excitation patterns of different subject groups, EMD is used as a metric for quantitative comparison. EMD is based on the transportation problem [13] and evaluates the dissimilarity between two multidimensional distributions; it has previously been used to analyse visual search behaviours using eye-tracking [14]. Each distribution is composed of a set of points, each with an associated weight w. This is referred to as the signature of the distribution on the manifold. The EMD between two signatures is the minimum amount of 'work' needed to transform one signature into the other, where the work required is the proportion of weight being moved multiplied by the distance between the old and new locations. Consider two distributions X = {(x_1, w_x1), (x_2, w_x2), …, (x_m, w_xm)} and Y = {(y_1, w_y1), (y_2, w_y2), …, (y_n, w_yn)}. It is possible to define a cost matrix C such that c_ij represents the cost of transforming a point i = 1…m into a point j = 1…n (known as the ground distance). Mathematically, EMD can be defined as:

EMD(X, Y) = ( Σ_{i=1}^{m} Σ_{j=1}^{n} c_ij f_ij ) / ( Σ_{i=1}^{m} Σ_{j=1}^{n} f_ij )    (2)

where each f_ij is a flow representing the amount of weight moved from i to j. The ground distances are the L2 norm, and hence EMD is a lower-bounded metric. EMD aims to find the flow set f = {f_ij} that minimises the function

Σ_{i=1}^{m} Σ_{j=1}^{n} c_ij f_ij    (3)

subject to the following constraints:

f_ij ≥ 0,  1 ≤ i ≤ m, 1 ≤ j ≤ n
Σ_{i=1}^{m} f_ij ≤ w_yj,  j = 1…n
Σ_{j=1}^{n} f_ij ≤ w_xi,  i = 1…m
Σ_{i=1}^{m} Σ_{j=1}^{n} f_ij = min( Σ_{i=1}^{m} w_xi, Σ_{j=1}^{n} w_yj )
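The optimisation above is a small linear program. The sketch below (not the authors' implementation) poses Eqs. (2)-(3) directly with SciPy's general-purpose LP solver; the signature arrays `x`, `wx`, `y`, `wy` are assumed inputs.

```python
# Minimal EMD sketch via the transportation LP of Eqs. (2)-(3).
# x: (m, d) points with weights wx; y: (n, d) points with weights wy.
import numpy as np
from scipy.optimize import linprog

def emd(x, wx, y, wy):
    m, n = len(x), len(y)
    # Ground distance: c_ij = ||x_i - y_j||_2
    c = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    # One variable per flow f_ij >= 0, flattened row-major.
    A_ub = np.zeros((m + n, m * n))
    for i in range(m):
        A_ub[i, i * n:(i + 1) * n] = 1.0      # sum_j f_ij <= wx_i
    for j in range(n):
        A_ub[m + j, j::n] = 1.0               # sum_i f_ij <= wy_j
    b_ub = np.concatenate([wx, wy])
    A_eq = np.ones((1, m * n))                # total flow constraint
    b_eq = [min(wx.sum(), wy.sum())]
    res = linprog(c.ravel(), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
    return (c.ravel() @ res.x) / res.x.sum()  # Eq. (2)
```

With the uniform 1/z weights used later in this paper, the two signatures carry equal total weight and the problem reduces to a standard optimal-transport problem between two point clouds.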
3 Results

Typical haemodynamic responses to knot-tying are illustrated in Fig. 2. Each subject produces as many points in the 74-dimensional feature space F as there are channels, i.e. 24 in this study. Isomap was used to map points from F to P using k = 7 neighbours to construct the graph G. The intrinsic dimensionality, as estimated from the flexure point of the residual variance, was located at the third component. The ability of the embedding technique to resolve meaningful global coordinates is highlighted in Fig. 3, where each component of the embedding correlates well with one degree of freedom of the underlying data: left to right, HbO2 (r² = 0.97); up and down, HHb (r² = 0.65). Fig. 4 further illustrates the results of two-dimensional embedding for each individual NIR channel. It is evident that consultant surgeons and trainees appear tightly clustered in P, whereas greater dispersion is observed in novices.
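As an illustration, this embedding step and the residual-variance estimate can be reproduced with an off-the-shelf Isomap implementation. This is a sketch rather than the authors' code; the feature matrix `F` (one 74-D row per channel per subject) is assumed to be available.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import Isomap

iso = Isomap(n_neighbors=7, n_components=3)   # k = 7 neighbours, as above
P = iso.fit_transform(F)                      # coordinates in embedded space

# Residual variance 1 - r^2 between geodesic distances and embedded
# Euclidean distances; its flexure point suggests the intrinsic dimension.
geo = iso.dist_matrix_                        # graph (geodesic) distances
for d in (1, 2, 3):
    emb = squareform(pdist(P[:, :d]))
    r = np.corrcoef(geo.ravel(), emb.ravel())[0, 1]
    print(d, 1.0 - r ** 2)
```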
Fig. 2. Topographic maps illustrating the HbO2 intensity change observed in a novice (top row) and a consultant (bottom row). Columns represent baseline rest and 2, 6, and 10 seconds into a knot-tying trial. Emitters (red circles), detectors (blue circles) and corresponding channels (pink circles) are overlaid on T1-weighted anatomical MRI images.
Once the data were mapped to the embedded space, EMD was used to compare distributions between groups for a specific channel of interest, as well as to compare a given channel with its symmetrical opposite in the contralateral hemisphere. In both instances, the Euclidean distances between points in P space were used as the ground distance; Euclidean distances in the embedded space P are an approximation of the geodesic distances in F space. EMD weights are calculated as 1/z, with z being the number of points in the single cluster of each signature. Table 1 summarises the relevant EMD results for comparisons between groups. Greater similarity is observed between trainees and consultants than between either of these groups and novices. The degree of inter-hemispheric channel pair symmetry is illustrated in Fig. 5(a). In order to contextualise inter-hemispheric symmetry in terms of cortical excitation, further analysis was performed specifically comparing averaged resting Hb values with averaged stimulus values using non-parametric tests of significance (Mann-Whitney U). The results are illustrated in Fig. 5(b). In novices, patterns consistent with 'activation' were only observed in left hemispheric channels, indicating an asymmetrical response.
Fig. 3. Manifold embedding results showing the distribution of channel response and how it varies along the first two principal dimensions for all the subjects studied. The locations of the original signals (a-f), representing resampled task related Hb signals, are marked on the embedded space. This demonstrates the intrinsic trend captured by the embedding technique.
Fig. 4. Embedding per NIR channel where novices (Group 1) are labelled as plus signs, trainees (Group 2) as squares and consultants (Group 3) as open triangles. All subplots are constructed on identical scales. The abscissa corresponds to Component 1 and the ordinate to Component 2.
Table 1. Comparison of between-group EMD results

Parameter   Novices to Consultants   Novices to Trainees   Trainees to Consultants
Mean        81.62                    79.50                 24.19
St.Dev      32.62                    39.53                 6.88
Fig. 5. (a) Illustration of inter-hemispheric channel pair symmetry as quantified using EMD. (b) Task-induced Hb changes across all 24 channels. Significant (p ≤ 0.05) increases in HbO2 and decreases in HHb were assigned 1.0. All other patterns of change were assigned 0.2, except for non-significant HbO2 changes, which were assigned 0.5.
Knot-tying performance data, as measured by the Imperial College Surgical Assessment Device (ICSAD) [11], was analysed to determine whether dissimilarity in cortical signal behaviour was associated with variations in surgical expertise. Dexterity data was analysed using non-parametric tests of significance: for each dexterity measure of interest, comparisons between groups were made using the Mann-Whitney U test, with p ≤ 0.05 assumed to be statistically significant. Consultants and trainees were significantly faster (p ≤ 0.001), made fewer unnecessary movements (p ≤ 0.001) and used shorter path lengths (p ≤ 0.001) to complete trials compared with medical students.
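For completeness, each of these group comparisons is a one-line call in common statistics packages; `consultants` and `novices` below are assumed 1-D arrays holding one value of a dexterity measure (e.g., completion time) per subject.

```python
from scipy.stats import mannwhitneyu

stat, p = mannwhitneyu(consultants, novices, alternative='two-sided')
print(stat, p, p <= 0.05)   # significance at the p <= 0.05 level used above
```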
4 Discussion and Conclusion

In this paper, a novel approach to understanding complex fNIRS data has been proposed. The technique places no assumptions on the functionality of the dataset, yet is capable of identifying meaningful patterns of cortical signal behaviour in subjects with different surgical experience. The results suggest that patterns of prefrontal behaviour are comparable in consultants and trainees, whose technical performance could not be discriminated. In contrast, medical students exhibit dissimilar patterns of cortical behaviour and significantly poorer performance. Moreover, the EMD results strongly suggest an asymmetrical cortical response, most pronounced in novices. This is validated by conventional statistical analysis, which
demonstrated lateralised left prefrontal excitation in medical students. Future work should aim to classify patterns of cortical activation in low-dimensional space, and to clarify the functional significance of hemispheric lateralisation. It is anticipated that these techniques may discriminate observers based upon surgical expertise.
References

1. Leong, J.J.H., Nicolaou, M., Atallah, L., Mylonas, G., Darzi, A.W., Yang, G.Z.: HMM Assessment of Quality of Movement Trajectory in Laparoscopic Surgery. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 752–759. Springer, Heidelberg (2006)
2. Leff, D.R., Koh, P.H., Aggarwal, R., Deligianni, F., Elwell, C., Delpy, D.T., Darzi, A.W., Yang, G.Z.: Optical Mapping of the Frontal Cortex During a Surgical Knot-Tying Task, a Feasibility Study. In: Medical Imaging and Augmented Reality (MIAR). LNCS, vol. 4091, pp. 140–147 (2006)
3. Halsband, U., Lange, R.K.: Motor learning in man: a review of functional and clinical studies. J. Physiol. Paris 99, 414–424 (2006)
4. Villringer, A., Planck, J., Hock, C., Schleinkofer, L., Dirnagl, U.: Near infrared spectroscopy (NIRS): a new tool to study hemodynamic changes during activation of brain function in human adults. Neurosci. Lett. 154, 101–104 (1993)
5. Strangman, G., Boas, D.A., Sutton, J.P.: Non-invasive neuroimaging using near-infrared light. Biol. Psychiatry 52, 679–693 (2002)
6. Friston, K.J.: Human Brain Function, 2nd edn. (2003)
7. Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000)
8. Geng, X., Zhan, D.C., Zhou, Z.H.: Supervised Nonlinear Dimensionality Reduction for Visualization and Classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 35, 1098–1107 (2005)
9. Sato, H., Kiguchi, M., Kawaguchi, F., Maki, A.: Practicality of wavelength selection to improve signal-to-noise ratio in near-infrared spectroscopy. NeuroImage 21, 1554–1562 (2004)
10. Jasper, H.H.: The ten-twenty electrode system of the International Federation. Electroencephalogr. Clin. Neurophysiol. 10, 371–375 (1958)
11. Taffinder, N., Smith, S., Mair, J., Russell, R., Darzi, A.: Can a computer measure surgical precision? Reliability, validity and feasibility of the ICSAD. Surg. Endosc. 13, 81 (1999)
12. Koh, P.H., Delpy, D.T., Elwell, C.E.: fOSA: A software tool for NIRS processing. In: Proceedings of the Optical Society of America (2006)
13. Rubner, Y., Tomasi, C., Guibas, L.J.: A metric for distributions with applications to image databases. In: ICCV 1998: International Conference on Computer Vision, pp. 59–66 (1998)
14. Dempere-Marco, L., Hu, X.P., Ellis, S.M., Hansell, D.M., Yang, G.Z.: Analysis of visual search patterns with EMD metric in normalized anatomical space. IEEE Trans. Med. Imaging 25, 1011–1021 (2006)
A Hierarchical Unsupervised Spectral Clustering Scheme for Detection of Prostate Cancer from Magnetic Resonance Spectroscopy (MRS)

Pallavi Tiwari¹, Anant Madabhushi¹, and Mark Rosen²

¹ Department of Biomedical Engineering, Rutgers University, USA
[email protected]
² Department of Surgical Pathology, University of Pennsylvania, USA
[email protected]
Abstract. Magnetic Resonance Spectroscopy (MRS) along with MRI has emerged as a promising tool in diagnosis, and potentially screening, for prostate cancer. Surprisingly little work, however, has been done in the area of automated quantitative analysis of MRS data for identifying likely cancerous areas in the prostate. In this paper we present a novel approach that integrates a manifold learning scheme (spectral clustering) with an unsupervised hierarchical clustering algorithm to identify spectra corresponding to cancer on prostate MRS. The ground truth location of cancer in the prostate was determined from the sextant location and maximum size of cancer available from the ACRIN database, from which a total of 14 MRS studies were obtained. The high dimensional information in the MR spectra is nonlinearly transformed to a low dimensional embedding space, and via repeated clustering of the voxels in this space, non-informative spectra are eliminated and only informative spectra retained. Our scheme successfully identified MRS cancer voxels with a sensitivity of 77.8%, a false positive rate of 28.92%, and a false negative rate of 20.88% over a total of 14 prostate MRS studies. Qualitative results suggest that our method has higher specificity compared with the z-score, a popular scheme routinely used for analysis of MRS data.
1 Introduction
Prostatic adenocarcinoma (CAP) is the second leading cause of cancer-related deaths in America, with an estimated 220,000 new cases every year (Source: American Cancer Society). The current standard for detection of prostate cancer is transrectal ultrasound (TRUS) guided symmetrical needle biopsy, which has a high false negative rate associated with it [1]. Over the past few years, Magnetic Resonance Spectroscopic Imaging (MRSI) has emerged as a useful complement to structural MR imaging for potential screening of prostate cancer.
This work was supported by grants from the Coulter foundation, Busch Biomedical Award, Cancer Institute of New Jersey, New Jersey Commission on Cancer Research, and the National Cancer Institute (R21CA127186-01, R03CA128081-01).
Kurhanewicz et al. [2] have suggested that the integration of MRS and MRI could potentially improve specificity and sensitivity for screening of CAP, compared to what might be obtainable from either individual modality. MRS is a non-invasive analytical technique for measuring the chemical content of living tissues, used to detect changes in the concentrations of specific molecular markers in the prostate, such as citrate, creatine, and choline. Variations in the concentrations of these substances can indicate the presence of CAP. Most automated MRS analysis work for cancer detection has focused on developing fitting techniques that yield peak areas or relative metabolic concentrations of different metabolites such as choline, creatine and citrate as accurately as possible. Automated peak finding algorithms suffer from problems associated with noisy data, which worsen when a large baseline is present along with a low signal to noise ratio. McKnight et al. [3] have looked at z-score (the ratio of the difference between the population mean and an individual score to the population standard deviation) analysis as an automated technique for quantitative assessment of 3D MRSI data for glioma; a predefined threshold value of the z-score is used to classify spectra into two classes, malignant and benign. Kurhanewicz et al. [4] have worked on the quantification of prostate MRSI by model-based time fitting and frequency domain analysis. Some researchers have applied linear dimensionality reduction methods such as independent component analysis (ICA) and principal component analysis (PCA) in conjunction with classifiers [5,6] to separate different tissue classes from brain MRS. However, we have previously demonstrated that due to the inherent non-linearity in high dimensional biomedical studies, linear reduction methods are limited for purposes of classification [7].

In this paper we present a novel automated approach for identification of cancer spectra on prostate MRS via the use of manifold learning and hierarchical clustering. Figure 1 illustrates the modules and pathways comprising our automated quantitative analysis system for identifying cancer spectra on prostate MRS. In the first step, a manifold learning [8] scheme (spectral learning or graph embedding) is applied to embed the spectral data in a low dimensional space so that objects that are adjacent in the high dimensional ambient space are mapped to nearby points in the output embedding. Hierarchical unsupervised k-means clustering is applied to distinguish non-informative (zero-padded spectra and spectra lying outside the prostate) from informative spectra (within the prostate). The objects in the dominant cluster, which correspond to spectra lying outside the prostate, are pruned and eliminated from subsequent analysis. The recursive scheme alternates between computing the low dimensional manifold of all the spectra in the 3D MRS scene and the unsupervised clustering algorithm to identify and eliminate non-informative spectra. This scheme is recursed until the sub-clusters corresponding to cancer spectra are identified. The primary contributions and novel aspects of this work are:

• The use of non-linear dimensionality reduction methods (spectral clustering) to exploit the inherent non-linearity in the high dimensional spectral data and embed the data in a reduced dimensional linear subspace.
• A hierarchical clustering scheme to recursively distinguish informative from non-informative spectra in the lower dimensional embedding space.
• A cascaded scheme that enables accurate and efficient identification of cancer spectra, and is also qualitatively shown to perform better compared with the popular z-score method.
Fig. 1. Flowchart showing various system components and methodological overview (input N MRS spectra → manifold learning to embed spectra → unsupervised clustering of spectra in the lower dimensional space → identification and elimination of the dominant non-informative cluster).
The rest of this paper is organized as follows. In Section 2 we present a detailed description of our methods, while in Section 3 we describe our results. Concluding remarks and future directions are presented in Section 4.
2 Methods

2.1 Data Description
We represent the 3D prostate T2-weighted scene by 𝒞 = (C, f), where C is a 3D grid of voxels c ∈ C and f is a function that assigns an intensity value to every c ∈ C. We also define a spectral image 𝒞^s = (G, g), where G is a 3D grid superposed on C, with G ⊂ C. For every spatial location u ∈ G there is an associated spectrum g(u). Hence, while f(c) is a scalar, g(u) is a 256 dimensional vector valued function. Note that the size of a spatial grid location u ∈ G is equivalent to 8 × 8 voxels c ∈ C. The spectral datasets used for the study were collected during the ACRIN multi-site trial of prostate MRS, acquired on 1.5 Tesla GE Medical Systems scanners through the PROSE package (voxel width 0.4 × 0.4 × 3 mm). Datasets were obtained from 14 patients having CAP with different degrees of severity. The spectral grid was contained in DICOM images, from which the 16 × 16 grid containing N = 256 spectra was obtained using IDL. Of these N spectra, over half are zero-padded or non-informative, lying outside the prostate. Figures 2(a)-(b) show a spectral grid superimposed on a T2 MR image and the corresponding spectra. Figure 2(c) shows a magnified version of a single 256 dimensional MRS spectrum from within G. Note that due to noise in the spectra, it is very difficult to identify individual peaks for creatine, choline and citrate from g(u).
Fig. 2. (a) Slice of T2-weighted MR image with overlay of the MRS grid (G), (b) individual MRS spectra acquired from each u ∈ G, and (c) a single 256 dimensional MRS spectrum (g(u)). Note that the prostate is normally contained in a 3 × 6 or a 3 × 7 grid, which varies across studies based on prostate size. Radiologists look at relative peak heights of creatine, choline and citrate within g(u) to identify possible cancer presence.
2.2 Determining Ground Truth for Cancer Extent on Prostate MRS
Since whole-mount histological sections corresponding to the MR/MRS studies were not available for the ACRIN database, we were only able to determine the approximate spatial extent of cancer within G [9]. During the MRSI ACRIN study, arbitrary divisions were established by the radiologists to obtain a rough estimate of the location of cancer. The prostate was first divided into two regions, Left (L) and Right (R), and the slices were then further divided into three regions: Base (B), Midgland (M) and Apex (A). Thus a total of 6 potential cancer locations were defined: Left Base (LB), Left Midgland (LM), Left Apex (LA), Right Base (RB), Right Midgland (RM) and Right Apex (RA). The presence or absence of cancer in each of these 6 candidate locations, determined via needle biopsy, was recorded, as was the maximum diameter of the cancer in each location. For an MRS scene 𝒞^s with known cancer in the left midgland (LM), the prostate contained in a 3 × 6 grid, and the prostate midgland extending over 2 contiguous slices, we define a potential cancer space G_P ⊂ G within which the cancer is present. If we separate G into two equal right and left halves of 3 × 3, the total number of voxels u ∈ G_P is 18 (3 × 3 × 2). The total number of actual cancer voxels within the cancer space G_P is obtained from knowledge of the maximum diameter of the cancer, and is given as: number of candidate slices × ⌈(MaxDiameter)² / (Δx Δy)⌉, where ⌈·⌉ refers to the ceiling operation and Δx, Δy refer to the size of voxel u in the x and y dimensions. Hence for a study with a cancer of maximum diameter 13.75 mm in LM, 8 voxels within G_P correspond to cancer. Note that the cancer ground truth we determine does not convey information regarding the precise spatial location of cancer voxels within G_P, only their number.
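The voxel count is a direct transcription of the formula above. In the sketch below the in-plane voxel size (6.9 mm is an assumed value, not stated in the text) is chosen so that the 13.75 mm example reproduces the 8 voxels quoted.

```python
import math

def cancer_voxel_count(n_slices, max_diameter, dx, dy):
    # number of candidate slices x ceil(MaxDiameter^2 / (dx * dy))
    return n_slices * math.ceil(max_diameter ** 2 / (dx * dy))

print(cancer_voxel_count(n_slices=2, max_diameter=13.75, dx=6.9, dy=6.9))  # 8
```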
2.3 Manifold Learning Via Spectral Clustering
The spectra g(u), for u ∈ G, lie in a 256 dimensional space. Hence, our aim is to find an embedding vector X̂(u) for each voxel u ∈ G, and its associated class ω,
such that the distance between g(u) and ω in the high dimensional space is monotonically related to the corresponding distance in the lower dimensional space. Hence if voxels u, v ∈ G both belong to class ω, then [X̂(u) − X̂(v)]² should be small. To compute the optimal embedding, we first define a matrix W representing the similarity between any two objects u, v ∈ G in the high dimensional feature space:

W(u, v) = e^(−‖g(u)−g(v)‖) ∈ ℝ^(|G|×|G|),    (1)
where |G| is the cardinality of the set G. The embedding vector X̂ is obtained from the maximization of the function:

E_W(X̂) = 2γ · X̂ᵀ(D − W)X̂ / (X̂ᵀDX̂),    (2)

where D(u, u) = Σ_v W(u, v) and γ = |G| − 1. The embedding space is defined by the eigenvectors corresponding to the β smallest eigenvalues of (D − W)X̂ = λDX̂. For every u ∈ G, the embedding X̂(u) contains the coordinates of u in the embedding space and is given as X̂(u) = [ê_A(u) | A ∈ {1, 2, …, β}], where ê_A(u) is the component associated with u in the A-th eigenvector.
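A sketch of this embedding step is given below; it is not the authors' code, and `spectra` is an assumed (|G| × 256) array of the g(u). Because (D − W) is the graph Laplacian of W, its smallest eigenvalue is zero with a constant eigenvector, which is discarded.

```python
import numpy as np
from scipy.linalg import eigh

def embed(spectra, beta=3):
    # Pairwise similarities, Eq. (1): W(u, v) = exp(-||g(u) - g(v)||)
    dists = np.linalg.norm(spectra[:, None, :] - spectra[None, :, :], axis=2)
    W = np.exp(-dists)
    D = np.diag(W.sum(axis=1))
    # Generalized eigenproblem (D - W) x = lambda D x, ascending eigenvalues
    vals, vecs = eigh(D - W, D)
    return vecs[:, 1:beta + 1]     # skip the trivial constant eigenvector
```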
2.4 Hierarchical Cascade to Prune Non-informative Spectra

At each iteration t, a subset of voxels G̃_t ⊂ G is obtained by eliminating the non-informative spectra g(u). The voxels u ∈ G̃_t are aggregated into clusters V_t¹, V_t², V_t³ by applying k-means clustering to all u ∈ G̃_t in the low dimensional embedding X̂(u). The number of clusters k = 3 was chosen empirically to correspond to cancer, benign, and classes whose attributes are intermediate to normal tissue and cancer (e.g. benign prostatic hyperplasia (BPH) and high-grade prostatic intraepithelial neoplasia (HGPIN)). Initially, most of the locations u ∈ G correspond to zero-padded or non-informative spectra, and hence the scheme proceeds by eliminating the dominant cluster. Clusters corresponding to cancer and areas within the prostate only become resolvable at higher levels of the cascade, after elimination of the dominant non-informative spectra. The algorithm below describes precisely how our methodology works.

Algorithm HierarclustMRSprostate
Input: g(u) for all u ∈ G, T
Output: G̃_T, V_T¹, V_T², V_T³
begin
  0. Initialize G̃_0 = G;
  1. for t = 0 to T do
  2.   Apply graph embedding to g(u), for all u ∈ G̃_t, to obtain X̂_t(u);
  3.   Apply k-means clustering to obtain clusters V_t¹, V_t², V_t³;
  4.   Identify the largest cluster V_t^max;
  5.   Create the set G̃_{t+1} ⊂ G̃_t by eliminating all u ∈ V_t^max from G̃_t;
  6. endfor
end
Note that since we employ an unsupervised learning approach, it is not clear which of V_T¹, V_T², V_T³ actually represents the cancer cluster. The motivation behind the HierarclustMRSprostate algorithm, however, is to obtain clusters V_T¹, V_T², V_T³ which represent, to the extent possible, distinct tissue classes.
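The cascade itself is compact; the following rendering (a sketch under the stated scheme, not the authors' implementation) reuses the `embed` function above together with scikit-learn's k-means.

```python
import numpy as np
from sklearn.cluster import KMeans

def hierarclust_mrs_prostate(spectra, T):
    keep = np.arange(len(spectra))                  # indices of G-tilde_t in G
    for t in range(T):
        X_hat = embed(spectra[keep])                # step 2: graph embedding
        labels = KMeans(n_clusters=3, n_init=10).fit_predict(X_hat)  # step 3
        largest = np.bincount(labels).argmax()      # step 4: dominant cluster
        keep = keep[labels != largest]              # step 5: eliminate it
    X_hat = embed(spectra[keep])                    # final-level clusters
    labels = KMeans(n_clusters=3, n_init=10).fit_predict(X_hat)
    return keep, labels                             # G-tilde_T and V_T^1..3
```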
3 Results

3.1 Qualitative Results
The images in Figure 3 demonstrate qualitative results of our hierarchical cascade scheme for distinguishing informative from non-informative spectra. Figure 3(a) represents a spatial map of G̃_0 (16 × 16) superimposed on the corresponding T2-weighted scene; every u ∈ G̃_0 is assigned one of three colors, in turn corresponding to one of the three classes determined based on the embedding X̂(u). Note that the dominant cluster (spatial locations in blue in Figure 3(a)) has been eliminated in G̃_1 (16 × 8) (Figure 3(b)) and in G̃_2 (8 × 4) (Figure 3(c)). The lowest level in the cascade (G̃_3 in Figure 3(d)) is obtained from G̃_2 after the spectra on the periphery of the prostate (blue locations) have been removed. The cancer spectra are visible as a distinct cluster (blue cells) in Figure 3(d). The cascade scheme permits the resolvability of the 3 tissue classes (one of which is cancer) which were collapsed together within the informative cluster at the higher levels of the cascade (G̃_0, G̃_1, G̃_2).
Fig. 3. Spectral grids for a single slice within 𝒞^s, shown superimposed on T2, for (a) G̃_0, (b) G̃_1, (c) G̃_2, and (d) G̃_3. Note that the size of the grid reduces from 16 × 16 in (a) to 6 × 3 in (d) by elimination of non-informative spectra on the 16 × 16 grid on T2. In (d) the cancer class can clearly be discerned as the blue class, since the cancer is located in the right MG slice. Note that the right/left conventions in radiology are reversed.
Fig. 4. (a)-(c) represent the potential cancer locations in blue. (d)-(f) show our classification results for 3 different studies, with the cancer voxels shown in blue, while (g)-(i) show the z-score results. Note that for the last study, in (f), our method has 100% sensitivity with just 2 FP voxels.
Figures 4(a)-(c) show the potential cancer location volume within G on a single 2D slice of a T2-weighted prostate MR scene for 3 different studies, and Figures 4(d)-(f) show the corresponding results of our hierarchical scheme. In all 3 studies, the cancer cluster can be appreciated as a distinct class on the grid, corresponding to the location specified in the pathology reports. In Figures 4(d)-(i) we compare the grid maps obtained via our cascaded clustering scheme (4(d)-(f)) with the corresponding plots obtained using the z-score (4(g)-(i)), a popular MRS analysis scheme. In this method, a z-score value z(u) is assigned to each u ∈ G; the spectrum g(u) is then assigned to one of two classes based on whether z(u) is lower or greater than a predetermined threshold. It is apparent from the plots in 4(g)-(i) that none of the classes appears to represent the cancer class. These results suggest that our cascaded scheme has higher specificity compared with the z-score method.
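For reference, the z-score baseline reduces to a few lines; `scores` is an assumed per-voxel scalar (e.g., a metabolite ratio) and `threshold` is the predefined cut-off.

```python
import numpy as np

def zscore_classify(scores, threshold):
    # Ratio of the difference between the population mean and each
    # individual score to the population standard deviation
    z = (scores.mean() - scores) / scores.std()
    return z > threshold          # two classes: above / below the threshold
```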
3.2 Quantitative Results
Table 1 shows the quantitative results over 14 different studies. True positive (TP), false positive (FP) and false negative (FN) fractions for every dataset were obtained by comparing the automated results with the ground truth voxels for all 3 classes obtained. The class which corresponded to the maximum TP and minimum FP and FN rates was identified as the cancer class, and the respective TP, FP and FN values were reported for that class. TP,
FP and FN percentage values for each dataset were then calculated by dividing the TP, FP and FN fractions by the total number of ground truth voxels determined as described in Section 2.2. Average results over the 14 studies are reported. Our scheme appears to have high cancer detection sensitivity and specificity.

Table 1. Average percentage values of TP, FP and FN for our automated detection scheme, averaged over a total of 14 MRS studies

Average TP   Average FP   Average FN
77.80        28.97        20.92
4 Concluding Remarks
In this paper we have presented a novel application of manifold learning and hierarchical clustering for the automated identification of cancer spectra on prostate MRS. The main contributions of our work are:

• The integration of unsupervised clustering with a manifold learning scheme to identify and eliminate non-informative spectra.
• A hierarchical cascade detection scheme to efficiently and accurately identify prostate cancer spectra.
• Comparison of our scheme against a popular current MRS analysis scheme (z-score), with respect to an approximately determined ground truth, suggesting that our hierarchical detection algorithm has comparable sensitivity and higher specificity.

In future work we intend to extend our scheme for prostate MRS analysis to incorporate corresponding structural MRI data in order to develop better predictors for identifying CAP.
References

1. Catalona, W., et al.: Measurement of Prostate-Specific Antigen in Serum as a Screening Test for Prostate Cancer. N. Engl. J. Med. 324(17), 1156–1161 (1991)
2. Kurhanewicz, J., et al.: Analysis of a Complex of Statistical Variables into Principal Components. Magnetic Resonance in Medicine 50, 944–954 (2003)
3. McKnight, T., et al.: An Automated Technique for the Quantitative Assessment of 3D-MRSI Data from Patients with Glioma. Journal of Magnetic Resonance Imaging 13, 167–177 (2001)
4. Pels, P., Ozturk-Isik, et al.: Quantification of Prostate MRSI Data by Model-Based Time Domain Fitting and Frequency Domain Analysis. NMR in Biomedicine 19, 188–197 (2006)
5. Ma, J., Sun, J.: MRS Classification Based on Independent Component Analysis and Support Vector Machines. In: IEEE Intl. Conf. on Hybrid Intelligent Systems, pp. 81–84. IEEE Computer Society Press, Los Alamitos (2005)
6. Simonetti, A., Melssen, W., Edelenyi, F., et al.: Combination of Feature-Reduced MR Spectroscopic and MR Imaging Data for Improved Brain Tumor Classification. NMR in Biomedicine 18, 34–43 (2005)
7. Lee, G., Madabhushi, A., Rodriguez, C.: An Empirical Comparison of Dimensionality Reduction Methods for Classifying Gene and Protein Expression Datasets. ISBRA (2007)
8. Madabhushi, A., Shi, J., et al.: Graph Embedding to Improve Supervised Classification and Novel Class Detection: Application to Prostate Cancer. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 729–737. Springer, Heidelberg (2005)
9. Madabhushi, A., Feldman, M., et al.: Automated Detection of Prostatic Adenocarcinoma from High-Resolution Ex Vivo MRI. IEEE Transactions on Medical Imaging 24(12) (2005)
A Clinically Motivated 2-Fold Framework for Quantifying and Classifying Immunohistochemically Stained Specimens

Bonnie Hall¹,²,³, Wenjin Chen¹,³, Michael Reiss³,⁴, and David J. Foran¹,²,³

¹ Center for Biomedical Imaging and Informatics
² Graduate School of the Biomedical Sciences
³ The Cancer Institute of New Jersey, UMDNJ-Robert Wood Johnson Medical School
⁴ Dept. of Internal Medicine and Dept. of Molecular Genetics, Microbiology, and Immunology
195 Little Albany St., New Brunswick, NJ 08903 USA
{huangbo, reissmi, djf}@umdnj.edu, [email protected]
Abstract. Motivated by the current limitations of automated quantitative image analysis in discriminating among intracellular immunohistochemical (IHC) staining patterns, this paper presents a two-fold approach for IHC characterization that utilizes both the protein stain information and the surrounding tissue architecture. Through the use of a color unmixing algorithm, stained tissue sections are automatically decomposed into the IHC stain, which visualizes the target protein, and the counterstain, which provides an objective indication of the underlying histologic architecture. Feature measures are subsequently extracted from both staining planes. In order to characterize the IHC expression pattern, this approach exploits a non-traditional feature based on textons. Novel biologically motivated filter banks are introduced in order to derive texture signatures for different IHC staining patterns. Systematic experiments using this approach were used to classify breast cancer tissue microarrays which had been previously prepared using immuno-targeted nuclear, cytoplasmic, and membrane stains.

Keywords: Quantitative IHC analysis, texture descriptors, expression signatures, automated classification, breast cancer.
1 Introduction

The capacity to quantify and characterize protein expression reliably is central to several key areas of investigative cancer research and discovery. Biomarkers can provide tremendous insight into the underlying mechanisms of disease progression and can have significant impact with regard to patient prognosis, treatment, and therapy planning. For instance, the presence of biomarkers such as the Her-2/neu (also called ErbB2) receptor in breast cancer indicates that a given patient may respond to treatment with Trastuzumab. Immunohistochemistry (IHC) is used to visualize these proteins by labeling them with a stain, such as diaminobenzidine (DAB). However, manual scoring of IHC stained pathology suffers from several
drawbacks. The most striking limitations include the inability to reproduce results, subjectivity of analysis, and intra- and inter-observer variability. It has already been reported in the literature that quantitative IHC image analysis provides more consistent scoring than traditional manual scoring [1, 2] and better concordance with existing gold standards, e.g. fluorescence in situ hybridization (FISH) [3, 4]. However, discrepancies between IHC scoring and actual clinical outcomes often arise irrespective of the means by which analysis is performed, i.e. by eye or computer-assisted [3-5]. To address these ambiguities, this paper proposes a fresh approach for characterizing in-situ protein expression by utilizing texture-based feature metrics.
Fig. 1. Images taken at 20x resolution for breast tissue TMA cores stained with DAB (brown regions) and counter stained with hematoxylin (blue regions). Inserts (upper left) show representative regions of staining from each of these cores at high resolution. Note that ER and PR are generally nuclear proteins, keratin is generally found in the cytoplasm, and Her-2/neu is active in disease processes when concentrated on the cytoplasmic membrane.
Constructing a model for quantifying in-situ protein expression required a close examination of the design specifications for our experiments. The first criterion was that the proposed model would reliably and reproducibly capture and quantify the underlying staining characteristics of the protein. Second, in order to be clinically meaningful, a means for detecting and objectively representing the contextual cellular and tissue-level architecture would need to be incorporated into the model, since many proteins are only considered active or clinically relevant when they are localized in specific subcellular compartments [4, 6, 7]. Using these criteria we have developed a model that uses the texton signatures of the immunostains to characterize the protein expression pattern, and histological features to determine the intracellular localization of target proteins. This information provides important insight into the in-vivo status of these proteins in tumors, such as indicating whether they are being overexpressed and specifying their functional location. In this paper, we report the design, development, and evaluation of a two-fold model for IHC characterization that utilizes both the information provided by the targeted protein (visualized by the DAB stain), and the histological context (visualized by the hematoxylin stain). The model is evaluated on a breast cancer tissue microarray (TMA) stained for four proteins: keratin (indicating regions of carcinoma) and three clinically used markers: estrogen receptor (ER), progesterone receptor (PR), and Her-2/neu. Proficient characterization of IHC with this model is
demonstrated by its ability to classify ER and PR, keratin, and Her-2/neu into three respective classes: nuclear, cytoplasmic, and membrane staining. Highlights of this paper include the use of texton-based features to characterize protein expression, the design and systematic investigation of a new set of filter banks, and the use of a two-fold model which combines information derived from both the protein staining pattern and the underlying tissue architecture. Throughout the course of these experiments a stringent 1/2 cross-validation analysis was carried out to ensure the reliability and robustness of the results.
2 Two-Fold Modeling of In-situ Protein Expression

The two-fold modeling approach we present (Fig. 2) utilizes color decomposition [8] to separate the immunostained tissue disc into its constituent staining maps: the diaminobenzidine (DAB) staining plane, which indicates the level and distribution of the targeted protein, and the hematoxylin plane, which reveals cellular and tissue architecture.
Fig. 2. The color image is decomposed into the targeted protein (DAB) and the hematoxylin (H) staining maps. Features are extracted from each map and combined for classification.
The DAB staining map is characterized using textons, a texture feature composed of a vocabulary of consensus filter responses derived through cluster analysis [9, 10]. The vocabulary is learned directly from the image by convolving a filter bank with image patches from the immunohistochemically stained map and clustering the response vectors in order to obtain IHC (in this case, DAB) textons. IHC textons are derived for each class of staining pattern and combined to form the vocabulary. An IHC texture histogram is derived by convolving the segmented immunohistochemically stained regions with the same filter bank and matching the resulting filter responses to the closest IHC texton. A normalized histogram of the textons detected in the stained area is then generated. In the tissue architecture map, regions corresponding to the segmented DAB stain are analyzed with various features such as an intensity histogram, mean, standard deviation, median intensity, and Haralick's 2nd order features [11]. The intensity
histogram is a normalized tally of intensities (0-255), where 2 intensities are placed per bin for smoothness. Since the counterstain usually does not stain very strongly, our preliminary results show that using intensities 0 through 31 (a 16 dimensional vector) was sufficient to capture the discriminating information present in the intensity histogram.
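A sketch of the texton vocabulary, the texture histogram, and the χ² distance used below for the feature distances is given here. It is illustrative rather than the authors' code: `filter_bank` (assumed to return an (H, W, N_FILTERS) response stack) and `N_FILTERS` are hypothetical names, and `patches` holds the training image patches.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_textons(patches, k):
    """Cluster filter responses of training patches into k textons."""
    responses = np.vstack([filter_bank(p).reshape(-1, N_FILTERS)
                           for p in patches])
    return KMeans(n_clusters=k, n_init=10).fit(responses).cluster_centers_

def texton_histogram(img, stain_mask, textons):
    """Normalized histogram of nearest textons over the stained region."""
    resp = filter_bank(img).reshape(-1, N_FILTERS)[stain_mask.ravel()]
    d = np.linalg.norm(resp[:, None, :] - textons[None, :, :], axis=2)
    hist = np.bincount(d.argmin(axis=1), minlength=len(textons))
    return hist / hist.sum()

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```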
D = D_DAB + D_H    (1)
The DAB stain feature and the hematoxylin (H) feature distances were computed between training and test images and summed according to Eq. 1 in order to arrive at the total combined feature distance, D. The χ² distance metric was used for D_DAB and D_H. A weighted K-Nearest Neighbor classifier was implemented to perform this operation.

2.1 Filter Bank Construction and Development

A range of filter banks was systematically applied in order to derive IHC textons that optimize classification. Based on its superior classification performance when used to assess a natural image database, the first filter bank chosen was the MR8 filter bank [10]. Subsequently, a set of novel filter banks was introduced based on the observation that cellular features tend to be more isotropic and less edge-like in nature, such as blob-like nuclei and circular membranes. Adapted from the MR8 filter bank, the first biologically based filter bank (BF1) also contains isotropic filters developed by Schmid [12] as well as two sizes of Gaussian and Laplacian of Gaussian (LOG) filters. This filter bank produces an 8 dimensional response vector (Fig. 3).
Fig. 3. The BF1 filter bank (left), includes edge filters at 2 scales and 6 rotations (σx, σy) = {(1,3), (2,6)}, a black and white-centered bar filter at one scale (1,3) and 6 rotations, 2 sizes of isotropic Gaussian and LOG at 2 scales σ = {5,10}, and 4 Schmid filters where (σ, τ) = {(4,1), (4,2), (6,1), (6,2)}. In order to achieve rotational and size invariance, the six rotations for bar and edge filters and the two scales for the Gaussian and LOG were collapsed down to their maximum response, resulting in an 8 dimensional response vector. Similarly, the BF2 filter bank (right) produces a 10 dimensional response vector through rotation invariance of the smallest BF1 bar and edge filters, but retaining responses from each isotropic filter.
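An illustrative construction of rotation- and scale-collapsed responses in the spirit of BF1 is sketched below. The exact anisotropic derivative and Schmid filters follow [10, 12]; here oriented responses are approximated by rotating the image rather than the kernel, which is a simplification rather than the authors' implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace, rotate

def bf1_like_responses(img):
    feats = []
    # Isotropic Gaussian and LOG, maximum over scales sigma = 5, 10
    feats.append(np.max([gaussian_filter(img, s) for s in (5, 10)], axis=0))
    feats.append(np.max([gaussian_laplace(img, s) for s in (5, 10)], axis=0))
    # Oriented edge filter (first derivative of an anisotropic Gaussian),
    # maximum over 6 rotations for rotational invariance
    edges = []
    for angle in range(0, 180, 30):
        r = rotate(img, angle, reshape=False, mode='nearest')
        resp = gaussian_filter(r, sigma=(1, 3), order=(0, 1))
        edges.append(rotate(resp, -angle, reshape=False, mode='nearest'))
    feats.append(np.max(edges, axis=0))
    return np.stack(feats, axis=-1)   # (H, W, n_features) response stack
```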
Similarly, the second biologically based filter bank (BF2) was derived through principal component analysis of a subset of IHC texton histograms generated with
BF1. Based on this analysis, the number of bar and edge type filters was reduced, while the isotropic filters retained size differences. The result is that BF2 produces a 10 dimensional response vector (Fig. 3). The filters are L1 normalized so that their outputs are of similar range.

2.2 Breast TMA Data Set

The tissue microarray dataset comprised 186 breast carcinoma cases, with two tissue cores extracted from each case. Consecutive sections 34 and 35 were stained with anti-keratin 5/8. Sections 36, 38, and 39 were stained with anti-estrogen receptor antibody, Her-2/neu, and progesterone receptor, respectively. These specimens were all counterstained with hematoxylin. Images were digitized utilizing a Trestle MedMicro (Irvine, CA) automatic whole slide scanner system using a 40x volume scan setting. The TMA image was automatically registered, parsed into separate tissue cores at 20x equivalence, and decomposed into DAB and hematoxylin maps [8]. In order to treat stained regions as texture, background pixels near DAB pixels were segmented using a stain mask derived by blurring the DAB region with an 11×11 pixel averaging filter and applying Otsu's method [13] for thresholding. 212 cytoplasmic-stained images were obtained from tissues that stained specifically for anti-keratin 5/8, and 212 nuclear-stained images from ER and PR positive tissues. Two image patches (150×150 pixels) taken from twenty (~10%) randomly chosen images per class were dedicated to vocabulary formation, leaving 192 per class for training and classification. 50 Her-2/neu images with positive membrane staining were obtained. Similarly, 2 image patches taken from five randomly chosen images were used for vocabulary formation, and these images were then used solely for training.
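The stain-mask step maps directly onto standard image-processing primitives. A sketch under the description above (the DAB plane `dab` is an assumed input; the comparison direction depends on whether stain corresponds to high or low values in the decomposed map):

```python
import numpy as np
from scipy.ndimage import uniform_filter
from skimage.filters import threshold_otsu

blurred = uniform_filter(dab.astype(float), size=11)   # 11x11 averaging
mask = blurred > threshold_otsu(blurred)               # Otsu's method [13]
```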
3 Results

In order to demonstrate the robustness of our method, 1/2 cross-validation was used throughout the experiments, wherein 50% of our images from each class were used for training and 50% were used for testing. Due to the relatively limited number of specimens from the membrane staining class (only 15-30% of breast cancer patients overexpress Her-2/neu [14]), two sets of experiments were carried out. The two-class classification experiment contains 212 nuclear-staining images and 212 cytoplasmic-staining images; the three-class classification experiments used a balanced set of nuclear-staining, cytoplasmic-staining, and membrane-staining images, 50 of each. Three filter banks, MR8, BF1, and BF2, were used to derive texton libraries of a range of sizes based on DAB information. Results are reported for performance with DAB information alone and in combination with hematoxylin-stained tissue information (a 16 dimensional truncated intensity histogram). All three filter banks classified the two-class case with >90% accuracy (Figure 4, left). When using the BF1 filter bank, the texton feature from the DAB stain alone allowed us to classify the two-class case with 94% correct classification; when tissue information was added, the result modestly increased to about 96%.
Fig. 4. Nuclear and cytoplasmic staining were classified by DAB textons with and without the addition of information from hematoxylin stained tissue (H) (left). The membrane staining class was added in the three class experiment (right). The same legend is used for both figures.
Fig. 5. Three sets of test images (top row) from the three class experiment and their closest matches (bottom row). High resolution regions (upper left inserts) are shown.
Figure 4 (right) shows the results from the three class (nuclear, cytoplasmic, and membrane staining) experiments. The results for DAB textons derived from the BF1 filter bank are similar to those of the MR8 filter bank, while those from BF2 clearly provide better classification performance. Significant improvement is noted in the three class case with the addition of tissue information to DAB textons derived from the MR8 and BF1 filter banks.
Experiments were also conducted on a range of features extracted from the hematoxylin map, including 1st order features (mean, standard deviation, and median) as well as 2nd order texture features (Entropy, Angular Second Moment, Contrast, and Correlation). The combination of the first five features resulted in about 70% correct classification for the 2-class case. However, when combined with the DAB features according to Eq. 1, there was no significant improvement in performance. Figure 5 shows performance results from the three-class experiment utilizing 45 DAB textons (BF2, 15 textons per class) and hematoxylin information. Please note that since all cores were stained and imaged under identical conditions, the resulting texton library carries normalized staining intensity information. As a result, our model also captures the staining intensity level (Fig. 5, right panel).
4 Discussion and Conclusion

In this paper, a system for extracting biologically distinct feature sets was developed in order to classify and assist in characterizing immunohistochemically stained tissue specimens. The basic framework of this algorithm decomposes an IHC stained image into two clinically motivated parts: the protein of interest and its tissue context. A range of different feature sets is extracted from these images at both the expression and contextual levels and subsequently combined to improve classification performance. This approach allows the quantification of two key components for reliably interpreting immunohistochemistry, namely the protein of interest and its histological context. The texton library-based texture feature was chosen in our studies to characterize the staining pattern of the protein of interest because of its ability to adapt to complex tissue textures, which have repetitive structures at multiple scales. We introduced a set of new biologically oriented filter banks into the algorithm that proved to reliably capture the subtle morphological differences among various staining patterns and improve classification performance. We have also demonstrated that performance is further enhanced by incorporating the underlying tissue architecture into the formulation. This paper presents a proof-of-concept study that demonstrates the feasibility of discriminating among IHC staining patterns based on the simultaneous consideration of texture and tissue architecture. In its present form the model and associated algorithms may be used for investigating cases where there exist discrepancies between what is expected based on standard IHC analysis and response to therapy or other measures of protein amplification. Cases where discrepancies exist can potentially lead to the withholding of targeted therapies or the unwarranted exposure to drug toxicity. The textural information underlying the protein expression pattern may be an additional factor in elucidating these clinical inconsistencies and thus provide patients with the proper treatment protocol. In the next phase of our studies we plan to investigate the usefulness of this approach in analyzing a broader range of immunostains and disease states.

Acknowledgments. This research was funded, in part, by grants from the NIH through contracts 5R01LM007455-03 from the National Library of Medicine and 1R01EB003587-01A2 from the NIBIB. Additional funds and technical support were provided by The Cancer Institute of New Jersey.
References

1. Hilbe, W., Gachter, A., Duba, H.C., Dirnhofer, S., Eisterer, W., Schmid, T., Mildner, A., Bodner, J., Woll, E.: Comparison of automated cellular imaging system and manual microscopy for immunohistochemically stained cryostat sections of lung cancer specimens applying p53, Ki-67 and p120. Oncol. Rep. 10, 15–20 (2003)
2. Camp, R.L., Chung, G.G., Rimm, D.L.: Automated subcellular localization and quantification of protein expression in tissue microarrays. Nat. Med. 8, 1323–1327 (2002)
3. Ellis, C.M., Dyson, M.J., Stephenson, T.J., Maltby, E.L.: HER2 amplification status in breast cancer: a comparison between immunohistochemical staining and fluorescence in situ hybridisation using manual and automated quantitative image analysis scoring techniques. J. Clin. Pathol. 58, 710–714 (2005)
4. Wang, S., Saboorian, M.H., Frenkel, E.P., Haley, B.B., Siddiqui, M.T., Gokaslan, S., Wians Jr., F.H., Hynan, L., Ashfaq, R.: Assessment of HER-2/neu status in breast cancer. Automated Cellular Imaging System (ACIS)-assisted quantitation of immunohistochemical assay achieves high accuracy in comparison with fluorescence in situ hybridization assay as the standard. Am. J. Clin. Pathol. 116, 495–503 (2001)
5. Chung, K.Y., Shia, J., Kemeny, N.E., et al.: Cetuximab shows activity in colorectal cancer patients with tumors that do not express the epidermal growth factor receptor by immunohistochemistry. J. Clin. Oncol. 23, 1803–1810 (2005)
6. Liang, J., Zubovitz, J., Petrocelli, T., et al.: PKB/Akt phosphorylates p27, impairs nuclear import of p27 and opposes p27-mediated G1 arrest. Nat. Med. 8, 1153–1160 (2002)
7. Rosen, D.G., Yang, G., Cai, K.Q., Bast Jr., R.C., Gershenson, D.M., Silva, E.G., Liu, J.: Subcellular localization of p27kip1 expression predicts poor prognosis in human ovarian cancer. Clin. Cancer Res. 11, 632–637 (2005)
8. Chen, W., Reiss, M., Foran, D.J.: A prototype for unsupervised analysis of tissue microarrays for cancer research and diagnostics. IEEE Trans. Inf. Technol. Biomed. 8, 89–96 (2004)
9. Leung, T., Malik, J.: Representing and recognizing the visual appearance of materials using three-dimensional textons. Int. J. Comput. Vision 43, 29–44 (2001)
10. Varma, M., Zisserman, A.: A statistical approach to texture classification from single images. Int. J. Comput. Vision 62, 61–81 (2005)
11. Haralick, R.: Textural features for image classification. IEEE Trans. Sys. Man Cyber. SMC-3, 610–621 (1973)
12. Schmid, C.: Constructing models for content-based image retrieval. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 39–45 (2001)
13. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Sys. Man Cyber. 9, 62–66 (1979)
14. Slamon, D.J., Godolphin, W., Jones, L.A., Holt, J.A., Wong, S.G., Keith, D.E., Levin, W.J., Stuart, S.G., Udove, J., Ullrich, A., et al.: Studies of the HER-2/neu proto-oncogene in human breast and ovarian cancer. Science 244, 707–712 (1989)
Cell Population Tracking and Lineage Construction with Spatiotemporal Context
Kang Li¹, Mei Chen², and Takeo Kanade¹
¹ Carnegie Mellon University, ² Intel Research Pittsburgh
Abstract. Automated visual tracking of cell populations in vitro using phase contrast time-lapse microscopy is vital for quantitative, systematic and high-throughput measurements of cell behaviors. These measurements include the spatiotemporal quantification of migration, mitosis, apoptosis, and cell lineage. This paper presents an automated cell tracking system that can simultaneously track and analyze thousands of cells. The system performs tracking by cycling through frame-by-frame track compilation and spatiotemporal track linking, combining the power of two tracking paradigms. We applied the system to a range of cell populations including adult stem cells. The system achieved tracking accuracies in the range of 83.8%–92.5%, outperforming previous work by up to 8%.
1 Introduction
Automated tracking of cell populations in vitro in time-lapse microscopy images can provide high-throughput spatiotemporal measurements of a range of cell behaviors, including migration (translocation), mitosis (division), apoptosis (death), and lineage (parent-daughter relations). This capability is valuable to research in genomics, proteomics, stem cell biology, and tissue engineering. Traditional approaches for tracking include tracking by detection and tracking by model-evolution, each with its advantages and disadvantages. Recently, efforts were made to combine the strengths of both approaches and to mitigate their weaknesses [1]. The solution was to integrate four collaborative modules: 1) cell detector, which detects and labels candidate cell regions in the input image; 2) cell tracker, which propagates cell regions and identities across frames; 3) motion filter, which performs motion prediction and filtering using a Kalman filter; and 4) track arbitrator, which manages the tracking task by incorporating newly-entered cells, removing departed/dead cells, establishing cell lineages, and recovering lost tracks. The system can track thousands of living cells imaged with phase-contrast microscopy in real time [1]. However, several limitations are inherent in the aforementioned tracking system. First, all of its modules operate in a frame-by-frame manner. Hence, only very limited spatiotemporal context is considered, hindering the capability in handling complete or long-term occlusions. Secondly, the track arbitrator module makes immediate, hard decisions for each frame, precluding the possibility of retrospective error detection and correction. Thirdly, the Kalman filter used for
motion filtering is bound to use only one dynamic model, which is problematic as the dynamics of cells vary frequently with time. We propose an improved tracking system to address the above issues. Specifically, we divide the track arbitrator into two submodules: track compiler and track linker. The track compiler operates in a frame-by-frame manner and produces intermediate tracking results called track segments. The track linker oversees the entire tracking history and establishes final cell trajectories and lineages only when enough information is available. We also adopt the interacting multiple model (IMM) filter [2], which allows multiple dynamic models in parallel, and was shown to be more biologically relevant than a Kalman filter [3]. We focus on reliable long-term tracking of cell centroid locations and lineages. Accurate segmentation of cell boundaries is a plus, but not the emphasis of this paper.
2 Methods
The proposed tracking system has five major modules (Fig. 1).
Fig. 1. System Overview
The trajectory of one cell may have multiple track segments. The system associates each cell track segment with a unique positive-integer label n. We identify each cell using the label of its first track segment. Let k = 0, . . . , K − 1 be the frame index. The cell regions in an image frame Ik(x, y) are represented using a region labeling function ψk(x, y), where ψk(x, y) = n if pixel (x, y) is part of cell n, and ψk(x, y) = 0 if pixel (x, y) belongs to the background. To initialize tracking, the system generates the initial cell labeling ψ0(x, y) by running the cell detector on the first frame I0(x, y). For each subsequent frame Ik(x, y): Step 1: The cell detector classifies image pixels into cell (C) and background (B) classes based on histograms learned off-line and updated online [1]. The output is a binary map of cell regions, denoted ζk(x, y). Each connected foreground component is considered a cell candidate in frame k. Step 2: The cell tracker propagates the cell region labeling ψk−1(x, y) from frame k − 1 to frame k, denoted ψk*(x, y), using a real-time level set method [4]. Step 3: The track compiler compares the output of the cell detector and cell tracker, and takes one of the following actions: to create a new or daughter track
segment, to update an existing track, or to terminate a track. Meanwhile, the motion filter updates the cell motion state in frame k, and predicts its state for frame k + 1. The output includes the track segments, an updated region labeling ψk(x, y), and updated cell and background histograms. Step 4: The track linker examines all track segments up to frame k, and detects whether two or more track segments may correspond to one cell. It attempts to link track segments in the spatiotemporal image volume, and to form more complete cell trajectories. The updated cell trajectories are fed back to the track compiler for subsequent tracking in frame k + 1. The following sections elaborate on the motion filter, track compiler, and track linker. We refer readers to [1] for details on the other modules.
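To make Step 1 concrete, the following is a minimal sketch of a histogram-based pixel classifier of the kind described above. It is written under our own assumptions: the function name, the binned-histogram representation and the likelihood comparison are illustrative choices, not details taken from the authors' implementation.

```python
import numpy as np

def classify_pixels(img, hist_cell, hist_bg, bin_edges):
    """Label pixels as cell (1) or background (0) by comparing
    class-conditional intensity histograms learned off-line (Step 1).
    All names here are hypothetical, not from the paper's code."""
    # Map each pixel intensity to its histogram bin
    bins = np.clip(np.digitize(img, bin_edges) - 1, 0, len(hist_cell) - 1)
    # A pixel is a cell candidate where the cell class is more likely
    return (hist_cell[bins] > hist_bg[bins]).astype(np.uint8)  # zeta_k
```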
2.1 IMM Motion Filter
Suppose cell motion consists of a finite number of modes, each of which can be described by a linear model with additive Gaussian noise. The IMM filter [2] assumes that the transition between models is regulated by a finite state Markov chain with probability p_ij of switching from model i to model j in successive frames, where i, j ∈ {1, . . . , M} is the model index. We define the state vector s_{n,k} to be a concatenation of centroid locations and mean intensities of cell n in frames k, k − 1 and k − 2, and the measurement vector z_{n,k} to consist of the measured centroid and mean intensity. We adopted the random walk, first-order, and second-order linear extrapolation models given in [3], and estimated the process and noise covariance matrices Q^i and R^i by the expectation maximization (EM) algorithm [5] utilizing three manually-tracked sequences. The filtering cycle has two recursive stages: prediction and correction.

Prediction: Starting from M weights ρ^i_{n,k−1}, states ŝ^i_{n,k−1} and covariances P^i_{n,k−1} from the previous iteration, we compute the mixed initial condition:

$$\hat{s}^{0j}_{n,k-1} = \sum_i \rho^{i|j}_{n,k-1}\, \hat{s}^i_{n,k-1}, \tag{1}$$

$$P^{0j}_{n,k-1} = \sum_i \rho^{i|j}_{n,k-1}\left[ P^i_{n,k-1} + \left(\hat{s}^i_{n,k-1} - \hat{s}^{0j}_{n,k-1}\right)\left(\hat{s}^i_{n,k-1} - \hat{s}^{0j}_{n,k-1}\right)^T \right], \tag{2}$$

where $\rho^{i|j}_{k-1} = p_{ij}\,\rho^i_{k-1} / \rho^j_{k|k-1}$, and $\rho^j_{k|k-1} = \sum_i p_{ij}\,\rho^i_{k-1}$. These are input to M Kalman filters to compute the state prediction $\hat{s}^j_{n,k|k-1}$ and covariance $P^j_{n,k|k-1}$. The combined state and covariance predictions can be determined by:

$$\hat{s}_{n,k|k-1} = \sum_j \rho^j_{k|k-1}\, \hat{s}^j_{n,k|k-1}, \tag{3}$$

$$P_{n,k|k-1} = \sum_j \rho^j_{k|k-1}\left[ P^j_{n,k|k-1} + \left(\hat{s}^j_{n,k|k-1} - \hat{s}_{n,k|k-1}\right)(\cdots)^T \right]. \tag{4}$$

These are fed to the cell tracker to guide the level set evolution in frame k [1].
Correction: Given the predicted states, covariances, and measurement z_{n,k} (Fig. 1), we use the Kalman filters to obtain the updated state $\hat{s}^j_{n,k}$ and covariance $P^j_{n,k}$. The likelihood that model j is activated in frame k is

$$\lambda^j_{n,k} = \exp\left(-\tfrac{1}{2}\,(y^j_{n,k})^T (S^j_{n,k})^{-1}\, y^j_{n,k}\right) \Big/ \sqrt{2\pi \det(S^j_{n,k})}, \tag{5}$$

where $y^j_{n,k} = z_{n,k} - \hat{z}_{n,k|k-1}$ is the innovation of Kalman filter j, and $S^j_{n,k}$ is the associated covariance. Then, the combined state $\hat{s}_{n,k}$ and covariance $P_{n,k}$ estimates can be computed by Equations (3) and (4), with $\rho^j_{n,k|k-1}$ replaced by $\rho^j_{n,k} = \rho^j_{n,k|k-1}\lambda^j_{n,k} / \left(\sum_i \rho^i_{n,k|k-1}\lambda^i_{n,k}\right)$. To initialize the motion filter, the system tracks each cell without motion filtering in the first three frames in which it appears, and uses the concatenation of the measurements in these frames as its initial state $\hat{s}_{n,0}$. We set the initial covariance $P^i_{n,0}$ of model i to be the Kronecker product of a 3×3 identity matrix and $R^i$.
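As a concrete illustration of the prediction stage in Equations (1)–(4), the sketch below mixes the model-conditioned estimates and combines the per-model Kalman predictions. The `filters[j].predict` interface and all variable names are our assumptions; this is a sketch of the standard IMM recursion, not the authors' code.

```python
import numpy as np

def imm_predict(filters, rho, s_hat, P, p_trans):
    """One IMM prediction step (Eqs. (1)-(4)).

    filters: M Kalman filters, each with a predict(s, P) method (assumed API).
    rho:     length-M model probabilities from frame k-1.
    s_hat:   M model-conditioned state estimates; P: their covariances.
    p_trans: M x M Markov switching matrix [p_ij].
    """
    M = len(filters)
    rho_pred = p_trans.T @ rho                   # rho^j_{k|k-1} = sum_i p_ij rho^i
    s_pred, P_pred = [], []
    for j in range(M):
        mix = p_trans[:, j] * rho / rho_pred[j]  # mixing weights rho^{i|j}
        s0 = sum(mix[i] * s_hat[i] for i in range(M))                     # Eq. (1)
        P0 = sum(mix[i] * (P[i] + np.outer(s_hat[i] - s0, s_hat[i] - s0))
                 for i in range(M))                                        # Eq. (2)
        sj, Pj = filters[j].predict(s0, P0)      # model-j Kalman prediction
        s_pred.append(sj)
        P_pred.append(Pj)
    s_comb = sum(rho_pred[j] * s_pred[j] for j in range(M))               # Eq. (3)
    P_comb = sum(rho_pred[j] * (P_pred[j] +
                 np.outer(s_pred[j] - s_comb, s_pred[j] - s_comb))
                 for j in range(M))                                        # Eq. (4)
    return rho_pred, s_pred, P_pred, s_comb, P_comb
```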
2.2 Track Compilation
The track compiler coordinates the cell detector, cell tracker and motion filter to produce track segments. We use N_k to denote the set of labels of all track segments created up to frame k. A track segment is active in frame k if it was successfully tracked in frame k − 1; otherwise it becomes inactive. Let Ω_0 denote the background region, and Ω_n denote the cell region with label n. An outline of the track compilation algorithm is shown in Algorithm 1.

Algorithm 1. TrackCompilation
  Ω_0 ← {(x, y) | ψk*(x, y) = 0}
  foreach cell candidate ω ⊂ ζ_k do
      if ω ⊂ Ω_0 then AddTrack(n_new, k, ω)
  foreach active track n ∈ N_{k−1} do
      Ω_n ← {(x, y) | ψk*(x, y) = n}
      if Ω_n = ∅ then DeactivateTrack(n)
      else if IsDivided(Ω_n) then
          if IsMitotic(n, k) then
              foreach connected component ω ⊂ Ω_n do
                  AddDaughterTrack(n_daughter, n, k, ω)
          else
              ω* ← SelectBestMatch(n, k, Ω_n)
              UpdateTrack(n, k, ω*)
              foreach connected component ω ⊂ Ω_n \ ω* do
                  AddTrack(n_new, k, ω)
      else UpdateTrack(n, k, Ω_n)
The compiler first compares the output of the cell detector and cell tracker, ζk(x, y) and ψk*(x, y). Each cell candidate in ζk(x, y) that does not overlap with any propagated cell region in ψk*(x, y) is considered a new cell. A new track segment will be initialized, and ψk*(x, y) will be updated accordingly.
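A minimal sketch of this new-cell check follows, under our own naming and with SciPy's connected-component labeling standing in for whatever the system uses internally:

```python
import numpy as np
from scipy import ndimage

def find_new_cells(zeta_k, psi_star_k):
    """Return the connected components of the detector map zeta_k that do
    not overlap any propagated cell region in psi*_k; each is treated as a
    newly entered cell. Function and argument names are illustrative."""
    labels, num = ndimage.label(zeta_k)
    new_components = []
    for c in range(1, num + 1):
        mask = labels == c
        if not np.any(psi_star_k[mask] > 0):  # no overlap with propagated cells
            new_components.append(mask)
    return new_components
```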
Next, the algorithm scans through all active track segments, and deactivates track segments whose labels are not found in the propagated region labeling ψk*(x, y). A track segment whose corresponding propagated cell region contains only one connected component will be updated directly. If a cell region consists of multiple well-separated connected components, the track compiler will judge between two possibilities: 1) the cell divided into daughter cells; or 2) one or more of these components are from occluded cells or close-by newly-entered cells. The algorithm will either create daughter tracks or continue tracking using the component that best matches the cell trajectory, depending on whether the cell was previously detected to be mitotic. Details of several key operations are as follows; a sketch of the IsDivided test appears after these descriptions.

UpdateTrack(n, k, ω) updates the track segment n using the features of region ω, including centroid location, mean intensity, area, and eccentricity. We feed the centroid and mean intensity to the motion filter to obtain a filtered state of cell n in frame k. We use the last three features to classify a cell as normal, mitotic, or apoptotic, using nearest-neighbor matching with Mahalanobis distance to a set of training samples obtained off-line.

SelectBestMatch(n, k, Ω_n) selects the component ω* ∈ Ω_n that best matches the dynamics of cell n, i.e., the one which maximizes the innovation likelihood given by Equation (5) among all dynamic models.

IsDivided(Ω_n) returns true if region Ω_n has multiple connected components, and the minimum distance between any two points in different components is greater than a preset threshold D. Otherwise, it returns false.

IsMitotic(n, k) determines if cell n was detected as mitotic during the past T frames.
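The sketch below shows one plausible realization of IsDivided; the use of scipy.ndimage labeling and Euclidean distance transforms is our choice for illustration, not a detail given in the paper.

```python
import numpy as np
from scipy import ndimage

def is_divided(region_mask, D):
    """True if the region splits into multiple connected components whose
    minimum pairwise point distance exceeds the preset threshold D."""
    labels, num = ndimage.label(region_mask)
    if num < 2:
        return False
    for i in range(1, num + 1):
        others = (labels > 0) & (labels != i)
        # Distance from every pixel to the nearest pixel of another component
        dist = ndimage.distance_transform_edt(~others)
        if dist[labels == i].min() <= D:
            return False  # some pair of components is too close
    return True
```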
2.3 Track Linking
The track linker detects potential problems among all track segments up to frame k based on two physical constraints: 1) a cell does not vanish unless it leaves the field-of-view, dies and is released into the media, or is occluded; and 2) a cell does not appear unless it enters from outside, is produced as a daughter cell, or moves out of occlusion. The linker attempts to correct these problems by linking track segments into complete cell trajectories using spatiotemporal context. Algorithm 2 outlines the track linking algorithm, wherein N_lost denotes the label set of track segments that ended before frame k, and N_found is the label set of track segments whose starting point is after frame 0. Most operations in the algorithm are self-explanatory. One vital step of the algorithm is the matching between lost and appeared track segments, MatchTracks (Line 5). In MatchTracks, we first create a bipartite graph G, whose nodes correspond to the labels in N_lost and N_found. We construct the arcs of G as follows. We create an arc ⟨n_l, n_f⟩ between node n_l and node n_f if the last centroid (x_l, y_l, k_l) of track n_l is related to the first centroid (x_f, y_f, k_f) of track n_f by $\sqrt{(x_l - x_f)^2 + (y_l - y_f)^2} \le R$ and $|k_l - k_f| \le H/2$, where H and R are user-defined parameters. Each arc ⟨n_l, n_f⟩ is assigned a weight $w_{lf} = \lambda^{\max}_{n_l,k_f}(n_f)$, which is the maximum innovation likelihood of track n_l on the measurement of track n_f in frame k_f (Eq. 5). Intuitively, w_lf indicates how likely track n_f is a continuation of track n_l based on the dynamics of n_l.
Algorithm 2. TrackLinking
1: N_lost, N_found ← ∅
2: foreach track n ∈ N_k do
3:     if IsShort(n, k) then DeleteTrack(n)
       else if LostInField(n, k) then add n to N_lost
4:     else if FoundInField(n, k) then add n to N_found
5: MatchTracks(N_lost, N_found)
6: foreach (n_l, n_f) ∈ (N_lost, N_found) do
       if IsMatched(n_l, n_f) then MergeTrack(n_l, n_f)
Next, we obtain a maximum-likelihood matching between tracks n_l and n_f. This corresponds to selecting the subset of arcs in graph G with maximum total weight, subject to the constraint that no two arcs share a common node. We adopt the efficient Jonker-Volgenant (JV) algorithm [6]. Specifically, we define a square cost matrix C with dimension d = max(|N_lost|, |N_found|), where |·| denotes the size of a set. The entry of the cost matrix C(l, f) equals 1 − w_lf if ⟨n_l, n_f⟩ ∈ G, or 1 otherwise. Taking C as input, the JV algorithm outputs a minimum-cost column-to-row (or vice versa) assignment. Track segments n_l and n_f are matched if column f is assigned to row l and C(l, f) < 1.
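A compact sketch of MatchTracks is given below. We use SciPy's linear_sum_assignment, which returns the same optimal assignment as the JV algorithm (though not necessarily with JV's speed); the track-record attributes and the innovation_llh callback are our own assumed interface, not the authors' code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_tracks(lost, found, innovation_llh, R, H):
    """Match lost and appeared track segments (sketch; names assumed).

    lost/found: track records exposing end/start centroid (.x, .y) and frame (.k).
    innovation_llh(l, f): weight w_lf, the maximum innovation likelihood.
    R, H: user-defined spatial and temporal gating parameters.
    """
    d = max(len(lost), len(found))
    C = np.ones((d, d))                    # cost 1 for non-arcs and padding
    for l, tl in enumerate(lost):
        for f, tf in enumerate(found):
            if (np.hypot(tl.x - tf.x, tl.y - tf.y) <= R
                    and abs(tl.k - tf.k) <= H / 2):
                C[l, f] = 1.0 - innovation_llh(l, f)
    rows, cols = linear_sum_assignment(C)  # minimum-cost assignment
    return [(l, f) for l, f in zip(rows, cols)
            if l < len(lost) and f < len(found) and C[l, f] < 1.0]
```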
3 Experiments and Results
We quantitatively analyzed the performance of our system on two image sequences (A and B) of MG-63 osteosarcoma cells used previously in [1], and two sequences (C and D) of proprietary amnion epithelial (AE) stem cells. Sequences A and B were acquired with a 12-bit Qimaging Retiga EXi Fast 1394 CCD camera mounted on a Zeiss Axiovert 135 TV microscope, at an interval of 4 minutes/frame for 10 hours. Each sequence consists of 150 frames, with 512×512 pixels/frame, and 1.9 μm/pixel at 4.9x magnification. The cells were seeded randomly on a polystyrene dish. Sequences C and D were acquired using the same protocol, aside from a frame interval of 10 minutes/frame. Each sequence spans 42.5 hours, and consists of 256 frames with 1280×1024 pixels/frame. The cell population is roughly 2000–5000 cells/frame, and is nearly confluent towards the end. A human operator manually tracked the cell centroids in Sequences A and B, and in two randomly-selected 256×256-pixel subregions in Sequences C and D, respectively (Fig. 2). Only those cells that appear in the initial frame of each sequence and their children were tracked. A cell trajectory is valid only if it followed the same cell through all frames in which the cell is visible. The operator also manually identified all mitosis events. We compared the tracking results produced by the current and the previous systems, as shown in Table 1. We visually compared the current tracking results with those produced by the previous system [1] for more than 30 sequences of AE stem cells. The new system showed superior robustness in handling long-term occlusion and against cell detection error. Fig. 3 shows an example where cell 116 is occluded by cell 47 in frame 36 and reappeared in frame 46. The new system (top row) correctly
Table 1. Tracking Accuracy Comparison

Sequence | Trajectory Validity (Current) | Trajectory Validity (Previous) | Division Tracking Correctness (Current) | Division Tracking Correctness (Previous)
A | 74/81 (91.4%) | 70/81 (86.4%) | 1/1 (100%) | 1/1 (100%)
B | 86/93 (92.5%) | 82/93 (88.2%) | 0/0 (N/A) | 0/0 (N/A)
C | 78/92 (84.8%) | 70/92 (76.1%) | 45/55 (81.8%) | 43/55 (78.2%)
D | 98/117 (83.8%) | 90/117 (76.9%) | 44/52 (84.6%) | 41/52 (78.8%)
Fig. 2. Tracking results of AE cells in the subregion used for quantitative validation. Left: original image. Middle: the image with cell trajectories overlaid. Red rectangles indicate cells that were detected mitotic in the past T = 10 frames. Right: lineage map for selected cells. Black squares indicate cell entrance or departure. Blue text shows division time.
Fig. 3. Tracking AE cells through occlusion. Top row: The new system correctly tracked cell 116, which was completely occluded by cell 47 and reappeared later. Bottom row: An incorrect result was produced by the previous system, where cell 116 switched with cell 47 and was eventually lost. The numbers at the top-right corner are the frame indices. The trailing curves represent cell trajectories. Different colors represent different cell lineages.
recovered the trajectory of cell 116 after occlusion, whereas the previous system (bottom row) switched the identities of cells 47 and 116 in frame 16, detected a false mitosis in frame 36, and eventually lost cell 47 after frame 36. Another application of the tracking system that is valuable to stem cell research is to automatically reconstruct cell lineage maps. We used the system to construct the lineages for the whole population of AE cells. Fig. 2(c) shows a sample set of the lineage trees with cells undergoing multiple divisions. Our system (implemented in ISO C++) runs at an average speed of 90 frames/hour for tracking approximately 3000 cells in a 1280×1024 pixels/frame image sequence on an Intel Xeon 2.66GHz workstation.
4 Conclusion and Future Work
We developed and validated an automated system capable of tracking thousands of individual cells in dense cell populations in phase contrast microscopy image sequences. The system incorporated spatiotemporal track linking and a biologically relevant motion filter, and achieved performance boosts of up to 8% compared to its predecessor with nominal computational overhead. We plan to incorporate more effective segmentation algorithms and graphical models to cope with more complex intercellular interactions.
Acknowledgements We would like to thank Eric Miller, Dr. Phil Campbell and Dr. Lee Weiss for their help. This work was supported partially by NIH Grant R01 EB007369, and the Pennsylvania Infrastructure Technology Alliance Grant 1C76 HF 00381-01.
References
1. Li, K., Miller, E.D., Weiss, L.E., Campbell, P.G., Kanade, T.: Online tracking of migrating and proliferating cells imaged with phase-contrast microscopy. In: Proc. IEEE Conf. Comp. Vision and Patt. Recog. Workshop, p. 65. IEEE Computer Society Press, Los Alamitos (2006)
2. Blom, H.A.P.: An efficient filter for abruptly changing systems. In: Proc. 23rd IEEE Conference on Decision and Control, pp. 656–658. IEEE Computer Society Press, Los Alamitos (1984)
3. Genovesio, A., Liedl, T., Emiliani, V., Parak, W.J., Coppey-Moisan, M., Olivo-Marin, J.C.: Multiple particle tracking in 3-D+t microscopy: Method and application to the tracking of endocytosed quantum dots. IEEE Transactions on Image Processing 15, 1062–1070 (2006)
4. Shi, Y., Karl, W.C.: Real-time tracking using level sets. In: Proc. IEEE Conf. Comp. Vision and Patt. Recog., vol. 2, pp. 34–41. IEEE Computer Society Press, Los Alamitos (2005)
5. Shumway, R., Stoffer, D.: An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis 3, 253–264 (1982)
6. Jonker, R., Volgenant, A.: A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing 38, 325–340 (1987)
Spatiotemporal Normalization for Longitudinal Analysis of Gray Matter Atrophy in Frontotemporal Dementia
Brian Avants, Chivon Anderson, Murray Grossman, and James C. Gee
Depts. of Radiology and Neurology, University of Pennsylvania, Philadelphia, PA 19104-6389
[email protected]
Abstract. We present a unified method, based on symmetric diffeomorphisms, for studying longitudinal neurodegeneration. Our method first uses symmetric diffeomorphic normalization to find a spatiotemporal parameterization of an individual's image time series. The second step involves mapping a representative image or set of images from the time series into an optimal template space. The template mapping is then combined with the intrasubject spatiotemporal map to enable pairwise statistical tests to be performed on a population of normalized time series images. Here, we apply this longitudinal analysis protocol to study the gray matter atrophy patterns induced by frontotemporal dementia (FTD). We sample our normalized spatiotemporal maps at baseline (time zero) and time one year to generate an annualized atrophy map (AAM) that estimates the annual effect of FTD. This spatiotemporal normalization enables us to locate neuroanatomical regions that consistently undergo significant annual gray matter atrophy across the population. We found the majority of annual atrophy to occur in the frontal and temporal lobes in our population of 20 subjects. We also found significant effects in the hippocampus, insula and cingulate gyrus. Our novel results, significant at p < 0.05 after false discovery rate correction, are represented in local template space but also assigned Talairach coordinates and Brodmann and Anatomical Automatic Labeling (AAL) labels. This paper shows the statistical power of symmetric diffeomorphic normalization for performing deformation-based studies of longitudinal atrophy.
1 Introduction
Neurodegeneration is a family of progressive disease processes affecting both the middle-aged and elderly. The economic impact of these diseases on families and society grows along with the size of the aging population. The relative number of at-risk individuals in the United States is projected to increase over the coming years. Developing biomarkers for early detection, assessment and treatment of this class of diseases is therefore of great significance. Longitudinal studies of patient change are particularly valuable because of the difficulty in interpreting isolated observations in aging individuals. The most
common method for following the demise of patients longitudinally is clinical assessment, such as monitoring performance during the repeated presentation of a cognitive task like confrontation naming or category naming fluency. Cognitive testing thus is able to use changing cognitive abilities to track disease progression. Because FTD can be difficult to monitor longitudinally with clinical measures, it is important to identify an objective method to support a clinical diagnosis. Jack [1] showed that segmentation-based, structure-specific atrophy estimates derived from serial MRI are more statistically powerful than cognitive tests for tracking AD progress. In this study, we use a disease-specific optimal template and symmetric diffeomorphic normalization to examine FTD-induced longitudinal atrophy at the full MRI resolution. FTD is an early-onset neurodegenerative condition with an average age of onset in the sixth decade of life. The condition is as common as AD in individuals less than 65 years of age. The major clinical features of FTD include progressive aphasia or a disorder of social comportment and personality together with limited executive resources. Survival is typically eight years from onset. The disease is due to a disorder of tau metabolism or the accumulation of a ubiquitinated protein known as TDP-43. Patterns of longitudinal atrophy are increasingly investigated in a patient-specific manner (a patient is used as his/her own control). A variety of methods is used for measuring brain change over time [2]. Four quadrants defined in the brain were used to assess the global and regional atrophy rate in both FTD and Alzheimer's Disease (AD) [3]. Fox used deformable image registration to map the annual regional atrophy caused by AD [4] and to uncover presymptomatic changes in the medial temporal lobes. Scahill used statistical parametric mapping (SPM) [5] to detail AD atrophy [6], which was found to affect the hippocampus before clinical signs of AD appeared. Similar (but not symmetric) studies may also measure longitudinal change of specific structures [7]. Voxel compression mapping found early involvement of the posterior cingulate, temporoparietal cortex, and medial temporal lobes [4]. Sowell [8] used surface-based methods to track pediatric brain development longitudinally. We focus on using our latest methodological advances to examine longitudinal FTD change in this study. The core contribution of this work is a strategy for using symmetric normalization to perform a template-based longitudinal analysis of whole brain gray matter atrophy. The integral and novel components of our method include: (1) a pure intrasubject measurement of longitudinal atrophy; (2) a spatiotemporal interpolation of the estimated annual change; and (3) leveraging the ability to compose diffeomorphic solutions to bring necessary variables into a common template space. To our knowledge, this is the first statement of a consistent, large deformation, image-based processing protocol for longitudinal studies of gray matter atrophy.
2 Methods
The goal of our processing is to use imaging data to estimate intrasubject annual change in cortical anatomy. We choose to use diffeomorphisms (differentiable
and invertible maps with differentiable inverse) for this study due to their ability to capture both subtle intrasubject shape changes and the large deformations required when transforming a neurodegenerative brain into a normalized template space. Our diffeomorphisms are maps that take a position x in domain Ω back to Ω. A path (or set) of such maps, parameterized by t, may be written φ : x ∈ Ω × t ∈ [0, 1] → Ω where, for each t, we have a unique diffeomorphism. Furthermore, a small change in t gives a small change in the diffeomorphism, as each new diffeomorphism is generated by integrating an ordinary differential equation dφ(x, t)/dt = v(φ(x, t), t). The v gives the tangent to the diffeomorphism at time t and is a smooth vector field [9]. In image registration, the v at each time is determined by the minimization and regularization of the similarity term relating two images, I and J. We use a novel symmetric formulation for diffeomorphic image registration, referred to as symmetric normalization (SyN) [10], which does not require one to choose a "fixed" and "moving" image. Rather, both images deform smoothly along the shortest length path of diffeomorphisms connecting them. This property eliminates bias towards a specific, arbitrary reference frame and also enables features from both images to guide the mapping. Furthermore, SyN naturally includes a spatiotemporal parameterization of the image-to-image transformation, allowing one to estimate transformations "in between" the end-points. These interpolated diffeomorphic transformations come from the large deformation analogy to a (small deformation) linear estimate of atrophy. This is because they are based on the notion of a shortest path in the space of diffeomorphisms. We write the SyN energy as

$$E_{SyN}(I, J) = \inf_{\phi_1}\inf_{\phi_2} \int_{t=0}^{0.5} \left\{\|v_1\|_L^2 + \|v_2\|_L^2\right\} dt + \omega \int_\Omega \left|I(\phi_1(x, 0.5)) - J(\phi_2(x, 0.5))\right|^2 d\Omega,$$

with each $\phi_i$ the solution of $d\phi_i(x, t)/dt = v_i(\phi_i(x, t), t)$ with $\phi_i(x, 0) = x$, (1)

where the value of ω weights the regularization versus the image matching term. Minimization with respect to φ1 and φ2, upholding a constant arc length constraint, provides an intrinsically symmetric image registration solution. The full map from I to J (or similarly from J to I) is gained by composition, $\phi_2^{-1}(\phi_1(x, 0.5), 0.5)$. SyN builds on prior work on diffeomorphic representations [9] of image populations. SyN may also be used to symmetrically generate dataset-specific optimal local templates [11], known to improve localization accuracy and statistical significance [12]. These advances eliminate the template selection process and enable statistically fair comparisons while also guaranteeing the ability to capture the finest shape differences in one's data. We use a previously derived disease- and scanner-specific optimal template for the analyses in this manuscript. This template appears in figures 1 and 2 and was derived by iteratively optimizing equation 1 over a full image population, along with a parameter that controls
for the template shape and appearance [11]. The template includes a three-tissue segmentation, a cortical labeling and segmented deep cortical structures, all of which were prepared with semi-automated tools. Our study is restricted to the gray matter cortex. Similar (not symmetric) local template construction has been done in [13,14,15].

Longitudinal Analysis via Intrasubject and Template Normalization: Our novel symmetric diffeomorphic image registration tools permit us to deform the later time point structural MRI of each patient, referred to as the endpoint image, into the shape at the earlier time point, referred to as the baseline image, as in equation 1. We refer to the output of this normalization, for image i, as φi(x, t), where φi(x, 0) maps to the baseline image and φi(x, 1) maps to the last image in the time series. This physically based registration gives the structural change on a voxel-by-voxel basis, as shown in figure 1. The diffeomorphism is then sampled at time (with respect to the diffeomorphic parameterization) t = (1 year)/(imaging time interval in years), where the time interval is greater than 1. In figure 1, the value for t would be 0.25, as the imaging time interval is four years. This sampling provides a model of the annualized atrophy occurring in the patient cortex. The Jacobian of this transformation gives us a measure of annual atrophy throughout the individual brain space and is adjusted for the total intracranial volume of each individual. We refer to this image as the annualized atrophy map (AAM) and represent it in the baseline image space as J^i_AAM, for image i.

Subsequent to the estimate of longitudinal change, the annualized structural atrophy map must be warped into a template space where it may be evaluated statistically. We also achieve this goal with SyN by symmetrically and diffeomorphically mapping the baseline image into our local template space. The baseline-to-template mapping, for subject i, is referred to as φ^i_T. The mapping from any image in an individual time series to the template is then uniquely determined by φi(φ^i_T(x, 1), t). We represent the Jacobian of φ^i_T as J^i. We then have, for each individual, a spatially varying normalized representation of the relative neuroanatomical volumes at time zero, given by J^i, and time one year later, given by J^{iT}_AAM = J^i × J^i_AAM(φ^i_T). A paired T-test may then be performed over the dataset between the time zero, {J^i}, and time one year, {J^{iT}_AAM}, sets of images, where i ranges from 1 to n.

Our novel, composition-based, "intrasubject-first" approach eliminates inconsistencies that may occur when mapping both baseline and later time images directly to a template. Rather, we generate intrasubject structural measures by solving the much more clearly framed intrasubject correspondence problem. Consider assessing the longitudinal change by mapping directly to the template. In this case, the measure of longitudinal change is confounded by the difficulty of making the baseline-to-template and endpoint-to-template intersubject normalizations exactly consistent with each other. Our strategy prefers to generate a "pure" measure of longitudinal change from intrasubject data alone. Normalization to the template is then performed separately on the baseline images. We choose baseline images because the FTD disease process increases brain
abnormalities over time, including the degree of hyper/hypointensity, atrophy and lesions present in the brain. Furthermore, we represent the volume change from baseline to a later time in the baseline reference frame.

Subjects and Image Acquisition. We applied our methods to 20 patients diagnosed with an FTD spectrum disorder at the Department of Neurology at the University of Pennsylvania. Initial clinical diagnosis was established by an experienced neurologist (M.G.) using published criteria. Subsequently, at least two trained reviewers of a consensus committee confirmed the presence of specific diagnostic criteria based on an independent review of the semi-structured history, mental status examination and neurological examination. All subjects' images were acquired with a Siemens 3.0 T MRI scanner. Each study began with a rapid sagittal T1-weighted scan to determine patient position. A T1 structural acquisition was then acquired with TR (repetition time) = 1620ms, TE (echo time) = 3s, slice thickness: 1 mm, in-plane resolution: 0.9766 mm × 0.9766 mm and matrix 256 × 256 × 192. The same protocol was used at baseline and later time points.
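Before turning to the results, the Jacobian composition behind the paired test described above can be summarized in a few lines. This is a sketch under our own naming: jac_intra, jac_template and the phi_T resampling operator are placeholders for J^i_AAM, J^i and the baseline-to-template map, not functions from the authors' software.

```python
import numpy as np

def paired_jacobian_images(jac_intra, jac_template, phi_T):
    """Form the normalized time-zero and time-one-year Jacobian images.

    jac_intra:    voxelwise Jacobian determinant of the intrasubject map,
                  sampled at t = 1 year / interval (the AAM, J^i_AAM).
    jac_template: Jacobian determinant J^i of the baseline-to-template map.
    phi_T:        callable resampling a baseline-space image to template space.
    """
    J0 = jac_template                     # relative volumes at time zero
    J1 = jac_template * phi_T(jac_intra)  # J^{iT}_AAM = J^i x J^i_AAM(phi^i_T)
    return J0, J1                         # inputs to the voxelwise paired T-test
```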
3 Results and Discussion
We now present the results of our novel spatiotemporal approach to analyzing longitudinal atrophy via symmetric normalization. We evaluated the statistical significance of the difference between baseline and time one log-Jacobian images with the statistical parametric mapping software, SPM2 [5]. We used explicit masking to restrict the analysis to the gray matter cortex and used pairwise T-tests to uncover significant regions of atrophy. Regions that undergo significant annual atrophy, due to the FTD effect on gray matter, are overlaid on our rendered template in figure 2. Details of our results, including Brodmann areas and AAL labels, are in table 1. Significance is defined as a minimum cluster size of 50 and a false discovery rate corrected p-value < 0.05. Note that we found no significant correlations between atrophy rate and global brain atrophy, nor between age and atrophy rate. The significantly affected areas are located primarily in the temporal and frontal lobes, preferentially on the left. Affected subcortical regions include the left hippocampus, left insula and the cingulum bilaterally. The most significant atrophic effects were found in the left and right cingulum, left insula, left temporal pole and left hippocampus. Atrophy in these regions is also present on the right, but with less significance. The average annual atrophy over significant voxels was approximated as 6.5%, and over all cortical gray matter as 3.3%. Note that we also provide Brodmann and AAL labels at the most significant location in each cluster. We highlight that AAL and Brodmann labels do not necessarily agree, as there is some inconsistency in the coordinate systems used by each protocol. Many of the indicated structures are involved in the cognitive resources used to monitor patients longitudinally. Cortical loss in these structures would sensibly affect cognitive performance, such as the ability to name animals or recall
the appropriate names for objects. In fact, patients with FTD are known to have difficulties with cognitive tasks like naming because of disease in the temporal lobe and executive functioning because of frontal lobe disease [16]. In particular, the left temporal lobe is compromised in semantic dementia, one presentation of FTD, and this is related to profound naming difficulty. The left insula and inferior frontal region is related to progressive non-fluent aphasia, another FTD presentation that involves effortful speech and reduced language fluency. Bilateral frontal regions such as the cingulum and orbital frontal cortex are most involved in the social and executive disorder associated with FTD [16]. The results of this study suggest that structural MRI, combined with sensitive symmetric normalization techniques, has the ability to act as a reliable biomarker for patient studies of progressive neurodegeneration. Our future work will investigate the effect of FTD on the cognitive network and the relationship of atrophy in this network to loss of cognition. We also hope to gain further insight into the relationship between brain and behavior. Finally, we are currently evaluating our SyN-based structural measurements with respect to measures of annual atrophy gained by cortical thickness.

Fig. 1. We parameterize a single subject's temporal series with symmetric diffeomorphisms (top). This longitudinal transformation is the basis for our estimate of the annual atrophy map. The panel below shows the successful normalization of the baseline image for a single subject to the template image, as well as the inverse mapping of the template to the baseline image. This mapping is used to normalize the full intrasubject image time series and the associated longitudinal transformations to the template. Normalizing the population of longitudinal transformations to the template space enables a paired T-test for significant atrophy between any two time points in the series.
Fig. 2. Average annual FTD atrophy rate greater than or equal to 5% is shown at bottom, along with the associated atrophy rate scale (brighter means greater atrophy rate). The p < 0.05 FDR corrected significant regions are shown at top, along with a brightness scale that reflects the local T-statistic value. Overall, the effect of atrophy, in this dataset, is more apparent on the left. Atrophy is most bilaterally present in the temporal lobes and cingula.

Table 1. Significantly Atrophic Regions with False Discovery Rate corrected p < 0.05

Cluster | max T | Clust Size | MNI Coord | Brodmann | AAL Label
1 | 8.71 | 5171 | 11 -28 41 | 23 | Cingulum Mid R
2 | 7.28 | 2499 | -3 -25 36 | 23 | Cingulum Mid L
3 | 5.83 | 101 | -29 18 3 | n/a | Insula L
4 | 5.73 | 278 | -7 31 7 | 25 | Cingulum Ant L
5 | 5.36 | 69 | -33 13 -39 | 20 | Temporal Pole Mid L
6 | 5.25 | 110 | -15 -6 -16 | 28 | Hippocampus L
7 | 5.19 | 79 | -55 -24 44 | 3 | Parietal Inf L
8 | 5.17 | 77 | -31 51 12 | 10 | Frontal Mid L
9 | 5.02 | 110 | -56 12 18 | 44 | Frontal Inf Oper L
10 | 4.84 | 98 | 11 33 -16 | 11 | Frontal Mid Orb R
11 | 4.83 | 154 | 25 -3 -10 | 34 | Amygdala R
12 | 4.77 | 106 | 26 0 -24 | 34 | Amygdala R
13 | 4.50 | 113 | -20 38 -13 | 11 | Frontal Lat Orb L
14 | 4.36 | 61 | -43 45 -1 | 46 | Frontal Mid Orb L
References
1. Jack, C.R., Shiung, M.M., Gunter, J.L., O'Brien, P.C., Weigand, S.D., Knopman, D.S., Boeve, B.F., Ivnik, R.J., Smith, G.E., Cha, R.H., Tangalos, E.G., Petersen, R.C.: Comparison of different MRI brain atrophy rate measures with clinical disease progression in AD. Neurology 62(4), 591–600 (2004)
2. Cardenas, V.A., et al.: Comparison of methods for measuring longitudinal brain change in cognitive impairment and dementia. Neurobiology of Aging 24(4), 537–554 (2003)
3. Chan, D., Fox, N.C., Jenkins, R., Scahill, R.I., Crum, W.R., Rossor, M.N.: Rates of global and regional cerebral atrophy in AD and frontotemporal dementia. Neurology 57, 1756–1763 (2001)
4. Fox, N., Crum, W., Scahill, R., Stevens, J., Janssen, J., Rossor, M.: Imaging of onset and progression of Alzheimer's disease with voxel-compression mapping of serial magnetic resonance images. Lancet 358, 201–205 (2001)
5. Ashburner, J., Hutton, C., Frackowiak, R., Price, C., Johnsrude, I., Friston, K.: Identifying global anatomical differences: Deformation-based morphometry. Hum. Brain Mapp. 6, 348–357 (1998)
6. Scahill, R.I., Schott, J., Stevens, J.: Mapping the evolution of regional atrophy in Alzheimer's disease: unbiased analysis of fluid-registered serial MRI. Proc. Natl. Acad. Sci. 99, 4135–4137 (2002)
7. van de Pol, L.A., Barnes, J., Scahill, R.I., Frost, C., Lewis, E.B., Boyes, R.G., van Schijndel, R.A., Scheltens, P., Fox, N.C., Barkhof, F.: Improved reliability of hippocampal atrophy rate measurement in mild cognitive impairment using fluid registration. Neuroimage 34(3), 1036–1041 (2007)
8. Sowell, E.R., Thompson, P.M., Leonard, C.M., Welcome, S.E., Kan, E., Toga, A.W.: Longitudinal mapping of cortical thickness and brain growth in normal children. J. Neurosci. 24, 8223–8231 (2004)
9. Miller, M., Trouvé, A., Younes, L.: On the metrics and Euler-Lagrange equations of computational anatomy. Annu. Rev. Biomed. Eng. 4, 375–405 (2002)
10. Avants, B., Grossman, M., Gee, J.C.: Symmetric diffeomorphic image registration: Evaluating automated labeling of elderly and neurodegenerative cortex. Medical Image Analysis (in press, 2007)
11. Avants, B., Gee, J.C.: Geodesic estimation for large deformation anatomical shape and intensity averaging. Neuroimage (Suppl. 1), S139–150 (2004)
12. Senjem, M., Gunter, J.L., Shiung, M.M., Petersen, R., Jack, C.R.: Comparison of different methodological implementations of voxel-based morphometry in neurodegenerative disease. Neuroimage 26(2), 600–608 (2005)
13. Lorenzen, P., Prastawa, M., Davis, B., Gerig, G., Bullitt, E., Joshi, S.: Multi-modal image set registration and atlas formation. Medical Image Analysis 10(3), 440–451 (2006)
14. Beg, M.F., Khan, A.: Computing an average anatomical atlas using LDDMM and geodesic shooting. In: ISBI, pp. 1116–1119 (2006)
15. Twining, C.J., Cootes, T., Marsland, S., Petrovic, V., Schestowitz, R., Taylor, C.J.: A unified information-theoretic approach to groupwise non-rigid registration and model building. In: Christensen, G.E., Sonka, M. (eds.) IPMI 2005. LNCS, vol. 3565, pp. 1–14. Springer, Heidelberg (2005)
16. Grossman, M., et al.: What's in a name?: voxel-based morphometric analyses of MRI and naming difficulty in Alzheimer's disease, frontotemporal dementia and corticobasal degeneration. Brain 127(3), 628–649 (2004)
Population Based Analysis of Directional Information in Serial Deformation Tensor Morphometry
Colin Studholme¹,² and Valerie Cardenas¹,²
¹ Department of Radiology, University of California San Francisco, U.S.A.
[email protected]
² Northern California Institute for Research and Education, VAMC San Francisco
Abstract. Deformation morphometry provides a sensitive approach to detecting and mapping subtle volume changes in the brain. Population based analyses of this data have been used successfully to detect characteristic changes in different neurodegenerative conditions. However, most studies have been limited to statistical mapping of the scalar volume change at each point in the brain, by evaluating the determinant of the Jacobian of the deformation field. In this paper we describe an approach to spatial normalisation and analysis of the full deformation tensor. The approach employs a spatial relocation and reorientation of tensors of each subject. Using the assumption of small changes, we use a linear modeling of effects of clinical variables on each deformation tensor component across a population. We illustrate the use of this approach by examining the pattern of significance and orientation of the volume change effects in recovery from alcohol abuse. Results show new local structure which was not apparent in the analysis of scalar volume changes.
1 Introduction
Repeated structural magnetic resonance imaging (MRI) of the brain [1], when combined with image analysis tools, is an increasingly useful tool in the study of neurodegenerative conditions [2,3,4,5,6,7,8,17]. In particular, non-rigid registration based methods have been developed to map subtle geometric changes in brain anatomy, and separate true volume changes from local tissue displacements [17]. This is important in both brain development and degeneration, where volume change is a key physical property of interest, whereas displacements of tissue may only be a secondary surrogate marker of tissue integrity change and collapse. In this paper we are interested in studying common patterns of volume change across a population by using accurate spatial normalisation to bring individual volume change maps into a common space. Previous studies have focused on examining the determinant of the deformation tensor at each point, which provides a scalar measure summarizing change. Such scalar data can be evaluated using univariate voxelwise statistical parametric mapping [10] to examine the relationship between local atrophy rate and variables of interest (such as diagnosis) together with other confounding variables (such as age).
Critically, these studies cannot reveal orientation-specific characteristics in the pattern of volume changes and their relationship to clinical variables: for example, whether contractions associated with a particular anatomical region in a clinical condition are predominantly anterior-posterior or medial-lateral. Such characteristics may reveal changes that are related to underlying tissue properties, and on a more basic level, they may be important from a purely signal detection viewpoint. For example, in regions where volume changes at a given point in anatomy are only well defined along one axis and are poorly defined in other directions, the determinant of the deformation tensor may be corrupted by the noise from the poorly defined directions. This may reduce the strength of the statistical relationship with clinical variables of interest. Thus, modeling specific orientation components of the deformation tensor may provide a more sensitive correlation with clinical variables of interest. In this paper we describe the basic steps used to form a multivariate linear model of the elements of the deformation tensor of anatomical change, and their relationship to clinical variables across a population of subjects. This analysis includes the process of re-orienting each subject's deformation change tensor into a common space and then building a statistical model of the relationship between clinical variables and the elements of the deformation tensor matrix at each voxel.
2 Method
2.1 The Deformation Tensor of Anatomical Change
Given a pair of images of a subject n, using a fluid registration algorithm, we can estimate a transformation $T_{\Delta n}(x_n) = x_n + u(x_n)$ that captures the anatomical changes from the earlier to the later time point. The volume changes at a given location can then be characterized by the deformation tensor [11,12,13]:

$$J_{\Delta n}(x_n = [x_n, y_n, z_n]^T) = \frac{\partial T_{\Delta n}(x_n)}{\partial x_n} = \begin{bmatrix} \frac{\partial x_{\Delta n}}{\partial x_n} & \frac{\partial x_{\Delta n}}{\partial y_n} & \frac{\partial x_{\Delta n}}{\partial z_n} \\ \frac{\partial y_{\Delta n}}{\partial x_n} & \frac{\partial y_{\Delta n}}{\partial y_n} & \frac{\partial y_{\Delta n}}{\partial z_n} \\ \frac{\partial z_{\Delta n}}{\partial x_n} & \frac{\partial z_{\Delta n}}{\partial y_n} & \frac{\partial z_{\Delta n}}{\partial z_n} \end{bmatrix}, \tag{1}$$

where the transformed coordinates at the second time point are:

$$x_{\Delta n} = [x_{\Delta n}, y_{\Delta n}, z_{\Delta n}]^T = T_{\Delta n}(x_n). \tag{2}$$

This Jacobian can be normalised by the scan interval, $1/\Delta t_{scan}$, to give the rate of deformation over time in studies where the interval varies between subjects. For a population of subjects, we can also estimate a transformation $T_{Rn}(x_R)$ which maps from a location $x_R$ in a reference anatomy to the first time point for each subject, as illustrated in figure 1. To analyze the deformation tensor matrix (1) describing the change in individual subjects in a common reference coordinate system, we need to both spatially relocate and reorient $J_{\Delta n}(x_n)$ into the reference coordinate system. Reorientation of the tensor from a locally affine
Fig. 1. Using non-rigid registration to capture local shape differences between subjects from the transformations TRn . To examine common patterns across subjects, maps of shape measures derived from these transformations may be transformed and compared in the common anatomical space.
transformation is achieved by using information provided by the deformation tensor of the spatially normalizing transformation, $T_{Rn}$, denoted by $J_{Rn}(x_R) = \partial T_{Rn}(x_R)/\partial x_R$. We can follow a similar approach to the analysis of diffusion tensor image data [14] and apply a normalisation transformation matrix S to the subject change tensor $J_{\Delta n}(x_n)$:

$$J'_{\Delta n}(x_n) = S\, J_{\Delta n}(x_n)\, S^T. \tag{3}$$

The required form of this normalisation transformation is influenced by our interests in analyzing the pointwise volume change rate across subjects. If S is a full affine transformation, then it will account for changes in the relative size and shape of this element of anatomy when mapping from reference to subject space. Thus, for a subject with a temporal lobe which is twice as big as another subject's, their atrophy rate will be increased by a factor of two when mapping the change deformations into the reference space. Here we are interested only in the pointwise rate of change of a given tissue, i.e., we are investigating the equivalent rate of expansion of a tissue element at $x_R$ across different subjects. We thus use the rigid components of the local deformation given by the decomposition [15]:

$$R = (J_{Rn} J_{Rn}^T)^{-1/2} J_{Rn}. \tag{4}$$

This locally describes the reorientation of an element of tissue from the reference coordinates to the subject coordinates, without changing its local shape or size. To bring the subject change tensor back into the coordinate system of the reference anatomy we therefore set $S(x_R) = R(x_R)^{-1}$ and apply equation (3).
In terms of common reference anatomy coordinates $x_R$, the deformation matrix $J_{\Delta n}$ for subject n, in reference coordinates, is then:

$$J'_{\Delta n}(x_R) = S\, J_{\Delta n}(T_{Rn}(x_R))\, S^T. \tag{5}$$
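For illustration, the reorientation of Equations (3)–(5) can be written in a few lines of NumPy. The SVD-based polar decomposition used here to obtain $R = (J J^T)^{-1/2} J$ is an equivalent computation chosen for numerical convenience, and all names are ours rather than the authors'.

```python
import numpy as np

def reorient_change_tensor(J_delta, J_Rn):
    """Map an intrasubject change tensor into reference coordinates.

    J_delta: 3x3 change tensor J_{Delta n}, evaluated at T_{Rn}(x_R).
    J_Rn:    3x3 Jacobian of the reference-to-subject normalisation map.
    """
    # Rigid component of Eq. (4): for J = U Sigma V^T,
    # (J J^T)^{-1/2} J = U V^T, the rotation of the polar decomposition.
    U, _, Vt = np.linalg.svd(J_Rn)
    R = U @ Vt
    S = R.T                   # S = R^{-1}; a rotation's inverse is its transpose
    return S @ J_delta @ S.T  # Eqs. (3)/(5): S J S^T
```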
2.2 Modeling of Differences in the Deformation Tensor Components
After spatial normalisation, we have a set of maps of deformation tensor matrices, the elements of which describe the rates of contraction or expansion of points of tissue in each of the three axes with respect to the three axes in the reference anatomy. We want to examine whether there is a relationship between one or more of these directions of volume change and variables of interest related to each subject (such as age or clinical criteria). This can be explored using a multivariate general linear model such that at a given voxel:

$$Y(x_R) = X B(x_R) + U(x_R), \tag{6}$$

where $Y(x_R)$ are the deformation parameters at each voxel, X are the clinical variables associated with each subject, $B(x_R)$ are the parameters to be estimated, determining the strength of the linear relationships, and U are the errors. Here, in general, there are n subjects, 9 deformation variables at each voxel (the elements of the 3×3 deformation tensor) and p parameters to estimate. We form the matrix Y from the elements of the spatially normalized Jacobian matrix from each subject. The right hand side of the equation is conventionally divided into the variable of interest and the p′ = p − 1 confounding variables such that:

$$\underbrace{Y(x_R)}_{n \times [3\times3]} = \underbrace{X_1 B_1(x_R)}_{(n\times1)(1\times[3\times3])} + \underbrace{X_2 B_2(x_R)}_{(n\times p')(p'\times[3\times3])} + \underbrace{U(x_R)}_{n \times [3\times3]} \tag{7}$$
Standard linear least squares methods are used to solve for $B(x_R)$ of the full model and $B_2(x_R)$ of the reduced model. Statistical inference on B is obtained by computing the Wilks Λ test statistic, where Λ is the determinant of the error sum of squares and products of the full model divided by the determinant of the error sum of squares and products of the reduced model [13]. Significance and p-values are based on transforming Λ to an approximate F statistic using Rao's approximation [16]. The final estimated model B for each voxel consists of a matrix for each model parameter (age, grouping and offset). Each of these matrices holds the estimate of the increase or decrease of the rate of contraction or expansion, in elements of (1), associated with a subject variable X.
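At a single voxel, the Wilks Λ computation reduces to two least-squares fits and a ratio of determinants, as the sketch below shows. The design-matrix layout and names are our assumptions, and the final transformation of Λ to an approximate F statistic via Rao's approximation is left as a comment.

```python
import numpy as np

def wilks_lambda(Y, X_full, X_reduced):
    """Multivariate GLM test at one voxel (cf. Eqs. (6)-(7)); a sketch.

    Y:         n x 9 matrix of deformation tensor elements over n subjects.
    X_full:    design matrix [X1 X2] including the variable of interest.
    X_reduced: design matrix X2 of confounds only.
    """
    def error_sscp(X):
        B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # least-squares estimate of B
        E = Y - X @ B
        return E.T @ E  # error sums of squares and cross-products

    lam = np.linalg.det(error_sscp(X_full)) / np.linalg.det(error_sscp(X_reduced))
    return lam  # convert to an approximate F statistic via Rao's approximation
```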
2.3 Implementation and Reduction of Spatial Normalisation Variance
For this work we have used a robust fluid-based non-rigid registration to map changes over time in each subject dataset [17]. The derivatives of this deformation
field were then evaluated using finite differences in the coordinate system of each subject's first time point. We then employed a fine-scale B-Spline based spatial normalisation [18], regularized to prevent folding, to estimate a mapping between a single subject reference brain and the first time point scans of each individual. This deformation field, parameterized using a 1.8mm regular B-Spline lattice, was converted to a voxel displacement field, and local derivatives of this were then evaluated using finite differences. One of the key factors in the population based analysis is the spatial normalisation step. The transformation between subject anatomies for spatial normalisation can be ill defined in many regions. Critically, in regions of uniform tissue (for example in uniform white matter), the local orientation may be poorly defined. In our orientation based analysis this can introduce significant unwanted variance into the serial deformation tensor morphometry data (derived from within-subject registration). We have therefore used a pre-filtering step on the deformation field, using a Gaussian kernel applied to the three directional components just prior to calculation of the derivatives used to form $J_{Rn}$ (we note that this does not influence the spatial location, since that uses the unfiltered deformation field). In this initial work we chose the filter size experimentally to reduce orientation variance and improve the final quality of the fitting.
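The pre-filtering and finite-difference step can be sketched as follows; the array layout (one displacement component per trailing axis) and the choice of scipy.ndimage.gaussian_filter are our assumptions, with sigma standing for the experimentally chosen kernel width.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def filtered_jacobian(disp, sigma):
    """Smooth each displacement component, then form the Jacobian J_Rn.

    disp:  (X, Y, Z, 3) voxel displacement field u(x) of T_Rn(x) = x + u(x).
    sigma: Gaussian kernel width (chosen experimentally, as in the text).
    Returns an (X, Y, Z, 3, 3) field with J[..., i, j] = dT_i / dx_j.
    """
    grads = [np.gradient(gaussian_filter(disp[..., c], sigma)) for c in range(3)]
    # grads[i][j] holds d u_i / d x_j via central finite differences
    J = np.stack([np.stack(g, axis=-1) for g in grads], axis=-2)
    return J + np.eye(3)  # derivative of x + u(x) is I + du/dx
```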
2.4 Application to the Study of Recovery from Alcoholism
We applied the analysis to a study of brain volume changes in alcohol abuse and recovery. The data consisted of 24 pairs of high resolution T1W MPRAGE MRI scans of a group of subjects recovering from alcohol abuse, imaged using a 1.5T MRI scanner. The subjects were imaged twice, approximately 8 months apart. The baseline study was conducted within a week of entering treatment for alcoholism. The 24 subjects were divided into 16 consistent abstainers (188 ± 66 days since last drink) and 8 relapsers (8 ± 6 days since last drink) who failed to abstain from alcohol. Collectively they had a mean age of 48 years. We analyzed the data using deformation tensor morphometry and formed a voxelwise multivariate analysis with the grouping as the variable of interest, and age as a covariate.
3 Results
Figure 2 shows a comparison of F statistic maps of the relationship of the grouping (abstainer vs relapser) to the deformation parameters, using the Jacobian determinant (scalar) and the full deformation tensor. Larger areas of improved model fitting are shown for the model containing directional information. Figure 3 shows the corresponding maps of the estimated group effect for the scalar determinant model and the directional effects. For the directional effects, differing directional patterns of volume change are revealed in the deeper white matter and sub-cortical grey matter structures.
Fig. 2. A comparison of the voxel F statistics showing the local quality of fits for conventional analyses using the scalar Jacobian determinant (bottom row) and the individual Jacobian matrix elements (top row)
4 Discussion
We have described an approach to population based analysis of directional information in deformation tensor morphometry data from multi-subject serial MRI studies. The approach takes into account the reorientation of deformation tensors evaluated in subject coordinates and maps them into a common space for analysis. We then employ a multi-variate linear model to examine relationships between clinical variables and directional volume changes. One step in this process is to transform deformation tensors to a common coordinate system. We use approaches derived from the transformation and analysis of diffusion tensor data. An alternative for this step is to examine the transformation of the underlying deformation fields as in [19]. However we have focussed on the deformation tensor because of the underlying interest in volume changes, rather than displacements. By using directional information about the volume changes over time, we may reveal additional relationships with underlying tissue properties, and additionally provide an improved model fitting in regions of anatomy for which volume changes are poorly constrained, because of anatomical structure, in one or more axes. Preliminary results on an imaging study of brain changes in recovering alcoholics show both improved significance of model fits and the ability to reveal hidden directional characteristics in the volume changes over time. It is also not clear for which clinical applications this methodology will be most useful: it will certainly depend on the disease being studied and how it influences brain tissue. Our aim in this paper is simply to present the methodology. Further work is underway to examine how these directional patterns relate to the shape of regional brain anatomy and to any underlying tissue properties.
Fig. 3. (a) A comparison of the effect maps for the difference between groups (abstainers vs relapsers) for the scalar volume change maps (bottom) and the directional models (top). Directional effects are shown by three effect vectors whose length indicates the relative size of the effect and the colour indicates direction of effect. Enlargements of an area are shown for one slice in (b) and (c).
Acknowledgements. This methods development work was primarily funded by grant NIH R01 NS 055064. The work would not have been possible without imaging acquired by Dr Dieter Meyerhoff (NIH R01 AA 10788) and help from the faculty and staff of the Center for Imaging of Neurodegenerative Disease at the VASFC.
References
1. Hajnal, J., Saeed, N., Oatridge, A., Williams, E., Young, I., Bydder, G.: Detection of subtle brain changes using subvoxel registration and subtraction of serial MR images. Journal of Computer Assisted Tomography 19(5), 677–691 (1995)
2. Kikinis, R., Guttmann, C., Metcalf, D., Wells, W., Ettinger, G.J., Weiner, H., Jolesz, F.A.: Quantitative follow-up of patients with multiple sclerosis using MRI: Technical aspects. Journal of Magnetic Resonance Imaging 9, 519–530 (1999)
3. Bosc, M., Heitz, F., Armspach, J., Namer, I., Gounot, D., Rumbach, L.: Automatic change detection in multimodal serial MRI: application to multiple sclerosis lesion evolution. NeuroImage 20(2), 643–656 (2003)
4. Gerig, G., Welti, D., Guttmann, C., Colchester, A., Szekely, G.: Exploring the discrimination power of the time domain for segmentation and characterization of active lesions in serial MR data. Medical Image Analysis 4(1), 31–42 (2000)
5. Meier, D., Guttmann, C.: Time-series analysis of MRI intensity patterns in multiple sclerosis. NeuroImage 20(2), 1193–1209 (2003)
6. Smith, S., Zhang, Y., Jenkinson, M., Chen, J., Matthews, P., Federico, A., Stefano, N.D.: Accurate, robust and automated longitudinal and cross-sectional brain change analysis. NeuroImage 17, 479–489 (2002)
7. Freeborough, P., Fox, N.: The boundary shift integral: An accurate and robust measure of cerebral volume changes from registered repeat MRI. IEEE Transactions on Medical Imaging 16(3) (1997)
8. Fox, N.C., Freeborough, P.A.: Brain atrophy progression measured from registered serial MRI: Validation and application to Alzheimer's disease. J. Magn. Reson. Imaging 7, 1069–1075 (1997)
9. Freeborough, P., Fox, N.: Modeling brain deformations in Alzheimer's disease by fluid registration of serial 3D MR images. Journal of Computer Assisted Tomography 22(5), 838–843 (1998)
10. Friston, K., Holmes, A., Worsley, K., Poline, J., Frith, C., Frackowiak, R.S.J.: Statistical parametric maps in functional imaging: A general linear approach. Human Brain Mapping 2, 189–210 (1995)
11. Davatzikos, C., Vaillant, M., Resnick, S., Prince, J., Letovsky, S., Bryan, R.: A computerised approach for morphological analysis of the corpus callosum. Journal of Computer Assisted Tomography 20(1), 88–97 (1996)
12. Chung, M.K., Worsley, K.J., Paus, T., Cherif, C., Collins, D.L., Giedd, J.N., Rapoport, J.L., Evans, A.C.: A unified statistical approach to deformation-based morphometry. NeuroImage 14, 596–606 (2001)
13. Gaser, C., Volz, H., Kiebel, S., Riehemann, S., Sauer, H.: Detecting structural changes in whole brain based on nonlinear deformations - application to schizophrenia research. NeuroImage 10, 107–113 (1999)
14. Alexander, D., Pierpaoli, C., Basser, P., Gee, J.: Spatial transformations of diffusion tensor magnetic resonance images. IEEE Transactions on Medical Imaging 20(11), 1131–1139 (2001)
15. Malvern, L.: Introduction to the Mechanics of a Continuous Medium. Prentice Hall, Englewood Cliffs, NJ (1969)
16. Rao, C.: Linear Statistical Inference and its Applications. John Wiley and Sons, Inc., New York (1973)
17. Studholme, C., Drapaca, C., Iordanova, B., Cardenas, V.: Deformation based mapping of volume change from serial brain MRI in the presence of local tissue contrast change. IEEE Transactions on Medical Imaging 25(5), 626–639 (2006)
18. Studholme, C., Cardenas, V., Song, E., Ezekiel, F., Maudsley, A., Weiner, M.: Accurate template based estimation of brain MRI bias fields with application to dementia and aging. IEEE Transactions on Medical Imaging 23(1), 626–639 (2004)
19. Rao, A., Chandrashekara, R., Sanchez-Ortiz, G.I., et al.: Spatial transformation of motion and deformation fields using nonrigid registration. IEEE Transactions on Medical Imaging 23(9), 1065–1076 (2004)
Non-parametric Diffeomorphic Image Registration with the Demons Algorithm

Tom Vercauteren¹,², Xavier Pennec¹, Aymeric Perchant², and Nicholas Ayache¹

¹ Asclepios Research Group, INRIA Sophia-Antipolis, France
² Mauna Kea Technologies, 9 rue d'Enghien, Paris, France
Abstract. We propose a non-parametric diffeomorphic image registration algorithm based on Thirion’s demons algorithm. The demons algorithm can be seen as an optimization procedure on the entire space of displacement fields. The main idea of our algorithm is to adapt this procedure to a space of diffeomorphic transformations. In contrast to many diffeomorphic registration algorithms, our solution is computationally efficient since in practice it only replaces an addition of free form deformations by a few compositions. Our experiments show that in addition to being diffeomorphic, our algorithm provides results that are similar to the ones from the demons algorithm but with transformations that are much smoother and closer to the true ones in terms of Jacobians.
1 Introduction
With the development of computational anatomy and in the absence of a justified physical model of inter-subject variability, statistics on diffeomorphisms becomes an important topic [1]. Diffeomorphic registration algorithms are at the core of this research field since they often provide the input data. They usually rely on the computationally heavy solution of a partial differential equation [2,3,4] or use very small optimization steps [5]. In [6], the authors proposed a parametric approach by composing a set of constrained B-spline transformations. Since the composition of B-spline transformations cannot be expressed on a B-spline basis, the advantage of using a parametric approach is not clear in this case. In this work, we propose a non-parametric diffeomorphic image registration algorithm based on the demons algorithm. It has been shown in [7,8] that the original demons algorithm could be seen as an optimization procedure on the entire space of displacement fields. We build on this point of view in Section 2. The main idea of our algorithm is to adapt this optimization procedure to a space of diffeomorphic transformations. In Section 3, we show that a Lie group structure on diffeomorphic transformations that has recently been proposed in [1] can be used in combination with some optimization tools on Lie groups to derive our diffeomorphic image registration algorithm. Our approach is evaluated in Section 4 in both a simulated and a realistic registration setup. We show that in addition to being diffeomorphic, our algorithm provides results that are similar to the ones from the demons but with transformations that are much smoother and closer to the true ones in terms of Jacobians.
2 Non-parametric Image Registration

2.1 Image Registration Framework
Given a fixed image F(·) and a moving image M(·), non-parametric image registration is treated as an optimization problem that aims at finding the displacement of each pixel in order to get a reasonable alignment of the images. The transformation s(·), p ↦ s(p), models the spatial mapping of points from the fixed image space to the moving image space. The similarity criterion Sim(·,·) measures the resemblance of two images. In this paper we will only consider the mean squared error, which forms the basis of intensity-based registration:

$$\mathrm{Sim}(F, M\circ s) = \frac{1}{2}\|F - M\circ s\|^2 = \frac{1}{2|\Omega_P|}\sum_{p\in\Omega_P} |F(p) - M(s(p))|^2, \qquad (1)$$

where $\Omega_P$ is the region of overlap between F and M∘s. A simple optimization of (1) over a space of dense deformation fields leads to an ill-posed problem with unstable and non-smooth solutions. In order to avoid this, and possibly add some a priori knowledge, a regularization term Reg(s) is introduced to get the global energy $E(s) = \sigma_i^{-2}\,\mathrm{Sim}(F, M\circ s) + \sigma_T^{-2}\,\mathrm{Reg}(s)$, where σ_i accounts for the noise on the image intensity and σ_T controls the amount of regularization we need. This energy indeed provides a well-posed framework, but the mixing of the similarity and regularization terms leads in general to computationally intensive optimization steps. On the other hand, an efficient algorithm was proposed in [9] but has often been considered somewhat ad hoc. The algorithm is inspired by the optical flow equations; the method alternates between computation of the forces and their regularization by a simple Gaussian smoothing. In order to cast the demons algorithm as the minimization of a well-posed criterion, it was proposed in [7] to introduce a hidden variable in the registration process: correspondences. The idea is to consider the regularization criterion as a prior on the smoothness of the transformation s. Instead of requiring that point correspondences between image pixels (a vector field c) be exact realizations of the transformation, one allows some error at each image point. Considering a Gaussian noise on displacements, we end up with the global energy:

$$E(c, s) = \frac{1}{\sigma_i^2}\,\mathrm{Sim}(F, M\circ c) + \frac{1}{\sigma_x^2}\,\mathrm{dist}(s, c)^2 + \frac{1}{\sigma_T^2}\,\mathrm{Reg}(s) \qquad (2)$$

where σ_x accounts for a spatial uncertainty on the correspondences. We classically have $\mathrm{dist}(s, c) = \|c - s\|$ and $\mathrm{Reg}(s) = \|\nabla s\|^2$, but the regularization can also be modified to handle fluid-like constraints [7].
2.2 Demons Algorithm as an Alternate Optimization
In order to register the fixed and moving images, we need to optimize (2) over a given space of spatial transformations. With the original demons algorithm, the optimization is performed over the entire space of displacement fields. These spatial transformations form a vector space, and transformations can thus simply be added. This implies that we can use classical descent methods based on additive iterations of the form s ← s + u. The interest of the auxiliary variable c is that an alternate optimization over c and s decouples the complex minimization into simple and very efficient steps:

Algorithm 1 (Demons Algorithm)
– Choose a starting spatial transformation (a vector field) s
– Iterate until convergence:
  • Given s, compute a correspondence update field u by minimizing $E_s^{corr}(u) = \|F - M\circ(s+u)\|^2 + \frac{\sigma_i^2}{\sigma_x^2}\|u\|^2$ with respect to u
  • If a fluid-like regularization is used, let u ← K_fluid ⋆ u. The convolution kernel will typically be Gaussian
  • Let c ← s + u
  • If a diffusion-like regularization is used, let s ← K_diff ⋆ c (else let s ← c). The convolution kernel will also typically be Gaussian

In this work, we focus on the first step of this alternate minimization and refer the reader to [7] for a detailed coverage of the regularization questions. By using classical Taylor expansions, we see that we only need to solve, at each pixel p, the following normal equations:

$$\left(J^{p\,T} J^p + \sigma_i^2(p)\,\sigma_x^{-2}\, I\right) u(p) = -\left(F(p) - M\circ s(p)\right) J^{p\,T},$$

where $J^p = -\nabla_p^T(M\circ s)$ with a standard Taylor expansion, or $J^p = -\nabla_p^T F$ with Thirion's rule. From the Sherman-Morrison formula we get:

$$u(p) = -\frac{F(p) - M\circ s(p)}{\|J^p\|^2 + \frac{\sigma_i^2(p)}{\sigma_x^2}}\; J^{p\,T} \qquad (3)$$

We see that if we use the local estimation $\sigma_i(p) = |F(p) - M\circ c(p)|$ of the image noise, we end up with the expression of the demons algorithm. Note that σ_x then controls the maximum step length: $\|u(p)\| \le \sigma_x/2$.
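As a rough illustration, the update field of Eq. (3) with Thirion's rule could be computed voxel-wise as in the following Python/NumPy sketch; the array layout and the small epsilon guard against division by zero are assumptions of this sketch.

```python
import numpy as np

def demons_update(F, M_warped, sigma_x=2.0, eps=1e-9):
    """Demons correspondence update of Eq. (3) with Thirion's rule.

    F, M_warped : fixed image and M composed with the current transform s,
                  both N-D arrays of the same shape.
    Returns u of shape F.shape + (ndim,), the update displacement field.
    """
    diff = F - M_warped                      # F(p) - M(s(p))
    J = -np.stack(np.gradient(F), axis=-1)   # Thirion's rule: J_p = -grad F
    J_sq = np.sum(J**2, axis=-1)
    sigma_i_sq = diff**2                     # local noise estimate, squared
    denom = J_sq + sigma_i_sq / sigma_x**2
    scale = -diff / (denom + eps)            # eps guards division by zero
    return scale[..., None] * J
```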
3 Diffeomorphic Image Registration
In this section, we show that the alternate optimization scheme we presented can be used in combination with a Lie group structure on diffeomorphic transformations to adapt the demons algorithm.

3.1 Newton Methods for Lie Groups
Like most spatial transformation spaces used in medical imaging, diffeomorphisms do not form a vector space but only a Lie group. The most straightforward way to adapt the demons algorithm to make it diffeomorphic is to optimize (2) over a space of diffeomorphisms. We thus perform an optimization procedure on a Lie group, such as in [10,11]. Optimization on Lie groups can often be related to constrained optimization by using an embedding. In this work we use an alternative strategy known as geometric optimization, which uses the local canonical coordinates [11]. This strategy intrinsically takes care of the geometric structure of the group and allows us to use unconstrained optimization routines. Let G be a Lie group for the composition ∘. To any Lie group can be associated a Lie algebra g. G and g are related through the group exponential, which is a diffeomorphism from a neighborhood of 0 in g to a neighborhood of Id in G. The exponential map can be used to get the Taylor expansion of a smooth function φ on G:

$$\varphi(s\circ\exp(u)) = \varphi(s) + J_s^\varphi.u + O(\|u\|^2), \quad \text{where } [J_s^\varphi]_i = \left.\frac{\partial}{\partial u_i}\,\varphi(s\circ\exp(u))\right|_{u=0}.$$

This approximation is used in [11] to adapt the classical Newton-Raphson method by using an intrinsic update step:

$$s \leftarrow s\circ\exp(u). \qquad (4)$$
One of the main advantages of this geometric optimization is that it has the same guaranteed convergence as the classical Newton methods on vector spaces.

3.2 A Lie Group Structure on Diffeomorphisms
The Newton methods for Lie groups are in theory very well suited for diffeomorphic image registration. In practice, however, they can only be used if a fast and tractable numerical scheme for the computation of the exponential is available, since we would have to use it at each iteration. Such an efficient scheme clearly relies on a good parameterization of the Lie group and the Lie algebra. In the context of image registration, it has been proposed in [4] to parameterize diffeomorphic transformations using time-varying speed vector fields. This has the advantage of fully using the group structure. However, the computation of a deformation field requires the numerical integration of a time-varying ODE. In [1] the authors proposed a practical approximation of such a Lie group structure on diffeomorphisms by using stationary speed vector fields only. This has the significant advantage of yielding very fast computations of exponentials: it becomes possible to use the scaling and squaring method and compute the exponential with just a few compositions. On a strictly theoretical level, many technicalities arise when dealing with infinite dimensional Lie groups, and further work is necessary to evaluate the well-posedness of this algorithm. By generalizing to vector fields the equivalence that exists in the finite-dimensional case between one-parameter subgroups and the exponential map, the exponential exp(u) of a smooth vector field u is defined in [1] as the flow at time one of the stationary ODE ∂p(t)/∂t = u(p(t)). From the properties of one-parameter subgroups (t ↦ exp(tu)), we see that for any integer K we have exp(u) = exp(K⁻¹u)^K. This yields the following efficient algorithm for the computation of vector field exponentials:

Algorithm 2 (Fast Computation of Vector Field Exponentials)
– Choose N such that 2⁻ᴺu is close enough to 0, e.g. $\max_p \|2^{-N}u(p)\| \le 0.5$
– Perform an explicit first order integration: v(p) ← 2⁻ᴺu(p) for all pixels
– Do N (not 2ᴺ!) recursive squarings of v: v ← v ∘ v
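In code, Algorithm 2 might look like the following sketch; the trilinear interpolation used for the composition v ∘ v is an implementation choice, not something the paper prescribes.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def compose(v, w):
    """Displacement field of the composed transform T_v o T_w, i.e.
    p -> p + w(p) + v(p + w(p)), using trilinear interpolation."""
    grid = np.indices(v.shape[1:]).astype(float)
    coords = grid + w  # sample v at the points p + w(p)
    v_at = np.stack([map_coordinates(c, coords, order=1, mode='nearest')
                     for c in v])
    return w + v_at

def exp_field(u, max_step=0.5):
    """Scaling and squaring: exponential of a stationary speed field u of
    shape (3, X, Y, Z). N is chosen so that max ||2^-N u|| <= max_step."""
    norm = np.sqrt((u**2).sum(axis=0)).max()
    N = max(0, int(np.ceil(np.log2(norm / max_step)))) if norm > 0 else 0
    v = u / 2.0**N          # explicit first-order integration step
    for _ in range(N):      # N recursive squarings: v <- v o v
        v = compose(v, v)
    return v
```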
Fig. 1. Original fibered confocal microscopy (FCM) image of a normal human colonic mucosa (image courtesy of PD Dr. A. Meining, Klinikum rechts der Isar, Munich) and one example random warp used in our controlled experimental setup
3.3 Efficient Diffeomorphic Demons
We now have all the tools to derive our non-parametric diffeomorphic image registration algorithm. For the registration problem (2), the tools presented in Section 3.1 can be used to get the following approximation:

$$F(p) - M\circ s\circ\exp(u)(p) \approx F(p) - M\circ s(p) + J^p.u(p)$$

where $J^p = -\nabla_p^T(M\circ s)$, or $J^p = -\nabla_p^T F$ with Thirion's rule. Due to space constraints, we omit the technical details necessary to derive this approximation and refer the reader to [10] for a derivation on projective transformations. As in Section 2.2, we focus on the first step of the minimization rather than on the regularization step. In order to get a computationally tractable expression of the correspondence energy, we chose the following distance between two diffeomorphisms: $\mathrm{dist}(s, c) = \|\mathrm{Id} - s^{-1}\circ c\|$. We then get $\mathrm{dist}(s, s\circ\exp(u)) = \|\mathrm{Id} - \exp(u)\| \approx \|u\|$. These approximations can be used to rewrite the correspondence energy used in the alternate optimization framework:

$$E_s^{corr}(u) \approx \frac{1}{2|\Omega_P|}\sum_{p\in\Omega_P} \left\| \begin{bmatrix} F(p) - M\circ s(p) \\ 0 \end{bmatrix} + \begin{bmatrix} J^p \\ \frac{\sigma_i(p)}{\sigma_x}\, I \end{bmatrix} u(p) \right\|^2. \qquad (5)$$

We see from (5) that we get the same expression as with the classical demons, but that we consider u as a speed vector field instead of a deformation field. We thus obtain our non-parametric diffeomorphic image registration algorithm:

Algorithm 3 (Diffeomorphic Demons Iteration)
– Compute the correspondence update field u using (3)
– If a fluid-like regularization is used, let u ← K_fluid ⋆ u
– Let c ← s ∘ exp(u), where exp(u) is computed using Algorithm 2
– If a diffusion-like regularization is used, let s ← K_diff ⋆ c (else let s ← c)
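Putting the pieces together, one possible loop for Algorithm 3 is sketched below, reusing demons_update (Eq. 3), exp_field and compose (Algorithm 2) from the earlier sketches; using Gaussian smoothing for K_fluid and K_diff follows the text, while the warping helper and the iteration count are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def warp(img, s_disp):
    """Resample img through the transform p -> p + s_disp(p) (trilinear)."""
    coords = np.indices(img.shape).astype(float) + s_disp
    return map_coordinates(img, coords, order=1, mode='nearest')

def diffeomorphic_demons(F, M, n_iter=25, sigma_fluid=1.0, sigma_diff=1.0):
    """Sketch of the diffeomorphic demons loop (Algorithm 3)."""
    s = np.zeros((F.ndim,) + F.shape)   # identity transform (zero field)
    for _ in range(n_iter):
        u = np.moveaxis(demons_update(F, warp(M, s)), -1, 0)        # Eq. (3)
        u = np.stack([gaussian_filter(c, sigma_fluid) for c in u])  # K_fluid
        s = compose(s, exp_field(u))    # c = s o exp(u), via Algorithm 2
        s = np.stack([gaussian_filter(c, sigma_diff) for c in s])   # K_diff
    return s
```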
[Fig. 2 comprises five plots against iteration number, comparing Thirion's demons and the diffeomorphic demons over 100 random trials: mean MSE, mean harmonic energy, mean distance to the true field, mean distance to Jac(true field), and mean min|Jac| and max|Jac|.]

Fig. 2. Registration on 100 random experiments such as the one presented in Fig. 1. Note that for similar performance in terms of MSE and distance to the true field, the diffeomorphic demons algorithm provides much smoother results and a smaller distance to the true Jacobian of the transformation than the original demons algorithm. Most importantly, we see that we provide diffeomorphic transformations whereas min(|Jac(s)|) goes way below zero with the original demons algorithm.
4 Registration Results
To evaluate the performance of the diffeomorphic demons algorithm with respect to the original demons algorithm, two sets of results are presented. We used the same set of parameters for all the experiments: Thirion's rule with a maximum step length of 2 pixels in the demons force (3), a Gaussian fluid-like regularization with σ_fluid = 1, and a Gaussian diffusion-like regularization with σ_diff = 1. Since the emphasis is on the comparison of the various schemes and not on the final performance, no multi-resolution scheme was used. The first experiments provide a completely controlled setup. We use a fibered confocal microscopy image. For each experiment, we generate a random diffeomorphic deformation field (by passing a Markov random field through the exponential) and warp the original image. We add some noise to both the original and the warped image. We then run the registration algorithms starting from an identity spatial transformation. We can see in Fig. 2 that in terms of MSE and distance to the true field, the performance of Thirion's demons algorithm and of the diffeomorphic demons algorithm is similar. However, the harmonic energy and the minimum and maximum values of the determinant of the Jacobian of the transformation show that our algorithm provides much smoother spatial transformations. We also see that our algorithm provides better results in terms of distance to the true Jacobian of the transformation. Moreover, this is accomplished with a reasonable 50% increase of computation time per iteration with respect to the computationally efficient demons algorithm.
Fig. 3. Registration of two synthetic T1 MR images of distinct anatomies. For visually similar results, our algorithm provides smoother diffeomorphic transformations.
Our second setup is a more realistic case study where a gold standard is still available. We use synthetic T1 MR images from two different anatomies available from BrainWeb [12]. These datasets are distributed along with a segmentation of eleven different tissue classes. We can see in Fig. 4 and Table 1 that on this dataset also, the demons algorithm and our algorithm provide very similar results in terms of visual appearance, MSE and segmentation accuracy. However, we see that our algorithm does it with much better spatial transformations: we indeed get smoother deformations that are diffeomorphic.

Table 1. Comparison (Dice similarity coefficient × 100) of the discrete segmentations obtained from the registration of the synthetic T1-weighted MR images shown in Fig. 3

         CSF    GM     WM     Fat    Muscle  Skin   Skull  Vessels  Fat2   Dura   Marrow
Initial  41.73  63.06  61.51  19.30  20.14   66.65  42.75  14.26    6.06   14.74  28.19
Thirion  63.41  78.99  79.23  47.74  36.40   78.57  64.91  27.21    14.75  23.13  45.05
Diffeo   64.37  78.94  78.43  47.22  36.11   79.39  65.02  27.25    14.70  24.56  43.92

[Fig. 4 comprises three plots against iteration number: MSE, harmonic energy, and min|Jac| and max|Jac|, for Thirion's demons and the diffeomorphic demons.]

Fig. 4. Comparison of Thirion's demons algorithm with the diffeomorphic demons algorithm on the BrainWeb images shown in Fig. 3. For similar performance in terms of MSE, our algorithm provides much smoother transformations than the original demons algorithm. Most importantly, we see that we provide diffeomorphic transformations whereas min(|Jac(s)|) goes way below zero with the original demons.
5 Conclusion
We have proposed an efficient non-parametric diffeomorphic registration algorithm. We first showed that the demons algorithm could be seen as an optimization procedure on the entire space of displacement fields. By combining a recently developed Lie group framework on diffeomorphisms and an optimization procedure for Lie groups, we showed that the framework in which we cast the demons algorithm could be adapted to provide non-parametric free-form diffeomorphic transformations. Our experiments have shown that our algorithm provides, with respect to the demons algorithm, very similar results in terms of MSE. This is however achieved with diffeomorphic transformations that are smoother and closer to the true transformations in terms of Jacobians.
References
1. Arsigny, V., Commowick, O., Pennec, X., Ayache, N.: A Log-Euclidean framework for statistics on diffeomorphisms. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 924–931. Springer, Heidelberg (2006)
2. Miller, M.I., Joshi, S.C., Christensen, G.E.: Large deformation fluid diffeomorphisms for landmark and image matching. In: Toga, A. (ed.) Brain Warping (1998)
3. Marsland, S., Twining, C.: Constructing diffeomorphic representations for the groupwise analysis of non-rigid registrations of medical images. IEEE Trans. Med. Imag. 23(8), 1006–1020 (2004)
4. Beg, M.F., Miller, M.I., Trouvé, A., Younes, L.: Computing large deformation metric mappings via geodesic flows of diffeomorphisms. Int'l. J. Comp. Vision 61(2) (February 2005)
5. Chefd'hotel, C., Hermosillo, G., Faugeras, O.: Flows of diffeomorphisms for multimodal image registration. In: Proc. ISBI 2002, pp. 753–756 (2002)
6. Rueckert, D., Aljabar, P., Heckemann, R.A., Hajnal, J.V., Hammers, A.: Diffeomorphic registration using B-splines. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4191, pp. 702–709. Springer, Heidelberg (2006)
7. Cachier, P., Bardinet, E., Dormont, D., Pennec, X., Ayache, N.: Iconic feature based nonrigid registration: The PASHA algorithm. CVIU - Special Issue on Nonrigid Registration 89(2-3), 272–298 (2003)
8. Modersitzki, J.: Numerical Methods for Image Registration. Oxford University Press, Oxford (2004)
9. Thirion, J.P.: Image matching as a diffusion process: An analogy with Maxwell's demons. Medical Image Analysis 2(3), 243–260 (1998)
10. Malis, E.: Improving vision-based control using efficient second-order minimization techniques. In: Proc. ICRA 2004 (April 2004)
11. Mahony, R., Manton, J.H.: The geometry of the Newton method on non-compact Lie-groups. J. Global Optim. 23(3), 309–327 (2002)
12. Aubert-Broche, B., Griffin, M., Pike, G.B., Evans, A.C., Collins, D.L.: Twenty new digital brain phantoms for creation of validation image data bases. IEEE Trans. Med. Imag. 25(11), 1410–1416 (2006)
Three-Dimensional Ultrasound Mosaicing

Christian Wachinger¹,², Wolfgang Wein¹,², and Nassir Navab¹

¹ Computer Aided Medical Procedures (CAMP), TUM, Munich, Germany
{wachinge, wein, navab}@cs.tum.edu
² Siemens Corporate Research, Princeton, NJ, USA
Abstract. The creation of 2D ultrasound mosaics is becoming a common clinical practice with a high clinical value. The next step, coming along with the increasing availability of 2D array transducers, is the creation of 3D mosaics. In the literature of ultrasound registration, the alignment of multiple images has not yet been addressed. We therefore propose registration strategies that are able to cope with the problems arising in multiple image alignment. Among others, we use simultaneous registration, which calls for multivariate similarity measures. In this paper, we propose alternative multivariate extensions based on a maximum likelihood framework. Experimental results show the good performance of the proposed registration strategies and similarity measures.
1 Introduction
At the moment, a paradigm shift is taking place in ultrasound (US) imaging, moving from 2D to 3D image acquisition. The next generation of 2D array US transducers with CMUT¹ technology could accelerate this shift by offering superior and efficient volumetric imaging at a lower cost. From a current perspective, the only drawbacks that remain are the limited field-of-view (FOV) of the acquired images and the reflectance of the beam from structures with high acoustical impedance, causing occlusion. The idea of mosaicing is to address these issues by combining the information of several images taken from different poses. The focus can rest on quality improvement, by imaging the same scene from different directions, or on extension of the FOV, by stitching together consecutively taken images. Either way, the first step is to calculate the correct global alignment, for which we propose solutions in this paper. The rigid intensity-based registration that we use for the alignment is not trivial to compute because of the limited amount of overlap between the images. For mosaicing, the registration scenario changes since the perfect alignment does not correspond to a maximal overlap, as it does in most registration settings, putting a special interest on the overlap invariance of the measures. An additional difficulty lies in the interface-enhancing nature of ultrasound images, which makes acquisitions of the same object from varying viewing angles not necessarily look the same. Feature-based registration methods like [1] were discarded due to the problems of automatic salient feature point identification.

¹ Capacitive Micromachined Ultrasound Transducer.
1.1 Clinical Value of Ultrasound Mosaicing
The usage of ultrasound mosaicing provides sonographers not just with a compounded volume of higher quality; recent studies also report a number of other clinical advantages that come along with the extended FOV. First, the spatial relationship among structures that are too large for a single volume is easier to understand [2]. Second, sonographers have the flexibility to visualize anatomical structures from a variety of different angles [3]. Third, size and distance measurements of large organs are possible [2]. Fourth, individual structures can be identified within a broader context by having an image of the whole examination area [4]. And last, because of the increased features in the compounded view, specialists who are used to modalities other than ultrasound can better understand the spatial relationships of anatomical structures [5], helping to bridge the gap between the modalities and making it easier to convey sonographic findings to other experts. Beyond improving existing workflows, the creation of high quality mosaics may also enable new medical applications for ultrasound that do not yet exist at all or are reserved for other modalities.
Problems Statement
In the literature of ultrasound mosaicing, the global alignment of multiple images is deduced from a sequence of pairwise ones. Gee et al. [6] reduce the 3D-3D registration problem to a 2D-2D one by registering the dividing planes to each other. Poon et al. [7] use a block-based rigid and block-based warping approach for the registration. The disadvantages that come along with the usage of pairwise registrations for ultrasound mosaicing are twofold. First, by stitching together pairwise aligned images, registration errors can be accumulated leading to a non-consistent global alignment, see figure 1. Second, during the pairwise registrations only a fraction of the available information is taken into account making it prone to misregistrations. The registration is further complicated by the viewing angle dependent US images and the high demands on the overlap invariance by mosaicing.
2 Mosaicing Strategies

In this section, we present registration strategies that directly address the problems arising during mosaic creation, as mentioned in section 1.2. We denote the n images by U = {u₁, ..., uₙ} with the global transformations T = {T₁, ..., Tₙ}, and the pairwise transformation T_{i,j} between each overlapping image pair u_i and u_j.
Pairwise Registration with Lie Normalization
The first strategy is based on pairwise registrations and uses a consecutive normalization to reduce the accumulated error. Supposing that we would have all
correct global transformations, we could express the pairwise registration error ε_{i,j} as $\varepsilon_{i,j} = T_i^{-1}\cdot T_j\cdot T_{i,j}$. In practice, the opposite holds: we know the pairwise registrations T_{i,j} and use them to estimate the global transformations T_i. The best estimation of the global alignment is reached when the overall error is minimized. The minimization is not trivial because rigid transformations do not belong to a vector space but rather lie on a non-linear manifold forming a Lie group [8]. We use the Lie group based normalization framework, as proposed in [9] for the alignment of 2D optical images, to align the 3D ultrasound images. An error function μ_ε is introduced to assign each transformation ε_{i,j} a distance value serving as a score for the optimization. Assuming ε_{i,j} is a sample of the random error ε with Fréchet mean equal to the identity and covariance matrix Σ_εε, the Mahalanobis distance that we use as error function is

$$\mu_\varepsilon^2(\varepsilon_{i,j}) = \log_{\mathrm{Id}}(\varepsilon_{i,j})^T \cdot \Sigma_{\varepsilon\varepsilon}^{-1} \cdot \log_{\mathrm{Id}}(\varepsilon_{i,j}).$$

The global pose estimation is expressed by the following least-squares criterion

$$[\hat T_1, \ldots, \hat T_n] = \arg\min_{[T_1,\ldots,T_n]} \frac{1}{2} \sum_{(i,j)} \omega_{i,j}\cdot\mu_\varepsilon^2(\varepsilon_{i,j}),$$

with the quality weights ω_{i,j}. These weights model the quality of each pairwise registration. Since we are interested in an automated registration, we use the amount of overlap as an indicator of the registration quality. The final algorithm using the Lie normalization is stated in the following listing. The registration is accepted if the total error $\varepsilon_t = \sum_{(i,j)} \omega_{i,j}\cdot\mu_\varepsilon^2(\varepsilon_{i,j})$ is below a scenario-dependent threshold δ.
1. Start with initial global transformations T = {T₁, ..., Tₙ}
2. Do
   2.1 Deduce initial pairwise transformations T_{i,j} from T
   2.2 Compute all pairwise registrations T_{i,j}
   2.3 Estimate new T from the calculated T_{i,j} with Lie normalization
3. While (ε_t > δ)
4. Return T
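To make the error terms concrete, the following Python sketch evaluates ε_{i,j} and the total error ε_t for 4×4 homogeneous rigid matrices; using scipy's matrix logarithm for log_Id and an identity covariance Σ_εε are simplifying assumptions, and the actual normalization of step 2.3 (minimizing this score over the T_i) is left out.

```python
import numpy as np
from scipy.linalg import logm

def registration_error(T_i, T_j, T_ij):
    """epsilon_ij = T_i^-1 . T_j . T_ij for 4x4 homogeneous matrices."""
    return np.linalg.inv(T_i) @ T_j @ T_ij

def mu_eps_sq(eps, Sigma_inv=None):
    """Mahalanobis score log_Id(eps)^T Sigma^-1 log_Id(eps); the identity
    covariance used when Sigma_inv is None is a simplifying assumption."""
    v = np.real(logm(eps)).ravel()   # matrix logarithm at the identity
    return float(v @ v) if Sigma_inv is None else float(v @ Sigma_inv @ v)

def total_error(T, pairwise, weights):
    """eps_t = sum over overlapping pairs (i, j) of w_ij * mu_eps^2.
    pairwise maps (i, j) to T_ij; weights maps (i, j) to w_ij."""
    return sum(weights[k] *
               mu_eps_sq(registration_error(T[k[0]], T[k[1]], T_ij))
               for k, T_ij in pairwise.items())
```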
2.2 Simultaneous Registration

The second strategy is based on simultaneous registration, an active field of research that has so far mainly been used for population studies [10] in medical imaging. The principle of simultaneous registration is to consider all available images at the same time during the registration process. The registration framework has to be extended to deal with multivariate similarity measures and the simultaneous optimization of n·6 parameters. Up to now, only a limited number of multivariate extensions of popular measures have been proposed; we discuss these together with our own extensions in section 3.
Table 1. Summary of bi- and multivariate similarity measures in shortened notation

SSD:
- Pairwise: $E[(u - v^\downarrow)^2]$
- Semi-simultaneous: $\sum_{i=2}^{n} \omega_{1,i}\, E[(u_1 - u_i^\downarrow)^2]$
- Full-simultaneous: $\sum_{i<j} \omega_{i,j}\, E[(u_i^\downarrow - u_j^\downarrow)^2]$
- Voxel-wise: $\sum_{x_k\in\Omega} \omega_k\, E_i[(\mu_k - u_i^\downarrow(x_k))^2]$

NCC:
- Pairwise: $E[\tilde u \cdot \tilde v^\downarrow]$
- Semi-simultaneous: $\sum_{i=2}^{n} \omega_{1,i}\, E[\tilde u_1 \cdot \tilde u_i^\downarrow]$
- Full-simultaneous: $\sum_{i<j} \omega_{i,j}\, E[\tilde u_i^\downarrow \cdot \tilde u_j^\downarrow]$
- Voxel-wise: $\sum_{x_k\in\Omega} \omega_k\, E[\tilde u_1^\downarrow \cdot \tilde u_2^\downarrow \cdots \tilde u_n^\downarrow]$

CR:
- Pairwise: $\mathrm{Var}[E(u|v^\downarrow)]/\mathrm{Var}(u)$
- Semi-simultaneous: $\sum_{i=2}^{n} \omega_{1,i}\, \mathrm{Var}[E(u_1|u_i^\downarrow)]/\mathrm{Var}(u_1)$
- Full-simultaneous: $\sum_{i<j} \omega_{i,j}\, \mathrm{Var}[E(u_i^\downarrow|u_j^\downarrow)]/\mathrm{Var}(u_i^\downarrow)$
- Voxel-wise: -

MI:
- Pairwise: $\mathrm{MI}(u, v^\downarrow)$
- Semi-simultaneous: $\sum_{i=2}^{n} \omega_{1,i}\, \mathrm{MI}(u_1, u_i^\downarrow)$
- Full-simultaneous: $\sum_{i<j} \omega_{i,j}\, \mathrm{MI}(u_i^\downarrow, u_j^\downarrow)$
- Voxel-wise: $\sum_{x_k\in\Omega} \omega_k\, H(P^k)$
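As a concrete illustration of the full-simultaneous column of Table 1, the weighted pairwise-sum SSD might be evaluated as in the following Python sketch; the NaN-based footprint masking and the data layout are assumptions of this sketch, not part of the paper.

```python
import numpy as np

def full_simultaneous_ssd(images, overlaps):
    """Full-simultaneous SSD: a weighted sum of pairwise SSD terms over
    all overlapping image pairs.

    images   : list of volumes u_i resampled to a common grid, with NaN
               outside each image's footprint.
    overlaps : dict mapping (i, j) with i < j to the weight w_ij.
    """
    total = 0.0
    for (i, j), w in overlaps.items():
        mask = ~np.isnan(images[i]) & ~np.isnan(images[j])
        if mask.any():
            d = images[i][mask] - images[j][mask]
            total += w * np.mean(d**2)   # E[(u_i - u_j)^2] on the overlap
    return total
```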
The reason for choosing a simultaneous registration approach is twofold, like the problems occurring during registration. First, the accumulated registration error, which was treated in a separate normalization step by the above-mentioned approach, is now handled intrinsically during the registration. Second, the multivariate similarity measures create more robust cost functions for the optimizer to run on, because each image is put into its global context, trying to get the maximum out of the depicted structures. For our mosaicing framework we use two variants of the simultaneous approach, which we refer to as full-simultaneous and semi-simultaneous registration; both use multivariate similarity measures but differ in their optimization strategy. While for the full-simultaneous registration the optimization is performed in the n·6 dimensional parameter space, the semi-simultaneous registration focuses on the optimization of the 6 pose parameters of one image at a time. During one cycle each image is registered for a limited number of registration steps. Several of these cycles yield a stepwise simultaneous convergence to the best global alignment. The reason for working with two versions lies in the increased computational complexity of simultaneous methods, a logical consequence of the higher dimensional parameter space and the multivariate similarity metrics. The semi-simultaneous approach has lower complexity because of the reduced parameter space and because the measures need only be evaluated within the grid of the currently optimized image, in contrast to the whole compounding volume for the full-simultaneous one. A complete drift of the scene is avoided by normalizing the transformations so that one of them is the identity.
3 Multivariate Similarity Measures

Multivariate similarity measures have not yet been used for the registration of multiple ultrasound images, in spite of their already mentioned advantages. We
focus our analysis on four popular measures whose applications are not limited to ultrasound registration: sum of squared differences (SSD), normalized cross-correlation (NCC), mutual information (MI), and correlation ratio (CR). A maximum-likelihood estimation (MLE) framework is commonly used to mathematically model the registration process. For the bivariate case the imaging process is described by $u(x) = f(v(T(x))) + \mu$, with the images u and v, the transformation T, the stationary white Gaussian noise μ, and the intensity mapping f. The negative log-likelihood function is

$$-\log L(T, \mu, f) = -\log P(u|v, T, \mu, f) = -\log P\big(\mu = u(x) - f(v(T(x)))\big) \qquad (1)$$

with P the probability density function (PDF). In the work of Viola [11] and Roche et al. [12], the deduction of the four measures based on this equation is shown by varying the assumptions for the intensity mapping. We extend this approach to multiple images under the assumption of conditionally independent images. The extended MLE, denoting the transformed images $u_i^\downarrow = u_i(T_i(.))$, is

$$-\log L(\mathcal{T}, \boldsymbol{\mu}, \boldsymbol{f}) = -\log P(u_1^\downarrow | u_2^\downarrow, \ldots, u_n^\downarrow, \boldsymbol{\mu}, \boldsymbol{f}) = -\log P\big(\mu_2 = u_1^\downarrow - f_2(u_2^\downarrow), \ldots, \mu_n = u_1^\downarrow - f_n(u_n^\downarrow)\big) = -\sum_{i=2}^{n} \log P\big(\mu_i = u_1^\downarrow - f_i(u_i^\downarrow)\big)$$

with intensity mappings f = (f₂, ..., fₙ) and Gaussian noises μ = (μ₂, ..., μₙ). Each summand corresponds to the bivariate formula in equation 1, and the deduction of the four similarity measures can therefore be done analogously, as in [11,12]. This shows that we directly obtain multivariate extensions of that form by summing up the bivariate measures. In this type of extension we pick one reference image, in the formulae above u₁, which suits the semi-simultaneous registration approach very well. Setting up a similarity matrix M with the entries $M_{i,j} = SM(u_i, u_j)$, this corresponds to summing up its first row. An adaptation of this approach to the full-simultaneous registration is obtained by summing up the whole similarity matrix, which can often be limited to the upper triangular part because of the symmetry of the measures. Additionally, the pairs are weighted by an overlap-dependent factor ω_{i,j}, emphasizing pairs with high overlap. The final criteria are shown in Table 1. A second type of extension that we use, the voxel-wise one, is based on the idea of congealing [10] and puts the focus on one voxel location at a time. In the MLE framework, it is integrated by estimating PDFs for each voxel under the assumption of independent but not identically distributed coordinate samples
$$-\log L(\mathcal{T}) = -\log P(u_1^\downarrow, u_2^\downarrow, \ldots, u_n^\downarrow) = -\frac{1}{|\Omega|}\sum_{x_k\in\Omega} \log P^k\big(u_1^\downarrow(x_k), \ldots, u_n^\downarrow(x_k)\big) \approx -\frac{1}{|\Omega|}\sum_{x_k\in\Omega} \log \prod_{i=1}^{n} P^k\big(u_i^\downarrow(x_k)\big) \qquad (2)$$
[Fig. 1 panels: (a) Pairwise, (b) Lie Norm, (c) Semi-Sim, (d) Full-Sim, (e) Setup]

Fig. 1. Errors sum up under pairwise registration; simultaneous registration intrinsically deals with this.
with the grid Ω. By further assuming a Gaussian distribution of values at each location, with mean μ_k and variance σ_k², the negative log-likelihood function is

$$-\log L(\mathcal{T}) = -\frac{1}{|\Omega|}\sum_{x_k\in\Omega} \log \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_k}\, e^{-\frac{(u_i^\downarrow(x_k)-\mu_k)^2}{2\sigma_k^2}} \approx \frac{1}{|\Omega|}\sum_{x_k\in\Omega} \frac{1}{\sigma_k^2}\sum_{i=1}^{n} \big(u_i^\downarrow(x_k) - \mu_k\big)^2. \qquad (3)$$
We consider this criterion as a voxel-wise extension of SSD, because assumptions similar to those of its pairwise deduction in [11] were used. We also use a voxel-wise criterion for NCC that, in our opinion, captures its basic idea by multiplying the values of the normalized images ũ_i at each voxel location. We added the congealing criterion [10] to Table 1 as an extension of MI, because both are based on the estimation of the entropy H, although they have different properties. For all voxel-wise criteria, we added the weighting factor ω_k, emphasizing locations with a higher number of overlapping images. The usual extensions based on higher-dimensional PDFs are not applicable to mosaicing because they are not flexible enough to allow for varying numbers of overlapping images.
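For illustration, the voxel-wise SSD of Eq. (3) might be computed as in the following sketch; the NaN footprint convention is an assumption, and the 1/σ_k² factor of Eq. (3) is replaced here by the overlap-count weight ω_k mentioned in the text.

```python
import numpy as np

def voxelwise_ssd(images):
    """Voxel-wise (congealing-style) SSD: at each grid location, sum the
    squared deviations of the overlapping images from their local mean.

    images : array of shape (n, X, Y, Z), each volume resampled to the
             common grid, NaN where an image does not cover a voxel.
    """
    mu = np.nanmean(images, axis=0)               # mu_k per voxel
    dev_sq = np.nansum((images - mu)**2, axis=0)  # sum_i (u_i(x_k)-mu_k)^2
    n_overlap = np.sum(~np.isnan(images), axis=0)
    valid = n_overlap >= 2                        # need at least two images
    w = n_overlap[valid]                          # weight w_k: overlap count
    return float(np.sum(w * dev_sq[valid]) / valid.sum())
```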
4 Results

We tested the mosaicing strategies and multivariate similarity measures on two data sets. First, 3D images of a clay heart model in a water bath were acquired from six different angles. The imaging setup is shown in fig. 1(e). We use a cutting plane through the reconstruction volume to visualize the registration error. When using pairwise registration the accumulated error leads to a large displacement between the first and sixth volume, fig. 1(a). The pairwise registration with a successive Lie normalization corrects this error, but the alignment is not perfect, fig. 1(b). The semi-simultaneous registration provides good results, fig. 1(c), but superior results are obtained with the full-simultaneous registration, fig. 1(d). The second data set consists of four sequentially taken acquisitions of a baby phantom, see fig. 3(d) for the compounded result. For this data set we plot all the proposed similarity measures while moving the second image along the cranio-caudal axis, see fig. 2. One clearly sees the high overlap dependence of the bivariate measures, a source of misregistrations (total overlap at -37mm displacement).
Fig. 2. Similarity plots of the measures in Table 1 on the baby phantom
The multivariate measures provide a smooth cost function with a clear maximum at the correct position 0. We also ran a registration study, with an initial random deviation of at most ±20 mm in translation and ±20 degrees in rotation from the correct pose. The mean and standard deviation of each pose parameter of the three moving images after registration are shown in fig. 3. The pairwise registration leads to a misalignment because of the total overlap of images 2 and 3, indicated in fig. 3(a) by a mean of -34.9 mm for parameter 7. The distribution of the mean values around 0 after the simultaneous registration, together with low variances, indicates good registration results, see fig. 3(b) and 3(c). For a demonstration of the performance of the simultaneous registration on the baby phantom, see the video material.
5 Conclusion

We have described three registration strategies for ultrasound mosaicing, which we put into relationship with the standard pairwise sequential one. Our experiments clearly show that these advanced strategies are necessary to address the problems that can occur during ultrasound mosaicing. The best registration result was obtained with the full-simultaneous approach, but this comes at a high computational cost. Moreover, we set up an MLE framework to deduce extensions of popular similarity measures. This allows us to derive a new class of multivariate measures by summing up the pairwise ones, and also to deduce a voxel-wise extension of SSD.
Fig. 3. Mean and standard deviation of pose parameters after 100 registrations: (a) pairwise registration, (b) full-simultaneous registration, (c) voxel-wise registration, (d) 3D baby phantom.
Our results show the good performance of these measures in contrast to the bivariate ones. Seamless extension to affine and deformable transformation models is possible, especially using the proposed voxel-wise SSD.
References
1. Brown, M., Szeliski, R., Winder, S.: Multi-image matching using multi-scale oriented patches. Computer Vision and Pattern Recognition (CVPR) 1 (2005)
2. Kim, S.H., Choi, B.I., Kim, K.W., Lee, K.H., Han, J.K.: Extended FOV Sonography: Advantages in Abdominal Appl. J. Ultrasound Med. 22(4), 385–394 (2003)
3. Peetrons, P.: Ultrasound of muscles. European Radiology 12(1), 35–43 (2002)
4. Dietrich, C., Ignee, A., Gebel, M., Braden, B., Schuessler, G.: Imaging of the abdomen. Z. Gastroenterol. 40, 965–970 (2002)
5. Henrich, W., Schmider, A., Kjos, S., Tutschek, B., Dudenhausen, J.W.: Advantages of and applications for extended field-of-view ultrasound in obstetrics. Archives of Gynecology and Obstetrics V268, 121–127 (2003)
6. Gee, A.H., Treece, G.M., Prager, R.W., Cash, C.J.C., Berman, L.H.: Rapid registration for wide field-of-view freehand 3D ultrasound. IEEE Trans. Med. Imaging 22(11), 1344–1357 (2003)
7. Poon, T., Rohling, R.: Three-dimensional extended field-of-view ultrasound. Ultrasound in Medicine and Biology 32(3), 357–369 (2005)
8. Pennec, X.: Statistical Computing on Manifolds for Computational Anatomy. Habilitation à diriger des recherches, Université Nice Sophia-Antipolis (December 2006)
9. Vercauteren, T., Perchant, A., Malandain, G., Pennec, X., Ayache, N.: Robust mosaicing with correction of motion distortions and tissue deformation for in vivo fibered microscopy. Medical Image Analysis 10(5), 673–692 (2006)
10. Zoellei, L., Learned-Miller, E., Grimson, E., Wells III, W.M.: Efficient population registration of 3d data. In: ICCV (2005)
11. Viola, P.A.: Alignment by Maximization of Mutual Information. Ph.D. thesis, Massachusetts Institute of Technology (1995)
12. Roche, A., Malandain, G., Ayache, N.: Unifying maximum likelihood approaches in medical image registration. Int. J. of Imaging Syst. and Techn. 11(1), 71–80 (2000)
Automated Extraction of Lymph Nodes from 3-D Abdominal CT Images Using 3-D Minimum Directional Difference Filter

Takayuki Kitasaka¹, Yukihiro Tsujimura¹, Yoshihiko Nakamura¹, Kensaku Mori¹, Yasuhito Suenaga¹, Masaaki Ito², and Shigeru Nawano²

¹ Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi, 464-8603, Japan
{kitasaka, kensaku, suenaga}@is.nagoya-u.ac.jp
² National Cancer Center Hospital East, Kashiwanoha 6-5-1, Kashiwa, Chiba, 277-8577, Japan
Abstract. This paper presents a method for extracting lymph node regions from 3-D abdominal CT images using a 3-D minimum directional difference filter. In surgery for colonic cancer, resection of metastatic lesions is performed together with resection of the primary lesion. Lymph nodes are a main route of metastasis and are quite important for deciding the resection area. Diagnosis of enlarged lymph nodes is thus a quite important process for surgical planning. However, manual detection of enlarged lymph nodes on CT images is a quite burdensome task, so an automated lymph node detection process would be very helpful for such surgical planning. Although there are several reports that present lymph node detection, these methods detect lymph nodes primarily from PET images or by 2-D image processing; there is no method that detects lymph nodes directly from 3-D images. The purpose of this paper is to show an automated method for detecting lymph nodes from 3-D abdominal CT images. This method employs a 3-D minimum directional difference filter that enhances blob structures while suppressing line structures. After that, false positive regions caused by residue and veins are eliminated using several kinds of information, such as size, blood vessels, and air in the colon. We applied the proposed method to five cases of 3-D abdominal CT images. The experimental results showed that the proposed method could detect 57.0% of enlarged lymph nodes with 58 FPs per case.
1 Introduction
Recent progress in medical imaging devices, including multi-detector CT scanners, has enabled us to take very precise volumetric images of a human in a very short time [1,2]. Development of computer-aided diagnosis (CAD) systems or computer-assisted surgery (CAS) systems, which assist the diagnostic process or surgical planning using medical images, is strongly desired. For example, in surgery for colonic cancer, resection of metastatic lesions is
performed together with resection of the primary lesion. Lymph nodes are a main route of metastasis and are quite important for deciding the resection area. Therefore, diagnosis of enlarged lymph nodes is a quite important process for surgical planning. However, manual detection of enlarged lymph nodes on CT images is a significantly burdensome task. Thus, development of a lymph node detection process is very helpful for assisting such surgical planning. Although there are several reports that present lymph node detection [3,4], these methods detect lymph nodes based on SUV [7] primarily from PET images or detect them by 2-D image processing. There is no method that detects lymph nodes directly from 3-D images. Thus, the purpose of this paper is to develop an automated method for detecting lymph nodes from 3-D abdominal CT images alone. We employ a 3-D directional difference filter [5] that enhances blob structures and suppresses line structures. The proposed method can detect enlarged lymph nodes from CT images only; this is a big advantage of the proposed method.
2 Method

2.1 Overview

The inputs of the proposed method are contrast-enhanced 3-D abdominal CT images. Normal lymph nodes are quite small, and it is quite difficult to find them on 3-D CT images. On the other hand, metastatic lymph nodes are enlarged and show a spherical shape on 3-D CT images. In the clinical field, it is important to detect lymph nodes whose diameters are 5mm or larger, so we set the size of a target lymph node as 5mm or larger in diameter. To enhance blob structure regions, we utilize a 3-D minimum directional difference filter, called the 3-D Min-DD filter, and an extended 3-D Min-DD filter. Both are kinds of second-order difference filters: they output the minimum value of second-order differences while rotating their difference directions. The 3-D Min-DD filter enhances blob structure regions while suppressing line structure regions; the extended 3-D Min-DD filter enhances blob structure regions while suppressing curve structure regions. The ability of the extended 3-D Min-DD filter to enhance blob structure regions is higher than that of the 3-D Min-DD filter, but its computation time is much longer. We therefore apply the extended 3-D Min-DD filter only to regions enhanced by the 3-D Min-DD filter. A false-positive (FP) reduction process is performed after enhancement by the two filters. The proposed method consists of nine steps: (a) pre-processing, (b) extraction of blob structures, (c) region growing, (d) FP reduction by size, (e) integration of results detected by filters of different sizes, (f) FP reduction using blood vessel information, (g) FP reduction using air region information, (h) removal of small connected components, and (i) FP reduction by degree of sphere.
2.2 Detailed Procedures for Lymph Node Detection
(a) Pre-processing
We apply a median filter to the input 3-D abdominal CT image to reduce speckle noise, then perform smoothing filtering to obtain the smoothed image F. To limit the search area for lymph node detection, we extract air and bone regions by a simple thresholding process; voxels inside these areas are excluded from the target voxels of lymph node extraction.

(b) Extraction of blob structure regions
We apply the 3-D Min-DD filter to all target voxels p ∈ F. The 3-D Min-DD filter calculates the directional difference value at r(θ₁, θ₂) and −r(θ₁, θ₂) apart from p on a sphere of 2r in radius, as follows:

$$g(p) = \min_{\theta_1, \theta_2}\left[\, 2f(p) - \{ f(p + r(\theta_1,\theta_2)) + f(p - r(\theta_1,\theta_2)) \}\,\right], \qquad (1)$$

where 0 ≤ θ₁ ≤ 2π and 0 ≤ θ₂ ≤ 2π. The length of r controls the size of a target region to be detected. For voxels at which the output of the 3-D Min-DD filter is higher than a given threshold value T_a, we apply the extended 3-D Min-DD filter,

$$h(p) = \min_{\theta_1, \theta_2, \phi_1, \phi_2}\left[\, 2f(p) - \{ f(p + r(\theta_1,\theta_2)) + f(p - r(\theta_1 + \phi_1, \theta_2 + \phi_2)) \}\,\right], \qquad (2)$$

where φ₁ and φ₂ are parameters for bending the direction of the difference, with −α₁ ≤ φ₁ ≤ α₁ and −α₂ ≤ φ₂ ≤ α₂, respectively. We then binarize the output images with the threshold value T_a and obtain the binary image B.

(c) Region growing
We perform a region growing process starting from the voxels p extracted in (b) to cover whole lymph nodes, since the regions detected in (b) are just the centers of lymph nodes. The growing condition at a voxel x is f(p) − T_b ≤ f(x) ≤ f(p) + T_b, where f(x) denotes the CT value at voxel x and T_b is a parameter that controls the range of CT values of voxels merged into a region. If this condition is satisfied at voxel x, x is merged into the region. The structural element of this region growing is a point (one voxel).

(d) FP reduction by size
The extended 3-D Min-DD filter does not suppress curve structure regions (i.e. blood vessel regions) whose bending angles are sharper than the maximum bending angle of the extended 3-D Min-DD filter (controlled by the parameters α₁ and α₂). In such regions, the resulting regions of the region growing process become large. We therefore remove any region larger than a cube of edge length 2r × 1.5 [mm], where r is the radius of the 3-D Min-DD or extended 3-D Min-DD filter.
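As a rough illustration of Eq. (1), the following Python sketch evaluates the 3-D Min-DD response by sampling directions on the sphere; the random direction sampling, the integer rounding of offsets, and the wrap-around border handling of np.roll are all simplifications of this sketch, since the paper uses a fixed set of 81 directions.

```python
import numpy as np

def min_dd_filter(f, radius, n_dirs=81, seed=0):
    """Sketch of the 3-D Min-DD filter (Eq. 1): for each voxel, the minimum
    over sampled directions of 2 f(p) - f(p + r) - f(p - r)."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_dirs, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    offsets = np.unique(np.round(dirs * radius).astype(int), axis=0)
    out = np.full(f.shape, np.inf)
    for off in offsets:
        # np.roll wraps at the borders; a real implementation would pad
        f_plus = np.roll(f, shift=tuple(-off), axis=(0, 1, 2))   # f(p + r)
        f_minus = np.roll(f, shift=tuple(off), axis=(0, 1, 2))   # f(p - r)
        out = np.minimum(out, 2.0 * f - f_plus - f_minus)
    return out
```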
Table 1. Acquisition parameters of the CT images used in the experiments

Scanner              GE Discovery LS
Num. of pixels       512 × 512
Num. of slices       401 - 451
Pixel size (mm)      0.586
Thickness (mm)       1.25
Recon. pitch (mm)    1.00
Volt (kV)            140
Tube current (mAs)   330
(e) Integration of results
We perform steps (b) to (d) with filters of different sizes, and then integrate the extraction results. Regions detected by these processes are the lymph node candidates.

(f) FP reduction using blood vessel information
We remove FP regions by using blood vessel information. First, we enhance line-like structures by utilizing the eigenvalues of the Hessian matrix, and extract blood vessel regions by region growing on both the original and enhanced intensities [6]. Then, lymph node candidates located inside these blood vessel regions are removed as FP regions.

(g) FP reduction using air region information
FP regions occur around folds or stools of the colon. We use information about these regions for FP reduction. First, we extract regions whose CT values range from -900 H.U. to -200 H.U. Then, the closing operation of mathematical morphology is applied to the extracted regions; the structural element of the closing operation is 5mm in diameter. Lymph node candidates located inside the extracted regions are removed as FPs.

(h) Removal of small connected components
Small connected components (fewer than T_c voxels) are also removed as FP regions.

(i) FP reduction by degree of sphere
Regions not showing a spherical shape are removed in this step. The degree of sphere (DoS) measure is used to determine the sphericity of each candidate region.
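The paper does not give a formula for DoS, so the following sketch uses a common surface-to-volume sphericity measure purely as an illustrative stand-in; how this quantity relates to the paper's threshold T_d is not specified, and the region is assumed not to touch the volume border.

```python
import numpy as np
from skimage.measure import marching_cubes, mesh_surface_area

def sphericity(region_mask, spacing=(0.586, 0.586, 1.0)):
    """Illustrative sphericity: pi^(1/3) * (6V)^(2/3) / A, which equals 1
    for a perfect sphere and is smaller otherwise. This is only a stand-in
    for the paper's (undefined here) DoS measure."""
    volume = region_mask.sum() * np.prod(spacing)
    verts, faces, _, _ = marching_cubes(region_mask.astype(float), 0.5,
                                        spacing=spacing)
    area = mesh_surface_area(verts, faces)
    return np.pi**(1/3) * (6.0 * volume)**(2/3) / area
```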
3 Experiments and Discussion
We applied the proposed method to five cases of 3-D abdominal CT images. The acquisition parameters of the CT images are: image size 512 × 512 × 401-407, voxel size 0.586 × 0.586 × 1.000 [mm], slice thickness 1.25 [mm], X-ray tube voltage 140 [kV], and tube current 330 [mAs]. We used 3-D Min-DD filters and extended 3-D Min-DD filters of 4.0 [mm], 5.0 [mm], and 7.5 [mm] in radius.
Table 2. TP rate and number of FPs

Case No.  Extracted / specified lymph nodes  TP Rate (%)  Num. of FPs
000       22/30                              73.3         51
001       44/58                              75.9         103
002       15/22                              68.2         56
003       20/50                              40.0         23
004       25/61                              41.0         57
Total     126/221                            57.0         290 (58/case)
The number of directions for the 3-D Min-DD filter and the extended 3-D Min-DD filter was set to 81 and 6561, respectively. The maximum bending angles α₁ and α₂ of the extended 3-D Min-DD filter were set to 90 [degrees]. The threshold value T_a applied to the outputs of the 3-D Min-DD filter and the extended 3-D Min-DD filter is 40 [H.U.]. The parameter T_b used in the region growing process is set to 45 [H.U.]. The parameters T_c and T_d are 50 [voxels] and 8, respectively. Examples of extracted lymph nodes are shown in Fig. 1. As shown in Fig. 1 (a-d), enlarged lymph nodes are extracted successfully. 3-D views of the extracted lymph nodes are shown in Fig. 2. The extraction results were validated by collaborating radiologists, who specified the true lymph nodes on the CT images. The detection performance is shown in Table 2: the average TP rate was 57.0% and the average number of FP regions was 58 per case. We conducted an ROC analysis of the proposed procedures; the results are shown in Fig. 3. After extracting blob structures, there were approx. 17,000 FPs/case at a TP rate of 0.89 (the point indicated by the arrow in Fig. 3(a)). After region growing, the number of FPs/case decreased to approx. 600 at a 0.66 TP rate (Fig. 3(b)). After removal of small connected components, it decreased to approx. 250 at a 0.63 TP rate (Fig. 3(c)). Finally, after reduction by DoS, it decreased to 58 at a 0.57 TP rate (Fig. 3(d)). The FP reduction process worked effectively, using information on size, blood vessels, air, and degree of sphere. The proposed method can detect lymph nodes from CT images alone; this is the main advantage of the proposed method over other methods. Also, by employing a 3-D visualization technique, it is easy to understand the locations of the lymph nodes. This 3-D view is quite useful for surgical planning when determining resection areas. Most FP regions existed on the vein; in particular, they were found at branching points. This is because we extracted only the artery in the blood vessel extraction process, so the FP reduction using blood vessel information did not work properly for such FPs. Moreover, the CT values of vein regions are quite similar to those of lymph nodes. To reduce such FPs, developing an extraction method for the vein is one possible solution; classification based on the AdaBoost technique [8] may be another.
Fig. 1. Results of lymph node detection (arrow: ground truth, red: detected area). Successive slices are shown from left to right for each example. Many enlarged lymph nodes are detected (a–d). However, there are some FNs: the larger node in (a) and the nodes touching other tissues in (e) are missed.
Fig. 2. An example of 3-D display of lymph node detection results
Fig. 3. Results of FROC analysis of each procedure. Horizontal and vertical axes indicate the number of FPs and the TP rate, respectively. FROC curves obtained by changing (a) $T_a$, (b) $T_b$, (c) $T_c$, and (d) $T_d$.
4 Conclusion
This paper presented a method for extracting lymph node regions from 3-D abdominal CT images by using the 3-D minimum directional difference (Min-DD) filter. The actual procedure consisted of lymph node enhancement using the 3-D Min-DD filter and false positive reduction using information on size, blood vessels, air, and the degree of sphere. As a result, the average TP rate was 57.0% and the average number of FP regions was 58 per case. We believe that the proposed method is quite useful not only for the diagnosis of 3-D abdominal CT images but also for surgical planning, e.g., for determining resection areas. Future work includes improving the TP rate while suppressing the increase of FPs by modifying the 3-D Min-DD filter and the FP reduction procedures, and fusing detection results from 3-D CT and PET images.
References
1. Shiraishi, S., Tomiguchi, S., Utsunomiya, D., Kawanaka, K., Awai, K., Morishita, S., Okuda, T., Yokotsuka, K., Yamashita, Y.: Quantitative analysis and effect of attenuation correction on lymph node staging of non-small cell lung cancer on SPECT and CT. American Journal of Roentgenology 186, 1450–1457 (2006)
2. Pijl, M.E.J., Chaoui, A.S., Wahl, R.L., van Oostayen, J.A.: Radiology of colorectal cancer. European Journal of Cancer 38, 887–898 (2002)
3. Yokoi, N., Shimizu, A., Sato, R., Kobatake, H., Oriuchi, N., Endo, K.: Improvement of the computer-aided detection process of abnormal regions using a combination of PET and CT images. In: Proceedings of JAMIT 2006, Op10-2 (2006) (in Japanese)
4. Nitta, S., Honda, S., Kasuya, T., Hontani, H., Fukami, T., Yuasa, T., Akatsuka, T., Wu, J., Takeda, T.: Tumor detection in PET/CT images. IEICE Technical Report, MI2005-66 (2006) (in Japanese)
5. Shimizu, A., Toriwaki, J.: Characteristics of rotatory second order difference filter for computer aided diagnosis of medical images. Systems and Computers in Japan 26(11), 38–51 (1995)
6. Nakamura, Y., Tsujimura, Y., Kitasaka, T., Mori, K., Suenaga, Y., Nawano, S.: A study on blood vessel segmentation and lymph node detection from 3D abdominal X-ray CT images. In: Proceedings of the 20th International Congress and Exhibition, pp. 381–382 (2006)
7. Sugawara, Y., Zasadny, K.R., Neuhoff, A.W., Wahl, R.L.: Reevaluation of the standardized uptake value for FDG: variations with body weight and methods for correction. Radiology 213, 521–525 (1999)
8. Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55 (1997)
Non-Local Means Variants for Denoising of Diffusion-Weighted and Diffusion Tensor MRI

Nicolas Wiest-Daesslé, Sylvain Prima, Pierrick Coupé, Sean Patrick Morrissey, and Christian Barillot

Unit/Project VisAGeS U746, INSERM - INRIA - CNRS - Univ-Rennes 1, IRISA, campus Beaulieu, 35042 Rennes Cedex, France
{nwiestda,sprima,pcoupe,spmorris,cbarillo}@irisa.fr
http://www.irisa.fr/visages
Abstract. Diffusion tensor imaging (DT-MRI) is very sensitive to corrupting noise due to the nonlinear relationship between the diffusion-weighted image intensities (DW-MRI) and the resulting diffusion tensor. Denoising is a crucial step to increase the quality of the estimated tensor field. This enhanced quality allows for a better quantification and a better image interpretation. The methods proposed in this paper are based on the Non-Local (NL) means algorithm. This approach uses the natural redundancy of information in images to remove the noise. We introduce three variations of the NL-means algorithm adapted to DW-MRI and to DT-MRI. Experiments were carried out on a set of 12 diffusion-weighted images (DW-MRI) of the same subject. The results show that the intensity-based NL-means approaches give better results in the context of DT-MRI than other classical denoising methods, such as Gaussian smoothing, anisotropic diffusion and total variation.
1 Introduction
Image processing procedures needed for fully automated and quantitative analysis (registration, segmentation, visualisation) require images with the best signal-to-noise ratio and the least artifacts in order to improve their performances. Most of the time, the hardware introduces artifacts during the acquisition (noise, intensity non-uniformities, geometrical deformations). Therefore, one critical issue is to remove the noise while keeping relevant image information. This is particularly true for diffusion-weighted MRI (DW-MRI), especially when they are acquired with a high diffusion coefficient (b-value). This paper focuses on denoising using variants of the non-local means (NL-means) method modified to deal with DT-MRI (NLMt) and DW-MRI, either gradient-by-gradient (NLM) or as a multi-spectral (NLMv) image. The NL-means variants are compared with the simple Gaussian Filter (GF), Anisotropic Diffusion (AD) [13] and Total Variation (TV) [15]. In particular, the AD filter is frequently used for diffusion image denoising [4,9] or tensor field regularisation [16].
2 Methods

2.1 The Non-Local Means Algorithm
First introduced by Buades et al. in [3], the NL-means algorithm is based on the natural redundancy of information in images to remove noise. In the theoretical formulation of the NL-means algorithm, the restored intensity of the voxel $x_i$, $NL(v)(x_i)$, is a weighted average of all voxel intensities in the image I. Let us denote:

$NL(v)(x_i) = \sum_{x_j \in I} w(x_i, x_j)\, v(x_j)$,   (1)
where v is the intensity function, and thus $v(x_j)$ is the intensity of voxel $x_j$ and $w(x_i, x_j)$ the weight assigned to $v(x_j)$ in the restoration of $v(x_i)$. More precisely, the weight quantifies the similarity of voxels $x_i$ and $x_j$ under the assumptions that $w(x_i, x_j) \in [0, 1]$ and $\sum_{x_j \in I} w(x_i, x_j) = 1$.
Fig. 1. Two-dimensional illustration of the NL-means principle. The restored value of voxel $x_i$ is a weighted average of all intensities of voxels $x_j$ in the search volume $V_i$. The weight $w(x_i, x_j)$ is based on the similarity of the intensities in cubic neighborhoods $N_i$ and $N_j$ around $x_i$ and $x_j$.
In practice, voxels similar to $x_i$ are only searched for over a neighborhood $V_i$, so Eq. 1 becomes $NL(v)(x_i) = \sum_{x_j \in V_i} w(x_i, x_j)\, v(x_j)$. For each voxel $x_j$ in $V_i$, the weight $w(x_i, x_j)$ is related to the distance $d(v(N_i), v(N_j))$, $N_i$ and $N_j$ being neighborhoods around $x_i$ and $x_j$, following:

$w(x_i, x_j) = \frac{1}{Z(i)}\, e^{-\frac{d(v(N_i), v(N_j))}{(h \hat{\sigma})^2}}$   (2)
where $Z(i)$ is a normalization constant with $Z(i) = \sum_j w(x_i, x_j)$, $\hat{\sigma}$ is the estimation of the standard deviation of the noise using the pseudo-residuals method [8], and h acts as a filtering parameter (for more details see [6] and Fig. 1). The distance d is expressed in general terms as $d(v(N_i), v(N_j)) = \frac{1}{N} \sum_k \Delta(v(y_k), v(z_k))$, where $N = \mathrm{card}\, N_i = \mathrm{card}\, N_j$ and $y_k$ and $z_k$ are the k-th voxels in the neighborhoods $N_i$ and $N_j$. For a grey-level image, Δ is $\Delta(v(y_k), v(z_k)) = \|v(y_k) - v(z_k)\|^2$.
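To fix ideas, a brute-force transcription of Eqs. (1)–(2) for a grey-level 3-D image could look as follows. This is a minimal sketch, not the authors' implementation or the optimized variant of [6]; the function name, the values for h and σ̂, and the fixed search/patch radii are our assumptions (the paper estimates σ̂ by pseudo-residuals and tunes h, cf. Sec. 3.4).

```python
import numpy as np

def nl_means_3d(v, t=5, f=1, h=0.4, sigma=0.05):
    """Brute-force NL-means: restore each voxel as a weighted average
    over a (2t+1)^3 search volume V_i (Eq. 1), with weights driven by
    the distance between (2f+1)^3 neighborhoods N_i and N_j (Eq. 2)."""
    pad = t + f
    vp = np.pad(v.astype(float), pad, mode='reflect')
    out = np.empty(v.shape, dtype=float)
    h2 = (h * sigma) ** 2
    for idx in np.ndindex(*v.shape):
        c = tuple(i + pad for i in idx)
        Ni = vp[c[0]-f:c[0]+f+1, c[1]-f:c[1]+f+1, c[2]-f:c[2]+f+1]
        num = den = 0.0
        for off in np.ndindex(2*t+1, 2*t+1, 2*t+1):
            j = tuple(ci - t + oi for ci, oi in zip(c, off))
            Nj = vp[j[0]-f:j[0]+f+1, j[1]-f:j[1]+f+1, j[2]-f:j[2]+f+1]
            d = np.mean((Ni - Nj) ** 2)   # d(v(N_i), v(N_j))
            w = np.exp(-d / h2)           # un-normalized weight of Eq. (2)
            num += w * vp[j]
            den += w                      # accumulates Z(i)
        out[idx] = num / den              # Eq. (1) restricted to V_i
    return out
```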
2.2 DW- and DT-MRI Adaptations
This section introduces the NL-means as a method to remove noise from either the whole DW-MR dataset (with n directions, n ≥ 6, plus the B0 image) or the resulting DT-MR image. Three variants are proposed here, two acting on the DW-MRI and one on the DT-MRI:

1. NLM: each DW-MRI is denoised individually as described in Section 2.1 and the DT-MRI is estimated from these denoised DW-MRI;
2. NLMv: the whole set of DW-MRI is considered as a multi-spectral image, each voxel being an (n+1)-dimensional vector. The Δ is defined as:

$\Delta(v(y_k), v(z_k)) = \sum_{i=1}^{n+1} \|v^i(y_k) - v^i(z_k)\|^2$,   (3)
$v^i(\cdot)$ being the i-th component of the vector $v(\cdot)$;
3. NLMt: the DT-MRI is computed from the raw DW-MRI and then denoised. The weighted average of the MRI intensities (grey levels) in Eq. 1 is replaced by a Log-Euclidean weighted average [1,12] of the image diffusion tensors. Δ is defined as:

$\Delta(v(y_k), v(z_k)) = \|\log(v(y_k)^{-1/2}\, v(z_k)\, v(y_k)^{-1/2})\|^2$,   (4)

$v(y_k)$ and $v(z_k)$ being the tensors at voxels $y_k$ and $z_k$. The Log-Euclidean framework could have been replaced by other methods [7,12,17].
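For reference, the tensor distance of Eq. (4) can be written down directly; the sketch below assumes SciPy and 3×3 symmetric positive-definite tensors, and the function name is ours. (As printed, Eq. (4) coincides with the affine-invariant form; a pure Log-Euclidean distance would instead compare the matrix logarithms of the two tensors directly.)

```python
import numpy as np
from scipy.linalg import logm, fractional_matrix_power

def delta_tensor(T1, T2):
    """Eq. (4) verbatim for two SPD diffusion tensors:
    Delta = || log(T1^{-1/2} T2 T1^{-1/2}) ||^2 (squared Frobenius norm)."""
    T1_mhalf = fractional_matrix_power(T1, -0.5)
    L = logm(T1_mhalf @ T2 @ T1_mhalf)
    # logm may return a complex array with a negligible imaginary part
    return float(np.sum(L.real ** 2))
```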
2.3 Implementation Details
The NLM method uses a cubic neighborhood ($\mathrm{card}\, N_i$ = 27). For NLMv and NLMt, considering that both vectors and tensors convey enough information for denoising, $\mathrm{card}\, N_i$ is set to 1. In these cases, having larger neighborhoods $N_i$ makes it difficult to find similar blocks in the search area $V_i$ and thus limits the denoising capacities of the algorithms. The search area $V_i$ is chosen to be identical for all the NL-means variants ($\mathrm{card}\, V_i = 11^3$ voxels).
2.4 Comparison Measure
A comparison measure is needed to validate the denoising methods with respect to a ground truth. We define the distance between two DT-MRI as:

$RMS = \sqrt{\frac{1}{\mathrm{card}\,\Omega} \sum_{\Omega} d(I, I_d)^2}$,   (5)

with Ω the masked diffusion tensor image grid, I the reference DT-MRI, $I_d$ the denoised DT-MRI, and d a distance over tensors. The Log-Euclidean distance is selected as it is specifically designed for tensors, as described in Sec. 2.2. The comparison is restricted to cerebral tissues, where the estimation of a diffusion tensor is relevant.
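A direct sketch of Eq. (5), reusing the tensor distance sketched in Sec. 2.2; the array shapes and names are our assumptions:

```python
import numpy as np

def rms_error(I_ref, I_den, mask, d=None):
    """Eq. (5): RMS of a tensor distance d over the masked grid Omega.
    I_ref, I_den: (Z, Y, X, 3, 3) tensor volumes; mask: boolean (Z, Y, X).
    By default d is the square root of delta_tensor, so d(.,.)**2 is
    exactly the squared norm of the tensor log-difference of Eq. (4)."""
    if d is None:
        d = lambda a, b: np.sqrt(delta_tensor(a, b))
    pts = np.argwhere(mask)
    sq = [d(I_ref[tuple(p)], I_den[tuple(p)]) ** 2 for p in pts]
    return float(np.sqrt(np.mean(sq)))
```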
3 Validation and Results

3.1 Dataset
In order to evaluate the performance of the different algorithms on DT-MRI, multiple tests are performed on a reference data set. The reference data set is constructed by averaging multiple acquisitions of the same subject. The acquisition protocol is a single-shot spin echo EPI sequence on a Siemens 1.5T scanner, with diffusion encoding (10 directions, b = 1000 s/mm², voxels = 1.875 × 1.875 × 5 mm³, 24 slices, 24 cm FOV). The acquisition is repeated 12 times with identical slice locations and each acquisition has a run time of 8 minutes. Each diffusion-weighted acquisition is corrected for distortions [18]. Numerous methods exist for the estimation of the tensor [10,16]. We simply choose to estimate the tensor by classical linear regression.
3.2 Leave-One-Out Comparison
To assess the validity of the proposed denoising methods, a leave-one-out approach is devised. For each DW acquisition $I^i_{noisy}$, the 11 other DW-MRI are averaged gradient-by-gradient, giving $I^i_{average}$ (cf. Fig. 2, left). A DT-MRI is estimated from $I^i_{average}$ and serves as a comparison basis. The selected image, $I^i_{noisy}$ (or its corresponding DT-MRI), is then denoised with the 6 denoising techniques, and the resulting denoised DT-MRI is estimated. The error between this denoised image and the ground truth data built from $I^i_{average}$ is computed using the measure defined in Section 2.4. The process is then iterated, yielding 12 RMS errors, which are finally averaged to give a global RMS error. These error measures are displayed in Fig. 2 (right) for the 6 denoising methods. This leave-one-out method helps avoid the introduction of bias. The denoising using the NLMt technique yields very poor results, probably due to the poor redundancy of tensor information in the image.
Fig. 2. Left: Scheme of the first step of the leave-one-out validation. An acquisition is selected ($I^1_{noisy}$), and the others are averaged, giving $I^1_{average}$. The DW-MRI (or DT-MRI) corresponding to $I^1_{noisy}$ are denoised with each of the 6 algorithms; the associated DT-MRI is computed and quantitatively compared with the DT-MRI computed from $I^1_{average}$, giving an error measure $\epsilon_1$. The process is then repeated with i = 2, ..., 12 and the global error measure is computed as $\frac{1}{12}\sum_i \epsilon_i$. Right: Error plot of the RMS for each method. The bar length indicates the min and max error over the 12 experiments; the middle mark indicates the mean value. The acronyms are as follows: GF: Gaussian Filter, AD: Anisotropic Diffusion, TV: Total Variation, NLMv: NL-means vector, NLM: NL-means gradient-by-gradient, NLMt: NL-means tensor. The NLMt method is not plotted due to poor results: average RMS is 1.2.
Computing the weights for each voxel shows that on average only 8 significantly similar tensors are found, whereas for grey-level or vector images the number of significantly similar blocks is generally higher than 100.
3.3 Comparison with Different Noise Levels
In this section, the average of the 12 images, $I_{average}$, is used as a reference. A new image $I_n$ is built by adding Rician noise at different levels. In Collins et al. [5], the noise percentage p is related to the standard deviation of the Gaussian noise σ and the mean value ν of the brightest tissue following p = 100σ/ν. The same idea is used here with Rician noise. The mean intensity of the CSF in the non-diffusion-weighted image ($S_0$) is used for the ν value. The RMS error is computed between each denoised DT-MRI and the ground truth DT-MRI computed from $I_{average}$. Results are shown in Fig. 3. At low levels of noise (below 4%), TV and AD perform better than the NL-means filters. This could be partially explained by the fact that the estimation of the noise by pseudo-residuals used in the NL-means variants is known to be overestimated at these low levels of noise. At higher levels (in the range 5–10%), usually met in real DW-MRI, the NL-means filters outperform all the other filters, NLM being consistently better than NLMv.
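The noise construction just described can be sketched as follows (a hypothetical helper, assuming NumPy; the magnitude-of-complex-Gaussian model is the standard way to generate Rician noise):

```python
import numpy as np

def add_rician_noise(image, p, nu):
    """Add Rician noise at percentage p, with sigma = p * nu / 100
    following the convention of Collins et al. [5]; nu is the mean CSF
    intensity in the non-diffusion-weighted image S0."""
    sigma = p * nu / 100.0
    n_re = np.random.normal(0.0, sigma, image.shape)
    n_im = np.random.normal(0.0, sigma, image.shape)
    return np.sqrt((image + n_re) ** 2 + n_im ** 2)
```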
Fig. 3. Plot of different noise levels and RMS (horizontal axis: noise level; vertical axis: RMS; curves: noise, NLM, NLMv, Gauss, AD, TV). Noise is added to the reference image. The image is then denoised and compared to the original.
3.4 Choice of the Filter Parameters
Each proposed method needs specific parameters for denoising. For a fair comparison of all the methods, those parameters are selected with an optimisation procedure so that each method gives its best result for a given experiment. In practice, according to the 12 different RMS errors computed in the leave-one-out experiment, the noise level is between 6 and 7 percent. All the parameters are optimised for this level of noise added to the $I_{average}$ DW-MRI. This optimisation is performed with the Nelder-Mead downhill simplex algorithm [14]. The cost function for this optimiser is the measure described in Sec. 2.4. Initial guesses for the parameters are chosen empirically after a few manual tests, and are used to initialise the downhill simplex. The unknown parameters are: number of iterations and regularisation strength (TV and AD), kernel size (GF), and filtering parameter h (NL-means variants). A sketch of this optimisation loop is given below.
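Hypothetical glue code for the optimisation (all names here are placeholders; the paper gives no implementation details beyond the use of the downhill simplex [14] with the cost of Sec. 2.4):

```python
from scipy.optimize import minimize

def cost(params, noisy_dwis, dt_ref, mask, denoiser, estimate_tensor):
    """Denoise each DW image with the candidate parameters, estimate the
    tensor field, and return the RMS of Eq. (5) against the ground truth."""
    denoised = [denoiser(dwi, *params) for dwi in noisy_dwis]
    return rms_error(dt_ref, estimate_tensor(denoised), mask)

# e.g. tuning the NL-means filtering parameter h:
# res = minimize(cost, x0=[0.4], method='Nelder-Mead',
#                args=(noisy_dwis, dt_ref, mask,
#                      lambda v, h: nl_means_3d(v, h=h), estimate_tensor))
# best_h = res.x[0]
```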
3.5 Visual Assessment
In Figure 4, we display axial slices at the level of the ventricles for the ground truth data (DT-MRI computed from $I_{average}$), the raw DT-MRI computed from one of the acquisitions $I^i_{noisy}$, and the 6 denoised images. The color encodes the principal direction of diffusion (collinear to the eigenvector of the tensor with maximal eigenvalue), weighted by the fractional anisotropy [11]. The reference image has smooth color transitions but also sharp edges. The GF-filtered image efficiently removes the noise but suppresses the edges and lowers the anisotropy of the tensors. The standard NL-means seems to be the best filter, followed by TV, NLMv, AD and NLMt, which confirms the quantitative values in Fig. 3.
Fig. 4. Visual comparison of the different algorithms (panels: Noisy, GF, NLM, NLMv, Reference, AD, NLMt, TV). The color encodes the principal direction of diffusion (collinear to the eigenvector of the tensor with maximal eigenvalue), weighted by the fractional anisotropy [11].
4 Conclusion and Further Works
This paper presents new variants of the Non-Local (NL) means algorithm, applied to diffusion-weighted and diffusion tensor images. The validations performed on a reference dataset underline how the NL-means denoising outperforms other well-established methods, such as Anisotropic Diffusion [13] and Total Variation [15]. The results are obtained from the denoising of either the diffusion images or the diffusion tensor. Our comparison does not take into account the Rician nature of the noise, and a comparison with more specific denoising methods [2] will be performed in the future. The direct denoising of the DT-MRI with our proposed NL-means variant does not achieve good performance. This relates to the fact that the number of similar tensors inside the search region is quite low (≈ 8). The lower quality of the direct denoising of DT-MRI compared to denoising of DW-MRI is in line with the literature [2]. The effect of such denoising techniques needs to be investigated in pathological cases. For instance, Multiple Sclerosis clearly shows changes in diffusion coefficients (such as fractional anisotropy and mean diffusivity). The effect of denoising must be studied in lesion areas to make sure these are well preserved in terms of their diffusion characteristics. Moreover, the impact of these NL-means denoising variants on the performance of post-processing algorithms, such as segmentation and fiber tracking, has to be further investigated.
References
1. Arsigny, V., Fillard, P., Pennec, X., Ayache, N.: Fast and simple calculus on tensors in the Log-Euclidean framework. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 115–122. Springer, Heidelberg (2005)
2. Basu, S., Fletcher, T., Whitaker, R.: Rician noise removal in diffusion tensor MRI. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 117–125. Springer, Heidelberg (2006)
3. Buades, A., Coll, B., Morel, J.M.: A review of image denoising algorithms, with a new one. Multiscale Modeling & Simulation 4(2), 490–530 (2005)
4. Chen, B., Hsu, E.: PDE denoising of MR diffusion tensor imaging data. In: ISBI 2004, pp. 1040–1042 (2004)
5. Collins, D.L., Zijdenbos, A.P., Kollokian, V., Sled, J.G., Kabani, N.J., Holmes, C.J., Evans, A.C.: Design and construction of a realistic digital brain phantom. IEEE Trans. Med. Imaging 17(3), 463–468 (1998)
6. Coupé, P., Yger, P., Barillot, C.: Fast non local means denoising for 3D MR images. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4191, pp. 33–40. Springer, Heidelberg (2006)
7. Fletcher, P.T., Joshi, S.C.: Principal geodesic analysis on symmetric spaces: statistics of diffusion tensors. In: Sonka, M., Kakadiaris, I.A., Kybic, J. (eds.) CVAMIA and MMBIA 2004. LNCS, vol. 3117, pp. 87–98. Springer, Heidelberg (2004)
8. Gasser, T., Sroka, L., Steinmetz, C.: Residual variance and residual pattern in nonlinear regression. Biometrika 73(3), 625–633 (1986)
9. Lee, J.E., Chung, M.K., Alexander, A.L.: Evaluation of anisotropic filters for diffusion tensor imaging. In: 3rd IEEE International Symposium on Biomedical Imaging: Macro to Nano, pp. 77–78. IEEE Computer Society Press, Los Alamitos (2006)
10. Mangin, J.-F., Poupon, C., Clark, C., Le Bihan, D., Bloch, I.: Distortion correction and robust tensor estimation for MR diffusion imaging. Med. Image Anal. 6(3), 191–198 (2002)
11. Pajevic, S., Pierpaoli, C.: Color schemes to represent the orientation of anisotropic tissues from diffusion tensor data: application to white matter fiber tract mapping in the human brain. Magn. Reson. Med. 42(3), 526–540 (1999)
12. Pennec, X., Fillard, P., Ayache, N.: A Riemannian framework for tensor computing. International Journal of Computer Vision 66(1), 41–66 (2006)
13. Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Mach. Intell. 12(7), 629–639 (1990)
14. Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T.: Numerical Recipes: The Art of Scientific Computing, 2nd edn. Cambridge University Press, Cambridge (1992)
15. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
16. Tschumperlé, D., Deriche, R.: Variational frameworks for DT-MRI estimation, regularization and visualization. In: ICCV 2003, pp. 116–121 (2003)
17. Wang, Z., Vemuri, B.C.: DTI segmentation using an information theoretic tensor dissimilarity measure. IEEE Trans. Med. Imaging 24(10), 1267–1277 (2005)
18. Wiest-Daesslé, N., Prima, S., Morrissey, S.P., Barillot, C.: Validation of a new optimisation algorithm for registration tasks in medical imaging. In: ISBI 2007 (2007)
Quantifying Calcification in the Lumbar Aorta on X-Ray Images

Lars A. Conrad-Hansen2, Marleen de Bruijne1,2, François Lauze2, László B. Tankó3, Paola C. Pettersen3, Qing He3,4, Jianghong Chen3,4, Claus Christiansen2,3, and Mads Nielsen1,2

1 Department of Computer Science, University of Copenhagen, Denmark
2 Nordic Bioscience A/S, Herlev, Denmark
3 Center for Clinical and Basic Research A/S, Copenhagen, Denmark
4 Department of Radiology, Beijing Friendship Hospital, China
Abstract. In this paper we propose to use inpainting to estimate the severity of atherosclerotic plaques from X-ray projections. Inpainting allows us to "remove" the plaque and estimate what the background image of an uncalcified aorta would have looked like. A measure of plaque severity can then be derived by subtracting the inpainting from the original image. In contrast to the current standard of categorical calcification scoring from X-rays, our method estimates both the size and the density of calcified areas and provides a continuous severity score, thus allowing for the measurement of more subtle differences. We discuss a class of smooth inpainting methods, compare their ability to reconstruct the original images, and compare the inpainting-based calcification score to the conventional categorical score in a longitudinal study on 49 patients addressing correlations of the calcification scores with hypertension, a known cardiovascular risk factor.
1 Introduction
Atherosclerosis forms the basis of coronary-heart diseases that may culminate in myocardial infarct, stroke, and sudden death [9]. Various studies demonstrate that calcific deposits in the abdominal aorta have an independent predictive value for estimating the risk of future cardiovascular events [9,12,10] and that aortic calcification is a good indicator for the presence of atherosclerosis in coronary arteries [4]. The medical expert consensus is that vascular imaging is vital for monitoring the severity and progression of atherosclerosis in both clinical practice and clinical trials [4,9]. An attractive modality suitable for large scale clinical trials is common lateral X-ray imaging. Although several other modalities can be used for the assessment of atherosclerotic plaque, including ultrasound (US), Computed Tomography (CT), and Magnetic Resonance Imaging (MRI) [13,9], none of these seems particularly well suited for large scale studies because of the time and costs involved. In addition, common X-rays are already routinely used for the assessment of vertebral fractures in clinical trials for osteoporosis, and a large amount of "historical" data is therefore readily available (see for instance [7]).
For the assessment of atherosclerotic plaque in the aorta, the lumbar region denoted by the L1–L4 vertebrae is a natural choice as the region of interest, since the bifurcation of the lumbar aorta, mostly located at the L4 level [13,5], is a known predilection site of atherosclerosis. Early development of plaques at this site is mainly due to the turbulent fluid dynamics, directly contributing to vascular damage and the deposition of atherogenic metabolites in the vessel wall [13]. The natural progression of atherogenesis and the development of calcified deposits expands from the distal to the more proximal regions (i.e. L4 to L1). This led Witteman et al. [11] to develop a semi-quantitative scoring system measuring the longitudinal extent of the plaque areas in the L1–L4 region. This scheme was developed further in 1997 by Kauppila et al. [5] and still constitutes the current practice of assessing the amount of calcification on lateral 2-D X-rays [5,6]. In this scheme the calcifications are measured longitudinally for the posterior as well as the anterior wall at each aortic segment adjacent to the first four lumbar vertebrae, using the midpoint of the intervertebral space above and below the respective vertebra as segment boundaries. Each of the eight wall segments is then scored a 0 (uncalcified), 1 (up to 1/3 of the length of the segment calcified), 2 (between 1/3 and 2/3 calcified) or 3 (completely calcified). The scores are summed into a 24-point score reflecting the anterior-posterior severity, which will in the course of this paper be referred to as the AC24 score (see [5] for details). The categorical nature of this AC24 score makes the detection of subtle changes of atherogenesis difficult over shorter periods of time (< 3 years).

In this paper we propose to quantify the severity of an atherosclerotic plaque by comparing the observed image intensity to the image intensity that would be expected if the aorta were uncalcified. First, all calcified areas in the L1–L4 region are segmented. In this work we have used manual annotations by radiologists, but automated segmentation methods could be used instead [1]. Subsequently, the "healthy" aorta appearance is reconstructed by interpolating the background image around the calcification using inpainting techniques. Plaque density can then be estimated by subtracting this inpainting from the original X-ray.

This paper presents a more elaborate validation, in a longitudinal setting, of the methods previously presented in [3]. The paper is organized as follows. In the next section we present the inpainting and the resulting scoring method, and in Section 3 we discuss validation of the new scoring method. In Section 4 we present results. Conclusions are drawn in Section 5.
2 Inpainting Based Calcification Scoring
Inpainting
Given an area of an X-ray where a calcification is present, can we estimate what the signal would have looked like without the calcification, i.e. at a hypothetical baseline when atherosclerosis had not yet started? Assuming that the plaque is superimposed on the aortic wall, it follows from the additive nature of the X-ray attenuation coefficients that subtracting the estimated signal from the original should ideally
result in the calcific deposit signal. This is illustrated in Fig. 1. The simulated non-calcified baseline image is created by inpainting the calcified areas. The hope is that such a measure not only allows for assessing minor plaque progression, but also offers a possibility to make a statement about the plaque density.
Fig. 1. Quantification methodology: Left: The original image, inverted for better visibility of the calcifications. Second-to-left: The manually delineated calcifications. Second-to-right: The inpainted result, mimicking the aorta in a healthy state. Right: The difference between the original and inpainted aorta.
We first introduce some notation. Let D denote the image domain, Ω the region encompassing the calcification, and $u_0$ the observed X-ray signal as a function $D \to \mathbb{R}$. Then a very crude inpainting method is the simple "Average", where Ω is filled homogeneously with the constant value S resulting from averaging over the immediate boundary ∂Ω of Ω; in the discrete setting:

$S = \frac{1}{n} \sum_{i \in \partial\Omega} u_0(i)$,   (1)
where n is the number of boundary pixels. Meanwhile the expectation that the X-ray image is piecewise smooth suggests a Bayesian approach: the probability $p(u | u_0, \Omega)$ of a candidate baseline signal u can be written

$p(u | u_0, \Omega) \propto p(u_0 | u, \Omega)\, p(u | \Omega)$   (2)

(the denominator $p(u_0 | \Omega)$, being constant, is not used). We may assume independence of u and Ω (because, for instance, at our hypothetical baseline, conditionally to the knowledge of the location of the aorta in the image, there would be no way to decide whether atherosclerosis would develop at a specific location from the observation of the tissues), and thus $p(u | \Omega) = p(u)$. This latter term represents the prior knowledge on the image regularity, while the first term $p(u_0 | u, \Omega)$ is the likelihood of observing $u_0$ knowing u and Ω. As a coarse approximation, we assume that $u_0$ comes from the contamination of u by some
independent Gaussian white noise of variance σ on each pixel in D\Ω. The regularity is expressed in terms of the distribution of local variations of intensities at a given pixel location i of D. These variations are expressed as a discrete gradient magnitude $|\nabla u|_i$ and assumed to be pixelwise independent: $p(u) \approx \prod_{i \in D} e^{-|\nabla u|_i^q / \mu}$. We will see in the next paragraph that it is not necessary to choose values for σ and μ. A candidate reconstruction can thus be obtained by maximization of the posterior (MAP) following its factorization in (2). By taking −log on each side of equation 2, the MAP problem is transformed into a minimization problem that can be written (with some slight abuses in the continuous setting) as minimizing the following energy

$E(u) = \int_{D \setminus \Omega} (u - u_0)^2\, dx + \lambda \int_D |\nabla u|^q\, dx$   (3)

where $\lambda = 2\sigma^2/\mu$. As this formula shows, we do not need to worry about choices for σ and μ; it is sufficient to choose a value of λ. We chose $\lambda = 10^{-2}$, which experimentally provided good background estimation both for q = 1 and q = 2. A numerical solution is obtained by numerically solving the Euler-Lagrange equation resulting from equation (3),

$(u - u_0)\,\chi - \lambda\, \nabla \cdot \left( \frac{\nabla u}{|\nabla u|^{2-q}} \right) = 0$,

where ∇· is the divergence and χ(x) = 0 if x ∈ Ω, χ(x) = 1 otherwise. For q = 1 this is the well-known Total Variation inpainting/denoising of Chan and Shen [2] (TV), while for q = 2 one observes from the above equation that the solution u satisfies a Laplace equation Δu = 0 in Ω, i.e. is harmonic in Ω, and for that reason we call it Harmonic inpainting (although u may not be harmonic on D). In order to solve the resulting equations numerically, we use the scheme proposed by Chan and Shen in [2], verbatim for q = 1 and with the obvious simplification for q = 2.

Calcification Scoring
This is straightforward. For each image $u_0^n$ and each of its calcified areas $\Omega_n^1, \ldots, \Omega_n^k$, as annotated by radiologists, we compute the inpainting according to method M, where M is either Average, Harmonic or TV inpainting, over each $\Omega_n^i$ to obtain the inpainted image $u_n^{M,i}$, and assign as score the quantity

$s^M(n) = \sum_{i=1}^{k} \int_{\Omega_n^i} (u_0^n - u_n^{M,i})\, dx$,   M ∈ {Average, Harmonic, TV}.
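As an illustration of the q = 2 case and of the scoring, here is a minimal sketch (our own simplification, assuming NumPy; the paper instead solves the Euler-Lagrange equation with the scheme of Chan and Shen [2], and the fixed iteration count below stands in for a proper convergence criterion):

```python
import numpy as np

def harmonic_inpaint(u0, omega, n_iter=5000):
    """q = 2: relax Laplace's equation (Delta u = 0) inside the annotated
    region omega (boolean mask), keeping u clamped to u0 outside.
    Assumes omega does not touch the image border (np.roll wraps)."""
    u = u0.astype(float).copy()
    for _ in range(n_iter):
        nb = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                     np.roll(u, 1, 1) + np.roll(u, -1, 1))
        u[omega] = nb[omega]          # update only inside Omega
    return u

def calcification_score(u0, omegas, inpaint=harmonic_inpaint):
    """Score s^M(n): sum over the annotated plaques Omega_n^i of the
    difference between the original and the inpainted baseline."""
    return sum(float(np.sum((u0 - inpaint(u0, om))[om])) for om in omegas)
```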
3 Validation Methodology
Background Estimation
We first need to determine how well the individual inpainting techniques actually simulate the data. We apply TV, Harmonic and Average inpainting to cut-out
sections of uncalcified aortas and compare these to the corresponding sections of the true image. The pixelwise differences between the inpainted and original sections are taken, and the standard deviations, mean errors and mean absolute errors are calculated. Due to potential differences in imaging technology we compute separate error models for baseline and follow-up images. In each group 10 healthy subjects were randomly selected. True calcification shapes were selected from the calcification annotations, and the errors were then measured by placing one of these, randomly chosen, at the center of each of the seven possible sections of the aorta in order to ensure near-independent measures without spatial overlap. The area coinciding with each successfully placed template is cut out of the image and then inpainted.

Clinical Value of the New Score
Correlations of the results from the various inpainting methods with the AC24 score are computed. To investigate the clinical value of the continuous score we compare the Δ-results, which denote the respective differences between the scores at follow-up and at baseline, for the AC24 and TV scores, to hypertension (HT), a known physiological risk factor, at baseline. HT is defined as systolic blood pressure (SBP) above 140 mm Hg or diastolic blood pressure (DBP) above 90 mm Hg. Statistical significance is assumed for p < 0.05, calculated according to Mann-Whitney tests.
4 Results
The Data Set
The data set used in the study presented here constitutes a subset of 49 patients from a 500+ patient study population investigated by Tankó et al. in [8]. The selection was carried out by focusing on covering as much as possible of the entire atherosclerotic spectrum; thus, the patients of the population range from 0 to 17 on the AC24 scale. The selected 49 patients were subjected to X-ray twice, once at baseline (1992) and a second time eight years later (2000/2001). Each batch also contains 10 extra images that belong to the same initial population but include healthy (uncalcified aorta) subjects only, which were used for the evaluation of background estimation errors as described in Section 3. The data set thus contains 118 X-ray images. The images were taken at different centers in the Copenhagen area, Denmark. All images were digitized with a Vidar DosimetryPro Advantage scanner with 12 bit intensity range and a resolution of 570 dpi. In order to be able to extract the calcifications from the X-ray images, the relevant anatomical structures, which include the four corner points of each of the L1–L4 vertebrae, the posterior as well as the anterior wall of the aorta, and the calcified regions (if any were present), were delineated manually by radiologists from CCBR.
Background Estimation
The results from the background estimation described in Section 3 are listed in Table 1. The tendency for all data sets is clear: TV inpainting produces the smallest absolute mean error per pixel and Harmonic inpainting follows closely. Average inpainting performs worst. Wilcoxon signed rank tests performed on both experimental sets of the background data showed significant (p < 0.01) performance differences for TV vs. Average inpainting, TV vs. Harmonic inpainting, and Average vs. Harmonic inpainting.

Table 1. Background estimation results. For each data set, the top part of the table displays the p-values resulting from the paired Wilcoxon signed rank test of the mean absolute errors; the three bottom rows show the standard deviation of the pixelwise error, the mean error, and the mean absolute error for the three inpainting methods. Top: baseline data. Bottom: follow-up data.

Baseline       | TV       | Harmonic | Average
TV             | 1        | 0.0074   | <0.00001
Harmonic       | 0.0074   | 1        | <0.00001
Average        | <0.00001 | <0.00001 | 1
std            | 0.00125  | –        | 0.00116
mean error     | 2.7676   | 3.6791   | 1.1335
mean abs error | 24.2813  | 26.2639  | 41.0328

Follow-up      | TV       | Harmonic | Average
TV             | 1        | 0.0001   | <0.00001
Harmonic       | 0.0001   | 1        | <0.00001
Average        | <0.00001 | <0.00001 | 1
std            | 0.00138  | 0.00153  | 0.00141
mean error     | 0.7154   | 2.0805   | 2.7175
mean abs error | 28.9577  | 31.8650  | 48.1623
48.1623
Calcification Scoring In the remainder of the experiments we focus on the TV inpainting, since it is the most accurate of the three inpainting schemes. Figure 2 shows the obtained intensity differences calculated for baseline, follow-up, and Δ (difference between AC24 VS Differences (baseline), corrcoef = 0.8918
Fig. 2. The intensity difference score plotted against the AC24 score. Left: Intensity differences (×10⁻⁶) versus the AC24 score at baseline (corrcoef = 0.8918). Middle: Intensity differences (×10⁻⁷) versus the AC24 score at follow-up (corrcoef = 0.7772). Right: Intensity differences (×10⁻⁶) versus the AC24 score for Δ (corrcoef = 0.4488).
Fig. 3. Visualization of the patient stratification into low-risk (35 patients) and high-risk (14 patients) groups for hypertension. Left: AC24 scores (p = 0.670). Right: intensity difference scores (p = 0.048).
The correlation coefficients r = 0.89 at baseline, r = 0.78 at follow-up, and r = 0.45 for Δ reflect that our continuous score follows the trend of the AC24 score rather well.

Correlation to Hypertension
For the comparison of the different measures of atherosclerosis in relation to HT, the study population was divided into a normal group and a HT (high-risk) group at baseline, according to the usual risk thresholds for HT. Figure 3 shows the annual rate of change in the AC24 and inpainting scores for the two groups. The respective p-values were calculated using Mann-Whitney tests.
5 Discussion and Conclusion
In our attempt to develop a meaningful, continuous scoring system for atherosclerotic plaque we have investigated TV and Harmonic inpainting and compared them to the crude Average inpainting scheme. Results showed that TV inpainting is best suited for the task (Table 1). The continuous intensity difference score shows a good correlation to the AC24 score. Higher r-values would not necessarily be desirable, since that would imply less room for improvement with respect to the AC24 score. The individual scatterplots in Figure 2 show many examples for which the AC24 score yields exactly the same value whereas the inpainting score results in a larger variety of values, which indicates that the two scores carry different information. Our continuous inpainting score significantly separates the high-risk and low-risk groups for hypertension (p = 0.048), while the AC24 score does not (p = 0.670). This suggests that the inpainting-based score may carry more relevant information than the AC24 score. At this point we can claim that we can put clinically meaningful, continuous numbers on the severity of calcification in atherosclerotic plaque. A larger scale study to further assess the utility and potential of this method is currently being performed. The lack of accuracy in manual segmentation may influence the inpainting process: as illustrated in Fig. 1, parts of the calcified region were missed, and the effect can be seen in the inpainting. This in fact suggests a method for locally correcting the annotations, with which we are currently experimenting intensively.
References
1. de Bruijne, M.: Shape particle guided tissue classification. In: Golland, P., Rueckert, D. (eds.) IEEE Workshop on Mathematical Methods in Biomedical Image Analysis. IEEE Computer Society Press, Los Alamitos (2006)
2. Chan, T., Shen, J.: Mathematical models for local non-texture inpaintings. SIAM Journal of Applied Mathematics 62(3), 1019–1043 (2001)
3. Conrad-Hansen, L., de Bruijne, M., Lauze, F., Tankó, L., Nielsen, M.: Quantizing calcification in the lumbar aorta on 2-D lateral X-ray images using TV inpainting. In: Liu, Y., Jiang, T., Zhang, C. (eds.) CVBIA 2005. LNCS, vol. 3765, pp. 409–418. Springer, Heidelberg (2005)
4. Eggen, D.A., Strong, J.P., McGill, H.C.J.: Calcification in the abdominal aorta: relationship to race, sex, and coronary atherosclerosis. Arch. Pathol. 78, 575–583 (1964)
5. Kauppila, L.I., Polak, J.F., Cupples, L.A., Hannan, M.T., Kiel, D.P., Wilson, P.W.: New indices to classify location, severity and progression of calcific lesions in the abdominal aorta: a 25-year follow-up study. Atherosclerosis 132, 245–250 (1997)
6. Kiel, D.P., Kauppila, L.I., Cupples, L.A., Hannan, M.T., O'Donnell, C.J., Wilson, P.W.F.: Bone loss and the progression of abdominal aortic calcification over a 25 year period: the Framingham Heart Study. Calcified Tissue International 68, 271–276 (2001)
7. Sauer, P., Leidig, G., Minne, H.W., Duckeck, G., Schwarz, W., Siromachkostov, L., Ziegler, R.: Spine deformity index (SDI) versus other objective procedures of vertebral fracture identification in patients with osteoporosis: a comparative study. Journal of Bone and Mineral Research 6 (1991)
8. Tankó, L.B., Bagger, Y.Z., Gerong, Q., Alexandersen, P., Larsen, P., Christiansen, C.: Enlarged waist combined with elevated triglycerides is a strong predictor of accelerated atherogenesis and related cardiovascular mortality in postmenopausal women. Circulation 111, 1883–1890 (2005)
9. Wilson, P.W.F., Kauppila, L.I., O'Donnell, C.J., Kiel, D.P., Hannan, M., Polak, J.M., Cupples, L.A.: Abdominal aortic calcific deposits are an important predictor of vascular morbidity and mortality. Circulation 103, 1529–1534 (2001)
10. Witteman, J.C., Kannel, W.B., Grobbe, P.A.: Aortic calcified plaques and cardiovascular disease (the Framingham study). American Journal of Cardiology 66, 1060–1064 (1990)
11. Witteman, J.C.M., Grobbee, D.E., Valkenburg, H.A., van Hemert, A.M., Stijnen, T., Burger, H., Hofman, A.: J-shaped relation between change in diastolic blood pressure and progression of aortic atherosclerosis. Lancet 343, 504–507 (1994)
12. Witteman, J.C.M., van Saase, J.L.C., Valkenburg, H.A.: Aortic calcification as a predictor of cardiovascular mortality. Lancet 2, 1120–1122 (1986)
13. Wolffe, J.B., Siegal, E.I.: X-ray of the abdominal aorta in detection of atherosclerosis. Clin. Med. 69, 401–406 (1962)
Physically Motivated Enhancement of Color Images for Fiber Endoscopy

Christian Winter1, Thorsten Zerfaß2, Matthias Elter2, Stephan Rupp2, and Thomas Wittenberg2

1 University Erlangen-Nuremberg, Chair for Information Technology, Am Wolfsmantel 33, Erlangen, Germany
[email protected]
2 Fraunhofer-Institute for Integrated Circuits IIS, Am Wolfsmantel 33, Erlangen, Germany
{zfs,elt,rupp,wbg}@iis.fraunhofer.de
Abstract. Fiber optics are widely used in flexible endoscopes, which are indispensable for many applications in diagnosis and therapy. Computer-aided use of fiberscopes requires a digital sensor mounted at the proximal end. Most commercially available cameras for endoscopy provide the images by means of a regular grid of color filters known as the Bayer pattern. Hence, the images suffer from false-colored spatial moiré, which is further stressed by the degrading fiber optic transmission, yielding a honeycomb pattern. To solve this problem we propose a new approach that extends the interpolation between known intensities of registered fibers to multi-channel color applications. The key idea takes into account both the Gaussian intensity distribution of each fiber and the physical color distribution of the Bayer pattern. Individual color factors for the interpolation of each fiber area make it possible to simultaneously remove both the comb structure from the fiber bundle and the Bayer pattern mosaicking from the sensor, while preserving depicted structures and textures in the scene.
1 Introduction
Flexible endoscopy based on thin fiberscopes is widely used for medical diagnosis and therapy. To digitize and process the image data, the eye-piece is connected to a video camera. For visualization and further computer-aided applications, e.g. image enhancement, computer-aided diagnosis (CAD) or navigation through natural orifices, a preferably accurate and true-color representation of the scene is desired. Particularly with regard to detection and classification of lesions, texture and color are important components for the correct decision. Commonly used single-chip cameras compose color images by means of a photo-sensitive sensor array with Bayer pattern. On these regular grids (cf. fig. 1(d)), each pixel element is masked by a specific color filter. If a colored part of an image is not equally distributed to these three basic colors1, it cannot be accurately
1 The basic color components are red, green and blue, where green is represented twice, since the human eye is more sensitive to the corresponding wavelength.
demosaiced. This kind of moiré becomes particularly severe in combination with color imaging through fiber bundles. The image bundle of a fiberscope contains a structured set of small fibers, usually between 5,000 and 20,000, depending on the working diameter of the fiberscope. Due to the hexagonal distribution of the fibers in the bundle and the Cartesian distribution of the sensor's elements, the dimension of the digital sensor typically exceeds the number of fibers by a factor of 10 to 100 to avoid spatial moiré. This results in the typical comb structure (see fig. 1(a)). If one fiber were directly mounted to one sensor element, its intensity could be determined exactly, but due to the color pattern this only works for one color component. To obtain the color value we need several different sensor elements to be illuminated. The drawback hereof is an unbalancing of the color elements, and therefore false colors are unavoidable (see fig. 1(a-c)). Usually the contribution to each color is statistically distributed, but fiber transmission suffers from irregular occlusion through cladding and micro-shading. This work introduces a new approach taking into account both the knowledge about the illumination center of every single fiber in the image on the sensor and the underlying color coding by the Bayer pattern. By processing the raw image data of the sensor it is possible to simultaneously remove both the comb structure from the fiber bundle and the Bayer pattern mosaicking from the sensor while preserving depicted structures and textures in the scene. Sect. 2 presents the state of the art of relevant imaging and image processing. The proposed method of physically motivated enhancement of color images
Fig. 1. Reference chart captured by a quartz glass fiberscope (a) with enlarged section (b). Enlarged adjacencies of a single fiber f with assigned sensor elements SE (c). Schematic section of the color pattern on the CCD video sensor with denoted fiber boundary (d). Resulting color distribution after sampling section (c) by the Bayer pattern of (d) is shown in (e).
for fiber endoscopy is described in Sect. 3. Experiments and results in Sect. 4 show the progress in visual quality of the approach for a typical example. The conclusions in Sect. 5 summarize the work and discuss future research.
2 Related Work
Literature covers several techniques to enhance the visual quality of fiber optic imaging. Algorithms which dispose of the comb structure can be categorized into two main classes with respect to the working domain. In the spatial domain there are attempts to recover the image by nonlinear diffusion [1], edge-preserving smoothing and various algorithms based on the idea of interpolation [2,3]. The subpixel-accurate registration of fibers, with respect to the physical motivation of the intensity distribution in their cross section, can be used to apply super-resolution approaches on non-uniformly distributed grids [4,5]. In the frequency and wavelet domain, characterizable structures are reduced by dedicated bandpass filtering. [1] gives an overview of useful basic methods to remove typical structures from fiberscopic images. Manually parametrized [6] and also automatically adapted masks [7] can be used for fiber processing. The latter work suggests star-shaped masks for optimal results with coherent glass fiber bundles. We consider this method the reference filtering algorithm when evaluating our method. The importance of color [8] and several ideas for its calibration for endoscopy are pointed out in the literature, as is comprehensive work on the demosaicing of video data from sensors with Bayer pattern. However, no approach can be found which takes into account the information about the Bayer pattern on the image sensor to add knowledge about the formation and exact distribution of colors in each fiber's image. Our work particularly utilizes this physical background to enhance color interpolation for fiber optic imaging. Other critical aspects of fiber imaging in endoscopy, like distortion and luminance correction, are addressed in various articles but are not of special interest for this work.
3 Method
The physically motivated enhancement of color images in fiber endoscopy is performed in four steps, where the first two are for initialization. First, all fibers are localized with subpixel accuracy on the sensor grid (Sec. 3.1). This information is then used to determine factors for local color correction (Sec. 3.2). Step three performs the false color removal (Sec. 3.3), while the last step interpolates on the adjusted color values (Sec. 3.4). The result is a comb-free, true-color image of the scene.
3.1 Fiber Registration
The method described in [3] is applied to determine each fiber’s center. Subpixel accuracy is obtained by fitting a Gaussian function to the intensity distribution
of the linearly interpolated green channel of an empty bright image, which is referred to as the white image. From a mathematical and physical point of view it is important to control the camera's gain and shutter so as to just not saturate the image's peaks (meaning that the maximum pixel intensity is equal to or lower than the sensor's capacity). All fibers that are not saturated and fulfill the criterion of mean distance to the nearest neighbors (see [3] for details) are taken as valid and stored in a Delaunay grid to serve the following steps.
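A possible realization of the subpixel fit is sketched below, under our own assumptions: SciPy's curve_fit, an isotropic Gaussian model, and integer peak positions found beforehand. The function and parameter names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_2d(coords, x0, y0, amp, s, off):
    x, y = coords
    return amp * np.exp(-((x - x0)**2 + (y - y0)**2) / (2 * s**2)) + off

def refine_fiber_center(green, px, py, r=3):
    """Fit an isotropic 2-D Gaussian to the local intensity distribution
    of the white image's green channel around an integer peak (px, py),
    returning the subpixel fiber center."""
    ys, xs = np.mgrid[py-r:py+r+1, px-r:px+r+1]
    patch = green[py-r:py+r+1, px-r:px+r+1].astype(float)
    p0 = (px, py, patch.max() - patch.min(), 1.5, patch.min())
    popt, _ = curve_fit(gaussian_2d, (xs.ravel(), ys.ravel()),
                        patch.ravel(), p0=p0)
    return popt[0], popt[1]   # subpixel center (x0, y0)
```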
3.2 Color Distribution and Correction Factors
Determination of the local color distribution and calculation of the corresponding correction factors is done by the following procedure (a code sketch follows the list):
1. Assignment to closest fiber: With the information from fiber registration, all sensor elements SE can now be related to their nearest fiber (see fig. 1(c)). From the set F of all fibers, for the fiber f in the white image these are the elements SE(f) with f ∈ F.
2. Accumulation of sensor intensities: For each channel i, the intensities I of all related sensor elements are accumulated to the calibration sum K:
$K_i(f) = \sum_{SE(f)} I_i(f)$ with i ∈ {R, G, B} and f ∈ F.
3. Determination of reference illumination: A reference illumination $\hat{K}(f)$ is defined to compensate global intensity gradients and enhance the visual perception. Here we use a constant value, but in a more sophisticated version it is adapted to local information.
4. Calculation of correction factors: The intensity sums are referred to the reference level and result in the correction factor
$C_i(f) = \frac{\hat{K}(f)}{K_i(f)}$ with i ∈ {R, G, B} and f ∈ F.
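A vectorized sketch of steps 1-4 (our own construction: a k-d tree for the nearest-fiber assignment, a per-pixel Bayer channel map, and a constant reference illumination; none of these names come from the paper):

```python
import numpy as np
from scipy.spatial import cKDTree

def correction_factors(white_raw, bayer, centers, k_ref=1.0):
    """Assign every sensor element to its nearest registered fiber
    center, accumulate the calibration sum K_i(f) per Bayer channel i,
    and return C_i(f) = K_hat(f) / K_i(f). `bayer` holds an 'R'/'G'/'B'
    label per pixel; k_ref is the constant reference illumination."""
    h, w = white_raw.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # step 1: assignment to the closest fiber center (x, y) coordinates
    _, nearest = cKDTree(centers).query(
        np.column_stack([xs.ravel(), ys.ravel()]))
    nearest = nearest.reshape(h, w)
    C = np.zeros((len(centers), 3))
    for ci, ch in enumerate('RGB'):
        sel = (bayer == ch)
        # step 2: accumulate K_i(f) over the sensor elements SE(f)
        K = np.bincount(nearest[sel],
                        weights=white_raw[sel].astype(float),
                        minlength=len(centers))
        # steps 3-4: correction factor against the reference illumination
        C[:, ci] = np.where(K > 0, k_ref / np.where(K > 0, K, 1.0), 0.0)
    return nearest, C
```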
3.3 False Color Reversal
The last two steps are performed for a whole sequence of images, either from a live stream, an image archive or other image stacks. Without loss of generality they are described for a single image frame (a code sketch of the steps follows the list).
1. Intensity sum per channel: Similar to the accumulation of sensor intensities during the initialization, here for all related sensor elements SE(f) of one fiber f the intensities I from within the current image are accumulated to the sum S:
$S_i(f) = \sum_{SE(f)} I_i(f)$ with i ∈ {R, G, B} and f ∈ F.
2. Real fiber intensity per channel: The corrected fiber intensity $\hat{I}$ for channel i of fiber f in the current image is given by the product of the sum $S_i(f)$ and the correction factor $C_i(f)$:
$\hat{I}_i(f) = S_i(f) \cdot C_i(f)$ with i ∈ {R, G, B} and f ∈ F.
3. Fiber-related color value: The fiber f is now represented by the intensity triple $I_C(f) = (\hat{I}_R(f), \hat{I}_G(f), \hat{I}_B(f))$ for f ∈ F.
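Continuing the sketch from Sec. 3.2 (same assumed data layout; `nearest` and `C` are the outputs of `correction_factors` above):

```python
import numpy as np

def fiber_color_values(raw, bayer, nearest, C):
    """Steps 1-3 for one frame: accumulate S_i(f) over SE(f) per Bayer
    channel and rescale with C_i(f) to the corrected triple I_C(f)."""
    n = C.shape[0]
    I_C = np.zeros((n, 3))
    for ci, ch in enumerate('RGB'):
        sel = (bayer == ch)
        S = np.bincount(nearest[sel], weights=raw[sel].astype(float),
                        minlength=n)          # S_i(f)
        I_C[:, ci] = S * C[:, ci]             # I_hat_i(f) = S_i(f) * C_i(f)
    return I_C
```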
3.4 Color Interpolation
With the corrected color values of Sect. 3.3, the intensity-based barycentric interpolation for fiberscopic data [3] is extended to color processing. The utilized Delaunay grid is supplemented with additional color information for each fiber in terms of the adjusted intensity triple $I_C(f)$. Performing the image processing steps at the highest possible quantization within the technical specs of the camera sensor (e.g. 12 or 14 bit) and preserving the digital precision of the intensity data until the final barycentric weighting ensures the least roundoff error in the result. To ensure real-time performance we precalculate a lookup table (LUT) for fast access to fiber and color information for each output image pixel. Pixels which are not in between three valid fiber centers are skipped when processing the LUT. A sketch of the interpolation is given below.
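A sketch of the color-extended barycentric interpolation (our reading of [3] extended to color triples; names and shapes are assumptions):

```python
import numpy as np
from scipy.spatial import Delaunay

def barycentric_color_image(centers, I_C, shape):
    """Every output pixel inside a Delaunay triangle of fiber centers
    gets the barycentric mix of the three corrected fiber triples
    I_C(f); pixels outside any triangle are skipped (left at zero)."""
    tri = Delaunay(centers)
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
    simplex = tri.find_simplex(pts)
    out = np.zeros((h * w, 3))
    valid = simplex >= 0
    T = tri.transform[simplex[valid]]             # affine map per simplex
    r = pts[valid] - T[:, 2]
    bary2 = np.einsum('nij,nj->ni', T[:, :2], r)  # first two coordinates
    bary = np.c_[bary2, 1.0 - bary2.sum(axis=1)]
    verts = tri.simplices[simplex[valid]]         # fiber indices per pixel
    out[valid] = np.einsum('ni,nic->nc', bary, I_C[verts])
    return out.reshape(h, w, 3)
```

In a real-time setting, the simplex lookup and the barycentric weights would be computed once during calibration and stored in the LUT, leaving only the weighted sums per frame.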
4 Experimental Results
To exemplify the impact of the proposed method we use the example of human skin (fig. 2(A1)). Although particular details of the algorithm can be better understood by using calibration targets or artificial textures, we choose this surface as a good trade-off to show the handling of color and relief structure at the same time on textured tissue. We use a standard flexible endoscope with glass fiber image transmission (Schölly Fiberoptic GmbH). The working diameter is 3.8 mm and the direct sight offers a field of view of 80°. The working distance is between 5 mm and 20 mm. The technical specs give the number of fibers in the optic bundle as 10,000. A typical fiber registration as described in Sect. 3.1 locates about 95% of all fibers. This percentage is reasonable since, from a technical point of view, not all fibers are used for imaging and, from a signal processing point of view, some fibers are not suitable for registration since their distribution is not representative or separable. 99.96% of the detected fibers are taken as valid with respect to the criteria stated in 3.1. The mean distance between all valid fibers is μ = 5.20 px (pixels) with a variance of σ = 0.652 px. The effect of false colors depends on the relation between sensor resolution and the number of fibers. We use a typical scenario here but have had similar experiences with various settings of cameras and different endoscopes. Figure 2(A1) shows the color image (with 8 bit color depth for each of the three channels) recorded by a digital video camera with a dimension of 800 × 600 px.
Fig. 2. Plain sensor data of a fiberscopic snapshot of human skin (Part A1) with denoted enlarged section (Part A2) and result of the Canny filter on the red channel (Part A3). The homogeneous fine structure in Part A3 represents the gradient of the comb structure. The same data preprocessed by state-of-the-art methods (row B). The partly crossing structure in the Canny result (B3) shows the overlaying effect of false-colored stripes from moiré. Row C gives an impression of the proposed approach, without color irritation and with preserved relief texture of the hand, illustrated by the Canny result in Part C3.
Due to the mounted fiberscope the visible part (aperture) is cropped to an extent of approx. 550 px. The enlarged Part A2 shows that the image suffers from comb structure and strong mosaicking by the Bayer pattern. To depict the mixture of structures (hand relief, fiber comb and Bayer pattern), we use the Canny filter with constant lower (12) and upper (27) bounds to post-process the red channel of the given sections. The mentioned effects in the plain data (fig. 2, top row) of course lead to a homogeneous fine structure (fig. 2(A3)) mainly representing the comb structure. Image content can hardly be detected.
Row B is the result of state-of-the-art image processing, namely optimal fiberscopic low-pass filtering, contrast enhancement, white balancing and tonal value correction. An automatic application of the latter two steps leads to false results because the processing already relies on wrong color distributions between the fibers' images. Although this work is not about color calibration and we do not care about some blue or red cast, we mention this effect here because it is responsible for the colored stripes overlaying the observed structure in the scene. Figure 2(B3) makes clear that, in addition to the relief structure of the hand, several other components are mixed into the image. The proposed method of physically motivated enhancement for color interpolation is applied in row C. The particular difference to the state of the art can be seen in the enlarged parts fig. 2(C2) and fig. 2(C3). The grooves of the skin, with an average distance of about 400 μm, can be separated clearly and appear without any colored texture fault. Manual adjustment of additional filters as in the previous processing is not necessary here. The color correction utilizes the knowledge of the position and intensity of each cell on the Bayer pattern, and instead of inter-cell smoothing, the extended barycentric interpolation is performed to recover a pristine image. The aperture's partly fringed border in the interpolation result (cf. fig. 2(C1)) results from the calibration step and can be reduced by smoothing the edge or completing the aperture's circle by adding some virtual centers.
5 Conclusions and Future Work
Structural precision and color information are of great importance for decision-making in endoscopy based procedures, such as CAD, incisions or navigation. Acquiring and transmitting image data via flexible devices with fiber bundles and digitizing it with a commonly used single-chip camera results in undesirable comb structure and false colored mosaicking. We introduced a new approach that takes into account the physical construction of the Bayer pattern on the digital sensor and calculates individual color factors for each spatially registered fiber area. This information supplements the extended barycentric interpolation. By processing the raw image data of the sensor it becomes possible to remove both the comb structure from the fiber bundle and the Bayer pattern mosaicking from the sensor in one single step, while preserving important structures and textures in the scene. Optimizations like lookup tables enable real time processing of fiberscopically transmitted color images. The approach will improve the visual perception in small orifices for decision making and will add real value to existing hardware in medical environments. Further improvement can be made by removing the effect of fringed borders in the interpolation result. We are currently working on the extension of a transmission model to consider the color coded mapping on the video sensor. This will help to quantify the improvement of our method. The color precision of the results will be validated by an independent concept of color calibration that could not be carried out yet. Also, in-vivo performance remains to be assessed.
In summary, the results of this work strongly suggest that an effective real time approach replacing several state of the art image processing steps is feasible, producing high quality color fiberscopic images with digital sensors without false colors and moiré effects from mosaicking.
Acknowledgement. This work was supported by the Collaborative Research Center 603 (Model-Based Analysis and Visualization of Complex Scenes and Sensor Data) of the German Research Foundation (DFG).
Signal LMMSE Estimation from Multiple Samples in MRI and DT-MRI

S. Aja-Fernández¹,², C. Alberola-López¹, and C.-F. Westin²

¹ LPI, ETSI Telecomunicación, Universidad de Valladolid
² LMI, Brigham and Women's Hospital, Harvard Medical School
Abstract. A method to estimate the magnitude MR data from several noisy samples is presented. It is based on the Linear Minimum Mean Squared Error (LMMSE) estimator for the Rician noise model when several scanning repetitions are available. This method gives a closed-form analytical solution that takes into account the probability distribution of the data as well as the existing level of noise, showing a better performance than methods such as the average or the median.
1 Introduction
Magnetic Resonance Imaging (MRI) and Diffusion Weighted MRI (DW-MRI) provide the possibility of acquiring several (and fairly well aligned) images of the same slice or even of the same volume. The number of scanning repetitions is usually known as NEX (number of excitations). These multiple samples may be used to estimate the magnitude image, as a way to reduce the level of noise as well as other types of artifacts. Although some methods based on estimators for the Rician model have been reported in the literature, such as the one based on Maximum Likelihood (ML) [1], due to their complexity this task is often done using a simple average or median operator. In this paper we propose an alternative Bayesian approach based on the Linear Minimum Mean Squared Error (LMMSE) estimator. If an accurate measure of the level of noise is available, the proposed estimator is able to remove it more satisfactorily than the average and median operators and, although suboptimal with respect to the ML estimator, much more efficiently.
2 Rician Model and Signal Estimation
Due to the existence of uncorrelated Gaussian noise with zero mean and the same variance in both the real and imaginary parts of the complex k-space data, the magnitude signal of MR data may be modeled by a Rician distribution, whose probability density function (PDF) for a 2D signal is as follows [2]:

$$p_M(M_{ij} \mid A_{ij}, \sigma_n) = \frac{M_{ij}}{\sigma_n^2}\, e^{-\frac{M_{ij}^2 + A_{ij}^2}{2\sigma_n^2}}\, I_0\!\left(\frac{A_{ij} M_{ij}}{\sigma_n^2}\right) u(M_{ij}) \qquad (1)$$
where $I_0(\cdot)$ is the 0th order modified Bessel function of the first kind, $u(\cdot)$ the Heaviside step function and $\sigma_n^2$ the variance of noise. $M_{ij}$ is the magnitude value of the pixel {i, j} and $A_{ij}$ the original value of the pixel without noise. Several samples of each slice will be considered, with $M_{ij}[k]$ the k-th scanning repetition of pixel {i, j} in the current slice. These repetitions are usually fused using an average or median operator. The effect of using the average operator may be easily observed in areas of low SNR, like the background, where the Rician distribution tends to a Rayleigh distribution. After the estimation, the signal value in the background pixels should be zero. However, when using the average operator, this value tends to the mean of a Rayleigh PDF [2], i.e. $\sigma_n \sqrt{\pi/2}$. In a similar way, the median of a Rayleigh PDF is $\sigma_n \sqrt{\ln 4}$. In both cases, although there may be a smoothing of the noisy region, there is also a bias related to $\sigma_n$ in the output values. One feasible option is to use an ML estimator for the magnitude data [1], which for multiple samples following a Rician distribution is defined as $\hat{A}_{ML} = \arg\max_A \{\log L\}$, with $\log L$ the log-likelihood function [1,3]. As this equation cannot be solved analytically, the maximum of the log-likelihood function must be found numerically. This task is computationally expensive, the more so the higher the number of images to be processed, especially when working with DTI data, with multiple slices and multiple gradient directions. Alternative methods to solve the ML estimation have been proposed, such as the one based on Expectation-Maximization [3] or the work by Fillard et al. [4]. Other works use the Maximum a Posteriori, as Basu et al. [5]. We now propose a different approach to estimate the signal from the magnitude image, based on the LMMSE estimator. Instead of modeling A as an unknown constant, we consider it a realization of a random variable which is functionally related to the observation. Although this approach may be suboptimal with respect to the ML estimator, the fact that a closed-form analytical solution is achievable makes the whole process faster and more suitable when working with large amounts of data (like DTI), where an optimization method to search for $\hat{A}_{ML}$ would be too slow.
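As a quick numerical check of this Rayleigh bias (our addition, not part of the paper), the snippet below simulates background magnitude pixels and compares the sample mean and median with $\sigma_n\sqrt{\pi/2}$ and $\sigma_n\sqrt{\ln 4}$:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_n, n = 10.0, 100_000
# Background pixels have A = 0, so M = |n_r + i n_i| is Rayleigh distributed.
M = np.abs(rng.normal(0, sigma_n, n) + 1j * rng.normal(0, sigma_n, n))
print(M.mean(), sigma_n * np.sqrt(np.pi / 2))      # both approx. 12.53
print(np.median(M), sigma_n * np.sqrt(np.log(4)))  # both approx. 11.77
```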
3 LMMSE Estimation from Multiple Noisy Samples
The LMMSE estimator of a parameter θ using multiple samples is defined as [6]

$$\hat{\theta} = E\{\theta\} + C_{\theta x} C_{xx}^{-1} \left( \mathbf{x} - E\{\mathbf{x}\} \right) \qquad (2)$$
where the C are covariance matrices and x is the vector of available samples. The moments of the Rician distribution have a non-trivial integral expression. However, the even-order moments are simple polynomials. In order to achieve a closed-form expression we will use A² instead of A. Consequently, all the moments to be used will be even. With this assumption in mind, the LMMSE estimator for the Rician distribution is

$$\hat{A}_{ij}^2 = E\{A_{ij}^2\} + C_{A_{ij}^2 M_{ij}^2}\, C_{M_{ij}^2 M_{ij}^2}^{-1} \left( \mathbf{M}_{ij}^2 - E\{\mathbf{M}_{ij}^2\} \right) \qquad (3)$$
For the sake of simplicity we will suppose that all the equations are pixelwise, removing the subindexes {i, j}. Assuming that N measurements are taken of every pixel, $\mathbf{M} = [M[1]\; M[2]\; \cdots\; M[N]]^T$ is the measurement vector. $\mathbf{M}^2$ must be understood element-wise, i.e. $\mathbf{M}^2 = [M^2[1]\; \cdots\; M^2[N]]^T$. $C_{M^2 M^2}$ is the N × N covariance matrix of $\mathbf{M}^2$, defined as

$$C_{M^2 M^2} = E\left\{ \left( \mathbf{M}^2 - E\{\mathbf{M}^2\} \right) \left( \mathbf{M}^2 - E\{\mathbf{M}^2\} \right)^T \right\}$$

After some algebra and replacing expectations by their sample estimator $\langle \cdot \rangle$, we can finally write the covariance matrix as

$$C_{M^2 M^2} = \left( \langle M^4 \rangle + 4\sigma_n^4 - 4\sigma_n^2 \langle M^2 \rangle - \langle M^2 \rangle^2 \right) \mathbf{1}_N \mathbf{1}_N^T - \left( 4\sigma_n^4 - 4\sigma_n^2 \langle M^2 \rangle \right) I_N \qquad (4)$$

where $\langle M^a \rangle = \frac{1}{N} \sum_{k=1}^{N} M^a[k]$, $\mathbf{1}_N$ is an all-ones vector of length N, and $I_N$ is the N × N identity matrix. The matrix $C_{A^2 M^2}$ is

$$C_{A^2 M^2} = E\left\{ \left( A^2 - E\{A^2\} \right) \left( \mathbf{M}^2 - E\{\mathbf{M}^2\} \right)^T \right\} = \left( \langle M^4 \rangle + 4\sigma_n^4 - 4\sigma_n^2 \langle M^2 \rangle - \langle M^2 \rangle^2 \right) \mathbf{1}_N^T$$

Finally, for each point in the image, the estimator will be

$$\hat{A}^2 = \langle M^2 \rangle - 2\sigma_n^2 + C_{A^2 M^2}\, C_{M^2 M^2}^{-1} \left( \mathbf{M}^2 - \langle M^2 \rangle \right) \qquad (5)$$
This equation must be understood pixelwise (with $M_{ij}[k]$ and $A_{ij}$) in the two dimensional case or voxelwise (with $M_{ijl}[k]$ and $A_{ijl}$) in the three dimensional case. Note that the variance of noise $\sigma_n^2$ must be properly estimated beforehand. Several methods have been reported in the literature [7], and new robust methods for this task are also emerging, making the proposed method practical.
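A minimal per-voxel sketch of Eqs. (4) and (5) follows. This is our illustration rather than the authors' code: sample moments stand in for the expectations, and a negative estimate of A² is clipped to zero before taking the square root.

```python
import numpy as np

def lmmse_rician(M, sigma_n):
    """LMMSE estimate of A from N Rician magnitude samples of one voxel.

    M: (N,) vector of samples M[k]; sigma_n: noise standard deviation.
    """
    N = M.size
    m2, m4 = np.mean(M**2), np.mean(M**4)
    s2, s4 = sigma_n**2, sigma_n**4
    c = m4 + 4 * s4 - 4 * s2 * m2 - m2**2    # rank-one coefficient in Eq. 4
    d = 4 * s4 - 4 * s2 * m2                 # identity coefficient in Eq. 4
    C_MM = c * np.ones((N, N)) - d * np.eye(N)
    C_AM = c * np.ones(N)                    # C_{A^2 M^2} of Eq. 5
    corr = C_AM @ np.linalg.solve(C_MM, M**2 - m2)
    A2 = m2 - 2 * s2 + corr                  # Eq. 5
    return np.sqrt(max(A2, 0.0))

# Example: fuse 10 noisy repetitions of a voxel with true amplitude A = 100.
rng = np.random.default_rng(1)
A, sigma_n = 100.0, 10.0
M = np.abs(A + rng.normal(0, sigma_n, 10) + 1j * rng.normal(0, sigma_n, 10))
print(lmmse_rician(M, sigma_n))   # close to 100
```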
4 Validation
Some synthetic experiments have been carried out in order to validate the LMMSE estimator introduced above. First we compare it with other fusion methods on a single MR image. The image in Fig. 1-(a) is considered the ground truth. The image is corrupted with synthetic Rician noise with different values of σ_n. For each value, 10 independent noisy images are created, denoted I_n[k], k = 1, ..., 10. These images are fused using the following methods: average of the images (I_a), median of the images (I_m) and the LMMSE estimator of Eq. (5), denoted I_e. To compare the performance of the different approaches two structural quality indexes have been used: the Structural Similarity (SSIM) index [8] and the Quality Index based on Local Variance (QILV) [9]. Both give a measure of the structural similarity between the golden standard and the other images. However, the former is more sensitive to the level of noise in the image and the latter to any possible blurring of the edges. Both indexes are bounded; the closer to 1, the better the quality.
Fig. 1. Data used for the experiments. (a) MR image (256 gray levels) from the BrainWeb database (http://www.bic.mni.mcgill.ca/brainweb/). (b) Synthetic 2D tensor field. (c) Noisy 2D tensor field, σ_n = 100 (SNR = 2.62 dB). Figures (b) and (c) created using the Teem software (http://teem.sourceforge.net/).

Table 1. Quality measures. Average of 100 realizations with σ_n = 10. The LMMSE shows the best results in the structural measures, and the lowest MSE and level of noise.

(a) Whole image
      MSE       SSIM    QILV    LN
I_n   158.5421  0.3882  0.9890  9.9517
I_a   100.0603  0.4723  0.9918  9.9887
I_m   93.4674   0.4693  0.9932  9.4919
I_e   33.7761   0.5227  0.9986  5.1882

(b) Image without background
      MSE       SSIM    QILV    LN
I_n   112.4711  0.7362  0.9882  9.9517
I_a   31.6995   0.9090  0.9912  9.9887
I_m   32.8593   0.8983  0.9924  9.4919
I_e   15.6514   0.9209  0.9982  5.1882
In addition we will also use the Mean Square Error (MSE) and the Level of Noise (LN) in the output image, measured as $LN = (\hat{\sigma}_n)_o / (\sigma_n)_i$, where $(\hat{\sigma}_n)_o$ is the estimated standard deviation of noise in the resulting image and $(\sigma_n)_i$ the standard deviation of noise in the original image. The average of 100 realizations for each method and each value of σ_n has been considered. Results are shown in Fig. 2 and Fig. 3-(a). The whole image and the image without background have been considered separately. As an illustration, numerical results for σ_n = 10 are shown in Table 1. The proposed method shows a better performance in all the indexes and for all the levels of noise. It is the one with the lowest MSE and the highest values of SSIM and QILV. When comparing the LN, it is the only one in which a reduction of noise is noticeable. However, for small levels of noise (say σ_n < 5 for a 256 gray-level image), the use of the average or the median may be a feasible alternative. As stated previously, the method presented here is suboptimal with respect to a nonlinear counterpart or even the ML estimator. With respect to the latter, Fig. 3-(b) shows some comparative results for different values of σ_n. However, a numerical optimization is needed for the ML estimator, a step that is avoided with our simple (and linear) alternative solution.
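For reference, the scalar measures are straightforward to compute. The sketch below is our illustration: SSIM is taken from scikit-image, QILV is omitted (it is not available in common libraries), and the output noise level is roughly estimated from a signal-free background region.

```python
import numpy as np
from skimage.metrics import structural_similarity

def quality(ref, out, background, sigma_in):
    """Return MSE, SSIM and LN = sigma_out / sigma_in for a fused image.

    background: boolean mask of a signal-free region; its standard
    deviation serves as a rough proxy for the residual noise level.
    """
    mse = np.mean((ref - out) ** 2)
    ssim = structural_similarity(ref, out, data_range=ref.max() - ref.min())
    ln = out[background].std() / sigma_in
    return mse, ssim, ln
```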
Fig. 2. Quality measures of the resulting fused images as a function of σ_n for the original, average, median and LMMSE results: (a) SSIM (whole image), (b) QILV (whole image), (c) MSE/σ_n² (whole image), (d) SSIM (without background), (e) QILV (without background), (f) MSE/σ_n² (without background). In all cases the proposed method shows a better performance.
When working with DT-MRI, some scalar measures like the Fractional Anisotropy [10] are directly related to the eigenvalues of the diffusion tensors. To study the effect of the proposed method on these eigenvalues, a synthetic data set has been created: a 128 × 128 2D tensor field, as shown in Fig. 1-(b), where tensors are depicted using ellipses. Tensors with three different eigenvalue combinations were chosen, $\lambda_a = [1.9 \cdot 10^{-3},\, 0.4 \cdot 10^{-3}]$, $\lambda_b = [2 \cdot 10^{-3},\, 0.1 \cdot 10^{-3}]$ and $\lambda_c = [2 \cdot 10^{-3},\, 2 \cdot 10^{-3}]$, and the diffusion weighted images (DWI) were simulated using the Stejskal-Tanner equation [10,11]. Different numbers of gradient directions have been considered, with a constant baseline with a level of 1000. The DWI are corrupted with Rician noise, Fig. 1-(c), and the tensors are re-estimated using a Least Squares approach. Different values of σ_n have also been used. For the experiments, 3 and 15 gradient directions, σ_n ∈ [30, 210] and N = 10 are considered. We define the signal to noise ratio as $SNR\,(dB) = 10 \log_{10}(S^2/\sigma_n^2)$, and we define the power of the signal as $S^2 = \min S_{i,j,k}^2$; in our case $S^2 = 1.83 \cdot 10^4$. The error is defined as the absolute distance of the estimated eigenvalues to the original values. For each number of gradients and each SNR value the average of 100 experiments is considered.
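The DT-MRI experiment can be reproduced in outline as follows. This is a hedged sketch of our own, not the authors' setup: the b-value is assumed (the text only specifies the baseline level of 1000), and a single 2-D tensor with the eigenvalues of λ_a is simulated and re-estimated by least squares.

```python
import numpy as np

rng = np.random.default_rng(2)
b, S0, sigma_n = 1000.0, 1000.0, 100.0        # b-value assumed; baseline 1000
D = np.array([[1.9e-3, 0.0], [0.0, 0.4e-3]])  # tensor with eigenvalues lambda_a
angles = np.linspace(0, np.pi, 15, endpoint=False)
g = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # 15 directions

# Stejskal-Tanner: S_k = S0 exp(-b g_k^T D g_k), then Rician noise.
S = S0 * np.exp(-b * np.einsum('ki,ij,kj->k', g, D, g))
S_noisy = np.abs(S + rng.normal(0, sigma_n, S.shape)
                   + 1j * rng.normal(0, sigma_n, S.shape))

# Least squares fit: log(S/S0) = -b (gx^2 Dxx + 2 gx gy Dxy + gy^2 Dyy).
G = -b * np.stack([g[:, 0]**2, 2 * g[:, 0] * g[:, 1], g[:, 1]**2], axis=1)
dxx, dxy, dyy = np.linalg.lstsq(G, np.log(S_noisy / S0), rcond=None)[0]
print(np.linalg.eigvalsh(np.array([[dxx, dxy], [dxy, dyy]])))
# compare with the true eigenvalues [0.4e-3, 1.9e-3]
```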
Fig. 3. (a) Level of noise: estimated standard deviation of noise of the output image normalized by the standard deviation of noise in the original image (same legend as Fig. 2); the LMMSE scheme is the only one that shows a significant reduction in the level of noise. (b) Comparison of the ML, LMMSE and average estimators using the MSE/σ_n² of the image without background.
Fig. 4. Mean and standard deviation of the eigenvalue estimation error versus SNR for the noisy data and for the LMMSE, average and MLE fusion: (a) error mean, 3 directions; (b) error STD, 3 directions; (c) error mean, 15 directions; (d) error STD, 15 directions.
In Fig. 4 the mean and the standard deviation of the error are shown. From the results it can be seen that in all cases the bias and the variance of the estimation are smaller when using the LMMSE filter. Note that using more gradient directions positively affects the error variance of the original data and of the data fused with LMMSE, but it hardly affects the variance of the error when using the average. As in the previous experiment, results are just slightly better for the MLE, but again a numerical optimization is needed to obtain the solution.
Fig. 5. Mean (left) and standard deviation (right) of the eigenvalue estimation error for different combinations of number of gradient directions and number of samples (3 grad./10 samples, 5 grad./6 samples, 6 grad./5 samples, 10 grad./3 samples, 15 grad./2 samples, 30 grad./1 sample). The LMMSE estimator has been used for fusion.
Fig. 6. Fusion of MR images from an EPI volume. Original (left) and filtered (right).
In our experiments, using optimized MATLAB code, for 10 samples and 15 gradients the LMMSE is about 28 times faster than the MLE using the EM method [3]. In Fig. 5 the same measures are shown again for LMMSE, this time for a constant total number of 30 scans, distributed over multiple samples and multiple gradient directions. Although a larger number of gradients always improves the estimation, if only one sample is used the results show a bias when the SNR decreases. Finally, to show the performance of the proposed scheme on real data, we have chosen a SENSE EPI data set, scanned on a 3.0 Tesla GE system with 51 gradient directions and 8 baselines, with voxel dimensions 1.7 × 1.7 × 1.7 mm. We selected an axial slice and fused the 8 baselines using the LMMSE estimator. Results are shown in Fig. 6. Most of the noise in the original image has been removed after fusion. All the internal structures have been preserved, as well as the edges.
5 Conclusions
This paper has introduced a method to estimate the magnitude signal from several acquisitions of both MRI and DWI based on the LMMSE estimator. To that end, a Rician assumption has been made. The proposed filtering method outperformed the average and the median methods, especially for moderate and high noise levels. Although an ML approach may have a slightly better performance
in terms of error, the LMMSE provides a closed-form analytical solution, which is faster and more suitable when working with large amounts of data. Reducing the number of operations per pixel is a task of paramount importance when these large data sets must be dealt with. In addition, when working with DTI, the proposed method has also shown an important reduction in the bias of the first eigenvalue of the diffusion tensor, which makes some scalar measures like the Fractional Anisotropy more reliable.
Acknowledgements. The authors acknowledge the CICyT for research grants TIC2001-3808-C02-02 and TEC2004-06647-C03-01, the FIS for grants G03/135 and PIO-41483, the European Commission for the funds associated with the NoE SIMILAR (FP6-507609), the MEC/Fulbright Commission for research grant FU2005-0716, and the NIH for grants R01MH074794 and P41RR13218.
References
1. Sijbers, J., den Dekker, A.J.: Maximum likelihood estimation of signal amplitude and noise variance from MR data. Magn. Reson. Med. 51, 586–594 (2004)
2. Simon, M.K.: Probability Distributions Involving Gaussian Random Variables. Kluwer Academic Publishers, Boston, MA (2002)
3. DeVore, M.D., Lanterman, A.D., O'Sullivan, J.A.: ATR performance of a Rician model for SAR images. In: Proc. of SPIE 2000, vol. 4050, Orlando, pp. 34–37 (2000)
4. Fillard, P., Arsigny, V., Pennec, X., Ayache, N.: Clinical DT-MRI estimation, smoothing and fiber tracking with Log-Euclidean metrics. In: IEEE International Symposium on Biomedical Imaging, pp. 786–789. IEEE Computer Society Press, Los Alamitos (2006)
5. Basu, S., Fletcher, T., Whitaker, R.: Rician noise removal in diffusion tensor MRI. In: Proceedings of MICCAI, vol. 1, pp. 117–125 (2006)
6. Kay, S.M.: Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice Hall, New Jersey (1993)
7. Sijbers, J., den Dekker, A.J., Van Audekerke, J., Verhoye, M., Van Dyck, D.: Estimation of the noise in magnitude MR images. Magn. Reson. Imag. 16(1), 87–90 (1998)
8. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
9. Aja-Fernández, S., San-José-Estépar, R., Alberola-López, C., Westin, C.-F.: Image quality assessment based on local variance. In: Proc. of the 28th IEEE EMBC, New York, USA, pp. 4815–4818. IEEE Computer Society Press, Los Alamitos (2006)
10. Westin, C.-F., Maier, S.E., Mamata, H., Nabavi, A., Jolesz, F.A., Kikinis, R.: Processing and visualization for diffusion tensor MRI. Medical Image Analysis 6, 93–108 (2002)
11. Stejskal, E.O., Tanner, J.E.: Spin diffusion measurements: spin echoes in the presence of a time-dependent field gradient. Journal of Chemical Physics 42, 288–292 (1965)
Quantifying Heterogeneity in Dynamic Contrast-Enhanced MRI Parameter Maps

C.J. Rose¹, S. Mills¹, J.P.B. O'Connor¹, G.A. Buonaccorsi¹, C. Roberts¹, Y. Watson¹, B. Whitcher², G. Jayson³, A. Jackson¹, and G.J.M. Parker¹

¹ Imaging Science & Biomedical Engineering, The University of Manchester, United Kingdom
² MRI Modelling, GlaxoSmithKline, London, United Kingdom
³ Cancer Research UK Dept. of Medical Oncology, Christie Hospital, Manchester, United Kingdom
Abstract. Simple summary statistics of Dynamic Contrast-Enhanced MRI (DCE-MRI) parameter maps (e.g. the median) neglect the spatial arrangement of parameters, which appears to carry important diagnostic and prognostic information. This paper describes novel statistics that are sensitive to both parameter values and their spatial arrangement. Binary objects are created from 3-D DCE-MRI parameter maps by "extruding" each voxel into a fourth dimension; the extrusion distance is proportional to the voxel's value. The following statistics are then computed on these 4-D binary objects: surface area, volume, surface area to volume ratio, and box counting (fractal) dimension. An experiment using 4 low and 5 high grade gliomas showed significant differences between the two grades for box counting dimension computed for extruded v_e maps, surface area of extruded K^trans and v_e maps and the volume of extruded v_e maps (all p < 0.05). An experiment using 18 liver metastases imaged before and after treatment with a vascular endothelial growth factor (VEGF) inhibitor showed significant differences for surface area to volume ratio computed for extruded K^trans and v_e maps (p = 0.0013 and p = 0.045 respectively).
1 Introduction
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) has been used in both clinical and research evaluation of tumour biology and response to therapy [3]. Using DCE-MRI tracer-kinetic modelling [7], tumour voxels can be parameterised by K^trans (the volume transfer coefficient of contrast agent between the blood pool and the extravascular space), v_p (the relative volume of the blood plasma space) and v_e (the relative volume of the extravascular extracellular space). These quantities can be mapped to their spatial locations to yield parametric maps of tumour characteristics (see Figure 1). Most studies that incorporate DCE-MRI report either baseline values of, or changes in, simple summary statistics of the kinetic parameters (e.g. median K^trans). These approaches neglect spatial heterogeneity within the tumours, despite this being an important feature of most solid tumours that may relate to treatment choice and drug efficacy.
Fig. 1. K^trans maps. Top row: slices through a low grade (left) and a high grade (right) glioma (data range: 0–0.16 min⁻¹). Bottom row: corresponding slices through a liver metastasis before (left) and after (right) treatment with a VEGF inhibitor (data range: 0–0.45 min⁻¹). The right-most column shows the tumours rendered as 3-D surfaces with height representing the K^trans value at each location in the slices.
High values of K^trans indicate relatively high perfusion and vessel permeability. Figure 1 shows slices through K^trans maps of low and high grade gliomas and of a liver metastasis before and after treatment with a VEGF inhibitor. Compared to the low grade glioma, the high grade is larger, has a wider range of K^trans values and, importantly, has a much more complex internal structure. The liver metastasis shows clear non-uniform changes in K^trans values after treatment, particularly in the periphery. Statistics that capture heterogeneity will enable change in parameter values and/or their spatial arrangement to be quantified. Texture analysis underpins much of image analysis and is clearly of interest to researchers working on MRI [2]. However, an assumption made when utilising texture measures is that the signal being analysed is, to a large degree, a repeating pattern subject to some degree of randomness. Figure 1 demonstrates that tumour DCE-MRI parameter maps do not satisfy this assumption, and so traditional texture analysis methods may be unsuitable. An alternative strategy is to treat parameter maps as objects rather than textures and to seek statistics which describe the form of these objects [1]. Figure 2 shows two signals that have the same mean but differ in heterogeneity. Rendering the signals as surfaces (curves in this 1-D example) shows how measures of object complexity may distinguish them. This paper develops heterogeneity statistics based on such a representation and applies them in two experiments using real data.
Fig. 2. Two 1-D signals that have the same mean value. The image on the left shows the grey-level values of the signals as an image; the plot on the right shows the signals as curves. The top signal on the left (plotted as a solid line on the right) is more heterogeneous than the other.
2 Quantifying Heterogeneity
Signals defined on a d-dimensional lattice can be represented as surfaces in d + 1 dimensions, where surface height is proportional to the signal value. DCE-MRI parameter maps are defined on 3-D lattices. Therefore we create a 4-D object by "extruding" each enhancing voxel of the original 3-D parameter map into a fourth dimension, using a 4-D binary array to represent the extruded object. The distance each voxel is extruded is proportional to its DCE-MRI kinetic model parameter (analogous to the lower-dimensional examples shown in Figure 1 and Figure 2). The distributions of DCE-MRI tracer kinetic modelling parameters are typically heavily positively skewed. To make optimal use of the available dynamic range of the extrusion dimension, we map from model parameter to surface height using non-linear functions $f_{K^{trans}}$, $f_{v_p}$ and $f_{v_e}$. For example, $f_{v_p}(v_p) = \lceil r D_{v_p}(v_p) \rceil$, where r is the maximum number of elements in the extrusion dimension, $D_{v_p}$ is the cumulative distribution function for $v_p$, learned from a training set of approximately 20,000 DCE-MRI parameters, and $\lceil x \rceil$ is the ceiling function. Given a 4-D object we compute the following statistics: surface area, volume, ratio of surface area to volume and box-counting dimension [6]. The methods used to compute these statistics are explained in Section 2.1 and Section 2.2.
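A sketch of the extrusion step is given below; it is our illustration, not the authors' code. The empirical CDF stands in for D(·), and the function and argument names are hypothetical.

```python
import numpy as np

def extrude(param_map, mask, training_values, r=64):
    """Extrude a 3-D parameter map into a 4-D binary object.

    param_map: 3-D array of kinetic parameter values (e.g. K^trans).
    mask: boolean array marking the enhancing tumour voxels.
    training_values: 1-D sample used to learn the empirical CDF D(.).
    r: number of elements available along the extrusion dimension.
    """
    sorted_vals = np.sort(np.asarray(training_values))
    # Empirical CDF: fraction of training values below each map value.
    cdf = np.searchsorted(sorted_vals, param_map) / sorted_vals.size
    height = np.ceil(r * cdf).astype(int)        # f(v) = ceil(r * D(v))
    obj = np.zeros(param_map.shape + (r,), dtype=bool)
    for h in range(1, r + 1):                    # fill each column up to height
        obj[..., h - 1] = mask & (height >= h)
    return obj
```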
2.1 Computing Surface Area
While surface area is conceptually simple, a suitable algorithm for computing this quantity for a 4-D object may not be immediately obvious. The surface area of an object composed of N elements is maximised when all object elements are mutually unconnected. This maximum surface area is $s_M = N s_T$, where each element has a total surface area of $s_T$. In general, however, object elements are connected to one another and some potential area, $s_C$, is "lost" to these internal connections. The surface area of the object is therefore $s = s_M - s_C$ and we seek a way of computing $s_M - s_C$. Figure 3 shows a 2-D object which has isotropic pixels of unit size (i.e. each pixel has a total surface area of 4).
Fig. 3. A 2-D object is shown on the left (shaded region) and each object pixel is labelled with the surface area that is “lost” due to neighbouring pixels (this example uses isotropic pixels of unit size). A convolution kernel is shown on the right (see Eqn. 2).
Each pixel in Figure 3 is labelled with the surface area that it cannot contribute to the surface area of the object as a whole due to connections with neighbouring object pixels. The surface area of the object is therefore $s_M - s_C = (6 \times 4) - 12 = 12$ units. Let X be an n-dimensional array where a value of 1 indicates that the element belongs to the object and 0 indicates otherwise. Let F be an n-dimensional 3 × 3 × ⋯ × 3 convolution kernel with each connecting element set to the surface area that would be lost due to a connection in the corresponding direction, as shown on the right in Figure 3. The array K containing the surface areas involved in connections, as shown on the left in Figure 3, is computed as

$$K = X \times (X \ast F) \qquad (1)$$

where × denotes element-wise multiplication and ∗ denotes convolution (after padding the edges with zeros).¹ The total surface area of an element, $s_T$, is conceptually straightforward, but it is worth noting how this quantity can be computed in arbitrary dimensions. An n-cuboid has n pairs of faces of equal area. $s_T$ can be determined by computing the areas for each pair of faces. Each area is the product of the side lengths of the n-cuboid's sides, excluding one of these lengths (corresponding to a connection direction). Let u be a vector of the side lengths of an n-cuboid. The surface area of a face in direction i is therefore

$$s_i = \prod_{j=1,\, j \neq i}^{n} u_j \,. \qquad (2)$$

The total surface area of the n-cuboid is therefore

$$s_T = 2 \sum_{i=1}^{n} s_i \,. \qquad (3)$$

Given an array K computed for an object with N elements described by X, the surface area, s, of the object is

$$s = s_M - s_C = \sum_{i=1}^{N} \left( s_T - k_i \right) \qquad (4)$$

where i is simply an index into the object elements of K.

¹ A correction to Equation 1 is required if the object contains single elements which are not connected to the object but are nevertheless considered to be part of it, but further exposition is beyond the scope of this paper.
2.2
C.J. Rose et al.
Computing Additional Complexity Statistics
In addition to surface area, we compute the volume of the 4-D object, the ratio of surface area to volume and n the box-counting fractal dimension [6]. Volume is simply computed as v = N i=1 ui (i.e. the number of elements multiplied by the volume of a single element). The box-counting dimension, d0 , is a well-established statistic and is computed by investigating how the number of elements, C(), required to represent an object changes with their size, : d0 = − lim
→0
log C() . log
(5)
The array which represents the 4-D object is composed of the natural 3-D imaging grid and that of the imposed extrusion dimension. Box counting is performed by subsampling the 4-D array and proceeds from the finest scale to the coarsest.
3
Experiment I: Grading Gliomas
Gliomas are graded based upon their histological appearance—necessitating a highly invasive procedure—using criteria set out by the World Health Organisation (WHO) [4]. Grade has been shown to relate to DCE-MRI parameter summary statistics [5] but does not—in general—correlate with tumour volume and grade cannot currently be determined using conventional routine clinical imaging. Subjectively, however, grade appears to relate to DCE-MRI parameter map heterogeneity. High grades are characterised by both hypocellularity (reduced cell density due to necrosis) and hypercellularity (increased cell density due to malignancy), leading to heterogeneity in histology and, we assume, in imaging. Gliomas therefore provide a good candidate for the kind of problem described in Section 2. We hypothesised that the methods described above would be sensitive to this heterogeneity. 3.1
Patients and Imaging Protocol
Nine adult patients with gliomas were recruited and underwent T1 -weighted DCE-MRI on a 3T Philips Achieva MR scanner (Philips Medical Systems, Best, Netherlands)2 . The study was approved by the local Research Ethics Committee and all patients gave written informed consent. All MR imaging was performed before surgery (for tumour resection or debulking). All tumours were histologically confirmed to be gliomas and were graded according to criteria set out by the WHO (low grade (II) n = 4, high grade (III and IV) n = 5). While not generally true of the population, in this study the high grade tumours were approximately 1 cm in diameter larger than the low grade tumours. All data were subjected to a routine quality assurance (QA) procedure. 2
The authors would like to acknowledge Sha Zhao, who developed the 3T DCE-MRI protocol.
Quantifying Heterogeneity in DCE-MRI Parameter Maps
3.2
381
Method
The nine tumours were manually segmented by an expert radiologist and values of K trans , ve and vp were computed at each tumour voxel from the DCE-MR image sequences using the following tracer kinetic model [7]: t −K trans (t−t ) ve Cp (t )e dt . (6) Ct (t) = vp Cp (t) + K trans 0
The functions Ct and Cp give the concentration of tracer in the voxel as a whole and in the voxel blood plasma respectively at time t. Each resulting parameter map was converted into its 4-D representation as described in Section 2 and surface area, volume, surface area to volume ratio and box-counting dimension were computed. Differences between the statistics for the two groups (low and high grades) were investigated using Wilcoxon rank sum tests. 3.3
Results
There were significant differences between the groups for: box dimension computed for extruded ve parameter maps; surface area of extruded K trans and ve parameter maps; and the volume of extruded ve parameter maps (all p < 0.05 without correction for multiple comparisons); see Figure 4(a). The other differences were not significant. 3.4
Discussion
In this experiment, all statistics have been used as absolute estimates of heterogeneity (cf. Section 4.4). Surface area- and volume-based statistics will be biased by tumour size, and so may not in general distinguish glioma grade. Box-counting dimension should be invariant to tumour size, although in practice object size can limit the precision with which this statistic can be estimated. We investigated the effect that object size has on box-counting dimension. 1000 random 4-D binary arrays were generated, using a similar array size to those for the high grade (larger) gliomas. The box-counting dimension was then computed for each object. The objects were then subsampled down to approximately the same size as the low grade (smaller) gliomas, and box-counting dimension was computed for these. Pairwise differences in box-counting dimension were then computed. The differences in box-counting dimension were approximately a quarter of the size of those observed for the real glioma data, suggesting that tumour size cannot explain the differences between the two groups. Box-counting dimension on extruded ve maps may therefore allow gliomas to be graded using a non-invasive procedure.
4
Experiment II: Quantifying Change in Heterogeneity
The following experiment demonstrates how the statistics described in the paper could be used in the drug trial setting to investigate heterogenous changes within the tumour microvasculature following administration of a VEGF inhibitor.
382
4.1
C.J. Rose et al.
Patients and Imaging Protocol
Four patients with a total of 25 liver metastases (n1 = 10, n2 = 1, n3 = 7, n4 = 7) were recruited and underwent T1 -weighted DCE-MRI imaging on a Philips Intera 1.5T scanner. The study was approved by the local Research Ethics Committee and all patients gave written informed consent. Two pre-treatment baseline scans were performed in the week preceding the administration of the VEGF inhibitor (though the initial baseline was missing for one patient, n4 = 7). A post-treatment scan was then performed. All data were subjected to a routine QA procedure. Seven tumours were rejected due to poor data quality, leaving 18 tumours. Three of the remaining 54 scans were also rejected by QA due to substantial motion artefacts. 4.2
Method
The tumours were manually segmented by an expert radiographer. Maps of K trans , ve and vp were computed from the DCE-MR image sequences and converted to their 4-D representations as described in Section 3.2. Surface area, volume, surface area to volume ratio and box-counting dimension were then computed for each tumour at each time point. The tumours were assumed to be independent and non-parametric ANOVAs (Kruskal Wallis) were performed to investigate group differences. Post hoc Wilcoxon sign rank tests were performed as appropriate. 4.3
Results
There were significant differences in surface area to volume ratio for extruded K trans and ve maps (p = 0.0013 and p = 0.045 respectively). Post hoc testing revealed significant differences between first baseline and post-treatment scans
Fig. 4. (a) Boxplots showing the difference in distribution between box-counting dimension statistics computed for extruded ve parameter maps for low- and high-grade gliomas. (b) Boxplots showing the difference in distribution between surface area to volume ratio statistics computed for extruded K trans parameter maps before and after treatment with a VEGF inhibitor. Note that baseline 1 lacks data for one patient (7 data points); removing the corresponding data points from baseline 2 confirms the missing data to be the cause of the tighter distribution at baseline 1.
Quantifying Heterogeneity in DCE-MRI Parameter Maps
383
(p = 0.004) and second baseline and post-treatment (p < 0.001) scans (without correction for multiple comparisons). As expected, there were no significant differences in heterogeneity statistics between the baseline scans. See Figure 4(b). The other differences were not significant. 4.4
Discussion
All statistics in this experiment have been used as relative estimates of heterogeneity (i.e. we are interested in a change in a statistic for a particular tumour; cf. Section 3.4). While the effects of the VEGF inhibitor can be observed using simple summary statistics, the results clearly illustrate that heterogeneity statistics are sensitive to known treatment effects. Heterogeneity statistics may therefore be useful in the evaluation of pharmaceutical therapies.
5
Conclusions
This paper has described DCE-MRI-based statistics of heterogeneity. Significant differences were found in some of these between low and high grade gliomas, suggesting that it might be possible to grade gliomas using a non-invasive imagingbased technique (e.g. as an adjunct to histology). Heterogeneity statistics were also sensitive to the effects of a VEGF inhibitor. However, further work is required to determine their ability to provide information that is unavailable from “standard” DCE-MRI summary statistics. There was no combination of model parameter and heterogeneity statistic common to both experiments that provided significant discrimination. However, it is interesting that heterogeneity statistics based upon ve were able to provide significant discrimination in both experiments—ve is often considered to be of little physiological importance compared to K trans and vp . We hypothesise that the observed differences in heterogeneity are due to differences in cellular packing density. Further work is required to elucidate the generality of this observation.
References 1. Dzik-Jurasz, A., Walker-Samuel, S., Leach, M.O., Brown, G., Padhani, A., George, M., Collins, D.J.: Fractal parameters derived from analysis of DCE-MRI data correlates with response to therapy in rectal carcinoma. In: Proc. Intl. Soc. for Mag. Reson. Med., p. 2505 (2004) 2. Hajek, M., Dezortova, M., Materka, A., Lerski, R. (eds.): Texture Analysis for Magnetic Resonance Imaging. Med4publishing (2006) ISBN: 80-903660-0-7 3. Jackson, A., Buckley, D.L., Parker, G.J.M.: Dynamic Contrast-Enhanced Magnetic Resonance Imaging in Oncology. In: Medical Radiology: Diagnostic Imaging, Springer, Heidelberg (2005)
384
C.J. Rose et al.
4. Kleihues, P., Louis, D.N., Scheithauer, B.W., et al.: The WHO Classification of Tumors of the Nervous System. J. Neuropathol. Exp. Neurol. 61(3), 215–225 (2002) Discussion 226–9 5. Patankar, T.F., Haroon, H.A., Mills, S.J., et al.: Is Volume Transfer Coefficient (K trans ) Related to Histological Grade in Human Gliomas? Am. J. Neuroradiol. 26(10), 2455–2465 (2005) 6. Peitgen, H.-O., J¨ urgens, H., Saupe, D.: Chaos and Fractals. Springer, Heidelberg (2004) 7. Tofts, P.S.: Modeling Tracer Kinetics in Dynamic Gd-DTPA MR Imaging. J. Magn. Reson. Imaging 7(1), 91–101 (1997)
Improving Temporal Fidelity in k-t BLAST MRI Reconstruction Andreas Sigfridsson1,2,3, Mats Andersson2,3 , Lars Wigstr¨ om1,3 , 1,3 2,3 John-Peder Escobar Kvitting , and Hans Knutsson 1
Division of Clinical Physiology, Department of Medicine and Care 2 Department of Biomedical Engineering, 3 Center for Medical Image Science and Visualization (CMIV), Link¨ oping University, Sweden
Abstract. Studies of myocardial motion using magnetic resonance imaging usually require multiple breath holds and several methods have been proposed in order to reduce the scan time. Rapid imaging using k-t BLAST has gained much attention with its high reduction factors and image quality. Temporal smoothing, however, may reduce the accuracy when assessing cardiac function. In the present work, a modified reconstruction filter is proposed, that preserves more of the high temporal frequencies. Artificial decimation of a fully sampled data set was used to evaluate the reconstruction filter. Compared to the conventional k-t BLAST reconstruction, the modified filter produced images with sharper temporal delineation of the myocardial walls. Quantitative analysis by means of regional velocity estimation showed that the modified reconstruction filter produced more accurate velocity estimations.
1
Introduction
In the setting of myocardial ischemia, the assessment of regional wall motion is of great importance. Since the introduction of delayed enhancement magnetic resonance imaging (MRI) of late uptake of gadolinium in scarred tissue [1], this has become the method of choice to quantify viable myocardium. In combination with this, MRI has become an important method to assess myocardial function. Its ability of arbitrary three-dimensional coverage is appealing, compared to echocardiography which is restricted to certain acoustic windows. A limitation of MRI compared to echocardiography is the longer imaging time. Imaging is usually carried out during several cardiac cycles in a gated fashion. In order to reduce artifacts caused by respiratory motion, this is usually done during a breath hold. Multiple slice coverage of the entire heart will not fit into a single breath hold without sacrificing spatial and/or temporal resolution. For patients, using multiple breath holds can be a burden, and the risk that the separate breath holds are not consistent is impending. Several methods for scan
We acknowledge the financial support from the Swedish Research Council and the Swedish Heart-Lung Foundation.
N. Ayache, S. Ourselin, A. Maeder (Eds.): MICCAI 2007, Part II, LNCS 4792, pp. 385–392, 2007. c Springer-Verlag Berlin Heidelberg 2007
386
A. Sigfridsson et al.
time reductions have been applied to limit this problem or to increase resolution or coverage. These methods include parallel signal reception from multiple coils with different signal encoding sensitivities [2,3], efficient k-space trajectories [4], variable sampling density with subsequent interpolation [5,6,7] and alias suppressing reconstruction from lattice subsampled data [8,9]. Of the aforementioned methods, the k-t BLAST (Broad-use Linear Acquisition Speed-up Technique) approach [9] has shown impressive reductions of scan time by a factor 5 or 8 with little perceived loss of image quality. k-t BLAST works by subsampling the k-t space (spatial frequency and time) sparsely on a lattice grid. By using a sheared lattice, the resulting signal aliasing will be shifted in the reciprocal x-f space, i.e. low temporal frequencies of one spatial position will be aliased to higher temporal frequencies at a different spatial position. By also acquiring so-called training data, an estimate of the distribution of the signal and aliased signal in x-f space is obtained. This estimate can be used to separate the true signal from the aliased copies. The rationale is that large parts of the imaging field of view will only contain low temporal frequencies, and the signals will not interfere. The separation is accomplished by a Wiener filter approach in x-f space, using a filter R: M2 (1) 2 M 2 + Malias + Ψ2 2 Malias is the estimated aliased where M 2 is the signal distribution estimate, energy and Ψ 2 is the measurement noise variance. For wide-sense stationary sources, the Wiener filter is the optimal linear reconstructor in the least squares sense [10]. However, the least squares error norm is not necessarily the best norm for motion analysis. The result after k-t BLAST reconstruction is visually appealing, even with high subsampling, in single time frames. When considering temporal variations, however, it becomes apparent that temporal fidelity is suffering from the regularization. When the aliased signal or noise dominates over the true signal, the reconstruction filter R will attenuate the output in order to suppress the aliasing signal and noise. The attenuation is more pronounced for signal with high temporal frequency, because it is more easily dominated by noise or aliased signal with low temporal frequency. The attenuation of high temporal frequencies translates into temporal smoothing and loss of rapid motion. The aim of this work was to investigate if the conventional k-t BLAST reconstruction filter could be improved to preserve more of the high temporal frequency content by reducing the amount of regularization of noise and aliased signal. R=
2
Method
To evaluate the conventional k-t BLAST reconstruction and an alternative reconstruction, a fully sampled reference data set was artificially decimated and reconstructed using different reconstruction filters.
Improving Temporal Fidelity in k-t BLAST MRI Reconstruction
387
Fig. 1. A Two-chamber delayed enhancement image (left) and an end diastolic time frame from the dynamic image sequence (right). Note the scarred tissue in the anteroapical region of the left ventricle, indicated by the arrow. The box indicates the region shown in subsequent images.
Data were used from a clinical follow-up of a patient who recently had suffered an ST-elevation myocardial infarction treated with percutaneous coronary intervention. Image acquisition was done on a Philips Achieva 1.5T MRI scanner (Philips Medical Systems, Best, The Netherlands). Using delayed enhancement imaging the extension of the infarction in the anterior-apical region was demonstrated, as shown in a two-chamber view in Fig. 1. A time resolved slice in the same two-chamber orientation was acquired using a retrospectively gated balanced steady-state free precession pulse sequence with the following parameters; slice thickness 8 mm, field of view 320 mm, repetition time 3.2 ms, echo time 1.6 ms, flip angle 60◦ , k-space segmentation factor 11, acquisition matrix 192 × 187 and reconstruction matrix 256 × 256. The SENSE-Cardiac coil was used for signal reception, but SENSE was not utilized for image acceleration. An end diastolic time frame from this image sequence is also shown in Fig. 1. The time resolved slice was reconstructed by the scanner into 30 time frames. Since the actual temporal resolution was 35 ms and the heart rate was 71 beats per minute, the data were subsequently temporally interpolated into 24 time frames using linear interpolation. This also allowed a k-t BLAST reduction factor of 8, since the number of time frames must be divisible by the reduction factor. These 24 time frames were considered to be the reference image data. The reference data were Fourier transformed along the spatial dimensions to obtain data in k-t space. The central 16 k-space lines were kept in all time frames as training data to be used for the signal estimate M 2 . The reference data were then artificially decimated using the lattice shown in Fig. 2. The lattice was obtained by maximizing the shortest distance between signal aliases in x-f space [11]. After decimation, the data were Fourier transformed in both
388
A. Sigfridsson et al.
t k Fig. 2. The lattice in k-t space that was used for artificial decimation. Open circles represent the data points that were discarded and filled circles represent the data points that were retained. The 8 × 8 tile was repeated to cover the full k-t space of 256 × 24. The central 16 lines in k-space were fully sampled and used for estimation of M 2 .
spatial and temporal dimensions, yielding data in x-f space. The reconstruction filter (described below) was applied, and after Fourier transformation in the temporal dimension, resulting images were obtained. The terms in the conventional k-t BLAST reconstruction filter as described in Eq. 1 were obtained as follows. The central 16 k-space lines acquired for the training data were windowed using a Hamming window and zero-filled in the phase-encoding direction to a size of 256 × 256 × 24 and then Fourier transformed into x-f space. The squared magnitude of the result is used as M 2 in the reconstruction filter. In order to preserve high temporal frequencies, temporal frequency windowing as described in the original k-t BLAST paper [9] was not performed in this work. The variance of a non-moving area close to the heart was used as the noise variance estimate Ψ 2 . For the alternative reconstruction approach, a modified version of the conventional k-t BLAST filter was considered: Ralt =
M2 2 M 2 + α( Malias )γ + βΨ 2
(2)
In this work, β was empirically set to 0.1 to reduce the noise regularization. The exponent γ wasset to 2, with α used as a normalization factor to keep the 2 maximum value of Malias constant. This was used to reduce suppression of weak aliased signal while reverting to full suppression where the aliased signal is very strong. To study the temporal fidelity of the data, the temporal evolution of a line through the ventricle was displayed. Analogous to M-mode ultrasound, the temporal dimension was combined with one spatial dimension to create a twodimensional image. In order to evaluate the different reconstruction filters quantitatively, the velocity of the wall was estimated in five regions in the left ventricle. The regions are shown in Fig. 3. The velocity estimation was based on quadrature phase optical flow [12,13] and performed as follows. All images were filtered using three
Improving Temporal Fidelity in k-t BLAST MRI Reconstruction
389
1 2 3
5
4 a
b
c
d
e
Fig. 3. Region placement (a), M-mode line position (b) and M-mode projections over time for the reference data set (c), conventional k-t BLAST data set (d) and alternative k-t BLAST data set (e). Note the induced temporal smoothing in both k-t BLAST reconstructions, and the better preservation of high temporal frequency content in the alternative reconstruction.
quadrature lognorm filters in the directions 0◦ , 60◦ and 120◦, with cos2 shaped angle envelop and a lognorm radial response with center frequency of π/6 and 3 octaves relative bandwidth. Phase differences (ϕ) and corresponding certainties (Q) in the x, y and t directions were formed from conjugate products according to Qx eiϕx = fyt ∗ q(x) q (x + Δx ) Qy eiϕy = fxt ∗ q(x) q (x + Δy ) (3) Qt eiϕt = fxy ∗ q(x) q (x + Δt ) with denoting complex conjugate, Δx , Δy and Δt are one pixel in the x, y, and t directions, respectively, and fyt , fxt and fxy are convolution kernels to center the phase differences around a common pixel. The speed of the local region was estimated as a weighted sum of the phase difference ratio weighted by the local certainty of the speed estimate |ϕt | Qt Q2x + Q2y (4) s= ϕ2x + ϕ2y Ω The sum was performed over all pixels in the region and all filter directions (Ω). Using similar energy weighted sums, a vector v in the direction of motion was obtained as ⎛√ ⎞ Qx Qt ϕϕxt Ω
v = −⎝ (5) ϕ ⎠ Qy Qt ϕyt Ω
The certainty weighting coefficients were normalized in all sums in Eqs. 4 and 5. v The speed and direction were combined to a velocity vector as s |v| , that was projected onto the direction of the edge. The edge orientation was obtained from
390
A. Sigfridsson et al.
Fig. 4. An end diastolic time frame reconstructed using conventional k-t BLAST (left) and with the alternative reconstruction filter (right). The alternative reconstruction suffers from slightly increased artifacts, as indicated by arrows, and increased noise.
a local structure tensor description of the neighborhood based on the quadrature filter responses [13]. The sign of the velocity estimate (inwards or outwards motion) was determined from the phase of the local structure. An inward motion from the dark myocardium to the bright ventricle is considered as positive.
3
Results
Images in end diastole reconstructed using the two reconstruction filters are shown in Fig. 4. The alternative reconstruction results in slight artifacts from signal aliasing and increased noise. The normalized root-mean-square error decreased from 9.8% to 7.9% by using the alternative k-t BLAST reconstruction. The temporal evolution of a single line through the ventricle is shown in Fig. 3. Both k-t BLAST methods introduce temporal smoothing, but qualitatively sharper temporal evolution is observed in the alternative k-t BLAST data set, along with slightly increased noise. The estimated velocities across the edge from the different regions in the three image series are shown in Fig. 5. Positive values indicate inward motion. The velocity traces in the reference data set from the normal regions (1, 4 and 5) show a clear systolic peak, and two diastolic peaks related to early filling and atrial contraction. The regions placed in the scar tissue show paradoxical bulging in early diastole. The k-t BLAST reconstructions result in lower velocity estimates which makes these events less apparent. The conventional reconstruction filter produces lower velocity estimates than the alternative reconstruction filter. A paired Wilcoxon signed rank test showed that the magnitude error in edge velocity, with the fully sampled data set used as reference, was larger in the conventional k-t BLAST data set compared to the alternative k-t BLAST data set (p < 0.01). The root-mean-square error of the velocity estimates was reduced from 1.4 cm/s to 1.0 cm/s using the alternative approach.
Improving Temporal Fidelity in k-t BLAST MRI Reconstruction
Fig. 5. Edge velocities [cm/s] in the different regions over time in the reference images (blue), the conventional k-t BLAST reconstructed images (green) and the alternative k-t BLAST reconstructed images (red). Regions 2 and 3 are located in the infarcted area. The direction of positive velocities is inwards.
4 Discussion
An alternative reconstruction filter for k-t BLAST subsampled data has been proposed and compared to a conventional filter. Velocities estimated in five regions in an infarcted left ventricle were less underestimated using the alternative filter. The alternative reconstruction filter preserves more of the high-frequency content, at the expense of more artifacts from aliased signal and increased noise. Since the velocity analysis used in this work is based on regional image information, it is not highly sensitive to slightly increased aliasing artifacts or noise, which do not manifest coherently in adjacent time frames. Although regional analysis methods are in principle robust to the types of artifacts introduced, optimal filter parameters have to be determined for specific analysis strategies. The suggested reconstruction filter was designed with a few simple parameters, and optimizing these for other analysis tools and acquisition parameters, such as reduction factor and temporal resolution, might yield even better results. In the present setting, the reduced noise regularization seemed to have the largest impact on the temporal fidelity. Other types of reconstruction filters may also be considered. The choice of reconstruction filter does not affect the MRI data acquisition, allowing multiple reconstructions targeted for different analysis tools to be performed from the same raw data. Optimizing the filter kernel with respect to temporal support [14] was attempted, but did not produce significantly improved results in this setting. In
cases with lower reduction factors, however, applying such a filter optimization might have a larger effect, since the relaxed settings allow shorter filters in the temporal domain. In conclusion, the conventional k-t BLAST filter suppresses the high-frequency content more than necessary for certain applications, and better temporal fidelity can be achieved by merely changing the reconstruction filter. Velocity estimation has been shown to be significantly improved using a modified reconstruction filter.
References
1. Kim, R., Fieno, D., Parrish, T., Harris, K., Chen, E., Simonetti, O., Bundy, J., Finn, J., Klocke, F., Judd, R.: Relationship of MRI delayed contrast enhancement to irreversible injury, infarct age, and contractile function. Circulation 100(19), 1992–2002 (1999)
2. Pruessmann, K.P., Weiger, M., Scheidegger, M.B., Boesiger, P.: SENSE: sensitivity encoding for fast MRI. Magn. Reson. Med. 42, 952–962 (1999)
3. Griswold, M.A., Jakob, P.M., Heidemann, R.M., Nittka, M., Jellus, V., Wang, J., Kiefer, B., Haase, A.: Generalized autocalibrating partially parallel acquisitions (GRAPPA). Magn. Reson. Med. 47, 1202–1210 (2002)
4. Nayak, K.S., Pauly, J.M., Yang, P.C., Hu, B.S., Meyer, C.H., Nishimura, D.G.: Real-time interactive coronary MRA. Magn. Reson. Med. 46, 430–435 (2001)
5. van Vaals, J.J., Brummer, M.E., Dixon, W.T., Tuithof, H.H., Engels, H., Nelson, R.C., Gerety, B.M., Chezmar, J.L., den Boer, J.A.: "Keyhole" method for accelerating imaging of contrast agent uptake. J. Magn. Reson. Imaging 3(4), 671–675 (1993)
6. Doyle, M., Walsh, E., Blackwell, G., Pohost, G.: Block regional interpolation scheme for k-space (BRISK): a rapid cardiac imaging technique. Magn. Reson. Med. 33, 163–170 (1995)
7. Korosec, F.R., Frayne, R., Grist, T.M., Mistretta, C.A.: Time-resolved contrast-enhanced 3D MR angiography. Magn. Reson. Med. 36, 345–351 (1996)
8. Madore, B., Glover, G.H., Pelc, N.J.: Unaliasing by Fourier-encoding the overlaps using the temporal dimension (UNFOLD), applied to cardiac imaging and fMRI. Magn. Reson. Med. 42, 813–828 (1999)
9. Tsao, J., Boesiger, P., Pruessmann, K.P.: k-t BLAST and k-t SENSE: Dynamic MRI with high frame rate exploiting spatiotemporal correlations. Magn. Reson. Med. 50, 1031–1042 (2003)
10. Mallat, S.: A Wavelet Tour of Signal Processing. Academic Press, San Diego (1999)
11. Tsao, J., Kozerke, S., Boesiger, P., Pruessmann, K.P.: Optimizing spatiotemporal sampling for k-t BLAST and k-t SENSE: Application to high-resolution real-time cardiac steady-state free precession. Magn. Reson. Med. 53, 1372–1382 (2005)
12. Andersson, K., Johansson, P., Forchheimer, R., Knutsson, H.: Backward-forward motion compensated prediction. In: Advanced Concepts for Intelligent Vision Systems (ACIVS 2002), Ghent, Belgium, pp. 260–267 (2002)
13. Granlund, G.H., Knutsson, H.: Signal Processing for Computer Vision. Kluwer Academic Publishers, Dordrecht (1995)
14. Knutsson, H., Andersson, M., Wiklund, J.: Advanced filter design. In: Proceedings of the 11th Scandinavian Conference on Image Analysis (SCIA), Greenland (1999)
Segmentation and Classification of Breast Tumor Using Dynamic Contrast-Enhanced MR Images
Yuanjie Zheng, Sajjad Baloch, Sarah Englander, Mitchell D. Schnall, and Dinggang Shen
Department of Radiology, University of Pennsylvania, Philadelphia, PA 19104, USA
Abstract. Accuracy of automatic cancer diagnosis is largely determined by two factors, namely, the precision of tumor segmentation and the suitability of extracted features for discrimination between malignancy and benignancy. In this paper, we propose a new framework for accurate characterization of tumors in contrast-enhanced MR images. First, a new graph-cut based segmentation algorithm is developed for refining coarse manual segmentation, which allows precise identification of tumor regions. Second, by considering serial contrast-enhanced images as a single spatio-temporal image, a spatio-temporal model of the segmented tumor is constructed to extract Spatio-Temporal Enhancement Patterns (STEPs). STEPs are designed to capture not only dynamic enhancement and architectural features, but also spatial variations of the pixel-wise temporal enhancement of the tumor. While temporal enhancement features are extracted through the Fourier transform, the resulting STEP framework captures spatial patterns of temporal enhancement features via moment invariants and rotation-invariant Gabor textures. The high accuracy of the proposed framework is a direct consequence of this two-pronged approach, which is validated through experiments yielding, for instance, an area of 0.97 under the ROC curve.
1 Introduction
Dynamic contrast-enhanced MR imaging (DCE-MRI) is emerging as an important complementary diagnostic tool for early detection of breast cancer [1]. It involves characterizing the temporal response of a tumor to a contrast agent prior to analyzing discriminating features between various tumor types. The high sensitivity of DCE-MRI for breast cancer detection is, however, confounded by its relatively low specificity [2]. Existing approaches attempt to improve the low specificity through better segmentation [3] and/or complete characterization of the tumor (using architectural and dynamic features [4,1]). Expert manual segmentation, regarded as the gold standard for tumor segmentation, is usually tedious and time-consuming. In addition, it suffers from imprecision, which results in high inter- and intra-rater variability. Numerous segmentation methods [3,5,6] have recently been proposed to address these limitations. These algorithms are driven by a simple assumption that considers enhancements within a tumor to be uniform, limiting only one class per
tumor. Here, we propose a graph-cut [7] based segmentation algorithm that accounts for spatial variations of enhancements and allows the association of multiple classes with the tumor and background for a more accurate tumor extraction. Tumor classification employs the segmented tumor to extract appropriate features, which are eventually used for cancer diagnosis. Two kinds of features, namely dynamic and architectural features [4,1,2,3], have been extensively investigated for quantifying tumor properties. For instance, an early strong enhancement with rapid washout has been observed in malignant tumors. On the other hand, a slow increase in the temporal enhancement followed by a tapering off is typically exhibited by benign tumors. The majority of existing dynamic features exploit these simple enhancement differences within a region of interest taken inside a tumor. Architectural features, on the other hand, are driven by morphological differences, with a spiculated border and irregular shape attributed to malignancy versus a smooth border and regular shape related to benignancy. The information in such dynamic and architectural features is, however, limited in scope in the sense that their combination does not capture the comprehensive spatial variations of pixel-wise temporal enhancements (TE). These spatial patterns have been shown to be fundamentally important for distinguishing malignant and benign tumors [4,1]. Although the spatial information incorporated in [4,1] offers some improvement, the dependence on qualitative ratings by experts limits its utility for automatic diagnosis. In this paper, we propose a framework for cancer diagnosis that combines temporal, spatial, and morphological attributes of an automatically segmented tumor in a Spatio-Temporal Enhancement Pattern (STEP). STEP views a series of contrast-enhanced images as a single spatio-temporal image and consequently models its temporal variations through Fourier transform coefficients. Spatial and morphological features are then accounted for by moment invariants [8] and Gabor texture features [9]. Our segmentation refinement algorithm coupled with STEP features provides a robust framework for cancer diagnosis, which is validated with tumor classification using a linear classifier. Instead of using the entire set of STEP features in the classifier, we employ a simple ranking-based feature selection method that helps in finding the most discriminating features between malignancy and benignancy.
2 Methods
This section describes our breast cancer diagnosis framework, which consists of tumor segmentation, STEP feature extraction, and tumor classification.

2.1 Segmentation
Discrimination between benign and malignant breast tumors may be greatly enhanced by accurate segmentation [3] that precisely identifies the spatial domain of a tumor. To this end, we develop a graph-cut based algorithm [7] for tumor segmentation, which improves on a coarse manual segmentation, thereby eliminating the
need of an expert rater. It involves assigning the same label to pixels with similar TE vectors by minimizing an energy functional. The TE vector Ci = [C(i,1) · · · C(i,T−1)] of a pixel i is defined as:

$$C(i,t) = \frac{I(i,t) - I(i,0)}{I(i,0)}, \qquad t = 1, \cdots, T-1 \qquad (1)$$
where I(i,t) denotes the intensity of the i-th pixel at scanning time t. The energy function, which consists of four terms, is defined as

$$E = \sum_{i \in \Omega} E_1(l_i) + \lambda_1 \sum_{\langle i,j\rangle \in \mathcal{N}} E_2(l_i, l_j) + \lambda_2 \sum_{\langle i,j\rangle \in \mathcal{N}_d} E_3(l_i, l_j) + \lambda_3 \sum_{\langle i,j\rangle \in \mathcal{N}_{tb}} E_4(l_i, l_j) \qquad (2)$$
where li is the label of pixel i. The factors λ1, λ2, λ3 adjust the relative importance of the four terms and are empirically set to 1 in this paper. E1 ensures the statistical similarity of pixel-wise TEs within each class and is defined as E1(li) = 1 − Pr(Ci | μli, σli) for pixels in the image Ω, where Pr measures the probability of Ci belonging to class li, and class li is represented by a Gaussian model with mean μli and variance σli. E2 penalizes different label assignments to neighboring pixels, which in fact introduces a Markovian property; it is defined for neighboring pixel pairs as E2(li, lj) = 1 − δ(li − lj), where δ is the Kronecker delta function. E3 introduces fidelity in the segmentation by forcing the boundary towards regions of large enhancement gradients; it is defined for all neighboring pixel pairs having different labels as E3(li, lj) = g1(‖Ci − Cj‖), where ‖·‖ denotes the L2 norm and g1(ζ) = 1/(ζ + 1). E4 attempts to find the tumor boundary in the vicinity of the manually placed contour; it is defined for pixel pairs that are neighboring and belong differently to tumor and background as E4(li, lj) = g2(β · Di,j), where Di,j is the distance from the center point between pixels i and j to the manually delineated boundary, β is a control parameter for Di,j, and g2(ζ) = ζ/(ζ + 1).

In order to initialize the graph-cut based segmentation algorithm, the tumor is first specified and roughly segmented by a manual rater. Then, a rectangular region, larger than the bounding box of the segmented tumor, is specified as the domain for segmentation. After that, tumor and background are each classified into 3 classes using k-means clustering. The energy functional described above is then minimized by the expansion move algorithm [7], with the output of the k-means algorithm as initialization. After convergence, the union of all tumor classes found by the algorithm is taken as the refined tumor region. Fig. 1 compares the segmentation refinement with expert manual segmentations (ground truth). It can be observed that the refined segmentations yield results very close to the expert manual segmentations.
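The TE vectors of Eq. (1), the k-means initialization, and the data term E1 can be illustrated with a short sketch. This is a simplified rendering under stated assumptions (diagonal Gaussian class models, scikit-learn's k-means); the expansion-move minimization itself is the algorithm of [7] and is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans

def te_vectors(series):
    """Eq. (1): series has shape (T, H, W); returns (H, W, T-1) TE vectors."""
    base = series[0].astype(float) + 1e-6   # avoid division by zero
    return np.stack([(series[t] - base) / base
                     for t in range(1, series.shape[0])], axis=-1)

def init_labels(te, tumor_mask, n_classes=3):
    """k-means initialization: 3 tumor and 3 background classes, as in the paper."""
    labels = np.zeros(tumor_mask.shape, dtype=int)
    for offset, mask in ((0, tumor_mask), (n_classes, ~tumor_mask)):
        km = KMeans(n_clusters=n_classes, n_init=10).fit(te[mask])
        labels[mask] = km.labels_ + offset
    return labels

def unary_cost(te, labels, n_labels=6):
    """E1(l_i) = 1 - Pr(C_i | mu_l, sigma_l), one diagonal Gaussian per class."""
    h, w, d = te.shape
    flat = te.reshape(-1, d)
    cost = np.empty((h, w, n_labels))
    for l in range(n_labels):
        cls = flat[labels.ravel() == l]
        mu, var = cls.mean(axis=0), cls.var(axis=0) + 1e-6
        pr = np.exp(-0.5 * (((flat - mu) ** 2) / var).sum(axis=1))
        cost[..., l] = 1.0 - pr.reshape(h, w)
    return cost
```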
2.2 Extraction of STEP Features
To extract STEP features, segmented tumor samples (from Section 2.1) are first spatially normalized and then temporally modeled. Finally, both spatial and temporal properties of TE are combined to construct the STEP features.
Fig. 1. Rough manual segmentations by a manual rater, corresponding refined results, and manual segmentations by an expert
Fig. 2. Tumor samples before and after normalization, and corresponding TE map of the 1st DFT coefficient. For each panel, top row is malignant, while bottom row is benign.
Tumor Normalization: In order to extract spatial and temporal properties, tumors are first rigidly registered using an approach similar to Procrustes analysis, which aligns the principal directions corresponding to the distribution of pixels of each tumor sample and scales tumor sizes so that the largest principal modes are equal. As a result, all tumor samples are normalized to have similar predefined principal directions, in addition to the same "largest eigenvalues", as shown in Fig. 2.

Temporal Enhancement (TE) Modeling: While several signal decompositions, such as the Fourier transform, wavelet transforms, and wavelet packets, may be exploited to model the temporal response of breast tissue, we adopt the Fourier transform to model pixel-wise temporal enhancement, due to its simplicity. The Discrete Fourier Transform (DFT) of the TE (given by Eq. (1)) yields T − 1 coefficients for each pixel. Combining all pixels in the tumor results in Nt = T − 1 DFT coefficient maps, which provide a frequency-domain representation of the spatio-temporal enhancement image. An example of a DFT-based TE map is given in Fig. 2.
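A minimal sketch of the TE modeling step: a pixel-wise DFT of the TE vectors from Eq. (1). Whether the complex coefficients or their magnitudes are retained is not specified above; magnitudes are used here as a working assumption, which also suits the rotation-invariant features of the next step.

```python
import numpy as np

def dft_te_maps(te):
    """te: (H, W, T-1) TE vectors; returns Nt = T-1 coefficient maps."""
    coeffs = np.fft.fft(te, axis=-1)   # T-1 complex DFT coefficients per pixel
    return np.abs(coeffs)              # (H, W, Nt) TE maps
```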
Spatial Description of TE Maps: The DFT-based TE maps in Fig. 2 indicate that both global morphological attributes of a tumor and local variations in the TE map can form distinguishing characteristics of malignant and benign tumors. Accordingly, we employ two types of rotation-invariant features, namely, moment invariants [8] for global properties and Gabor textures [9] for local attributes. Consequently, we compute Hu's Hm = 7 moment invariants for each of the Nt TE maps, in addition to Hg = Z × K rotation-invariant Gabor texture features for K orientations within a period of π and Z radial frequencies. We set K = 4 and Z = 8 in our experiments. The resulting Nt × (Hm + Hg) STEP features for the Nt TE maps provide a rich characterization of a tumor, incorporating spatial, temporal, and morphological attributes.
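The per-map features can be sketched as follows, assuming scikit-image for the Hu moments and Gabor filtering. The exact radial frequencies and the way the Gabor responses are made rotation invariant are not given above; averaging the magnitude response per frequency-orientation pair is used here as one plausible choice.

```python
import numpy as np
from skimage.measure import moments_central, moments_normalized, moments_hu
from skimage.filters import gabor

def step_features(te_map, K=4, Z=8):
    """7 Hu moment invariants plus Z*K Gabor texture features for one TE map."""
    hu = moments_hu(moments_normalized(moments_central(te_map)))
    gab = []
    for f in np.geomspace(0.05, 0.4, Z):     # Z radial frequencies (assumed values)
        for k in range(K):                    # K orientations within a period of pi
            real, imag = gabor(te_map, frequency=f, theta=k * np.pi / K)
            gab.append(np.hypot(real, imag).mean())
    return np.concatenate([hu, np.asarray(gab)])   # Hm + Hg = 7 + Z*K features
```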
2.3 Tumor Classification
We validate tumor segmentation and the resulting STEP features by classifying tumors into benign and malignant. It should be noted that STEP features are derived for complete characterization of a tumor, and some of them may not be discriminative between the two classes. It is, therefore, important to select a smaller set of the most distinctive STEP features before tumor classification. In this paper, we employ a Fisher Linear Discriminant based Linear Discriminant Analysis (LDA) [10] classifier, along with a simple t-score ranking-based feature selection method. Although one may obtain better performance by using advanced feature selection [11] and nonlinear classification [12], the results provided serve as a baseline. Due to the limited sample size, a leave-one-out cross-validation framework is employed.
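The baseline classification procedure maps directly onto standard tooling. The sketch below (scikit-learn and SciPy assumed) keeps the t-score ranking inside the leave-one-out loop so that feature selection never sees the held-out case; the number of retained features is an illustrative parameter, not taken from the paper.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut

def loo_accuracy(X, y, n_keep=4):
    """Leave-one-out LDA with t-score ranking-based feature selection."""
    hits = 0
    for train, test in LeaveOneOut().split(X):
        t, _ = ttest_ind(X[train][y[train] == 1], X[train][y[train] == 0])
        top = np.argsort(-np.abs(t))[:n_keep]          # most discriminative features
        clf = LinearDiscriminantAnalysis().fit(X[train][:, top], y[train])
        hits += int(clf.predict(X[test][:, top])[0] == y[test][0])
    return hits / len(y)
```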
3 Experimental Results
DCE-MRI data of bilateral fat-suppressed T2-weighted images for 31 subjects were acquired at UPenn under PO1CA085424 (clinical evaluation of multimodality breast imaging). Sequential post-contrast acquisitions were acquired for approximately 6 minutes following contrast injection. There are 22 malignant and 9 benign tumors, all histologically verified. All subjects were aligned before running our algorithms. Two experiments were performed to evaluate the proposed segmentation algorithm and the STEP features. Classification performance was subsequently compared through receiver operating characteristic (ROC) curves (fitted with [13]), sensitivity, specificity, and accuracy for various tumor features.

• Evaluation of Segmentation Algorithm
We compare the performance of tumor classification on roughly segmented tumor samples from a manual rater, segmentation-refined tumor samples, and expert-segmented tumor samples. As indicated by the ROC curves in Fig. 3, the segmentation refinement algorithm improves tumor classification, which is consistent with the results reported in [3]. The highly coincident ROC curves for refined-segmentation and expert-segmentation based classification in Fig. 3 indicate that our segmentation algorithm yields a performance comparable to that of expert segmentations.
[Fig. 3 plot: ROC curves (true positive rate vs. false positive rate) with legend — Rough AUC: 0.81, Refinement AUC: 0.97, Expert AUC: 0.97]
Fig. 3. ROC curves of tumor classification on rough segmentation, refined segmentation, and expert segmentation, along with their AUC values. Notice that the ROC curves of 'refinement' and 'expert' overlap, being identical.
Moreover, based on all our testing samples, the mean and standard deviation of boundary distances between our segmentation and the expert segmentation are 4.10 and 5.62 pixels, much less than the mean and standard deviation of boundary distances between the rough segmentation and the expert segmentation, i.e., 7.67 and 8.24 pixels.

• Validation of STEP Features
The performance of STEP features was compared with various existing features. Seven dynamic features (D = {D1, ..., D7}), six architectural features (A = {A1, ..., A6}), and nine features of spatial variations of TE (V = {V1, ..., V9}) were selected (see Appendix for details). Classification performance of the individual feature sets {A, D, V}, their combinations {A ∪ D, A ∪ D ∪ V}, and the proposed STEP features was compared using the feature selection and leave-one-out classification procedure explained in Section 2.3.

Table 1. Best classification accuracy, along with corresponding sensitivity and specificity, for different sets of features used in tumor classification

Feature     Accuracy (%)  Sensitivity  Specificity  Selected Features
A           87.1          0.91         0.78         A3, A5, A6
D           64.5          0.68         0.56         D6
V           90.3          0.91         0.89         V3, V5
A ∪ D       87.1          0.91         0.78         A3, A5, A6
A ∪ D ∪ V   90.3          0.91         0.89         A3, A5, V3, V5
STEP        96.8          0.95         1.00         3 (moment) + 1 (Gabor)
[Fig. 4 plot: ROC curves (true positive rate vs. false positive rate) with legend — A AUC: 0.87, D AUC: 0.75, V AUC: 0.89, A ∪ D AUC: 0.87, A ∪ D ∪ V AUC: 0.93, STEP AUC: 0.97]
Fig. 4. ROC curves of tumor classification using different sets of features, with the corresponding AUC values. Notice that the ROC curves of A and A ∪ D overlap, being identical.
ROC curves and their AUC values for the various features are given in Fig. 4. The best classification accuracy, corresponding sensitivity and specificity, along with the features selected by our feature selection method, are listed in Table 1. The four selected STEP features (shown in the last row of Table 1) include 3 moment invariants and 1 local Gabor texture feature. Combining general architectural and dynamic features (A ∪ D) for tumor diagnosis did not improve the AUC value or the best classification rate compared to the architectural features alone. This may be due to the simple ranking-based feature selection, which may fail to optimally combine architectural and dynamic features. On the other hand, the spatial variation of contrast enhancement proved very effective in distinguishing malignant and benign tumors. Although the combination of this feature with architectural and dynamic features (A ∪ D ∪ V) fell short of providing an outright improvement in classification rate (compared to V), as shown in Table 1, it improved the AUC from 0.89 to 0.93, as shown in Fig. 4. This clearly demonstrates that TE, architectural structure, and spatial variation of TE all play an important role in distinguishing malignant and benign tumors. In all experiments, STEP features offered the best performance, with AUC, classification rate, sensitivity, and specificity all showing improvements. The fact that both moment invariants and local Gabor texture features were selected as STEP features for tumor classification validates our hypothesis that capturing both global and local variations of contrast enhancement is important in distinguishing between malignant and benign breast tumors. In particular, the 6th moment invariant of the 2nd DFT coefficient map, the 1st moment invariant of
the 1st DFT coefficient map, the 2nd moment invariant of the 4th DFT coefficient map, and one texture feature were selected, respectively.
4 Conclusion
In this paper, we have proposed a framework for extracting the Spatio-Temporal Enhancement Pattern (STEP), which completely characterizes three tumor properties, namely, temporal enhancement, architectural structure, and spatial variations of pixel-wise temporal enhancement. STEP features were validated through tumor classification, where experimental results show better tumor classification performance with STEP features than with individual and pairwise feature sets. We have also shown that STEP features benefit greatly from the proposed segmentation refinement algorithm, as indicated by tumor classification results that are consistent with those for expert segmentations. Future work involves extensive evaluation of our methods on a larger database.
References
1. Schnall, M., et al.: Diagnostic architectural and dynamic features at breast MR imaging: Multicenter study. Radiology 238, 42–53 (2006)
2. Gilhuijs, K., et al.: Computerized analysis of breast lesions in three dimensions using dynamic magnetic-resonance imaging. Med. Phys. 25, 1647–1654 (1998)
3. Tanner, C., et al.: Classification improvement by segmentation refinement: Application to contrast-enhanced MR-mammography. In: Barillot, C., Haynor, D.R., Hellier, P. (eds.) MICCAI 2004. LNCS, vol. 3216, pp. 184–191. Springer, Heidelberg (2004)
4. Szabo, B., et al.: Dynamic MR imaging of the breast: Analysis of kinetic and morphologic diagnostic criteria. Acta Radiol. 44, 379–386 (2003)
5. Meinel, L., et al.: Breast MRI lesion classification: Improved performance of human readers with a backpropagation neural network computer-aided diagnosis (CAD) system. J. Magn. Reson. Imaging 25, 89–95 (2007)
6. Wismuller, A., et al.: Segmentation and classification of dynamic breast magnetic resonance image data. J. Electron. Imaging 15, 013020-1–013020-13 (2006)
7. Boykov, Y., et al.: Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell. 23, 1222–1239 (2001)
8. Hu, M.: Visual pattern recognition by moment invariants. IRE Trans. Information Theory, 179–187 (1962)
9. Tan, T.: Rotation invariant texture features and their use in automatic script identification. IEEE Trans. Pattern Anal. Mach. Intell. 20, 751–756 (1998)
10. Lachenbruch, P.: Discriminant Analysis. Hafner Press (1975)
11. Guyon, I., et al.: Gene selection for cancer classification using support vector machines. Mach. Learn. 46, 389–422 (2002)
12. Vapnik, V.: Statistical Learning Theory. Wiley Interscience, New York, NY, USA (1998)
13. Metz, C., et al.: Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Stat. Med. 17, 1033–1053 (1998)
14. Tanner, C., et al.: Does registration improve the performance of a computer aided diagnosis system for dynamic contrast-enhanced MR mammography? In: ISBI 2006, pp. 466–469 (2006)
15. Chen, W., et al.: Computerized interpretation of breast MRI: Investigation of enhancement-variance dynamics. Med. Phys. 31, 1076–1082 (2004)
16. Chen, X., et al.: Simultaneous segmentation and registration of contrast-enhanced breast MRI. In: Christensen, G.E., Sonka, M. (eds.) IPMI 2005. LNCS, vol. 3565, pp. 126–137. Springer, Heidelberg (2005)
5 Appendix
For the dynamic features, we selected the standard deviation of enhancement (D1), maximum washout (D2) [14], the maximum uptake (D3), uptake rate (D4), washout rate (D5) [15], and the two features (D6) and (D7) extracted from the enhancement curve modelled by the Hayton-Brady pharmacodynamic model in [16]. Notice that the dynamic features were all computed from the average intensities over the tumor area at every time point. For the architectural features, we selected the compactness (A1) [14], circularity (A2) [15], irregularity (A3), eccentricity (A4), rectangularity (A5), and entropy of the radial length distribution (A6) [3]. The features that account for spatial variations of TE are the variance of uptake (V1), change in variance of uptake (V2), margin gradient (V3), variance of margin gradient (V4), variance of the radial gradient histogram (V5) [2], the maximum variation of enhancement (V6), the enhancement-variance increasing rate (V7), the enhancement-variance decreasing rate (V8), and the enhancement-variance at the first post-contrast frame (V9) [15].
Automatic Whole Heart Segmentation in Static Magnetic Resonance Image Volumes
Jochen Peters, Olivier Ecabert, Carsten Meyer, Hauke Schramm, Reinhard Kneser, Alexandra Groth, and Jürgen Weese
Philips Research Laboratories, X-Ray Imaging Systems, Weisshausstrasse 2, D-52066 Aachen, Germany
{jochen.peters,carsten.meyer}@philips.com
FH Kiel, Sokratesplatz 1, D-24149 Kiel, Germany
Abstract. We present a fully automatic segmentation algorithm for the whole heart (four chambers, left ventricular myocardium and trunks of the aorta, the pulmonary artery and the pulmonary veins) in cardiac MR image volumes with nearly isotropic voxel resolution, based on shape-constrained deformable models. After automatic model initialization and reorientation to the cardiac axes, we apply a multi-stage adaptation scheme with progressively increasing degrees of freedom. Particular attention is paid to the calibration of the MR image intensities. Detailed evaluation results for the various anatomical heart regions are presented on a database of 42 patients. On calibrated images, we obtain an average segmentation error of 0.76mm.
1 Introduction
Cardiovascular disease is the leading direct or contributing cause of death in the world. Three-dimensional (3-D) magnetic resonance (MR) imaging is now well established in the noninvasive diagnosis of cardiovascular disease. A prerequisite to image interpretation, e.g. 3-D visualization, is image segmentation. Due to the increasing amount of image data associated with finer resolution, segmentation needs to be highly automated to be clinically valuable. Automatic segmentation of cardiac MR images is however challenging due to image noise, low tissue contrast between the myocardium and surrounding tissues, patient variability, the lack of gray level calibration and spatial magnetic field inhomogeneities. A number of techniques have been applied to cardiac MR image segmentation. Among these are active shape models [1,2], active appearance models [3,4,5], deformable models [6,7,8,9], active contours [10], level sets [11], and atlas-based methods [12]. While many studies focus on segmenting only the left and right ventricles and the myocardium, segmentations of the four chambers plus the left ventricular myocardium have recently been reported in [13,11,14]. Often, dynamic MR images are acquired in a stack of slices in short axis and/or long axis views, and consequently the algorithms are developed for this kind of data. Here, we focus on static cardiac (3-D) image volumes with nearly isotropic voxel resolution, acquired with steady-state free-precession MRI. For this kind of data,
the heart position and axes are not known a priori and surrounding structures are visible since the field of view is not confined to the heart itself. In this work, we outline and evaluate a fully automatic algorithm for whole heart segmentation in cardiac MR image volumes, based on shape-constrained deformable models [7]. In previous work, we have successfully applied this framework to cardiac CT images [15]. Here, particular attention is paid to a novel model initialization step as well as to the effect of MR image calibration to compensate for image intensity variations [16,17,4]. Our main contributions are:
– fully automatic heart localization based on a 3-D generalized Hough transformation, streamlined for fast processing,
– the evaluation of the effect of MR image calibration for the given task,
– the demonstration of fully automatic whole heart segmentation (four chambers, left ventricular myocardium and trunks of aorta, pulmonary artery and pulmonary veins) in nearly isotropic, static cardiac MR image volumes.
2 Shape-Constrained Deformable Models
In this work, the cardiac anatomy is extracted from the MR image volumes using a deforming mesh comprising both ventricles, both atria, the epicardial surface around the left ventricle and the trunks of the great vessels (aorta, pulmonary artery and veins, and vena cava). This mesh is made of V = 7286 vertices combined in T = 14771 triangles with complex junctions connecting 3 or more surfaces [18]. An illustration of this mesh is given in Figure 1. The mesh adaptation is performed by iterating two steps. First, boundary candidates are searched along the normal vectors n_i of the mesh triangles. For each triangle i, these target points are selected according to

$$x_i^{\text{target}} = c_i + \Big(\underset{j=-l,\dots,+l}{\arg\max}\big\{F_i(c_i + j\delta n_i) - D\,j^2\delta^2\big\}\Big)\cdot\delta\cdot n_i \qquad (1)$$
where c_i are the triangle centers, δ is the sampling distance along the normal vector, l·δ is the search range and D is a heuristic penalty term which biases the search to nearby points. The choice of optimal, spatially varying boundary detection functions F_i(·) is essential for a robust and accurate segmentation. A more detailed discussion of these functions is provided in Section 4. In the second step, the mesh is deformed by minimizing the sum of the quadratic distances between the triangle centers c_i and the detected boundary points x_i^target [7]

$$E_{\text{external}} = \sum_{i=1}^{T} \tilde{w}_i \left( \frac{\nabla I(x_i^{\text{target}})}{\|\nabla I(x_i^{\text{target}})\|} \cdot \left(x_i^{\text{target}} - c_i\right) \right)^{2} \qquad (2)$$
The term (x_i^target − c_i) is projected onto the direction of the image intensity gradient ∇I/‖∇I‖ at the target point. This makes the energy invariant
to movements of the triangle within the object tangent plane, preventing the triangle from becoming stuck at the target position. The weights w̃_i reflect the reliability of the detected boundary. Practically, they are set to w̃_i = max{0, F_i(x_i^target) − D·‖x_i^target − c_i‖²}. Minimizing Eq. (2) with respect to c_i may lead to irregular shapes due to false target points and missing boundaries (w̃_i = 0). Prior shape knowledge can be used to stabilize this problem, as described in the following.

Shape-Constrained Deformable Adaptation. To constrain the deformation we require the vertices v_i to stay close to those of a reference shape (e.g., the mean shape m) that is allowed to be modified by a geometric transformation T[·] (e.g., rigid or affine). This can be formalized by introducing an internal energy

$$E_{\text{internal}} = \sum_{i=1}^{V} \sum_{j \in N(i)} \big( (v_i - v_j) - (T[m_i] - T[m_j]) \big)^2 \qquad (3)$$

where N(i) is the set of indices of the first-order neighbor vertices of vertex v_i. The objective function to minimize is now given by E = E_external + α E_internal [7], where α balances the contribution of the internal and external energy. The free variables are the vertex positions v_i and the transformation parameters of T. The boundary detection step followed by the optimization of E is iterated until the mesh reaches a steady state.

Fig. 1. Heart model from two different views, and typical results of the fully automatic segmentation (with calibration) for three different patients (first row: axial views, second row: coronal views; different zoom factors). Note the differences in contrast and relative chamber sizes and orientations.
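The boundary search of Eq. (1) reduces to a penalized one-dimensional argmax per triangle. A minimal sketch (not the authors' code), assuming a caller-supplied boundary detection function F that scores a batch of sample points:

```python
import numpy as np

def target_points(centers, normals, F, delta=1.0, l=10, D=0.1):
    """Eq. (1): sample 2l+1 points along each triangle normal and keep the
    one maximizing F minus the quadratic distance penalty D*(j*delta)**2."""
    js = np.arange(-l, l + 1)
    targets = np.empty_like(centers)
    for i, (c, n) in enumerate(zip(centers, normals)):
        samples = c + js[:, None] * delta * n        # points along the normal
        scores = F(samples) - D * (js * delta) ** 2  # penalized responses
        targets[i] = c + js[np.argmax(scores)] * delta * n
    return targets
```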
3 Automatic Whole Heart Segmentation
To achieve robust segmentation, we use a multi-stage adaptation scheme:

1. Heart Localization. The first step of the segmentation chain consists of detecting the heart using the generalized Hough transformation (GHT) [19]. The GHT is a method to detect the occurrence of any characteristic shape undergoing a geometric transformation in a 2-D image. It can be straightforwardly extended to 3-D, but the computational complexity and memory demand make a naive extension impractical. In this paper, we introduce a practicable solution that makes use of image properties and characteristics of the cardiac anatomy. In particular, we constrain the geometric transformation to translation and scaling. We observe that the heart is much larger than the voxel resolution; before locating the heart in a new image, we can therefore first down-sample the input image to a voxel resolution of 3.0 × 3.0 × 3.0 mm³. After this step, the global shape of the heart is still well preserved while small surrounding structures are suppressed. Then, we filter out many disturbing edges by applying a threshold on the image to coarsely separate the blood pool from the remaining structures. The GHT is then performed on the edge image resulting from this thresholding operation. That is, we do not directly search for the heart in the image, but indirectly, by localizing the blood pool, which has better contrast in our MR images. The GHT is usually trained for one single reference shape of an object class. We can learn the shape variability of several individuals by combining the R-tables of several reference shapes [17]. After encoding the shapes from various patients, we can prune the R-table and keep only those entries that occur more than τ times (in our experiments τ = 2). That way, the entries are more discriminative and the detection time is reduced.

2. Parametric Adaptation. After heart localization by the GHT, the model might still be "far" from the heart boundaries, which increases the risk of false target points. This motivates us to first adapt the heart model in a parametric instead of a fully deformable way. In particular, the global pose has yet to be refined (i.e., up to now the model has not been rotated). The parametric adaptation works as follows: We apply a geometric transformation T[·] to the whole model and use the same transformation in Eq. (3). E_internal thus vanishes and we optimize E_external alone. Assuming that T[·] can be described by some parameters q = (q1 ... qM) and that the transformation is applied to the whole mesh, the vertex positions can be expressed as v(q). The goal is now to find the parameters q that minimize E_external(v(q)). The boundary detection step followed by the optimization of the energy function is iterated until convergence. The geometric transformation T[·] is successively refined with an increasing number of degrees of freedom (parameters q), starting with a global similarity transformation (T[·] = T_similarity[·]) for global pose correction. Then, to correct for anisotropic scale variations, we use a global affine transformation (T[·] = T_affine[·]). Finally, we assign an individual affine transformation to each of the K anatomical regions of the model to capture changes in size and rotation
of the chambers between patients. To ensure that the mesh remains smooth at the transitions between two (or more) anatomical regions, we linearly interpolate the affine transformations as

$$T_{\text{piecewise affine}}[v_i] = \sum_{k=1}^{K} w_{i,k} \cdot T_{\text{affine},k}[v_i] \qquad (4)$$
where w_{i,k} is the contribution of transformation T_{affine,k}[·] for vertex i, satisfying the normalization condition Σ_{k=1}^{K} w_{i,k} = 1 ∀ i (a short code sketch of this blending is given at the end of this subsection).

3. Deformable Adaptation. The model is now usually initialized well enough to proceed with a deformable adaptation as described in Section 2, i.e., the model is no longer constrained to undergo the transformation T[·]. Here, we keep using the piecewise affine transformation of Eq. (4) in the internal energy of Eq. (3). In summary, the degrees of freedom of the mesh deformation are progressively increased during segmentation, thereby increasing the robustness of the segmentation.
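The blending of Eq. (4) can be written in a few lines. A sketch under the assumption that each regional affine transform is given in homogeneous (3 × 4) form and that the per-vertex weights already satisfy the normalization condition:

```python
import numpy as np

def piecewise_affine(vertices, affines, weights):
    """Eq. (4): vertices (V, 3); affines, K matrices of shape (3, 4);
    weights (V, K) with rows summing to one."""
    vh = np.hstack([vertices, np.ones((len(vertices), 1))])   # homogeneous coords
    per_region = np.stack([vh @ A.T for A in affines])        # (K, V, 3)
    return np.einsum('vk,kvd->vd', weights, per_region)       # weighted blend
```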
4 Optimized Boundary Detection
Vertex correspondence is preserved during mesh adaptation. This property enables us to encode information into the model that is carried with the triangles during the adaptation. In particular, in a training phase, a locally optimal boundary detection function is assigned to each mesh triangle using the Simulated Search approach [20]. This method needs as input adapted reference meshes and candidate boundary detection functions. The selection process works as follows for each triangle independently:
1. The pose of the reference triangle is slightly disturbed, i.e., the triangle is shifted (along the normal vector and/or laterally) and/or tilted.
2. The boundary detection of Eq. (1) is performed.
3. The residual error between the detected point and the reference position is recorded for all tested displacements and all function candidates.
4. The candidate with the smallest simulated residual error is finally selected.
As function candidates, we use the magnitude of the image gradient projected onto the triangle normal vector, jointly with boundary discrimination represented by rejection criteria Qk and acceptance intervals [min_k, max_k]:
$$F_i(x) = \begin{cases} \|n_i \cdot \nabla I(x)\| & \text{if } Q_k \in [\min_k, \max_k] \text{ for all } Q_k \\ 0 & \text{if } Q_k \notin [\min_k, \max_k] \text{ for some } Q_k \end{cases} \qquad (5)$$

Rejection Criteria and Acceptance Intervals. For edge characterization, we use the following rejection criteria: the gray values on either side of the boundary and the (signed) difference of the gray values across the boundary. All these criteria can be used alone or in combination to build numerous candidates (5).
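Eq. (5) amounts to gating the projected gradient magnitude by the trained acceptance intervals. A compact sketch, with the rejection criteria assumed to be precomputed per sample point:

```python
import numpy as np

def boundary_response(grad_proj, criteria, intervals):
    """Eq. (5): grad_proj holds |n_i . grad I(x)| per sample point;
    criteria is (n_samples, n_Q); intervals is (n_Q, 2) with [min_k, max_k]."""
    accept = np.all((criteria >= intervals[:, 0]) &
                    (criteria <= intervals[:, 1]), axis=1)
    return np.where(accept, np.abs(grad_proj), 0.0)
```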
Reasonable acceptance intervals are trained by clustering the rejection criteria introduced above into M classes using the Qk values obtained from the reference meshes. The acceptance intervals are then derived from these clusters by rejecting the low and high N% percentiles per Qk. Note that both M and N are associated with tradeoffs: using more clusters (large M) may be more specific for the local image properties but may lead to less robust estimates, while narrow intervals (large N) might reject too many correct edges and wide intervals might be too unspecific. Here, as in CT [20], we use M = 5, 10 and N = 5, 10%. Finally, the candidates that result from combining all possible rejection criteria and all trained acceptance intervals are used by the Simulated Search to select the optimal boundary detection function for each triangle.

Image Intensity Calibration. To compensate for image intensity variations across images, we compute the image histogram and derive the low and high L% percentiles, similar to [16]. The intensity values within this interval are then linearly re-scaled to a reference interval computed, e.g., from a reference image. This calibration strongly reduces the intensity variations between images and yields more consistent and narrower acceptance intervals. For consistency, the image calibration has to be performed both during training and segmentation. Preliminary experiments performed with L = 0, 2, 5 and 10% showed that L = 2% yielded the best results (i.e., the smallest segmentation error); this value is used in the next section.
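The calibration step is a linear percentile mapping; a minimal sketch with L = 2%, the value retained above:

```python
import numpy as np

def calibrate(image, reference, L=2.0):
    """Map the [L, 100-L] percentile range of `image` onto that of `reference`."""
    lo, hi = np.percentile(image, [L, 100 - L])
    rlo, rhi = np.percentile(reference, [L, 100 - L])
    return (image - lo) / (hi - lo) * (rhi - rlo) + rlo
```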
5 Results
Experimental Setup. The segmentation algorithm proposed in this paper was evaluated on 42 steady-state free-precession MR images, designated to inspect the coronary arteries, specifically for ischemic disease. The images, acquired on Philips Intera and Achieva 1.5T scanners at the end-diastolic phase over various cardiac cycles and breathing compensated (TE = 2.14 ± 0.11 [2.01–2.38] ms, TR = 4.27 ± 0.22 [4.04–4.75] ms, flip angle = 86.07 ± 5.33 [70.00–90.00] degrees, pixel spacing = 0.5–0.7mm, slice distance = 0.7–0.9mm, matrix size = 512 × 512, 100–170 slices), were obtained from clinics on various continents. Dividing the 42 images into 4 clusters with N = 10 or N = 11 images each, we used a leave-N-out validation approach to evaluate our segmentation algorithm. The training images were used to compute the mean mesh m (see Section 2), to generate the R-table for the GHT (see Section 3) and to train and assign the optimal boundary detection function to each triangle (see Section 4). The parameters of the algorithm (D, α, δ and l) had been adjusted for optimal performance on CT images; previous experiments have shown that the segmentation performance on CT images is robust with regard to these parameters.

Error Measurement. We measure the segmentation error as the symmetrized mean Euclidean "surface-to-patch" distance ε_mean, i.e., the mean distance between the triangle centers of the adapted mesh and an anatomically corresponding patch of maximum geodesic radius r = 10mm of the reference mesh, and
vice versa, averaged over all 42 data sets. The triangle positions near the artificial caps of the truncated vessels are excluded from the error measurement (1032 out of 14771 triangles), since they do not relate to anatomical structures. The reference meshes were generated semi-automatically by using preliminary boundary detection functions and a preliminary mean mesh obtained from a different database of cardiac CT images (thus unrelated to the models used in the test experiments), followed by manual correction.

Segmentation. Reasonable segmentation results were obtained in all cases of our database. In Table 1 we evaluate the effect of calibrating training and test images according to Section 4, for the complete mesh and each anatomical region. In particular, we show the error distribution across the triangles with the corresponding anatomical label. We find that image calibration reduces the mean segmentation error for all anatomical parts. The reduction is statistically significant (paired t-test) for the whole mesh, the left ventricle, and the LV myocardium (P < 0.01) as well as for the right chambers (P < 0.05; for the remaining structures, P < 0.2). Furthermore, the fraction of triangles with medium to large errors (> 1mm) is considerably reduced, i.e., improvements are distributed over the entire model surface. For the left chambers and the myocardium, large errors (> 2mm) are drastically reduced. Some segmentation examples are presented in Figure 1. The whole segmentation chain needs less than a minute on a workstation with Dual-Xeon Hyper-Threading Intel processors (2 × 1.7GHz) and 1 GByte RAM.

Table 1. Mean segmentation error after deformable adaptation for the different anatomical regions of the heart, as well as the percentage of triangles in various error ranges, (a) without calibration, (b) with calibration.

Anatomical region   | ε_mean (mm) | < 1.0 mm (%) | 1.0–2.0 mm (%) | > 2.0 mm (%)
                    |  (a)   (b)  |  (a)   (b)   |  (a)    (b)    |  (a)   (b)
Whole mesh          | 1.33  0.76  | 42.7  76.6   | 37.4   22.7    | 19.9   0.7
Aorta               | 1.16  0.60  | 36.1  98.5   | 63.6    1.5    |  0.3   0.0
Left atrium         | 1.44  0.72  | 22.8  80.0   | 60.7   18.1    | 16.5   1.9
Left ventricle      | 1.59  0.69  | 28.6  83.2   | 40.5   16.8    | 30.9   0.0
Myocardium (LV)     | 1.78  0.83  | 26.1  68.3   | 32.9   30.5    | 41.0   1.2
Right atrium        | 0.69  0.63  | 83.9  87.7   | 15.8   12.3    |  0.3   0.0
Right ventricle     | 0.95  0.74  | 54.9  81.1   | 44.0   18.9    |  1.1   0.0
Pulmonary artery    | 0.99  0.73  | 62.9  78.4   | 32.8   21.2    |  4.3   0.4
6 Conclusion
We have demonstrated fully automatic whole heart segmentation (four chambers, myocardium and the trunks of the aorta, pulmonary artery and pulmonary veins)
in static MR image volumes with nearly isotropic voxels, using shape-constrained deformable models. The 3-D generalized Hough transformation successfully localized the heart in all 42 test images. Image calibration to compensate for intensity variations in the MR images significantly improved the segmentation performance over the entire model surface. We measured a mean segmentation error of 0.76mm (ranging from 0.60mm for the aorta to 0.83mm for the LV myocardium). Future work includes improved MR calibration schemes and an application of our algorithm to other imaging protocols. Acknowledgments. For the cardiac images and fruitful discussions we are grateful to M. Breeuwer, P. Ermes, G. Hautvast and F. Gerritsen from Philips Medical Systems Healthcare Informatics, Clinical Informatics - Advanced Development (Best, The Netherlands). We also thank J. von Berg and C. Lorenz (Philips Research Hamburg) for providing the cardiac mesh.
References
1. Cootes, T.F., Taylor, C.J.: Active shape models – 'smart snakes'. In: Proc. British Machine Vis. Conf., pp. 266–275. Springer, Heidelberg (1992)
2. van Assen, H.C., Danilouchkine, M.G., Frangi, A.F., et al.: SPASM: A 3D-ASM for segmentation of sparse and arbitrarily oriented cardiac MRI data. Med. Image Anal. 10, 286–303 (2006)
3. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. In: Burkhardt, H., Neumann, B. (eds.) ECCV 1998. LNCS, vol. 1407, pp. 484–498. Springer, Heidelberg (1998)
4. Mitchell, S.C., Lelieveldt, B.P.F., van der Geest, R.J., et al.: Multistage hybrid active appearance model matching: Segmentation of left and right ventricles in cardiac MR images. IEEE Trans. Med. Imag. 20(5), 415–423 (2001)
5. Stegmann, M.B., Pedersen, D.: Bi-temporal 3D active appearance models with applications to unsupervised ejection fraction estimation. In: Proc. SPIE Med. Imag., vol. 5747, pp. 336–350 (2005)
6. McInerney, T., Terzopoulos, D.: Deformable models in medical image analysis: a survey. Med. Image Anal. 1(2), 91–108 (1996)
7. Weese, J., Kaus, M.R., Lorenz, C., et al.: Shape constrained deformable models for 3D medical image segmentation. In: Insana, M.F., Leahy, R.M. (eds.) IPMI 2001. LNCS, vol. 2082, pp. 380–387. Springer, Heidelberg (2001)
8. Montillo, A., Metaxas, D., Axel, L.: Automated model-based segmentation of the left and right ventricles in tagged cardiac MRI. In: Ellis, R.E., Peters, T.M. (eds.) MICCAI 2003. LNCS, vol. 2878, pp. 507–515. Springer, Heidelberg (2003)
9. Kaus, M.R., von Berg, J., Weese, J., et al.: Automated segmentation of the left ventricle in cardiac MRI. Med. Image Anal. 8, 245–254 (2004)
10. Jolly, M.P., Duta, N.D., Funka-Lea, G.F.: Segmentation of the left ventricle in cardiac MR images. In: Int. Conf. on Computer Vision, vol. 1, pp. 501–508 (2001)
11. Fritscher, K.D., Pilgram, R., Schubert, R.: Automatic cardiac 4D segmentation using level sets. In: Frangi, A.F., Radeva, P.I., Santos, A., Hernandez, M. (eds.) FIMH 2005. LNCS, vol. 3504, pp. 113–122. Springer, Heidelberg (2005)
12. Lorenzo-Valdés, M., Sanchez-Ortiz, G.I., Elkington, A.G., et al.: Segmentation of 4D cardiac MR images using a probabilistic atlas and the EM algorithm. Med. Image Anal. 8, 255–265 (2004)
13. Lötjönen, J., Kivistö, S., Koikkalainen, J., et al.: Statistical shape model of atria, ventricles and epicardium from short- and long-axis MR images. Med. Image Anal. 8, 371–386 (2004)
14. Pfeifer, B., Hanser, F., Seger, M., et al.: Cardiac modeling using active appearance models and morphological operators. In: Medical Imaging 2005: Visualization, Image-Guided Procedures, and Display. Proc. SPIE, vol. 5744, pp. 279–289 (2005)
15. Ecabert, O., Peters, J., Walker, M.J., et al.: Automatic whole heart segmentation in CT images: Method and validation. In: SPIE-MI (2007)
16. Nyul, L.G., Udupa, J.K.: On standardizing the MR image intensity scale. Magnetic Resonance in Medicine 42, 1072–1081 (1999)
17. Brejl, M., Sonka, M.: Object localization and border detection criteria design in edge-based image segmentation: Automated learning from examples. IEEE Trans. Med. Imag. 19(10), 973–985 (2000)
18. von Berg, J., Lorenz, C.: Multi-surface cardiac modeling, segmentation, and tracking. In: Frangi, A.F., Radeva, P.I., Santos, A., Hernandez, M. (eds.) FIMH 2005. LNCS, vol. 3504, pp. 1–11. Springer, Heidelberg (2005)
19. Ballard, D.H.: Generalizing the Hough transform to detect arbitrary shapes. Pattern Recogn. 13(2), 111–122 (1981)
20. Peters, J., Ecabert, O., Schramm, H., Weese, J.: Discriminative boundary detection for model-based heart segmentation in CT images. In: SPIE-MI (2007)
PCA-Based Magnetic Field Modeling: Application for On-Line MR Temperature Monitoring
G. Maclair¹,², B. Denis de Senneville¹, M. Ries¹, B. Quesson¹, P. Desbarats², J. Benois-Pineau², and C.T.W. Moonen¹
¹ IMF, UMR 5231 CNRS/Université Bordeaux 2 - 146, rue Léo Saignat, F-33076 Bordeaux
² CNRS/Université Bordeaux 1 - 351, cours de la Libération, F-33405 Talence
Abstract. Magnetic Resonance (MR) temperature mapping can be used to monitor temperature changes during minimally invasive thermal therapies. However, MR-thermometry contains artefacts caused by phase errors induced by organ motion in inhomogeneous magnetic fields. This paper proposes a novel correction strategy based on a Principal Component Analysis (PCA) to estimate the magnetic field perturbation, assuming a linear magnetic field variation with organ displacement. The correction method described in this paper consists of two steps: a magnetic field perturbation model is computed in a learning step; subsequently, during the intervention, this model is used to reconstruct the magnetic field perturbation corresponding to the actual organ position, which in turn allows the computation of motion-corrected thermal maps.
1 Introduction

Real-time MR-thermometry provides continuous temperature mapping inside the human body and is a promising tool to monitor and control interventional therapies based on thermal ablation carried out with the help of radio-frequency, laser, cryogenics or focused ultrasound.

The MR observable signal is a complex number representing the local magnetization M⃗. Grey levels on anatomical images are proportional to the magnitude of M⃗, whereas the phase value is proportional to the magnitude of the local magnetic field B⃗ and the local temperature. The Proton Resonance Frequency (PRF) technique gives an estimate of the temperature changes at instant t (noted ΔTt) by evaluating phase shifts between dynamically acquired images and reference data sets as follows [1]:

$$\Delta T_t = (\varphi_{\text{ref}} - \varphi_n) \cdot k, \qquad k = \frac{1}{\gamma \cdot \alpha \cdot \|\vec{B}\| \cdot T_E} \qquad (1)$$

where γ is the gyromagnetic ratio (≈ 42.58 MHz/Tesla), α the temperature coefficient (≈ 0.009 ppm/K), and T_E the echo time. This calculation is performed for each voxel in the image to obtain temperature maps.
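Eq. (1) is a pixel-wise linear map from phase difference to temperature change. A minimal sketch; note that with γ quoted in MHz/Tesla, phase maps in radians require the angular form 2π·γ, and the B0 and TE values below are illustrative placeholders, not taken from the paper.

```python
import numpy as np

GAMMA = 2 * np.pi * 42.58e6   # gyromagnetic ratio, rad/s/T (2*pi * 42.58 MHz/T)
ALPHA = 0.009e-6              # PRF temperature coefficient, approx. 0.009 ppm/K

def temperature_change(phi_ref, phi_n, B0=1.5, TE=0.02):
    """Eq. (1): Delta T = (phi_ref - phi_n) * k, for phase images in radians."""
    k = 1.0 / (GAMMA * ALPHA * B0 * TE)
    return (phi_ref - phi_n) * k
```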
This study was funded in part by the EC-FP6 project DiMI. Thanks to the "Conseil Régional d'Aquitaine" and the "Ligue Nationale Contre le Cancer" for their financial contribution.
The magnetic field B⃗ in a sample with a non-uniform magnetic susceptibility χ is expressed as [2]:

$$\vec{B} = (1 + \chi) \cdot (\vec{H}_0 + \vec{H}_{\text{obj}}) = \vec{H}_0 \cdot (1 + \chi) + \vec{H}_{\text{obj}} \cdot (1 + \chi) \qquad (2)$$

where H⃗0 is the magnetic field in vacuum, including field inhomogeneities due to the design of the magnet and fabrication tolerances, and H⃗obj is the demagnetization field describing the field distortion caused by the non-uniform susceptibility distribution. Since B⃗ is generally spatially non-uniform, any phase measurement on a tissue sample taken at a different position will show a relative phase difference. Therefore, an attempt to detect temperature changes with the help of phase differences would be severely biased by motion-induced phase changes [3]. A robust removal of these non-temperature-related phase variations is a prerequisite for precise MR-thermometry on moving objects.

Processing of an image must be done fast enough to ensure real-time monitoring of the temperature evolution. In practical terms, this implies that image processing must be achieved within the delay between successive acquisitions and on the time scale of significant temperature variations.

The recently suggested multi-baseline approach [4] is motivated by the fact that, for most therapeutic applications within the human body, motion is caused by the respiratory or the cardiac cycle and is thus periodic. In this method, a complete collection of reference magnitude and phase images is constructed before thermal therapy. During the intervention, the phase image of the collection acquired with a similar organ position is selected (for that purpose, an inter-correlation coefficient is computed on the anatomical images) and then used as a reference for temperature computation in equation (1). Since the difference then represents only temperature-related phase changes, the corrected temperature can be estimated. However, this approach has two drawbacks: first, the correction is constrained to positions present in the collection. Second, complex motion patterns require a densely populated collection, and the resulting computational overhead may be unsuitable for real-time MR-thermometry.

This paper proposes an alternative approach by modeling the contributions of phase changes induced by respiratory motion in the abdomen in a preparative learning step. Subsequently, during the intervention, the necessary phase reference maps of equation (1) are calculated in real-time.
2 Proposed Approach

2.1 Physical Background

The first term of equation (2) represents the local static magnetic field. Since a voxel-by-voxel motion compensation allows us to follow a selected target in space, the magnetic susceptibility χ, which is a material constant, always retains the value χ0 of the initially chosen target, since it moves with the observed tissue. This reduces the problem to the task of finding an approximation of H⃗0 over the range of the entire motion cycle for all voxels independently.

Two simplifications can be made: firstly, during the respiratory cycle, the predominant movement direction of the abdominal organs is the head-foot direction. Therefore,
for commonly used horizontal-bore superconductive magnets, the most important factor is the magnetic field variation of H⃗0 in the head-foot direction. Since B⃗0 has to fulfill the Maxwell equation ∇·B⃗ = 0, it can be shown that for the case of a superconducting horizontal-bore magnet this leads to Laplace's equation for the longitudinal component of the magnetic field [5]:

$$\nabla^2 B_z = 0 \qquad (3)$$

The solution to Laplace's equation can be expressed in the form of spherical harmonics. Therefore, for motion along the z-axis of the field, the field variation is of polynomial form:

$$\Delta B_z = c_1 z + c_2 z^2 + c_3 z^3 + \dots + c_n z^n \qquad (4)$$

The second simplification is that for recent clinical 1.5T scanners with optimized spatial homogeneity, the dominant term in the vicinity of the iso-center of the magnet is the linear term c1, which can be of the order of 5–10 ppm [5]. Consequently, it can be expected that the local phase changes due to motion have a large linear component.

The second term of equation (2) describes the demagnetization field caused by the non-uniform magnetic susceptibility. The form of the demagnetization field is generally complex, since it depends on the shape and the difference of the magnetic susceptibility of the different objects in the magnetic field, and it has to satisfy Laplace's equation. Any displacement or plastic deformation of the abdominal organs will in general lead to a different demagnetization field. However, the magnetic susceptibility difference in biological tissues is only of the order of 1 ppm, and the dominant field distortion is due to the air/tissue interface at the surface of the human body (10 ppm difference). For the case of respiratory-cycle related motion, the global deformation of the abdomen is not very important and the induced magnetic field change can be neglected. Furthermore, it can be shown that the demagnetization field of susceptibility differences of 1 ppm falls off rapidly with distance and is already of the order of 0.05 ppm at a distance of 1 cm [2]. Consequently, as long as local tissue deformations are on the scale of < 1 cm, demagnetization field changes in the far field are of small amplitude and can be approximated in first order with a linear term. Due to the fact that biological tissues are incompressible, this condition is fulfilled to a large extent for the kidney and the liver. Nevertheless, any method relying on these assumptions has to provide the means to test their validity on a pixel-by-pixel basis.

2.2 Learning Step

During a learning step performed before the intervention (no hyperthermia, same MR acquisition protocol), complex organ deformation is estimated on anatomical images using an image registration algorithm. A parameterized motion flow model is then constructed using PCA [6]. This motion model allows the computation of a small set of parameters representing complex organ deformations. Subsequently, a parameterized magnetic field model is computed, assuming a linear magnetic field variation with organ motion.

Estimation of organ displacement: The training set from which a model of image motion is obtained is a set of N flow fields. The objective is to relate the coordinate
of each part of tissue in the image with the corresponding part of tissue in a reference image (chosen to be the first of the time series and denoted I0). Motion estimation on anatomical images can be obtained with a variety of image registration algorithms. In this study, a global affine transformation is estimated first, using a differential Gauss-Newton approach [7]. As a second step, a hierarchical version of the Cornelius and Kanade algorithm [8] provides a good estimation of the local organ displacement. The obtained motion field is used to remap all pixels of the phase images to their reference position. This maps all values of χ(x, y, t) (see equation (2)) to a fixed position χ0(x, y).

Learning a parameterized motion model: The set of N optical flow fields is used to build a parameterized flow model. PCA is used to find an orthonormal basis that spans an N-dimensional vector space. The components of this basis can be interpreted as the underlying characteristic patterns of the motion cycle. Since data sets from coherent periodic motion cycles typically have a high degree of redundancy, PCA is a convenient way to reduce the dimensionality N of the basis. Only the m eigenvectors Di associated with the m largest eigenvalues λi (0 ≤ i ≤ m ≪ N) are kept, in order to preserve the representative patterns of the observed motion. The size m of this subset is obtained by selecting only the highest-ranked components that account for more than 95% of the sum of the quadratic norm of the eigenvalues.

Characterization of the periodic motion cycle with a small set of parameters: The approximated spatial transformation Tt between the anatomical image acquired at instant t (denoted It) and the reference image I0 is a linear combination of the first m components Di computed previously:

Tt(x, y) = Σ_{i=0}^{m−1} C_i^t Di(x, y)    (5)
where Di and Tt are vector fields and C_i^t (0 ≤ i ≤ m − 1) is a set of parameters representing the organ displacement at instant t. The objective is to find the coefficients C_i^t that produce a flow field minimizing the matching error:

LS = Σ (I0 − Tt(It))²    (6)

This minimization is realized using a Marquardt-Levenberg least-squares solver in a multi-resolution framework. Note that optical flow algorithms rely on the assumption of conservation of intensity along the trajectory. This condition can be violated during thermotherapy: several MR-relevant tissue properties can change during heating [9], and global intensity variations can be generated by the heating device. Since the proposed method employs optical flow only during the preparative step and relies on a global fit of the image content during the intervention, the estimated transformation Tt is less susceptible to such intensity variations.
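To make the learning step concrete, the following numpy/scipy sketch builds the PCA flow basis of Sec. 2.2 and fits the coefficients C_i^t of equations (5) and (6). It is our illustration, not the authors' code: the function names, the mean-centering convention and the single-resolution Levenberg-Marquardt solver are simplifications of their multi-resolution scheme.

```python
import numpy as np
from scipy import ndimage, optimize

def learn_motion_basis(flows, energy_frac=0.95):
    """PCA of N training flow fields, shape (N, H, W, 2), with (x, y)
    displacement components in the last axis. Keeps the m highest-ranked
    components covering >95% of the eigenvalue energy (Sec. 2.2)."""
    N = flows.shape[0]
    X = flows.reshape(N, -1)
    _, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    m = int(np.searchsorted(energy, energy_frac)) + 1
    return Vt[:m].reshape((m,) + flows.shape[1:])        # basis D_i

def fit_coefficients(I0, It, basis, c0=None):
    """Least-squares fit of C_i^t, eq. (6), for one image pair."""
    m, H, W, _ = basis.shape
    gy, gx = np.mgrid[0:H, 0:W].astype(float)

    def residuals(c):
        flow = np.tensordot(c, basis, axes=1)            # eq. (5)
        warped = ndimage.map_coordinates(
            It, [gy + flow[..., 1], gx + flow[..., 0]], order=1)
        return (I0 - warped).ravel()

    return optimize.least_squares(residuals,
                                  np.zeros(m) if c0 is None else c0,
                                  method="lm").x         # Marquardt-Levenberg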
Modeling motion induced phase changes: Our approach approximates the overall magnetic field variations in equation (2) by a sum of linear phase changes of each principal motion component on a pixel-by-pixel basis:

ϕt(x, y) = Σ_{i=0}^{m−1} C_i^t Pi(x, y) + Pm(x, y),   ∀t, 0 ≤ t ≤ N − 1    (7)
where Pi (0 ≤ i ≤ m) denotes the parameterized magnetic field model, with Pm representing the initial phase distribution, and C_i^t are the coefficients calculated from equations (5) and (6). From the set of N equations with m + 1 unknowns defined by (7), an overdetermined system is obtained if m + 1 ≪ N. The Pi(x, y) are computed individually for each pixel (x, y) using a Singular Value Decomposition (SVD) performed at the end of the learning step.

2.3 Hyperthermia Procedure

During hyperthermia, the parameterized magnetic field model Pi allows the reconstruction of the magnetic field distribution corresponding to the current position of an organ. A reference non-heated phase image ϕreco is thus calculated using the set of parameters C_i^t representing the actual organ displacement, as follows:

ϕreco(x, y) = Σ_{i=0}^{m−1} C_i^t Pi(x, y) + Pm(x, y)    (8)
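Both the per-pixel estimation of the Pi in eq. (7) and the reconstruction of eq. (8) reduce to ordinary linear algebra; a minimal numpy sketch follows (variable names are ours):

```python
import numpy as np

def fit_field_model(phases, coeffs):
    """Per-pixel least-squares fit of eq. (7).
    phases: (N, H, W) training phase images; coeffs: (N, m) values C_i^t.
    Returns P of shape (m + 1, H, W); P[m] is the initial phase term."""
    N, H, W = phases.shape
    A = np.hstack([coeffs, np.ones((N, 1))])          # design matrix [C_i^t | 1]
    P, *_ = np.linalg.lstsq(A, phases.reshape(N, -1), rcond=None)  # SVD-based
    return P.reshape(-1, H, W)

def reconstruct_phase(P, c):
    """Reference phase of eq. (8) for the current motion parameters c (m,)."""
    return np.tensordot(np.append(c, 1.0), P, axes=1)
```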
Then, the measured phase is subtracted from ϕreco to compute the temperature map from equation (1). The use of SVD to compute the reference phase ϕreco offers the advantage of improving the precision of MR thermometry. The temperature uncertainty can be estimated from the standard deviation of time series measurements. As a temperature variation is proportional to a phase variation, and assuming that noise is equally distributed (σ(ϕref) = σ(ϕt) = σ(ϕ)), the temperature uncertainty of the multi-baseline approach can be evaluated as:

σ(ΔT) = √(σ²(ϕref) + σ²(ϕt))·k = σ(ϕ)·√2·k    (9)

As the phase uncertainty is directly related to the signal-to-noise ratio (SNR) by σ(ϕ) = 1/SNR [10], a lower bound on the temperature accuracy can thus be evaluated as follows:

σ(ΔT) = √2·k / SNR    (10)

With the proposed approach, as Pi results from the resolution of an overdetermined system, noise is reduced on the reconstructed phase images. The temperature uncertainty is thus reduced by up to a factor of √2 (since optimally σ(ϕref) = 0).
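As a quick plausibility check of eq. (10), the in-vivo numbers of Sec. 3.2 (SNR ≈ 9 at TE = 28 ms, 1.5 T) can be reproduced numerically. The PRF coefficient α ≈ 0.0094 ppm/°C and the relation k = 1/(γ·α·B0·TE) are textbook values we assume here; they are not stated in the paper.

```python
import numpy as np

gamma = 2 * np.pi * 42.58e6        # proton gyromagnetic ratio (rad s^-1 T^-1)
alpha = 9.4e-9                     # assumed PRF shift coefficient, ~0.0094 ppm/degC
B0, TE, SNR = 1.5, 28e-3, 9
k = 1.0 / (gamma * alpha * B0 * TE)    # phase-to-temperature factor, ~9.5 degC/rad
sigma_dT = np.sqrt(2) * k / SNR        # eq. (10)
print(f"expected uncertainty: {sigma_dT:.2f} degC")   # ~1.5 degC, as reported
```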
3 Results

All experiments were performed on a Philips Achieva 1.5 Tesla MR scanner. The stability of the thermometry was evaluated both on a physiologic phantom heated with a radiofrequency device and on the abdomen of a healthy volunteer during free breathing.

3.1 Results on a Physiologic Phantom

The first experiment simulates physiological motion with a periodic displacement of 23 mm amplitude and 3.8 s period, obtained by mounting 600 g of calf liver on a motorized platform. One image (resolution 128 × 128 pixels) was acquired every 70 ms (TE = 13 ms). During the learning step, one hundred images were acquired to get a precise sampling of the periodic motion (approximately two respiratory cycles). Subsequently, the tissue was heated with 20 W of RF power for 50 seconds.
Fig. 1. Standard deviation of the temperature evolution of MR thermometry obtained during 50 seconds of RF heating of calf liver: A) without correction, B) with the classic multi-baseline approach, C) with the proposed correction. Note the reduction of local deformations around the two RF electrodes and the improved temperature accuracy.
Temperature temporal standard deviation maps are reported without correction (1.A), with the classic multi-baseline approach (1.B) and with the proposed correction (1.C). The two hot spots observable in Figures 1.B and 1.C are induced by the temperature rise around the extremities of the two radio-frequency electrodes. Those two hot spots in Figure 1.B present an irregular shape, because local intensity changes appear during heating and lead to an incorrect optical flow estimation. This effect is not observable with the proposed approach, since it requires neither a conserved image intensity nor a normalized magnitude image during the hyperthermia procedure, but rather relies on a global fit of the principal components.

3.2 Results on In-Vivo Data Sets

Temperature stability was then analyzed on the abdomen of a free-breathing volunteer. One image of resolution 128 × 128 pixels was acquired every 63 ms (TE = 28 ms). During the learning step, fifty images were acquired to get a precise sampling of the respiratory
cycle. Figure 2.A compares the measured (blue dots) and reconstructed (red dots) phase variations along the principal axis indicated by the yellow line in Figure 2.B, for a pixel located in the kidney (white arrow); the measurements are in good agreement with the suggested linear model. Figures 2.C and 2.D show, respectively, the standard deviation and the peak-to-peak error of the temperature between measured and reconstructed phase images. These two measures provide a quality criterion for the magnetic field modeling. It can be observed that, although the macroscopic magnetic field variation is rather complex, the magnetic field variation along the principal axis of displacement of each voxel can be well approximated by a linear model, as expected from the assumptions detailed previously (see Section 2.1). Temperature temporal standard deviation maps are reported without correction (3.A), with the classic multi-baseline approach (3.B) and with the proposed correction (3.C). It can be observed that the proposed approach gives accurate temperature monitoring, both in the kidney and in the liver. The SNR in the kidney was evaluated at 9, which, using equation (10), leads to an expected theoretical temperature uncertainty of 1.5 °C with the multi-baseline approach. This value matches the experimental results, as a temperature standard deviation of 1.5 °C was measured. An experimental temperature uncertainty of 1.11 °C was measured with the proposed approach, demonstrating an improvement by a factor close to √2.
Fig. 2. Quality criteria for the magnetic field modeling: A) phase variation along the principal axis indicated by the yellow line in B), for a pixel located in the kidney (white arrow); measured phase values in blue, reconstructed phase values in red; C) temperature standard deviation error; D) peak-to-peak temperature error.
The computation times required by the classic multi-baseline approach [4] and by the proposed approach on a dual-processor, dual-core AMD Opteron 2.4 GHz with 8 GB of RAM are compared in Table 1. The total computation time of the proposed approach is significantly reduced compared to the classic multi-baseline approach and is lower than 80 ms per image, demonstrating that on-line image processing can be performed.
Fig. 3. Maps of the temperature standard deviation on the abdomen of a healthy volunteer under free breathing: A) without correction, B) with the classic multi-baseline approach, C) with the proposed correction
Table 1. Computational time for one image of resolution 128 × 128 pixels during hyperthermia

Multi-baseline approach                     Proposed approach
Motion estimation: 150 ms                   C_i^t calculation: 75 ms
Image selection in the collection: 40 ms    Phase reconstruction: < 1 ms
Computation of thermal maps: 1 ms           Computation of thermal maps: 1 ms
4 Discussion

The proposed approach presents several advantages over the previously suggested multi-baseline approach: improved accuracy of the computed temperature maps and improved robustness with respect to local and global intensity changes. Furthermore, the proposed work is compatible with the real-time constraint of thermotherapy. The proposed modeling is currently under evaluation for motion patterns more complex than respiratory-induced motion, such as motion induced by the cardiac cycle. The proposed approach is a step towards robust MR-guided thermotherapies and will be evaluated in the scope of a clinical study in the near future.
References

1. Quesson, B., de Zwart, J.A., Moonen, C.T.W.: Magnetic resonance temperature imaging for guidance of thermotherapy. J. Magn. Reson. Imaging 12, 525–533 (2000)
2. Salomir, R., de Senneville, B.D., Moonen, C.T.W.: A fast calculation method for magnetic field inhomogeneity due to an arbitrary distribution of bulk susceptibility. Concepts in Magnetic Resonance 19B(1), 26–34 (2003)
3. De Poorter, J.: Noninvasive MRI thermometry with the proton resonance frequency method: study of susceptibility effects. Magn. Reson. Med. 34(3), 359–367 (1995)
4. de Senneville, B.D., Mougenot, C., Moonen, C.T.W.: Real time adaptive methods for treatment of mobile organs by MRI controlled High Intensity Focused Ultrasound. Magn. Reson. Med. 57(2), 319–330 (2007)
5. Chen, C.-N., Hoult, D.I.: Biomedical Magnetic Resonance Technology, p. 30. Adam Hilger, IOP Publishing Ltd., New York (1989)
6. Black, M.J., Yacoob, Y., Jepson, A.D., Fleet, D.J.: Learning parameterized models of image motion. In: IEEE Proc. Computer Vision and Pattern Recognition, pp. 561–567. IEEE Computer Society Press, Los Alamitos (1997)
7. Friston, K.J., Ashburner, J., Frith, C.D., Poline, J.B., Heather, J.D., Frackowiak, R.S.J.: Spatial registration and normalisation of images. Human Brain Mapping 2, 165–189 (1995)
8. Cornelius, N., Kanade, T.: Adapting optical-flow to measure object motion in reflectance and x-ray image sequences. Association for Computing Machinery, SIGGRAPH/SIGART, pp. 50–58 (1983)
9. Graham, S.J., Bronskill, M.J., Henkelman, R.M.: Time and temperature dependence of MR parameters during thermal coagulation of ex vivo rabbit muscle. Magn. Reson. Med. 39(2), 198–203 (1998)
10. Conturo, T.E., Smith, G.D.: Signal to Noise in Phase Angle Reconstruction: Dynamic Range Extension Using Phase Reference Offsets. Magn. Reson. Med. 15, 420–437 (1990)
A Probabilistic Model for Haustral Curvatures with Applications to Colon CAD

John Melonakos¹, Paulo Mendonça², Rahul Bhotika², and Saad Sirohey³

¹ School of ECE, Georgia Institute of Technology, Atlanta, GA, USA
² GE Global Research, One Research Circle, Niskayuna, NY, USA
³ GE Healthcare, Waukesha, WI, USA
Abstract. Among the many features used for classification in computer-aided detection (CAD) systems targeting colonic polyps, those based on differences between the shapes of polyps and folds are most common. We introduce here an explicit parametric model for the haustra or colon wall. The proposed model captures the overall shape of the haustra and we use it to derive the probability distribution of features relevant to polyp detection. The usefulness of the model is demonstrated through its application to a colon CAD algorithm.
1 Introduction

With over half a million deaths, colorectal cancer was ranked as the fourth leading cause of cancer death worldwide in 2002 [3], and it is currently ranked as the second leading cause of cancer-related deaths in the United States [7]. Most colorectal cancers arise from benign colonic polyps, and their early detection can significantly increase survival rates [4]. Optical colonoscopy is part of the standard screening protocol for the detection of polyps in the colon [19], but the discomfort and long duration of this procedure have a negative impact on patient compliance [6]. Virtual colonoscopy (VC) or computed tomography colonography (CTC) has shown promise as a less invasive method for detecting polyps, with performance at least as good as that of optical colonoscopy [18]. Once computed tomography (CT) imaging and advanced visualization tools were introduced, the natural second step was the use of computer-aided detection (CAD) systems for automating the search for colonic polyps, and a number of CAD techniques have been developed in recent years. Early examples include the work of Yoshida et al. [28], in which principal curvatures were used to compute a shape index indicative of the roundness of polyps. Curvatures were also used by Vos et al. [27] and by Summers et al. [22,23]. More recently, modeling through spherical harmonics [10], surface normal overlap [17] and other curvature-based methods have been developed [1,26]. The use of small to moderate [5,8,26] and of large [25] feature sets followed by a more sophisticated classification mechanism has also been explored. In the list above, curvature and curvature-based measures are the features most commonly used for classification. In particular, [22,28,27,24,1,26] built explicit models with different degrees of complexity for the ranges of curvatures observed in colonic polyps, folds, and, occasionally, the haustra (or colon wall) itself. However, whereas folds and
polyps are modeled through highly sophisticated schemes, the haustra is either altogether omitted [27,1,26] or simply mentioned as a region of low curvature [22,28]. This paper introduces a novel model for the haustra. We use the same assumptions as [22,28] regarding the shape of the haustra, but we augment the model with a component driven by recent results in the theory of Gaussian random fields [14]. This allows for an accurate estimation of the probability distribution of curvatures of isosurfaces of the haustra, which can be naturally fed into any curvature-based CAD system aimed at detecting colonic polyps. Results with real data show the usefulness of the proposed model as applied to colon CAD.
2 Haustra Model

Let the volume image I be defined as a twice-differentiable mapping from V ⊂ R³ into R. For any given c, we define an isosurface Mc ⊂ V at the isovalue c as the set of points x satisfying I(x) = c and ∇I(x) ≠ 0. It can be shown that the principal curvatures of Mc at a point x can be computed in closed form as the eigenvalues of the matrix H̃ = N^T H N/‖∇I‖ [20, pg. 138], where ∇I is the gradient of I, H is the Hessian of I, and the columns of N form an orthonormal basis for the null space of ∇I. This computation can be carried out as the concatenation of a linear and a nonlinear step. As depicted in Fig. 1, the linear step comprises the computation of the gradient and the Hessian of the input image. The nonlinear step involves the matrix multiplications N^T H N, the computation of and division by the scalar ‖∇I‖, and the actual computation of the eigenvalues of the resulting 2 × 2 matrix. In practice, the linear step also includes smoothing the volume image, which is done to control the noise associated with the computation of the derivatives ∇I and H, but can also model the effects of the system point spread function.
Fig. 1. Step-by-step computation of curvatures on isosurfaces of volume images
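A compact numpy/scipy sketch of this pipeline is given below. It is our illustration of the computation, not the authors' implementation; it uses the sign convention of eq. (1) in Sec. 2.2 (a leading minus, so that bright structures yield negative curvatures), and the per-voxel Python loop is written for clarity, not speed.

```python
import numpy as np
from scipy import ndimage

def isosurface_curvatures(I, sigma=1.0, eps=1e-8):
    """Principal curvatures (kappa1, kappa2) of the isosurfaces of a 3D
    image: Gaussian-derivative gradient/Hessian (linear step), then the
    projected 2x2 eigenvalue problem of Sec. 2 (nonlinear step)."""
    I = np.asarray(I, dtype=float)
    ax = range(3)
    grad = np.stack([ndimage.gaussian_filter(I, sigma,
                     order=tuple(int(a == i) for a in ax)) for i in ax], axis=-1)
    hess = np.stack([np.stack([ndimage.gaussian_filter(I, sigma,
                     order=tuple(int(a == i) + int(a == j) for a in ax))
                     for j in ax], axis=-1) for i in ax], axis=-2)

    k1, k2 = np.zeros(I.shape), np.zeros(I.shape)
    for idx in np.ndindex(I.shape):
        g = grad[idx]
        ng = np.linalg.norm(g)
        if ng < eps:
            continue                   # isosurface undefined where grad I ~ 0
        # orthonormal basis of null(grad I): QR with a random completion,
        # which is full rank with probability 1
        Q, _ = np.linalg.qr(np.column_stack([g, np.random.randn(3, 2)]))
        Nmat = Q[:, 1:]
        k1[idx], k2[idx] = np.linalg.eigvalsh(-Nmat.T @ hess[idx] @ Nmat / ng)
    return k1, k2
```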
2.1 The Geometric Model

In [21], the haustra of the colon is defined as: “the sacculations of the colon, caused by the teniae, or longitudinal bands, which are slightly shorter than the gut so that the latter is thrown into tucks or pouches”. These sacculations are the curved segments of the colon wall. The morphological models in Langer et al. [11] justify the adoption
by [22,28] of a low-curvature surface representation for the haustra. However, such a model, by itself, is not enough to specify the distribution of curvatures that one expects at the colon wall. In order to achieve this, we consider a volume image I0(x) representing any particular low-curvature geometric model of the haustra. Now let h0(x) be a corruption of I0(x) by additive white noise η0(x). Following the pipeline described in Fig. 1 with h0(x) as the input image, we obtain a smoothed volume h(x) = I(x) + η(x), where I(x) and η(x) are smooth versions of I0(x) and η0(x). In particular, η(x) is still Gaussian noise, but no longer white. Assuming that the variance of the input (zero-mean) white noise is σ_η², the autocorrelation function R(x) of the filtered noise will be R(x) = σ_η² exp(−x^T Σ⁻¹ x/4) [2], where Σ is the covariance matrix of the smoothing kernel. For an isotropic kernel, we have Σ = σ² I. Denoting by λ(A) the function that maps a matrix A to its eigenvalues, the nonlinear step of the curvature computation yields

κ = λ(−N^T H_h N / ‖∇h‖) = λ(−N^T (H_I + H_η) N / ‖∇I + ∇η‖),    (1)
where κ = (κ1, κ2) are the principal curvatures of h(x) at the point x.

2.2 Distribution of Curvatures

The stochastic differential equation in (1) can be simplified by making reasonable assumptions about the shape and appearance of the haustra as observed in CT images. First, the matrix N^T H_I N must have, on average, a small Frobenius norm compared to that of N^T H_η N. To demonstrate this, observe that, per the discussion in Section 2.1, the magnitude of the eigenvalues of N^T H_I N is small, reflecting the low curvature of the haustra. Since N^T H_I N is a symmetric matrix, its singular values must also be small. This last observation indicates that a Taylor expansion of (1) around N^T H_I N = 0 yields a good approximation for κ, i.e.,

κ ≈ λ(−N^T H_η N / ‖∇I + ∇η‖) + vec(N^T H_I N)^T · [∂κ/∂vec(N^T H_I N)]_{H_I = 0}    (2)
  ≈ λ(−N^T H_η N / ‖∇I + ∇η‖) + λ(−N^T H_I N / ‖∇I + ∇η‖),    (3)

where vec(A) indicates the vector built out of the matrix A by stacking its columns. Indicating the Kronecker product by ⊗ and the i-th eigenvalue of the (symmetric) matrix A by λi, the equality of (2) and (3) can be verified by using the relations vec(ABC) = (C^T ⊗ A)vec(B) and ∂λi(A)/∂vec(A)^T = v_i ⊗ v_i, where v_i is the eigenvector associated with λi [13]. We further simplify (1) by assuming that the magnitude of ∇I is, on average, large compared to that of ∇η. This assumption simply reflects the fact that the haustra corresponds to a sharp interface between air and soft tissue. Hence, (1) becomes

κ ≈ λ(N^T H_η N)/‖∇I‖ + κ0,    (4)
where κ0 = λ(−N^T H_I N / ‖∇I + ∇η‖). Note that κ is a random variable, and, from (4), we can see that its probability density p(κ) is given by

p(κ) = p_{λη}((κ − κ0)‖∇I‖)·‖∇I‖²,    (5)

where p_{λη} is the probability density of the random variable λη = λ(N^T H_η N). In order to derive p(κ) we adapt recent results in the theory of Gaussian random fields [14], which establish that λη is distributed according to a linear combination of Gaussian and chi-distributed independent random variables. More precisely, for an isotropic smoothing kernel with covariance matrix Σ = σ² I, λη ∼ α[N(0, 2) ∓ χ(2)], with α = σ_η/(2σ²), resulting in

p_{λη}(λ) = (1/(2α²)) p_{N(0,2)}((λ2 + λ1)/(2α)) p_{χ(2)}((λ2 − λ1)/(2α))
          = (1/(8√(2π) α³)) (λ2 − λ1) exp(−(5λ1² − 6λ2λ1 + 5λ2²)/(32α²))    (6)
Assuming a simple spherical model of radius R for the input image I, we have κ0 = (−1/R, −1/R). Therefore, the final probability distribution of the haustral curvatures is given by

p(κ) = (1/(8√(2π) α³)) (κ2 − κ1) exp(−((5κ1² − 6κ2κ1 + 5κ2²)R² + 4(κ1 + κ2)R + 4)/(32R²α²)),    (7)

with α = σ_η/(2σ²‖∇I‖).
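For reference, eq. (7) is straightforward to evaluate numerically; a short sketch (argument names are ours) that can be used to plot or tabulate the model density over the κ1-κ2 plane:

```python
import numpy as np

def haustra_curvature_pdf(k1, k2, R, sigma_eta, sigma, grad_norm):
    """Density of eq. (7), defined for curvature pairs with k1 <= k2."""
    a = sigma_eta / (2.0 * sigma**2 * grad_norm)               # alpha
    quad = (5*k1**2 - 6*k1*k2 + 5*k2**2) * R**2 + 4*(k1 + k2) * R + 4
    return (k2 - k1) * np.exp(-quad / (32 * R**2 * a**2)) \
           / (8 * np.sqrt(2 * np.pi) * a**3)
```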
3 Experiments and Results

As a first step to demonstrate the usefulness of the proposed model, we captured principal curvature data from random samples of pedunculated, sessile and flat polyps, haustral folds, and haustra. In Fig. 2, we display this scatter data over the decision boundaries generated from our haustra model (yellow) and the models for pedunculated polyps (red), sessile and flat polyps (green), and haustral folds (blue) described in [1]. The overlaid yellow scatter data points provide visual validation of the haustra model decision boundary, as shown in Fig. 2a. In Fig. 2b, we show the decision boundaries obtained by considering only two model categories: for (polyps) and against (haustra and folds). In Fig. 3, we provide another depiction of the benefit derived from the addition of the haustra model to the colon CAD system. Figure 3d shows an image region centered at an actual polyp. Figure 3a shows the result of a Bayesian competition [9] between the combined polyp model (pedunculated, sessile and flat) and the fold model, without inclusion of the haustra model, in the same region. The strongest responses (magenta) are indeed in the polyp, but there are weaker responses (blue) scattered all around the colon wall. Even though most of those can be discarded through thresholding, such a “clean-up” algorithm can remove true positive detections. Figure 3b shows the same region as in Fig. 3a, but now with the haustra model included. The haustra model clearly plays a significant role in reducing the polyp responses on the haustra without affecting
Fig. 2. Partitioning of κ1-κ2 space with overlaid scatter plot data: (a) model-based partitioning of κ1-κ2 space; red = pedunculated polyps, green = sessile and flat polyps, blue = haustral folds, yellow = haustra; (b) binary partitioning of κ1-κ2 space; red = polyp responses, blue = non-polyp responses
the response at the polyp itself. In effect, the haustra model eliminates many potential false positives. In Fig. 3c we see the haustra responses alone, obtained by competing the haustra model against all the other models, and the reason for the differences between Fig. 3a and Fig. 3b becomes clear. Finally, we show the application of our model to the colon CAD system described in [1]. The test data consisted of a subset of 36 CT volumes from the WRAMC dataset, from which 23 polyps with diameter above 6 mm were marked by expert radiologists and confirmed by optical colonoscopy.¹ The protocol for patient preparation consisted of oral administration of 90 ml of sodium phosphate and 10 mg of bisacodyl, with a clear-liquid diet that included 500 ml of barium for stool tagging and 120 ml of diatrizoate meglumine and diatrizoate sodium for fluid tagging [18]. The complete WRAMC dataset comprises many more images, but unfortunately ground truth is provided only as a distance from the rectum along the colon centerline, and the precise image location of polyps in this dataset must still be determined by an expert. It is important to note that this dataset is significantly more challenging for CAD than the “fully prepped” data commonly used throughout the literature (e.g., all of the CAD work mentioned in the introduction, with the exception of the work of Summers et al., such as [24]). However, if the use of CTC demands full cleansing of the colon, patient compliance may still be an issue, since the strongest factor affecting acceptance of colonoscopy is the extent of bowel preparation [6]. Minimally invasive protocols such as the one applied in the collection of the WRAMC data mitigate this problem [12], but pose new difficulties for the interpretation of the images [4]. In Fig. 4, we present a free-response receiver operating characteristic (FROC) curve demonstrating the performance of a colon CAD system that makes use of the proposed haustra model.
¹ This data has been provided courtesy of Dr. Richard Choi, Virtual Colonoscopy Center, Walter Reed Army Medical Center.
Fig. 3. Visual haustra results: (a) polyp responses without the haustra model in a window centered at an actual polyp, (b) polyp responses with the haustra model, (c) haustra responses alone, and (d) raw image data
Fig. 4. FROC curve for the performance of a colon CAD system using the proposed haustra model. A sensitivity of 83% is achieved at a cost of 6.25 false positive detections per case.
A sensitivity of 83% is achieved at a cost of 6.2 false positive detections per case, and the running time is on the order of 10 min for a 512 × 512 × 700-voxel CT volume with a research prototype implemented with ITK.
4 Conclusions and Future Work

In this work, we have introduced a novel probabilistic model for the curvature of isosurfaces of the haustra. An expression for the probability density function of such curvatures
was provided by considering Gaussian random perturbations of a geometric abstraction of the colon wall. The model augments the set of models developed in [1] for applications in colon CAD, and was demonstrated in a specific colon CAD application. In the current formulation the radius R of the haustra is a fixed parameter representative of the expected radius of the insufflated colon. However, we could account for variations in the size and shape of the colonic haustra by marginalizing over the parameter R. The prior for this marginalization will depend upon either training data or prior clinical knowledge of insufflated haustra radii. In [15] we have shown how to compute the probability distribution of curvatures for a class of ellipsoidal surfaces, suggesting a mechanism to achieve such a generalization. An interesting debate is presented in [16], which, although in the context of lung CAD, is relevant to this work. From that discussion it is clear that many radiologists see CAD not necessarily as a tool to improve upon the performance of the best radiologists, but as a means to standardize or regularize the results of radiologists with varying degrees of experience. To validate such an expectation, however, it is necessary to have data read by multiple radiologists, which is not the case with the WRAMC data set.
References

1. Bhotika, R., Mendonça, P.R.S., Sirohey, S.A., Turner, W.D., Lee, Y.-L., McCoy, J.M., Brown, R.E.B., Miller, J.V.: Part-based local shape models for colon polyp detection. In: Medical Image Computing and Computer-Assisted Intervention, Copenhagen, Denmark, pp. 479–486 (2006)
2. Couch II, L.W.: Digital and Analog Communication Systems, 3rd edn. Macmillan Publishing Company, New York (1990)
3. Ferlay, J., Bray, F., Pisani, P., Parkin, D.M.: GLOBOCAN 2002: Cancer incidence, mortality and prevalence worldwide. Technical report, IARC CancerBase No. 5, version 2.0. IARCPress, Lyon, France (2004), http://www-dep.iarc.fr/
4. Ferrucci, J.T.: Colon cancer screening with virtual colonoscopy: Promise, polyps, politics. Am. J. Roentgenol. 177, 975–988 (2001)
5. Göktürk, S.B., Tomasi, C., Burak, A., Beaulieu, C.F., Paik, D.S., Brooke Jeffrey Jr., R., Yee, J., Napel, S.: A statistical 3-D pattern processing method for computer-aided detection of polyps in CT colonography. Medical Imaging 20(12), 1251–1260 (2001)
6. Harewood, G.C., Wiersema, M.J., Melton III, L.J.: A prospective, controlled assessment of factors influencing acceptance of screening colonoscopy. The American Journal of Gastroenterology 97(12), 3186–3194 (2002)
7. Jemal, A., Siegel, R., Ward, E., Murray, T., Xu, J., Thun, M.J.: Cancer statistics. CA Cancer J. Clin. 57(1), 43–66 (2007)
8. Jerebko, A., Lakare, S., Cathier, P., Periaswamy, S., Bogoni, L.: Symmetric curvature patterns for colonic polyp detection. In: Medical Image Computing and Computer-Assisted Intervention, Copenhagen, Denmark, pp. 169–176 (2006)
9. Kass, R., Raftery, A.: Bayes factors. J. Am. Stat. Assoc. 90(430), 773–795 (1995)
10. Kiss, G., Van Cleynenbreugel, J., Drisis, S., Bielen, D., Marchal, G., Suetens, P.: Computer aided detection for low-dose CT colonography. In: Medical Image Computing and Computer-Assisted Intervention, Palm Springs, USA, pp. 859–867 (October 2005)
11. Langer, P., Takacs, A.: Why are taeniae, haustra, and semilunar folds differentiated in the gastrointestinal tract of mammals, including man? J. Morph. 259(3), 308–315 (2004)
12. Lefere, P., Gryspeerdt, S., Baekelandt, M., Van Holsbeeck, B.: Laxative-free CT colonography. Am. J. Roentgenol. 183(4), 945–948 (2004) 13. Magnus, J.R., Neudecker, H.: Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, New York (1995) 14. Mendonca, P.R.S., Bhotika, R., Miller, J.V.: Probability distribution of curvatures of isosurfaces in Gaussian random fields, arXiv (May 2007) 15. Mendonça, P.R.S., Bhotika, R., Zhao, F., Miller, J.V.: Lung nodule detection via Bayesian voxel labeling. In: Karssemeijer, N., Lelieveldt, B. (eds.) Information Processing in Medical Imaging, Kerkrade, The Netherlands, pp. 134–145 (2007) 16. Mulshine, J.L.: Clinical issues in the management of early lung cancer. Clin. Cancer Res. 11(13), 4993s–4998 (2005) 17. Paik, D.S., Beaulieu, C.F., Rubin, G.D., Acar, B., Jeffrey Jr., R.B., Yee, J., Dey, J., Napel, S.: Surface normal overlap: A computer-aided detection algorithm with application to colonic polyps and lung nodules in helical CT. IEEE Trans. Medical Imaging 23(6), 661–675 (2004) 18. Pickhardt, P.: Target lesion: The radiologist’s perspective. In: Sixth International Symposium on Virtual Colonoscopy, Boston, MA, pp. 60–62 (2005) 19. Smith, R.A., Cokkinides, V., Eyre, H.J.: American cancer society guidelines for the early detection of cancer. CA Cancer J. Clin. 56(1), 11–25 (2006) 20. Spivak, M.: A Comprehensive Introduction to Differential Geometry, 3rd edn., vol. III. Publish or Perish, Houston, TX, USA (1999) 21. Spraycar, M. (ed.): PDR Medical Dictionary, 1st edn. Williams and Wilkins, Baltimore, MD (1995) 22. Summers, R.M., Beaulieu, C.F., Pusanik, L.M., Malley, J.D., Jeffrey, J., Brooke, R., Glazer, D.I., Napel, S.: Automated polyp detector for CT colonography: Feasibility study. Radiology 216(1), 284–290 (2000) 23. Summers, R.M., Johnson, C.D., Pusanik, L.M., Malley, J.D., Youssef, A.M., Reed, J.E.: Automated polyp detection at CT colonography: Feasibility assessment in a human population. Radiology 219, 51–59 (2001) 24. Summers, R.M., Yao, J., Pickhardt, P.J., et al.: Computed tomographic virtual colonoscopy computer-aided polyp detection in a screening population. Gastroenterology 129, 1832–1844 (2005) 25. Tu, Z., Zhou, X.S., Bogoni, L., Barbu, A., Comaniciu, D.: Probabilistic 3D polyp detection in CT images: The role of sample alignment. In: Proc. Conf. Computer Vision and Pattern Recognition, New York, USA, vol. II, pp. 1544–1551 (June 2006) 26. van Wijk, C., van Ravesteijn, V.F., Vos, F.M., Truyen, R., de Vries, A.H., Stoker, J., van Vliet, L.J.: Detection of protrusions in curved folded surfaces applied to automated polyp detection in CT colonography. In: Medical Image Computing and Computer-Assisted Intervention, Copenhagen, Denmark, pp. 471–478 (October 2006) 27. Vos, F.M., Serlie, I.W.O., van Gelder, R.E., Post, F.H., Truyen, R., Gerritsen, F.A., Stoker, J., Vossepoel, A.M.: A new visualization method for virtual colonoscopy. In: Medical Image Computing and Computer-Assisted Intervention, Berlin, pp. 645–654 (2001) 28. Yoshida, H., Näppi, J.: Three-dimensional computer-aided diagnosis scheme for detection of colonic polyps. IEEE Trans. Medical Imaging 20(12), 1261–1274 (2001)
LV Motion Tracking from 3D Echocardiography Using Textural and Structural Information

Andriy Myronenko¹, Xubo Song¹, and David J. Sahn²

¹ Dept. of CSEE, OGI School of Science and Engineering
² Cardiac Fluid Dynamics and Imaging Laboratory
Oregon Health and Science University, 20000 NW Walker Road, Beaverton, OR 97006, USA
[email protected], [email protected], [email protected]
Abstract. Automated motion reconstruction of the left ventricle (LV) from 3D echocardiography provides insight into myocardium architecture and function. Low image quality and artifacts make 3D ultrasound image processing a challenging problem. We introduce a LV tracking method, which combines textural and structural information to overcome the image quality limitations. Our method automatically reconstructs the motion of the LV contour (endocardium and epicardium) from a sequence of 3D ultrasound images.
1 Introduction
The analysis of myocardium deformation provides insight into the heart's architecture and function. Medical imaging techniques, such as MRI, CT and recently 3D ultrasound, allow the acquisition of dynamic sequences of 3D images (3D+T) during a complete cardiac cycle. Processing these images can provide quantitative measurements such as strain, wall thickening, torsion, volume and ejection fraction, which can be used to evaluate the elasticity and contractility of the myocardium. For instance, ischemic or infarcted segments of the heart are typically associated with reduced regional elasticity and contractility. Such measurements may also serve as early sub-clinical markers for ventricular dysfunction and myocardial disease [1,2]. 3D echocardiography provides an attractive alternative to MRI and CT because of its portability, bedside applicability, low cost, and safety benefits [1]. Despite certain advances in technology, 3D ultrasound images are still of low quality, with many artifacts such as attenuation, speckle, shadows, and signal dropout. A relatively small amount of research has been done on motion analysis from 3D echocardiography [3,4,5]. Montagnat and colleagues [4] develop a 3D model-based ultrasound segmentation; the method filters the image by (4D) anisotropic diffusion and fits the LV model to the high intensity gradients. Sanchez-Ortiz and collaborators [5] use a multi-scale fuzzy-clustering method together with phase-based acoustic feature detection to fit the LV model. We call such methods, which incorporate a model-based boundary segmentation, structure based methods.
The low image contrast of 3D echocardiography makes the LV boundaries challenging to segment. Instead of tracking the LV boundaries, speckle tracking was proposed to analyze myocardium deformation [6,7]. Speckle is a texture pattern formed by the interference of the backscattered echoes produced by ultrasonic scatterers in tissue and blood. The speckles follow the motion of the myocardium and remain fairly constant when the temporal sampling rate is adequately high, which makes tracking possible. We call such methods, which incorporate ultrasound beam amplitude/phase, post-processed intensity or its transformations, texture based methods. We present a method for tracking the LV motion, given a time sequence of 3D ultrasound images. We use two sources of information for tracking: textural and structural. We use textural information (e.g., speckle intensity) to align two consecutive images. We use structural information to ensure that alignment using textural information locates the LV contour at the heart boundaries. Thus, we seek the position of the LV contour in the next time volume such that its texture coincides with the texture of its previous position, and such that it is located on the heart's boundary. The implementation of our method has a low computational complexity, which is desirable in clinical settings. Combining textural and structural information can greatly improve the accuracy of motion tracking in 3D ultrasound imaging.
2 Combining Textural and Structural Information
Consider two 3D ultrasound images, It(x, y, z) and It+1(x′, y′, z′), obtained at time t and t + 1, respectively. Assume that initial boundary segmentations of the LV endocardium and epicardium are available at time t. These two boundaries define the LV contour Ct. Boundary initialization is itself a complicated problem; we describe a procedure in Section 3.4. For tracking, our goal is to find the contour position Ct+1 at time t + 1. Using the textural information, we register image It with It+1 and use the resulting transformation to transform the initial contour Ct to its new position Ct+1. The goal of image registration is to find the underlying transformation T : (x, y, z) → (x′, y′, z′) that maps any point of one image into its corresponding point in the other image. In our implementation, we use image intensity as a textural representation of the images; other features, such as image gradient or moments, are also appropriate. Using the structural information, we want to align the contour Ct to the high intensity gradient region of the image It+1. We adopt the active contour segmentation approach [8], while using the same FFD transformation T (Ct+1 = T(Ct)) as for the image registration. The motion of the left ventricle is complex. We choose a Free Form Deformation (FFD) model [9] to parameterize the transformation T. FFD has been successfully used for non-rigid medical image registration [10] and for tracking in cardiac MRI [11]. The basic idea of FFD is to deform a 3D object by manipulating a mesh of control points. Since FFD is a volumetric transformation, we can transform
Fig. 1. Schema of the method. (a) Given two 3D ultrasound images It and It+1, and the initial position of the contour Ct, find the contour position Ct+1. (b) We use FFD to parametrize the transformation T by a mesh of control points (+). By manipulating the control point positions, we want to solve two tasks simultaneously: deform the image It to align it with It+1, and deform the contour Ct to align it to the high intensity gradient region of the image It+1.
several objects (e.g., a 3D image and the endocardium and epicardium contours) simultaneously with a single FFD. The main advantage of FFD is that the complex non-rigid transformation is defined by a small number of parameters (the control point positions). Our tracking method is a joint non-rigid image registration and active contour segmentation, where we use a single FFD parametrization of the transformation T. We minimize the energy function

E = Etext + λ Estruct    (1)

where Etext is a non-rigid image registration objective function that represents texture based tracking, and Estruct is an active contour energy function that represents structure based tracking. The parameter λ represents a trade-off between the influence of the textural and structural information. Figure 1 illustrates the schema of the method.
3 Method

3.1 Non-rigid Image Registration Using FFD
We use the FFD framework [9,10] to parametrize the non-rigid transformation T. We denote the image volume as Ω = {(x, y, z) | 0 ≤ xn ≤ N, 0 ≤ ym ≤ M, 0 ≤ zk ≤ K}. We place an nx × ny × nz mesh of equally spaced control points p_{i,j,l} over the image domain. The number of control points defines the complexity of the transformation. Then, the transformation T is a 3D tensor product of 1D cubic B-splines:

(x′, y′, z′) = T(x, y, z; p) = Σ_{k=0}^{3} Σ_{m=0}^{3} Σ_{n=0}^{3} B_k(u_x) B_m(v_y) B_n(w_z) p_{i+k, j+m, l+n}    (2)
where i = ⌊x/nx⌋ − 1, j = ⌊y/ny⌋ − 1, l = ⌊z/nz⌋ − 1, u_x = x/nx − ⌊x/nx⌋, v_y = y/ny − ⌊y/ny⌋, w_z = z/nz − ⌊z/nz⌋, and B_k represents the k-th cubic B-spline basis function:

B0(u) = (1 − u)³/6
B1(u) = (3u³ − 6u² + 4)/6
B2(u) = (−3u³ + 3u² + 3u + 1)/6
B3(u) = u³/6
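As a concrete illustration, a didactic Python sketch of eq. (2) follows; the explicit loop form is ours for clarity (the paper instead precomputes the sparse matrix B of eq. (3) below), and it assumes the control grid P is padded so the indices i + k etc. stay in bounds.

```python
import numpy as np

def bspline_basis(u):
    """The four cubic B-spline basis functions B_0..B_3 of eq. (2)."""
    return np.array([(1 - u)**3 / 6,
                     (3*u**3 - 6*u**2 + 4) / 6,
                     (-3*u**3 + 3*u**2 + 3*u + 1) / 6,
                     u**3 / 6])

def ffd_point(x, y, z, P, nx, ny, nz):
    """Evaluate the FFD of eq. (2) at one point; P[i, j, l] is a 3-vector."""
    i, j, l = int(x // nx) - 1, int(y // ny) - 1, int(z // nz) - 1
    Bu = bspline_basis(x / nx - x // nx)
    Bv = bspline_basis(y / ny - y // ny)
    Bw = bspline_basis(z / nz - z // nz)
    out = np.zeros(3)
    for k in range(4):
        for m in range(4):
            for n in range(4):
                out += Bu[k] * Bv[m] * Bw[n] * P[i + k, j + m, l + n]
    return out
```

Since the four basis functions sum to one for any u, the transformation blends the 4 × 4 × 4 neighboring control points smoothly.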
Putting all control points in a matrix P of size nxnynz × 3, we can rewrite Eq. (2) as

X′ = T(X, P) = BP    (3)
where B, of size MNK × nxnynz, is a sparse matrix of basis function values with MNK × 64 non-zero elements. We precompute the matrix B, taking advantage of the known 3D grid structure of digital images. The matrix X, of size MNK × 3, denotes the coordinates of all voxels in the image It. We want to align image It with It+1 by minimizing the following energy function:

Etext(P) = Σ_{n,m,k=0}^{N,M,K} (It+1(T(xn, ym, zk; P)) − It(xn, ym, zk))² = Σ (It+1(T(X, P)) − It(X))² = Σ (It+1(BP) − It)²    (4)
where It+1(BP) represents a column vector of image intensities at the coordinates X′ = BP, and the summation is over all image voxels. Little attention has been paid to image registration for 3D echocardiography motion analysis [12]. To the best of our knowledge, this paper is the first to apply FFD to 3D echocardiography non-rigid registration and motion tracking.

3.2 Active Contour Segmentation
We use structural information to assist the image registration process by applying the active contour approach [8] to align the contour Ct to the high intensity gradient region of the image It+1. We use the same transformation T as for the image registration to parametrize the contour deformation: Ct+1 = T(Ct, P). The active contour moves to minimize its energy function, which consists of internal and external energy. The internal energy keeps the active contour smooth; the FFD parametrization of T innately constrains the contour motion by construction [11]. The external energy has lower intensity values at the desirable position of the active contour. For the external energy, we use the negative gradient norm of the 3D ultrasound images preprocessed by 3D anisotropic diffusion [13]. This way we remove the texture information, preserve the contrast, and emphasize the boundaries [4]. To reduce the computational complexity, we discretize the initial contour Ct and precompute the B-spline basis function values in the matrix D. We minimize the energy function:

Estruct(P) = −Σ Ft+1(Ct+1) = −Σ Ft+1(T(Ct, P)) = −Σ Ft+1(DP)    (5)

where Ft+1 denotes the feature map of the image It+1.
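A possible construction of this feature map, sketched below, uses Perona-Malik diffusion [13] with the exponential conductance and the K = 0.2, 20-iteration setting reported in Sec. 4; the time step and the use of np.gradient are our choices, not the authors'.

```python
import numpy as np

def feature_map(I, K=0.2, iters=20, dt=0.1):
    """F = -||grad(anisotropically diffused image)||, the external energy
    of eq. (5): strongly negative values at strong boundaries."""
    I = np.asarray(I, dtype=float).copy()
    for _ in range(iters):
        grads = np.gradient(I)
        flux = [g * np.exp(-(g / K) ** 2) for g in grads]   # exponential conductance
        I += dt * sum(np.gradient(f, axis=a) for a, f in enumerate(flux))
    return -np.linalg.norm(np.stack(np.gradient(I)), axis=0)
```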
LV contour tracking
– Initialize the contour C1 in the first volume (Sec. 3.4)
– Uniformly allocate the mesh of B-spline control points P over the image domain.
– Precompute the B-spline matrices B and D.
– For t = 2 : T (through all volumes):
  • Minimize the objective function (Eq. 6) with respect to the control point positions Pt, using the steepest descent method (Sec. 3.3).
  • The new contour position is Ct = DPt

Fig. 2. Pseudo-code of LV contour tracking in a 3D ultrasound image sequence
3.3 Optimization
To minimize the energy function in Eq. (1), we rewrite it as

E(P) = Σ (It+1(BP) − It)² − λ Σ Ft+1(DP)    (6)

Taking the gradient of the function with respect to P, we obtain

∂E(P)/∂P = 2 B^T [It+1(T(X, P)) − It(X)] ∇It+1 − λ D^T ∇Ft+1(T(X, P))    (7)

We use a steepest descent optimization method to minimize the energy function.
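A compact sketch of this descent loop is given below; it is our single-scale reading of eqs. (6)-(7) rather than the authors' Matlab implementation, with trilinear sampling standing in for whatever interpolation they use and a fixed step size chosen purely for illustration.

```python
import numpy as np
from scipy import ndimage

def track_step(P, B, D, I_t_vals, I_t1, F_t1, lam=0.1, step=1e-6, iters=150):
    """Steepest descent on eq. (6) with the analytic gradient of eq. (7).
    B (MNK x n_ctrl) and D (n_pts x n_ctrl): precomputed B-spline matrices;
    P (n_ctrl x 3): control points; I_t_vals: fixed-image intensities at the
    voxel grid; I_t1, F_t1: moving image and its feature map (3D arrays)."""
    gI = np.stack(np.gradient(I_t1), axis=-1)     # grad I_{t+1}
    gF = np.stack(np.gradient(F_t1), axis=-1)     # grad F_{t+1}

    def sample(vol, pts):                         # trilinear sampling at pts (n, 3)
        return ndimage.map_coordinates(vol, pts.T, order=1)

    for _ in range(iters):
        Xw = B @ P                                               # eq. (3)
        r = sample(I_t1, Xw) - I_t_vals                          # residuals
        gI_w = np.stack([sample(gI[..., a], Xw) for a in range(3)], axis=1)
        gF_w = np.stack([sample(gF[..., a], D @ P) for a in range(3)], axis=1)
        P = P - step * (2 * (B.T @ (r[:, None] * gI_w))
                        - lam * (D.T @ gF_w))                    # eq. (7)
    return P
```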
3.4 Contour Initialization
Our approach requires an initial contour segmentation Ct. We choose an approach similar to that of Bardinet et al. [11] to parametrize the LV contour. We start from two parametric half-ellipsoids, parametrized by 9 parameters: 6 parameters for the rigid transformation and 3 for scaling. We use the active contour approach, similar to Sec. 3.2, to roughly prealign the two half-ellipsoids to the endocardium and epicardium. Then we fix the rigid motion and scaling parameters, parametrize both half-ellipsoids by a single FFD, and refine the segmentation. The FFD, rigid and scaling parameters define the parametric form of the initial contour Ct. The contour is parametric, because parametric objects remain parametric after FFD. The FFD used for the initial segmentation is not related to the FFD parametrization used in our tracking method. Once we obtain the initial segmentation Ct, all its parameters remain constant during tracking. We use another set of B-spline control points to parametrize the transformation T. Furthermore, we discretize Ct to reduce the computational complexity by precomputing the corresponding B-spline basis function values in the matrix D. We summarize our LV tracking method in Fig. 2.
4 Results
We test the method on a set of 3D ultrasound sequences (3D+T) from open-chest porcine hearts. The scans are acquired using a Philips Sonos 7500 with EKG
Fig. 3. Boundary initialization procedure. The first column shows the segmentation results using two half-ellipsoids; three perpendicular 2D cross planes are shown. The second column shows the FFD refinement of the segmentation. The last image shows the initial contour Ct.
gating. We use detected post-scan-converted images with echo envelope signals, represented in Cartesian coordinates. The spatial resolution is 160 × 144 × 192 voxels. Ten different scans (3D+T) are taken from a single pig. Each scan consists of ten 3D volumes. We run the experiments for each of the scans. We implement the algorithm in Matlab and test it on a Pentium 4 CPU at 3 GHz with 4 GB of RAM. For the anisotropic diffusion parameters, we use the exponential conductance function with K = 0.2 and limit the number of iterations to 20 [13]. We place the FFD control points uniformly with a 10-voxel spacing, which results in a 16 × 15 × 20 mesh. We use the trade-off parameter λ = 0.1, which we found empirically to give the best performance. On average, the tracking algorithm between two volumes converges in less than 10 minutes and requires around 150 iterations. We show the initialization procedure for a particular scan in Fig. 3. First, we use the active contour approach described in Sec. 3.4 to segment the images using two half-ellipsoids parametrized by the rigid motion. Second, we use FFD to refine the contour segmentation Ct. Having the initial contour, we track the LV position through the sequence. We show the recovered LV motion in Fig. 4. For validation, we define 12 points on the endocardium and epicardium and manually track them across the sequence. We ask three interpreters to track the points manually, given their initial locations. We provide them with a tool to visualize and select the points in an arbitrary 2D cross section of the 3D volume. The cross-expert variation is 0.6315 voxels, averaged over time. We use the mean values of the obtained point trajectories as the ground truth of the point motion. We compare our automated tracking against the ground truth: the tracking error (averaged over all scans, volumes and number of speckles) is 1.0311 ± 0.6265 voxels.
Fig. 4. The contour position found during the cardiac cycle (10 consecutive volumes). The LV achieves maximum contraction at volume 3, then dilates (diastolic phase) up to volume 9 and starts contracting again (systolic phase).

Table 1. Mean-square error (MSE) of the tracking results using the combined objective function, only textural, only structural, or without registration

Objective function      MSE (voxels)
Etext + λEstruct        1.0311 ± 0.6265
Etext                   3.2141 ± 1.3442
Estruct                 10.2571 ± 4.5941
Without registration    17.2331 ± 5.1224
We also consider the two extreme cases λ = 0 and λ = ∞, which correspond to using only textural or only structural information. For instance, with λ = 0, we optimize only the Etext objective function: we register all images and use the transformation to propagate the motion of the initial contour Ct. With λ = ∞, we successively segment the images starting from the initial contour Ct. Table 1 shows the mean square error (MSE) of the tracking results using only textural information, only structural information, both objective functions, and without any registration (initial positions remain constant). Our experiments show that accurate motion tracking in 3D ultrasound benefits from using both textural and structural information.
5 Discussion and Conclusions
We present a method for LV motion tracking in 3D ultrasound. The method combines textural and structural information to overcome the low 3D ultrasound image quality. We use non-rigid image registration, parametrized by FFD,
to take advantage of the textural information. We also use model-based active contour segmentation to take advantage of the structural information. We show the optimization algorithm for the joint non-rigid registration and segmentation. Our non-rigid image registration implementation uses a sum-of-squared-differences similarity function, which implies a Gaussian noise assumption between the images. Other similarity functions, such as those based on the Rayleigh multiplicative noise model, can also be used within this framework. We use 3D ultrasound B-mode images. The alternative of using radio frequency data may provide extra information in the axial direction, but the increased computational load may be prohibitive for real applications. Our method has a low computational complexity due to the FFD parametrization with precomputed basis function values for registration and segmentation. We demonstrate the performance of the method on 10 different 3D echocardiography scans from open-chest porcine hearts; the method shows accurate and robust performance.
References

1. Lang, M., Mor-Avi, V., Sugeng, L., Nieman, P., Sahn, J.: Three-dimensional echocardiography. Journal of the American College of Cardiology 48(10), 2053–2069 (2006)
2. Papademetris, X., Sinusas, A.J., Dione, D.P., Constable, R.T., Duncan, J.S.: Estimation of 3D left ventricular deformation from medical images using biomechanical models. IEEE Trans. Med. Imaging 21(7), 786–800 (2002)
3. Noble, A.J., Boukerroui, D.: Ultrasound image segmentation: a survey. IEEE Trans. Med. Imaging 25(8), 987–1010 (2006)
4. Montagnat, J., Sermesant, M., Delingette, H., Malandain, G., Ayache, N.: Anisotropic filtering for model-based segmentation of 4D cylindrical echocardiographic images. Pattern Recognition Letters, Special Issue on Ultrasonic Image Processing and Analysis 24(4-5), 815–828 (2003)
5. Sanchez-Ortiz, G.I., Declerck, J., Mulet-Parada, M., Noble, J.A.: Automating 3D echocardiographic image analysis. In: Delp, S.L., DiGoia, A.M., Jaramaz, B. (eds.) MICCAI 2000. LNCS, vol. 1935, pp. 687–696. Springer, Heidelberg (2000)
6. Chen, X., Xie, H., Erkamp, R., Kim, K., Jia, C., Rubin, J.M., O'Donnell, M.: 3-D correlation-based speckle tracking. Ultrasonic Imaging 27, 21–36 (2005)
7. Song, X., Myronenko, A., Sahn, D.J.: Speckle tracking in 3D echocardiography with motion coherence. In: Computer Vision and Pattern Recognition (CVPR) (2007)
8. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active contour models. International Journal of Computer Vision 1(4), 321–331 (1988)
9. Sederberg, T.W., Parry, S.R.: Free-form deformation of solid geometric models. In: SIGGRAPH, pp. 151–160 (1986)
10. Rueckert, D., Sonoda, L.I., Hayes, C., Hill, D.L.G., Leach, M.O., Hawkes, D.J.: Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Trans. Med. Imaging 18(8), 712–721 (1999)
11. Bardinet, E., Cohen, L., Ayache, N.: Tracking and motion analysis of the left ventricle with deformable superquadrics. Medical Image Analysis 1(2) (1996)
12. Makela, T., Clarysse, P., Sipila, O., Pauna, N., Pham, Q.C., Katila, T., Magnin, I.: A review of cardiac image registration methods. IEEE Trans. Med. Imaging 21(9), 1011–1021 (2002)
13. Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Machine Intell. 12(7), 629–639 (1990)
A Novel 3D Multi-scale Lineness Filter for Vessel Detection

H.E. Bennink¹, H.C. van Assen¹, G.J. Streekstra², R. ter Wee², J.A.E. Spaan², and B.M. ter Haar Romeny¹

¹ Biomedical Image Analysis, Faculty of Biomedical Engineering, Eindhoven University of Technology, The Netherlands
² Department of Medical Physics, Academic Medical Center, University of Amsterdam, The Netherlands
Abstract. The branching pattern and geometry of coronary microvessels are of high interest to understand and model the blood flow distribution and the processes of contrast invasion, ischemic changes and repair in the heart in detail. Analysis is performed on high resolution, 3D volumes of the arterial microvasculature of entire goat hearts, which are acquired with an imaging cryomicrotome. Multi-scale vessel detection is an important step required for a detailed quantitative analysis of the coronary microvasculature. Based on visual inspection, the derived lineness filter shows promising results on real data and digital phantoms, on the way towards accurate computerized reconstructions of entire coronary trees. The novel lineness filter exploits the local first and second order multiscale derivatives in order to give an intensity-independent response to line centers and to suppress unwanted responses to steep edges.
1 Introduction
It is necessary to improve the insight into the detailed structure of the coronary microvascular bed, as this will lead to a better understanding of, e.g., the causes of local and borderline ischemia, local perfusion, and vessel remodeling. However, conventional 3D imaging does not have the required micrometer resolution. The imaging cryomicrotome is a unique instrument that allows 3D imaging of large, frozen tissue samples with a resolution of several micrometers [1]. In contrast with microscopy, the imaging cryomicrotome does not image the slices, but images the cutting surface of the remaining bulk. Because the deformation is negligible, microtomy of an entire heart yields a continuous 3D stack that contains a few billion voxels and millions of distinguishable microvessels. Due to scattering and the large intensity differences in the imaging cryomicrotome data, it is impossible to segment the smallest vascular trees using only a simple threshold. As a result of scattering, the voxels close to, but outside, large vessels are much brighter than the voxels inside the smaller vessels. The goal of the lineness filter is to enhance all line-like structure in the imaging cryomicrotome data and to suppress noise in such a way that the output can yet be segmented by a threshold (Figure 1).
Fig. 1. (a) An inverse Maximum Intensity Projection (MIP) of 100 imaging cryomicrotome slices, showing a nutrient artery branching into smaller arterioles on the outer surface of the myocardium. The dimensions of this (sub)volume are 8 × 8 × 4 mm³. (b) The response of the novel lineness filter at a scale of 60 μm.
2 Method

2.1 Local Properties of Linear Structure
Whether structure is linear or not depends on the scale at which it is observed. Therefore, images that contain objects of different sizes need multi-scale structure filters. Imaging cryomicrotome stacks of microvascular trees are a good example of multi-scale structure, because they contain line-like structure in a broad range of diameters. Using the concepts of linear scale-space theory [2], differentiation of an image L(x) is defined as a convolution with derivatives of Gaussians,
\[
\frac{\partial}{\partial x} L(x, \sigma) = \sigma^{\gamma}\, L(x) * \frac{\partial}{\partial x} G(x, \sigma), \qquad x \in \mathbb{R}^D,\ \sigma \in \mathbb{R}^+,\ \gamma \in \mathbb{R}_0^+. \tag{1}
\]
Here, the D-dimensional Gaussian G(x, σ) is defined as
\[
G(x, \sigma) = \frac{1}{(2\pi\sigma^2)^{D/2}}\, e^{-\frac{|x|^2}{2\sigma^2}}. \tag{2}
\]
The parameter γ was introduced by Lindeberg [3] to define a family of normalized derivatives. Normalization is particularly important for a justified comparison of the responses of differential operators at multiple scales. In fact, γ = 1 causes the unit of the derivatives to be just 'cd' (candela, the SI base unit of luminous intensity) instead of 'cd m⁻ⁿ', where n denotes the order of the derivative. Using these differential operators, the neighborhood of an image point L(x) can be described by its local derivatives. On the center line of a line-like object, the first order derivatives are close to zero, because this point is an extremum. The second order information, however, can be used to estimate local orientation.
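The γ-normalized derivatives of Eq. (1) map directly onto standard Gaussian filtering routines. The following minimal Python sketch is our illustration (function and variable names are ours, not the authors' implementation):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def normalized_derivative(image, sigma, order, gamma=1.0):
    """sigma^(gamma*n)-normalized Gaussian derivative, cf. Eq. (1).

    `order` gives the derivative order per axis, e.g. (0, 2) for the
    second derivative along the last axis of a 2D image; n = sum(order).
    """
    return sigma ** (gamma * sum(order)) * gaussian_filter(image, sigma, order=order)

# Example: second derivative responses of a bright line across scales.
image = np.zeros((64, 64))
image[:, 30:34] = 1.0  # a 4-pixel wide vertical line
for sigma in (1.0, 2.0, 4.0, 8.0):
    Lxx = normalized_derivative(image, sigma, (0, 2))
    print(sigma, Lxx[32, 31])  # strongest (most negative) near the line scale
```

With γ = 1 the printed responses become directly comparable across scales, which is exactly the point of the normalization.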
Table 1. Second order structure in relation to the eigenvalues of the Hessian. '0' indicates a relatively small eigenvalue, '+' a significant positive value, and '−' a significant negative value. In the 2D case, the first column and the second and third rows should be discarded.

λ1   λ2   λ3   Structure
0    0    0    No noticeable structure
−    0    0    Plate-like, bright
+    0    0    Plate-like, dark
−    −    0    Line-like, bright
+    +    0    Line-like, dark
−    −    −    Blob-like, bright
+    +    +    Blob-like, dark
This can be done via eigenvalue analysis of the Hessian matrix H, which is defined as the Jacobian matrix of the gradient of a function,
\[
H[f(x_1, x_2, \ldots, x_n)] =
\begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}. \tag{3}
\]
We will discuss the eigenvalues assuming they are sorted with respect to their absolute values:
\[
|\lambda_1| \geq |\lambda_2| \geq |\lambda_3|, \qquad \lambda_n \in \mathbb{R}. \tag{4}
\]
When λ1, λ2, and λ3 share the same sign, the ratios between these eigenvalues indicate either a plate-like, line-like or blob-like structure, and their sign indicates whether the structure is brighter or darker than its surroundings [4,5]. Table 1 summarizes the relations between λ1, λ2, and λ3 for different structures. When the eigenvalues all have a relatively low absolute value, the image has no evident second order structure at the specified position in 4D scale-space. Figure 2 uses ellipsoidal glyphs to show a geometrical interpretation of the three second order structures. The normalized eigenvectors (κ̂1, κ̂2, κ̂3) form an orthonormal basis that is oriented to the structure.
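As an illustration of this eigenvalue analysis (a sketch under our own naming conventions, not the authors' code), the Hessian of a 3D volume can be assembled from γ-normalized second derivatives and its eigenvalues sorted by absolute value, matching the convention of (4):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_eigenvalues(volume, sigma, gamma=1.0):
    """Eigenvalues of the gamma-normalized Hessian at every voxel of a 3D
    volume, sorted so that |l1| >= |l2| >= |l3|; returns shape (..., 3)."""
    norm = sigma ** (2.0 * gamma)
    H = np.empty(volume.shape + (3, 3))
    for i in range(3):
        for j in range(i, 3):
            order = [0, 0, 0]
            order[i] += 1
            order[j] += 1
            d = norm * gaussian_filter(volume, sigma, order=order)
            H[..., i, j] = H[..., j, i] = d
    lam = np.linalg.eigvalsh(H)              # ascending by signed value
    idx = np.argsort(-np.abs(lam), axis=-1)  # descending by |lambda|
    return np.take_along_axis(lam, idx, axis=-1)
```

Classifying a voxel then amounts to inspecting the signs and relative magnitudes of the three sorted eigenvalues according to Table 1.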
2.2 Second Order Gaussian Line Filters
As linear structure has orientation, line filters must be rotatable in order to find the best response. Using the Hessian matrix H, the second order Gaussian derivative at image point L(x) can be calculated in any direction in 3D space, given by a vector v:
\[
\nabla^2_{v} L(x) = \hat{v}^{T} H[L(x)]\, \hat{v}, \qquad v \in \mathbb{R}^3, \tag{5}
\]
where v̂ is the normalized version of v, i.e. v̂ = v/|v|, and ∇²_v denotes the directional Laplacian operator, oriented to v.
Fig. 2. Isosurface representations of plate-like, line-like, and blob-like structure, according to Table 1. The length and direction of the arrows represent respectively the eigenvalues and eigenvectors of the Hessian.
The term 'steerable filter' is frequently used to describe a class of filters of arbitrary orientation, synthesized as a linear combination of a set of basis filters [6]. The 2D second order Gaussian derivative in the direction of κ̂1,
\[
\nabla^2_{\hat{\kappa}_1} G(x), \qquad \hat{\kappa}_1 \in \mathbb{R}^2, \tag{6}
\]
is a good 2D line filter, since the curvature will be large in that direction. Here, κ̂1 is the normalized eigenvector of H with the largest absolute eigenvalue (λ1). The 3D equivalent of this filter involves both κ̂1 and κ̂2, because both vectors form a plane perpendicular to the line, while κ̂3 points in the direction of the linear structure [Figure 2]:
\[
\left(\nabla^2_{\hat{\kappa}_1} + \nabla^2_{\hat{\kappa}_2}\right) G(x), \qquad \hat{\kappa}_n \in \mathbb{R}^3. \tag{7}
\]
Although this filter is frequently used, Jacob and Unser [7] showed that there is a better alternative without any extra cost, using Canny-like criteria [8]. The general form for a second order line filter is defined as
\[
c\left(\nabla^2 - (\alpha + 1)\nabla^2_{\hat{\kappa}_3}\right) G(x), \qquad \alpha \in \mathbb{R},\ c \in \mathbb{R}^+, \tag{8}
\]
where α is the parameter to be optimized, κ̂3 is the feature orientation, and c is a normalization factor. It turns out that α = 2/3 optimizes the criteria. Note that α = 0 gives the 'default' filter as in (7).
2.3 A Novel Lineness Filter
Using Gaussian derivatives, the knowledge of the Hessian eigenvalues, and the Canny-optimized filter, a novel lineness filter is constructed. This filter is intensity-independent and also suppresses the undesired response to strong edges, two issues where other line filters often fail but that are particularly important for smooth enhancement and segmentation of line centers. Table 2 gives an overview of the three basic components of this filter.
Table 2. Components of the novel lineness filter. Because of the normalization factor σ^γ in (1), the unit of these measures depends on the value of γ.

Measure                         Unit (γ = 1)   Description
|λ2|/|λ1|                       1              Deviation from plate-like structure; equals 0 in case of a perfect plate [Figure 2].
(∇² − (5/3)∇²_{κ̂3}) L(x)        cd             Response to the Canny-optimized second order line filter (8).
|∇L(x)|                         cd             Gradient magnitude; equals 0 in mathematical extrema, thus in the center of the structure.
Note that some of the measures in this table have the unit 'cd', i.e. these measures depend on the signal intensity. This dependency is undesired, because bright linear structure should not be considered 'more linear' than less bright linear structure. This problem can be overcome in a simple fashion by division by L(x), which requires a low threshold to prevent division by nearly-zero background values. In modalities like MRA or CTA, any intensity bias should be corrected beforehand. The gradient magnitude, |∇L(x)|, is hardly used in common line filters, although it is a fundamental property of structural centers that the gradient vanishes there. This property appears to be very effective regarding the suppression of the undesired responses that most line filters have to strong edges. According to the components in Table 2, 'lineness' is defined as
\[
L(x) = \left(\frac{\lambda_2^2}{\lambda_1^2}\right)^{c_1} \frac{1}{L(x)} \left( \left(\nabla^2 - \frac{5}{3}\nabla^2_{\hat{\kappa}_3}\right) L(x) - c_2\, |\nabla L(x)| \right), \tag{9}
\]
where c1 ∈ R₀⁺ denotes the sensitivity to the roundness of line profiles and c2 ∈ R₀⁺ is the edge suppression factor. c1 = 1 and c2 = 1.5 give satisfying results on the imaging cryomicrotome data. The multi-scale response is given by the maximum response over a certain range of scales; because of the normalization factor σ^γ (1), this maximal response is given at a scale σ that is √2 times the vessel radius, assuming the vessel has a Gaussian profile [3].
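The lineness measure combines the three components of Table 2. The sketch below is our own reading of Eq. (9): it reuses the hessian_eigenvalues helper sketched in Sect. 2.1 and exploits that ∇²_{κ̂3} L = λ3 and ∇²L = λ1 + λ2 + λ3. Negating the second order response so that bright line centers give positive values, and the low-intensity threshold eps, are our assumptions rather than details from the text:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def lineness(volume, sigma, c1=1.0, c2=1.5, gamma=1.0, eps=1.0):
    lam = hessian_eigenvalues(volume, sigma, gamma)  # |l1| >= |l2| >= |l3|
    l1, l2, l3 = lam[..., 0], lam[..., 1], lam[..., 2]

    # Roundness of the line profile: ~1 for round lines, ~0 for plates.
    roundness = (l2 ** 2 / np.maximum(l1 ** 2, 1e-12)) ** c1

    # Canny-optimized response (8): trace(H) - (5/3) l3, negated for bright lines.
    line_response = -((l1 + l2 + l3) - (5.0 / 3.0) * l3)

    # Gamma-normalized gradient magnitude for the edge suppression term.
    grads = [sigma ** gamma * gaussian_filter(volume, sigma,
             order=[int(k == ax) for k in range(3)]) for ax in range(3)]
    grad_mag = np.sqrt(sum(g ** 2 for g in grads))

    L = np.maximum(volume, eps)  # low threshold against near-zero background
    return roundness * (line_response - c2 * grad_mag) / L

# Multi-scale response: maximum over a range of scales, e.g.
# response = np.max([lineness(vol, s) for s in (1, 2, 4, 8)], axis=0)
```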
3 Results

3.1 Results on Digital Line Phantoms
In order to compare the lineness filter L(x) to other filters, four general types of phantoms are made that deal with the following common issues (a sketch of a generator for such phantoms follows the list):

– additive, Gaussian distributed noise (type N);
– lines of a different scale for which the intensities differ (type S);
– curved lines (type C);
– bifurcations (type B).
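A hypothetical generator for such phantoms could look as follows; the shape, the default values and the reading of the N-64-G naming scheme are our own assumptions:

```python
import numpy as np

def line_phantom(shape=(64, 64, 64), radius=4.0, intensity=128.0,
                 noise_std=64.0, profile="G", seed=0):
    """Straight-line phantom: Gaussian ('G') or box ('B') profile plus
    additive Gaussian noise, e.g. 'N-64-G' for noise_std=64 (assumed)."""
    _, y, x = np.indices(shape)
    d2 = (y - shape[1] / 2.0) ** 2 + (x - shape[2] / 2.0) ** 2
    if profile == "G":
        vol = intensity * np.exp(-d2 / (2.0 * radius ** 2))
    else:  # box profile
        vol = intensity * (d2 <= radius ** 2)
    rng = np.random.default_rng(seed)
    return vol + rng.normal(0.0, noise_std, shape)
```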
Fig. 3. (a) The medial slice of phantom N-64-G, containing a line with a Gaussian profile and noise with a standard deviation of 64. (b), (c), and (d) show the multi-scale responses of respectively Frangi's filter, the Canny-optimized filter, and the novel lineness filter to N-64-G (1 ≤ σ ≤ 8 pixels).
Because filter responses also depend on the line profile, both a Gaussian profile (type G) and a box profile (type B) are used. Note that the profiles of the vessels in the imaging cryomicrotome data are neither Gaussian nor boxed, but somewhere in between. Besides the two line profiles, each of the phantoms shows four variations in scale or intensity, resulting in a total of 32 line phantoms. All phantoms are filtered on four subsequent scales with the vessel-likeliness function introduced by Frangi [4] (type F), the Canny-optimized second order line filter [7] (type C), and the novel lineness filter (type L). The parameters for Frangi's vessel-likeliness function are α = 0.5, β = 0.5, and c = 32 (the line intensity is 128), except for the type S phantom, where c is set to a quarter of the intensity of the thin line. The parameters for L(x) are set to c1 = 1 and c2 = 1.5, and all of the filters are subject to a threshold of 12.

The type N phantoms, with Gaussian distributed noise, show that Frangi's filter and L(x) are quite insensitive to this kind of noise, while the Canny-optimized filter shows a significantly higher response to the noise [Figure 3]. Apart from the fact that Frangi's vessel-likeliness function and the Canny-optimized filter are intensity dependent, all filters perform quite comparably on the type S phantoms, on the condition that the line profile is Gaussian. However, when the line profile shows steeper gradients, as in the box profile [Figure 4a], Frangi's filter (type F) and the Canny-optimized filter (type C) both give undesired responses to the edges of thick lines [Figure 4b and c]. L(x), however, only responds to the center line [Figure 4d and e].

Curved lines that have a sinusoidal or spiraling twist are a difficult case in multi-scale line filtering. Like a rope or a DNA strand, these objects appear as a twisted line on a small scale, but as a more or less straight line on a larger scale. This is a 'problem' of all multi-scale line filters, which is illustrated in Figure 5. Although bifurcations are part of networks of linear structure, a bifurcation is not linear by itself. Eigenvalue analysis of the Hessian matrix [Table 1] would indicate blob-like or plate-like structure rather than line-like structure, i.e. true line filters should not respond to bifurcations. Type B phantoms [Figure 6] confirm that the response of all line filters to a bifurcation is weak.

3.2 Results on Imaging Cryomicrotome Data
Figure 7 shows a comparison of Frangi's vessel-likeliness function, the Canny-optimized second order line filter, and the lineness filter at a scale of 60 μm.
Fig. 4. (a) The medial slice of phantom S-32-B, containing a thin and a 16-pixel thick line, both having a box profile (type B). (b), (c), and (d) show the responses of respectively Frangi's filter, the Canny-optimized filter, and the novel lineness filter to S-32-B at σ = 4 pixels. (e) The multi-scale response of L(x) to S-32-B (1 ≤ σ ≤ 8 pixels). Note that the novel lineness filter does not respond at all at σ = 4 pixels.
Fig. 5. (a) Medial slice of phantom C-4-G, showing a curved line with a radius of 4 pixels and Gaussian profile. (b), (c), and (d) show the responses of respectively Frangi’s filter, the Canny-optimized filter, and the novel lineness filter at a scale of σ = 8 pixels. The curves of the phantom have almost disappeared on this scale.
Fig. 6. (a) Medial slice of phantom B-4-B, a bifurcation of lines with radii of 8 and 4 pixels, both having a box profile. (b), (c), and (d) show the responses of respectively Frangi's filter, the Canny-optimized filter, and the novel lineness filter at a scale of σ = 4 pixels.
Fig. 7. (a) The volume shown in Figure 1a. (b), (c), and (d) show inverse MIPs of the responses of respectively Frangi's vessel-likeliness function, the Canny-optimized second order line filter, and the lineness filter at a scale of 60 μm. The parameters for Frangi's filter are α = 0.5, β = 0.5, and c = 25. The parameters for the lineness filter (9) are c1 = 1 and c2 = 1.5, and its threshold is set to L(x) > 12 (the maximum intensity is 255). Note that in (d) some vessels appear discontinuous because of the threshold or because their medial axis moves out of the imaged volume.
In particular, the undesired responses to large, bright vessels at smaller scales are significantly reduced by the novel lineness filter; note that both Frangi's filter and the Canny-optimized filter give a high response to the large artery. Due to the threshold, however, some of the smaller vessels became discontinuous.
4 Discussion
The novel lineness filter, L(x), proves to get the most out of the cryomicrotome data. Because the values c1 = 1 and c2 = 1.5 were obtained more or less empirically, they could possibly be optimized to yield even better results. A drawback of L(x), visible in the phantoms as well as in most real images, is the weak response to bifurcations, which results in disconnected branches. However, this phenomenon is a 'problem' that occurs in any line filter. A higher order filter or a postprocessing step is required to correct for it. A second drawback of the filter is its multi-scale response to twisted lines, as shown in Subsection 3.1. This type of structure is not very rare in microvascular images, because vessels in areas of angiogenesis show similar behavior. A solution to this (potential) problem may be found in the field of scale-space analysis. For example, edges and other extrema, such as line centers, can be traced down to smaller scales using scale-space signatures, as described in [2]. An implementation in C++ makes this method applicable in practice, even for the complete 2000³-voxel volumes used here. Overall, the derived filter shows promising results on the way to accurate computerized reconstructions of entire coronary trees. Finally, accurate diameter measurements are necessary to complete the quantitative reconstruction of coronary arterial trees.
References

1. Spaan, J.A.E., ter Wee, R., van Teeffelen, J.W.G.E., Streekstra, G., Siebes, M., Kolyva, C., Vink, H., Fokkema, D.S., VanBavel, E.: Visualization of intramural coronary vasculature by an imaging cryomicrotome suggests compartmentalisation of myocardial perfusion areas. Med. Biol. Eng. Comput. 43(4), 431–435 (2005)
2. ter Haar Romeny, B.M.: Front-End Vision and Multi-Scale Image Analysis: Multi-Scale Computer Vision Theory and Applications, written in Mathematica, 1st edn., vol. 27. Kluwer Academic Publishers, Boston, MA (2003)
3. Lindeberg, T.: Edge detection and ridge detection with automatic scale selection. Int. J. Comput. Vision 30(2), 77–116 (1996)
4. Frangi, A.F.: Three-Dimensional Model-Based Analysis of Vascular and Cardiac Images. PhD thesis, University of Utrecht (2001)
5. Sato, Y., Tamura, S.: Detection and quantification of line and sheet structures in 3-D images. In: Delp, S.L., DiGoia, A.M., Jaramaz, B. (eds.) MICCAI 2000. LNCS, vol. 1935, pp. 154–165. Springer, Heidelberg (2000)
6. Freeman, W.T., Adelson, E.H.: The design and use of steerable filters. IEEE T Pattern Anal. 13(9), 891–906 (1991)
7. Jacob, M., Unser, M.: Design of steerable filters for feature detection using Canny-like criteria. IEEE T Pattern Anal. 26(8), 1007–1019 (2004)
8. Canny, J.: A computational approach to edge detection. IEEE T Pattern Anal. 8(6), 679–698 (1986)
Live-Vessel: Extending Livewire for Simultaneous Extraction of Optimal Medial and Boundary Paths in Vascular Images

Kelvin Poon1, Ghassan Hamarneh2, and Rafeef Abugharbieh1

1 Biomedical Signal and Image Computing Lab, University of British Columbia
2 Medical Image Analysis Lab, Simon Fraser University, Canada

Abstract. This paper incorporates multiscale vesselness filtering into the Livewire framework to simultaneously compute optimal medial axes and boundaries in vascular images. To this end, we extend the existing 2D graph search to 3D space to optimize not only for the spatial variables (x, y), but also for the radius value r at each node. In addition, we minimize change in both scale and the smallest principal curvature direction, and incorporate vessel boundary evidence in our optimization. When compared to two sets of DRIVE expert manual tracings, our proposed technique reduced the overall segmentation task time by 68.2%, had a similarity ratio of 0.772 (0.775 between the manual tracings), and was 98.2% reproducible.
1 Introduction
Segmentation is vital for medical image analysis, but the task is difficult because biological structures typically exhibit significant variation due to subject diversity and pathology. In addition, imaging introduces artifacts, noise, and inter-camera variability. While automatic segmentation schemes are typically preferred, they still require human validation. On the other hand, manual tracing is often considered accurate, but is time consuming and may suffer from operator error and inter- and intra-operator variability [1]. Due to these difficulties, semi-automated methods such as Livewire were introduced, offering high accuracy, efficiency, and reproducibility [2]. Livewire utilizes sparsely spaced user-defined seedpoints and efficiently computes the optimal contour in between using Dijkstra's algorithm [3]. Delineation accuracy can be increased in noisy regions, at the expense of segmentation speed, by providing a denser set of seedpoints. However, for 2D images such as angiograms and retinal images, gradient-based Livewire implementations become inefficient due to overlapping vessels, complex branching networks, and low contrast in thin vessels. In previous work, angiography images were enhanced using 'vesselness' as an image feature [4,5]. These methods chose the 'optimal' scale σ such that the maximal vesselness response for each pixel is achieved, but this selection can be highly influenced by noise. Sofka et al. used multiscale matched filters to determine vessel medials, and used edge measures to affect the confidence of medial nodes [6]. However, they gave no consideration to the shape of the vessel and did not find an optimal medial. The problem of finding minimal paths in
3D space was explored by Deschamps and Cohen [7], where a front propagation in (x, y, z) was implemented, along with a method to approximate the medial axes of tubular structures using level sets. However, this work does not focus on identifying the boundaries of tubular structures. In [8], Young et al. employed the 3D vesselness filter of Frangi et al. [4] and used a front-propagation based algorithm to segment the vessel and extract the medial axis. However, because the maximal vesselness response was chosen for each pixel prior to segmentation, this solution is not optimal. In the 'vessel crawlers' approach [9], a locally optimal σ was derived from the radius of the leading layer of the crawler. In [10], the locations and sphere radii of medial nodes were optimized to segment vascular structures. However, this method relied on near-uniform vessel intensities for optimization, and did not use vesselness features, which provide important magnitude and vessel direction information. In [11], Wink et al. applied Dijkstra's algorithm to find the optimal medial and corresponding scale values based on vesselness. In this paper, we improve on their work by (i) adopting and extending the Livewire framework to compute the optimal path in (x, y, r) space; (ii) including optimization costs that favour gradual change in both scale and the smallest principal curvature direction along the path, to mitigate noise; and (iii) incorporating vessel boundary evidence from image pixels at a scale-dependent distance away from the medial as part of the optimal graph search. Furthermore, our method allows vessels to be segmented with fewer seedpoints than traditional Livewire, and can simultaneously extract the medials and boundary points of vessels with radii down to 0.5 pixels. The proposed technique was applied to the Digital Retinal Images for Vessel Extraction (DRIVE) archives [12]. Our results were validated against manual segmentation.
2 Method
Our proposed framework is based on the original Livewire technique [2], where sparse seedpoints along an object boundary are input by a user and optimal contours connecting these points are found using Dijkstra's algorithm. This algorithm is a deterministic, exhaustive graph search that always finds the globally optimal path. Here, the medial axis of a blood vessel is defined by the optimal spatial coordinates (x, y) and radius values r between two points in the 3D space (x, y, r). This optimization is achieved by minimizing the cumulative cost function at each (x, y, r) node. The incremental cost function, which describes the cost from node q = (x, y, r) to a neighboring node p = (x', y', r'), is defined as:
\[
Cost(q, p) = w_1 C_V(p) + w_2 C_{Ev}(q, p) + w_3 C_{Ie}(p) + w_4 C_R(q, p) + w_5 C_S(q, p) \tag{1}
\]
C_V is associated with Frangi's proposed multiscale vesselness filter [4], C_{Ev} refers to the vessel direction change between q and p, and C_{Ie} is a measure of the fitness of a medial node obtained by assessing the edge evidence of equidistant pixels on either side. In addition, internal smoothness cost terms (C_R and C_S) are used to penalize paths in which the radius r or the spatial variables (x, y) fluctuate rapidly. These cost terms are explained in Section 2.2, and are normalized to lie in the range [0,1].
2.1 Livewire Framework in (x, y, r) Space: Live-Vessel
We extend the original Livewire framework [2] to segment vascular structures. Rather than delineating a vessel by guiding the Livewire contour along its boundaries, our method guides the Livewire along the vessel medial. This reduces the number of seedpoints required, since a single optimal path defines three contours (the medial and the two boundaries on either side). In 2D Livewire implementations, given an image I(a), where a = (x, y), the only local path choices from a are to one of its eight neighboring pixels b = (x', y'). In our proposed method, we optimize the medial with respect to three variables: the two spatial variables x and y, and the vessel radius variable r. Accordingly, we extend the traditional Livewire graph search from 2D to 3D. Since the medial axis in reality is 2D and cannot connect q = (x, y, ri) to p = (x, y, rj) (i ≠ j; i.e. only a single radius value can be associated with each medial node), our 3D graph search is restricted in this manner. To accommodate vessels that dilate and constrict rapidly, the radius value can otherwise change to any other value (albeit penalized, as described in Section 2.2). This increases the computational complexity, but the optimal medial path and optimal vessel thickness are guaranteed. Since the paths from the seedpoint to all nodes along the optimal path are also optimal, our proposed method only needs to record the location of the previous optimal node in each path. By following this node order, the optimal path from the seedpoint to any other point is quickly determined; a schematic sketch of this graph search is given below. To perform the actual segmentation, the medial path is first projected back onto the (x, y) plane. Each medial node has an optimal radius value r, and its preceding and succeeding nodes form a direction vector. These elements are used to determine the two vessel boundary points on either side of the medial node. Repeating this for all medial nodes, the vessel is segmented.
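The graph search itself is plain Dijkstra over (x, y, r) nodes. The sketch referenced above is ours; a user-supplied cost(q, p) stands in for Eq. (1), and bounds handling and the neighborhood are simplified:

```python
import heapq

def live_vessel_path(seed, target, shape, r_values, cost):
    """Optimal medial path between two (x, y, r) nodes via Dijkstra."""
    dist, prev = {seed: 0.0}, {}
    heap = [(0.0, seed)]
    while heap:
        d, q = heapq.heappop(heap)
        if q == target:
            break
        if d > dist.get(q, float("inf")):
            continue                      # stale heap entry
        x, y, r = q
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx == dy == 0:
                    continue              # radius changes only with a spatial move
                for r2 in r_values:       # any radius jump allowed, but penalized
                    p = (x + dx, y + dy, r2)
                    if not (0 <= p[0] < shape[0] and 0 <= p[1] < shape[1]):
                        continue
                    nd = d + cost(q, p)
                    if nd < dist.get(p, float("inf")):
                        dist[p], prev[p] = nd, q
                        heapq.heappush(heap, (nd, p))
    path, node = [target], target         # backtrack the optimal medial
    while node != seed:
        node = prev[node]
        path.append(node)
    return path[::-1]
```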
2.2 Live-Vessel External and Internal Costs
The external cost terms in (1) are derived from the vessel filter (C_V(p)) and from edge measures (C_{Ie}(p)). The internal cost terms are included to favour gradual changes in vessel direction (C_{Ev}(q, p)) and radius (C_R(q, p)), as well as a spatially smooth medial (C_S(q, p)).

Vesselness. To detect vascular structures in an image, we employ the multiscale vessel enhancement filter proposed by Frangi et al. [4] for 2D images. The eigenvalues |λ1| ≤ |λ2| of the Hessian matrix H_σ of a Gaussian smoothed image (I_σ = I ∗ G, where G has variance σ²) are used to calculate vesselness as follows:
\[
C_V(q) = V_\sigma(\lambda_1, \lambda_2) =
\begin{cases}
1 & \text{if } \lambda_2 > 0 \\
1 - \exp\left(-\frac{R_B^2}{2\beta^2}\right)\left(1 - \exp\left(-\frac{T^2}{2c^2}\right)\right) & \text{otherwise}
\end{cases} \tag{2}
\]
R_B = λ1/λ2 represents the eccentricity of a second order ellipse and T = \sqrt{λ1² + λ2²}. β and c affect filter sensitivity and have values 0.5 and 0.3, respectively. These values are similar to those used in other vessel filter studies [4,11].
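A per-scale sketch of this vesselness cost using SciPy follows. It is our illustration; scale normalization of the derivatives is omitted for brevity:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def vesselness_cost(image, sigma, beta=0.5, c=0.3):
    """C_V of Eq. (2) at one scale sigma, with |l1| <= |l2| as in the text."""
    Ixx = gaussian_filter(image, sigma, order=(2, 0))
    Ixy = gaussian_filter(image, sigma, order=(1, 1))
    Iyy = gaussian_filter(image, sigma, order=(0, 2))
    H = np.stack([np.stack([Ixx, Ixy], -1), np.stack([Ixy, Iyy], -1)], -2)
    lam = np.linalg.eigvalsh(H)
    idx = np.argsort(np.abs(lam), axis=-1)      # sort so |l1| <= |l2|
    lam = np.take_along_axis(lam, idx, axis=-1)
    l1, l2 = lam[..., 0], lam[..., 1]
    RB2 = l1 ** 2 / np.maximum(l2 ** 2, 1e-12)  # eccentricity squared
    T2 = l1 ** 2 + l2 ** 2                      # structure magnitude
    cv = 1.0 - np.exp(-RB2 / (2 * beta ** 2)) * (1.0 - np.exp(-T2 / (2 * c ** 2)))
    return np.where(l2 > 0, 1.0, cv)            # cost 1 where no bright vessel
```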
Fig. 1. (a) Original retinal image, cropped and magnified for clarity. (b) Maximal response vesselness filter output, scaled for contrast. Note the erroneous values at areas of bifurcation, vessel overlap, and high levels of noise. (c) Segmentation result with our proposed method.
While choosing the highest filter response for each pixel gives an estimated vessel size at each pixel location, this method is susceptible to noise and does not enforce spatial continuity (Figure 1(b)). Our proposed method, in contrast, stores the results over a range of scales separately¹. By combining these results with other cost factors, we place restrictions on the relationship between neighboring nodes; hence, the proposed method is robust to noise and poor image quality (Figure 1(c)).

Vessel Direction Consistency. The eigenvector Ev(x, y, r) of H_σ corresponding to λ1 points in the direction of the minimum principal curvature, which estimates the vessel direction [4]. By choosing paths that minimize the change in direction of Ev, we can mitigate local noise that occurs in vascular images. We therefore incorporate the cost term C_{Ev}(q, p) from (1) and define it as:
\[
C_{Ev}(q, p) = \frac{2}{\pi} \arccos\left( \frac{\left| Ev(p) \cdot Ev(q) \right|}{|Ev(p)|\,|Ev(q)|} \right), \tag{3}
\]
where q = (x, y, σ1) and p = (x', y', σ2) are node coordinates. Since Ev(x, y, r) points arbitrarily in either direction of a bidirectional vessel (±180°), we take the absolute value in (3) to obtain the smaller angle between the vectors.

Image Evidence Using Edge Detection. Our proposed method also uses edge evidence to favour medial nodes that are located at the centre of a vessel cross-section with radius r. To compute this, we employ gradient, Canny, and Laplacian of Gaussian (LoG) edge detection to estimate vessel boundaries and average their responses into R(x, y). We chose these filters because the Canny and LoG filters are less sensitive to noise, whereas the gradient magnitude filter does not involve pre-smoothing, thus complementing the Canny and LoG filters by detecting weak structural edges.
¹ Defining boundaries as zero crossings of the Laplacian of a Gaussian intensity profile across the vessel yields σ = r, where r is the vessel radius. This is derived by equating the second derivative of the Gaussian kernel to zero.
Fig. 2. (a) A node's vessel direction is denoted by the thick arrow, and the evenly spaced points along the bidirectional arrows are tested for edge detection response. (b) Image evidence cost function C_{Ie}(q) of a retinal image at radius r = 1. Medial nodes of vessels with this radius exhibit the lowest cost. Larger vessels' edges are faintly detected, but their medials are not. (c)-(d) Cost function C_{Ie}(q) of the same retinal image at r = 3, 4 respectively. Medials of larger vessels exhibit the lowest cost. Medials of smaller vessels are no longer detected.
For each node q = (x, y, r) in the image, we combine the vesselness direction Ev(x, y, r) and R(x, y) to define another measure of node 'medialness'. By finding the unit vector that is normal to Ev in the (x, y) plane and scaling it by r, we can determine the two locations at which the vessel wall should be located. Our method then retrieves the corresponding R(x, y) at these points and at adjacent points P_R = {(x_i, y_i); i = 1, 2, ..., N} (N total points to analyze, including both sides) in directions parallel to Ev (Figure 2a). The image evidence cost C_{Ie}(q) in (1) is then defined as
\[
C_{Ie}(q) = 1 - \frac{1}{N} \sum_{i=1}^{N} R(x_i, y_i)
\]
to average a potentially noisy response. This cost term is minimized for a medial node that has equidistant (distance = r) vessel boundaries, which is expected for vessel medial nodes.

Spatial and Radius Smoothness Constraints. Our proposed approach also imposes a small constant cost for each additional pixel added to the path (connecting points q = (x, y, r) to p = (x', y', r')), which accumulates during the graph search operation. This cost, C_S(q, p) in (1), is proportional to \sqrt{(x - x')^2 + (y - y')^2}. By encouraging the medial axis to be shorter, contour jaggedness is avoided. Similarly, we penalize medial paths with rapidly changing radius values, for two reasons. Firstly, the vesselness filter is noise sensitive, and estimating the radius based solely on the filter output is unreliable. Secondly, vessel radii tend not to change rapidly unless branching occurs. By adding the cost C_R(q, p) = |r - r'| / (r_max - r_min) to (1), the vessel width in the segmentation result is rendered smoother, with gradually changing radius values. Here, r_max and r_min are the maximum and minimum values r can take.
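The remaining cost terms are simple to state in code. The sketch below is our formulation: Ev is assumed precomputed as a per-pixel direction field at the relevant scale, and interpolation is reduced to rounding:

```python
import numpy as np

def direction_cost(Ev_q, Ev_p):
    """C_Ev of Eq. (3): angle between (bidirectional) vessel directions."""
    cosang = abs(np.dot(Ev_q, Ev_p)) / (np.linalg.norm(Ev_q) * np.linalg.norm(Ev_p))
    return (2.0 / np.pi) * np.arccos(np.clip(cosang, 0.0, 1.0))

def image_evidence_cost(R, q, Ev, offsets=(-1, 0, 1)):
    """C_Ie: average edge response R at distance r on both sides of q."""
    x, y, r = q
    ev = Ev[x, y] / np.linalg.norm(Ev[x, y])   # unit vessel direction
    normal = np.array([-ev[1], ev[0]])         # in-plane unit normal
    total, count = 0.0, 0
    for side in (-1.0, 1.0):
        for t in offsets:                      # points parallel to Ev
            px, py = np.round([x, y] + side * r * normal + t * ev).astype(int)
            if 0 <= px < R.shape[0] and 0 <= py < R.shape[1]:
                total += R[px, py]
                count += 1
    return 1.0 - total / max(count, 1)

def radius_cost(q, p, r_min, r_max):           # C_R
    return abs(q[2] - p[2]) / (r_max - r_min)

def spatial_cost(q, p):                        # C_S (up to a constant factor)
    return float(np.hypot(q[0] - p[0], q[1] - p[1]))
```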
3 Results and Discussion
Our proposed method was validated by segmenting images from the DRIVE database [12]. To demonstrate its capabilities, our method’s performance
Fig. 3. (a)-(c) Original images, cropped for illustrative purposes. (d)-(f) Segmentation masks using Live-Vessel. (g) ‘Live’ aspect of Live-Vessel. A seedpoint is input by the user (circle), and optimal paths depending on the cursor location are displayed (two shown). (h) Another seedpoint locks the first segment, and this process is repeated until the vessel is segmented. (i) Resulting mask from using 3 seedpoints.
during these tasks was quantified based on three recommended criteria for semi-automatic segmentation [13]: reproducibility, accuracy, and efficiency. The sensitivity of our method to its parameter settings was also analyzed. We were able to segment vessels of varying widths, down to one-pixel-wide vessels (radius r = 0.5). Qualitative example results (showing 140×140 pixels from the original 565×584 images) are illustrated in Figure 3. Reproducibility was measured by performing several trials of the same segmentation task, differing in seedpoint selection. Since reproducibility does not compare segmentations to an absolute truth, we subjected the results of these trials to pairwise Dice similarity tests. We found the reproducibility rate to be 0.987, 0.979, and 0.982 for our three test images (Figure 3). These rates are high and consistent with previous Livewire validations, reflecting that the locations of sparse seedpoints do not greatly impact the optimal contour in between. The accuracy of a segmentation is normally quantified by analyzing the percentage of false positives and false negatives compared to the golden truth. However, in retinal vessel segmentation, blood vessel boundaries are often unclear and thin vessels are perceived differently by different observers.
Table 1. Summary of the average accuracy and efficiency results of our proposed Live-Vessel compared to manual traces (image size of 565×584 pixels). Accuracy was measured as the Dice similarity to each image's two DRIVE expert segmentations. Efficiency was measured as the reduction in the manual task time needed to generate the mask. Results shown are averages over three trials.
Segmentation Accuracy
         Trace1 vs Trace2   Live-Vessel vs Trace1   Live-Vessel vs Trace2
Image1   0.762              0.769                   0.762
Image2   0.766              0.773                   0.791
Image3   0.799              0.769                   0.772

Segmentation Efficiency
         Tracing (hh:mm:ss)   Live-Vessel (hh:mm:ss)   Time reduction (%)
Image1   01:47:32             00:31:43                 70.3
Image2   01:53:00             00:33:58                 69.9
Image3   01:28:12             00:31:12                 64.5
Therefore, we calculated the similarity to each manual tracing and compared it to how similar the manual tracings are to each other (Table 1). We found our results to be similar to both manual segmentations; in one case, the similarity even exceeded that between the manual tracings themselves. Efficiency was measured as the reduction in the manual segmentation task time when using our proposed method (Table 1). We manually segmented each image using a simple paint tool, and the task times were similar to those reported in [12]. As expected, Live-Vessel, like Livewire, drastically reduced the required manual task time. A sensitivity analysis was done to determine the change in accuracy while varying each weight value. Our implementation by default uses equal weighting (w1..5 = 1) for each term in (1). We found that varying each weight by ±50% did not change the accuracy by more than 5.2% for our test images.
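For reference, the Dice similarity used in the reproducibility and accuracy comparisons above reduces to a few lines (a minimal sketch):

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity of two binary segmentation masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
```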
4 Conclusions
This paper introduced a novel semi-automated method for the segmentation of vascular images. The method incorporates multiscale vesselness filtering into the conventional Livewire framework to efficiently compute optimal medial axes. Along a medial, we simultaneously optimize for the spatial variables (x, y) as well as the vessel thickness variable r, extending the traditional 2D graph search to 3D (x, y, r). In addition, we also optimize for vessel direction, derived from the smallest principal curvature direction, and for boundary evidence at a scale-dependent distance away from the medial. When applied to 2D retinal images, our proposed method had a high reproducibility rate and significantly reduced segmentation task time compared to manual tracing. Accuracy was also comparable to manual tracing. Our future work will focus on reducing the computation time required for the 3D graph search to increase efficiency. By reducing computation time, it
will enable our method to be extended even further to (x, y, z, r) for 3D vessel segmentation and still support user-interaction. Also, automatic determination of suitable seedpoint candidates along the medial is currently being investigated.
References

1. Rajapakse, J., Kruggel, F.: Segmentation of MR images with intensity inhomogeneities. Image and Vision Computing 16, 165–180 (1998)
2. Barrett, W., Mortensen, E.: Interactive live-wire boundary extraction. Medical Image Analysis 1, 331–341 (1997)
3. Dijkstra, E.: A note on two problems in connexion with graphs. Numerische Mathematik 1, 269–270 (1959)
4. Frangi, A., Niessen, W., Vincken, K., Viergever, M.: Multiscale vessel enhancement filtering. In: Wells, W.M., Colchester, A.C.F., Delp, S.L. (eds.) MICCAI 1998. LNCS, vol. 1496, pp. 130–137. Springer, Heidelberg (1998)
5. Krissian, K., Malandain, G., Ayache, N., Vaillant, R., Trousset, Y.: Model based multiscale detection of 3D vessels. In: IEEE CVPR, pp. 722–727. IEEE Computer Society Press, Los Alamitos (1998)
6. Sofka, M., Stewart, C.: Retinal vessel centerline extraction using multiscale matched filters, confidence and edge measures. IEEE TMI 25, 1531–1546 (2006)
7. Deschamps, T., Cohen, L.: Fast extraction of minimal paths in 3D images and applications to virtual endoscopy. Medical Image Analysis 5, 281–299 (2001)
8. Young, S., Movassaghi, B., Weese, J., Rasche, V.: 3D vessel axis extraction using 2D calibrated x-ray projections for coronary modeling. SPIE Medical Imaging, 1491–1498 (2003)
9. McIntosh, C., Hamarneh, G.: Vessel crawlers: 3D physically-based deformable organisms for vasculature segmentation and analysis. In: IEEE CVPR, pp. 1084–1091. IEEE Computer Society Press, Los Alamitos (2006)
10. Li, H., Yezzi, A.: Vessels as 4D curves: Global minimal 4D paths to 3D tubular structure extraction. In: IEEE Computer Society Workshop on MMBIA. IEEE Computer Society Press, Los Alamitos (2006)
11. Wink, O., Niessen, W., Viergever, M.: Multiscale vessel tracking. IEEE TMI 23, 130–133 (2004)
12. Niemeijer, M., Staal, J., van Ginneken, B., Loog, M., Abramoff, M.: Comparative study of retinal vessel segmentation methods on a new publicly available database. SPIE Medical Imaging, 648–656 (2004)
13. Olabarriaga, S., Smeulders, A.: Interaction in the segmentation of medical images: A survey. Medical Image Analysis 5, 127–142 (2001)
A Point-Wise Quantification of Asymmetry Using Deformation Fields: Application to the Study of the Crouzon Mouse Model

Hildur Ólafsdóttir1,2, Stephanie Lanche1,2,3, Tron A. Darvann2, Nuno V. Hermann2,4, Rasmus Larsen1, Bjarne K. Ersbøll1, Estanislao Oubel5, Alejandro F. Frangi5, Per Larsen2, Chad A. Perlyn6, Gillian M. Morriss-Kay7, and Sven Kreiborg2,4,8

1 Informatics and Mathematical Modelling, Technical University of Denmark, Lyngby, Denmark
2 3D-Laboratory, School of Dentistry, University of Copenhagen; Copenhagen University Hospital; Informatics and Mathematical Modelling, Technical University of Denmark, Copenhagen, Denmark
3 École Supérieure de Chimie Physique Électronique de Lyon (ESCPE Lyon), France
4 Department of Pediatric Dentistry and Clinical Genetics, School of Dentistry, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
5 Computational Imaging Lab, Department of Technology - D.326, Pompeu Fabra University, Barcelona, Spain
6 Division of Plastic Surgery, Washington University School of Medicine, St. Louis, MO, USA
7 Department of Physiology, Anatomy and Genetics, Oxford University, Oxford, UK
8 Department of Clinical Genetics, The Juliane Marie Centre, Copenhagen University Hospital, Copenhagen, Denmark
Abstract. This paper introduces a novel approach to quantify asymmetry at each point of a surface. The measure is based on analysing the displacement vectors resulting from nonrigid image registration. A symmetric atlas, generated from control subjects, is registered to a given subject image. A comparison of the resulting displacement vectors on the left and right sides of the symmetry plane gives a point-wise measure of asymmetry. The asymmetry measure was applied to the study of Crouzon syndrome using Micro CT scans of genetically modified mice. Crouzon syndrome is characterised by the premature fusion of cranial sutures, which gives rise to highly asymmetric growth. Quantification and localisation of this asymmetry is of high value with respect to surgery planning and treatment evaluation. Using the proposed method, asymmetry was calculated at each point of the surface of Crouzon mice and wild-type mice (controls). Asymmetry appeared in similar regions for the two groups, but the Crouzon mice were found to be significantly more asymmetric. The localisation ability of the method was in good agreement with ratings from a clinical expert. Validating the quantification ability is a less trivial task due to the lack of a gold standard. Nevertheless, a comparison with a different, but less accurate, measure of asymmetry revealed good correlation.
1 Introduction
Crouzon syndrome was first described nearly a century ago, when calvarial deformities, facial anomalies, and abnormal protrusion of the eyeball were reported in a mother and her son [1]. Later, the condition was characterised as a constellation of premature fusion of the cranial sutures (craniosynostosis), orbital deformity, maxillary hypoplasia, beaked nose, crowding of teeth, and high arched or cleft palate. Heterozygous mutations in the gene encoding fibroblast growth factor receptor type 2 (FGFR2) have been found responsible for Crouzon syndrome [2]. Recently, a mouse model was created to study one of these mutations (FGFR2^{Cys342Tyr}) [3]. Incorporating advanced small animal imaging techniques such as Micro CT allows for a detailed examination of the craniofacial growth disturbances. A recent study, performing linear measurements on Micro CT scans, proved the mouse model applicable for reflecting the craniofacial deviations occurring in humans with Crouzon syndrome [4]. Previously, we have extended this study to assess the local deformations between the groups by constructing a deformable shape- and intensity-based atlas of wild-type (normal) mouse skulls [5]. Statistical models of the deformation fields indicated that the skulls of Crouzon mice were more asymmetric than the wild-type skulls [6]. Asymmetry is highly relevant for the syndrome, since the full or partial fusion of cranial sutures at either side of the skull, and at different times, causes the skull to grow asymmetrically. An accurate and localised assessment of asymmetry will improve surgery planning and treatment evaluation for children with Crouzon syndrome and other related diseases.

In a previous study of ours, asymmetry in children with Deformational Plagiocephaly was measured using the ratio of left and right distances to a midpoint of a deformed symmetric template [7]. Another study on craniofacial malformations defined asymmetry as the deviation of the midsagittal surface with respect to the midsagittal plane [8]. The estimation of asymmetry has also received some attention in the field of brain image analysis. In [9], structural asymmetry was defined as the significant deviation from a symmetric template. In [10], asymmetries were defined by warping group-representative left and right hemispheric images to each other. In [11], voxel-wise structural and radiometric asymmetry was assessed in tumour brain images by defining a symmetry plane in each image and registering to the reflection. We propose a novel asymmetry measure based on the deformation vectors resulting from nonrigid registration of a perfectly symmetric atlas image to a given subject image. The main advantage of the proposed method compared to [8], [10] and [11] is that it avoids defining a symmetry plane in each subject. This is important, since defining such a symmetry plane in a skull affected by malformation is prone to errors. With respect to [9], where the left and right side relationship is ignored, the proposed approach compares the corresponding left and right deformations. We apply the asymmetry measure to locally quantify the asymmetry in Crouzon subjects and compare it to the control group.
2 Materials and Methods

2.1 Data Material
Production of the Fgfr2^{C342Y/+} and Fgfr2^{C342Y/C342Y} mutant mouse (Crouzon mouse) has been described previously [3]. All procedures were carried out in agreement with the United Kingdom Animals (Scientific Procedures) Act, guidelines of the Home Office, and regulations of the University of Oxford. For three-dimensional (3D) CT scanning, 10 wild-type and 10 Fgfr2^{C342Y/+} specimens at six weeks of age (42 days) were sacrificed using Schedule I methods and fixed in 95% ethanol. They were sealed in conical tubes and shipped to the Micro CT imaging facility at the University of Utah. Images of the skull were obtained at approximately 46 μm × 46 μm × 46 μm resolution using a General Electric Medical Systems EVS-RS9 Micro CT scanner.

2.2 Quantification and Localisation of Asymmetry
The proposed method makes use of a perfectly symmetric atlas created from healthy subjects. In a previous study, we created a wild-type mouse atlas [5]. A midsagittal plane can then easily be determined using ear landmarks. Subsequently, a symmetric atlas was created by mirroring one half across the plane. In this way, the correspondence between left and right voxels is known. To establish a left/right correspondence for any mouse image, the widely used B-spline-based nonrigid registration algorithm [12,13] was used to create correspondence fields between the symmetric atlas and a subject image. The transformation model of the nonrigid registration algorithm consists of a global (affine) and a local model (B-splines). For the asymmetry calculations, only the local displacements are considered, so that pose and scale differences do not affect the measure. Now, asymmetry can be calculated at each point of the deformed symmetric atlas. The basic idea of the proposed asymmetry measure is to compare a displacement vector on one side to the corresponding displacement vector on the other side. More formally, the asymmetry A_P of a point P involves the comparison of the local displacement vector v_P at point P and the corresponding vector v_{P'} at point P' on the opposite side. The approach taken here is to use the mirrored vector v^m_{P'}(x, y, z) = v_{P'}(-x, y, z). The absolute value of asymmetry is then defined by the magnitude of the vector difference,
\[
|A_P| = \left\| v_P - v^m_{P'} \right\|. \tag{1}
\]
This, obviously, gives A_P = 0 if the original vectors are perfectly symmetric. We define the absolute asymmetry at P and P' to be equal. A sign is used to indicate whether the surface has expanded or depressed with respect to the point on the other side. Thus, A_P and A_{P'} are defined by the following:
\[
\text{if } v_P - v^m_{P'} \text{ points outwards, then } A_P = \left\| v_P - v^m_{P'} \right\| \text{ and } A_{P'} = -A_P, \tag{2}
\]
\[
\text{else } A_P = -\left\| v_P - v^m_{P'} \right\| \text{ and } A_{P'} = -A_P. \tag{3}
\]
This way of defining the direction of asymmetry is limited to surfaces extracted from the volume, since the surface normal is required. A volumetric measure could be obtained using the determinant of the Jacobian as in [11]. Fig. 1 illustrates the vectors involved in the calculation of the asymmetry measure.
Fig. 1. Schematic figure of the vectors involved in the asymmetry calculation. (a) Displacement vectors shown on the symmetric atlas. (b) Displacement vectors placed at the origin. (c) Mirroring of v_{P'}. (d) Difference vector. The magnitude of the difference vector defines the absolute asymmetry, |A_P|.
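In code, Eqs. (1)-(3) reduce to a few vector operations per surface point. The sketch below is our illustration: the paired displacement vectors and outward surface normals are assumed to come from the registration and the symmetric atlas, and the midsagittal plane is assumed to be x = 0:

```python
import numpy as np

def pointwise_asymmetry(v_P, v_Pprime, normals):
    """v_P, v_Pprime: (N, 3) displacement vectors at points P and their
    mirror points P'; normals: (N, 3) outward surface normals at P.
    Returns signed asymmetries A_P and A_P' = -A_P."""
    v_mirror = v_Pprime * np.array([-1.0, 1.0, 1.0])  # mirror the x component
    diff = v_P - v_mirror
    magnitude = np.linalg.norm(diff, axis=1)          # |A_P|, Eq. (1)
    sign = np.sign(np.einsum('ij,ij->i', diff, normals))  # outwards -> +1
    A_P = sign * magnitude
    return A_P, -A_P
```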
3 Results and Validation
Fig. 2 presents the results of the asymmetry computations for three example subjects from each group of mice. Fig. 3 provides a comparison of the groups in terms of absolute mean asymmetry. Since the proposed method is based on the results of image registration, registration accuracy is essential for the method to be reliable. An extensive landmark validation using two sets of manual expert annotations as a gold standard was carried out in [5]. Landmark positions generated by the registration results were found to be non-significantly different from the gold standard and to have significantly lower variance. To evaluate the asymmetry detection itself, we consider that the proposed method both localises and quantifies asymmetry. In order to validate these two different aspects of the method, two approaches were taken. To evaluate the localisation ability of the method, a clinical expert rated nine different regions of anatomical interest on the skull in the original Crouzon surfaces. These were (see Fig. 2(a,b)): the nose (viewed from above and below), zygoma, anterior skull, mid skull, posterior skull, basal maxilla, anterior cranial base and posterior cranial base. The expert marked each region by 0 or 1 depending on whether the given region was relatively symmetric or asymmetric. Similar ratings were obtained
Fig. 2. Example results for (a) three wild-type mice and (b) three Crouzon mice, displayed on the deformed symmetric atlas. Asymmetry values are shown in mm according to the colorscale, which ranges from blue (depressed) to red (expanded). Note that A_{P'} = -A_P, i.e. each value on the left side has a corresponding negative value on the right side. For a visual comparison, the corresponding original surfaces of (c) the wild-type mice and (d) the Crouzon mice are shown.
Fig. 3. Difference between groups. Mean absolute asymmetry of (a) wild-type and (b) Crouzon mice, displayed on the symmetric atlas in top and bottom views. Note that the colorscale is different from the one in Fig. 2. (c) Global mean absolute asymmetry in Crouzon mice and wild-type mice compared in a box plot.
from the automatic method, where regions with |A_P| > 0.25 mm were marked by 1 and the remaining regions by 0. Fig. 4(a) gives the number of regions where the automatic approach and the expert rating agreed.
Validating the quantification ability of the asymmetry measure is more problematic, since a gold standard is not available. Here, we take the approach of comparing our method to a simple, crude measure of asymmetry. The original surfaces of all subjects were mirrored, and the closest point difference to the original surface was calculated. This method provides no point correspondences and is therefore not exact, but the differences should correlate with the asymmetry values calculated by the proposed method. This is shown in Fig. 4(b).
Fig. 4. (a) Validation with respect to expert rating. The point of reference (the gold standard) is defined by the expert (black bars), who by definition agrees with her own rating of the nine regions for each mouse. The gray bars denote the number of regions where the automatic method agrees with the expert. (b) Correlation of mean asymmetry using the proposed method vs. the closest point difference approach. R² = 0.75.
4 Discussion
Fig. 2(a) shows that the three wild-type mice have a few asymmetric regions of up to approximately 0.5 mm in absolute asymmetry. This is not obvious from an amateur inspection of these regions on the original surfaces in Fig. 2(c); the level of detection would be higher for a clinical expert. However, for the more asymmetric Crouzon mice, asymmetry is easily detected by eye on the original surfaces in the nose, zygoma and posterior skull (Fig. 2(d)). This is in good agreement with the automatic approach in Fig. 2(b). The asymmetry of the anterior skull detected by the proposed method is closer to the symmetry plane, and is harder to confirm by eye. Fig. 3 shows that the two groups differ considerably in terms of absolute asymmetry. From Fig. 3(a) and (b) we note that the trend is similar, i.e. asymmetry appears in similar regions, apart from the nose seen from above. However, as expected, the Crouzon mice have a much higher degree of asymmetry. This is confirmed by the box plot in Fig. 3(c), which indicates that, on average, Crouzon mice are more asymmetric than wild-type mice. This is also confirmed by a t-test
on the absolute mean asymmetries (p-value of 10⁻⁶). This is exactly what was expected from the analysis, i.e. the premature fusion of the cranial sutures leads to an asymmetric skull. Fig. 4(a) shows that the localisation ability of the automatic approach is in excellent agreement with the clinical expert. Most of the cases where the automatic method did not agree with the expert were borderline, i.e. the asymmetry was just below or just above the selected threshold. The threshold of 0.25 mm approximately corresponds to the regions just turning yellow in Fig. 2(b). Obviously, the choice of threshold is extremely important, and the most correct way would probably be to use multiple parameter hypothesis testing to determine the threshold of significant asymmetry. For validation purposes, we believe it is important that the same threshold is used for all mice. For clinical practice it may even be useful to be able to tune the threshold (or the range of the colorscale) with respect to different experts' philosophical definitions of asymmetry. Fig. 4(b) shows that the proposed method gives a relatively good correlation with a crude measure of asymmetry, with an R² of 0.75. That the correlation is not higher is understandable, given the closest point difference approach's lack of point correspondences. Nevertheless, it clearly shows that the two methods have the same trend, and we believe this is a good indication that the quantification is correct.
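The group comparison can be reproduced with a standard two-sample t-test on the per-mouse mean absolute asymmetries. In this sketch the values are random placeholders standing in for the ten measurements per group, which come from the pipeline above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
crouzon = rng.normal(0.25, 0.04, 10)    # placeholder per-mouse means (mm)
wildtype = rng.normal(0.15, 0.03, 10)   # placeholder per-mouse means (mm)
t, p = stats.ttest_ind(crouzon, wildtype)
print(f"t = {t:.2f}, p = {p:.2g}")
```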
5 Conclusion
Using the proposed asymmetry measure, Crouzon mice were seen to have significantly higher average asymmetry than the wild-type mice, confirming the clinical hypothesis. The localisation ability of the method was seen to correspond well with expert rating. It was considered more problematic to validate the quantification ability of the proposed method but a comparison to a crude asymmetry measure gave an acceptable correlation. In conclusion, a novel 3D asymmetry measure was developed, providing a detailed, surface map of asymmetry.
Acknowledgements For all image registrations, the Image Registration Toolkit was used under Licence from Ixico Ltd.
References

1. Crouzon, O.: Une nouvelle famille atteinte de dysostose cranio-faciale héréditaire. Bull. Mém. Soc. Méd. Hôp. 39, 231–233 (1912)
2. Reardon, W., Winter, R.M., Rutland, P., Pulleyn, L.J., Jones, B.M., Malcolm, S.: Mutations in the fibroblast growth factor receptor 2 gene cause Crouzon syndrome. Nat. Genet. 8, 98–103 (1994)
3. Eswarakumar, V.P., Horowitz, M.C., Locklin, R., Morriss-Kay, G.M., Lonai, P.: A gain-of-function mutation of Fgfr2c demonstrates the roles of this receptor variant in osteogenesis. Proc. Natl. Acad. Sci. 101, 12555–12560 (2004)
4. Perlyn, C.A., DeLeon, V.B., Babbs, C., Govier, D., Burell, L., Darvann, T., Kreiborg, S., Morriss-Kay, G.: The craniofacial phenotype of the Crouzon mouse: Analysis of a model for syndromic craniosynostosis using 3D MicroCT. Cleft Palate Craniofacial Journal 43(6), 740–747 (2006)
5. Ólafsdóttir, H., Darvann, T.A., Hermann, N.V., Oubel, E., Ersbøll, B.K., Frangi, A.F., Larsen, P., Perlyn, C.A., Morriss-Kay, G.M., Kreiborg, S.: Computational mouse atlases and their application to automatic assessment of craniofacial dysmorphology caused by the Crouzon mutation Fgfr2C342Y. Journal of Anatomy 211(1), 37–52 (2007)
6. Ólafsdóttir, H., Darvann, T.A., Ersbøll, B.K., Hermann, N.V., Oubel, E., Larsen, R., Frangi, A.F., Larsen, P., Perlyn, C.A., Morriss-Kay, G.M., Kreiborg, S.: Craniofacial statistical deformation models of wild-type mice and Crouzon mice. In: Pluim, J.P.W., Reinhardt, J.M. (eds.) Medical Imaging 2007: Image Processing, SPIE, vol. 6512, p. 65121C (2007)
7. Lanche, S., Darvann, T.A., Ólafsdóttir, H., Hermann, N.V., Pelt, A.E.V., Govier, D., Tenenbaum, M.J., Naidoo, S., Larsen, P., Kreiborg, S., Larsen, R., Kane, A.A.: A statistical model of head asymmetry in infants with deformational plagiocephaly. In: Ersbøll, B.K., Pedersen, K.S. (eds.) SCIA 2007. LNCS, vol. 4522, pp. 898–907. Springer, Heidelberg (2007)
8. Christensen, G., Johnson, H., Darvann, T., Hermann, N., Marsh, J.: Midsagittal surface measurement of the head: an assessment of craniofacial asymmetry. Proceedings of the SPIE - The International Society for Optical Engineering 3661, 612–619 (1999)
9. Ashburner, J., Hutton, C., Frackowiak, R., Johnsrude, I., Price, C., Friston, K.: Identifying global anatomical differences: Deformation-based morphometry. Human Brain Mapping 6(5-6), 348–357 (1998)
10. Lancaster, J., Kochunov, P., Thompson, P., Toga, A., Fox, P.: Asymmetry of the brain surface from deformation field analysis. Human Brain Mapping 19(2), 79–89 (2003)
11. Joshi, S., Lorenzen, P., Gerig, G., Bullitt, E.: Structural and radiometric asymmetry in brain images. Medical Image Analysis 7(2), 155–170 (2003)
12. Rueckert, D., Sonoda, L.I., Hayes, C., Hill, D.L.G., Leach, M.O., Hawkes, D.J.: Nonrigid registration using free-form deformations: application to breast MR images. IEEE Trans. on Medical Imaging 18(8), 712–721 (1999)
13. Schnabel, J.A., Rueckert, D., Quist, M., Blackall, J.M., Castellano-Smith, A.D., Hartkens, T., Penney, G.P., Hall, W.A., Liu, H., Truwit, C.L., Gerritsen, F.A., Hill, D.L.G., Hawkes, D.J.: A generic framework for non-rigid registration based on non-uniform multi-level free-form deformations. In: Niessen, W.J., Viergever, M.A. (eds.) MICCAI 2001. LNCS, vol. 2208, pp. 573–581. Springer, Heidelberg (2001)
Object Localization Based on Markov Random Fields and Symmetry Interest Points

René Donner1,2, Branislav Micusik2, Georg Langs1,3, Lech Szumilas2, Philipp Peloschek4, Klaus Friedrich4, and Horst Bischof1

1 Institute for Computer Graphics and Vision, Graz University of Technology, Austria
[email protected]
2 Pattern Recognition and Image Processing Group, Vienna University of Technology, Austria
{donner,micusik,lech}@prip.tuwien.ac.at
3 GALEN Group, Laboratoire de Mathématiques Appliquées aux Systèmes, École Centrale de Paris, France
[email protected]
4 Department of Radiology, Medical University of Vienna, Austria
{philipp.peloschek,klaus.friedrich}@meduniwien.ac.at
Abstract. We present an approach to detect anatomical structures by configurations of interest points, from a single example image. The representation of the configuration is based on Markov Random Fields, and the detection is performed in a single iteration by the max-sum algorithm. Instead of sequentially matching pairs of interest points, the method takes the entire set of points, their local descriptors, and the spatial configuration into account to find an optimal mapping of the modeled object to the target image. The image information is captured by symmetry-based interest points and local descriptors derived from the Gradient Vector Flow. Experimental results are reported for two data sets, showing the applicability of the method to complex medical data.
1 Introduction
The reliable and fast detection and segmentation of anatomical structures is a crucial issue in medical image analysis. It has been tackled by a number of powerful approaches, among them active shape models (ASMs) [3], active appearance models (AAMs) [4], and graph-cuts [2]. They have successfully been employed to segment structures in cardiac MRIs [13] or for registration in functional heart imaging [15]. These methods need to be initialized: ASMs and AAMs need to be placed with considerable overlap with the object of interest, and graph-cuts need manually annotated seed points placed within and outside of the object. This initialization is either done manually or by application-specific approaches.
This research has been supported by the Austrian Science Fund (FWF) under grants P17083-N04 (AAMIR) and P17189-N04 (SESAME), as well as the European Union Network of Excellence FP6-507752 (MUSCLE) and the Region ˆIle-de-France.
An approach to detecting such initialization positions is to use local descriptors like SIFT [10], shape context [1], or PCA-SIFT [6]. They match interest points between a source (i.e., example) image and the previously unseen target image, and typically rely on a robust estimation method like RANSAC [5]. These approaches have several drawbacks: for complex non-rigid transformations between source and target image, a large number of correct interest point matches is required to correctly estimate the unknowns of the transformation, which considerably increases the computation time of the robust matching, and information about the spatial relation of adjacent descriptors is difficult to incorporate into the matching process. In this paper we propose a deterministic method based on Markov Random Fields (MRFs) that incorporates both interest point positions and local features to detect landmark configurations from a single example. The detection is performed in a single iteration by the max-sum algorithm [16]. The approach uses all interest point features and positions and finds a solution which minimizes the combined costs of non-rigid deformations and local descriptor feature differences. Arbitrary interest points and local descriptors can be used; we report results for interest points based on local symmetry and a complementary local descriptor derived from gradient vector flow [17]. Local symmetry detectors were investigated in [8,12], but they are either computationally expensive or use radial symmetry detectors of predefined radii. Recently, [11] proposed an approach to detect symmetry in constellations of interest points found by existing point detection methods. The paper is structured as follows: in Sec. 2 we explain the interest point detector and the local descriptor, in Sec. 3 we review Markov Random Fields and the max-sum problem, in Sec. 4 the mapping of the source points to the target points by MRFs is explained in detail, and in Sec. 5 we present the experimental evaluation of our approach, followed by a conclusion and an outlook in Sec. 6.
2 Symmetry Based Interest Points and Descriptors
Many structures of interest to medical experts, such as bones, veins, and many other anatomical structures or their parts, exhibit a shape with a high degree of symmetry w.r.t. an axis. This property of (local) symmetry is well preserved even when dealing with 2D slices of 3D data sets like MRIs, as the cross sections of these body parts appear as round or elongated structures. Even regions of interest that do not exhibit this property can be localized by observing their neighborhood: an initialization for, e.g., meniscoids can be provided by correctly localizing the discs and vertebrae of the spine.

2.1 Interest Points from Local Symmetry
Popular interest point detectors which are often used in conjunction with SIFT are the Harris corner detector and the difference of Gaussians (DoG) approach, neither of them possessing an affinity to local symmetry. A comparison of the interest points detected by DoG and interest points derived from local symmetry
Fig. 1. Comparison of (a) the interest points found by difference of Gaussians (DoG) and (b) the symmetry points found as minima of the GVF magnitude. Note how the symmetry points pick up the structures which are of interest to medical experts, greatly facilitating the correct localization of these structures. (c) depicts the scale and orientation estimates obtained around the symmetry points.
is shown in Fig. 1 (a,b). To detect points of high local symmetry we use the gradient vector flow (GVF) field, originally proposed in [17] to increase the capture range of active contours. Its strengths include the ability to detect even weak structures while being robust to high amounts of noise in the image when used for symmetry detection. To further reduce the influence of noise, the image can be median-filtered prior to computing the GVF. The GVF can be computed either from a binary edge map or directly from the gray level image I. We compute the GVF of an image as G = u + iv = GVF(I), yielding the complex matrix G used for all subsequent computations. The resulting field G is depicted in Fig. 2 for synthetic examples and a section of a hand radiograph, overlaid over the image I. The field magnitude |G| is largest in areas of high image gradient, and the start and end points of the field lines of G are located at symmetry maxima. E.g., for a symmetrical structure formed by a homogeneous region surrounded by a different gray level value, the field will point away from or towards the local symmetry center of the structure, as shown in Fig. 2 (a,b). The symmetry interest points are thus defined as the local minima of |G|. After detecting the interest points, the orientation bi ∈ [0, π] of the local region surrounding the interest point can be estimated. It is computed as bi = ∠G(xi + Δxi, yi + Δyi), i.e., the orientation of G at a pixel in a local r × r-pixel neighborhood satisfying

(Δxi, Δyi) = argmin_{Δxi ∈ {0,...,r/2}, Δyi ∈ {−r/2,...,r/2}} |∠G(xi + Δxi, yi + Δyi) − ∠G(xi − Δxi, yi − Δyi)|.  (1)
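As an illustration only (not the authors' implementation), the following NumPy sketch computes the GVF with the iterative scheme of Xu and Prince [17] and takes local minima of |G| as symmetry interest points; the function names, the regularization weight mu, the iteration count, and the window size are all our own assumptions:

```python
import numpy as np
from scipy.ndimage import minimum_filter

def gvf(image, mu=0.2, iters=200, dt=0.2):
    """Iterative gradient vector flow (Xu & Prince): diffuse the image
    gradient into homogeneous regions. Returns the complex field G = u + iv."""
    f = image.astype(float)
    f = f / (f.max() + 1e-12)          # normalize so the update stays stable
    fy, fx = np.gradient(f)
    mag2 = fx ** 2 + fy ** 2
    u, v = fx.copy(), fy.copy()
    for _ in range(iters):
        # Five-point Laplacian with periodic borders, for brevity.
        lap_u = (np.roll(u, 1, 0) + np.roll(u, -1, 0)
                 + np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)
        lap_v = (np.roll(v, 1, 0) + np.roll(v, -1, 0)
                 + np.roll(v, 1, 1) + np.roll(v, -1, 1) - 4 * v)
        u += dt * (mu * lap_u - (u - fx) * mag2)
        v += dt * (mu * lap_v - (v - fy) * mag2)
    return u + 1j * v

def symmetry_points(G, window=7):
    """Symmetry interest points = local minima of |G| (the start and end
    points of the field lines of G)."""
    mag = np.abs(G)
    ys, xs = np.nonzero(mag == minimum_filter(mag, size=window))
    return list(zip(xs, ys))
```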
Fig. 2. (a,b) Examples of the GVF with the detected symmetry interest points (diamonds). (c) Descriptor extraction from the GVF field: around each symmetry point, a patch is extracted from the vector field according to its scale and orientation, and then resampled to a 10 × 10 grid to form the actual descriptor. The image is shown only for better visualization; the symmetry points are marked as circles.
The scale si of the region around the interest point is estimated as the mean distance from (xi, yi) to the two closest local maxima of |G| in the direction of bi ± π. Examples of the resulting orientation and scale estimates are shown in Fig. 1 (c).

2.2 Descriptors from Gradient Vector Flow Fields
A measure is needed to specify the similarity of the local regions around the symmetry interest points. Several local descriptors have been proposed in recent years, including SIFT [10] and shape context [1]. While most of these approaches yield descriptors suitable for building the MRF, they would require additional computations; in contrast, we can directly use G to describe local context. [6] use normalized patches of the image gradient, extracted according to the interest points' orientation and scale, as local descriptors. Similarly, we extract patches of G around the symmetry interest points, according to scale si and orientation bi. They are re-sampled to a 10 × 10 grid, as depicted in Fig. 2, to form the actual local descriptor. This encodes the information about the image gradients within and around the patch. Because of the GVF's smooth structure, the Euclidean distance can be used to compute the distance between two descriptors. This eliminates the need for the complex histogram construction performed, for example, by SIFT, while still retaining a feature vector of low dimensionality. As the orientation of the local interest point is only uniquely defined up to ±π, the actual distance between two local descriptors D1 and D2 is computed as min(abs(D1 − D2), abs(D1 − D2*)), where D2* denotes the descriptor D2 rotated by π.
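In code, this comparison reduces to two Euclidean distances and a minimum. A small sketch under our own assumptions (in particular, that rotating a vector-field patch by π corresponds to flipping it spatially and negating the vectors):

```python
import numpy as np

def descriptor_distance(d1, d2):
    """Euclidean distance between two 10x10 complex GVF patch descriptors,
    resolving the +/- pi orientation ambiguity; the 180-degree flip plus
    negation as the 'rotation by pi' is our assumption."""
    d2_rot = -d2[::-1, ::-1]
    return min(np.linalg.norm(d1 - d2), np.linalg.norm(d1 - d2_rot))
```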
3 Markov Random Fields and the Max-Sum Problem
The Markov Random Fields considered in this paper represent graphs where each of the M nodes, called objects, has N fields, or labels, with associated qualities. The labels of two adjacent nodes are fully connected by N² edges,
Fig. 3. The MRF graph consists of M objects with N labels each. Qualities are assigned to both labels and edges. Finding the solution to a max-sum problem means selecting a label for each object, such that the sum of qualities of the selected labels and the edges connecting them is maximized.
again with a weight to encode quality. Which objects are adjacent is encoded in an additional graph A with a edges. This basic structure is depicted in Fig. 3: there are M = 4 objects with N = 3 labels each, N² = 9 edges between adjacent objects, and a = 5. The task of interest is to select one label for each object such that the sum of label and edge qualities of the resulting sub-graph becomes maximal (illustrated as thick lines); the max-sum solver can be used to tackle this problem. The max-sum (labeling) problem of the second order is defined as maximizing a sum of bivariate functions of discrete variables. The solution of a max-sum problem corresponds to finding a configuration of a Gibbs distribution with maximal probability, which is equivalent to finding a maximum a posteriori (MAP) configuration of an MRF with discrete variables [16]. Let the M × N matrix C represent the label qualities for each of the objects, and the a × N² matrix E represent the edge qualities between the pairs of labels. The total quality of the label selection S = {n1, ..., nM} with ni ∈ {1, ..., N} is then defined as

C(S) = Σ_{m=1..M} C(m, S(m)) + Σ_{α=1..a} E(α, β(E, S, α)),  (2)
where β(E, S, α) denotes the column of E representing the quality of the edge between the labels chosen for edge A(α). Solving the max-sum problem means finding the set of optimal labels

S* = argmax_S C(S).  (3)
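To make the objective concrete, the following toy sketch evaluates the quality of Eq. (2) and finds S* of Eq. (3) by exhaustive search. This is only feasible for tiny M and N and merely illustrates what the max-sum solver computes; we store E as an a × N × N array instead of a × N², and all names are illustrative:

```python
import itertools
import numpy as np

M, N = 4, 3                                   # objects and labels, as in Fig. 3
A = [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]  # adjacency graph, a = 5 edges
rng = np.random.default_rng(0)
C = rng.standard_normal((M, N))               # label qualities
E = rng.standard_normal((len(A), N, N))       # edge qualities, a x N x N

def total_quality(S):
    """C(S) of Eq. (2) for a label selection S."""
    label_q = sum(C[m, S[m]] for m in range(M))
    edge_q = sum(E[a, S[i], S[j]] for a, (i, j) in enumerate(A))
    return label_q + edge_q

S_star = max(itertools.product(range(N), repeat=M), key=total_quality)  # Eq. (3)
print(S_star, total_quality(S_star))
```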
Recently, a very efficient algorithm for solving this problem through linear programming relaxation and its Lagrangian dual, originally proposed by Schlesinger in 1976 [14], was presented in [16]. The max-sum solver permits several labels to be defined while still keeping the processing time within reasonable bounds. There are other attempts to solve the labeling problem for MRFs using, e.g., second order cone programming [9], sequential tree-reweighted max-product message passing [7], or belief propagation methods [18]. However, none of these
algorithms, including the max-sum approach, solves the multi-label MRF problem exactly, as it is NP-hard. If the graph is a tree, the global optimum of Eq. (3) is guaranteed [7]; in the case of a non-tree graph, max-sum takes various approximations into account to reach a possibly optimal solution.
4 Localization of Anatomical Structures
For a model image, a subset of interest points is manually selected to describe the medical object to be found. The Delaunay triangulation of these M model points yields the set A of index-tuples describing the edges. An example of the generated model is shown in Fig. 4 (a,b).
Fig. 4. (a,b) Model graph A automatically generated from the symmetry points selected on the model image. The additionally placed landmarks (circles) are not part of the model and are used only for visualization. (c,d) show the graphs matched to test images, including the landmarks propagated according to the correspondences found by the matched graph.
The M selected model points represent the objects of the MRF graph, while the indices of the N target interest points correspond to the labels. A solution S thus represents a mapping of the model interest points to a subset of the target interest points, assigning one target interest point to each model point. The quality of a (model point, target point) match equals the negative distance between their local descriptors (as we are solving a maximization problem). All mutual distances between model points and potential target correspondences are computed, resulting in the M × N matrix C. The qualities of the aN² edges in the model are stored in E. The quality of an edge between two labels ni, nj in E is computed by comparing its length and angle with those of the edge between the corresponding objects (model nodes). As the medical structures under investigation can be assumed to be of similar scale, the edge quality e is set to

e(α, ni, nj) = −(|length A(α) − length(ni, nj)| + γ |∠A(α) − ∠(ni, nj)|),  (4)
Fig. 5. Result histograms of the distances of propagated landmarks to the standard-of-reference landmark positions for (a) the hand and (b) the spine data set
where length(h, k) denotes the pixel distance between interest points h and k, ∠(h, k) is the orientation of the edge, and γ is a normalization factor that compensates for the different scales of angles and lengths. It can occur that no interest point is detected at a location in the target image where the model would expect one. It is thus important to include the possibility of omitting a model point. This is achieved by adding one artificial target interest point (dummy point), yielding Cd and Ed of sizes M × (N + 1) and a × (N + 1)², respectively. The last column of Cd is set to the mean of C multiplied by a factor f controlling how costly it is to omit a model point. Similarly, the edges of Ed involving the dummy point are set to f times the mean of E. The max-sum solver is then applied to Cd and Ed, yielding the set S = {n1, ..., nM} of optimal labels for each model node, maximizing the quality C(S) in Eq. (3). The presented method thus in effect performs a non-rigid registration of the partial model image to the test image. As the interest points are not necessarily at the locations medical experts are interested in, additional landmark points are manually placed in the model image. They are not used for computing the match, but only for result visualization and evaluation.
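The dummy-point augmentation can be sketched as follows; the array shapes and the mean-based costs with factor f follow the text, everything else (names, the a × N × N layout of E) is our own illustration:

```python
import numpy as np

def add_dummy_point(C, E, f=1.5):
    """Augment C (M x N label qualities) and E (a x N x N edge qualities)
    with one artificial target point, so that model points can be omitted
    at a cost controlled by f."""
    M, N = C.shape
    Cd = np.full((M, N + 1), f * C.mean())
    Cd[:, :N] = C
    Ed = np.full((E.shape[0], N + 1, N + 1), f * E.mean())
    Ed[:, :N, :N] = E
    return Cd, Ed
```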
5 Experiments
The approach was evaluated on two data sets (Fig. 4): 1. a set of 30 hand radiographs (300×450 pixels), for which standard-of-reference annotations (landmarks) of 24 joints per image were available; 2. five spine MRIs (280×320 pixels), for which manual annotations of 7 inter-vertebral discs were used. To evaluate the matching accuracy, the landmarks were propagated according to the match, and the pixel error between propagated and correct landmarks was recorded. A piecewise affine transformation of the Delaunay triangulation of the selected source symmetry points is used to propagate the source landmarks to the target image. The typical number of detected interest points was between 400 and 600; the
model graphs contained 10 to 25 nodes. In Fig. 4 (a,b) the source model graphs for two examples are depicted: the model graph is drawn with blue lines, and green circles are manual annotations used only for validation. In Fig. 4 (c,d) matching results are depicted: red lines represent the model graph matched to the target image, while green circles indicate the propagated landmarks. Quantitative analysis was performed by a leave-one-out procedure, i.e., a single image was chosen as the source and the model graph was matched to the remaining 29 or 4 images. The mean/median error of the matches is 14.2/9.7 pixels for the hand data (a typical joint width is 25 pixels) and 10.85/4.8 pixels for the spine data. This is sufficient for most initialization purposes. The error histograms in Fig. 5 show the pixel distances of all propagated landmarks to the correct target landmark positions over all runs. Typical run times for solving the MRF for one source-target match are on the order of a few seconds.
6 Conclusion and Outlook
We present a framework for the matching of anatomical structures from a single example. Configurations of interest points are represented by graphs and Markov random fields, and the matching is performed in one iteration by the max-sum algorithm. The approach integrates local descriptor similarities and deformation constraints in a single optimization step. Results indicate that the method provides the localization accuracy necessary for the initialization of subsequent segmentation algorithms. Future research will focus on using combined model graphs from several model images, and the extension to 3D data sets.
References
1. Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. IEEE PAMI 24(4), 509–522 (2002)
2. Boykov, Y., Jolly, M.-P.: Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. In: Proc. ICCV, pp. 105–112 (2001)
3. Cootes, T.: Active shape models - 'smart snakes'. In: BMVC (1992)
4. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Trans. PAMI 23(6), 681–685 (2001)
5. Fischler, M.A., Bolles, R.C.: A paradigm for model fitting with applications to image analysis and automated cartography. Comm. of the ACM 24 (1981)
6. Ke, Y., Sukthankar, R.: PCA-SIFT: A more distinctive representation for local image descriptors. In: CVPR (2), pp. 506–513 (2004)
7. Kolmogorov, V.: Convergent tree-reweighted message passing for energy minimization. PAMI 28(10), 1568–1583 (2006)
8. Kovesi, P.: Symmetry and asymmetry from local phase. In: Proceedings of the Tenth Australian Joint Conference on Artificial Intelligence, pp. 185–190 (1997)
9. Kumar, M.P., Torr, P.H.S., Zisserman, A.: Solving Markov random fields using second order cone programming. In: Proc. CVPR, pp. I:1045–1052 (2006)
10. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. IJCV (2004)
11. Loy, G., Eklundh, J.-O.: Detecting symmetry and symmetric constellations of features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, Springer, Heidelberg (2006)
12. Loy, G., Zelinsky, A.: Fast radial symmetry for detecting points of interest. IEEE Trans. Pattern Anal. Mach. Intell. 25(8), 959–973 (2003)
13. Mitchell, S.C., Bosch, J.G., Lelieveldt, B.P.F., van der Geest, R.J., Reiber, J.H.C., Sonka, M.: 3-D active appearance models: Segmentation of cardiac MR and ultrasound images. IEEE TMI 21(9), 1167–1178 (2002)
14. Schlesinger, M.: Sintaksicheskiy analiz dvumernykh zritelnikh signalov v usloviyakh pomekh (Syntactic analysis of two-dimensional visual signals in noisy conditions). Kibernetika (4), 113–130 (1976) (in Russian)
15. Stegmann, M.B., Ólafsdóttir, H., Larsson, H.B.W.: Unsupervised motion-compensation of multi-slice cardiac perfusion MRI. Medical Image Analysis 9(4), 394–410 (2005)
16. Werner, T.: A linear programming approach to the max-sum problem: A review. Research Report CTU–CMP–2005–25, Czech Technical University (2005)
17. Xu, C., Prince, J.L.: Snakes, shapes, and gradient vector flow. IEEE Trans. on Image Proc. 7(3) (March 1998)
18. Yedidia, J.S., Freeman, W.T., Weiss, Y.: Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory 51(7), 2282–2312 (2005)
2D Motion Analysis of Long Axis Cardiac Tagged MRI

Ting Chen, Sohae Chung, and Leon Axel

Radiology Department, New York University School of Medicine, 600A 650 First Ave, New York City, NY 10016, USA
Abstract. The tracking and reconstruction of myocardial motion is critical to the diagnosis and treatment of heart disease. Currently, little has been done for the analysis of motion in long axis (LA) cardiac images. We propose a new fully automated motion reconstruction method for grid-tagged MRI that combines Gabor filters and deformable models. First, we use a Gabor filter bank to generate the corresponding phase map in the myocardium and estimate the location of grid tag intersections. Second, we use a non-rigid registration module driven by thin plate splines (TPS) to generate a transformation function between tag intersections in two consecutive images. Third, deformable spline models are initialized using Fourier domain analysis and tracked during the cardiac cycle using the TPS-generated transformation function. The splines then locally deform under the influence of gradient flow and image phase information. The final motion is decomposed into tangential and normal components corresponding to the local orientation of the heart wall. The new method has been tested on LA phantoms and in vivo heart data, and its performance has been quantitatively validated. The results show that our method can reconstruct the motion field in LA cardiac tagged MR images accurately and efficiently. Keywords: tagged MRI, LA, myocardial motion tracking, Gabor filter, deformable model, TPS.
1 Introduction
For decades, heart disease has been the leading cause of death in Western countries [10]. Tagged MRI [2][3] provides a promising approach to the diagnosis and treatment of heart disease by noninvasively both revealing the anatomical structure of the myocardium and displaying the motion of the myocardium during the cardiac cycle. However, it remains a major challenge to quantitatively track and reconstruct the motion of the living myocardium. Currently, most tag tracking methods, e.g., harmonic phase imaging (HARP) [8], have concentrated on reconstructing the myocardial motion in short axis (SA) images. However, 3D motion analysis including LA image analysis is necessary for a better understanding of the anatomic causes of heart disease and to fully evaluate local heart function. In [9] and [12], motions in multiple SA images have been interpolated to reconstruct the
motion in 3D. However, these methods did not explicitly address the motion through the SA image plane. In [14], through-plane motion has been encoded as phase information in SA images; nevertheless, its performance deteriorates at the apex. Moreover, all these methods may generate overly smoothed results because of the small number of SA slices available for interpolation. Therefore, we need to analyze LA images directly in order to track and reconstruct the myocardial motion in 3D. The HARP method uses the inverse Fourier transform of one harmonic peak of the transform of the MR image in the frequency domain to estimate the local tag phase distribution, and then uses the phase changes over the cardiac cycle to derive the displacement and strain. HARP generates the phase map based on a set of constant global parameters, so it lacks adaptivity to changing local tag spacing and orientation. Its insensitivity to local structure may cause erroneous local phase patterns, such as bifurcations in the phase map. In [13] and [15], responses to a bank of Gabor filters [1] have been used to adaptively detect the local tag spacing and orientation, which are used in turn to estimate the local phase. The phase outputs of Gabor filter banks have high accuracy; however, their performance is limited by two factors. First, the performance of Gabor filters degrades near the myocardial boundaries, because there the input to the Gabor filter is a combination of the myocardial motion and the independent motion of blood or adjacent tissue. Second, the Gabor filters cannot capture large deformations in the myocardium with the same accuracy as small deformations, as their performance is limited by the size and shape of the filters. Both HARP and the Gabor filter bank method have been implemented for SA image analysis, but there are no current reports of quantitative LA analysis using either method. In [15], Chen et al. used a spline-based scheme to reconstruct the motion of tags through the cardiac cycle; a hybrid of the tag phase and gradient flow approaches was used to make the spline models converge to the tags. However, the tracking may fail when there is large and nonlinear deformation, since the spline model is initialized as a linear combination of tag locations in previous cardiac phases. To overcome these problems with the existing motion tracking methods, and to find a solution for LA motion analysis, we propose a new motion tracking method. First, two tag phase maps (vertical and horizontal) are generated for grid-tagged MR images using a Gabor filter bank. The intersections of tags in two consecutive images are nonrigidly registered using the thin plate spline (TPS) [11] to generate a smooth 2D transformation function between the pair of images. Each segment of a tag in the LA images is modeled as a deformable spline, whose initial location in the current image is determined from previous images and the transformation function. The spline is deformed under the influence of the phase map and local gradients. We then use the virtual spline method in [13] to reconstruct the full 2D displacement and strain in the LA image. We describe our method in more detail in Section 2. In Section 3, experimental results and validations are presented. We discuss the strengths of our method and draw conclusions in Section 4.
2 Method

2.1 Gabor Filter Bank
A Gabor filter [1] is a sinusoidally modulated Gaussian that can be convolved with an image to extract the local periodic "stripe" content. It is simply expressed in the image domain as a Gaussian multiplied by a complex sinusoid:

h(x, y) = g(x', y') · exp[−i2π(u0 x + v0 y)]  (1)

with center frequency (u0, v0), where g(x', y') = 1/(2πσx σy) · exp(−((x'/σx)² + (y'/σy)²)/2) is a Gaussian filter with the spatial standard deviations σx, σy. The complex function h(x, y) can be split into its real and imaginary components hR and hI (even and odd functions, respectively), whose Fourier transforms are:

HR(u, v) = (1/2)(G(u − u0, v − v0) + G(u + u0, v + v0))  (2)

HI(u, v) = (i/2)(G(u − u0, v − v0) − G(u + u0, v + v0))  (3)

where G(u, v) is the Fourier transform of g(x', y'), which is also a Gaussian. The real and imaginary filters in (2) and (3) are therefore sums of two coupled Gaussian functions in the Fourier domain, centered at the frequencies (u0, v0) and (−u0, −v0). The final form of the 2D Gabor filter in the Fourier domain is:

H(u, v) = HR(u, v) + iHI(u, v)  (4)
The coupled 2D Gabor filters in the Fourier domain can be parameterized by ν, θ, σ, where ν is the reciprocal of the tag spacing, θ is the orientation, and σ is the size of the filters. The response is maximal when the input tagged image has the same local frequency and orientation. During systole, the local spacing (corresponding to ν) and orientation (corresponding to θ) of the tags change and are variably distributed in the myocardium. To find the local frequency and orientation, we use a filter bank consisting of Gabor filters with 5 different values of ν (ν0/1.1, ν0/1.05, ν0, ν0/0.95, ν0/0.9) and of θ (−π/10, −π/20, 0, π/20, π/10), where ν0 is the inverse of the initial tag spacing. The value of σ depends on ν to allow a fixed range of output for the filters in the bank. Each filter is multiplied with the Fourier transform of the tag image, and the result then undergoes the inverse Fourier transform to generate a corresponding magnitude image of the filter output. For each pixel in the tag image, only the three filters with the largest output at that location in the magnitude image are considered. The parameters of those three filters are nonlinearly interpolated to create an optimal estimate of the local frequency (inverse of tag spacing) and orientation at each pixel. The outputs of these three filters are also nonlinearly combined to form a complex image, from which the local phase can be computed using the arctan function.
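As an illustration of the bank construction (not the authors' code), the following sketch builds the Fourier-domain filters of Eqs. (2)-(4) on the ν/θ grid given above and returns the complex response image of each filter; the fixed σ is a simplifying assumption, whereas the text ties σ to ν:

```python
import numpy as np

def gabor_bank_responses(img, nu0, sigma=8.0):
    """Apply the Fourier-domain Gabor bank over the nu/theta grid of the
    text; per-pixel magnitude and phase are abs() and angle() of each
    returned complex response image."""
    Hh, Ww = img.shape
    F = np.fft.fft2(img)
    u = np.fft.fftfreq(Ww)[None, :]          # cycles per pixel
    v = np.fft.fftfreq(Hh)[:, None]
    nus = [nu0 / 1.1, nu0 / 1.05, nu0, nu0 / 0.95, nu0 / 0.9]
    thetas = [-np.pi / 10, -np.pi / 20, 0.0, np.pi / 20, np.pi / 10]
    responses = {}
    for nu in nus:
        for th in thetas:
            u0, v0 = nu * np.cos(th), nu * np.sin(th)
            # Coupled Gaussians centered at (u0, v0) and (-u0, -v0).
            Gp = np.exp(-2 * (np.pi * sigma) ** 2 * ((u - u0) ** 2 + (v - v0) ** 2))
            Gm = np.exp(-2 * (np.pi * sigma) ** 2 * ((u + u0) ** 2 + (v + v0) ** 2))
            Hf = 0.5 * (Gp + Gm) + 1j * 0.5 * (Gp - Gm)   # Eqs. (2)-(4)
            responses[(nu, th)] = np.fft.ifft2(F * Hf)
    return responses
```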
2.2 Motion Tracking
We track the myocardial motion in two steps. Given two consecutive tag images Ii and Ii+1, we first use a nonrigid registration method to initialize spline models in Ii+1 based on the tag locations in Ii, and then use deformable models to improve the tracking.

Nonrigid Registration: During the cardiac cycle there is motion through the LA image plane, which can cause the appearance and disappearance of tags in the image of the curved heart wall; the papillary muscles may move into or out of the LA image plane; and tags decay as a function of time. All these effects may dramatically change the appearance and shape of the heart in LA images during the cardiac cycle, so that it is difficult to register two images directly. To overcome these difficulties, we convert the registration into a problem of matching the corresponding tag intersections in two grid tag images. Suppose we are tracking the motion of tags from Ii to Ii+1 and assume we have a good segmentation of the myocardium. The tag intersections in the myocardium in these two images are closely correlated, except that the through-plane motion may cause intersections close to the myocardial boundaries to disappear or appear. We denote the tag intersections in Ii and Ii+1 as P = {pj, j = 1, 2, ..., M} and Q = {qk, k = 1, 2, ..., N}, respectively. Assume that from Ii to Ii+1 the underlying motion field can be expressed as a non-rigid transformation function f; a point pj ∈ P in Ii is mapped to its new location p'j = f(pj) in Ii+1. The matching problem is then equivalent to the minimization of the fuzzy assignment-least squares energy function E(Y, f):

E(Y, f) = Σ_{j=1..M} Σ_{k=1..N} yjk ||qk − f(pj)||² + λ||f||² + T Σ_{j=1..M} Σ_{k=1..N} yjk log yjk − ζ Σ_{j=1..M} Σ_{k=1..N} yjk  (5)

λ and ζ are both positive weights: λ controls the strength of the smoothness constraint, and ζ controls the strength of the internal correlations between points. Y is the correlation matrix between the two point sets; its element yjk = (1/T) exp(−||qk − pj||²/(2T)) satisfies the constraints Σ_{j=1..M+1} yjk = 1 for k = 1, 2, ..., N and Σ_{k=1..N+1} yjk = 1 for j = 1, 2, ..., M, with yjk ∈ [0, 1]. The entropy term T Σ_j Σ_k yjk log yjk is added to the energy function following the approach of deterministic annealing [6]: the temperature parameter T is high at the start of the tracking process, so that the energy function favors fuzzy correspondences and maintains its convexity, and gradually decreases to zero during the tracking to obtain a global binary solution for Y. The second term is a smoothness constraint on the transformation function f, and the fourth term controls the existence of outliers in the final matching result. Notice that the first two terms of the function take the form of a thin plate spline (TPS) functional, so f can be solved for using QR decomposition [4]. By iteratively updating Y and f, we solve for f in the form of a combination of a global translation matrix t and local deformation vertices w at the tag intersections. The
local deformation vertices will be smoothly interpolated to generate a 2D local deformation map for each pixel in Ii. Given the location of tags in Ii, their corresponding locations in Ii+1 can be estimated by applying the global translation followed by the local deformation.

Deformable Models: We initialize spline models at the locations computed by the non-rigid registration. Independent splines are initialized for each segment of a tag, so that the motions in the septum and the free wall do not interfere with each other. The contours move under the influence of the external gradient force f_ext and the phase constraint f_phase, following the Lagrangian equation:

ḋ + Kd = f_ext + f_phase  (6)
where K is the stiffness matrix that controls the smoothness of the deformable contour [7], d is the displacement corresponding to the initial locations of the spline models (after the nonrigid registration at each cardiac phase), and ḋ is the speed of the deformation. The external force is derived from the tagged MRI image in the form of a gradient flow:

f_ext = −∇(g ∗ I)  (7)
where g is the Gaussian operator and I is the original image. The external force alone cannot guarantee an accurate convergence of the deformable contour to the corresponding tag, because of local noise, imaging artifacts, and the small tag spacing. Therefore we design a supplementary phase force field, also derived from the output of the Gabor filter bank, to constrain the movement of the deformable splines. Given the wrapped phase output R of the Gabor filter bank, the phase constraint at pixel x ∈ Ii+1 has the form:

f_phase(x) = −|U(Ri(x) − Ri+1(x))|⁻¹ · ∇(π − |R(x)|)  (8)
where U(·) is an unwrapping operator. The magnitude of the phase constraint is inversely proportional to the local phase change, so that the phase influence decreases when the local deformation increases, and vice versa.
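A minimal sketch of the resulting contour update, assuming an explicit Euler discretization of Eq. (6), a discrete second-difference stand-in for the stiffness term, and force fields precomputed on the image grid (all names and constants are our own):

```python
import numpy as np

def evolve_spline(pts, f_ext, f_phase, stiffness=0.1, dt=0.1, iters=100):
    """Explicit Euler steps of Eq. (6): d_dot = f_ext + f_phase - K d.
    pts: (n, 2) spline control points of one tag segment (closed indexing
    used for brevity); f_ext, f_phase: (H, W, 2) force fields, sampled by
    nearest-neighbour lookup."""
    for _ in range(iters):
        # Second differences of the contour act as a simple stand-in
        # for the smoothing action of -K d.
        smooth = np.roll(pts, 1, axis=0) - 2 * pts + np.roll(pts, -1, axis=0)
        ix = np.clip(np.round(pts[:, 0]).astype(int), 0, f_ext.shape[1] - 1)
        iy = np.clip(np.round(pts[:, 1]).astype(int), 0, f_ext.shape[0] - 1)
        force = f_ext[iy, ix] + f_phase[iy, ix]
        pts = pts + dt * (stiffness * smooth + force)
    return pts
```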
2.3 Displacement and Strain Reconstruction
The shape and motion pattern of the myocardium in LA cardiac images differ from those in SA images. In SA images, the myocardium around the left ventricle (LV) has an annular shape, and its motion can be described as a combination of radial stretching (pointing towards the centroid of the LV) and circumferential shortening. In LA images, the myocardium around the LV has the shape of a horseshoe, and the motion cannot readily be divided into radial and circumferential components. We therefore calculate the principal strains P1 and P2, which usually coincide with the normal and tangential components of the local heart wall orientation, respectively. After the spline models converge to the tags, we use the virtual tag method [13] to generate a 2D displacement map. A smoothness constraint is used where there are not enough tags to interpolate reliably. The strain is then calculated using the algorithm introduced in [5].
Fig. 1. The TPS-driven non-rigid point matching in the numerical phantom, from phase 1 to phase 8. Yellow circles are tag intersections to be matched; the red arrows show the direction of the transformation function. At phase 4 there is an erroneous intersection, which is discarded in the following phase since it has no correspondence.
3 Experimental Results
We first tracked the motion in a numerical phantom using our tag tracking method. The phantom deforms under a given displacement field. The LV in LA images is modeled as two confocal ellipsoids; the phantom is thinner at the apex, to resemble the anatomic structure of the myocardium. The motion in the myocardium is originally defined in directions tangential and normal to the local orientation of the phantom, and then transformed into x- and y-displacements in Cartesian coordinates. At the starting cardiac phase, tags are initialized as straight dark lines in the phantom, using a sine function for the tag intensity profile. The tags fade during the simulated cardiac cycle. Two Gabor filter banks are used to generate tag phase maps in the x and y directions. The intersections of tags are defined as local minima in both phase maps. The nonrigid registration tool is used to find the correspondence between intersections in two images and to generate a transformation function. Fig. 1 shows the motion tracking process. The final displacement map is calculated using the deformable model. In Fig. 2 we display the x-displacement map at the simulated cardiac phases 2, 3, and 4; the displacement has been well captured qualitatively. To quantitatively validate the tracking performance of our method, we also calculated the root mean square (RMS) error of the computed displacement map during the simulated cardiac cycle. The result is shown in Table 1. The mean magnitude of the "ground truth" x-displacement map is compared with the RMS error,
Fig. 2. The x-displacement map of the phantom at cardiac phases 2, 3, and 4

Table 1. Comparison between the RMS error and the mean magnitude of the x-displacement (both in pixels)

Phase:       1      2      3      4      5      6      7      8      9      10
Mean Disp.:  0.3    0.9    1.5    1.8    1.2    0.6    0.36   0.24   0.12   0.06
RMS error:   0.0401 0.1013 0.1511 0.1768 0.1411 0.1230 0.0615 0.0303 0.0297 0.0156
Ratio (%):   13.37  11.26  10.07  9.82   11.76  20.50  17.08  12.63  24.75  26.00
both in pixels. Notice that in the first 5 phases the RMS errors are below 15% of the corresponding mean displacement. In the last 5 phases, however, the error ratios increase; this can be explained by the accumulated error in the deformable model fitting and the decreasing contrast between tagged and untagged regions in the phantom (which simulates the decay of tags). We tested our method on six in vivo heart data sets. In Fig. 3, we show the result of tracking the myocardial motion, and the corresponding strain distribution, for a patient with hypertrophic cardiomyopathy. The total processing time for one image sequence (usually 10 frames of 100 × 100 pixels) on a PC with a 2 GHz CPU is under one minute.
Fig. 3. From left to right: the deformed tagged MR image; the x-displacement (maximum 6.113 pixels, minimum -7.2496 pixels); the y-displacement (maximum 7.1756 pixels, minimum -1.6411 pixels); and the P1 and P2 strain map projected onto the undeformed image, in which red lines show the magnitude and orientation of the P1 strain and blue lines those of the P2 strain. It is clear that the free wall and the septum have different strain distributions, which may be of clinical importance.
4 Conclusions
We propose in this paper a new motion tracking method for the analysis of myocardial motion in LA images. Experimental motion tracking results on both the phantom and the in vivo heart data demonstrate the method's effectiveness. More experiments are necessary to find its optimal operating point. The motion tracking method can also be used in the analysis of SA images. In the future, we will extend the method into a true 3D motion analysis tool for use in clinical studies. In particular, we hope to collect enough in vivo data with large LV deformation, where 3D motion may be critical for pathological analysis. Some preliminary work along these lines, including the development of a TPS-based 3D tracking module, has been started.
References
1. Gabor, D.: Theory of communication. J. IEE 93(3), 429–457 (1946)
2. Axel, L., Dougherty, L.: MR imaging of motion with spatial modulation of magnetization. Radiology 171, 841–845 (1989)
3. Axel, L., Dougherty, L.: Improved method of spatial modulation of magnetization (SPAMM) for MRI of heart wall motion. Radiology 172, 349–350 (1989)
4. Wahba, G.: Spline Models for Observational Data. SIAM, Philadelphia, PA (1990)
5. Fung, Y.C.: Biomechanics: Mechanical Properties of Living Tissues, 2nd edn. Springer Science, New York (1993)
6. Gold, S., Rangarajan, A.: A graduated assignment algorithm for graph matching. IEEE Trans. Pattern Analysis and Machine Intelligence 18(4), 377–388 (1996)
7. Metaxas, D.: Physics-based Deformable Models: Application to Computer Vision, Graphics and Medical Imaging. Springer, Heidelberg (1996)
8. Osman, N.F., McVeigh, E.R., Prince, J.L.: Imaging heart motion using Harmonic Phase MRI. IEEE Trans. on Medical Imaging 19(3), 186–202 (2000)
9. Amini, A.A., Chen, Y., Elayyadi, M., Radeva, P.: Tag surface reconstruction and tracking of myocardial beads from SPAMM-MRI with parametric B-spline surfaces. IEEE Trans. on Medical Imaging 20(2), 94–103 (2001)
10. American Heart Association: 2002 Annual Report (2002)
11. Chui, H., Rangarajan, A.: A new point matching algorithm for non-rigid registration. Computer Vision and Image Understanding 89(2-3), 114–141 (2003)
12. Chang, H., Moura, J.M.F., Wu, Y., Sato, K., Ho, C.: Reconstruction of 3D dense cardiac motion from tagged MR sequences. In: Proceedings of ISBI, pp. 880–883 (2004)
13. Axel, L., Chen, T., Manglik, T.: Dense myocardium deformation estimation for 2D tagged MRI. In: Frangi, A.F., Radeva, P.I., Santos, A., Hernandez, M. (eds.) FIMH 2005. LNCS, vol. 3504, pp. 446–456. Springer, Heidelberg (2005)
14. Abd-Elmoniem, K.Z., Stuber, M., Osman, N.F., Prince, J.L.: ZHARP: Three-dimensional motion tracking from a single image plane. In: Christensen, G.E., Sonka, M. (eds.) IPMI 2005. LNCS, vol. 3565, pp. 639–651. Springer, Heidelberg (2005)
15. Chen, T., Axel, L.: Using Gabor filter banks and temporal-spatial constraints to compute 3D myocardium strain. In: Proceedings of EMBC (2006)
MCMC Curve Sampling for Image Segmentation

Ayres C. Fan1, John W. Fisher III1,2, William M. Wells III2,3, James J. Levitt3,4, and Alan S. Willsky1

1 Laboratory for Information and Decision Systems, MIT, Cambridge, MA
2 Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA
3 Brigham and Women's Hospital, Harvard Medical School, Boston, MA
4 Dept. of Psychiatry, VA Boston HCS, Harvard Medical School, Brockton, MA
[email protected]
Abstract. We present an algorithm to generate samples from probability distributions on the space of curves. We view a traditional curve evolution energy functional as a negative log probability distribution and sample from it using a Markov chain Monte Carlo (MCMC) algorithm. We define a proposal distribution by generating smooth perturbations to the normal of the curve and show how to compute the transition probabilities to ensure that the samples come from the posterior distribution. We demonstrate some advantages of sampling methods such as robustness to local minima, better characterization of multi-modal distributions, access to some measures of estimation error, and ability to easily incorporate constraints on the curve.
1 Introduction
Curve evolution methods are a class of algorithms which seek to segment an image I with a curve C by finding a local optimum of a given energy functional E(C; I). In general, having a single local optimum provides little insight as to how close the result is to the global optimum or how confident one should be in the answer. For low signal-to-noise ratio (SNR) or ill-posed problems, there are many local optima, and there can be multiple answers that plausibly explain the data. A common alternative is to view the problem as one of probabilistic inference by viewing E(C; I) as the negative log of a probability density:

p(C | I) ∝ exp(−E(C; I)).  (1)
Having a probabilistic interpretation allows the use of many standard inference algorithms, such as stochastic optimization [1], particle filtering [2], or Markov chain Monte Carlo (MCMC) methods [3], to avoid local minima. We propose an algorithm to draw samples from p(C | I), which is, in general, a complex distribution that is non-trivial to sample from. Samples are useful because they not only help avoid local minima, but can also be used to characterize multi-modal distributions and estimation uncertainty by more fully exploring the configuration space. We will show examples of noisy images where the global maximum a posteriori estimate does not provide a satisfactory segmentation due
to the large amount of noise, whereas a constellation of samples can provide greater information as to likely locations of the true segmentation. MCMC methods [4,5] were developed for situations where one wishes to draw samples from a distribution but cannot do so directly. Instead, a proposal distribution q is defined, and samples from q are accepted in such a way as to guarantee that samples from p are generated asymptotically. MCMC methods have been widely used for image segmentation since Geman and Geman [6] used an MCMC approach to segment images with a Markov random field (MRF) model. The advantage of sampling curves instead of MRFs is that curve sampling is an inherently geometric process that enables one to work explicitly in the space of shapes and to encode statistical properties of shape, such as global object characteristics, directly into the model. Tu and Zhu [3] also propose an approach using MCMC, but their primary focus is on finding global optima using simulated annealing, and they do not generate large numbers of samples from the posterior distribution. One of our key results is to show how to ensure that detailed balance holds when sampling from the space of closed curves (a necessity to ensure that we asymptotically generate true samples from the posterior), and how to adapt these sampling methods to use user input to perform conditional simulation in order to reduce the estimation variance.
2 Curve Evolution Methods
Given an image domain Ω ⊂ R², a scalar-valued image I : Ω → R, and a closed curve C : [0, 1] → Ω, active contour methods are formulated by specifying an energy functional E(C | I) and evolving the curve according to the gradient descent of that functional. Introducing an artificial time variable t, this results in a geometric PDE of the form ∂C/∂t (s) = f(s) N_C(s), where f(s) is a force function and N_C(s) is the outward normal to the curve. A classical energy functional is Euclidean curve length, E(C | I) = ∮_C ds, with ds the differential arc length along the curve. The resulting force function is f(s) = −κ_C(s), where κ_C is curvature. This flow has a smoothing effect on the curve [7] and is typically used as a regularization term. Region-based energy functionals (e.g., Chan-Vese [8] and Mumford-Shah [9]) separate regions using the image statistics and are now widely used for image segmentation due to their robustness to noise. The earliest curve evolution methods by Kass et al. [10] tracked discrete marker points on the contour. Level set methods [7] were later introduced to more naturally handle topological changes and reduce reinitialization problems. With level sets, a surface Ψ(x) is created whose zeroth level set is the curve: Ψ(C(s)) = 0 for all s ∈ [0, 1]. By convention, Ψ is negative inside the curve and positive outside. To ensure that the zeroth level set of Ψ tracks C, we need ∂Ψ/∂t = −(∂C/∂t) · ∇Ψ. As ∂C/∂t is only defined on the zeroth level set of Ψ, the PDE is extended to the rest of Ω by a technique known as velocity extension [7].
3 Formulation
For MCMC methods, an ergodic Markov chain with p(C | I) as its stationary distribution is constructed [5]; simulating the chain then makes the probability distribution of the state asymptotically approach p(C | I) for any initial state C(0). The chain's transition probability T(C(t) → C(t+1)) is the product of a proposal distribution q(Γ(t+1) | C(t)) and an acceptance probability function a(Γ(t+1) | C(t)). A sample from T(C(t) → C(t+1)) is drawn by accepting a sample Γ(t+1) from q(Γ | C(t)) with probability a(Γ(t+1) | C(t)); otherwise C(t+1) = C(t). The proposal distribution is chosen so as to be easy to sample from, as MCMC methods change the problem of sampling from p into one of drawing many samples from q. For discrete state spaces, a sufficient condition for p(C | I) to be the stationary distribution of the chain is detailed balance:
(2)
For continuous state spaces, a similar statement can generally be made [5]. A common acceptance rule is Metropolis-Hastings [11]. For an iterate C(t) and a candidate sample Γ(t+1), the Metropolis-Hastings acceptance probability is defined as a(Γ(t+1) | C(t)) = min(1, η(Γ(t+1) | C(t))), where the Hastings ratio η is η(Γ(t+1) | C(t)) = p(Γ(t+1)) q(C(t) | Γ(t+1)) / (p(C(t)) q(Γ(t+1) | C(t))). The algorithmic steps for a Metropolis-Hastings sampler are: 1) sample from q(Γ(t+1) | C(t)); 2) evaluate a(Γ(t+1) | C(t)); 3) accept or reject Γ(t+1).

3.1 Proposal Distribution
We implicitly define q by explicitly defining how to sample from it. To generate a candidate sample Γ(t+1), we randomly perturb the previous iterate C(t):

Γ(t+1)(s) = C(t)(s) + f(t+1)(s) N_C(t)(s) δt  (3)

where f(t+1)(s) is a random field. The problem of generating Γ(t+1) is thus the problem of generating f(t+1)(s). In this work, we focus on generating f(t)(s) composed of a correlated zero-mean Gaussian random process r(t)(s) and a mean function μ(t)(s): f(t)(s) = μ(t)(s) + r(t)(s). We construct r(t)(s) by circularly convolving white Gaussian noise n(t)(s) with a smoothing kernel h(s) (e.g., a Gaussian kernel). Many other choices for generating r(s) are possible, such as using Fourier or wavelet bases. The mean process is chosen to increase the convergence rate of the sampling algorithm. Here, we define it as μ(t)(s) = −κ_C(t)(s) + γ(t), where γ(t) is a positive inflation term counteracting the curve-shortening term κ_C. As discussed earlier, f(s) = −κ_C(s) is a regularizing flow which creates smooth curves, so this biases our proposal distribution towards smooth curves.
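A sketch of this perturbation sampler, assuming the curve is discretized at n points and the circular convolution is carried out with FFTs (function names and constants are illustrative):

```python
import numpy as np

def sample_perturbation(kappa, kernel, gamma=0.05, dt=0.1):
    """Draw f(s) = mu(s) + r(s) at n contour points: r is white Gaussian
    noise circularly convolved with a smoothing kernel h, and
    mu(s) = -kappa(s) + gamma is the curvature-based mean of Sec. 3.1."""
    n = len(kappa)
    noise = np.random.standard_normal(n)
    r = np.real(np.fft.ifft(np.fft.fft(noise) * np.fft.fft(kernel, n)))
    mu = -np.asarray(kappa) + gamma
    return (mu + r) * dt, noise   # noise is kept for the Sec. 3.2 computation
```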
3.2 Detailed Balance
Metropolis-Hastings sampling requires that we be able to evaluate both the forward and reverse proposal distributions q(Γ(t+1) | C(t)) and q(C(t) | Γ(t+1)).
This computation needs to be understood in order to ensure detailed balance and guarantee that our samples come from the posterior. For our curve perturbations, this is non-trivial because q is asymmetric due to the mean component. The perturbation defined in equation (3) is a differential in the direction of the normal, so each random perturbation maps one curve uniquely to another; this remains approximately true for small finite δt. Thus evaluating q(Γ | C) is equivalent to evaluating the probability of generating f(s). To implement (3), we generate a noise vector n of fixed length, multiply it by a circulant matrix H (which implements the circular convolution), and add a mean vector μ. This results in a Gaussian random vector f ∼ N(μ, HHᵀ). Note that f is deterministically generated from n, so p_f(Hn + μ) = p_n(n) ∝ exp(−½ nᵀn). To compute the probability of the reverse perturbation, we construct the analog to equation (3):

C(t)(s) = Γ(t+1)(s) + g(t+1)(s) N_Γ(t+1)(s) δt,  (4)
and the reverse perturbation probability is the probability of generating g(t+1). For small δt, a reasonable estimate is g(t+1)(s) ≈ −f(t+1)(s) / (N_C(t)(s) · N_Γ(t+1)(s)), which is obtained using locally-linear approximations to Γ(t+1) and C(t). Note that the explicit correspondence we construct here means that, even though we implement the perturbations using level sets, topological change is not valid for our chain. To allow splitting or merging of regions, a jump-diffusion process must be used [3]. Once we have computed g(s), we can write it as g = Hn′ + μ′ (obtaining μ′ from Γ(t+1) exactly as we obtain μ from C(t)) and compute its probability as p_g(Hn′ + μ′) = p_n(n′) ∝ exp(−½ n′ᵀn′). We can then use the analysis we just performed (a detail that most implementations ignore) to ensure detailed balance by computing the ratio of the forward and reverse proposal distributions (for use in the acceptance rule) as

q(C(t) | Γ(t+1)) / q(Γ(t+1) | C(t)) = exp(−½ (n′ᵀn′ − nᵀn))  (5)

with n = H⁻¹(f − μ) and n′ = H⁻¹(g − μ′).
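Combining Eq. (5) with the Metropolis-Hastings rule of Sec. 3 gives a compact acceptance test. The sketch below works in the log domain for numerical stability and assumes f, g, μ, μ′ and H⁻¹ have already been computed as described above; the function names are our own:

```python
import numpy as np

def log_q_ratio(f, g, mu, mu_p, H_inv):
    """log q(C|Gamma) - log q(Gamma|C) from Eq. (5), with
    n = H^{-1}(f - mu) and n' = H^{-1}(g - mu')."""
    n = H_inv @ (f - mu)
    n_p = H_inv @ (g - mu_p)
    return -0.5 * (n_p @ n_p - n @ n)

def mh_accept(log_p_new, log_p_old, log_ratio):
    """Metropolis-Hastings test: accept with probability min(1, eta)."""
    log_eta = log_p_new - log_p_old + log_ratio
    return np.log(np.random.rand()) < min(0.0, log_eta)
```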
3.3 Conditional Simulation
In many application domains of interest, segmentations are currently performed by an expert. Rather than trying to remove the expert completely from the loop, we can create a feedback system that allows the expert to focus their knowledge on the most difficult portions of the problem. With an optimization-based approach, this would require constrained optimization, which is hard for high-dimensional problems. With a sampling approach, we can instead use a technique known as conditional simulation, where we fix part of the state space and sample from the distribution conditioned on the known part.
Let C_k : [0, β] → Ω be the known part of the curve (β ∈ [0, 1]) and C_u : [β, 1] → Ω be the unknown part, so that C(s) = C_k(s) on [0, β] and C(s) = C_u(s) on [β, 1]. It is straightforward to generalize this approach to multiple fixed intervals. We wish to sample from p(C_u | I, C_k) = p(C_u, C_k | I)/p(C_k) ∝ p(C | I). Thus, computing the conditional target distribution is unchanged (except that part of C no longer changes). To ensure that samples from our proposal distribution stay on the manifold of curves that contain C_k, we modify the proposal distribution to impose zero variance on C_k. A simple way to implement this is to multiply the random perturbations defined in Sec. 3.1 by a scalar field: r̃(s) = d(s) r(s), with d(s) = 0 for s ∈ [0, β], d(s) = 1 for s ∈ [β + ε, 1 − ε], and ε > 0. On (β, β + ε] and [1 − ε, 1), d(s) smoothly transitions from 0 to 1, so that there is not a strong variance mismatch at the end points of C_k. Computationally, this is equivalent to multiplying our random vector r by a diagonal matrix D, resulting in r̃ ∼ N(Dμ, DHHᵀD). This is a degenerate probability distribution, as some entries of r̃ have zero variance, so we should only evaluate q(· | ·) using the perturbation r_u on the unknown part of the curve. Otherwise the computation is identical to that described in Sec. 3.2.
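A sketch of the scalar field d(s) on a discretized contour; the cosine ramp is one possible choice of smooth transition and is our own assumption:

```python
import numpy as np

def conditioning_field(n, fixed, eps):
    """Scalar field d(s) on n contour samples: 0 on the fixed arc C_k,
    1 well inside the free arc, with smooth ramps of eps samples between.
    fixed = (start, stop) index range of the known segment."""
    d = np.ones(n)
    s0, s1 = fixed
    d[s0:s1] = 0.0
    for k in range(eps):
        w = 0.5 * (1.0 - np.cos(np.pi * (k + 1) / (eps + 1)))
        d[(s1 + k) % n] = w            # ramp up after the fixed arc
        d[(s0 - 1 - k) % n] = w        # ramp up before the fixed arc
    return d   # multiply the sampled perturbation r by d element-wise
```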
4 Results
In this section we present results on a prostate magnetic resonance (MR) example and a thalamus MR segmentation problem. For each application, we generated 1000 samples from p(C | I). Computation time per sample ranged from 10-30 seconds for 256 × 256 images on a 2 GHz Opteron workstation. Each sample is generated independently of the others, so sample throughput can easily be increased using parallel computers. For both examples, we assume that pixels are independent and identically distributed (iid) given the curve, and we learn (from segmented training data) nonparametric histogram distributions p(I(x) | 0) and p(I(x) | 1) for the intensity distributions outside and inside the curve, respectively (shown in Figs. 1(a) and 2(a)). Using the Heaviside (indicator) function H and a curve-length prior, this results in the overall posterior probability:

p(C | I) = exp(−α ∮_C ds) ∏_x p(I(x) | H(−Ψ_C(x))).  (6)
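Eq. (6) is straightforward to evaluate in the log domain for a sampled curve represented by its level set; a sketch under the assumptions that intensities index integer histogram bins and that curve length is approximated by the total variation of the interior indicator:

```python
import numpy as np

def log_posterior(psi, img, log_hist_in, log_hist_out, alpha):
    """log p(C|I) of Eq. (6) for a curve given by its level set psi
    (negative inside). img holds integer histogram-bin indices."""
    inside = psi < 0
    loglik = log_hist_in[img[inside]].sum() + log_hist_out[img[~inside]].sum()
    # Curve length, approximated by the total variation of the indicator.
    gy, gx = np.gradient(inside.astype(float))
    length = np.hypot(gx, gy).sum()
    return loglik - alpha * length
```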
To display the samples, we use three main visualization approaches, which are only possible because we are able to draw a large number of statistically-correct samples from the posterior distribution:

1. Displaying the highest-probability samples (e.g., Fig. 1(b)). The most likely samples can be viewed as proxies for what a global optimizer would find.

2. Histogram images (e.g., Fig. 1(c)). For each x we count the number of Ci for which x is inside the curve (i.e., Ψ_Ci(x) < 0). This is thus the marginal distribution over segmentation labels at each x.
Fig. 1. Prostate segmentation using non-parametric intensity distributions. (a) Pixel intensities for each class. (d) Initial curve. (b) Two most likely samples (very different from the correct curve). (c) Marginal confidence bounds and histogram image. (e)-(f) Most likely samples and marginal bounds for prostate-only cluster.
3. Marginal confidence bounds (e.g., Fig. 1(c)). Given a histogram image, we plot the level contours. These can be viewed as confidence bounds (e.g., the 10% confidence bound is the contour outside of which all pixels were inside fewer than 10% of the samples). The 50% confidence bound can be viewed as being analogous to a median contour. A sketch of the histogram-image and confidence-bound computation follows this list.
Confidence bounds have been used in some previous image segmentation or reconstruction approaches [12], but those dealt with parametric shape representations (so the uncertainty was over a finite set of parameters). It is important to note that our confidence representations are marginal statistics from infinite-dimensional non-parametric shape distributions.
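A small sketch of how the histogram image and confidence-bound regions could be formed from the sample masks (names and conventions are ours):

```python
import numpy as np

def histogram_image(masks):
    """Per-pixel fraction of samples C_i for which the pixel is inside."""
    return np.mean(np.stack(masks, axis=0), axis=0)

def confidence_region(hist_img, level):
    """Region whose boundary is the `level` confidence contour; level=0.5
    gives the median contour, 0.1 and 0.9 the outer and inner bounds."""
    return hist_img >= level

# contours can then be drawn at the region boundaries, e.g. with
# matplotlib: plt.contour(histogram_image(masks), levels=[0.1, 0.5, 0.9])
```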
4.1 Prostate Segmentation
In Fig. 1, we show results from a noisy T1-weighted prostate MR image. The histogram image and the marginal confidence bounds in Fig. 1(c) show this distribution has three primary modes: one around the correct prostate segmentation (the red contour); one containing only the rectum (the dark region beneath the prostate); and one encompassing both the prostate and the rectum. As can be seen in Fig. 1(b), the most likely mode contains the curves that segment the two regions together, and this is what a gradient-based curve evolution implementation of (6) also finds. The reason for this can be seen in the image intensity likelihoods in Fig. 1(a). Due to the noise and our simple iid model, the model prefers all pixels with intensity below some threshold (including the rectum) to
Fig. 2. Conditionally-simulated thalamus segmentation using non-parametric intensity distributions. (a) Pixel intensities for each class. (b)-(c) Observed image (original and zoomed). (d) Expert and initial curves. Two most likely samples ((e) and (g)) and marginal confidence bounds and histogram image ((f) and (h)) with a point on the top fixed and points on the top and the bottom fixed respectively.
be inside the curve. The sampling process enables us to see the multiple possible solutions. Without any additional a priori information, it would be difficult to say which of these three scenarios is the correct one. In fact, in some applications multiple modes may all provide reasonable explanations of the data. One approach we can take here is to utilize the information our sampling procedure provides. While the aggregate marginal statistics do not appear to provide very useful information (though the 90% confidence boundary is located within the true prostate boundary), it is easy to create three clusters of samples. An expert user or a shape-driven classifier could then pick the correct cluster. We show the most likely samples and the marginal confidence boundaries for the prostate-only cluster in Fig. 1(e) and (f).
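One simple way to form such clusters, assuming the samples are stored as binary inside-masks, is k-means on the flattened masks; the sketch below uses scikit-learn's KMeans for illustration, with three clusters matching the three observed modes.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_samples(masks, n_clusters=3, seed=0):
    """Group curve samples (0/1 inside-masks) into modes by k-means."""
    X = np.stack([m.ravel().astype(float) for m in masks])
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X)
    return [np.flatnonzero(labels == k) for k in range(n_clusters)]

# clusters = cluster_samples(sample_masks)
# prostate_only = [sample_masks[i] for i in clusters[0]]  # cluster picked by an expert
```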
4.2 Thalamus Segmentation
Segmenting sub-cortical structures is a challenging task due to the low amount of contrast between tissue types. One approach to reduce the ill-posedness involves using strong prior shape models [13]. As there is too little contrast for an unconstrained approach to succeed here, we apply the conditional simulation version of our approach and specify small portions of the curve a priori (indicated with the magenta line segments in Fig. 2).
484
A.C. Fan et al.
We begin by fixing a small portion of the top of each half of the thalamus and generating samples conditioned on that information. Two separate level sets are evolved for each half of the thalamus. The two most likely samples in Fig. 2(e) correctly segment most of the thalamus except the bottom which is least constrained by the fixed portion at the top. Note, though, that the marginal confidence bounds in Fig. 2(f) show that the expert contour location is mostly bracketed between the 90% and 10% confidence contours, and the median contour is quite close to the expert-segmented boundary location. Note that the sampling method actually provides information about where the greatest uncertainty is and, thus, where expert assistance is most needed. We can see in Fig. 2(f) that there is a more diffuse histogram image (and a larger gap between the confidence bounds) at the bottom of the thalamus indicating a greater amount of sample variability. In Fig. 2(g)-(h), we take the knowledge gained from the first experiment and interactively revise the information provided to the sampler by specifying a location on the bottom of the thalamus as well. With this additional information, the most likely samples are now both reasonable, and the estimation variance is greatly reduced.
5 Conclusion
In this paper, we presented an approach to generate samples from probability distributions defined on spaces of curves by constructing an MCMC algorithm and showing how to properly compute the proposal distribution probabilities to ensure detailed balance and asymptotic convergence to a desired posterior distribution. The sampling approach provided robustness to local minima in low-SNR and ill-posed problems, and we showed how a large number of curve samples can be used to provide useful aggregate statistics (such as non-parametric marginal confidence bounds) about the likely location of the true curve. We demonstrated the usefulness of this aggregate information even when the most likely curves were not providing satisfactory segmentations, and we showed how constraints can be easily imposed on the samples (unlike in a gradient-based optimization method) to provide a semi-automatic segmentation approach. Future work in this space involves developing faster sampling algorithms by utilizing better proposal distributions or multiresolution methods; extending the framework to non-closed curves, unknown topology, and volumetric segmentation; and creating uncertainty measures that provide information about the local characteristics of the shape manifold.

Acknowledgments. We wish to thank Clare Tempany and Martha Shenton for their assistance in acquiring image data. Our research was primarily supported by a grant from Shell International Exploration and Production, Inc. with additional support from NSF grant EEC-9731748 and NIH grants U41-RR019703, P41-RR13218, R01-CA109246, R01-CA111288, K02-MH01110, R01-MH50747, and U54-EB005149.
References
1. Juan, O., Keriven, R., Postelnicu, G.: Stochastic motion and the level set method in computer vision: Stochastic active contours. Intl. J. Comp. Vis. 69(1) (2006)
2. de Bruijne, M., Nielsen, M.: Shape particle filtering for image segmentation. In: Barillot, C., Haynor, D.R., Hellier, P. (eds.) MICCAI 2004. LNCS, vol. 3216, pp. 168–175. Springer, Heidelberg (2004)
3. Tu, Z., Zhu, S.C.: Image segmentation by data-driven Markov chain Monte Carlo. IEEE Trans. Patt. Anal. Mach. Intell. 24(5), 657–673 (2002)
4. Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21(6) (1953)
5. Neal, R.M.: Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, Univ. of Toronto (1993)
6. Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE PAMI 6, 721–741 (1984)
7. Sethian, J.: Level Set Methods and Fast Marching Methods. Cambridge University Press, Cambridge (1999)
8. Chan, T.F., Vese, L.A.: Active contours without edges. IEEE Trans. Imag. Proc. 10, 266–277 (2001)
9. Tsai, A., Yezzi, A., Willsky, A.: Curve evolution implementation of the Mumford-Shah functional. IEEE Trans. Imag. Proc. 10(8), 1169–1186 (2001)
10. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active contour models. Intl. J. Comp. Vis. 1(4), 321–331 (1988)
11. Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97–109 (1970)
12. Ye, J.C., Bresler, Y., Moulin, P.: Asymptotic global confidence regions in parametric shape estimation problems. IEEE Trans. Inf. Theory 46(5), 1881–1895 (2000)
13. Pohl, K.M., Fisher, J., Grimson, W.E.L., Kikinis, R., Wells, W.: A Bayesian model for joint segmentation and registration. Neuroimage 31, 228–239 (2006)
Automatic Centerline Extraction of Irregular Tubular Structures Using Probability Volumes from Multiphoton Imaging
A. Santamaría-Pang1, C.M. Colbert2, P. Saggau3, and I.A. Kakadiaris1
1 Computational Biomedicine Lab, Dept. of CS, Univ. of Houston, Houston, TX
2 Dept. of Biology and Biochemistry, Univ. of Houston, Houston, TX
3 Dept. of Neuroscience, Baylor College of Medicine, Houston, TX
Abstract. In this paper, we present a general framework for extracting 3D centerlines from volumetric datasets. Unlike the majority of previous approaches, we do not require a prior segmentation of the volume, nor do we assume any particular tubular shape. Centerline extraction is performed using a morphology-guided level set model. Our approach consists of: i) learning the structural patterns of a tubular-like object, and ii) estimating the centerline of a tubular object as the path with minimal cost with respect to outward flux in gray level images. This shortest path is found by solving the Eikonal equation. We compare the performance of our method with existing approaches on synthetic, CT, and multiphoton 3D images, obtaining substantial improvements, especially in the case of irregular tubular objects.
1 Introduction
A central goal of modern neuroscience is to elucidate the computational principles and cellular mechanisms that underlie brain function in both normal and diseased states. For over a century it has been appreciated that neuronal morphologies are highly variable (presenting tubular-like shapes), and today it is clear that morphology contributes significantly to the unique computations performed by different classes of neurons. To achieve realistic modeling of neuronal morphology, one must use centerline extraction from imaging data of dendrites, which are highly irregular tubular structures (Fig. 1). The challenges in centerline extraction of dendritic structures include: i) a poor signal-to-noise ratio, ii) objects of interest at the limit of optical imaging (resolution is typically on the order of 0.2 μm), iii) a non-homogeneous distribution of optical intensity throughout the cell, and, most importantly, iv) extreme variation in shape among dendrites. Different methods have been proposed to extract the medial axis of tubular structures by using the distance transform to identify the centerline. In Deschamps et al. [1], centerline extraction of segmented tubular objects is accomplished by evolving monotonic fronts, where the cost function is a distance function from the edges of the binary object of interest. Similarly, Hassouna et al.
Fig. 1. Neuron morphology
[2] proposed centerline extraction from monotonic front evolution, where the centerline follows the maximal distance from the boundary of the binary object. In recent work, Bouix et al. [3] used the average outward flux through a Jordan curve. In this case, the gradient vector field of the Euclidean distance to the boundary was exploited to approximate the centerlines. However, these methods require binary images, making the result dependent on the quality of the segmentation. In work related to vessel segmentation, Vasilevskiy and Siddiqi [4], and Nain et al. [5], developed geometric flow frameworks to perform 2D and 3D vessel segmentation using prior knowledge of the vessel shape. Kimmel [6] offers a review of different geometric measures in variational methods. Complementarily, elegant statistical methods for vessel segmentation have been proposed by Florin et al. [7], where shape variations of the vessel are learned on-line during the segmentation. Similarly, in [8] we presented a method for detection of tubular structures by learning different tubular shapes. Such methods allow segmentation or detection of the overall tubular model, as opposed to enhancing the centerline of the tubular object. In this paper, we propose a robust centerline extraction method for tubular-like structures with considerable cross-sectional and radius variation. Our main contributions are: i) a novel approach for learning and predicting generalized 3D centerline models, ii) a centerline extraction method using a shortest path formulation in terms of regions of maximal outward flux, and iii) centerline extraction accomplished without prior segmentation.
2 Materials and Methods
Experimental Data: Multiphoton data correspond to cells of interest, CA1 pyramidal neurons (Fig. 4(k)), from rat hippocampi. Images were acquired with a customized multiphoton Galvo microscope after loading the cells with Alexa Fluor 594 dye. We have collected twelve image datasets, each consisting of seven or more partially overlapping stacks of approximate size 640 × 480 × 150 voxels, with a voxel size of 0.3 μm in the x-y plane and 1.0 μm along the z axis. The excitation wavelength was set to 810 nm, and both the lens and the index of refraction correspond to water.
Fig. 2. (a) Visualization of a three-dimensional tubular model, (b) distribution of the normalized eigenvalues λ1 ,λ2 (notice the overlap region between the classes), and (c) estimated sigmoid function for different values of σ
Algorithm Overview: Dendrite centerline extraction is accomplished by the following steps: 1) frames-based denoising, 2) learning generalized tubular models, 3) probabilistic wave propagation, and 4) centerline extraction.

Step 1 - Frames-based Denoising: We constructed a non-separable 3D Parseval frame to remove noise that follows a Poisson distribution from optical images. Our method uses 3D multidirectional filters for both analysis and synthesis. Noise is suppressed by shrinking frame coefficients to zero according to an ensemble operator [9].

Step 2 - Learning Generalized Tubular Models: Dendritic neurons do not follow complete cylindrical or elliptical shape patterns as vessels do; instead they present highly irregular tubular-like patterns in addition to adjoining structures (Fig. 1). Different types of tubular measures have been proposed in the literature (Frangi et al. [10], Sato et al. [11]) by assuming ideal cylindrical or elliptical geometrical shapes. These measures discriminate among different structural features such as plates, lines, and blob-like structures. However, they cannot be applied to irregular tubular structures, since the structural information such structures contain does not fulfill the hypothesis of the assumed model. We hypothesize that regular and irregular cylindrical shape models lead to different probability density functions that can encode tubular shape descriptors. We therefore reformulate the problem of detecting tubular structures as learning a structural tubular model from the object of interest itself (as opposed to defining an ideal tubular measure). Let us consider the volumetric representation of a 3D tubular structure (Fig. 2(a)) and use the centerline to select the eigenvalues which correspond to the centerline itself. The selected eigenvalues capture shape information along the tubular model, which includes branching points and branches with different diameters at a given scale σ [10]. We then want to define a function which takes high values at the centerline of the object and low values at the boundary. Figure 2(b) depicts the class distribution of the normalized eigenvalues. Note that there is high overlap between the two classes. Thus, in order to estimate the class
distributions, an algorithm with flexible decision limits that is still able to generalize well (due to the sparseness of the data) is required. Support Vector Machines (SVMs) estimate a decision function f(x):

f(x) = Σ_{i=1}^l y_i α_i K(x_i, x) + b,   (1)

where x ∈ R^n; x_i ∈ R^n, i = 1, ..., l, are the support vectors; K is a kernel function; y_i ∈ {−1, 1}, with α_i > 0 such that Σ_{i=1}^l y_i α_i = 1; and b is a learned constant. For classification problems, class prediction is performed by finding a threshold value for the function f and by assigning a penalty value. Instead of using SVMs for classification, we use them to robustly estimate a posterior probability density function using the approach proposed by Platt [12]. The posterior probability P(y = 1 | f) is approximated by fitting a parametric sigmoid function as:

P(y = 1 | f) = 1 / (1 + e^(A f(x) + B)),   (2)

where the parameters A and B are computed by defining a new training set (f_i, t_i), with t_i = (y_i + 1)/2, and using a maximum likelihood estimation:

min_{Z=(A,B)} F(Z) = − Σ_{i=1}^l [ t_i log(p_i) + (1 − t_i) log(1 − p_i) ],   (3)

where p_i = 1 / (1 + e^(A f_i + B)) and f_i = f(x_i). Figure 2(c) depicts the estimated sigmoid function for different values of σ.
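The fit of Eqs. (2)-(3) can be reproduced with a generic optimizer; the sketch below minimizes the negative log-likelihood with SciPy's Nelder-Mead. The starting point and probability clipping are our own choices (Platt's original pseudo-code uses a tailored Newton-style iteration).

```python
import numpy as np
from scipy.optimize import minimize

def fit_platt_sigmoid(f_vals, y):
    """Fit P(y=1|f) = 1 / (1 + exp(A f + B)) by minimizing Eq. (3);
    f_vals are SVM decision values f(x_i), y are labels in {-1, +1}."""
    t = (y + 1) / 2.0                         # targets t_i = (y_i + 1) / 2
    def nll(z):
        A, B = z
        p = 1.0 / (1.0 + np.exp(A * f_vals + B))
        p = np.clip(p, 1e-12, 1.0 - 1e-12)    # numerical safety
        return -np.sum(t * np.log(p) + (1 - t) * np.log(1 - p))
    return minimize(nll, x0=[-1.0, 0.0], method="Nelder-Mead").x

# A, B = fit_platt_sigmoid(decision_values, labels)
```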
Step 3 - Probabilistic Wave Propagation: Given two points p_0, p ∈ R^3, the optimal path between p_0 and p with respect to a cost function G(x) is defined as the path c(s) = {x(s), y(s), z(s)} that minimizes the function T(p) defined as:

T(p) = min_c ∫_{p_0}^{p} G(c(s)) ds,   (4)

where c(0) = p_0, c(L) = p, L is the total Euclidean arc-length, and ds² = dx² + dy² + dz². The level set C(t) = {(x, y, z) : T(x, y, z) = t} is a strictly monotonic front and is the set of points that can be reached from p_0 with minimum cost at time t. Assume that C evolves according to:

∂C(x)/∂t = (1/G(x)) N(x),   (5)

where N(x) is a normal vector to the surface C at the point x. Then T(C(p, t)) = t implies the well-known Eikonal equation [1]:

U(x) ‖∇T(x)‖ = 1,  with T(p_0) = 0,   (6)

where U > 0 and U(x) = 1/G(x) is the speed of the front propagation.
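Eq. (4) is usually solved with a fast-marching Eikonal solver; as a self-contained discrete stand-in, the sketch below computes the minimal-cost path on a 2D grid with Dijkstra's algorithm and recovers the path by backtracking. The 4-neighborhood and trapezoidal step cost are our own simplifications.

```python
import heapq
import numpy as np

def minimal_path(cost, p0, p):
    """Discrete minimal-cost path between grid points p0 and p on a
    positive cost map G (Eq. 4); returns the path and the arrival map T."""
    h, w = cost.shape
    T = np.full((h, w), np.inf)
    parent = {}
    T[p0] = 0.0
    heap = [(0.0, p0)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == p:
            break
        if t > T[u]:          # stale heap entry
            continue
        for du, dv in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            v = (u[0] + du, u[1] + dv)
            if 0 <= v[0] < h and 0 <= v[1] < w:
                nt = t + 0.5 * (cost[u] + cost[v])  # trapezoidal step cost
                if nt < T[v]:
                    T[v], parent[v] = nt, u
                    heapq.heappush(heap, (nt, v))
    path = [p]
    while path[-1] != p0:     # backtrack through parent pointers
        path.append(parent[path[-1]])
    return path[::-1], T
```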
Fig. 3. (a) Propagation of the front with maximal curvature at the centerline of the dendrite segment. (b) Detection of the terminating points of dendrites.
If we relate Eq. 5 to the inward flux of a given vector field V defined in R^3 then, as proved by Vasilevskiy and Siddiqi [4], the direction in which the inward flux of V through the surface C increases most rapidly is given by ∂C(x)/∂t = div(V(x)) N(x). The divergence of V can be approximated as:

div(V) ≡ lim_{ΔV→0} (1/ΔV) ∮_S V · n dS = ∂V_x/∂x + ∂V_y/∂y + ∂V_z/∂z,   (7)

where n is the outward unit vector normal to the surface S surrounding a volume element ΔV. From Eq. 7, it can be deduced that the divergence of V at a point x is the flux of V per unit of volume. If div(V(x)) > 0, we say that x is a source point and the outward flux is positive; if div(V(x)) < 0, we say that x is a sink point and the outward flux is negative [4]. Next, we propose to define an energy function U such that the embedded level set C evolves with higher curvature at the center of a tubular structure, guided by: i) a tubular morphological operator, and ii) the outward flux in gray level images, governed by the following partial differential equation:

g(P(x)) div(−∇I(x)/‖∇I(x)‖) ‖∇T(x)‖ = 1,  with T(p_0) = 0,   (8)

which is a case of the Eikonal Eq. 6. The term g(P(x)) is a probabilistic morphological operator composed of a non-negative function g and the posterior probability density function P(y = 1 | f(x)) from a parametric sigmoid function (Eq. 2); it is used for enhancement of the centerline of tubular-like objects. It favors maximum propagation of C at the centerlines, preventing the front from propagating outside the tubular object (Fig. 3(a)). The second term, div(−∇I(x)/‖∇I(x)‖), is the mean 3D curvature k of the front C [4], and it favors fast front propagation orthogonal to the gradient ∇I(x).

Step 4 - Centerline Extraction: For the specific application presented in this paper, morphological reconstruction is performed in four sub-steps. In the first sub-step, the soma center point p_0 is automatically detected and a front with low curvature is propagated from p_0. This is always possible since in Eq. 8 we can set the term div(−∇I(x)/‖∇I(x)‖) = 1, and g can be a constant function in regions with a posterior probability greater than or equal to a given value. The surface evolution
produces a distance map that captures the distance from p0 to every voxel. In the second sub-step, terminating voxels are found by creating a number of connected components and finding those components with maximum distance from p0 . In the third sub-step, a wave starting from the initial voxel p0 is initiated according to Eq. 8. Finally, centerlines are extracted by marching along the gradient from the terminating voxels to the initial voxel p0 (note that convergence is always guaranteed since the global minimum corresponds to p0 ). Figure 3(b) depicts the terminating voxels (marked in yellow) and the discretization of the initial distance map (overlapping areas marked in white).
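As a sketch of the speed term of Eq. (8), assuming the posterior map P(y = 1 | f(x)) from Step 2 and a weighting function g are available (the smoothing scale, tolerance, and names are our own choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def speed_map(image, posterior, g, sigma=1.0, tol=1e-8):
    """g(P(x)) * div(-grad I / ||grad I||): the probabilistic morphological
    operator times the mean curvature of the image isophotes (2D or 3D)."""
    smoothed = gaussian_filter(image, sigma)
    grads = np.gradient(smoothed)
    norm = np.sqrt(sum(gr ** 2 for gr in grads)) + tol
    return g(posterior) * sum(np.gradient(-gr / norm, axis=i)
                              for i, gr in enumerate(grads))
```

The reciprocal of this speed then serves as the cost G(x) in the minimal-path computation sketched after Eq. (6).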
3 Results
We have applied our method to both synthetic and real data. We created synthetic data to: i) learn a generic tubular shape model, and ii) detect tubular structures in unseen examples from synthetic and CT data. The model to be learned is depicted in Fig. 2(a). Its morphological properties include: i) variation of intensity, ii) radius variation from 0.5 to 1.5 μm, iii) a variety of branching sections, and iv) high and low curvature segments. Voxel size was isotropic and set to 1.0 μm. The real data are described in Section 2.

Training: In both synthetic and real data, parameter selection was performed with a grid search using three-fold cross-validation with different kernels, penalty, and sigma values. With respect to synthetic data, the best performance was obtained using a penalty value equal to 10, a linear kernel, and sigma equal to 0.5 μm. Optimal SVM parameters were found to be: b = 4.19 (Eq. 1) with 638 support vectors, while A = −1.7955 and B = −0.0539 (Fig. 2(c)). Concerning the multiphoton data, for which the training data consisted of the dendrite segments depicted in Fig. 4(g), the best performance was found using a linear kernel and a penalty value equal to 50. The estimated SVM parameters were: A = −1.94, B = −0.222 (Eq. 3), b = 8.269 (Eq. 1), and σ = 1.5 μm.

Synthetic Data: Figure 4(a) depicts an unseen example on which to detect the centerline. Note that the radius decreases gradually from the bottom (1.5 μm) to the top (0.5 μm). The centerline is overlaid in white. Figure 4(b) depicts the centerline predicted with the estimated model (Fig. 2(a)), while Figure 4(c) depicts the centerline according to Sato's measure [11] (σ = 0.5 μm, α = 1, β = 1, and γ = 1). Note the difference between these two models, especially at the bottom of the structure. Figure 4(d) depicts the extracted centerline (white line) and the ground truth centerline (brown line). Figures 4(e)-(f) illustrate centerline detection in two different CT volumes using the information learned from the synthetic tubular model in Fig. 2(a). We have created 20 synthetic phantoms with a variety of tubular structures and computed the distance from the extracted centerlines to the ground truth. The maximum and average distances from the extracted centerlines using our method were 0.87 μm and 0.57 μm, while the maximum and average distances using Sato's method were 0.94 μm and 0.63 μm.
Fig. 4. (a) Visualization of synthetic data with estimated centerline, (b) predicted centerline from the tubular model (Fig. 2(a)), (c) predicted centerline from Sato's measure, and (d) extracted centerline (white) and ground truth centerline (brown). (e),(f) Tubular structures detected in a CT angiography dataset using the model depicted in Fig. 2(a). (g) Dendrite segments used for training, (h) segment of a given dendrite, (i) after Sato's measure, (j) after our measure. (k) Top: frames-based denoised multiphoton imaging volume; middle: extracted centerlines with our method (white); bottom: manually extracted centerlines (magenta).
Multiphoton Microscopy: Figures 4(h)-(j) depict a dendrite segment, centerline enhancement using Sato's measure (parameters: α = 1, β = 1, and γ = 0.5), and our measure, respectively. We have performed centerline extraction on all the datasets. In Fig. 4(k), the top subfigure depicts a typical cell with dimensions of 2546 × 912 × 121 voxels, the middle subfigure depicts the centerline (white) extracted with our method, and the bottom subfigure depicts the centerline (magenta) manually traced by an expert. Details about the volume registration and soma detection can be found in [13]. When expressing the morphology of a neuron as a tree structure, the number of segments that we were able to detect automatically was 149, compared with 136, 150, and 161 traced by three human experts.
4 Conclusion
We have presented a general framework for centerline extraction of tubular-like structures without prior segmentation. Our novel approach consists of: i) learning the structural patterns of a tubular-like structure, as opposed to defining a tubularity measure, and ii) combining structural information to be used by a morphology-guided level set. Our method is general since it does not assume any particular tubular shape.

Acknowledgements. We would like to thank B. Losavio and Y. Liang for their valuable assistance. This work was supported in part by NIH 1R01AG027577, NSF IIS-0431144, and NSF IIS-0638875. Any opinions, findings, conclusions or recommendations expressed in this material are the authors' and may not reflect the views of the NIH or NSF.
References
1. Deschamps, T., Cohen, L.D.: Fast extraction of tubular and tree 3D surfaces with front propagation methods. In: Proc. International Conference on Pattern Recognition, Quebec, Canada, vol. 1, pp. 731–734 (2002)
2. Hassouna, M.S., Farag, A.A., Falk, R.: Differential fly-throughs (DFT): A general framework for computing flight paths. In: Proc. Medical Image Computing and Computer Assisted Intervention, Palm Springs, USA, vol. 1, pp. 654–661 (2005)
3. Bouix, S., Siddiqi, K., Tannenbaum, A.: Flux driven automatic centerline extraction. Medical Image Analysis 9(3), 209–221 (2005)
4. Vasilevskiy, A., Siddiqi, K.: Flux maximizing geometric flows. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(12), 1565–1578 (2002)
5. Nain, D., Yezzi, A.J., Turk, G.: Vessel segmentation using a shape driven flow. In: Proc. Medical Image Computing and Computer Assisted Intervention, Saint-Malo, France, vol. 1, pp. 51–59 (2004)
6. Kimmel, R.: Fast Edge Integration. In: Level Set Methods and their Applications in Computer Vision. Springer, NY (2003)
7. Florin, C., Paragios, N., Williams, J.: Globally optimal active contours, sequential Monte Carlo and on-line learning for vessel segmentation. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3953, pp. 476–489. Springer, Heidelberg (2006)
8. Santamaría-Pang, A., Bîldea, T.S., Colbert, C.M., Saggau, P., Kakadiaris, I.A.: Towards segmentation of irregular tubular structures in 3D confocal microscope images. In: Proc. MICCAI International Workshop in Microscopic Image Analysis and Applications in Biology, Copenhagen, Denmark, pp. 78–85 (2006)
9. Santamaría-Pang, A., Bîldea, T.S., Konstantinidis, I., Kakadiaris, I.A.: Adaptive frames-based denoising of confocal microscopy data. In: Proc. Conf. on Acoustics, Speech, and Signal Processing, Toulouse, France, vol. 2, pp. 85–88 (2006)
10. Frangi, A.F., Niessen, W.J., Vincken, K.L., Viergever, M.A.: Multiscale vessel enhancement filtering. In: Proc. Medical Image Computing and Computer Assisted Intervention, Cambridge, USA, vol. 1496, pp. 130–137 (1998)
11. Sato, Y., Nakajima, S., Atsumi, H., Koller, T., Gerig, G., Yoshida, S., Kikinis, R.: 3D multi-scale line filter for segmentation and visualization of curvilinear structures in medical images. Medical Image Analysis 2(2), 143–168 (1998)
12. Platt, J.: Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Adv. in Large Margin Classifiers, pp. 61–74 (2000)
13. Urban, S., O'Malley, S.M., Walsh, B., Santamaría-Pang, A., Saggau, P., Colbert, C.M., Kakadiaris, I.A.: Automatic reconstruction of dendrite morphology from optical section stacks. In: Proc. ECCV International Workshop on Computer Vision Approaches to Medical Image Analysis, Graz, Austria, pp. 190–201 (2006)
Γ-Convergence Approximation to Piecewise Smooth Medical Image Segmentation
Jungha An1,2, Mikael Rousson2, and Chenyang Xu2
1 Institute for Mathematics and its Applications (IMA), University of Minnesota, Minneapolis, MN, USA
2 Imaging and Visualization Department, Siemens Corporate Research, Princeton, NJ, USA
Abstract. Despite many research efforts, accurate extraction of structures of interest remains a difficult issue in many medical imaging applications. This is particularly the case for magnetic resonance (MR) images, where image quality depends highly on the acquisition protocol. In this paper, we propose a variational region-based algorithm that is able to deal with spatial perturbations of the image intensity directly. Image segmentation is obtained by using a Γ-Convergence approximation for a multi-scale piecewise smooth model. This model overcomes the limitations of global region models while avoiding the high sensitivity of local approaches. The proposed model is implemented efficiently using recursive Gaussian convolutions. Numerical experiments on 2-dimensional human liver MR images show that our model compares favorably to existing methods.
1 Introduction

Extracting structures of interest through image segmentation is an important task in medical imaging. Image segmentation is especially needed for better visualization, quantification of diseases, and planning an intervention. This task of segmenting a given region from the rest of the image usually relies on image information that can be edge-based, region-based, or a combination of both. MR imaging is a modality where these criteria are not sufficient. In MR imaging, various scanning parameters are used to highlight different living tissues. As a result, image characteristics can vary significantly from one acquisition to another, particularly due to inhomogeneities in the radio frequency field. If one wants to define a generic segmentation problem, robustness to these intensity variations, often referred to as bias field, is mandatory. One may want to correct for these artifacts prior to the segmentation [14], but both problems are intrinsically dependent on each other. In [15], an Expectation-Maximization technique was proposed to estimate the bias field jointly with the segmentation. In this paper, a more straightforward approach is proposed by considering a segmentation model that is naturally robust to smooth spatial variations of the intensity.

Most recent geometric segmentation methods use region statistics to model intensity distributions of the objects and the background. These global models have been shown to be more robust to initialization and noise than local or edge-based approaches, but the assumption of a global intensity distribution is not relevant in most cases, particularly with MR images like the one shown in Figure 1. Interestingly, these models are modifications of the seminal work of Mumford and Shah (MS) [9] that originally did not make such an assumption of regional distributions, but simply intended
Fig. 1. High bias in MR images - We zoom in two different parts of the image shown on the left. The second zoom is rotated by 180 degrees, so that we can see that the intensity level outside can be lower than the intensity inside. Our perceptual vision may tell us the contrary.
to recover a piecewise smooth approximation of the image. For this purpose, the image domain Ω is decomposed into a set of regions separated by a smooth boundary Γ. Minimizing the MS functional then requires the joint estimation of the boundary Γ and the "ideal" smooth image. However, such loose assumptions on the objects to recover make this approach very sensitive to the initialization of Γ and computationally expensive. Several approaches have been developed to approximate the MS model numerically. Chan and Vese first proposed a level set formulation [10] of the simplified piecewise constant MS model in [4], before extending it to the more general piecewise smooth model [5,7]. The first model has the advantage of being relatively robust and having a low computational complexity, but it is also too simplistic for most applications. The second one is an elegant numerical implementation of the general MS model, but still depends highly on the initial conditions and has a high computational complexity. As an alternative, Ambrosio and Tortorelli [1] approximate the measurement of the length term in the MS model by a quadratic integral of an edge signature function. In [5,7], the segmentation is represented by characteristic functions using phase field models [3,8,12,13]. Finally, in [2,12], the piecewise constant MS model is reformulated using a Γ-Convergence approximation, which motivates the model proposed in this paper.

A new region-based variational model for piecewise smooth image segmentation is proposed. Image segmentation is obtained using a Γ-Convergence approximation and multi-scale local statistics. The proposed model is motivated by [2,4,5,11,12]. Our model differs from commonly used ones, like [2,4,5,12], by relying on local intensity averages rather than global statistics. Our formulation is closely related to [6,11]. The improvements brought by our model are twofold: we introduce a Γ-Convergence approximation and we integrate multiple scales for local intensity models. We also present a validation on 2-dimensional slices of human liver MR images. This paper is organized as follows: In Section 2, both the single and multi-scale models are proposed. The Euler-Lagrange equations of the suggested model are also presented in this section. Experimental results of the proposed model, with comparison to existing models, are shown in Section 3. Finally, in Section 4, conclusions are drawn and future work is stated.
2 Description of the Proposed Model

In this section, we introduce a Γ-Convergence approximation, motivated by [2,12], for the piecewise-smooth model proposed in [11]. For clarity, a single scale model is derived first and then the general multi-scale model is presented.

2.1 Single Scale Model

Image segmentation is obtained using a Γ-Convergence approximation and single scale local statistics. Two phases are assumed for the simplicity of our model. The model aims at finding the phase field θ by minimizing the following energy:

E(θ) = λ ∫_Ω [ f(θ)(I − u_in(θ))² + (1 − f(θ))(I − u_out(θ))² ] dx
     + (1 − λ) ∫_Ω [ ε1 |∇f(θ)|² + f(θ)²(1 − f(θ))²/ε1 ] dx,   (1)

where I is a given image, Ω ⊂ R³ is its domain, f is a smooth version of the Heaviside function, ε1 is a positive parameter, and 0 < λ < 1 is a parameter balancing the influence of the two terms in the model. Following [11], u_in and u_out are expressed as local weighted intensity averages that can be obtained by Gaussian convolutions:

u_in(θ) = (gσ ∗ [f(θ)I]) / (gσ ∗ f(θ))   and   u_out(θ) = (gσ ∗ [(1 − f(θ))I]) / (gσ ∗ (1 − f(θ))),

where gσ is a Gaussian kernel with standard deviation σ, and "∗" stands for convolution over the image domain Ω. This model is an efficient approximation of the general piecewise smooth MS model. One can point out that it becomes equivalent to the piecewise constant model [2,4] when the variance σ goes to infinity. A piecewise smooth approximation of the image can accommodate a wider range of problems than its piecewise constant counterpart. In particular, it is well-suited for image modalities with bias, as is often the case in MRI. In the second term of Equation (1), ε1 ≪ 1 controls the transition bandwidth. The Γ-Convergence is used to approximate the length term in the MS model. In the theory of Γ-Convergence [1], the length of Γ is approximated by a quadratic integral of an edge signature function p. This model is combined with a double-well potential function W(p) = p²(1 − p)² with p ∈ H¹(Ω). As ε1 → 0, the first term penalizes unnecessary interfaces and the second term forces the stable solution to take values of 1 or 0. The second term in our model follows [2,12,13]. For details on phase field models and double-well potential functions, please refer to [2,12,13].

2.2 Multi-scale Model

The model considered so far is based on local intensity averages (u_in, u_out). The locality of these terms is determined by the standard deviation σ of the Gaussian kernel gσ, which has so far been assumed to be the same for all pixels. This may be a limitation, since
Fig. 2. Comparison between 3 models - From left to right : Chan and Vese model, An and Chen model, and our model with a single scale (σ = 15). In this figure, as well as in the next ones, the initialization is shown in dark (light green) and the final segmentation in bright (yellow).
σ should be big enough to attract the contour to edges and small enough to detect weak edges. To resolve this problem, σ should be defined pixel-wise. However, this would highly increase the complexity of the algorithm. An intermediate step is to consider only two different scales σ1 and σ2 , and combine them at each pixel with different weights. Let σ1 < σ2 . We want to use the region term with σ1 where the image gradient is high, and with σ2 where it is low. This leads us to the following modification of the first term of Equation (1):
E_region(θ) = λ ∫_Ω f(θ)[ g(|∇I|)(I − u_{σ1,in}(θ))² + (1 − g(|∇I|))(I − u_{σ2,in}(θ))² ] dx
            + λ ∫_Ω (1 − f(θ))[ g(|∇I|)(I − u_{σ1,out}(θ))² + (1 − g(|∇I|))(I − u_{σ2,out}(θ))² ] dx,   (2)

where g(|∇I|) acts as an edge detector, |∇I| stands for the magnitude of the gradient of a smoothed version of the image, normalized between 0 and 1, and the function g is an increasing function from [0, 1] to [0, 1]. This permits us to assign the lower sigma to low-gradient edges, which enables the model to capture weak boundaries.

2.3 Energy Minimization

To obtain the best segmentation according to our piecewise smooth model, we need to minimize the corresponding energy with respect to the phase field θ. For this purpose, we derive the Euler-Lagrange equations with respect to θ. For clarity, we present the gradient descent obtained for the single scale energy (1):

∂θ/∂t = −λ f′(θ)[ (I − u_in(θ))² − q_in − (I − u_out(θ))² + q_out ]
      + (1 − λ)[ 2ε1 div(∇θ (f′(θ))²) − 2|∇θ|² f′(θ)f″(θ) ]
      − (1 − λ) f(θ)(1 − f(θ))(1 − 2f(θ))f′(θ)/ε1,   in Ω,
∂θ/∂n = 0,   on ∂Ω,
Fig. 3. Single scale versus multi-scale (panels, from left to right: σ = 4, σ = 12, and (σ1 = 4, σ2 = 12)) - When a single scale is considered for the computation of local averages, the contour either gets stuck (low σ) or leaks (high σ). Using the multi-scale formulation that combines a low σ for high gradient areas with a high σ for homogeneous regions, the contour does not get stuck in homogeneous regions and still correctly segments weakly defined edges.
where the terms q_in and q_out have the expressions:

q_in = I · gσ ∗ [ 2f(θ)(I − u_in(θ)) / (gσ ∗ f(θ)) ] − gσ ∗ [ 2f(θ)(I − u_in(θ))u_in(θ) / (gσ ∗ f(θ)) ],
q_out = I · gσ ∗ [ 2(1 − f(θ))(I − u_out(θ)) / (gσ ∗ (1 − f(θ))) ] − gσ ∗ [ 2(1 − f(θ))(I − u_out(θ))u_out(θ) / (gσ ∗ (1 − f(θ))) ].

The gradient descent of the multi-scale model can be easily obtained by analogy.
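To make the single-scale machinery concrete, the sketch below implements the local averages of Sec. 2.1 and the q_in/q_out terms above with SciPy Gaussian filtering; the arctan form of f is the one given in Sec. 3, while the small stabilizing tolerance and all names are our own.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def f(theta, eps=0.01):
    # smoothed Heaviside of Sec. 3: (1/2)(1 + (2/pi) arctan(theta / eps))
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(theta / eps))

def local_averages(I, ft, sigma, tol=1e-8):
    # u_in and u_out of Sec. 2.1 via Gaussian convolutions
    u_in = gaussian_filter(ft * I, sigma) / (gaussian_filter(ft, sigma) + tol)
    u_out = gaussian_filter((1 - ft) * I, sigma) / (gaussian_filter(1 - ft, sigma) + tol)
    return u_in, u_out

def q_terms(I, ft, u_in, u_out, sigma, tol=1e-8):
    # q_in and q_out as written above
    g = lambda a: gaussian_filter(a, sigma)
    q_in = I * g(2 * ft * (I - u_in) / (g(ft) + tol)) \
        - g(2 * ft * (I - u_in) * u_in / (g(ft) + tol))
    q_out = I * g(2 * (1 - ft) * (I - u_out) / (g(1 - ft) + tol)) \
        - g(2 * (1 - ft) * (I - u_out) * u_out / (g(1 - ft) + tol))
    return q_in, q_out
```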
3 Numerical Results

In this part, we show different numerical results on 2-dimensional human liver MR images. Each experiment consists of contouring the liver boundary in axial slices. Several images include tumors inside the liver, which are considered as background. These experiments include the minimization of the single scale and multi-scale criteria, which are minimized by finding a steady-state solution of the corresponding evolution equations. A finite difference scheme is applied for discretization. f(θ) = (1/2){1 + (2/π) arctan(θ/ε)} is used in the numerical calculation, and g was empirically chosen as g(v) = αv for 0 ≤ v < 1/α, and 1 otherwise. Parameters (λ, ε, ε1, α) are set to (0.5, 0.01, 0.01, 0.3) empirically for all the experiments.

3.1 Comparisons and Experiments

In Figure 2, we compare the proposed model with existing methods [2,4]. The piecewise constant models use global image information and cannot discriminate the average intensity of the liver from neighboring structures. This is mainly due to the black region around the body that corrupts the mean value of u_out. Using local averages makes the approach robust to any disturbing factors far from the object of interest. In the second experiment, shown in Figure 3, we show the influence of the parameter σ in the single scale version of our model. With too low values of σ, the contour gets stuck in homogeneous regions, because local intensity averages inside and outside the
Fig. 4. Result on an MR image with high synthetic bias - From left to right: modified image, final segmentation, and a zoom on this segmentation (σ1 = 4 and σ2 = 8). Our multi-scale approach is able to deal quite easily with this artifact. This is an important advantage since this type of intensity variation is very common in MR imaging.
Fig. 5. Liver segmentation in an MR volume - Segmentations obtained on 2-dimensional slices of an MR volume (σ1 = 4 and σ2 = 8)
contour are almost identical. On the other hand, if σ is too big, the contour does not stop in homogeneous regions, but it is also not able to capture weakly defined boundaries. The reason is that structures outside the liver bias the outside local mean, outweighing the intensity gradient of weak edges. The last image of this figure shows the result obtained with the multi-scale formulation. In this case, we need to choose two smoothing parameters: σ1 and σ2. Rather than complicating the approach, this allows us to specify one locality for pixels close to edges and another for homogeneous regions, both being combined to avoid any sharp decision (see Equation (2)). Figure 4 shows the numerical result on an MR image with high synthetic bias.
Fig. 6. Liver segmentation in an MR volume - Segmentations obtained on 2-dimensional slices of 3 other MR volumes (σ1 = 4 and σ2 = 8)
3.2 Validation

The next experiment is a validation on a complete MR volume. The algorithm has been tested on each slice intersecting the liver. Figure 5 shows the results obtained on a few slices of one volume, while Figure 6 shows 3 slices segmented in 3 other patients. Most of the segmentations follow the liver boundary, even where there is a bias and weakly defined edges in the image. The approach is still not perfect, since a few small leakages are visible. To validate the results more quantitatively, we compared them to the segmentations given by an expert. For each slice, we estimate the Dice coefficient and the average surface distance. As shown in Fig. 7, the algorithm shows very promising results.

Image #   Dice coefficient ± stdv   Average contour distance
1         0.89 ± 0.05               2.24 mm
2         0.86 ± 0.06               2.75 mm
3         0.87 ± 0.05               3.06 mm
4         0.92 ± 0.02               2.00 mm
Average   0.89 ± 0.04               2.51 mm

Fig. 7. Validation of our model on 2-dimensional slices of 4 MR images - The same parameters were used for each image. The voxel size in each slice is 1.4 mm × 1.4 mm and the inter-slice distance is 3 mm.
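The two reported measures can be computed as sketched below; extracting boundaries by erosion and averaging the two directed distances are standard choices, and the 1.4 mm in-plane spacing follows the caption.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice(a, b):
    """Dice overlap of two boolean masks."""
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def avg_contour_distance(a, b, spacing=(1.4, 1.4)):
    """Mean symmetric boundary distance (mm) between two boolean masks."""
    edge_a = a & ~binary_erosion(a)
    edge_b = b & ~binary_erosion(b)
    d_to_b = distance_transform_edt(~edge_b, sampling=spacing)
    d_to_a = distance_transform_edt(~edge_a, sampling=spacing)
    return 0.5 * (d_to_b[edge_a].mean() + d_to_a[edge_b].mean())
```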
Regarding the speed of the algorithm, 1000 iterations ran to convergence in around 20 seconds on a standard computer for a 256 × 200 image. Depending on the parameters, 1000 iterations are usually enough to reach a steady state.
4 Conclusions and Future Work

Multi-scale local intensity statistics have been introduced in a variational formulation for the segmentation of liver MR images. Our approach combines a Γ-Convergence approximation with a new multi-scale piecewise smooth model. We have shown that it is able to deal directly with MR images that include spatial intensity inhomogeneities. Numerical results show the effectiveness of the proposed model. Future work will include a validation on 3D data and learning the different parameters that are currently chosen empirically.
Acknowledgments. We thank Bernhard Geiger for providing liver MR images, as well as Ali Khamene and Frank Sauer for their support. This work is funded by Siemens Corporate Research and the Institute for Mathematics and its Applications (IMA) at the University of Minnesota.
References
1. Ambrosio, L., Tortorelli, V.: Approximation of functionals depending on jumps by elliptic functionals via Γ-convergence. Communications on Pure and Applied Mathematics 43, 999–1036 (1990)
2. An, J., Chen, Y.: Region based image segmentation using a modified Mumford-Shah algorithm. In: Proc. Scale Space and Variational Methods in Computer Vision, pp. 733–742 (2007)
3. Baldo, S.: Minimal interface criterion for phase transitions in mixtures of Cahn-Hilliard fluids. Annals of Institute Henri Poincare 7, 67–90 (1990)
4. Chan, T., Vese, L.: Active contours without edges. IEEE Transactions on Image Processing 10, 266–277 (2001)
5. Esedoglu, S., Tsai, R.: Threshold dynamics for the piecewise constant Mumford-Shah functional. Computational and Applied Mathematics Report 04-63, UCLA (2004)
6. Li, C., Kao, C., Gore, J., Ding, Z.: Implicit active contours driven by local binary fitting energy. In: Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society Press, Los Alamitos (2007)
7. Lie, J., Lysaker, M., Tai, X.: A binary level set model and some applications to Mumford-Shah segmentation. Computational and Applied Mathematics Report, vol. 31 (2004)
8. Modica, L.: The gradient theory of phase transitions and the minimal interface criterion. Archive for Rational Mechanics and Analysis 98, 123–142 (1987)
9. Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics 42, 577–685 (1989)
10. Osher, S., Fedkiw, R.: Level Set Methods and Dynamic Implicit Surfaces. Springer, New York (2003)
11. Piovano, J., Rousson, M., Papadopoulo, T.: Efficient segmentation of piecewise smooth images. In: Proc. Scale Space and Variational Methods in Computer Vision, pp. 709–720 (2007)
12. Shen, J.: Γ-Convergence approximation to piecewise constant Mumford-Shah segmentation. In: Blanc-Talon, J., Philips, W., Popescu, D.C., Scheunders, P. (eds.) ACIVS 2005. LNCS, vol. 3708, pp. 499–506. Springer, Heidelberg (2005)
13. Wang, M., Zhou, S.: Phase field: A variational method for structural topology optimization. Computer Modeling in Engineering & Sciences 6, 547–566 (2004)
14. Hou, Z.: A review on MR image intensity inhomogeneity correction. International Journal of Biomedical Imaging, 1–11 (2006)
15. Zhang, Y., Brady, M., Smith, S.: Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging 20, 45–57 (2001)
Is a Single Energy Functional Sufficient? Adaptive Energy Functionals and Automatic Initialization
Chris McIntosh and Ghassan Hamarneh
Medical Image Analysis Lab, School of Computing Science, Simon Fraser University, Canada
{cmcintos,hamarneh}@cs.sfu.ca
http://mial.cs.sfu.ca
Abstract. Energy functional minimization is an increasingly popular technique for image segmentation. However, it is far too commonly applied with hand-tuned parameters and initializations that have only been validated for a few images. Fixing these parameters over a set of images assumes the same parameters are ideal for each image. We highlight the effects of varying the parameters and initialization on segmentation accuracy and propose a framework for attaining improved results using image adaptive parameters and initializations. We provide an analytical definition of optimal weights for functional terms through an examination of segmentation in the context of image manifolds, where nearby images on the manifold require similar parameters and similar initializations. Our results validate that fixed parameters are insufficient in addressing the variability in real clinical data, that similar images require similar parameters, and demonstrate how these parameters correlate with the image manifold. We present significantly improved segmentations for synthetic images and a set of 470 clinical examples.
1 Introduction
Despite its importance to medical image analysis (MIA), image segmentation remains a daunting task. Though many segmentation approaches exist (see [1] for a survey), an increasing number rely on the minimization of objective functions. Examples include several landmark papers: from the seminal paper of Snakes for 2D segmentation [2] and other explicit deformable models [3] to implicit models [4,5], graph cuts approaches [6], and numerous variants thereof. Though seemingly different, the methods share a common ground. Each method requires four building blocks: (i) an objective function whose minima provide good segmentations; (ii) a set of parameters including weights to balance the terms of the energy function; (iii) a starting state and a stopping condition; and (iv) a method for minimization, whether it be a local or a global solver. It is in this commonality that their problems lie. Specifically, the parameter setting, initialization, and minimization phases are well known to be problematic when the fully automated segmentation of real data sets is sought.
Fig. 1. Segmenting ellipses with varying lengths of major and minor axes in noisy images. (a) optimal values of w for each noisy ellipse image, given its known segmentation, as calculated by our algorithm (Sec. 2.3). Circles are along the diagonal. Brighter pixels imply higher values of w. (b) optimal values of w for each image, without knowing its segmentation, as calculated by our algorithm (Sec. 2.4). Note the similarity of (b) to (a). (a) and (b) span the intrinsic 2D space (corresponding to change in major and minor axes) of the image manifold learned from 225 ellipses. The locations of the example images in (c) and (d) are indicated with small arrows in (a). (c) Segmentation of a noisy circle image with w = 0.347 (optimal parameter from (d)) (top) and with its optimal parameter w = 0.826 (bottom). (d) segmentation of a noisy ellipse image with its optimal parameter w = 0.347 (top) and with w = 0.826 (optimal parameter from (c)) (bottom). Note the need for different weights to properly segment the different images.
Even with globally optimal graph cuts, seeds must be set and a weight that balances the regional and boundary terms in the cost function must be chosen. Classically, the underlying assumption has been made that once the function, parameters¹, initialization, and solver are in place, they can be fixed across all images. For example, consider the weight w ∈ [0, 1] in the simplified formulation: energy(shape, image) = w × internal(shape) + (1 − w) × external(shape, image). Segmentation is performed by finding the shape that minimizes the energy for a given image. As w approaches zero, the model will be attracted solely to image features. As w approaches one, the objective function will favor smooth shapes (only internal energy) with no regard to the image data. The correct w is typically chosen empirically based on the inspection of a few images in the data set. However, what if the characteristics of the images vary across the data set? For noisy images, smoothness must be emphasized, while for highly curved structures smoothness must be relaxed. Different patients have different anatomy, and may be imaged at different times with different noise. Therefore, is it realistic for a fixed set of weights to address the variability present in meaningful medical image data sets? We argue that it is not, as demonstrated in Fig. 1 using a more elaborate toy example. The training data set used to provide the shape
¹ We identify two types of common parameters found in energy functionals: the weights w, which balance competing terms of the functional; and the training data set used to provide the prior shape and appearance knowledge.
Fig. 2. Ideal energy functions E(shape, image). Let graylevel image I, and shape model S, be represented as images with N pixels (P1 to PN ). Then {I1 , I2 , ..., IN } ∈ I, {S1 , S2 , ..., SN } ∈ S, are sets of images and corresponding segmentations, where each element is a point in RN . Samples from I and S lie on manifolds MI and MS , respectively. Three (S,I) pairs (left panel), positioned on their respective manifolds (right panel). Segmenting an image is represented by a smooth mapping G : RN → RN , since for each Ii there exists a corresponding segmentation Si . Similar images, I0 and I1 , are nearby on MI and require similar segmentations, S0 and S1 . The ideal energy functional E(S, I0 ), is a convex functional minimized at the segmentation S0 on MS .
and appearance priors is also typically fixed over the data set. A better approach is to limit the training data to those samples most similar to the correct shape. If the shape prior is, for example, a uniform distribution over a set of training shapes, then restricting that set to a few likely shapes is clearly advantageous. Similarly, the initialization should also be image specific, since the correct segmentation of that image is the best initialization. Though different images need different parameters, similar images will likely have similar parameters. Recent research has begun to address the idea that groups of similar images embedded in R^N, where N is the number of image pixels, lie on manifolds [7]. However, there has been no work on performing segmentation in a way that respects the image manifold, i.e., with method parameters that reflect the data's variability and minima of the energy functional that lie on the segmentation manifold (Fig. 2). The aforementioned problem of selecting parameters and initializing models is a serious one that remains unsolved. This work addresses the problem. The goal is to produce energy functionals with global, or at least local, minima for every (segmentation, image) pair, since anything less yields a suboptimal solution for some of these pairs (Fig. 2). Our approach is to more expressly modify the energy functional with every new image to be segmented by varying its parameters. The main contributions of this work are the following. To the best of our knowledge, ours is the first work to: (1) explore the need for varying the energy functional in energy-minimizing segmentation techniques; (2) analytically derive the optimal energy functional weights, of competing energy terms, for training (segmentation, image) pairs; (3) calculate optimal parameters of novel images utilizing learned image manifolds; (4) provide a general formulation that can be directly applied to any energy minimizing segmentation technique (e.g.
[4,5,8,9]) to improve its segmentation and increase its ability to generalize to larger datasets; (5) to fully automatically initialize and tune parameters optimally, demonstrated on a large (N = 470) data set of images; (6) to provide the automatic cropping of images prior to segmentation for a 10 to 20 times speedup. To the best of our knowledge there are only two papers on a related topic. In [10], Gennert et al. find weights that minimize the energy functional, but do not encourage its convexity (Fig. 2), or provide a way to obtain the optimal parameters and initialization for a new image. By contrast, we use the image manifold to encourage energy functional convexity, and obtain optimal initializations and parameters for new images. In [11], Koikkalainen et al. use a nearest neighbors approach to initialize the segmentation procedure but do not define optimal parameters for the model, nor do they make use of manifold learning to calculate distances, perform interpolation, or obtain optimal parameters for the model.
2 Methods
Our approach is to use M_I (Fig. 2) to automatically obtain parameters and initializations for novel images, encouraging energy functionals like those presented in Fig. 2. Hence the main concept of our paper is designing an energy functional that is more expressly a function of M_I. First, we formulate the segmentation problem (Sec. 2.1). Then, given a set of training image and segmentation pairs, we learn their respective manifolds (Sec. 2.2). With the geometry of the energy functional's domain known, we can calculate the optimal weights, w, for each training image by minimizing a convex, quadratic objective function (Sec. 2.3). We then assume that w(I) is smooth over its domain, the space of application images, and that the mapping G, from images to segmentations, is smooth. Then for a given novel image from the same class, we identify its intrinsic coordinates on M_I, and assign it an optimal w and an initialization based on the optimal w's and segmentations of "nearby" images (Sec. 2.4).
2.1 Energy Functional and Free Parameters
The first step in our proposed framework is the identification of the form of the energy functional and the associated free parameters. Our examples use the level set framework, where the contour or surface is embedded as the zero level-set of a scalar function, Φ(x), defined for every point x ∈ Ω, where Ω is the domain of the image to be segmented². See [4,5,8,9] for detailed discussions, examples, and derivations. In general, level-set based energy functionals

   E(Φ, I_j, w) = ∫_Ω [ w_1 J_1(Φ, I_j) + w_2 J_2(Φ, I_j) + ... + w_n J_n(Φ, I_j) ] dx

for a fixed image I_j and fixed parameters w have update equations of the form

   Φ_t = w_1 T_1(Φ, I_j) + ... + w_n T_n(Φ, I_j),

obtained by making Φ a function of time and applying the Euler–Lagrange equations, where w = [w_1, ..., w_n] are weights and T_i is the gradient of the term J_i. Minimizing E is then typically performed using gradient descent: Φ(x, n + 1) = Φ(x, n) + Δt Φ_t(x, n).
² In what follows we drop the dependence on x for clarity.
In this work we propose functionals where the weights vary as a function of the image, yielding update equations of the form Φ_t = w_1(I_j) T_1(Φ, I_j) + ... + w_n(I_j) T_n(Φ, I_j). The method for finding w(I_j) will be explained in Sec. 2.4, and is designed to encourage functions of a convex nature (Fig. 2).
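As an illustration, this update scheme amounts to a generic gradient-descent loop of the following form; the weight values and gradient-term callables below are placeholders (the concrete T_i depend on the chosen functional, e.g. [4,5,8,9]), so this is a sketch rather than the authors' implementation.

```python
import numpy as np

def evolve_level_set(phi, image, weights, grad_terms, dt=0.1, n_steps=200):
    """Gradient descent Phi(x, n+1) = Phi(x, n) + dt * Phi_t(x, n), with
    Phi_t = w_1(I) T_1(Phi, I) + ... + w_n(I) T_n(Phi, I).

    weights    -- scalars w_i(I_j), already evaluated for this image
    grad_terms -- callables T_i(phi, image) -> array shaped like phi
    """
    for _ in range(n_steps):
        phi_t = sum(w * T(phi, image) for w, T in zip(weights, grad_terms))
        phi = phi + dt * phi_t
    return phi
```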
2.2 Learning the Image Manifold
Manifold learning methods are a special class of nonlinear dimensionality reduction techniques that enable the calculation of geodesic distances between data points. Given a set of expert-segmented images we apply manifold learning to learn the image manifold, M_I, and the segmentation manifold, M_S. The learned manifolds are then used to estimate geodesic distances between pairs of images or segmentations. Let d_{M_I}(I_a, I_b) be the learned geodesic distance function for M_I, and d_{M_S}(Φ_a, Φ_b) for M_S (see Fig. 2 for definitions). Let N_I be a neighborhood of points on M_I, and N_S a neighborhood of points on M_S. Then define F_N(a, b) as a geodesic weighting function for a given neighborhood, N, normalized to sum to one. It stands to reason that similar images will have similar segmentations, which implies a smooth mapping between manifolds. In this case d_{M_I}(I_a, I_b) ∝ M(d_{M_S}(Φ_a, Φ_b)), where M : R → R is monotonically increasing.
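In practice such geodesic distances are commonly approximated ISOMAP-style [12]: Euclidean distances are kept only between k-nearest neighbors, and geodesic distance is taken as the shortest-path length in the resulting graph. A minimal sketch of this construction (the function name and the choice of libraries are ours, not the paper's):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def geodesic_distances(images, k=7):
    """Approximate pairwise geodesic distances on the image manifold M_I.
    images -- (m, N) array: m training images, each flattened to N pixels."""
    knn = kneighbors_graph(images, n_neighbors=k, mode='distance')
    knn = knn.maximum(knn.T)  # symmetrize so the neighborhood graph is undirected
    # Dijkstra shortest paths approximate geodesic arc length on the manifold
    return shortest_path(knn, method='D', directed=False)
```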
2.3 Estimation of Optimal Weights for Training Data
For each pair {I_j, S_j} ∈ {I, S} (Fig. 2), the task is to find the optimal values for the free weights w(I_j). This section explores the notion of 'optimal'. Essentially, after this phase of the algorithm we will have |I| samples of w(I), from which we can interpolate to find values at new points (images). One potential, but computationally intractable, approach for finding the optimal parameters is to try all possible parameter combinations against all possible (or likely) initializations, run the segmentation method, and then select the parameters with the least error. A better approach for PDE-based methods is to find the parameters that minimize the magnitude of the time derivative of the shape model, Φ_t, at the solution³, Φ^s. Doing so encourages the correct solution, Φ^s, to be a stopping point of the simulation (i.e. Φ_t^s = 0). Since Φ_t is itself a scalar field and w_i(I_j) a scalar function (see below), we measure its magnitude as |Φ_t^s|². Better still is to minimize the time derivative of the energy given the current pair and maximize it for all other possible segmentations (in a direction toward the optimal solution), thereby encouraging Φ^s to be the minimum of a convex function (Fig. 2). For a given shape Φ^i, a point in R^N, the difference (Φ^s − Φ^i), taken as a vector⁴, represents the direction towards Φ^s. Since Φ_t^i is the vector in R^N dictating in what direction, and by what amount, the solution will change at the point Φ^i, a normalized dot-product will measure how much Φ_t^i points in the right direction. For computational feasibility, we settle for nearby segmentations N_S; a reasonable step given that our method for initializing is an interpolation over N_S.
³ The Euler–Lagrange constraints ensure such a point will be a stationary point of E.
⁴ Using (Φ^i − Φ^s) would encourage Φ^s to be a maximum.
So for an energy function with a form like those in Sec. 2.1, we need to find the parameters w(I_j) that minimize

   arg min_w  |Φ_t^s|² − Σ_{i∈N_S} F_{N_S}(Φ^s, Φ^i) · ( Φ_t^i · (Φ^s − Φ^i) ) / |Φ^s − Φ^i|    (1)
where F_{N_S} is used to give more weight to nearby segmentations as a function of their geodesic proximity. The neighborhood term is negative, and |Φ_t^i| is omitted from the normalized dot-product, to reward large steps in the correct direction and to allow fast solutions for w. Weights w are embedded in Φ_t. Generally each parameter can vary spatially, or with time. For many energy functionals this can cause Eq. 1 to itself become a PDE. Consequently, we limit this work to scalar parameters that are fixed in time, causing Eq. 1 to become a convex, quadratic optimization problem in R^n, shown as follows. Letting C_i = F_{N_S}(Φ^s, Φ^i)(Φ^s − Φ^i)‖Φ^s − Φ^i‖⁻¹, and expanding the norm and the dot-product in Eq. 1 to expressly sum over the image domain, gives:
   arg min_w  Σ_{x∈Ω} [ ( Σ_{p=1}^n w_p T_p(Φ^s) )² − Σ_{i∈N_S} C_i Σ_{p=1}^n w_p T_p(Φ^i) ] .

Define two 1 × n vectors: T with entries T_p = Σ_{x∈Ω} T_p(Φ^s), and V with entries V_p = Σ_{x∈Ω} Σ_{i∈N_S} C_i T_p(Φ^i). Collecting like terms and simplifying yields a quadratic equation in standard form: arg min_w ( wᵀTᵀT w − V w ), where TᵀT is by definition positive semi-definite, ensuring the optimization problem is convex.
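The stationarity condition of this quadratic, 2 TᵀT w = Vᵀ, is a plain linear system; a minimal sketch of the solve (the function name is ours; lstsq handles the rank deficiency of the outer product TᵀT):

```python
import numpy as np

def optimal_weights(T, V):
    """Minimize w^T (T^T T) w - V w for 1 x n vectors T and V.
    Setting the gradient 2 (T^T T) w - V^T to zero gives a linear system;
    lstsq returns its minimum-norm solution when T^T T is rank deficient."""
    A = 2.0 * np.outer(T, T)  # 2 T^T T, positive semi-definite by construction
    w, *_ = np.linalg.lstsq(A, np.ravel(V), rcond=None)
    return w
```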
2.4 Optimal Parameters and Initialization for a Novel Image
Our approach assumes the optimal parameters for a novel image will be “similar” to the optimal parameters of “similar” images; we assume that w(I) is a smooth function. Therefore, for a given novel image we identify its coordinates on the learned manifold, and then assign it an optimal w and an initialization by smoothly interpolating NI using a normalized Gaussian kernel on FNI . For the priors, we limit the training data to NI , NS , since we are more confident that the correct shape and appearance information lies in those neighborhoods.
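A sketch of this interpolation step follows; the neighborhood size and kernel bandwidth are illustrative assumptions, and in practice the kernel would be evaluated on the learned geodesic distances defining F_{N_I}:

```python
import numpy as np

def interpolate_parameters(d_geo, W_train, Phi_train, n_neighbors=5, sigma=1.0):
    """Assign an optimal w and an initialization to a novel image.
    d_geo     -- (m,) geodesic distances from the novel image to the training images
    W_train   -- (m, n) optimal weights w(I_j) found for the training images
    Phi_train -- (m, N) training segmentations, each flattened to N pixels
    """
    idx = np.argsort(d_geo)[:n_neighbors]            # the neighborhood N_I
    kern = np.exp(-0.5 * (d_geo[idx] / sigma) ** 2)  # normalized Gaussian kernel
    kern /= kern.sum()
    w_new = kern @ W_train[idx]                      # interpolated parameters
    phi_init = kern @ Phi_train[idx]                 # interpolated initialization
    return w_new, phi_init
```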
3 Results
We tested our method on a synthetic data set of 225 noisy ellipse images, where only the major and minor axis lengths of the ellipses are varied; a 2-manifold. To learn the manifolds we used a MATLAB implementation of k-ISOMAP [12] from http://isomap.stanford.edu/. As this paper is about the application of manifolds to segmentation, issues related to learning the manifold will not be addressed in this work. Sample images are shown in Fig. 1 (c) and (d), while (a) shows how the smoothness weight w varies gradually as a function of the manifold, with more eccentric ellipses having the least smoothness requirement. For this example we set k = 7 in k-ISOMAP, and reduce the image space to a 2-manifold. Adopting the Chan–Vese energy functional of [5] and using leave-one-out validation, we
Fig. 3. CC segmentation results. (Top) Error plots where Mx1 and Mx2 are the maximum values of the optimal parameter error result for experiment (i) and (ii), respectively. (Bottom) Automatic, optimal parameter segmentations demonstrating the full range of error.
compare between two scenarios: (i) fixed initialization with the average ellipse and fixed parameters obtained by averaging the set of optimal parameters; and (ii) adaptively chosen optimal parameters and initializations as functions of the coordinates of the novel image on M_I. We found a 47.1% reduction in average error⁵ using optimal parameters over fixed ones (0.190 vs 0.358). This means that roughly half of the error was due to erroneous parameters and/or initializations. For medical data we used a set of 470 affine-registered 256 × 256 mid-sagittal MRIs, with corresponding expert-segmented corpora callosa (CC) (Fig. 2). We demonstrate that existing works can be extended to real clinical applications by defining the energy functional as a weighted combination of terms from [4,5,13], with an update equation of the form

   Φ_t = w_1(I) g(c + κ)|∇Φ| + w_2(I) ∇Φ · ∇g + w_3(I) [ H(Φ)|I − c_1|² + (1 − H(Φ))|I − c_2|² ] + w_4(I) ∇Φ E_shape^nonlinear    (2)
⎨ w1 (I)g(c + κ) |∇Φ| + w2 (I)∇Φ · ∇g + w3 (I) H(Φ) |I − c1 |2 ... (2) Φt = 2 nonlinear ⎩ ... + (1 − H(Φ)) |I − c2 | + w4 (I)∇Φ Eshape H(Φ) is a Heaviside function, and c1 , c2 are average intensities as defined in [5]. We set c = −2.0, since w1 will scale accordingly regardless of its value, and use g as defined in [4]. For k-ISOMAP we set k = 10, and reduce the image space to a 5-manifold; chosen as the elbow of the scree plot. To demonstrate our method’s performance with different energy functionals two leave-one-out validation experiments were performed: (i) optimal parameters vs hand-tuned parameters, no 2 outside intensity term ( H(Φ) |I − c1 | ), and no auto-cropping (Fig. 3 left); (ii) optimal parameters vs averaged optimal parameters, the complete function and auto-cropping enabled (Fig. 3 right). With auto-cropping enabled segmentation took approximately 25 seconds per image, as opposed to 3 minutes, on a 2.4GHz 5
⁵ Error is measured as ε = Area(Auto ∪ GT − Auto ∩ GT)/Area(GT), where Auto and GT are the binary automatic segmentation and the ground truth, respectively.
AMD Opteron. As before, significant reductions in average error are obtained: (0.17 vs 0.51) in experiment (i), and (0.16 vs 0.22) in (ii).
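The error measure of footnote 5 is simply the area of the symmetric difference between the two masks, normalized by the ground-truth area; as a short sketch on binary masks (the function name is ours):

```python
import numpy as np

def segmentation_error(auto, gt):
    """epsilon = Area(Auto XOR GT) / Area(GT) for boolean masks."""
    auto = np.asarray(auto, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    return np.logical_xor(auto, gt).sum() / gt.sum()
```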
4 Conclusions
On the practical side, we motivated and outlined a new fully-automatic technique for MIA, which addresses the parameter-setting and initialization problems that typically plague deformable model-based MIA approaches. Our approach requires training, mimicking the reliance of humans on example images to learn, and incorporating into the learning new images as they are segmented. Our results clearly demonstrate our method's ability to segment a large number of unseen images, which we foresee carrying over to other datasets and objective functions. On the theoretical side, we explored the consequences of using fixed parameters for energy-minimizing MIA techniques, defined the notion of optimal weights that favor convex energy functionals, and demonstrated a relation between image manifolds and energy-based segmentation.

Acknowledgements. CM was supported by the CIHR. The CC MRI data was provided by the MS/MRI Research Group at the University of British Columbia.
References
1. Pham, D., Xu, C., Prince, J.: Current methods in medical image segmentation. Annu. Rev. Biomed. Eng. 2, 315–337 (2000)
2. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active contour models. IJCV 1(4), 321–331 (1987)
3. Lobregt, S., Viergever, M.: A discrete dynamic contour model. IEEE TMI 14, 12–24 (1995)
4. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. IJCV 22(1), 61–79 (1997)
5. Chan, T., Vese, L.: Active contours without edges. IEEE TIP, 266–277 (2001)
6. Boykov, Y., Kolmogorov, V.: Computing geodesics and minimal surfaces via graph cuts. In: ICCV (2003)
7. Seung, H.S., Lee, D.: The manifold ways of perception. Science 290, 2268–2269 (2000)
8. Leventon, M., Grimson, E., Faugeras, O.: Statistical shape influence in geodesic active contours. In: CVPR (2000)
9. Pluempitiwiriyawej, C., Moura, J., Wu, Y.J.L., Ho, C.: STACS: New active contour scheme for cardiac MR image segmentation. IEEE TMI 24, 593–603 (2005)
10. Gennert, M., Yuille, A.: Determining the optimal weights in multiple objective function optimization. Computer Vision 5, 87–89 (1998)
11. Koikkalainen, J., Lötjönen, J.: Model library for deformable model-based segmentation of 3-D brain MR-images. In: Dohi, T., Kikinis, R. (eds.) MICCAI 2002. LNCS, vol. 2488, pp. 540–547. Springer, Heidelberg (2002)
12. Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000)
13. Dambreville, S., Rathi, Y., Tannenbaum, A.: Shape-based approach to robust image segmentation using kernel PCA. CVPR 1, 977–984 (2006)
A Duality Based Algorithm for TV-L1-Optical-Flow Image Registration

Thomas Pock¹, Martin Urschler¹, Christopher Zach², Reinhard Beichel³, and Horst Bischof¹

¹ Institute for Computer Graphics & Vision, Graz University of Technology, Austria
{pock,urschler,bischof}@icg.tu-graz.ac.at
² VRVis Research Centre, Graz, Austria
[email protected]
³ Dept. of Electrical & Computer Engineering and Dept. of Internal Medicine, The University of Iowa, USA
[email protected]
Abstract. Nonlinear image registration is a challenging task in the field of medical image analysis. In many applications discontinuities may be present in the displacement field, and intensity variations may occur. In this work we therefore utilize an energy functional which is based on Total Variation regularization and a robust data term. We propose a novel, fast and stable numerical scheme to find the minimizer of this energy. Our approach combines a fixed-point procedure derived from duality principles with a fast thresholding step. We show experimental results on synthetic and clinical CT lung data sets at different breathing states as well as registration results on inter-subject brain MRIs.
1 Introduction
A large number of medical image analysis applications require nonlinear (deformable) registration of data sets acquired at different points in time or from different subjects. The deformation of soft tissue organs, such as the lung or the liver often requires the compensation of breathing motion. Surveys on nonlinear registration methods in medical imaging can be found in Maintz and Viergever [12] or Crum et al. [9]. In the literature a distinction is also made between feature based and intensity based methods. However, the majority of publications utilizes the intensity based methods [2,8,15] mainly because all available image information is used for registration. The main drawback of those methods is the required computational effort. The optical-flow based approach is very popular for intra-modality registration tasks [10,15,16]. The variational formulation of the optical flow framework
This work was supported by the Austrian Science Fund (FWF) under the grant P17066-N04 and by the Austrian Research Promotion Agency (FFG) within the VM-GPU Project No. 813396. Part of this work has been done in the VRVis research center (www.vrvis.at), which is partly funded by the Austrian government research program Kplus.
typically consists of quadratic data and regularization terms. This approach has two disadvantages. First, only smooth displacement fields can be recovered. In some medical volume registration applications it is, however, necessary to allow for discontinuities in the displacement field. A typical example is breathing motion, where the diaphragm undergoes heavy deformation, whereas the rib cage remains almost rigid. Therefore smooth displacement fields are not sufficient to model such complex motions. In addition, quadratic error norms are not robust against outliers in the intensity values. Image differences due to contrast agent application or inter-subject registration tasks frequently violate the intensity-constancy assumption. In this work we present a novel numerical algorithm for an optical flow based method which allows for discontinuities in the displacement field and is robust with respect to varying intensities. In Section 2 we introduce variational optical flow methods and derive the TV-L1-optical-flow model. For minimization we derive a novel numerical scheme by combining a duality based formulation of the variational energy and a fast thresholding scheme. In Section 3 we evaluate our algorithm using synthetically transformed and clinical data sets. In the last section we give some conclusions and suggest possible directions for future investigations.
2 Optical Flow
The recovery of motion from images is a major task of biological and artificial vision systems. In their seminal work, Horn and Schunck [11] studied the so-called optical flow, which relates the image intensity at a point and given time to the motion of an intensity pattern.
2.1 Model of Horn and Schunck
The classical optical flow model of Horn and Schunck (HS) for 2D images is given by the minimizer of the following energy:

   min_u E_HS = ∫_Ω ( |∇u_1|² + |∇u_2|² ) dΩ + λ ∫_Ω ( I_1(x + u(x)) − I_0(x) )² dΩ .    (1)

I_0 and I_1 are the image pair, u = (u_1(x), u_2(x))ᵀ is the two-dimensional displacement field, and λ is a free parameter. The first term (regularization term) penalizes high variations in u to obtain smooth displacement fields. The second term (data term) is basically the optical flow constraint, which assumes that the intensity values of I_0(x) do not change during its motion to I_1(x + u(x)). Since the HS model penalizes deviations in a quadratic way, it has two major limitations. It does not allow for discontinuities in the displacement field and it does not allow for outliers in the data term. To overcome these limitations, several models including robust error norms and higher order data terms have been proposed [1,3,13].
2.2 TV-L1-Optical-Flow
In this paper we utilize the non-quadratic error norm |·| for both the regularization term and the data term. The major advantage of this norm is that it allows for outliers while still being convex. Using this error norm and extending Eq. (1) to N dimensions, the robust optical flow model is given by

   min_u E_{TV-L1} = ∫_Ω Σ_{d=1}^N |∇u_d| dΩ + λ ∫_Ω |I_1(x + u(x)) − I_0(x)| dΩ ,    (2)
where u = (u_1, u_2, . . . , u_N)ᵀ is the N-dimensional displacement field. Although this model seems simple and the modifications compared to Eq. (1) are minor, it offers some desirable improvements. First, the regularization term allows for discontinuities. We note that this term is the well-known Total Variation (TV) regularizer which was proposed by Rudin, Osher and Fatemi (ROF) for image denoising [14]. Secondly, the data term uses the robust L1 norm and is therefore less sensitive to intensity variations. Besides its clear advantages, the TV-L1-optical-flow model also leads to some computational difficulties for minimization. The main reason is that both the regularization term and the data term are non-differentiable at zero. One approach would be to replace |s| by some differentiable approximation, e.g. √(s² + ε²) [4,6]. However, for small ε this approach usually shows slow convergence and, on the other hand, using large ε leads to blurred displacement fields.
2.3 Solution of the TV-L1-Optical-Flow Model
Based on the influential work of Chambolle [5], we introduce an additional variable v = (v_1, v_2, . . . , v_N)ᵀ and propose to minimize the following convex approximation of Eq. (2):

   min_{u,v} E_{TV-L1} = ∫_Ω Σ_{d=1}^N [ |∇u_d| + (1/2θ)(u_d − v_d)² ] dΩ + λ ∫_Ω |ρ(x)| dΩ ,    (3)
where the parameter θ is small so that we almost have u ≈ v, and ρ(x) = I_1(x + v⁰) + (∇I_1(x + v⁰))ᵀ(v − v⁰) − I_0(x) is a first-order Taylor approximation of the image residual. This formulation has several advantages: the minimization with respect to u can be performed using Chambolle's algorithm, which is based on a dual formulation of the total variation energy and does not suffer from any approximation error [5]; the minimization with respect to v reduces to a simple 1-D minimization problem which can be solved by an efficient thresholding scheme. Thus, to solve the optimization problem Eq. (3), we propose an iterative algorithm alternating the following two steps:

1. For every d and fixed v_d, solve

   min_{u_d} ∫_Ω [ |∇u_d| + (1/2θ)(u_d − v_d)² ] dx .    (4)
2. For fixed u, solve

   min_v  Σ_d (1/2θ)(u_d − v_d)² + λ |ρ(x)| .    (5)
The solution of these two optimization steps is given by the following propositions.

Proposition 1. The solution of Eq. (4) is given by u_d = v_d − θ div p, where p = (p_1, p_2) fulfills ∇(θ div p − v_d) = |∇(θ div p − v_d)| p, which can be solved by the following iterative fixed-point scheme:

   p^{k+1} = ( p^k + τ ∇(div p^k − v_d/θ) ) / ( 1 + τ |∇(div p^k − v_d/θ)| ) ,    (6)

where p⁰ = 0 and the time step τ ≤ 1/(2(N + 1)).

Proposition 2. The solution of Eq. (5) is given by the following thresholding scheme:

   v(x) = u(x) + {  λθ∇I_1               if ρ(x) < −λθ|∇I_1|² ,
                    −λθ∇I_1              if ρ(x) >  λθ|∇I_1|² ,
                    −ρ(x)∇I_1/|∇I_1|²    if |ρ(x)| ≤ λθ|∇I_1|² .    (7)
A proof of this proposition is presented in [17].

2.4 Implementation
Computing optical flow is a non-convex inverse problem, which means that no globally optimal solution can be computed in general. In order to avoid such physically non-relevant solutions, one typically applies a coarse-to-fine strategy. Note that, due to the linearization of the image residual, the single-level minimization problem Eq. (2) becomes a convex one. However, the overall problem still remains non-convex. For this purpose, we build a full Gaussian image pyramid by successively down-sampling the images by a factor of 2. On each level of the image pyramid, we solve the minimization problem Eq. (3), starting with the coarsest level. The solution is then propagated until the finest level has been reached. At the coarsest level, the displacement field is initialized by u = 0. In the following we describe the numerical scheme used to compute the solution at one single level of the image pyramid. For the implementation of the fixed-point iteration scheme Eq. (6) we use backward differences to approximate the discrete divergence operator and forward differences to approximate the discrete gradient operator [5]. For the implementation of the thresholding scheme Eq. (7) we use central differences to approximate the image derivatives. Finally, our iterative procedure consists of alternately computing the solutions of the fixed-point scheme and the thresholding step. In practice only a small number
of iterations (100–200) is necessary to reach convergence. Moreover, empirical tests have shown that one iteration of the fixed-point scheme is sufficient to obtain fast convergence of the entire algorithm. The numerical algorithm has two free parameters. The parameter θ acts as a coupling between the two optimization steps. It basically depends on the range of the intensity values of the input images. Empirically, we have selected θ = 0.02 in all experiments. The parameter λ is used to control the amount of regularization. Typical values of λ are between 10 and 100.
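For concreteness, a single pyramid level of this scheme can be sketched as follows in 2D. This is our illustrative simplification, not the authors' code: the residual is linearized at v⁰ = 0 instead of re-warping I_1 by the current flow, and τ = 0.25 is used as an empirically common step size.

```python
import numpy as np

def fgrad(f):
    """Forward differences (Neumann boundary), used for the dual update."""
    gx = np.zeros_like(f); gy = np.zeros_like(f)
    gx[:-1, :] = f[1:, :] - f[:-1, :]
    gy[:, :-1] = f[:, 1:] - f[:, :-1]
    return gx, gy

def bdiv(px, py):
    """Backward-difference divergence, the negative adjoint of fgrad."""
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[0, :] = px[0, :]; dx[1:-1, :] = px[1:-1, :] - px[:-2, :]; dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]; dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]; dy[:, -1] = -py[:, -2]
    return dx + dy

def tvl1_level(I0, I1, lam=40.0, theta=0.02, tau=0.25, n_iters=200):
    """Alternate the thresholding step, Eq. (7), with one fixed-point
    step of Eq. (6) per outer iteration, on one pyramid level."""
    Ix, Iy = np.gradient(I1)                      # central differences (Sec. 2.4)
    g2 = Ix ** 2 + Iy ** 2 + 1e-9                 # |grad I1|^2, regularized
    u = [np.zeros_like(I0), np.zeros_like(I0)]    # primal flow (u1, u2)
    v = [np.zeros_like(I0), np.zeros_like(I0)]
    p = [[np.zeros_like(I0), np.zeros_like(I0)] for _ in range(2)]  # dual fields
    grads = (Ix, Iy)
    for _ in range(n_iters):
        # Thresholding step, Eq. (7), with rho linearized at v0 = 0
        rho = I1 - I0 + Ix * u[0] + Iy * u[1]
        lo = rho < -lam * theta * g2
        hi = rho > lam * theta * g2
        mid = ~(lo | hi)
        for d in range(2):
            vd = u[d].copy()
            vd[lo] += lam * theta * grads[d][lo]
            vd[hi] -= lam * theta * grads[d][hi]
            vd[mid] -= rho[mid] * grads[d][mid] / g2[mid]
            v[d] = vd
        # One fixed-point step of Eq. (6), then u_d = v_d - theta * div p
        for d in range(2):
            gx, gy = fgrad(bdiv(p[d][0], p[d][1]) - v[d] / theta)
            mag = np.sqrt(gx ** 2 + gy ** 2)
            p[d][0] = (p[d][0] + tau * gx) / (1.0 + tau * mag)
            p[d][1] = (p[d][1] + tau * gy) / (1.0 + tau * mag)
            u[d] = v[d] - theta * bdiv(p[d][0], p[d][1])
    return u
```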
3 Experimental Results
To assess the validity of our approach we performed qualitative and quantitative evaluations using synthetically transformed and clinical thorax CT data sets showing breathing motion and inter-subject brain data sets. All our experiments were performed on an AMD Opteron machine with 2.2 GHz and 16 GB RAM, running a 64-bit Linux system. The run-time of our algorithm is approximately 15 minutes for 256³ voxel data sets when performing 200 iterations on the finest level of the image pyramid. Note that this performance is superior compared to the majority of intensity based registration algorithms.
3.1 Synthetic Data
Synthetic experiments were performed on a thoracic CT test data set with a size of 256³ voxels, which was taken from the NLM data collection project¹. To demonstrate the robustness of our method we generated two data sets. The first one (NLM-SM) contains a simulated breathing motion. For simulation of diaphragm movement we applied a translational force in the axial direction. This force sharply decreases at those locations where the diaphragm is attached to the rib cage, and linearly decreases in the axial direction (see Fig. 1(b)). For the second data set (NLM-SMC) we simulated the application of a contrast agent by increasing the intensity values of pre-segmented lung vessels by 300 Hounsfield Units (HU) (see Fig. 1(c)). We ran our algorithm on these two data sets using λ = 40 and 200 iterations on the finest scale level. For quantitative evaluation we calculated the mean and the standard deviation of the computed displacement field (u) with respect to the known ground-truth displacement field. For comparison we additionally ran the popular Demons algorithm [15] on our synthetic data. The implementation was taken from the ITK². We set up the ITK Demons algorithm in a multiscale manner with 6 pyramid levels, a total number of 50 iterations at the finest scale level, and a displacement field smoothing value of σ = 1. Table 1 shows the results of our experiments. From this we can see that our algorithm performs slightly worse on NLM-SM, but outperforms the Demons algorithm on NLM-SMC, showing the robustness of our algorithm.
¹ http://nova.nlm.nih.gov/Mayo/
² http://www.itk.org
Fig. 1. (a) Original data set. (b) Simulated diaphragm motion. (c) Simulated contrast agent in lung vessels.

Table 1. Quantitative evaluation of the synthetic experiments

                     NLM-SM                    NLM-SMC
           Initial   Demons   TV-L1   Initial   Demons   TV-L1
mean [mm]   3.788     1.555   1.919    3.788     2.260   1.768
std [mm]    2.997     2.185   2.285    2.997     2.441   2.301
3.2 Clinical Data
The algorithm was also evaluated using clinical data sets. For clinical data, no ground-truth displacement field is available. To still allow a quantitative assessment, we calculated two similarity measures: the decrease of the root mean squared error of the intensity values (RMS) and the increase of the [0, 1]-normalized mutual information (NMI). The first experiment was breathing motion compensation of two thoracic CT data sets (LCT1, LCT2). Each of them consists of two scans at different breathing states. The size of both data sets is 256³. The second experiment was inter-subject registration of brain MRI. The brain database consists of four data sets (BMRI1, BMRI2, BMRI3, BMRI4) provided by the Non-Rigid Image Registration Evaluation Project (NIREP) [7]. The size of each data set again was 256³. We chose BMRI1 as reference image and registered the remaining images to the reference. For the first experiment we ran our algorithm using λ = 50 and 200 iterations. For the second experiment we used λ = 10. The quantitative results of our experiments are given in Tab. 2. From this we can see a significant improvement in both similarity measures.

Table 2. Quantitative evaluation of the clinical data sets

         LCT1            LCT2           BMRI2          BMRI3          BMRI4
      before   after  before   after  before  after  before  after  before  after
RMS   232.88   50.33  245.86   45.94   25.79  12.31   31.11  13.35   30.80  13.06
NMI    0.233   0.418   0.289   0.454   0.299  0.401   0.265  0.394   0.269  0.395
(a) LCT1    (b) LCT2
Fig. 2. Lung CT breathing motion compensation. The upper row shows differences of sagittal and coronal slices before registration, the lower row the same slices after registration.
(a) BMRI2 to BMRI1    (b) BMRI3 to BMRI1    (c) BMRI4 to BMRI1
Fig. 3. Checkerboard representation of inter-subject brain MRI registration. The upper row shows axial and sagittal slices before registration, the lower row shows the same slices after registration.
For visual assessment we also provide qualitative results, which are shown in Fig. 2 and Fig. 3.
4 Conclusion
In this work we presented a novel variational approach for nonlinear optical flow based image registration by employing a TV-L1 -optical-flow model. For minimization of the model we proposed a novel fast and stable numerical scheme by combining a dual formulation of the total variation energy and an efficient thresholding scheme. For quantitative and qualitative evaluation we used synthetically deformed lung CT data sets, clinical intra-subject thorax CT images and inter-subject brain images. For future work we see two potential directions. One direction is that we plan to use more advanced similarity measures such as mutual information. A second direction is to speed up our method using
state-of-the-art graphics hardware. Our recent results indicate a speed-up factor of approximately 30.
References
1. Aubert, G., Deriche, R., Kornprobst, P.: Computing optical flow via variational techniques. SIAM J. Appl. Math. 60(1), 156–182 (1999)
2. Bajcsy, R., Kovacic, S.: Multiresolution elastic matching. Computer Vision, Graphics and Image Processing 46(1), 1–21 (1989)
3. Black, M.J., Anandan, P.: A framework for the robust estimation of optical flow. In: ICCV 1993, pp. 231–236 (1993)
4. Brox, T., Bruhn, A., Papenberg, N., Weickert, J.: High accuracy optical flow estimation based on a theory for warping. In: Pajdla, T., Matas, J. (eds.) ECCV 2004. LNCS, vol. 3021, pp. 25–36. Springer, Heidelberg (2004)
5. Chambolle, A.: An algorithm for total variation minimization and applications. J. Mathematical Imaging and Vision 20, 89–97 (2004)
6. Chan, T.F., Golub, G.H., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. In: ICAOS 1996 (Paris, 1996), vol. 219, pp. 241–252 (1996)
7. Christensen, G.E., Geng, X., Kuhl, J.G., Bruss, J., Grabowski, T.J., Pirwani, I.A., Vannier, M.W., Allen, J.S., Damasio, H.: Introduction to the non-rigid image registration evaluation project (NIREP). In: Pluim, J.P.W., Likar, B., Gerritsen, F.A. (eds.) WBIR 2006. LNCS, vol. 4057, pp. 128–135. Springer, Heidelberg (2006)
8. Christensen, G.E., Rabbitt, R.D., Miller, M.I.: Deformable templates using large deformation kinematics. IEEE Trans. Image Processing 5(10), 1435–1447 (1996)
9. Crum, W.R., Hartkens, T., Hill, D.L.G.: Non-rigid image registration: theory and practice. The British Journal of Radiology – Imaging Processing Special Issue 77, S140–S153 (2004)
10. Hellier, P., Barillot, C., Memin, E., Perez, P.: Hierarchical estimation of a dense deformation field for 3D robust registration. IEEE Trans. Med. Imag. 20(5), 388–402 (2001)
11. Horn, B., Schunck, B.: Determining optical flow. Artificial Intelligence 17, 185–203 (1981)
12. Maintz, J.B.A., Viergever, M.A.: A survey of medical image registration. Medical Image Analysis 2(1), 1–36 (1998)
13. Papenberg, N., Bruhn, A., Brox, T., Didas, S., Weickert, J.: Highly accurate optic flow computation with theoretically justified warping. Int'l. J. Computer Vision, 141–158 (2006)
14. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
15. Thirion, J.-P.: Image matching as a diffusion process: An analogy with Maxwell's demons. Medical Image Analysis 2(3), 243–260 (1998)
16. Weickert, J., Schnörr, C.: A theoretical framework for convex regularizers in PDE-based computation of image motion. International Journal of Computer Vision 45(3), 245–264 (2001)
17. Zach, C., Pock, T., Bischof, H.: A duality based approach for realtime TV-L1 optical flow. In: Hamprecht, F.A., Schnörr, C., Jähne, B. (eds.) Pattern Recognition – DAGM 2007. LNCS, vol. 4713. Springer, Heidelberg (2007)
Deformable 2D-3D Registration of the Pelvis with a Limited Field of View, Using Shape Statistics

Ofri Sadowsky¹, Gouthami Chintalapani¹, and Russell H. Taylor¹

¹ Department of Computer Science, Johns Hopkins University, USA
[email protected]
Abstract. Our paper summarizes experiments for measuring the accuracy of deformable 2D-3D registration between sets of simulated x-ray images (DRR's) and a statistical shape model of the pelvis bones, which includes x-ray attenuation information ("density"). In many surgical scenarios, the images contain a truncated view of the pelvis anatomy. Our work specifically addresses this problem by examining different selections of truncated views as target images. Our atlas is derived by applying principal component analysis to a population of up to 110 instance shapes. The experiments measure the registration error with a large and a truncated FOV. A typical accuracy of about 2 mm is achieved in the 2D-3D registration, compared with about 1.4 mm of an "optimal" 3D-3D registration.
1 Introduction
Deformable statistical models of anatomy are useful tools for recovering the shape of patient anatomy when only partial information is available. In a previous publication [1], we showed that a deformably registered anatomical model can be combined with a limited-trajectory set of x-ray images to create a CT-like "hybrid reconstruction". The current paper focuses on assessing the accuracy of our deformable registration. A frequent challenge in registering data to intra-operative fluoroscopic images is truncation, or a limited field of view (FOV) in the x-ray. Fig. 1 shows examples of fluoroscopic images of a dry cadaveric pelvic girdle (our target anatomy), taken with a common 9" C-arm. Only parts of the large and complex bone appear in the image, and this may affect the registration results. A main goal of this paper is to characterize a trade-off between different combinations of target images: a large FOV and a small number of projections, or a small FOV and varying numbers of projections. In comparison, previous works on deformable 2D-3D registration, such as [2,3,4], usually applied to smaller bones, such as the proximal femur, and the view included the full anatomy. We show in this paper that our shape atlas and the deformable 2D-3D registration method we created are relatively robust to the FOV truncation problem, if the imaging poses are carefully selected. Our registration is based on normalized mutual information (NMI) similarity, and does not require edge detection
Fig. 1. Truncated fluoroscopic images of the pelvic bones. The images were taken with a 9” OEC 9600 C-arm. Contours of a deformably registered atlas are overlaid in green. The contours are part of an ongoing work discussed in Section 4.
(compare, for example, with [3]) or segmentation of the target images. The use of NMI simplifies the assumptions on the image content, and appears to contribute to the robustness to imaging constraints, such as noise, occlusions, and truncated view.
2 Method

2.1 Atlas Creation
Our anatomical atlas [12] is modeled after the work of Yao [5,4] with some modifications (the main differences are in the methods used to generate the shape mesh and to co-register the template to the individual subjects), and essentially following the Active Shape Models paradigm [6]. To create the atlas, we start by labeling a selected CT study – the template CT, denoted CT₀ – for the anatomy of interest, i.e. the pelvis bones. From the labeled voxels, we create a template tetrahedral mesh [7], denoted S₀. Given a collection of subject CT studies, {CT_j}_{j=1}^N, we compute for each subject instance j a deformation that maps the voxels of CT₀ to corresponding voxels of CT_j [8]. This deformation is applied to the vertices of S₀ to obtain a subject shape S_j. Finally, all the shape instances are aligned using a similarity transformation, and modes of variation are extracted from this population using principal component analysis. A new shape instance can be generated from the atlas as

   Ŝ = S̄ + Σ_{i=1}^L w_i V_i    (1)
where S̄ is the "mean shape" obtained from aligning all the instance shapes; {w_i} are scalar weights; and {V_i}_{i=1}^L are the first L modes in the distribution of shapes. Every tetrahedral cell in the atlas mesh is assigned a CT density attribute, in the form of a 3rd-order barycentric Bernstein polynomial. The polynomial coefficients are obtained by fitting to the CT intensity numbers in the
template study voxels covered by that cell. At this stage, we use the densities of a single subject in the atlas, though Yao performed a statistical analysis of the densities as well.
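The statistical step has a compact numerical form: after alignment, the instance shapes are stacked as coordinate vectors, the mean is subtracted, and the variation modes of Eq. (1) are the leading principal components, obtainable via an SVD. A minimal sketch (function names are ours):

```python
import numpy as np

def build_shape_atlas(shapes, L=15):
    """shapes: (n_subjects, 3*V) array of aligned vertex coordinates.
    Returns the mean shape and the first L variation modes of Eq. (1)."""
    mean = shapes.mean(axis=0)
    # Rows of Vt are the principal directions of the centered population
    _, _, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
    return mean, Vt[:L]

def instantiate(mean, modes, w):
    """S_hat = S_bar + sum_i w_i V_i."""
    return mean + np.asarray(w) @ modes
```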
2.2 2D-3D Registration
For the registration experiments, we use Take software [9] to create DRR’s of segmented CT datasets. For each target image, denoted Ik , the intrinsic and extrinsic camera parameters are given. In this paper’s DRR’s, the parameters are user-defined. In a clinical scenario, a calibrated C-arm and a form of tracking, e.g. optical markers or encoders, would be used to image the patient. The registration algorithm creates DRR’s of our atlas using a fast rendering algorithm [10]. For each camera pose k, the normalized mutual information similarity measure N M Ik = (H(Ik ) + H(DRRk ))/H(Ik , DRRk )
(2)
is computed (selected based on [11]), where H is the entropy of pixel intensity distribution (or joint distribution). The final similarity score for the entire set K of projections is Sim = k=1 N M Ik . Our method searches for the maximum of this score using the Nelder-Mead downhill-simplex algorithm, over the space spanned by the following parameters: translation (d), rotation (R), isotropic scale (s), and shape mode weights {wi }. The estimated shape has the form in Equation 1. The final, registered model is the outcome of applying a similarity ˆ ˆ S reg = s(R·S+d). We alternate subsets of parameters, such transformation to S: as a translation vector or rotation “Rodrigues” vector, search for the optimal value on each subset, and fix the result when searching the next subset. 2.3
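Eq. (2) and the summed score can be evaluated directly from joint intensity histograms; a minimal sketch (the bin count is an assumption, and the function names are ours):

```python
import numpy as np

def nmi(img_a, img_b, bins=64):
    """NMI = (H(A) + H(B)) / H(A, B), with entropies from a joint histogram."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pj = joint / joint.sum()
    pa, pb = pj.sum(axis=1), pj.sum(axis=0)

    def entropy(p):
        p = p[p > 0]
        return -(p * np.log(p)).sum()

    return (entropy(pa) + entropy(pb)) / entropy(pj.ravel())

def similarity(targets, drrs):
    """Sim = sum over the K poses of NMI_k between target image and DRR."""
    return sum(nmi(I, D) for I, D in zip(targets, drrs))
```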
2.3 Imaging Parameters
This paper examines two sizes of the field of view: 270 mm and 160 mm (see Fig. 2). These sizes relate to a "virtual" detector plane located at the isocenter of camera motion (typically inside the object), and reflect approximately the diameter of the visible portion of anatomy. The 270 mm FOV covers either the full or nearly-full anatomy of the pelvic bones. The 160 mm FOV is roughly comparable to a 9" C-arm image intensifier (about 216 mm) lying outside of the object (Fig. 1). The source-to-detector distance was set to 800 mm. In our experience (and Yao's [5]), when a large field of view is available, two or three viewing directions, such as antero-posterior (AP), lateral, and oblique, are sufficient for a good registration. But when only a truncated view is available, more views may be required. To examine this, we run the registration using three, six, or eight truncated views, which cover increasing portions of the pelvis's anatomy. Fig. 2 demonstrates the target images and the registration results for one subject. For the 270 mm images, (a)-(c), we rotated the camera in a circular arc about the object's Z axis, and imaged at angles 0° (lateral view), 45°
Fig. 2. Reference images and deformable registration results. The contours of a registered atlas are overlaid in green on the reference images. (a)-(c) FOV=270 mm. (d)-(l) FOV=160 mm. Images (d)-(f) are used in the three-view registration. Images (g)-(l) are used in the six-view registration. All of (d)-(l) except (f) are used in the eight-view registration. The axis directions of the model are shown in yellow.
(oblique view), and 90° (AP view). The model axis directions are illustrated in the Figure. For the 160 mm images, (d)-(l), we included rotation and translation in the camera trajectory. The translations included moving the camera 40 mm up and down the Z axis, and moving 30 mm up the Y axis. The view angles included 0°, 30°, 45°, and 90°. The images were selected to include features such as the proximal and distal ends of bones, the ilium, and the sacrum.
2.4 Accuracy Measure
Because of the deformable component, it is hard to define the "exactly correct" registration parameters, unless we use artificially created instances. Hence, in this paper, we measure the effective error by the surface-to-surface distance between the registered shape of the atlas and a "ground truth" shape. We performed leave-one-out tests by randomly selecting datasets from the study population. For each selected subject, j, we used the remaining datasets to create a shape atlas, A_j. The subject shape S_j (left out of the atlas) was regarded as the "ground truth", and used as a segmentation mask over the CT
study CT_j. The masked subject dataset, CT_j^Sgmt, was passed as input to the DRR generator to create the target images {I_k^(j)}.
Next, the atlas A_j was deformably registered with {I_k^(j)} to create a result shape S_j^reg. Finally, the outer surface of S_j^reg was compared with S_j by way of projection on the nearest neighbor: for each vertex v_l on the surface of S_j^reg, we define s_l as the nearest point on S_j (s_l is not necessarily a vertex of S_j). We defined e_l = s_l − v_l as the error vector at the vertex v_l, and computed statistics on ||e_l||. For comparison, we estimated the "optimal" 3D-3D registration that our atlas may achieve for the subject j using mode matching [5]. Based on the correspondence of vertices between the atlas A_j and the shape S_j, we computed the deformable registration parameters that minimize the sum of squared distances between the corresponding vertices. This produced an "optimal" shape U_j, which we compared with S_j using the same surface-to-surface error metric as above.¹
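Both evaluation steps have simple numerical counterparts: the surface-to-surface error is a nearest-neighbor projection (the sketch below projects onto the ground-truth vertex set, a slight simplification of projecting onto the surface itself), and mode matching reduces to a linear least-squares fit of the mode weights:

```python
import numpy as np
from scipy.spatial import cKDTree

def surface_errors(verts_reg, verts_gt):
    """||e_l|| for each vertex of the registered surface, taken to the
    nearest ground-truth vertex (approximating the nearest surface point)."""
    dists, _ = cKDTree(verts_gt).query(verts_reg)
    return dists

def mode_matching(mean, modes, target):
    """Weights minimizing the sum of squared distances between
    corresponding vertices of (mean + w @ modes) and the target shape."""
    w, *_ = np.linalg.lstsq(modes.T, target - mean, rcond=None)
    return w
```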
3 Results and Discussion
Our study population consisted of 110 CT scans of the pelvis, taken from prostate cancer patients. Eleven datasets were selected at random as experiment targets. They were resampled to a uniform cubic voxel size of 1.875 mm before DRRs were generated. The mesh population was the output of the first pass of a bootstrapping experiment, presented separately [12]. All the registration experiments used the first L = 15 shape variation modes. In all the experiments, we started from an arbitrary initial guess of zero translation, rotation, and deformation magnitude. The final translation magnitudes were between 13 mm and 59 mm, in different directions; the final rotation magnitudes were between 10° and 17°, about different axes. Table 1 summarizes the surface distances of each registration experiment from the "ground truth" shape S_j. According to Table 1, there is a slight decline in the mean registration error of the truncated images when more views are used. This suggests diminishing returns on the number of truncated views. For some subjects (e.g. 10, 26, 44) the registration error was smaller with three truncated views than with six or eight. This may be due to the assumptions underlying our choice of objective function in relation to the evaluation criterion. The best 2D-3D registration results, in almost all cases, were achieved with the larger FOV of 270 mm. For comparison, the mean error of the 3D-3D registration was about 1.4 mm. Fig. 3 shows the distribution of registration errors on the surface of a representative bone, computed as a mean per vertex over the population of Table 1. We compare the results of (a) 2D-3D registration using eight views, and (b) 3D-3D
¹ In a previous series of experiments, we compared U_j with a hand-segmented shape. Due to shape uncertainties in both S_j and the hand segmentation, we decided to use S_j as the ground truth.
Table 1. Summary of surface distances (in mm) between the registration results and subject-specific ("ground truth") shapes S_j

          FOV=270 mm    FOV=160 mm    FOV=160 mm    FOV=160 mm    3D Mode
           3 views       3 views       6 views       8 views      Matching
Subject   Mean   Max    Mean   Max    Mean   Max    Mean   Max    Mean   Max
10        1.94  12.06   2.05  10.77   2.20  11.00   2.09  11.48   1.59   6.97
26        2.00   8.31   2.02  13.78   2.10  12.45   2.10  12.83   1.59   8.13
31        1.77   7.93   2.37  17.48   1.86  10.53   1.72   8.87   1.63  10.37
41        1.71   9.35   1.85   7.91   1.75   8.04   1.82   8.25   1.02   5.68
43        3.44  17.83   3.42  20.59   3.26  19.51   2.98  17.68   1.08   6.00
44        1.92  10.26   2.34  13.62   2.37  14.02   2.40  13.45   1.50   7.73
58        2.23   9.08   2.43  13.15   2.31  12.81   2.48  11.59   1.70   7.67
60        1.70   9.01   1.58  10.13   1.65   7.40   1.67   7.71   1.49   6.09
66        1.59   8.80   1.50   7.73   1.74   9.75   1.81   9.21   1.18   6.14
68        1.48   7.05   1.85   7.17   1.66   7.49   1.71   8.61   1.27   8.48
76        1.92  16.34   2.48  13.74   2.02  13.66   2.13  14.68   1.52  11.82
Mean      1.97  10.55   2.17  12.37   2.08  11.51   2.08  11.31   1.42   7.73
Fig. 3. Distribution of surface registration errors on the pelvis. (a) 2D-3D registration, FOV=160 mm, 8 views. (b) Mode-matching 3D-3D registration. The error is the mean surface distance per vertex over the studied population. The color scales are adjusted to highlight each registration's individual distribution.
mode matching. The color scales in the two parts are adjusted to highlight the distribution, not to compare the errors. For the 2D-3D registration, the error distribution is asymmetric, with a significant tilt to the right hemipelvis. This may be a side-effect of the selection of imaging trajectory (Fig. 2), which preferred one side of the subject over another. In the mode matching registration, some areas of the bone have higher errors than others, for example: the anterior spine of the ilium, the vicinity of the obturator foramen, and the tip of the sacrum. Some of these “common” errors also appear in the 2D-3D registration.
They may be regions of higher variation among the studied datasets, where the statistical modes cannot approximate the shape as well as in other locations. Overall, the spatial distribution of errors for both the 2D-3D and 3D-3D appears, visually, quite similar, with the exception of the asymmetry. In future research, we are planning to see if the asymmetry can be reduced by changing the imaging trajectory.
4 Conclusion and Future Work
We have shown that a 2D-3D deformable registration of a shape and density atlas of the pelvis to x-ray images, based on normalized mutual information image similarity, can be robust to image truncation. A typical expected surface registration error is about 2 mm, compared with an "ideal" registration error of about 1.4 mm. A larger FOV can usually yield a better registration accuracy, and the use of more truncated views can improve the accuracy to some degree. This result has important applications in recovering a patient's anatomical shape in clinical scenarios, when small-FOV C-arms are used to image the patient. While the results we present are encouraging, applying the method on real fluoroscopic images is still a challenge. In Fig. 1, we show the result of a deformable 2D-3D registration as green contours of the registered model overlaid on x-ray target images of a cadaveric bone specimen. Qualitatively, we can observe that our model aligns with the bone as a whole, but does not align very well with specific features, such as the obturator foramen and the acetabular rim. It may be that this particular specimen is not a good representative of the studied population, which would decrease the overall quality of the registration. In addition, the MI similarity measure did not produce a good registration with these images, possibly because of different imaging characteristics of dry bone compared with live patient DRR. The results shown were obtained by maximizing the structural similarity index (SSIM) [13]. Human subject images also contain other anatomical detail than the pelvis bones: femurs, spine, and other organ tissues. These affect the image quality significantly, adding clutter and reducing contrast. In initial experiments we conducted, we included a crude "soft tissue" model and used the SSIM index to achieve a coarse-level registration. Again, the MI measure did not perform as well. It has been suggested (e.g. in [11]) that NMI as an image similarity measure may be more robust to FOV truncation than mutual information. In a previous series of experiments, we used MI as the objective function. While the rate of successful registrations was higher with NMI, the final accuracy seems roughly the same for both functions. In some of the previous experiments, the asymmetry of error was not as pronounced as in the results here. We would like to thank the anonymous reviewers for their suggestions, which led to the new experiments in this paper. Further study of different similarity measures for the registration with simulated and real images, and their applicability, is continuing. We are also continuing a search for representative cadaver bones which match well with the training population.
Acknowledgments. We would like to thank Lotta Ellingsen and Pauline Pelletier for assisting in creating the atlas. The pelvis datasets were given to us by Dr. Ted DeWeese and Dr. Lee Myers from Johns Hopkins Radiation Oncology. This work was supported in part by NSF ERC Grant EEC9731748, and by NIH/NIBIB research grant R21-EB003616.
References
1. Sadowsky, O., Ramamurthi, K., Ellingsen, L.M., Chintalapani, G., Prince, J.L., Taylor, R.H.: Atlas-assisted tomography: Registration of a deformable atlas to compensate for limited-angle cone-beam trajectory. In: IEEE International Symposium on Biomedical Imaging (ISBI). IEEE Computer Society Press, Los Alamitos (2006)
2. Fleute, M., Lavallée, S.: Nonrigid 3-D/2-D registration of images using statistical models. In: Taylor, C., Colchester, A. (eds.) MICCAI 1999. LNCS, vol. 1679, pp. 138–147. Springer, Heidelberg (1999)
3. Zheng, G., Ballaster, M.A.G., Styner, M., Nolte, L.P.: Reconstruction of patient-specific 3D bone surface from 2D calibrated fluoroscopic images and point distribution model. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 25–32. Springer, Heidelberg (2006)
4. Yao, J., Taylor, R.H.: Assessing accuracy factors in deformable 2D/3D medical image registration using a statistical pelvis model. In: Ninth Int. Conference on Computer Vision, Nice, France, pp. 1329–1334 (2003)
5. Yao, J.: A statistical bone density atlas and deformable medical image registration. PhD thesis, Johns Hopkins University (2002)
6. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models – their training and application. Computer Vision and Image Understanding 6(1), 38–59 (1995)
7. Mohamed, A., Davatzikos, C.: An approach to 3D finite element mesh generation from segmented medical images. In: IEEE International Symposium on Biomedical Imaging (ISBI). IEEE Computer Society Press, Los Alamitos (2004)
8. Ellingsen, L.M., Prince, J.L.: Mjolnir: Deformable image registration using feature diffusion. In: Proceedings of the SPIE Medical Imaging Conference, vol. 6144, pp. 329–337 (2006)
9. Muller-Merbach, J.: Simulation of x-ray projections for experimental 3D tomography. Technical report, Image Processing Laboratory, Department of Electrical Engineering, Linkoping University, SE-581 83, Sweden (1996)
10. Sadowsky, O., Cohen, J.D., Taylor, R.H.: Projected tetrahedra revisited: A barycentric formulation applied to digital radiograph reconstruction using higher-order attenuation functions. IEEE Transactions on Visualization and Computer Graphics (TVCG) 12(4), 461–473 (2006)
11. Studholme, C., Hill, D., Hawkes, D.: An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognition, 71–86 (1999)
12. Chintalapani, G., Ellingsen, L.M., Sadowsky, O., Prince, J.L., Taylor, R.H.: Statistical atlases of bone anatomy: Construction, iterative improvement and validation. In: Ayache, N., Ourselin, S., Maeder, A. (eds.) MICCAI 2007. LNCS, vol. 4791, pp. 499–506. Springer, Heidelberg (2007)
13. Zhou, W., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing 13(4), 600–612 (2004)
Segmentation-Driven 2D-3D Registration for Abdominal Catheter Interventions

Martin Groher¹, Frederik Bender¹, Ralf-Thorsten Hoffmann², and Nassir Navab¹

¹ Computer Aided Medical Procedures (CAMP), TUM, Munich, Germany
{groher,benderf,navab}@cs.tum.edu
² Department of Clinical Radiology, University-Hospitals Munich-Grosshadern, Germany
[email protected]
Abstract. 2D-3D registration of abdominal angiographic data is a difficult problem due to hard time constraints during the intervention, different vessel contrast in volume and image, and motion blur caused by breathing. We propose a novel method for aligning 2D Digitally Subtracted Angiograms (DSA) to Computed Tomography Angiography (CTA) volumes, which requires no user interaction intrainterventionally. In an iterative process, we link 2D segmentation and 2D-3D registration using a probability map, which creates a common feature space where outliers in 2D and 3D are discarded consequently. Unlike other approaches, we keep user interaction low while high capture range and robustness against vessel variability and deformation are maintained. Tests on five patient data sets and a comparison to two recently proposed methods show the good performance of our method.
1 Introduction

Catheter-guided interventions are carried out on an every-day basis in almost every hospital throughout the world. Common practice in these interventions is to acquire 2D fluoroscopic sequences of the patient in which catheter, contrasted vessels, and patient anatomy can be visualized for navigation. Also, in order to provide a high resolution visualization of the vasculature only, DSAs are taken. A 3D scan (CTA) of the patient is usually acquired preoperatively in order to evaluate possible risks and plan the treatment. In abdominal catheterizations (e.g. Transarterial Chemoembolization (TACE), or Transjugular Portosystemic Shunt (TIPS)) mono-plane X-ray imaging devices are used more often, in contrast to neuroradiology interventions where biplane systems are commonly utilized. Guided only by 2D projections of one view, it is often very difficult for the physician to find a path through the patient's vessel system to the region of interest. This is mainly due to overlay of vessel structures and, in the case of abdominal procedures, the breathing deformation of vessel systems. In order to provide catheter guidance in 3D, or the transfer of planned information to intraoperative 2D projections, a 2D-3D registration is needed to align the 3D preoperative to the 2D intraoperative data set. Problems and challenges for this data fusion are as follows. First, due to hard time constraints during the intervention, a 2D-3D registration algorithm should be fast and as automatic as possible during the treatment. Second, vessel features, which are the only features that can be used for DSA-to-CTA registration, are rather distinct in the two
data sets because contrast is injected globally in the preoperative, and locally (through the catheter) in the intraoperative case. A naive registration would not be able to deal with these outliers emerging in 2D as well as in 3D and could drive the registration to "false positives". Third, an automatic extraction of vessel features in the 2D DSA would also segment other tubular structures (needles, catheters), which would disturb a feature-based registration. Fourth, due to patient breathing, structures are deformed in the 2D data set when compared to the 3D data set. 2D-3D rigid registration in deformable regions can be addressed by a fully intensity-based procedure with gating, as proposed e.g. by Turgeon et al. [1] for the heart. However, it is difficult to use such an image-based method without gating information. In fact, the more appropriate approach would be (partly) feature-based, so as to be robust against deformations. Recently, two methods for 2D-3D registration in the particular case of abdominal angiographic alignment were proposed by Jomier et al. [2] and Groher et al. [3]. The first is a model-to-image technique, which only requires a preoperative segmentation step for aligning DSA images of two views. The second is feature-based, aligning extracted 2D and 3D vessel graphs in a one-view scenario. The former is fully automatic during the intervention but suffers from a small capture range, while the latter has a high capture range but requires manual interaction intraoperatively. As will be shown in Section 3, both methods have difficulties dealing with outliers in 3D. Contribution: We propose a method for 2D-3D registration of preoperative CTA to intraoperative DSA images for abdominal interventions. It automatically aligns a preoperatively segmented 3D vasculature to a 2D DSA image by iteratively segmenting the image and aligning the extracted 2D and 3D vasculatures. A combination of registration and 2D segmentation via a probability map allows us to adjust the feature spaces such that non-corresponding features in the 2D as well as the 3D vasculature are consistently removed. With this approach we combine robustness and high capture range with a fully automatic registration technique. Moreover, one-to-one correspondence of vascular features is assured, which makes it possible to visualize roadmaps in 3D. We motivate our method through a maximum likelihood (ML) formulation solving for both registration and segmentation. Unlike traditional ML-based algorithms for combined segmentation/registration, we only care about the resulting registration and also leave the algorithm as generic as possible in order to allow alternative registration and segmentation steps. Related Work: A combination of segmentation and 2D-3D registration was proposed by Hamadeh et al. [4] and Bansal et al. [5] for the rigid alignment of medical images. The former only segmented once for aiding the registration; the latter used a minimax approach in order to optimize one energy functional that encapsulates the joint conditional entropy of DRR and X-ray given a segmentation. A recent method proposed by Brox et al. [6] combines pose estimation with a level set formulation to let a registration aid the segmentation. In all these methods, the segmentation is integrated into the algorithm and cannot be replaced. Since vessel segmentation is a specific problem where general approaches cannot be applied without modification, we tried to leave the combination as generic as possible and discarded these methods.
Combined segmentation and registration has also been successfully applied to brain MR images
(see Pohl et al. [7] and references therein), where an Expectation Maximization (EM) formulation was favored. In contrast to our proposed algorithm, this work also solves for MR-specific nuisance parameters and serves a diagnostic application not subject to hard time constraints.
2 Method

We now describe our segmentation-driven 2D-3D registration algorithm. We first justify our approach via an ML formulation integrating a segmentation into the registration process and derive a generic algorithm. In Section 2.2 we apply our algorithm to 2D-3D DSA-to-CTA registration for abdominal interventions.

2.1 MLE with Labelmaps

We want to maximize the probability that certain registration parameters Θ best fit the 2D image data I and the 3D model M:

    \hat{\Theta} = \arg\max_{\Theta} P(\Theta \mid I, M)    (1)

Maximizing the likelihood of (1), i.e. \arg\max_{\Theta} P(I, M \mid \Theta), is very difficult if there is no correspondence information between image pixels and model points. Thus, we let a 2D segmentation aid the estimation. We introduce a random variable L representing a labelmap over the image I. Marginalizing over L we get

    \hat{\Theta} = \arg\max_{\Theta} \sum_{L} P(\Theta, L \mid I, M)    (2)
                 = \arg\max_{\Theta} \sum_{L} P(L \mid I, M) \, P(\Theta \mid L, I, M)    (3)
using the product rule. From this formulation we can deduce an iterative scheme. If we had values for the variable L given, we could solve the ML of eq. (3). Since L has to be estimated as well, we iterate between expectation estimation (E-step) of the unknown random variable L and optimization of a cost function (M-step) given this expectation:

    L^{(t)} \leftarrow E(L \mid \Theta^{(t-1)}, I, M) = \sum_{L} L \, P(L \mid \Theta^{(t-1)}, I, M)    (4)

    \Theta^{(t)} \leftarrow \arg\max_{\Theta} P(\Theta \mid L^{(t)}, I, M) \overset{\text{ML}}{\propto} \arg\max_{\Theta} P(L^{(t)}, I, M \mid \Theta)    (5)
The M-step (eq. (5)) is rather easy to accomplish since we already have a model M in 3D and can determine the MLE using L^{(t)} in a model-to-model registration. For the E-step (eq. (4)), however, we must determine the expectation value of the labelmap given the last registration and the data. Since this is not straightforward, we will discuss it in more detail. Assuming spatial independence of pixels in image I (which is common in this context, see [7,5]), we can determine the expectation for each pixel x separately. If we restrict our segmentation to one object only, we can deduce an indicator variable \ell_x for each pixel x, where

    \ell_x = \begin{cases} 1, & \text{if } x \text{ is inside the object} \\ 0, & \text{otherwise} \end{cases}    (6)

Thus, the expectation for the label \ell_x of a pixel x becomes

    E(\ell_x \mid \Theta^{(t-1)}, I, M) = P(\ell_x = 1 \mid \Theta^{(t-1)}, I, M)    (7)
                                        = \alpha \, P(\Theta^{(t-1)} \mid \ell_x = 1, I, M) \, P(\ell_x = 1 \mid I)    (8)
using Bayes' rule, where \alpha = 1/P(\Theta^{(t-1)} \mid I, M) is a normalizing constant, and M is dropped in the last term of (8) since the segmentation of I is independent of the model. With eq. (8) we can assign the expectation of the segmentation to each pixel and thus get a probability map I_{L^{(t)}} for L^{(t)}. We can interpret this map as the probability for each pixel to be registered (to have a correspondence) to the model, given that it is part of the segmented object, combined with the a-priori probability of being part of the segmented object. Note that we treat the expectation as a probability in which we join the registration parameters from the last iteration and the a-priori knowledge of a pixel belonging to an object. We still keep the freedom to choose any kind of binarization technique to apply to the probability map. We can now give a generic algorithm for the segmentation-driven 2D-3D registration, which we will henceforth refer to as the EBM algorithm:
Algorithm EBM: Given an image I, a model M, and initial estimates for the parameters Θ^{(0)} and labelmap L^{(0)}:
  Initialize values: Θ^{(t-1)} ← Θ^{(0)}; L^{(t-1)} ← L^{(0)}
  Repeat until convergence:
    E-step: For each pixel x: if \ell_x = 1, determine the probability for the new label using eq. (8); otherwise set the probability to zero.
    B-step: Binarize I_{L^{(t)}} to get L^{(t)}.
    M-step: Register M to L^{(t)} starting from Θ^{(t-1)} to get Θ^{(t)}.

Note that our method does not follow the strict formulation of the EM algorithm [8],

    \Theta^{(t)} = \arg\max_{\Theta} \sum_{L} P(L \mid \Theta^{(t-1)}, I, M) \, P(L, I, M \mid \Theta).    (9)
In our algorithm, we directly calculate the expectation of the hidden variable L, whereas EM calculates the expectation of the probability of the complete data (L, I, M) given the incomplete data (I, M) and an estimate of Θ. Unlike for EM, convergence is not proven for our approach. In our experiments, however, the algorithm always converged given suitable termination criteria.

2.2 Segmentation-Driven 2D-3D Registration on Angiographic Data

We now describe the single steps of EBM in the particular case of 2D-3D registration of 3D CTA and 2D DSA data sets, as illustrated in Fig. 1. The object to be segmented in order to aid the 2D-3D registration is, naturally, the vessel tree of the DSA. We are given a DSA image I (Fig. 1(a)) and the vasculature model M of a CTA volume (laid over the DSA in Fig. 1(f)). The vessel model M is generated by a region growing step and a centerline extraction. Seed points and thresholds are determined manually. Note that the CTA data set is acquired preoperatively and thus the extraction of vasculature is
Fig. 1. Illustration of the segmentation-driven registration: 1(a) original DSA, 1(b) bothat filtered DSA, 1(c) initial segmentation (automatic seed point detection, region growing), 1(d) probability map penalizing non-corresponding but extracted features in 2D and 3D, 1(e) final segmentation after registration - increased feature similarity, 1(f) overlay of 3D vasculature and DSA
not subject to hard time constraints (in fact, the extraction takes about 5 min at most). The 3D point cloud that spatially describes M, i.e. sampling points on vessel centerlines and bifurcation locations, is denoted by {X_j}. We now want to find the rigid registration parameters Θ = {α, β, γ, t_x, t_y, t_z} describing the transformation of the model coordinate system into the coordinate system of the C-arm (a Siemens Axiom Artis dFA) with which the DSA has been acquired. The intrinsic parameters are given and we assume an absence of image distortion due to flat-panel detectors. We consider all image intensities normalized to the domain [0; 1].

Image Preprocessing: As initialization for the a-priori probability P(\ell_x = 1 \mid I_x) we choose a bothat-filtered image whose contours are sharpened by histogram equalization (Fig. 1(b)). We refer to this filtered image as I^{BH}.

2D Segmentation & Model Creation: The initial segmentation (L^{(0)}, see Fig. 1(c)), as well as all subsequent segmentations of the DSA, are computed using a region growing technique based on intensity thresholds. The seed points are detected using a derivative-mean
filter defined by Can et al. [9] for vessel tracing. It detects pixels that are likely to lie on a vessel centerline by filter inspection in 12 directions and criteria evaluation. This method is very fast and yields decent candidate seeds. In order to start with a segmentation, we choose the intensity values for the region growing to lie inside the interval µ_seeds ± 2σ_seeds, where µ_seeds and σ_seeds are the mean and standard deviation of the intensity values of all detected seed points. We start the region growing from all detected seed points. Outliers are removed by choosing the largest connected region as the actual segmentation. From the segmentation we create a model of a 2D vessel centerline, to be able to register it with the 3D model M, and deduce a diameter, which is used as σ in the E-step. This is done with the same technique as in 3D.

Initial Registration: Θ^{(0)} is determined by combining information from the C-arm with an exhaustive feature search following the approach proposed by Groher et al. [3].

Iteration: We define the pixel error of a pixel x as

    \varepsilon(x) = d(x, C(x, \{P_\Theta X_j\}))^2    (10)
where d(·,·) determines the Euclidean distance of two vectors, C(y, {z_j}) determines the closest point of a point set {z_j} to a point y, {X_j} are all points on the 3D centerline, and P_Θ = K[R_Θ | t_Θ] is the projection matrix with the current pose parameters Θ.

E-step: This step computes the probability map as defined in eq. (8). The probability that a vessel pixel is registered to the 3D model, P(\Theta^{(t-1)} \mid \ell_x = 1, I, M), is defined via the pixel error (eq. (10)). If we allow a deviation proportional to the maximal width of a vessel in the 2D image, σ, and assume the error distribution to be Gaussian, we get

    P(\Theta^{(t-1)} \mid \ell_x = 1, I, M) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\varepsilon(x)/\sigma^2}    (11)

The a-priori probability for a pixel x to lie inside a vessel is defined by the image I^{BH} as described above, i.e. P(\ell_x = 1 \mid I) = I^{BH}(x). Putting both terms together, we define the probability map as

    E(\ell_x \mid \Theta^{(t-1)}, I, M) \propto e^{-\varepsilon(x)/\sigma^2} \cdot I^{BH}(x),    (12)
where we dropped α from eq. (8) since it just represents an isotropic scaling of the pixel intensities of I_{L^{(t)}}.

B-step: After building up the map (Fig. 1(d)), we perform a new region growing (Fig. 1(e)) and centerline extraction as described above to get a new 2D centerline.

M-step: The 2D-3D registration is computed by minimizing the pixel error ε, which is evaluated only on the 2D centerline points. If a projected centerline point P_Θ X_j already has a closest point, the one with the smaller error is chosen. Thus, we ensure one-to-one correspondence of centerline features. The iteration of the registration is governed by a Downhill Simplex optimizer minimizing the non-linear cost function Σ_x ε(x), where x has a corresponding (closest) point in the 3D model M. We stop the algorithm if the absolute difference of the two labelmaps of the current and last E-step, |L^{(t-1)} − L^{(t)}|, is very small. We choose a threshold of 5% of the pixels in an image (size 1024²), at which the difference of the labelmaps becomes visually insignificant.
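For concreteness, the following is a minimal sketch of one EBM iteration on this data (eqs. (10)-(12)). The callables make_projection and segment are hypothetical stand-ins for the C-arm projection model and the seed-based region growing with centerline extraction described above; the distance-transform shortcut for computing ε(x) and the omission of the one-to-one correspondence bookkeeping are simplifications of ours, not part of the paper.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from scipy.optimize import minimize

def pixel_error_map(P, X3d, shape):
    # eps(x) = d(x, closest projected 3D centerline point)^2, eq. (10),
    # computed for all pixels at once via a Euclidean distance transform.
    proj = np.zeros(shape, dtype=bool)
    ph = (P @ np.c_[X3d, np.ones(len(X3d))].T).T        # homogeneous 2D points
    px = np.round(ph[:, :2] / ph[:, 2:3]).astype(int)   # (col, row) pixel coords
    ok = (px[:, 0] >= 0) & (px[:, 0] < shape[1]) & \
         (px[:, 1] >= 0) & (px[:, 1] < shape[0])
    proj[px[ok, 1], px[ok, 0]] = True
    return distance_transform_edt(~proj) ** 2

def ebm_iteration(I_BH, X3d, theta, sigma, make_projection, segment):
    # E-step (eqs. (11)-(12)): probability map = registration term * prior I_BH.
    eps = pixel_error_map(make_projection(theta), X3d, I_BH.shape)
    prob_map = np.exp(-eps / sigma ** 2) * I_BH
    # B-step: binarize the map; `segment` stands in for the seed-based region
    # growing + centerline extraction and returns (row, col) pixel coordinates.
    centerline = segment(prob_map)
    # M-step: Downhill Simplex on the summed pixel error over centerline pixels.
    def cost(t):
        e = pixel_error_map(make_projection(t), X3d, I_BH.shape)
        return e[centerline[:, 0], centerline[:, 1]].sum()
    theta_new = minimize(cost, theta, method="Nelder-Mead").x
    return theta_new, centerline
```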
3 Results

We acquired data from 5 patients who suffer from hepatocellular carcinoma (HCC) and were treated with TACE. Fig. 2 shows three data sets; the upper row shows the original DSAs, the lower row the overlays with the registered 3D vasculature. The results show that our approach can cope with variations in both the 2D and the 3D data sets.
Fig. 2. Registration results on 3 data sets. 2D vessel trees (upper row) are different as are 3D vasculatures (laid over DSA, lower row). Fig. 2(e) shows 3D vasculature with vessels that are not visible in 2D. Large deformations between 2D and 3D can be seen in the lower part of Fig. 2(d) or 2(e).
For testing accuracy and robustness, reference registration parameters as well as a segmentation of the DSA have been manually created by experienced radiologists for each data set. We added random errors of up to ±5mm and ±5◦ to each of the translation and rotation parameters, respectively, and invoked the registration method 200 times. The pose parameters are given relative to the origin of the object coordinate system, which was laid in the middle of the moving 3D model. Moreover, as a comparison, we implemented the methods of Jomier et al. [2] and Groher et al. [3] and compared standard deviation (σ) and root mean square errors (ε) for all parameters.
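A sketch of this perturbation study might look as follows; ebm_register is a hypothetical stand-in for the full registration pipeline, and the returned statistics correspond to the σ and ε columns of Table 1.

```python
import numpy as np

def robustness_study(ebm_register, reference_pose, n_runs=200, seed=0):
    """ebm_register: callable mapping an initial pose to a recovered pose
    (hypothetical); reference_pose: parameters (alpha, beta, gamma, tx, ty, tz)."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_runs):
        # Random offsets of up to +/-5 deg (rotations) and +/-5 mm (translations).
        offset = np.r_[rng.uniform(-5, 5, 3), rng.uniform(-5, 5, 3)]
        errors.append(ebm_register(reference_pose + offset) - reference_pose)
    errors = np.asarray(errors)
    sigma = errors.std(axis=0)                 # standard deviation per parameter
    rms = np.sqrt((errors ** 2).mean(axis=0))  # RMS error per parameter
    return sigma, rms
```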
Table 1. Standard deviations σ and RMS errors ε of translations (in mm) and rotations (in deg). Deviation and error in z-translation are not as significant as those in-plane, or in rotations.

#      Method   σtx    σty    σtz     σα     σβ     σγ     εtx    εty    εtz     εα     εβ     εγ
1      Jomier   0.76   1.80   4.66    3.48   3.17   1.60   0.78   2.21   5.30    4.75   3.60   1.66
       Groher   0.20   0.06   3.43    0.08   1.89   0.11   0.34   0.23   6.58    0.22   4.41   0.38
       EBM      0.54   2.68   34.63   3.79   1.88   1.25   0.54   2.70   36.12   3.79   1.97   1.25
2      Jomier   0.74   0.93   5.06    4.84   3.18   0.58   0.78   3.46   5.38    8.33   3.18   0.78
       Groher   0.16   1.33   4.52    4.80   2.59   1.27   1.24   1.49   28.48   5.28   3.85   2.33
       EBM      0.04   0.16   1.99    0.45   0.18   0.04   0.07   0.25   3.73    0.72   0.35   0.04
3      Jomier   3.34   2.03   7.39    4.43   4.37   2.04   4.68   2.96   7.43    5.62   4.39   2.07
       Groher   2.32   4.56   43.72   3.84   3.98   4.78   6.72   6.21   62.07   5.19   11.58  4.87
       EBM      3.55   1.53   24.11   7.47   3.33   4.62   3.54   1.54   24.36   7.49   3.33   4.62
4      Jomier   22.45  6.65   13.73   7.74   24.64  9.15   23.18  6.84   13.81   8.40   25.41  9.16
       Groher   6.30   0.89   55.10   4.55   6.82   4.09   9.05   2.02   70.69   7.54   14.73  4.99
       EBM      0.95   0.18   9.41    0.29   1.96   0.10   1.09   0.19   13.04   0.36   2.62   0.10
5      Jomier   26.35  16.37  10.34   7.15   13.60  5.81   26.66  16.37  10.87   7.25   18.63  6.07
       Groher   20.65  1.82   148.41  14.88  12.29  15.10  52.74  3.39   422.62  28.76  16.95  24.40
       EBM      1.22   0.74   16.83   3.05   2.07   0.67   1.41   0.74   20.19   3.05   2.44   0.76
Avrg.  Jomier   10.73  5.60   8.23    5.53   9.79   3.83   11.21  6.37   8.56    6.87   11.04  3.95
       Groher   5.93   1.70   51.04   5.63   5.5    5.07   14.02  2.67   118.09  9.39   10.30  7.39
       EBM      1.26   1.06   17.39   3.01   1.88   1.34   1.33   1.08   19.49   3.08   2.14   1.35
For the method of Groher et al. we used the reference 2D segmentation for 2D graph creation and registration. The results of this study are summarized in Table 1. It can be seen that in error and deviation of the z-translation our method is sometimes outperformed by the other two methods; however, the more important in-plane translations and rotations have smaller errors with our method. The large errors of Jomier's and Groher's methods on the last two data sets can be explained by a "subset" property. In the first three data sets the 3D vasculature is a "subset" of the 2D vasculature, whereas in the last two data sets this property is not fulfilled, i.e. the 3D vasculature shows branches that are not visible in 2D. The results show that our method is more robust with respect to variability in both dimensions. The number of iterations of our algorithm usually lies between 2 and 5. The runtime (analyzed on an Intel Core2Duo 2.6 GHz machine) splits into (bothat-) filtering (28.5 sec), seed point extraction (1.0 sec), region growing (0.3 sec), centerline extraction (2.7 sec), exhaustive initialization (34.9 sec), and iteration (region growing, centerline extraction, and pose optimization; 12.9 sec), where all runtimes have been averaged over the 5 patient data sets, and the iteration runtime over the number of iterations. Altogether, applying the registration takes 1.5 - 2 min. The two critical stages are filtering and exhaustive initialization - both can be further optimized numerically.
4 Conclusion

We have developed a method for 2D-3D registration of angiographic data. Our emphasis lies on a fully automatic registration once the interventionalist starts the treatment. We believe that a 2D segmentation can yield a more robust (feature-based) registration with high capture range. Motivated by an ML formulation of a combined segmentation/registration, we derived a generic method for iteratively estimating the 2D labelmap and the
registration parameters, linked by a common probability map. In this probabilistic framework, we keep the freedom to choose any segmentation or registration technique. Unlike other approaches, we keep user interaction low while maintaining a high capture range and robustness against vessel variability and deformation. With the segmentation-driven registration, we create a common feature space and thus one-to-one correspondence of vessel features. Hence, we can visualize catheter locations and roadmaps in 3D and 2D simultaneously (see supplementary material). Application to different medical procedures and tests of alternative 2D segmentation and/or registration steps will follow this work in the future. With these future tests, we will also assess the impact of all processing steps performed in the iteration.

Acknowledgements. This research was funded by an academic grant from Siemens Medical Solutions Angiography/X-Ray division, Forchheim, Germany. The authors would like to thank in particular K. Klingenbeck-Regn and M. Pfister for their continuous support.
References

1. Turgeon, G.A., Lehmann, G., Guiraudon, G., Drangova, M., Holdsworth, D., Peters, T.: 2D-3D registration of coronary angiograms for cardiac procedure planning and guidance. Medical Physics 32, 3737–3749 (2005)
2. Jomier, J., Bullitt, E., van Horn, M., Pathak, C., Aylward, S.: 3D/2D model-to-image registration applied to TIPS surgery. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 662–670. Springer, Heidelberg (2006)
3. Groher, M., Padoy, N., Jakobs, T., Navab, N.: New CTA protocol and 2D-3D registration method for liver catheterization. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 873–882. Springer, Heidelberg (2006)
4. Hamadeh, A., Lavallée, S., Cinquin, P.: Automated 3-dimensional computed tomographic and fluoroscopic image registration. Computer Aided Surgery 3, 11–19 (1998)
5. Bansal, R., Staib, L.H., Chen, Z., Rangarajan, A., Knisely, J., Nath, R., Duncan, J.S.: Entropy-based, multiple-portal-to-3DCT registration for prostate radiotherapy using iteratively estimated segmentation. In: Taylor, C., Colchester, A. (eds.) MICCAI 1999. LNCS, vol. 1679, pp. 567–578. Springer, Heidelberg (1999)
6. Brox, T., Rosenhahn, B., Weickert, J.: Three-dimensional shape knowledge for joint image segmentation and pose estimation. In: Kropatsch, W.G., Sablatnig, R., Hanbury, A. (eds.) Pattern Recognition. LNCS, vol. 3663, pp. 109–116. Springer, Heidelberg (2005)
7. Pohl, K.M., Fisher, J., Grimson, W.E.L., Kikinis, R., Wells, W.M.: A Bayesian model for joint segmentation and registration. NeuroImage 31, 228–239 (2006)
8. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. of the Royal Statistical Soc. Series B 39, 1–38 (1977)
9. Can, A., Shen, H., Turner, J., Tanenbaum, H., Roysam, B.: Rapid automated tracing and feature extraction from retinal fundus images using direct exploratory algorithms. IEEE Transactions on Information Technology in Biomedicine 3, 125–138 (1999)
Primal/Dual Linear Programming and Statistical Atlases for Cartilage Segmentation

Ben Glocker¹,², Nikos Komodakis²,⁴, Nikos Paragios², Christian Glaser³, Georgios Tziritas⁴, and Nassir Navab¹

¹ Computer Aided Medical Procedures (CAMP), Technische Universität München
{glocker,navab}@in.tum.de
² GALEN Group, Laboratoire de Mathématiques Appliquées aux Systèmes, Ecole Centrale de Paris
³ Department of Clinical Radiology, Ludwig-Maximilians-Universität München
⁴ Computer Science Department, University of Crete
Abstract. In this paper we propose a novel approach for automatic segmentation of cartilage using a statistical atlas and efficient primal/dual linear programming. To this end, a novel statistical atlas construction is considered from registered training examples. Segmentation is then solved through registration, which aims at deforming the atlas such that the conditional posterior of the learned (atlas) density is maximized with respect to the image. Such a task is reformulated using a discrete set of deformations, and segmentation becomes equivalent to finding the set of local deformations which optimally match the model to the image. We evaluate our method on 56 MRI data sets (28 used for the model and 28 used for evaluation) and obtain a fully automatic segmentation of patella cartilage volume with an overlap ratio of 0.84 and a sensitivity and specificity of 94.06% and 99.92%, respectively.
1 Introduction

Degeneration of knee joint cartilage is an important and early indicator of osteoarthritis (OA), which is one of the major socio-economic burdens nowadays [1]. Accurate quantification of the articular cartilage degeneration in an early stage using MR images is a promising approach in diagnosis and therapy for this disease [2]. Particularly, volume and thickness measurement of cartilage tissue has been shown to deliver significant parameters in the assessment of pathologies [3,4,5]. Here, accurate computer-aided diagnosis tools could improve the clinical routine, where image segmentation plays a crucial role. In order to overcome the time-consuming and tedious work of manual segmentation, one tries to automate the segmentation as much as possible. Many automatic and semi-automatic cartilage segmentation methods have been proposed. Folkesson et al. [6] propose a hierarchical classification scheme for automatic segmentation of cartilage in low-field MR images. A semi-automatic method based on watershed transformation and pre-segmentation using [6] is presented by Dam et al. [7].
This work is partially supported by Siemens Medical Solutions, Erlangen, Germany.
An earlier work by Grau et al. [8] proposes an extension to the standard watershed transformation for semi-automatic segmentation. Cheong et al. [9] compare different model-based approaches. A very recent work of Fripp et al. [10] proposes automatic segmentation of bone in order to extract the bone-cartilage interfaces (BCI), with promising results. However, similar to [7], we believe that current automatic approaches cannot achieve the high accuracy and precision needed in the clinical application. In most cases, interactive refinement is needed. Still, a good automatically achieved initialization could improve the daily work of radiologists immensely. Therefore, we developed a novel approach for automatic segmentation of cartilage using a statistical atlas and efficient primal/dual linear programming. Our results provide a very good initialization for subsequent interaction steps and may even provide good enough segmentation results for certain applications. The remainder of this paper is organized as follows: in Section 2 we present the construction of the probabilistic atlas, while in Section 3 we derive the formulation of the atlas-matching problem in a discrete setting. In Section 4 we briefly introduce the efficient optimization method based on primal/dual linear programming. Our results are presented in Section 5 and compared to related methods. The last section concludes our paper.
2 Probabilistic Atlas Construction

Let us assume that n registered cartilage volumes are available, V = {V_1, V_2, ..., V_n}. The task of atlas construction refers to the extraction of a model that combines the intensities of the training set into an average volume, a rather simplistic dimensionality reduction. On the other hand, a statistical atlas might be able to capture the variations of the training set, and often consists of

– V_M : Ω → R⁺, an optimal representative volume - according to some criterion - derived from the training set,
– σ_M : Ω → R⁺, a variance map, determined according to the agreement between the atlas and the training set,
– p_x(i), a pdf defined at each voxel x, which can for instance be a Gaussian density,

    p_x(i) = \frac{1}{\sqrt{2\pi}\,\sigma_M(x)} \, e^{-\frac{(i - V_M(x))^2}{2\sigma_M(x)^2}}.
with Ω being the volume domain. Towards the construction of such a probabilistic atlas representation [11], one can consider solving the inference problem at the voxel level x: given a set of values [V_1(x), ..., V_n(x)], recover a distribution p_x(i) that has an optimal support from the data. The maximum posterior of this distribution along the training samples (assuming independence between voxels and using the [-log] of the density) is equivalent to minimizing

    E(V_M, \sigma_M) = \int_\Omega \sum_{i=1}^{n} \left[ \log(\sigma_M(x)) + \frac{(V_i(x) - V_M(x))^2}{2\sigma_M^2(x)} \right] d\Omega.    (1)

However, volumes correspond to a collection of organs which exhibit certain spatial and intra-subject smoothness properties; therefore one can expect a smooth probabilistic
atlas preserving contours/boundaries between organs. We can introduce such abstract constraints using regularization terms [12], which penalize the spatial derivatives of the field to be recovered:

    E(V_M, \sigma_M) = \alpha \int_\Omega \sum_{i=1}^{n} \left[ \log(\sigma_M(x)) + \frac{(V_i(x) - V_M(x))^2}{2\sigma_M^2(x)} \right] d\Omega
                     + \int_\Omega \psi(\nabla\sigma_M(x)) \, d\Omega + \int_\Omega g(|\nabla V_M(x)|) \, \psi(\nabla V_M(x)) \, d\Omega,    (2)
where α is a constant that balances the data-fidelity and the smoothness terms, ψ (the quadratic form is considered in the scope of this paper) is a regularization function, and g(|\nabla V|) = \frac{1}{1 + |\nabla V|^a} is a monotonically decreasing function which overwrites the smoothness constraint when a consensus for being at an edge is observed. The calculus of variations and a gradient descent method can now be used to recover the solution for the prior model (V_M, σ_M). One can initialize the process using one of the training volumes and constant variances. Furthermore, one can consider only a sub-domain of the volume domain, which refers to a narrow-band zone around the cartilage, since this is the component of interest.
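As an illustration, without the regularization terms of eq. (2) the minimizer of eq. (1) decouples per voxel into the familiar Gaussian ML estimates. A minimal sketch (the eps guard against zero variance is our addition):

```python
import numpy as np

def probabilistic_atlas(volumes, eps=1e-6):
    """volumes: (n, Z, Y, X) stack of registered training volumes.
    Returns the voxel-wise ML estimates of (V_M, sigma_M)."""
    V = np.asarray(volumes, dtype=np.float64)
    V_M = V.mean(axis=0)               # representative volume
    sigma_M = V.std(axis=0) + eps      # variance map (eps avoids log(0))
    return V_M, sigma_M

def neg_log_density(i, V_M, sigma_M):
    # -log p_x(i) for the Gaussian atlas density, up to an additive constant;
    # this is exactly the matching cost rho_M used in eq. (4) below.
    return np.log(sigma_M) + (i - V_M) ** 2 / (2.0 * sigma_M ** 2)
```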
3 Cartilage Segmentation

Let us now consider a new volume V : [1, X] × [1, Y] × [1, Z] to be segmented. We can reformulate segmentation as finding the region of interest in this volume which best matches the atlas. In general, the two volumes are related by a non-linear transformation T that minimizes

    E(T) = \int_\Omega \rho(V(T(x)), V_M(x), \sigma_M(x)) \, dx = \int_\Omega \rho_M(V(T(x))) \, dx    (3)

where ρ_M is a distance metric used to determine the most meaningful correspondence between the atlas and the image domain Ω. The distance metric in our approach can be explicitly defined using the probabilistic nature of the atlas: since V_M(x) is assumed to be a sample drawn from the density p_x(·), one can expect that the optimal segmentation will project the voxel x of the atlas to an image intensity V(T(x)) that refers to the maximum of p_x(·), leading to the minimization of

    \rho_M(V(T(x))) = \log(\sigma_M(x)) + \frac{(V(T(x)) - V_M(x))^2}{2\sigma_M^2(x)}    (4)

Since we are interested in local registration (assuming a global pre-registration exists), let us introduce a deformation grid G : [1, K] × [1, L] × [1, M] (usually K ≪ X, L ≪ Y, and M ≪ Z) super-imposed on the atlas. The central idea of our approach is to deform the grid (with a 3D displacement vector d_p for each control point) such that the structures in the atlas and the image to be segmented are perfectly aligned. The transformation of a voxel x can be expressed using a linear or non-linear combination of the grid points, or

    T(x) = x + D(x),   D(x) = \sum_{p \in G} \eta(|x - p|) \, d_p    (5)
where η(·) is the weighting function measuring the contribution of the control point p to the displacement field D. The position of point p is denoted by p. In such a theoretical setting, without loss of generality, we consider Free Form Deformations (FFD) based on cubic B-splines as the transformation model, which have often been considered for image registration [13]. By defining the atlas-matching problem based on such a deformation model, we can now reformulate the criterion introduced earlier,

    E_{data}(T) = \sum_{p \in G} \int_\Omega \eta^{-1}(|x - p|) \, \rho_M(V(T(x))) \, dx    (6)
where η^{-1}(·) is the inverse projection for the contribution to the objective of the image point x according to the influence of the control point p. Such a term will guarantee intensity correspondence between the two images; hence, this term is also called the data term. The transformation inherits some implicit smoothness properties due to the interpolation. However, in order to avoid folding of the deformation grid, one can consider a smoothness term on the grid domain, or

    E_{smooth}(T) = \sum_{p \in G} \phi(|\nabla_G \, d_p|)    (7)
with φ being a smoothness penalty function, for instance penalizing the first derivatives of the grid deformation. The complete term associated with the registration problem is then defined as the sum of the data and smoothness terms. The most common way to obtain the transformation parameters is through the use of a gradient-descent method in an iterative approach which, due to the non-convexity of the cost function, could produce sub-optimal results. One way to overcome this limitation is through the use of more efficient optimization techniques, like combinatorial programming [14]. Let us now consider a discrete set of labels L = {u^1, ..., u^i} corresponding to a quantized version of the deformation space, {d^1, ..., d^i}. A label assignment u_p to a grid node p is associated with displacing the node by the corresponding vector d^{u_p}. The image transformation associated with a certain discrete labeling u becomes

    D(x) = \sum_{p \in G} \eta(|x - p|) \, d^{u_p}.    (8)
One can reformulate the registration as a discrete optimization problem, that is, assign individual labels u_p to the grid nodes such that

    E_{data}(u) = \sum_{p \in G} \int_\Omega \eta^{-1}(|x - p|) \, \rho_M(V(T(x))) \, dx \approx \sum_{p \in G} V_p(u_p)    (9)
where V_p(·) represents a local similarity metric. In this setting, the singleton potential functions V_p(·) are not independent; thus the defined data term can only be approximated. Hence, we pre-compute the |L| × |G| data term look-up table for the atlas and a given image by simple shift operators. The entry for node p and label u_p is determined by

    V_p(u_p) = \int_{\Omega_p} \rho_M(V(T(x))) \, dx.    (10)
We determine the metric directly from the image patch Ω_p centered at node p. Thus, the weighting function η^{-1}(·) can be neglected. In order to define the smoothness in the label domain, one can express distances between the deformation vectors using the difference between labels, if a ranking has been considered within the definition of the label set, or

    E_{smooth}(u) = \sum_{(p,q) \in E} V_{pq}(u_p, u_q),   V_{pq}(u_p, u_q) = \min(|d^{u_p} - d^{u_q}|, T)    (11)
where E represents the neighborhood system associated with the deformation grid G. For the distance V_{pq}(·,·) we consider a simple piecewise-smoothness truncated term based on the Euclidean geometric distances between the deformations corresponding to the assigned labels, with T being the maximum penalty. Such a smoothness term, together with the data term, allows converting the image registration problem into the form of a Markov Random Field (MRF) in a discrete domain, or

    E_{total}(u) = \sum_{p \in G} V_p(u_p) + \sum_{(p,q) \in E} V_{pq}(u_p, u_q).    (12)
4 Linear Programming

For optimizing the above discrete Markov Random Field, we make use of a recently proposed method which we call Fast-PD [15]. This is an optimization technique which builds upon principles drawn from the duality theory of linear programming in order to efficiently derive almost optimal solutions for a very wide class of NP-hard MRFs. Instead of working directly with the discrete MRF optimization problem above, Fast-PD first reformulates that problem as an integer linear programming problem (the primal problem) and also takes the dual of the corresponding LP relaxation. Given these two problems, i.e. the primal and the dual, Fast-PD then generates a sequence of integral feasible primal solutions, as well as a sequence of dual feasible solutions. These two sequences of solutions make local improvements to each other until the primal-dual gap (i.e. the gap between the objective function of the primal and the objective function of the dual) becomes small enough. Once this happens, the last generated primal solution is guaranteed to be an approximately optimal solution, i.e. within a certain distance from the optimum (in fact, this distance can be shown to be smaller than the achieved primal-dual gap). This is exactly what the next theorem, also known as the primal-dual principle, states.

Primal-Dual Principle 1 (Primal-Dual principle). Consider the following pair of primal and dual linear programs:

    PRIMAL: min c^T x  s.t.  Ax = b, x ≥ 0        DUAL: max b^T y  s.t.  A^T y ≤ c

and let x, y be integral-primal and dual feasible solutions having a primal-dual gap less than f, i.e. c^T x ≤ f · b^T y. Then x is guaranteed to be an f-approximation to the optimal integral solution x*, i.e. c^T x* ≤ c^T x ≤ f · c^T x*.
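The principle is easy to verify on a toy LP. The following sketch uses scipy's generic LP solver and only illustrates the gap-based bound - it is not the Fast-PD algorithm itself. For a pure LP, strong duality drives f to 1 at the optimum, whereas for integral solutions f is in general larger.

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# PRIMAL: min c^T x  s.t.  Ax = b, x >= 0
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 2, method="highs")
# DUAL:   max b^T y  s.t.  A^T y <= c   (linprog minimizes, hence the negation)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)], method="highs")

cTx, bTy = c @ primal.x, -dual.fun
f = cTx / bTy   # primal-dual gap ratio: x is an f-approximation (here f = 1)
```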
Fig. 1. (a) VM and (b) σM of the statistical atlas. In (a) the region of interest for the local registration and the atlas segmentation are overlaid. (c) Automatic segmentation compared to (d) ground truth. For this example the DSC of the segmented volume is 0.90 and the sensitivity is 93.28%. The lower interface between cartilage and surrounding soft-tissue is well preserved.
Fast-PD is a very general MRF optimization method, which can handle a very wide class of MRFs. Essentially, it only requires that the pairwise potential function is nonnegative (i.e., Vpq(·, ·) ≥ 0). Furthermore, as already mentioned, it can guarantee that the generated solution is always within a worst-case bound from the optimum. In fact, besides this worst-case bound, it can also provide per-instance approximation bounds, which prove to be much tighter, i.e. very close to 1, in practice. It thus allows the global optimum to be found up to a user/application bound. Finally, it provides great computational efficiency, since it is typically 3-9 times faster than any other MRF optimization technique with guaranteed optimality properties.
5 Experiments

We evaluated our segmentation method on 56 T2-weighted MRI data sets of probands, each with a resolution of 256×256×20 and a voxel size of 0.625×0.625×3.0 millimeters. The patella cartilage was manually segmented by clinical experts in a slice-by-slice fashion in all volumes. We use 28 data sets for the atlas construction and automatically segment the remaining 28 data sets using the atlas-matching approach described above. We perform a (standard) global pre-registration with 9 degrees of freedom (translation, rotation, and scaling) using available methods (the National Library of Medicine Insight Segmentation and Registration Toolkit). In such a pre-registration we only consider the V_M of the atlas. The global registration provides a sufficient initialization for our subsequent non-rigid atlas-matching. All experiments are done on an Intel Centrino 2.16 GHz machine. A full registration of two images including the global pre-registration takes about 45 seconds, where our non-rigid atlas-matching takes less than 20 seconds. The matching is done in a pyramidal fashion with two resolution levels. The quantized displacement space is sampled with five steps in the six main 3D directions, with a step size of 0.8 mm for the coarser resolution level and 0.25 mm on the full volume resolution. On each level we perform ten iterations, which are sufficient to let the matching converge. The control point spacing of the deformation grid is set to 10 mm for the coarser level and 5 mm for the finer pyramid level.
Table 1. Comparison of different methods for cartilage segmentation. The upper part shows results of semi-automatic methods; the lower part shows fully automatic methods.

Semi-automatic         DSC Mean (Std)  Sensitivity  Specificity  Interaction  Cartilage
Grau et al. [8]        0.90 (±0.01)    90.03%       99.87%       5-10 min     Tibia, Femur, Patella
Dam et al. [7]         0.92 (±n/a)     93.00%       99.99%       max 10 min   Tibia, Femur

Automatic              DSC Mean (Std)  Sensitivity  Specificity  Interaction  Cartilage
Cheong et al. [17]     0.64 (±0.15)    74.00%       n/a          0            Medial Tibia
Cheong et al. [17]     0.72 (±0.09)    79.00%       n/a          0            Lateral Tibia
Folkesson et al. [6]   0.80 (±0.03)    90.01%       99.80%       0            Tibia, Femur
Our Approach           0.84 (±0.06)    94.06%       99.92%       0            Patella
In order to evaluate the segmentation results we compute several common measurements [16], namely the Dice similarity coefficient (DSC), the sensitivity, the specificity, and the average surface distance (ASD) from the manual segmentations. The ASD is computed in millimeters from an anisotropic 3D Euclidean distance transform of the automatic and manual segmentation surfaces and the overlay of the respective surface. We achieve an ASD of 0.49 (±0.23) mm, which is below the voxel size. In Table 1 we compare our results to methods reported in the literature. However, due to the variability in the MRI sequences, volume resolution, and target anatomy, such a direct comparison can only give an idea about the performance of the different methods. Compared to the semi-automatic methods [7,8] we get a worse DSC, but a slightly better sensitivity and specificity. Assuming that most of the errors occur at the boundaries, small objects such as the patella cartilage are penalized and get a lower DSC score than larger objects such as the tibial or femoral cartilage. Still, our approach obtains better results than the automatic classification scheme proposed in [6]. From [17] we consider only the results of the proposed Patch-based Active Appearance Model, since it performs best in their evaluation of different automatic model-based approaches. Fripp et al. [10] report a median DSC for the bone-cartilage interface (BCI) of 0.93. However, the extraction of the BCI does not provide closed volumes and thus is not directly comparable to our method.

Fig. 2. Color-encoded visualization of the average surface distance for the example shown in Fig. 1(c)/(d)
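A possible sketch of the ASD computation follows, assuming boolean masks stored in (z, y, x) order with the voxel spacing given above; the symmetric averaging over both surfaces is an assumption, since the paper does not spell out this detail.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def surface(mask):
    return mask & ~binary_erosion(mask)     # boundary voxels of a binary mask

def average_surface_distance(seg_a, seg_b, spacing=(3.0, 0.625, 0.625)):
    sa, sb = surface(seg_a), surface(seg_b)
    # Anisotropic EDT: distance of every voxel to the other mask's surface.
    da = distance_transform_edt(~sb, sampling=spacing)[sa]
    db = distance_transform_edt(~sa, sampling=spacing)[sb]
    return np.concatenate([da, db]).mean()  # symmetric mean distance, in mm
```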
6 Conclusion

In this paper we have proposed a novel approach for fully automatic segmentation of knee cartilage using a statistical atlas and efficient primal/dual linear programming. We achieve segmentation results for proband data with a mean overlap ratio of 0.84 and a sensitivity and specificity of 94.06% and 99.92% compared to manual expert segmentation. Such an automatic approach could provide significant initialization improvements in applications with high accuracy constraints where interactive refinements are unavoidable. In less constrained applications, such as rough classification tasks, our method could even provide the final results. We will evaluate the robustness of our approach on patient data including higher variations in shape and appearance. In future work, more complex priors including shape models will be investigated, and the method will also be extended to other anatomies.
References

1. Yelin, E.: Cost of musculoskeletal diseases: Impact of work disability and functional. J. Rheumatol. 68(Suppl.), 8–11 (2003)
2. Eckstein, F., Cicuttini, F., Raynauld, J., Waterton, J., Peterfy, C.: Magnetic resonance imaging (MRI) of cartilage in knee osteoarthritis (OA): morphological assessment. Osteoarthr. Cartil. 14, 46–75 (2006)
3. Kauffmann, C., Gravel, P., Godbout, B., Gravel, A., Beaudoin, G., Raynauld, J.P., Martel-Pelletier, J., Pelletier, J.P., de Guise, J.: Computer-aided method for quantification of cartilage thickness and volume changes using MRI: validation study using a synthetic model. IEEE Biomedical Engineering 50(8), 978–988 (2003)
4. Tamez-Pena, J., Barbu-McInnis, M., Totterman, S.: Unsupervised definition of the tibia-femoral joint regions of the human knee and its applications to cartilage analysis. In: SPIE Medical Imaging, San Diego (2006)
5. Tang, J., Millington, S., Acton, S., Crandall, J., Hurwitz, S.: Surface extraction and thickness measurement of the articular cartilage from MR images using directional gradient vector flow snakes. IEEE Biomedical Engineering 53(5), 896–907 (2006)
6. Folkesson, J., Dam, E., Olsen, O.F., Pettersen, P., Christiansen, C.: Automatic segmentation of the articular cartilage in knee MRI using a hierarchical multi-class classification scheme. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, Springer, Heidelberg (2005)
7. Dam, E., Folkesson, J., Pettersen, P., Christiansen, C.: Semi-automatic knee cartilage segmentation. In: SPIE Medical Imaging, San Diego (2006)
8. Grau, V., Mewes, A., Alcaniz, M., Kikinis, R., Warfield, S.: Improved watershed transform for medical image segmentation using prior information. IEEE Medical Imaging 23(4), 447–458 (2004)
9. Cheong, J., Suter, D., Cicuttini, F.: Development of semi-automatic segmentation methods for measuring tibial cartilage volume. In: Digital Image Computing: Techniques and Applications, DICTA 2005. Proceedings, pp. 307–314 (2005)
10. Fripp, J., Crozier, S., Warfield, S., Ourselin, S.: Automatic segmentation of the bone and extraction of the bone-cartilage interface from magnetic resonance images of the knee. Physics in Medicine and Biology (2007)
11. Rousson, M., Paragios, N.: Prior knowledge, level set representations and visual grouping. International Journal of Computer Vision (in press)
12. Tikhonov, A.: Ill-posed problems in natural sciences, Coronet (1992)
13. Schnabel, J.A., Rueckert, D., Quist, M., Blackall, J.M., Castellano-Smith, A.D., Hartkens, T., Penney, G.P., Hall, W.A., Liu, H., Truwit, C.L., Gerritsen, F.A., Hill, D.L.G., Hawkes, D.J.: A generic framework for non-rigid registration based on non-uniform multi-level free-form deformations. In: Niessen, W.J., Viergever, M.A. (eds.) MICCAI 2001. LNCS, vol. 2208, Springer, Heidelberg (2001)
14. Glocker, B., Komodakis, N., Paragios, N., Tziritas, G., Navab, N.: Inter and intra-modal deformable registration: Continuous deformations meet efficient optimal linear programming. In: Information Processing in Medical Imaging, Kerkrade, Netherlands (2007)
15. Komodakis, N., Tziritas, G., Paragios, N.: Fast, approximately optimal solutions for single and dynamic MRFs. In: Computer Vision and Pattern Recognition (2007)
16. Gerig, G., Jomier, M., Chakos, M.: Valmet: A new validation tool for assessing and improving 3D object segmentations. In: Niessen, W.J., Viergever, M.A. (eds.) MICCAI 2001. LNCS, vol. 2208, Springer, Heidelberg (2001)
17. Cheong, J., Faggian, N., Langs, G., Suter, D., Cicuttini, F.: A comparison of model-based methods for knee cartilage segmentation. In: Computer Vision Theory and Applications (2007)
Similarity Metrics for Groupwise Non-rigid Registration

Kanwal K. Bhatia¹, Jo Hajnal², Alexander Hammers²,³, and Daniel Rueckert¹

¹ Visual Information Processing, Department of Computing, Imperial College London
² Imaging Sciences Department, MRC Clinical Sciences Centre, Imperial College London, Hammersmith Hospital
³ Division of Neuroscience, Faculty of Medicine, Imperial College London, Hammersmith Hospital
Abstract. The use of groupwise registration techniques for average atlas construction has been a growing area of research in recent years. One particularly challenging component of groupwise registration is finding scalable and effective groupwise similarity metrics; these do not always extend easily from pairwise metrics. This paper investigates possible choices of similarity metrics and additionally proposes a novel metric based on Normalised Mutual Information. The described groupwise metrics are quantitatively evaluated on simulated and 3D MR datasets, and their performance compared to equivalent pairwise registration.
1 Introduction

There has been growing interest in unbiased atlas construction through the use of groupwise registration [1] [2] [3] [4] [5] [6]. Techniques to do this often do not require the choice of any reference image, and the atlas obtained is therefore not biased by such a choice. The groupwise registration methods developed vary in terms of deformation model, similarity metric and averaging principle used; however, little comparison of such methods has been made. While the deformation models generally follow naturally from comparable pairwise methods, the extension of metrics representing the similarity between two images to the similarity between all subjects in a group is a more challenging task. Similarity metrics should scale at most linearly with increasing numbers of subjects while still remaining effective at driving registration. Entropy-based similarity metrics are frequently used in pairwise non-rigid registration. Their evaluation typically requires the evaluation of probability density functions, for example, through the use of histograms. Extending these metrics to multiple images requires the evaluation of multidimensional PDFs. For example, extending Normalised Mutual Information (NMI) [1] [7] to n images gives:

    S = \frac{\sum_{i=1}^{n} H(I_i)}{H(I_1, I_2, ..., I_n)}    (1)

where H(I_i) represents the marginal entropy of image i and H(I_1, I_2, ..., I_n) represents the joint entropy of all the images. Evaluating n-dimensional NMI
therefore requires an n-dimensional histogram. Apart from the exponentially increasing memory requirements, this also leads to increasing sparsity as the histogram size becomes very much larger than the number of samples contained. For example, 10 images of size 256 × 256 × 256 with 64 intensity bins give a histogram size of 64¹⁰ = 2⁶⁰, but only 256³ = 2²⁴ samples. The evaluation of such sparse histograms is computationally inefficient, and so this method is infeasible for large numbers of subjects. Previous attempts to extend joint entropy-based metrics have been proposed by [1] and [3]. In [1], a histogram is created with the same dimensionality as the number of subjects. Evaluating this histogram requires the use of a parallel cluster of processors. [3] proposed the selection of one arbitrary image of the population to act as an intensity reference. A joint histogram is built with pairs of intensities, comprising the voxel intensity in the reference and the corresponding intensity in each subject. A problem with this method is that with an increasingly large number of intensity pairs, for example when using a large number of 3D images, the effect that each pair has on the overall evaluation of the histogram is greatly reduced. In this paper we propose an alternative extension to NMI. Furthermore, we evaluate different similarity metrics for groupwise non-rigid registration and evaluate their performance on simulated as well as real 3D MR data. Additionally, we assess groupwise registration in comparison to iterative pairwise techniques for average atlas construction.
2 Methods

2.1 Groupwise Registration

The groupwise registration algorithm used in this paper is based on [3], which uses a free-form deformation model based on B-splines. The deformations of corresponding points are constrained to sum to zero in order to ensure that the resulting atlas represents the average shape of the population.

2.2 Similarity Metrics

Similarity metrics that can be used efficiently within the above groupwise registration framework are described in the following sections. All the metrics described scale linearly with increasing numbers of images in the population.

Voxel Intensity-Based Metrics. Previous work on groupwise registration [2] [8] has used voxelwise differences from a reference intensity (for example the mean voxel intensity or the intensity of a selected reference image). One such measure, when a reference image is not selected, is the sample variance of the population, which represents the difference from the current mean intensity Ī(x) for each voxel x in image domain Ω:

    S_{SV} = \frac{1}{n\Omega} \sum_{i=1}^{n} \sum_{x \in \Omega} \left( I_i(x) - \bar{I}(x) \right)^2    (2)
The mean intensity is updated at every iteration of the registration process.
Entropy-Based Metrics. In pairwise registration, NMI has been shown to be an effective similarity measure [7], and it would be desirable to extend this to groupwise registration. We propose a novel measure whereby separate histograms are constructed using each individual image and an average intensity image of the population. The similarity is the sum of the individual NMI values:

    S_{ANMI} = \sum_{i=1}^{n} \frac{H(\bar{I}) + H(I_i)}{H(\bar{I}, I_i)}    (3)

The reference image, Ī, is the current voxel-wise mean intensity of the group of images and is updated at every iteration.

Label Consistency (LC). The use of multiple images can potentially cause issues due to the increased dimensionality and increased variation in intensities. This motivates the use of segmentation-based metrics for multi-subject similarity. If hard segmentations of every subject in the population are available, the overlap between these structures can be used to align the images. The label consistency is one such metric:

    S_{LC} = \sum_{i=1}^{n} \frac{N(I_i \cap I_{ref})}{N(I_i \cup I_{ref})}    (4)
where N(I_i ∩ I_ref) represents the number of voxels in both image I_i and the reference image having the same label, and N(I_i ∪ I_ref) is the total of the number of voxels labelled in I_i and the reference. One choice for the reference image I_ref is the maximum probability estimate [9], created by assigning to each voxel the class representing the mode of the group (i.e. the most commonly occurring class at that voxel). This reference is recalculated at every iteration.

Kullback-Leibler Divergence (KL). An issue with using hard segmentations is that labelled regions may span many different structures. There is no way for the registration to tell how well individual structures are aligned within the same labelled area. Probabilistic segmentations are likely to offer greater sensitivity. These can be obtained via, for example, the Expectation-Maximisation (EM) algorithm [10]. The Kullback-Leibler divergence [11] is an additive measure of the distance between PDFs, and has previously been used for groupwise registration in [6]. For images segmented into K tissue classes:

    S_{KL} = \sum_{i=1}^{n} \sum_{x \in \Omega} \sum_{k=1}^{K} p_{i,x,k} \log \frac{p_{i,x,k}}{p_{ref,x,k}}    (5)
where p_{i,x,k} is the probability of voxel location x in image i being classified as tissue class k, and the reference probability is taken as the mean of the probabilities at that voxel:

    p_{ref,x,k} = \frac{\sum_{i} p_{i,x,k}}{n}    (6)

Once again, the reference image has to be updated at each iteration of the registration.
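Minimal numpy sketches of the groupwise metrics of eqs. (2), (3) and (5)-(6) might look as follows; the histogram-based entropy estimation and the bin count are our assumptions.

```python
import numpy as np

def sample_variance(images):                       # eq. (2)
    # images: (n, ...) stack of intensity volumes.
    return ((images - images.mean(axis=0)) ** 2).mean()

def nmi(a, b, bins=64):
    # Pairwise NMI from a joint histogram: (H(a) + H(b)) / H(a, b).
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = joint / joint.sum()
    px, py = p.sum(axis=1), p.sum(axis=0)
    h = lambda q: -np.sum(q[q > 0] * np.log(q[q > 0]))
    return (h(px) + h(py)) / h(p)

def anmi(images, bins=64):                         # eq. (3)
    ref = images.mean(axis=0)                      # current mean-intensity image
    return sum(nmi(ref, img, bins) for img in images)

def kl_divergence(probs, eps=1e-12):               # eqs. (5)-(6)
    # probs: (n, ..., K) soft tissue-class probabilities per subject.
    ref = probs.mean(axis=0)                       # mean class probabilities
    return np.sum(probs * np.log((probs + eps) / (ref + eps)))
```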
Iterative Pairwise Methods for Average Atlas Construction. Average atlas construction need not be performed in a purely groupwise fashion. Algorithms for the application of average transformations from pairwise registrations have also been developed [12], [13]. The method considered in this paper is directly comparable to the groupwise method in that it uses the same free-form deformation model based on B-splines [7]. NMI is used as a similarity metric to pairwise register all subjects to a chosen reference subject. The inverse of the mean of these deformations is then applied to transform each image to the average shape [13]. This process can be repeated by iteratively re-registering to the current average image, as in [12].

2.3 Criteria for the Evaluation of Similarity Metrics
To analyse the performance of the metrics described, two items need to be considered:

1. Accuracy: how well the registration recovers the average shape of the population for each image.
2. Consistency: how well-aligned the subjects in the group are with each other.

Accuracy. The accuracy of registration can be assessed if the average shape of the population or the transformations to the average shape are known. If tissue classes are available, the Dice similarity metric can be used:

    D = \frac{2 \times N(I \cap I_{ref})}{N(I \cup I_{ref})}    (7)
which is twice the ratio of the number of voxels correctly labelled to the total number of voxels with that label in both the reference and the image under consideration. If the transformation is known, the average absolute displacement error for each voxel x can be computed:

    Error = \frac{1}{n_\Omega} \sum_{x \in \Omega} |d_{simulated}(x) - d_{recovered}(x)|    (8)
Consistency. How well-registered the subjects are with each other is determined using two measures: entropy and accumulated overlap of structures. The more well-aligned a population is, the sharper the resulting final atlas (a mean of the intensities of the individual transformed images) should be. However, it may not be easy to distinguish between atlases by visual inspection alone. The entropy H(A) of each atlas is therefore computed:

    H(A) = -\sum_{x} p(A(x)) \log p(A(x))    (9)
where p(A(x)) is the probability of the intensity of voxel x. As the atlas gets sharper, its entropy should decrease. Groupwise accumulated overlaps based on
fuzzy set theory, developed by Crum et al. [14], have also been used to assess the registration:

    Overlap = \frac{\sum_{pairs} \sum_{labels} \sum_{voxels} \min(I_1, I_2)}{\sum_{pairs} \sum_{labels} \sum_{voxels} \max(I_1, I_2)}    (10)

where I_1 and I_2 are the binary segmentation values in a given pair of images for a given voxel and label.
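For reference, the evaluation measures of eqs. (7)-(10) can be sketched as follows, assuming binary label volumes of identical shape (and, for eq. (8), dense displacement fields).

```python
import numpy as np

def dice(seg, ref):                                      # eq. (7)
    # N(I u I_ref) is defined above as the sum of labelled voxel counts.
    return 2.0 * np.sum(seg & ref) / (np.sum(seg) + np.sum(ref))

def mean_displacement_error(d_sim, d_rec):               # eq. (8)
    # d_sim, d_rec: (..., 3) simulated and recovered displacement fields.
    return np.linalg.norm(d_sim - d_rec, axis=-1).mean()

def atlas_entropy(atlas, bins=64):                       # eq. (9)
    p, _ = np.histogram(atlas.ravel(), bins=bins)
    p = p / p.sum()
    return -np.sum(p[p > 0] * np.log(p[p > 0]))

def accumulated_overlap(labelled):                       # eq. (10)
    # labelled: (n_subjects, n_labels, ...) binary segmentation values.
    num = den = 0.0
    n = len(labelled)
    for i in range(n):
        for j in range(i + 1, n):                        # all subject pairs
            num += np.minimum(labelled[i], labelled[j]).sum()
            den += np.maximum(labelled[i], labelled[j]).sum()
    return num / den
```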
3 Results
A set of 100 deformation fields was created such that the total deformation of the set was equal to zero. The inverses of these deformations (calculated using a numerical scheme [15]) were applied to a slice of the MNI Brainweb image to create a population of 100 subjects whose average shape is the original, undeformed slice. In addition, the MNI Brainweb image has probabilistic and ground-truth (hard) segmentations of white matter (WM), grey matter (GM), cerebrospinal fluid (CSF) and background (BG) classes. These were also transformed to the space of each individual image for use in segmentation-based registration techniques and in the evaluation of the results. Groupwise registration of the population using each of the similarity metrics was performed: sample variance (SV), label consistency (LC), Kullback-Leibler divergence (KL) and normalised mutual information with the average image (ANMI). In addition, an average atlas of the population was created by using the mean transformation of pairwise registrations to a chosen subject (P(i=1)). This atlas was updated by re-registering each subject to the current atlas, and using the new transformations to update the atlas, up to four times (P(i=2), P(i=3), P(i=4)). The final atlases are shown in Figure 1.
Fig. 1. Far left: atlas of original population. L-R: average shape atlases using P(i=1), KL, ANMI and SV registration methods. Far right: MNI Brainweb image.
3.1 Accuracy
The accuracy of the registrations is determined by how well the registrations recover the original MNI Brainweb image. The overlaps between tissue classes obtained through registration and the actual segmentations are shown in Figure 2(a), while the errors in the obtained deformation fields are shown in Figure 2(b).
It can be seen that the best performing groupwise metrics are the KL and ANMI metrics, which provide results close to those obtained when using four iterations of the pairwise registration to the average shape. As expected, the label consistency metric performs poorly in comparison. For this metric to work well, more segmentations of smaller structures would be needed.
Fig. 2. Dice overlaps and deformation errors indicating accuracy of registration
3.2 Consistency
The entropies of the final atlases and the accumulated overlaps are shown in Figure 3. It should be noted that the accumulated overlap measure implicitly weights the effect of structures according to their area. In contrast, entropy weights all voxels equally and hence there may not be a direct relationship between the two. In Figure 3(b), the results shown do not include the alignment of the very large background class. Once again, the best methods are four iterations of the pairwise registration and the KL and ANMI metrics for groupwise registration. However, the entropy shows that the iterative pairwise method does not guarantee increasing consistency with each iteration.
Fig. 3. Atlas entropies and accumulated overlaps showing consistency of registration
3.3 3D MR Image Registration
The best-performing intensity (ANMI, pairwise) and segmentation (KL) algorithms have also been tested on real 3D MR data for 12 subjects. The atlases are shown in Figure 4. These subjects have hard segmentations of 83 tissue classes obtained by expert manual segmentation, using an extension of an existing protocol [9]. The accumulated overlaps obtained are shown in Table 1, showing similar performance of the metrics to the 2D case. It can be seen that while four iterations of the pairwise registration appears to give the best overlap, the overall deformation field is not zero. This means that the coordinate system of the final atlas does not truly represent the average of the population. Figure 4 shows that the total deformation in both groupwise cases is equal to zero, and the resulting atlases therefore describe the average shape of the population.
Fig. 4. Top row: axial, coronal and sagittal sections of 12 affinely aligned 3D adult subjects. Subsequent rows: groupwise registration using KL; groupwise registration using ANMI; fourth iteration of pairwise re-registration to average shape. Far right column: total deformation field (sagittal section) using each similarity metric.
Table 1. Accumulated overlaps of 12 3D subjects registered to their average shape

Affine   Pairwise (i=0)   Pairwise (i=4)   KL     ANMI
0.56     0.60             0.63             0.62   0.61
4 Discussion
This paper compares similarity measures for groupwise registration techniques. A novel similarity metric which extends the popular NMI metric to assess multisubject similarity has also been proposed. It was found that this metric, and the KL metric when probabilistic segmentations are available, performed comparably to average atlas construction using four iterations of pairwise registration. Although this work does not constitute a full survey into the groupwise registration techniques available, it does demonstrate the relative performance of the methods considered, and provides an additional comparison with pairwise methods for atlas construction.
References
1. Studholme, C.: Simultaneous population based image alignment for template free spatial normalisation of brain anatomy. In: Gee, J.C., Maintz, J.B.A., Vannier, M.W. (eds.) WBIR 2003. LNCS, vol. 2717, pp. 81–90. Springer, Heidelberg (2003)
2. Joshi, S., Davis, B., Jomier, M., Gerig, G.: Unbiased diffeomorphic atlas construction for computational anatomy. NeuroImage 23, S151–S160 (2004)
3. Bhatia, K.K., et al.: Consistent groupwise non-rigid registration for atlas construction. In: ISBI 2004, pp. 908–911 (2004)
4. Twining, C., et al.: A unified information-theoretic approach to groupwise nonrigid registration and model building. In: Christensen, G.E., Sonka, M. (eds.) IPMI 2005. LNCS, vol. 3565, pp. 1–14. Springer, Heidelberg (2005)
5. Zollei, L., et al.: Efficient population registration of 3D data. In: ICCV (2005)
6. Lorenzen, P., et al.: Multi-modal image set registration and atlas formation. Medical Image Analysis 10(3), 440–451 (2006)
7. Rueckert, D., et al.: Non-rigid registration using free-form deformations: Application to breast MR images. IEEE TMI 18(8), 712–721 (1999)
8. Marsland, S., et al.: Groupwise non-rigid registration using polyharmonic clamped-plate splines. In: Ellis, R.E., Peters, T.M. (eds.) MICCAI 2003. LNCS, vol. 2878, pp. 771–779. Springer, Heidelberg (2003)
9. Hammers, A., et al.: Three-dimensional maximum probability atlas of the human brain, with particular reference to the temporal lobe. HBM 19, 224–247 (2003)
10. van Leemput, K., et al.: Automated model-based bias field correction of MR images of the brain. IEEE TMI 18(10), 885–896 (1999)
11. Kullback, S., Leibler, R.A.: On information and sufficiency. Annals of Mathematical Statistics 22(1), 79–86 (1951)
12. Guimond, A., et al.: Average brain models: A convergence study. Computer Vision and Image Understanding 77(9), 192–210 (2000)
13. Rueckert, D., et al.: Automatic construction of 3-D statistical deformation models of the brain using nonrigid registration. IEEE TMI 22(8), 1014–1025 (2003)
14. Crum, W.R., et al.: Generalised overlap measures for assessment of pairwise and groupwise image registration and segmentation. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 99–106. Springer, Heidelberg (2005)
15. Rao, A., et al.: Spatial transformation of motion and deformation fields using nonrigid registration. IEEE TMI 23(9), 1065–1076 (2004)
A Comprehensive System for Intraoperative 3D Brain Deformation Recovery

Christine DeLorenzo¹, Xenophon Papademetris¹,², Kenneth P. Vives³, Dennis D. Spencer³, and James S. Duncan¹,²

¹ Departments of Biomedical Engineering, ² Diagnostic Radiology, and ³ Neurosurgery, Yale University, P.O. Box 208042, New Haven, CT 06520-8042, USA
{christine.delorenzo, xenophon.papademetris, kenneth.vives, dennis.spencer, james.duncan}@yale.edu
Abstract. During neurosurgery, brain deformation renders preoperative images unreliable for localizing pathologic structures. In order to visualize the current brain anatomy, it is necessary to nonrigidly warp these preoperative images to reflect the intraoperative brain. This can be accomplished using a biomechanical model driven by sparse intraoperative information. In this paper, a linear elastic model of the brain is developed which can infer volumetric brain deformation given the cortical surface displacement. This model was tested on both a realistic brain phantom and in vivo, proving its ability to account for large brain deformations. Also, an efficient semiautomatic strategy for preoperative cortical feature detection is outlined, since accurate segmentation of cortical features can aid intraoperative cortical surface tracking.
1 Introduction
Successful neurosurgical interventions require precise localization of pathologic tissue. Localization inaccuracies, most often due to intraoperative brain deformation [17], can lead to unsuccessful resections, impairment of physical or mental abilities or death. Despite these consequences, compensating for the effects of brain shift remains an open problem. Biomechanical models guided by sparse intraoperative data can provide 3D deformation recovery based on the brain's material properties. These brain models vary greatly in complexity and computational expense. Some authors propose nonlinear [17] or anisotropic [6] models. These detailed models require more computation, brain parameter estimation or information that is infrequently obtained from neurosurgical patients, such as white matter structure obtained from Diffusion Tensor Imaging (DTI). While this additional information, when available, may improve deformation calculations in some areas of the brain, the comparison to homogeneous linear models has been limited and generally based on small deformations. At the same time, accurate results have been achieved with linear elastic models (LEMs). In experiments done by Paulsen et al. [12],
Navier’s equation was shown to predict the displacement of brain tissue when subject to a body force. In interesting extensions of this work [4,8], the poroelastic model achieved accurate results when guided by surface data or fiducial marker displacement and solved inversely. In [15], it was shown that volumetric results could be obtained by relying solely on direct measurements of the cortical surface as displacement boundary conditions for an LEM. Based on these promising results, an LEM was chosen in this work. The model depends on two parameters of brain tissue, Young’s Modulus and Poisson’s ratio, both of which can be easily obtained from the literature [7]. The boundary conditions of the biomechanical model are based on the fixed skull and the intraoperative displacement of cortical surface. In previous related work, we have calculated intraoperative cortical surface displacement using a deformable model guided by stereo camera images [2,3]. In our work, as well as in similar research performed by other groups [1,9,16], the importance of using cortical features to aid the surface deformation estimation has been established. Therefore, in this paper, we have developed a novel semiautomatic 3D sulci extraction technique, developed specifically for this application, which affords easier and more reliable segmentation of preoperative features.
2 Method
The method for brain shift compensation can be divided into three main steps. (1) Preoperative Processing: in this step, the cortical surface and the sulci on that surface are extracted; the semiautomatic process for performing this extraction is developed below. (2) Intraoperative Deformation Estimation: the displacement of the exposed cortical surface is determined at this stage. Though this step is very important for the overall result, it is not the focus of this paper; the cortical surface displacement field can be obtained from any deformation tracking algorithm [1,2,3,9,16]. However, this surface deformation must then be propagated through the brain volume. (3) Volumetric Determination: this is the method of estimating brain volumetric changes, in this case using a linear elastic model. The contributions of this paper are the development of steps (1) and (3).

2.1 Preoperative Processing
In order to extract the preoperative cortical surface and feature positions, the preoperative MRI is first skull stripped and segmented into brain and nonbrain regions. This is performed automatically using the freely available BioImage Suite software [10]. The resulting image, I, is a mapping from spatial position to a greylevel intensity, I : R³ → R. A level set surface of this image is the set of points {x ∈ R³ : I(x) = c}, where c is a constant value. Since a large gradient exists between the brain and nonbrain regions, an appropriate value of c (normally around one-third of the maximum intensity) will yield a level set corresponding to the brain surface.
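The extraction above is done in BioImage Suite [10]; purely as an illustration, an equivalent isosurface extraction can be sketched with scikit-image (an assumption, not the authors' tooling), choosing c as one-third of the maximum intensity:

import numpy as np
from skimage import measure

def brain_surface(image):
    """Extract the level set {x : I(x) = c} as a triangulated surface."""
    c = image.max() / 3.0                     # heuristic from the text
    verts, faces, normals, values = measure.marching_cubes(image, level=c)
    return verts, faces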
Fig. 1. The left side shows the transition between an exposed vertex (yellow) and a vertex occluded by face fm (green) on one slice of a brain MRI. Explanation of the variables is in the text. On the right, the entire brain surface is shown. The area of the craniotomy is outlined in magenta. The yellow points are automatically detected by the algorithm as transition points between exposed and occluded mesh vertices. Using these as a guide allows easier and more accurate sulci detection.
Unlike the gyri, which are fully exposed, the sulci can run deep under the surface. Therefore, determining the points at which the triangulated surface vertices transition from being exposed to being occluded should guide the sulci extraction. To determine if the nth vertex, vn, is occluded by the mth face of the surface, fm, first the plane of fm is determined through simple geometry:

n⃗m = (vm2 − vm1) × (vm3 − vm1),    n⃗m · (x − vm1) = 0    (1)
where n⃗m is the normal to face fm, (vm1, vm2, vm3) define the vertices of face fm, and x is a point on the plane. (See the left side of Figure 1.) To be occluded by face fm, a line originating at point vn and traveling in the direction of the viewer, vn + d⃗·t, where t is the parameterization constant and d⃗ is the viewer direction, must intersect the plane defined in equation (1) inside face fm. In this case, the viewer direction is defined as being normal to the current face which contains vn. The intersection point, p, can be found by solving the simultaneous equations:

p = vn + d⃗·tp,    where    n⃗m · (p − vm1) = 0    (2)
where tp is the value of the parameter that yields the intersection. Once the intersection with the plane is calculated, one only needs to determine if p lies inside face fm. Point p is within the face fm if, for each pair of vertices defining one edge of fm (vm1vm2, vm2vm3, vm1vm3), p is on the same side of that edge as the remaining vertex (vm3, vm1, vm2, respectively). To determine, for example, if p is on the same side of vm1vm2 as vm3, one needs to ensure that ((vm2 − vm1) × (p − vm1)) · ((vm2 − vm1) × (vm3 − vm1)) ≥ 0. If the above inequality holds true for all three edges, then the point lies within that triangular face [5]; in this case, it also means that the point is occluded by that face. This test is performed on all the vertices in the craniotomy region.
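As an illustration, this test transcribes directly into numpy (a sketch; the vertex coordinates and viewing direction d are assumed given as 3-vectors):

import numpy as np

def same_side(a, b, p, ref):
    """True if p and ref lie on the same side of edge a->b,
    using the cross-product test from the text."""
    return np.dot(np.cross(b - a, p - a), np.cross(b - a, ref - a)) >= 0

def occluded_by_face(vn, d, vm1, vm2, vm3, eps=1e-12):
    """Test whether vertex vn, viewed along direction d, is occluded
    by the triangle (vm1, vm2, vm3)."""
    n = np.cross(vm2 - vm1, vm3 - vm1)        # face normal, Eq. (1)
    denom = np.dot(n, d)
    if abs(denom) < eps:                      # ray parallel to the plane
        return False
    tp = np.dot(n, vm1 - vn) / denom          # solve Eq. (2) for tp
    if tp <= 0:                               # intersection behind vn
        return False
    p = vn + tp * d
    return (same_side(vm1, vm2, p, vm3) and
            same_side(vm2, vm3, p, vm1) and
            same_side(vm1, vm3, p, vm2))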
Table 1. Errors (in mm) were calculated as the distance between the predicted surface and twelve intraoperatively tracked points used for validation. Details about the deformation algorithm, which requires the preoperative locations of the sulci, can be found in [2].

                                                                  mean   max    std
Total Deformation                                                 4.49   7.24   2.03
Deformation Algorithm Error (Manually Segmented Sulci)            1.05   3.08   0.86
Deformation Algorithm Error (Semiautomatically Segmented Sulci)   0.58   1.01   0.22
The vertices that are on the border between exposed and occluded can then be automatically highlighted. These vertices signal the edge of a sulcus and can be used as a guide in sulci extraction. (See the right side of Figure 1.) At this point, the user can choose to connect the highlighted vertices in whichever order or orientation needed. The main drawback of this method is that it is dependent on view angle. We are currently working on extending this method by repeating the above process for varying view angles and combining the results to surround the sulci from both sides. This would make the method more robust and may require less user input in the final stages. The single viewpoint method did, however, provide promising initial results. It is difficult to compare the accuracy of the semiautomatically detected sulci versus the manually outlined sulci, since there is no ground truth. However, since the goal of the sulcal extraction is to aid in cortical surface detection, the results can be evaluated based on which method enables better cortical surface tracking. Using an in vivo test case, results of a surface tracking algorithm (details of the algorithm are provided in [2]) were compared using either manually segmented (as they were in [3]) or semiautomatically segmented sulci as algorithm inputs. As Table 1 shows, the semiautomatically detected sulci provided a more accurate initialization of the algorithm and therefore better results.

2.2 Intraoperative Deformation Estimation
Once the cortical surface and sulci have been extracted, the next step is to calculate the intraoperative deformation. Methods for intraoperative deformation estimation often rely on cortical surface features, such as sulci or blood vessels, warping the features from the preoperative images to their intraoperative positions using robust point matching [1], an iterative closest point technique [16], or a deformable model [2,3]. Whether blood vessels or sulci are used, accurate sulcal segmentation may aid the process since vessels often parallel the sulcal and gyral patterns [9]. Therefore, the guided sulcal extraction can aid most deformation estimation algorithms. The details of the deformation estimation algorithms are beyond the scope of this paper. Regardless of the method used, however, the result can be formulated as a dense displacement field over each node of the exposed cortical surface.
2.3 Volumetric Determination
The dense displacement field found in the previous step can be applied to the brain volume using finite element analysis, in which the brain is modeled as a linear elastic material. Given the displacements of points on the intraoperative cortical surface, pi, calculated from the surface detection algorithm, the deformation throughout the brain volume, V, can be calculated using an energy minimization framework [11]:

v̂ = arg min_v { ∫_V W(α, v, m) + Σ_{i=1}^{N} c(pi) |u(pi) − v(pi)|² }    (3)

where N is the total number of cortical surface points used to guide the model, u(pi) is the known 3D displacement at each point pi, and v(pi) is the 3D LEM displacement at pi. The confidence in each measurement is set by c(pi), through which the surface displacements are imposed as boundary conditions. W(α, v, m) is a positive semi-definite functional defining the approximation strategy which, in this case, is internal energy. It is defined as W = εᵀCε, a function of α, the parameter vector, the displacement field, v, and m, the spatial position. The energy minimization was performed using the finite element analysis software package ABAQUS (ABAQUS, Inc., Providence RI). The model inputs were the tetrahedral brain volume mesh, a surface representing the surrounding skull, Young's modulus (66.7 kPa) and Poisson's ratio (0.48) of brain [7]. Surface displacements were applied as boundary conditions and the cortex and skull surfaces were defined as contact surfaces, enforcing that the brain will not deform past the rigid confines of the skull. This constraint keeps the deformation physically meaningful because it allows the entire surface to sink inward or along the skull surface; however, only in the region of the craniotomy can it bulge outward. The output of the finite element analysis is the displacement at every node in the brain mesh. Since the goal is to produce a displacement field at every point of the original MRI image from which the mesh was obtained, it was necessary to resample the resulting displacements in image space. This was performed using trilinear interpolation.
3 Results
The above method has been applied to both phantom and in vivo cases. Although the ultimate test of the algorithm lies in its application to in vivo cases, the brain phantom provides an opportunity to test the volume finite element analysis completely before it is applied to patient data.

3.1 Volume Phantom Validation
The brain phantom, described in detail in [3], consists of a skull mold modeled from a neurosurgical patient’s MRI. This mold has a removable section corresponding to the site of that patient’s craniotomy. The brain tissue is simulated
Fig. 2. The left side shows the same slice of an Initial, Predicted, and Deformed (for validation) Phantom MR Image. The inner balloon (dark arrow) collapses in the Deformed Image and this is captured by the Predicted Image. Qualitatively, the Predicted Image appears similar in shape and deformation pattern as the image taken for validation. For quantitative comparison, the locations of nine landmarks were obtained from each of the MR images with the locations of the Deformed Image considered as the ground truth. The mean errors (16.56 mm for Initial and 2.93 mm for Predicted) are shown with dashed lines in the graph on the right.
using Sylgard 527 Silicone Dielectric Gel (Dow Corning, Midland, MI), which has been proven to have the same mechanical properties as brain [13]. Deformation was created by inflating or deflating a balloon in the center of the silicone gel. To account for this balloon in the model, an additional boundary condition was set such that the deformation would cause the balloon to collapse. This was necessary due to the volume of the balloon, which is large relative to the phantom brain volume, and does not collapse physiologically in the direction of gravity, but rather inward, toward the balloon center. The total volumetric deformation can be tracked using MRI to view the MR-opaque plastic markers uniformly placed throughout the phantom volume. Figure 2 shows the accuracy of the volumetric deformation result. Although the mean deformation of the markers was quite high (16.56 ± 6.27 mm), the model was able to track most of the movement. The mean error was reduced to under 3 mm, accounting for 82.5% of the total deformation.

3.2 In Vivo Data
Encouraged by the results of the phantom experiments, volume deformation recovery was then applied to in vivo cases. As explained in Section 2.3, the position of the skull was used to constrain the deformation of the surface nodes that do not lie within the craniotomy region. This allows for a more physiological deformation than fixing the surface nodes that are not exposed. Figure 3A illustrates this effect on a case in which a bilateral craniotomy was performed. With fixed surface nodes, the regions near the craniotomy cannot move, even when the deformation becomes large. This effect is most obvious in the region indicated
Fig. 3. A) A slice of the preoperative MRI (left) which has either been deformed using fixed surface nodes (middle) or a skull constraint (right). The red and yellow spheres indicate the surface points acquired during surgery. The aqua arrows are located in the same location on each image. B) One slice of the preoperative (right) and predicted intraoperative initial (middle) and final (left) MR image. As the surface sinks toward the midline over time, the deformation is propagated through the hemisphere and the ventricle begins to collapse. The red spheres were acquired by the neurosurgeon two hours into surgery and the yellow spheres were acquired 75 minutes later. C) Volume Renderings of the Preoperative (left) and Predicted (middle and right) MR Images. The same red and yellow spheres from (B) are plotted here relative to the volume renderings of the deforming brain.
by the aqua arrow, which is located in the same relative position on all three images. The deformation decreases sharply to zero outside the craniotomy region if the surface nodes are fixed. However, when constrained by the skull, the region indicated by the aqua arrow is allowed to deform inward as well, resulting in a more natural deformation. To check for deformation consistency, volumetric model deformation calculations also were performed at two time points of a single surgery. Figure 3B shows the surface deformation over time relative to a set of intraoperative points acquired 2 hours into surgery (red) and 3.25 hours into surgery (yellow). Though a rigid midline was not added as a constraint, the relatively small amount of deformation confines the movement mostly to a single hemisphere. However, the deformation is significant enough to cause a noticeable change in the size of the ipsilateral ventricle. Figure 3C shows the same deformation in a 3D view.
4 Discussion
A simple, efficient method of semiautomatic sulci segmentation was outlined in this paper. Using these semiautomatically extracted sulci resulted in better cortical surface tracking. Although accurate sulcal segmentation of the whole brain has been proposed before, through the use of deformable atlases or other techniques such as neural networks [14], these algorithms are more computationally expensive and time-consuming than the simple sulcal extraction proposed here. Though some types of research may require the use of these other methods, this work has shown that for the purposes of brain deformation recovery, the method proposed in this paper produces reliable results. The proposed LEM also produced promising results. In phantom experiments, the LEM was able to compensate for large deformations, as are often seen intraoperatively. However, the need for an additional term to account for the internal balloon’s collapse may indicate that a more detailed model, such as one which takes into account cerebrospinal fluid drainage or the collapse of fluid-filled ventricles, is necessary in cases of large in vivo deformations. Future work with this model will therefore involve quantitative analysis of the in vivo data and assessment using volumetric intraoperative imaging. However, qualitatively, the proposed biomechanical model produced consistent and visually accurate results in vivo.
References
1. Cao, A., et al.: Tracking cortical surface deformations based on vessel structure using a laser range scanner. In: ISBI, Washington, DC, pp. 522–525 (2006)
2. DeLorenzo, C.: Image-Guided Intraoperative Brain Deformation Recovery. PhD thesis, Yale University (December 2007)
3. DeLorenzo, C., et al.: Nonrigid 3D brain registration using intensity/feature information. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 932–939. Springer, Heidelberg (2006)
4. Dumpuri, P., et al.: An atlas-based method to compensate for brain shift: Preliminary results. Medical Image Analysis 11(2), 128–145 (2007)
5. Ericson, C.: Real-Time Collision Detection. Morgan Kaufmann Publishers, San Francisco, California (2005)
6. Kemper, C., et al.: An anisotropic material model for image guided neurosurgery. In: Barillot, C., Haynor, D.R., Hellier, P. (eds.) MICCAI 2004. LNCS, vol. 3216, pp. 267–275. Springer, Heidelberg (2004)
7. King, A., et al.: WSU Brain Injury Model, http://ttb.eng.wayne.edu/brain/
8. Lunn, K., et al.: Assimilating intraoperative data with brain shift modeling using the adjoint equations. Medical Image Analysis 9(3), 281–293 (2005)
9. Nakajima, S., et al.: Use of cortical surface vessel registration for image-guided neurosurgery. Neurosurgery 40, 1201–1210 (1997)
10. Papademetris, X., et al.: BioImage Suite: An integrated medical image analysis suite, Yale School of Medicine, http://www.bioimagesuite.org
11. Papademetris, X., et al.: Computational Models for the Human Body (Handbook of Numerical Analysis), pp. 551–590. Elsevier B.V., The Netherlands (2004)
12. Paulsen, K., et al.: A computational model for tracking subsurface tissue deformation during stereotactic neurosurgery. IEEE Trans. Biomed. Eng. 46(2), 213–225 (1999)
13. Puzrin, A., et al.: Image guided constitutive modeling of the silicone brain phantom. In: Proc. SPIE, vol. 5744, pp. 157–164 (2005)
14. Rivière, D., et al.: Automatic recognition of cortical sulci of the human brain using a congregation of neural networks. Medical Image Analysis 6(2), 77–92 (2002)
15. Škrinjar, O., et al.: Model-driven brain shift compensation. Medical Image Analysis 6(4), 361–373 (2002)
16. Sun, H., et al.: Using cortical vessels for patient registration during image-guided neurosurgery - A phantom study. In: Proc. SPIE, vol. 5029, pp. 183–191 (2003)
17. Wittek, A., et al.: Brain shift computation using a fully nonlinear biomechanical model. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 583–590. Springer, Heidelberg (2005)
Bayesian Tracking of Tubular Structures and Its Application to Carotid Arteries in CTA

Michiel Schaap, Rashindra Manniesing, Ihor Smal, Theo van Walsum, Aad van der Lugt, and Wiro Niessen

Biomedical Imaging Group Rotterdam, Departments of Radiology and Medical Informatics, Erasmus MC - University Medical Center Rotterdam
[email protected]
Abstract. This paper presents a Bayesian framework for tracking of tubular structures such as vessels. Compared to conventional tracking schemes, its main advantage is its non-deterministic character, which strongly increases the robustness of the method. A key element of our approach is a dedicated observation model for tubular structures in regions with varying intensities. Furthermore, we show how the tracking method can be used to obtain a probabilistic segmentation of the tracked tubular structure. The method has been applied to track the internal carotid artery from CT angiography data of 14 patients (28 carotids) through the skull base. This is a challenging problem, owing to the close proximity of bone, overlap in intensity values of lumen voxels and (partial volume) bone voxels, and the tortuous path of the vessels. The tracking was successful in 25 cases, and the extracted paths were found to be close (< 1.0mm) to manually traced paths by two observers. Keywords: Bayesian tracking, Elongated structures, Bhattacharyya metric, Carotid arteries.
1 Introduction
Segmentation and tracking of tubular elongated structures is an important goal in a wide range of biomedical imaging applications. Especially vessel tracking in medical images has received considerable attention, as it can e.g. be used as a preprocessing step towards the detection and characterization of vessel pathology, such as stenoses. In this work we present a new vessel tracking method which is based on a probabilistic approach. Until recently almost all vessel tracking methods were deterministic in their approach; only one assumption of the track configuration is taken into account during tracking, e.g. [1,2]. The path or segmentation is found by updating the track to the most probable configuration at each iteration of the method. This may lead to incorrect tracking results because the path direction cannot be derived from imaging data locally, e.g. owing to pathologies, corrupted or missing data. By taking into account multiple hypotheses during tracking this problem
can be circumvented. Probabilistic methods for tracking elongated structures have been presented in a number of recent papers [3,4]. These methods search globally for the path that reflects the priors as well as possible. One of the important considerations when developing a tubular structure tracking algorithm is an observation model which describes the appearance of the tubular structures in the image, given the parameters of the tube (e.g. position and orientation). The two mentioned probabilistic approaches [3,4] use an observation model that is based on the assumption that vessels are bright relative to their background. This model is not suited for situations where the background contains both lower and higher intensities than the tube, which is e.g. the case for carotid arteries, which are surrounded by low intensity soft tissue and high intensity bone. This article is based on the method presented in [4]. The novelty of this new work is threefold. First, we present a generic and intuitive observation model to overcome the problems associated with the bright vessel observational model. The new observation model is specifically tailored for tracking homogeneous tubular structures through backgrounds with varying intensities. Second, the article presents a method to reduce the possibility of tracking sections of tubular structures multiple times. Third, the Bayesian tracking methodology is used to obtain a rough segmentation of the vessel. The presented method is evaluated by tracking the internal carotid arteries through the skull base, which is a challenging problem. The success rate and accuracy of the tracking are assessed quantitatively by comparing results on 14 datasets (28 carotids) to tracings by two observers.
2 Bayesian Tracking of Tubular Structures
This section presents our Bayesian tracking approach. For a reference on the methods and notations used in this article we refer to the work of Doucet et al. [5]. In Bayesian tracking, the algorithm searches for the track that is best explained by the data, given an observation model (which is based on our model of a tube), and prior information on the shape and appearance of the vessel (modeled via e.g. transition priors). The posterior probability density function of a track, given the data, is recursively estimated. This is achieved by adding new hypotheses to a set of currently most likely hypotheses. At the end, the maximum of the posterior probability is chosen as the track that best represents the vessel. In the following, we first describe our model of a tube, and how that model is implemented in our observation model. Next, it is explained how the posterior probability density can be estimated ('prediction'), and lastly the mechanism for generating new hypotheses is explained ('update').

2.1 Tube Model
Our tracking method will consider a tube as a series of tube segments. A tube segment at iteration t is described by its location p_t = (x_t, y_t, z_t)ᵀ, orientation v_t = (θ_t, φ_t), radius r_t, intensity I_t and intensity variance σ_t. Thus each tube segment is characterized by a state vector x_t = (p_t, v_t, r_t, I_t, σ_t)ᵀ. This results in a tube configuration described by x_0:t ≜ {x_0, . . . , x_t}, see Figure 1(a). With every tube segment we associate a region of interest (ROI) U, defined by the components p_t, r_t, and v_t of x_t (see Fig. 1b). Subsequently, we let z_t denote the image measurements (i.e. image intensities) within this ROI. Hence, all measurements corresponding to tube x_0:t are denoted with z_0:t. S(x_t) defines the set of spatial coordinates that lie within the hypothesized tube and B(x_t) = U\S(x_t) defines the set of spatial coordinates in the band around the tube, see Figure 1(b).

2.2 Observation Model
With p(I|S(x_t)) and p(I|B(x_t)) we describe the normalized intensity histograms of the voxels at the inside and outside of the tube segment respectively. These distributions are constructed by sampling, with nearest neighbor interpolation, from the ROI defined by x_t.
Fig. 1. Fig. (a) shows a part of the tube configuration x0:t . Fig. (b) visualizes the region in the tube S(xit ) and the region in the band B(xit ) around the tube. The prediction of new tubular segments, as explained in Section 2.3, is presented in Fig. (c).
In our observation model, we assume that tubes have a homogeneous intensity with additive Gaussian disturbance, which differs from the intensity of surrounding tissue. Given the tube segment x_t and the measurements z_t, the likelihood of the observation given the state, p(z_t|x_t), is given by

p(z_t|x_t) ∝ p(z_t|p_t, v_t, r_t, I_t, σ_t) = D_cp,t (1 − D_sb,t) p_M(x_t) p(r_t)    (1)
where D_cp,t and D_sb,t denote the similarity between the intensity distributions of the current and the previous segment and the similarity between the intensity distribution of the inside and outside of the tube, respectively. Furthermore p_M(x_t) is a spatial prior that is used to prevent loops in the tracked tube and p(r_t) is a user-defined and application specific prior for the expected radii. The two intensity measures are calculated as follows:

D_cp,t = D(N(I|Î_s,t, σ̂²_s,t), p(I|I_t, σ²_t))^c1    (2)
D_sb,t = D(N(I|Î_s,t, σ̂²_s,t), p(I|B(x_t)))^c2    (3)
where N(·|μ, σ²) indicates a normal distribution with mean μ and variance σ², Î_s,t and σ̂²_s,t describe the mean and variance of the histogram p(I|S(x_t)), and c1 and c2 are parameters to regulate the influence of the different components. Several methods have been suggested in literature to calculate the similarity between two distributions. We use the Bhattacharyya Metric (BM) [6], defined by

D(p1, p2) = BM(p1, p2) = ∫ √(p1(x) p2(x)) dx    (4)

For the prior p_M(x_t) we store for each voxel the probability that the voxel has not been identified as being part of the tubular structure in a spatial map M_t, with M_0 = 1. This map is updated after each iteration; the update step is described in Section 2.5. The prior is constructed by averaging over all values of this map that fall within the ROI described by S(x_t):

p_M(x_t) = Σ_{p∈S(x_t)} M_t(p) / |S(x_t)|    (5)

where |·| defines the set size operator. This prior serves two purposes. First, it is less likely that sections of the image are tracked twice. Second, this map presents a rough tube segmentation.
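For discrete, normalized histograms defined on shared bins, the integral in Eq. (4) reduces to a sum. A minimal sketch (the Gaussian-fitting helper is an illustration, not a detail of the paper):

import numpy as np

def bhattacharyya(p1, p2):
    """Bhattacharyya metric of Eq. (4) for two normalized histograms
    on the same bins; equals 1 for identical distributions."""
    return np.sum(np.sqrt(p1 * p2))

def gaussian_hist(bin_centers, mu, var):
    """Normalized Gaussian N(.|mu, var) sampled at the histogram bins."""
    g = np.exp(-0.5 * (bin_centers - mu) ** 2 / var)
    return g / g.sum()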
2.3 Prediction
Using the Bayesian rule, the pdf p(x_0:t|z_0:t), that describes the posterior probability of the tube configuration, having all the observations up to iteration t, can be estimated with the following recursion [5]:

p(x_0:t|z_0:t) ∝ p(x_t|x_{t−1}) p(z_t|x_t) p(x_0:t−1|z_0:t−1)    (6)
where the transition prior p(x_t|x_{t−1}) is assumed to be Markovian (x_t only depends on x_{t−1} and not on any other past states) and is factorized as

p(x_t|x_{t−1}) ∝ p(p_t, v_t|p_{t−1}, v_{t−1}) p(r_t|r_{t−1})    (7)
assuming that there is no transition model for I_t and that all intensities are equally probable. The likelihood p(z_t|x_t) relates the conditionally independent measurements at iteration t to the state x_t, as defined in equation (1). At each iteration step, we represent the probability of the tube configuration with a set of N_t weighted states X_0:t = {x^i_0:t, w^i_t}_{i=1}^{N_t}, thus

p(x_0:t|z_0:t) = Σ_{i=1}^{N_t} w^i_t δ(x_0:t − x^i_0:t)    (8)

where δ(·) is the Dirac delta function and the weights are normalized such that Σ_{i=1}^{N_t} w^i_t = 1.
In each iteration we use the variance of the weights w^i_t to determine which hypotheses should be kept in the next iteration. The N_e most probable hypotheses are kept according to the weights w^i_t, i ∈ {1, ..., N_t}, where

N_e = 1 / Σ_{i=1}^{N_t} (w^i_t)²    (9)
From each of these states, N^i_t = min(nint(w^i_t N), N_max) new states are created, where nint(·) denotes a nearest integer round, N is pre-defined and describes the maximum total number of hypotheses created, and N_max describes the maximum number of hypotheses created from one hypothesis. This approach will keep only the relevant hypotheses and effectively distribute them according to the described pdf.
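A compact numpy sketch of this selection step, combining Eq. (9) with the child allocation just described (array layout assumed):

import numpy as np

def select_and_allocate(weights, N, N_max):
    """Keep the Ne most probable hypotheses (effective sample size,
    Eq. (9)) and return how many children each kept hypothesis spawns."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    Ne = int(1.0 / np.sum(w ** 2))            # Eq. (9), 1 <= Ne <= Nt
    keep = np.argsort(w)[::-1][:Ne]           # indices of kept states
    children = np.minimum(np.rint(w[keep] * N).astype(int), N_max)
    return keep, children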
2.4 Update
In the update step, prior knowledge on the curvature of the centerline, the variance of the tube radius along the centerline, the intensity variance in the tube and the contrast-to-noise ratio in the image is incorporated. This is achieved as follows. The formation of the N^i_t new hypotheses x^j_t, j = {0, ..., N^i_t − 1}, at iteration t from the previous hypothesis x^i_{t−1} consists of a transition to a new position p_t, which is deterministically defined by

p^j_t = p^i_{t−1} + R_z(θ_{t−1}) R_y(φ_{t−1}) R_z(ϑ^j_t) R_y(φ^j_t) (0, 0, l)ᵀ    (10)
where R_z(·) and R_y(·) are rotation matrices around the z- and y-axis [7]. The length of a tube segment, l, is user-defined. This parameter has influence on the propagation speed, the curvature of the track, and the possible contrast-to-noise ratio of the tube. Figure 1(c) gives a schematic explanation of the transition in (10). The angles (ϑ^j_t, φ^j_t) describe a point in the local spherical coordinate system with the z-axis orientated in the direction of v^i_{t−1} and origin at p_{t−1}. Therefore, the angle φ^j_t is equal to the enclosed angle between v_{t−1} and v_t. The two angles (ϑ^j_t, φ^j_t) are constructed with an algorithm that uniformly distributes points on a sphere, as described by Saff and Kuijlaars [8]. This algorithm is used to distribute the N^i_t new hypotheses uniformly on the half sphere in front of p_{t−1} oriented in the direction of v_{t−1}. In this case, the transition density is given by

p(p_t, v_t|p^i_{t−1}, v^i_{t−1}) = Σ_{j=1}^{N^i_t} ω̃_j δ(p_t − p^j_t)    (11)

where the weight ω̃_j of a given enclosed angle is given by

ω̃_j = ω(φ_j) / Σ_{k=1}^{N^i_t} ω(φ_k)    (12)

ω(φ) = N(φ|0, σ²_φ)    (13)

with σ_φ being a pre-defined parameter.
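A quasi-uniform half-sphere sampling in the spirit of Saff and Kuijlaars [8] can be sketched as a spiral construction; this is an assumption about the exact construction used, not a transcription of it:

import numpy as np

def half_sphere_points(n):
    """Spiral-based quasi-uniform directions on the forward half sphere.

    Returns (theta, phi): phi is the enclosed angle with the local
    z-axis (the previous segment direction), theta the azimuth."""
    theta = np.zeros(n)
    phi = np.zeros(n)
    for k in range(n):
        h = k / max(n - 1, 1)                 # h in [0, 1]: forward half
        phi[k] = np.arccos(h)                 # enclosed angle
        if k > 0:
            step = 3.6 / np.sqrt(n * (1.0 - h * h) + 1e-12)
            theta[k] = (theta[k - 1] + step) % (2.0 * np.pi)
    return theta, phi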
The transition of r_t is described by r_t = r_{t−1} + η_t, where η_t is an uncorrelated Gaussian random variable with variance σ²_r. In this case p(r_t|r_{t−1}) = N(r_t|r_{t−1}, σ²_r). We estimate the radius of the patch r^j_t as follows:

r̂^j_t = arg max_{r_t} p(z_t|p^j_t, φ^j_t, r_t, I_{t|t−1}) p(r_t|r_{t−1})    (14)

Then, the intensity and variance variables of the state vector are updated according to I^i_t = Î^i_{s,t−1} and σ^i_t = σ̂^i_{s,t−1}.
2.5 Tube Probability Update
At each iteration we calculate, for each voxel, the probability that it has been identified as being part of the tube: P_t(p) = Σ_{∀i: p∈S(x^i_t)} w^i_t. P_t is then used to update the map M_t according to M_t = M_{t−1} (1 − P_t).
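This update reduces to a weighted accumulation over the hypothesized tube interiors; a minimal sketch (the voxel-index representation of S(x^i_t) is assumed):

import numpy as np

def update_tube_map(M, hypotheses, weights):
    """One iteration of the spatial-map update M_t = M_{t-1}(1 - P_t).

    `hypotheses` yields, per state x_t^i, the voxel indices in S(x_t^i)."""
    P = np.zeros_like(M)
    for S_idx, w in zip(hypotheses, weights):
        P[S_idx] += w                         # P_t(p): sum of weights
    return M * (1.0 - P)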
3 Carotid Arteries Through Skull Base
The method is evaluated by tracking the internal carotid arteries (ICAs) in CTA data covering the anatomical region containing the skull base. This is a challenging problem because of the small spatial separation and strong overlap in intensity values of the ICAs and the skull base. A standard approach is the use of an additional CT scan to mask high intensity structures, but this is associated with increased radiation dose and increased acquisition time for the patient. Recently, methods have appeared that directly track the vessels in CTA [9,10,11], but these either lack methodological details [9] and/or an extensive validation [10,11]. The method described in [11] seems to be the most promising to date, but evaluation was limited to visual inspection by observing whether the tracked path was fully contained within the ICAs. In this work we quantitatively compare the results of our method to manually traced paths by two observers.

3.1 Data Acquisition and Parameter Settings
Fourteen consecutive patients suffering from a transient ischemic attack underwent a CTA examination. The data, acquired on a 16-slice CT scanner (Siemens Somatom Sensation 16, Forchheim, Germany) were reconstructed using a B46f kernel, and subsampled in-plane with linear interpolation to a 256 × 256 matrix, resulting in voxel sizes of 0.5×0.5×1.2mm, in order to reduce the computational costs. Visual inspection showed that 24 out of 28 carotids contained some form of pathology, 16 carotids showed mild calcifications, stenoses or aneurysms and eight carotids contained severe calcifications and/or stenoses. Parameters of the method were fixed and empirically selected: N = 1000, Nmax = 50, c = 10, s = 30, σϕ2 = 2.0, σr2 = 0.5, p(rt ) = N (rt |2.5mm, 0.5mm2 ), and l = 4.5mm.
3.2 Evaluation Method
The algorithm is initialized by placing a small vector in the vessel just after the carotid bifurcation. The method then iterates until a vessel length of at least 180 mm is tracked, to ensure that each tracked path goes beyond the region containing the skull base and reaches the Circle of Willis. The segment length is 4 mm, therefore the final path will contain 46 points. For comparison, two observers manually traced both ICAs in each patient (resulting in a total of 56 manually traced paths). Points were first clicked on the axial slices and then visually inspected on the MPR images to determine whether the point was properly centralized. Points could be corrected on the MPR images if needed. A comparison between two paths is made segment-wise, by finding the minimum distances of two neighboring points on the first path to the second path. The radii of ICAs are approximately in the range of 1.0mm-2.5mm. If both distances are found to be smaller than 2.5 mm, then this path segment is considered to be corresponding. For the corresponding path segments the average distance is computed. In this way we evaluate both the tracking success rate and its accuracy. Both criteria are applied in the path comparisons between the method and observers, M → O{1,2}, and between the observers, O1 ↔ O2.
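As an illustration of this comparison (a sketch; point-to-path distance is approximated here by the distance to the nearest reference point):

import numpy as np

def compare_paths(path, ref, tol=2.5):
    """Segment-wise path comparison as described above.

    A segment (two neighboring points of `path`) corresponds if both
    endpoints lie within `tol` mm of `ref`; returns the overlap
    fraction and the mean distance over corresponding segments."""
    d = np.linalg.norm(path[:, None, :] - ref[None, :, :], axis=-1).min(axis=1)
    seg_ok = (d[:-1] < tol) & (d[1:] < tol)
    seg_dist = 0.5 * (d[:-1] + d[1:])
    overlap = seg_ok.mean()
    avg = seg_dist[seg_ok].mean() if seg_ok.any() else np.nan
    return overlap, avg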
4 Results
The results are summarized in Tables 1 and 2. An example of the results is shown in Figure 2. Overall, 95% of the tracked segments were found within 2.5mm of the annotated centerlines and for these segments the average distance was approximately 0.7mm, which is 0.2mm larger than the interobserver variability. Furthermore, the average distance of the corresponding segments per track was always smaller than 1.0mm. Seven tracks had an overlap of less than 100% with the track of one of the observers. Visual inspection showed that an annotation error was made by observer two, causing one of these cases. The other six tracks with less than 100% correspondence can be divided in a group of three (> 80%) where the track has some minor localization errors in the upper parts of the ICA and a group of three (< 80%) where the method failed to track from the seedpoint to the Circle of Willis. For these vessels, respectively 57, 72 and 78% was tracked.

Table 1. The results of the comparison between observer 1 (O1), observer 2 (O2), and the method (M). 'Overlap' shows the percentage of the total length of all 14 tracks found within 2.5mm. 'Avg. Dist.' shows for all corresponding segments the average distance in millimeters to the reference track, weighted by segment length.

             M → O1   M → O2   O1 → O2   O2 → O1
Overlap      95%      95%      99%       99%
Avg. Dist.   0.69     0.67     0.46      0.47
Table 2. The automatic tracks (M) categorized on the percentage found within 2.5mm of the reference standard, respectively observer 1 (O1) and 2 (O2)

           M → O1   M → O2
100%       22       22
90-100%    1        2
80-90%     2        3
<80%       3        2
Fig. 2. An example of an automatically generated curved planar reformatted image. The image shows calcifications in the internal carotid artery in the region after the skull base, depicted on the right side of the image.
5 Discussion and Conclusion
In this paper we have presented a tubular tracking approach within a Bayesian framework. This approach has a number of advantages over most existing tracking approaches. First, the method is flexible, since prior information can easily be incorporated. In our case, important prior information is incorporated through the observation model. Second, its non-deterministic character increases robustness of the method. The method has successfully been applied to the tracking of 28 ICAs through the difficult region of the skull base. The resulting paths were found to be close (< 1.0mm) to manually traced paths by two observers. Because the average computation time was approximately four minutes per carotid artery we believe that the method has large potential for clinical practice.
References
1. Aylward, S., Bullitt, E.: Initialization, noise, singularities, and scale in height ridge traversal for tubular object centerline extraction. IEEE Transactions on Medical Imaging 21(2), 61–75 (2002)
2. Lorigo, L.M., Faugeras, O.D., Grimson, W.E.L., Keriven, R., Kikinis, R., Nabavi, A., Westin, C.-F.: Curves: Curve evolution for vessel segmentation. Medical Image Analysis 5, 195–206 (2001)
3. Florin, C., Paragios, N., Williams, J.: Globally optimal active contours, sequential Monte Carlo and on-line learning for vessel segmentation. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 476–489. Springer, Heidelberg (2006)
4. Schaap, M., Smal, I., Metz, C., van Walsum, T., Niessen, W.: Bayesian tracking of elongated structures in 3D images. In: Karssemeijer, N., Lelieveldt, B. (eds.) IPMI 2007. LNCS, vol. 4584, pp. 74–85. Springer, Heidelberg (2007)
5. Doucet, A., Godsill, S., Andrieu, C.: On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, pp. 197–208 (2000)
6. Thacker, N.A., Aherne, F.J., Rockett, P.I.: The Bhattacharyya metric as an absolute similarity measure for frequency coded data. In: TIPR, vol. 34, pp. 363–368 (1998)
7. Weisstein, E.W.: Rotation Matrix. From MathWorld – A Wolfram Web Resource
8. Saff, E., Kuijlaars, A.: Distributing many points on a sphere. The Mathematical Intelligencer 19(1), 5–11 (1997)
9. Suryanarayanan, S., Mullick, R., Mallya, Y., Kamath, V., Nagaraj, N.: Automatic partitioning of head CTA for enabling segmentation. In: Fitzpatrick, J., Sonka, M. (eds.) SPIE Medical Imaging, vol. 5370, pp. 410–419 (2004)
10. Shim, H., Yun, I.D., Lee, K.M., Lee, S.U.: Partition-based extraction of cerebral arteries from CT angiography with emphasis on adaptive tracking. In: Christensen, G.E., Sonka, M. (eds.) IPMI 2005. LNCS, vol. 3565, pp. 357–368. Springer, Heidelberg (2005)
11. Manniesing, R., Viergever, M., Niessen, W.: Vessel axis tracking using topology constrained surface evolution. IEEE Transactions on Medical Imaging 26(3), 309–316 (2007)
Automatic Fetal Measurements in Ultrasound Using Constrained Probabilistic Boosting Tree

Gustavo Carneiro¹, Bogdan Georgescu¹, Sara Good², and Dorin Comaniciu¹

¹ Siemens Corporate Research, Integrated Data Systems Dept., Princeton, NJ, USA
² Siemens Medical Solutions, Innovations Ultrasound Div., Mountain View, CA, USA
Abstract. Automatic delineation and robust measurement of fetal anatomical structures in 2D ultrasound images is a challenging task due to the complexity of the object appearance, noise, shadows, and quantity of information to be processed. Previous solutions rely on explicit encoding of prior knowledge and formulate the problem as a perceptual grouping task solved through clustering or variational approaches. These methods are known to be limited by the validity of the underlying assumptions and cannot capture complex structure appearances. We propose a novel system for fast automatic obstetric measurements by directly exploiting a large database of expert annotated fetal anatomical structures in ultrasound images. Our method learns to distinguish between the appearance of the object of interest and background by training a discriminative constrained probabilistic boosting tree classifier. This system is able to handle previously unsolved problems in this domain, such as the effective segmentation of fetal abdomens. We show results on fully automatic measurement of head circumference, biparietal diameter, abdominal circumference and femur length. Unparalleled extensive experiments show that our system is, on average, close to the accuracy of experts in terms of segmentation and obstetric measurements. Finally, this system runs in under half a second on a standard dual-core PC computer.
1 Introduction
Accurate fetal ultrasound measurements are one of the most important factors for high quality obstetrics health care. Common fetal ultrasound measurements include: bi-parietal diameter (BPD), head circumference (HC), abdominal circumference (AC), and femur length (FL). These measures are used to estimate the gestational age (GA) of the fetus [11] and are an important diagnostic tool. Although prevalent in the clinical setting, the manual measurement by specialists of BPD, HC, AC, and FL presents the following issues: 1) the quality of the measurements is user-dependent, 2) the exam can take more than 30 minutes, and 3) specialists can suffer from Repetitive Stress Injury (RSI) due to these lengthy exams. The automation of these ultrasound measures has the potential of improving productivity and patient throughput, enhancing accuracy and consistency of measurements, and reducing the risk of RSI to specialists.
In this paper we present an on-line system that targets the accurate and robust detection and segmentation of fetal head, abdomen, and femur in ultrasound images. The segmentation information is then used to compute BPD, HC, AC, and FL. Our approach directly exploits the expert annotation of fetal anatomical structures in large databases of ultrasound images [3] to train a novel discriminative appearance classifier of simple image features derived from the probabilistic boosting tree classifier (PBT) [12]. Our method can handle two previously unsolved issues in the domain of fetal ultrasound imaging. First, our system is able to provide an accurate segmentation of the fetal abdomen. Second, the approach was designed to be totally automatic, so the user does not need to provide any initial guess. The only inputs to the system are the image and the measurement to be performed (BPD, HC, AC, or FL). Extensive experiments show that, on average, there is practically no difference between the measurement produced by our system and the annotation made by experts for the four fetal measurements mentioned above. Moreover, the algorithm runs in under half a second on a standard dual-core PC computer.
2 Literature Review
The detection and segmentation of fetal anatomical structures in ultrasound images have been studied by several researchers [1,4,5,9]. An important common point among these papers is that they heavily exploit the low-level structures of the imaging of the fetal anatomy, such as edge or texture, which is usually effective at segmenting heads and femurs due to strong signal responses produced by the imaging of the skull and femur. However, the segmentation of abdomens (see Fig. 3) represents a more challenging problem, where such set of low-level structures has proved to be extremely hard to find. The most promising approaches by Chalana et al. [1,9] and by Jardim and Figueiredo [5] share the idea of describing the segmentation as an optimization process. They also share the issue that the optimization can get stuck at local minima, which may produce low quality segmentation results in images presenting noise and/or missing data. A constraint presented by both papers is the need of an initial guess by the user, yielding a semi-automatic system. Nevertheless, these papers present results that are of comparable accuracy as sonographers in small datasets (less than 50 images) of fetal heads and femurs. A common point among the papers above is that they did not work on the segmentation of fetal abdomen. In echocardiography, the detection and segmentation of the left ventricle of the heart in ultrasound images have produced similar algorithms based on an optimization process [14]. The issue of the arbitrary initial condition has been handled with the use of level sets [8], and the robustness to noise and missing data has been managed by the use of a shape influence term [10]. However, these methods still present some basic issues, such as the under utilization of the appearance model because of the use of relatively simple parametric models for the appearance term in the optimization function. Also, these techniques tend
Fig. 1. Image region with five parameters (a) and the PBT tree structure (b)
to work well whenever image gradients separate the sought anatomical structure, but that might not be the case for complex anatomies (e.g., fetal abdomens). The method we propose in this paper is aligned with the state-of-the-art detection and high-level segmentation methods proposed in computer vision and machine learning. These methods involve user annotated training data, discriminant classifiers of local regions, and a way to combine the results of those local classifiers [6,3]. We exploit the database-guided segmentation paradigm [3] in the domain of fetal ultrasound images. This domain presents common issues encountered in ultrasound images, such as large amount of noise, signal drop-out and large variations between the appearance, configuration and shape of the anatomical structure. However, our method has to handle new challenges presented by fetal ultrasound images, such as the extreme appearance variability of the fetal abdomen imaging as well as to be easily generalized to all anatomical structures. In order to cope with these new challenges, we constrain the recently proposed probabilistic boosting tree classifier [12] to limit the number of nodes present in the binary tree, and also to divide the original classification problem into stages of increasing complexity.
3 Automatic Measurement of Fetal Anatomy
In this section we define the segmentation problem and explain the training and detection processes used for the automatic segmentation and measurement of fetal anatomies.

3.1 Problem Definition
The ultimate goal of our system is to compute the probability that an image region contains the sought structure of interest. An image region is represented by a set of $N$ image features (here we use the Haar wavelet features [7,13] computed using integral images), so we define the vector $I \in \mathbb{R}^N$ as follows:

$$I = f(\theta), \qquad (1)$$

where $\theta = [x, y, \alpha, \sigma_x, \sigma_y]$, with the parameters $(x, y)$ representing the top-left position of the region in the image, $\alpha$ denoting orientation, and $(\sigma_x, \sigma_y)$ the region scale (see Fig. 1(a)); the function $f(\theta)$ computes the $N$ image features in the
region. Note that, with these parameters, the ellipsoidal measurements for HC and AC are computed with a set of $M$ points $\{(x + r\sigma_x \cos\alpha \cos\gamma + r\sigma_y \sin\alpha \sin\gamma,\; y - r\sigma_x \sin\alpha \cos\gamma + r\sigma_y \cos\alpha \sin\gamma)\}$, where $\gamma = j\,2\pi/M$ with $j = \{0, \ldots, M-1\}$ and $r = 0.75$; the line measurements for BPD and FL are defined by the two end points of a line $\{(x + r\sigma_i \cos(\alpha+\gamma),\; y - r\sigma_i \sin(\alpha+\gamma))\}$, where $r = \{-0.75, +0.75\}$, with $i = x$ and $\gamma = 0$ for FL, and $i = y$ and $\gamma = \pi/2$ for BPD. A classifier then defines the following function: $P(y|I)$, where $y \in \{-1, +1\}$, with $P(y=+1|I)$ representing the probability that the image region $I$ contains the structure of interest (i.e., a positive sample), and $P(y=-1|I)$ the probability that it contains background information (i.e., a negative sample). Notice that the main goal of the system is to determine $\theta^* = \arg\max_\theta P(y|f(\theta))$. Therefore, our task is to train a discriminative classifier that minimizes the probability of misclassification.
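To make the measurement geometry concrete, the following is a minimal NumPy sketch of the ellipse sampling above; the function name and the default M are our own illustrative choices, not from the paper.

```python
import numpy as np

def ellipse_points(theta, M=64, r=0.75):
    """Sample M points of the HC/AC measurement ellipse from the
    region parameters theta = (x, y, alpha, sx, sy) of Eq. (1)."""
    x, y, alpha, sx, sy = theta
    g = 2.0 * np.pi * np.arange(M) / M  # gamma = j * 2*pi / M
    px = x + r * sx * np.cos(alpha) * np.cos(g) + r * sy * np.sin(alpha) * np.sin(g)
    py = y - r * sx * np.sin(alpha) * np.cos(g) + r * sy * np.cos(alpha) * np.sin(g)
    return np.column_stack([px, py])
```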
3.2 Constrained Probabilistic Boosting Tree
The classifier used for the anatomical structure detection is derived from the probabilistic boosting tree classifier (PBT) [12]. Training the PBT involves the recursive construction of a tree, where each of its nodes represents a strong classifier. The input training set of each node is divided into two sets (left or right) according to the result provided by the learned strong classifier. Each new set is then used to train the left and right sub-trees recursively. The posterior probability that a sample is positive is computed as follows [12]:

$$P(y=+1|I) = \sum_{l_1, l_2, \ldots, l_n} P(y=+1|l_n, \ldots, l_1, I) \cdots P(l_2|l_1, I)\, P(l_1|I), \qquad (2)$$
where $n$ is the total number of nodes of the tree (see Fig. 1(b)).

Fig. 2. Expert annotation of (left to right): BPD, HC, AC, and FL

The original PBT classifier presents a problem: if the classification task is too hard, two issues may occur: a) overfitting of the training data in the nodes close to the leaves, and b) long training and detection procedures. As a result, in order to reduce the training and detection times and improve the generalization ability of the classifier, we propose the Constrained PBT (CPBT) algorithm, which constrains the original PBT training process as follows: a) instead of having only one classifier, we train a sequence of classifiers of increasing complexity; and b) given that the classification problem of each classifier is simpler than the original problem, the height and the number of tree nodes are constrained.
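As an illustration of Eq. (2), here is a minimal recursive sketch of the PBT posterior; the node layout (a strong classifier per node plus left/right children) is our assumption about a generic PBT implementation, not code from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class PBTNode:
    prob: Callable                      # strong classifier: feats -> P(l = +1 | I)
    left: Optional["PBTNode"] = None    # subtree for l = -1
    right: Optional["PBTNode"] = None   # subtree for l = +1

def pbt_posterior(node: PBTNode, feats) -> float:
    """Evaluate P(y = +1 | I) by summing over tree paths as in Eq. (2):
    each node's strong classifier softly weights the two subtree posteriors."""
    p = node.prob(feats)
    if node.left is None and node.right is None:
        return p
    return p * pbt_posterior(node.right, feats) + (1.0 - p) * pbt_posterior(node.left, feats)
```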
Algorithm 1. Training algorithm.

Data: set of training images with their true positive parameters {(I, θ⁺_I)ᵢ}, i = 1, ..., M; coarse search sampling interval δ_coarse; height H_coarse and total number of nodes N_coarse of the coarse classifier tree; fine search sampling interval δ_fine; height H_fine and total number of nodes N_fine of the fine classifier tree.

I⁺ = ∅ and I⁻ = ∅
for i = 1, ..., M do
    Generate P positive samples by randomly sampling the parameter space with θ̃⁺ ∈ [θ⁺_I − δ_coarse, θ⁺_I + δ_coarse] and add them to the set of positive samples I⁺
    Generate N negative samples by randomly sampling the parameter space with θ̃⁻ ∉ [θ⁺_I − δ_coarse, θ⁺_I + δ_coarse] and add them to the set of negative samples I⁻
end
Train the coarse classifier of height H_coarse and number of nodes N_coarse using I⁺ and I⁻.
I⁺ = ∅ and I⁻ = ∅
for i = 1, ..., M do
    Generate P positive samples by randomly sampling the parameter space with θ̃⁺ ∈ [θ⁺_I − δ_fine, θ⁺_I + δ_fine] and add them to I⁺
    Generate N negative samples by randomly sampling the parameter space with θ̃⁻ ∉ [θ⁺_I − δ_fine, θ⁺_I + δ_fine] and θ̃⁻ ∈ [θ⁺_I − δ_coarse, θ⁺_I + δ_coarse], and add them to I⁻
end
Train the fine classifier of height H_fine and number of nodes N_fine using I⁺ and I⁻.

Result: coarse and fine classifiers.
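A small sketch of the per-image sample generation in Algorithm 1 follows. Note one assumption: we read the (garbled) algorithm as drawing negatives outside the positive parameter box, so the rejection test and the `spread` factor below are ours.

```python
import numpy as np

def generate_samples(theta_true, delta, num_pos, num_neg, rng, spread=4.0):
    """Draw positive parameter samples inside the box theta+ +/- delta and
    negative samples outside it (coarse stage of Algorithm 1)."""
    dim = theta_true.size
    pos = theta_true + rng.uniform(-delta, delta, size=(num_pos, dim))
    neg = []
    while len(neg) < num_neg:
        cand = theta_true + rng.uniform(-spread * delta, spread * delta, size=dim)
        if np.any(np.abs(cand - theta_true) > delta):  # reject samples inside the box
            neg.append(cand)
    return pos, np.array(neg)
```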
3.3 Training a Constrained PBT
Figure 2 shows expert annotations of the fetal measurements. Note that each annotation explicitly defines the parameter θ⁺_I of the true positive sample for a training image (see Eq. (1)). Figure 3 shows image regions obtained by sampling the parameter space at θ⁺_I for several training images. As mentioned in Sec. 3.2, the training process involves a two-stage classification problem of increasing complexity (see Alg. 1). The first stage, referred to as the coarse stage, is robust to false negatives but accepts a relatively large number of false positives, while the second stage, called the fine stage, is more selective, being robust to false positives.
Fig. 3. Examples of the training set for BPD and HC (first three images), AC (images 4-6), and FL (last three images)
3.4 Detection
The detection algorithm also runs in two stages, as described in Algorithm 2. The coarse detection samples the search space uniformly using δ_coarse as the sampling interval, while the fine detection searches around the hypotheses selected by the coarse search at the smaller interval δ_fine.
Algorithm 2. Detection algorithm.

Data: test image and anatomy to be detected; coarse and fine classifiers; measurement to be performed (BPD, HC, AC, or FL).

H_coarse = ∅
for θ = [0, 0, 0, 0, 0] : δ_coarse : max(search space) do
    Sample the image region I = f(θ) (Eq. (1))
    Compute P(y = +1|I) (Eq. (2)) using the coarse classifier
    H_coarse = H_coarse ∪ {(θ, P(y = +1|I))}
end
Assign the top H hypotheses from H_coarse, in terms of P(y = +1|I), to H_fine
for i = 1, ..., H do
    Let (θᵢ, Pᵢ) be the i-th element of H_fine
    for θ = (θᵢ − δ_coarse) : δ_fine : (θᵢ + δ_coarse) do
        Sample the image region I = f(θ) (Eq. (1))
        Compute P(y = +1|I) (Eq. (2)) using the fine classifier
        H_fine = H_fine ∪ {(θ, P(y = +1|I))}
    end
end
Select the top hypothesis from H_fine in terms of P(y = +1|I).

Result: parameter θ of the top hypothesis.
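A compact Python sketch of this coarse-to-fine search is given below; `features` stands in for f(θ) of Eq. (1), the classifier callables return P(y = +1|I), and the helper `local_grid` is our own, not part of the paper.

```python
import numpy as np

def local_grid(center, radius, step):
    """Axis-aligned grid of parameter vectors within +/- radius of center."""
    axes = [np.arange(c - r, c + r + s, s) for c, r, s in zip(center, radius, step)]
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.stack(mesh, axis=-1).reshape(-1, len(center))

def detect(features, coarse_clf, fine_clf, coarse_grid, d_coarse, d_fine, top_h=100):
    """Two-stage search of Algorithm 2: score every coarse hypothesis,
    keep the top H, then rescan around each of them at the finer interval."""
    coarse = [(theta, coarse_clf(features(theta))) for theta in coarse_grid]
    coarse.sort(key=lambda tp: tp[1], reverse=True)
    fine = []
    for theta_i, _ in coarse[:top_h]:
        for theta in local_grid(theta_i, d_coarse, d_fine):
            fine.append((theta, fine_clf(features(theta))))
    return max(fine, key=lambda tp: tp[1])[0]  # theta of the best hypothesis
```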
3.5 Training Results
We have 1,426 expert-annotated training samples for the head, 1,168 for the femur, and 1,293 for the abdomen. A coarse and a fine CPBT classifier were trained, each with six levels. We are interested in determining the tree structure of the classifier, where we want to constrain the tree to have the fewest possible number of nodes without affecting classifier performance. Recall from Sections 3.3 and 3.4 that the fewer the nodes, the more efficient the training and detection algorithms. Here, we compare the performance of the full binary tree against a tree constrained to have only one child per node. The number of weak classifiers in each node is arbitrarily set between 10 and 300, depending on the tree structure (full binary tree classifiers usually use fewer weak classifiers to avoid over-fitting). The sampling intervals for Algorithm 1 are set as δ_coarse = [20, 20, 20, 20, 20] and δ_fine = [10, 10, 10, 10, 10]. These two vectors contain the variation for each dimension of the parameter space (i.e., position (x, y) in pixels, orientation α in degrees, and scale (σx, σy) in pixels). Note that these sampling intervals were set based on experiments omitted from this paper due to lack of space. Finally, in Algorithm 1, the number of additional positives per image is P = 100 and the number of negatives per image is N = 1000. In Fig. 4 we show the measurement errors for HC and BPD in training for the constrained tree and the full binary tree. Assuming that GT denotes the expert annotation and DT the automatic measurement produced by the system, the error is computed as:

$$\text{error} = |GT - DT| / GT. \qquad (3)$$
Notice that the performance of the constrained tree is better than that of the full binary tree for both measurements. This is explained by the fact that the constrained tree is more regularized and should generalize better than the full binary tree. Another key advantage of the constrained tree is training efficiency: for the cases above, the training process for the full binary tree takes seven to ten days, while the whole training of the constrained tree takes two to four days on a standard PC. Hence, a constrained tree classifier should be used in the experiments.
Fig. 4. Training error (3) comparison between the constrained tree (cascade) and the full binary tree for (a) HC and (b) BPD

Fig. 5. Error (3) in ascending order for each anatomy in the training and test sets: (a) BPD, (b) HC, (c) AC, (d) FL. The horizontal axis is normalized to vary from zero to one.
4 Experimental Results
In this section we show qualitative and quantitative results of the database-guided image segmentation based on the CPBT classifier. Using the detection algorithm described in Sec. 3.4, trained for each anatomy with the training sets listed in Sec. 3.5, we compute the error on a test set composed of 177 ultrasound images of fetal heads, 183 of fetal abdomens, and 171 of fetal femurs. Fig. 5 shows the error in ascending order for each anatomy in the training and test sets. Table 1 displays the median and mean errors for the training and test sets. The running time of this algorithm is under half a second for all measurements on a standard dual-core PC. Finally, Fig. 6 displays a few segmentation results produced by our method. This system was extensively tested in a clinical setting of ultrasound examinations. In this experiment, we noticed that 20% of the cases show a relatively large error. Since these cases can be considered outliers, we left them out of the following results. In the remaining 80% of the test cases, the system produced an average error of 0.0265 (3) with respect to a ground truth measurement computed as the average of the measurements of 15 experts in hundreds of images presenting fetal heads, femurs, and abdomens. It is important to mention that the average error between the users' measurements and the same ground truth was exactly 0.0265.

Table 1. Median and mean errors (3) of each one of the measurements

Measurement                    BPD     HC      AC      FL
median error in training set   0.0249  0.0187  0.0277  0.0241
median error in test set       0.0269  0.0211  0.0319  0.0316
mean error in training set     0.0306  0.0225  0.0399  0.0668
mean error in test set         0.0333  0.0247  0.0479  0.0588
Fig. 6. Detection and segmentation results for (a) BPD, (b) HC, (c) AC, and (d) FL
5 Conclusions
We presented a system that automatically measures BPD and HC from ultrasound images of the fetal head, AC from images of the fetal abdomen, and FL from images of the fetal femur. Our system uses a large database of expert-annotated images to train a Constrained Probabilistic Boosting Tree classifier. We showed that this system is capable of handling previously unsolved issues in the domain of fetal ultrasound imaging, namely effective abdomen segmentation and a completely automated measurement procedure that requires no user assistance. The results show that our system produces accurate measurements, and the clinical evaluation shows results that are, on average, close to the accuracy of sonographers. Moreover, the algorithm is extremely efficient and runs in under half a second on a standard dual-core PC. Finally, the clinical evaluation also showed a seamless integration of our system into the clinical workflow.
References

1. Chalana, V., Winter II, T., Cyr, D., Haynor, D., Kim, Y.: Automatic fetal head measurements from sonographic images. Academic Radiology 3(8), 628–635 (1996)
2. Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. Annals of Statistics 28(2), 337–374 (2000)
3. Georgescu, B., Zhou, X., Comaniciu, D., Gupta, A.: Database-guided segmentation of anatomical structures with complex appearance. In: IEEE CVPR, IEEE Computer Society Press, Los Alamitos (2005)
4. Hanna, C., Youssef, A.: Automated measurements in obstetric ultrasound images. In: ICIP (1997)
5. Jardim, S., Figueiredo, M.: Segmentation of fetal ultrasound images. Ultrasound in Medicine and Biology 31(2), 243–250 (2005)
6. Levin, A., Weiss, Y.: Learning to combine bottom-up and top-down segmentation. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, Springer, Heidelberg (2006)
7. Oren, M., Papageorgiou, C., Sinha, P., Osuna, E., Poggio, T.: Pedestrian detection using wavelet templates. In: IEEE CVPR, IEEE Computer Society Press, Los Alamitos (1997)
8. Paragios, N., Deriche, R.: Geodesic active regions for supervised texture segmentation. In: ICCV, pp. 926–932 (1999)
9. Pathak, S.D., Chalana, V., Kim, Y.: Interactive automatic fetal head measurements from ultrasound images using multimedia computer technology. Ultrasound in Medicine and Biology 23(5), 665–673 (1997)
10. Rousson, M., Paragios, N.: Shape priors for level set representations. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, Springer, Heidelberg (2002)
11. Schluter, P.J., Pritchard, G., Gill, M.A.: Ultrasonic fetal size measurements in Brisbane, Australia. Australasian Radiology 48(4), 480–486 (2004)
12. Tu, Z.: Probabilistic boosting-tree: learning discriminative models for classification, recognition, and clustering. In: ICCV (2005)
13. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: IEEE CVPR, IEEE Computer Society Press, Los Alamitos (2001)
14. Zhu, S., Yuille, A.: Region competition: unifying snakes, region growing, and Bayes/MDL for multiband image segmentation. IEEE TPAMI 18, 884–900 (1996)
Quantifying Effect-Specific Mammographic Density

Jakob Raundahl¹, Marco Loog¹,², Paola Pettersen³, and Mads Nielsen¹,²

¹ University of Copenhagen, Department of Computer Science, Denmark
² Nordic Bioscience, Herlev, Denmark
³ Center for Clinical and Basic Research, Ballerup, Denmark
Abstract. A methodology is introduced for the automated assessment of structural changes of breast tissue in mammograms. It employs a generic machine learning framework and provides objective breast density measures quantifying specific biological effects of interest. In several illustrative experiments on data from a clinical trial, it is shown that the proposed method can quantify effects caused by hormone replacement therapy (HRT) at least as well as standard methods. Most interestingly, the separation of subpopulations using our approach is considerably better than that of the best alternative, which is interactive. Moreover, the automated method is capable of detecting age effects where standard methodologies completely fail.
1 Introduction
Women with high mammographic density appear to have a four- to six-fold increase in breast cancer risk, as indicated by numerous studies [1,2,3,4]. Since this makes breast density a surrogate measure of the risk of developing breast cancer, a sensitive measure of changes in this density would be most valuable and could serve as a diagnostic tool for the analysis of mammograms. A related issue concerns the influence of post-menopausal hormone replacement therapy, which induces an increase in mammographic density [5,6,7]. During hormone dosing, such a sensitive measure would provide a reliable indicator of the gynecological safety of a given treatment modality. The classical way to measure breast density is to use a categorical score, such as Wolfe patterns [1] or the breast imaging reporting and data system (BI-RADS) [8]. These measures have been constructed to explain different kinds of biological and mammographic effects: the main purpose of Wolfe patterns, for example, is to indicate breast cancer risk, while BI-RADS mainly indicates masking effect. The aim of the presented work is to provide a framework for obtaining more accurate and sensitive measurements of breast density changes related to specific effects. Given effect-grouped patient data, we propose a statistical learning scheme providing such a non-subjective and reproducible measure, and compare it to the BI-RADS measure and a computer-aided percentage density.
Several other automatic methods for assessing mammographic breast density have been suggested [9,10,11,12,13]. All of these aim at reproducing the radiologist's categorical rating system or at segmenting the dense tissue to get a percentage density score. Our approach differs from existing methods in three main ways:

1. Breast density is considered a structural property of the mammogram that can change in various ways, explaining different effects.
2. The measure is derived from observing a specific effect in a controlled study.
3. The method is invariant to affine intensity changes.

We aim to convince the reader that density changes can indeed be perceived as a structural matter that may be assessed while ignoring the actual brightness of the images, and that density changes differently under the physiological processes of aging and HRT. The following section, Section 2, introduces the medical study that produced the images used in this investigation. Subsequently, Section 3 describes the two standard methods and the new supervised method in detail. Section 4 contains a description of the experimental setup and results. Section 5 consists of discussion and conclusions.
2 Materials
Since HRT has been shown to increase mammographic density [5,6,7], images from a placebo-controlled HRT study may be used to evaluate density measures by their ability to separate the HRT and placebo populations. Furthermore, aging effects can be detected by stratifying the baseline patients according to age. The data used in this work are from a 2-year randomized, double-blind, placebo-controlled clinical trial, in which the participants received either 1 mg 17β-estradiol continuously combined with 0.125 mg trimegestone (n=40), or placebo (n=40) for 2 years. At entry into the study, women were between 52 and 65 years of age, at least 1 year postmenopausal, with a body mass index less than or equal to 32 kg/m². Breast images were acquired at the beginning (t0) and the end of the 2-year treatment period (t2) using a Planmed Sophie mammography X-ray unit. The images were then scanned using a Vidar scanner to a resolution of approximately 200 microns with 12-bit gray-scales. Delineation of the breast boundary on the digitized image was done manually, using 10 points along the boundary connected with straight lines, resulting in a decagonal region of interest. Only the right mediolateral oblique view was used, since it has been shown previously that a reliable measure of breast density can be assessed from any one view [14]. We denote the patient groups P0, P2, H0, and H2 for placebo and treatment at t0 and t2, respectively.
3 Methods
For both methods involving human interaction, the reading radiologist was blinded with respect to the labelling of the images, and the images were presented in random order. The same radiologist made all readings.

3.1 BI-RADS
Breast imaging reporting and data system (BI-RADS) is the four-category scheme proposed by the American College of Radiology [8]. The BI-RADS categories are: 1) entirely fatty; 2) fatty with scattered fibroglandular tissue; 3) heterogeneously dense; 4) extremely dense. A trained radiologist assigns the mammogram to one of these categories based on visual inspection. It is included here since it is widely used both in clinical practice and for automated and computer-aided approaches [15].

3.2 Interactive Threshold Method
The reading radiologist determines an intensity threshold using a slider in a graphical user interface. She is assisted visually by a display showing the amount of dense tissue corresponding to the current slider position. The system is similar to the approach proposed by Yaffe [16] and has been used in several clinical trials [15]. The density is defined as the ratio between the segmented dense tissue and the total area of breast tissue; a minimal sketch of this computation is given below.
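The following is a small NumPy sketch of this percentage density, assuming a boolean breast mask from the manual delineation; the function name and interface are illustrative.

```python
import numpy as np

def percentage_density(image, breast_mask, threshold):
    """Ratio of thresholded dense tissue to the total breast area,
    with `threshold` taken from the radiologist's slider position."""
    dense = (image >= threshold) & breast_mask
    return dense.sum() / float(breast_mask.sum())
```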
3.3 The Supervised Approach
Our density measure is derived by training a pixel classifier on subsets of images from the available data. These subsets are chosen to represent the different changes in density to be detected by the method. For example, one subgroup may be the H2 images from hormone-treated patients and the other the P2 images from the placebo group. Most often, as in our case, the pixel classification is based on local features that describe the image structure in the vicinity of every pixel to be classified. Generally, the features extracted per pixel will exhibit large similarity across images, even when the images come from the two different sets; therefore, for individual pixels, it is difficult to decide to which of the subsets they belong. Fusing all weak local decisions into a global overall score per image, however, ensures that sufficient evidence in favor of one of the two groups is accumulated and allows for a more accurate decision. In this work, a simple fusion strategy is employed: after every pixel has been assigned a posterior probability by the classifier, the average probability per pixel in the image is determined, and this mean is taken as the final score. Obviously, several other fusion schemes are possible (see, e.g., [17]), but we do not necessarily expect benefit from these. An example of a mammogram with corresponding pixel probability maps is shown in Fig. 1. Below follows a more
Fig. 1. Mammogram from the data set (a); pixel classification results using the classifiers HRTC, HRTL and AGE, respectively (b), (c), (d)
precise description of the features and of the various subgroups used to train the classifiers.

Features. A specific three-dimensional feature space is used, since a previous study found these features to be associated with breast density [18]. These features are invariant to affine intensity transformations of the image and, in addition, robustness to point noise is provided through convolution with a Gaussian kernel. For every pixel in the breast tissue, features based on the eigenvalues of the Hessian at three scales are determined. The Hessian at scale $s$ is defined by

$$H_s(I) = G_s * \begin{bmatrix} \dfrac{\partial^2 I}{\partial x^2} & \dfrac{\partial^2 I}{\partial x \partial y} \\[6pt] \dfrac{\partial^2 I}{\partial y \partial x} & \dfrac{\partial^2 I}{\partial y^2} \end{bmatrix},$$

where $G_s$ denotes the Gaussian at scale (standard deviation) $s$. This is implemented by analytical differentiation of the Gaussian prior to convolution, using the fact that $G * \partial I = I * \partial G$ [19]. The scales used are 1, 2 and 4 mm. The features are given by the quotient

$$q_s = \frac{|e_1| - |e_2|}{|e_1| + |e_2| + \epsilon},$$

where $e_1$ and $e_2$ are the eigenvalues of the Hessian at the specific scale $s$, with $e_1 > e_2$, and $\epsilon$ is a small positive number ($\epsilon = 10^{-5}$) that avoids numerical stability problems in the accidental and non-generic planar regions of an image where $e_1 \approx e_2 \approx 0$. This ratio is related to the elongatedness of the image structure at the point $(x, y)$ at scale $s$.
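A short sketch of this feature computation follows, using SciPy's Gaussian derivative filters to realize $G * \partial I = I * \partial G$; treating `scale` in pixels rather than millimeters is our simplification.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def elongatedness_feature(image, scale, eps=1e-5):
    """q_s from the eigenvalues of the Gaussian-smoothed Hessian at `scale`."""
    image = np.asarray(image, dtype=float)
    ixx = gaussian_filter(image, scale, order=(0, 2))  # d2I/dx2 (x = axis 1)
    iyy = gaussian_filter(image, scale, order=(2, 0))  # d2I/dy2 (y = axis 0)
    ixy = gaussian_filter(image, scale, order=(1, 1))  # d2I/dxdy
    # Closed-form eigenvalues of the 2x2 symmetric Hessian, with e1 > e2.
    half_tr = (ixx + iyy) / 2.0
    disc = np.sqrt(((ixx - iyy) / 2.0) ** 2 + ixy ** 2)
    e1, e2 = half_tr + disc, half_tr - disc
    return (np.abs(e1) - np.abs(e2)) / (np.abs(e1) + np.abs(e2) + eps)
```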
Subgroups and Classifiers. Three combinations of subgroups are used for classifier training and are tested in the experiments conducted subsequently:

HRTL: Subsets H0 and H2 are used to capture the effect of HRT. There is also an effect of aging, but it is expected to be much smaller than that of HRT. The trained classifier is referred to as HRTL (longitudinal).
HRTC: Subsets P2 and H2 are used to capture the effect of HRT. Separation between classes is expected to be lower, since inter-patient biological variability dilutes the results. The trained classifier is referred to as HRTC (cross-sectional).
AGE: The baseline population (P0 and H0) is stratified into three age groups, and the first and last tertiles are used to capture the effect of age. The second tertile is used as a control population. The trained classifier is referred to as AGE.

In each case every pixel receives a label based on the subgroup it belongs to, and a k-nearest-neighbours classifier (k = 100) is trained on this data to separate pixels from the two classes. The use of this powerful, non-parametric classifier is justified by the large number of pixels and the low dimensionality of the feature space. Due to the limited number of patients, the data is not split into a training and a test set. Instead, the classifier is trained on all but a pair of images (one image from each class), and pixel probabilities are computed for this pair using the trained classifier. This is repeated until the pixel probabilities for all images are computed. This technique is similar to leave-one-out [17], but is modified to leave-two-out, since leaving a sample out from only one class creates a bias in the training set toward the other class. This effect is unwanted, especially when dealing with a small number of samples; a sketch of the scheme is given below. In the training phase, feature vectors from 10,000 randomly selected pixels within the breast region in each image are used, which is sufficient considering that our feature space is only three-dimensional.
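Here is a minimal sketch of the leave-two-out scoring, assuming scikit-learn's k-NN and equal-sized, index-paired image lists for the two classes (both assumptions are ours, not from the paper):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def leave_two_out_scores(class0, class1, k=100):
    """Hold out one image per class, train a k-NN pixel classifier on the
    rest, and score each held-out image by its mean posterior per pixel
    (the fusion rule above). class0[i], class1[i] are (n_pixels, 3) arrays."""
    scores0, scores1 = [], []
    for i in range(len(class0)):
        train0 = [f for j, f in enumerate(class0) if j != i]
        train1 = [f for j, f in enumerate(class1) if j != i]
        x = np.vstack(train0 + train1)
        y = np.concatenate([np.zeros(sum(map(len, train0))),
                            np.ones(sum(map(len, train1)))])
        clf = KNeighborsClassifier(n_neighbors=k).fit(x, y)
        scores0.append(clf.predict_proba(class0[i])[:, 1].mean())
        scores1.append(clf.predict_proba(class1[i])[:, 1].mean())
    return scores0, scores1
```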
4 Experimental Setup and Results
The experiments serve to answer two questions. How does the separation of the hormone-treated subpopulation, H2, from the same patients at baseline, H0, and from the control population who received placebo, P2, compare across the different measures? And can any of the measures detect the aging of the placebo group by separating P2 and P0? Statistical t-tests are used to test for significance of the separation, and the resulting p-values make a comparison of methods possible. Table 1 shows p-values for all combinations of methods and relevant pairs of groups. The first two columns are paired two-sided t-tests, while the last two columns are unpaired. In Fig. 2 the density changes are shown using the three different training strategies together with the BI-RADS scores and the
Table 1. p-values for the different methods and tests. Thresholding is abbreviated TH.

Method\Test      P0 vs. P2   H0 vs. H2   P0 vs. H0   P2 vs. H2
BI-RADS          0.3         < 0.001     0.3         0.1
Interactive TH   1           < 0.001     0.8         0.02
HRTL             0.08        < 0.001     0.7         0.003
HRTC             0.4         0.003       0.7         0.001
AGE              0.004       0.4         0.8         0.07
Fig. 2. Relative longitudinal progression of the different measures (BI-RADS, percentage density, HRTL, HRTC, AGE) from t0 to t2. The placebo group is indicated with a dashed line; the HRT group by a solid line. Vertical bars indicate the standard deviation of the mean of the subgroups at t2 and of the entire baseline population at t0.
Fig. 3. AGE density as a function of patient age in tertiles in the baseline population, including the standard deviation of the mean for each tertile
percentage density. The actual values of the different scores are immaterial and are not displayed. The figure allows a qualitative comparison of the methods by showing the relative progressions of the HRT and placebo groups combined with the standard deviations of the means. Fig. 3 shows that the differences between P0 and P2 indicated by the AGE classifier are indeed an age-related effect and not a general difference in image appearance between t0 and t2. The baseline population is again stratified into three age groups, and the AGE measure shows an increasing trend with increasing age. The values for the first and last tertiles are significantly different (p = 0.015).
5 Discussion and Conclusions
The first observation to be made is that none of the methods separate the two baseline groups P0 and H0, confirming successful randomization. The second immediate observation is that the interactive threshold shows a better capability to separate P2 and H2 than the categorical BI-RADS methodology. This might be explained by the continuous nature of the threshold measure, which makes it more sensitive. Among the automatic measures, HRTL and HRTC perform even better than the percentage density, and AGE detects the aging effect in a very significant way, as opposed to the currently available techniques, which are unable to detect any meaningful changes. In conclusion, the proposed methodology shows substantial merit, as it performs considerably better than both the BI-RADS and the percentage density method, the current state of the art. As shown in this work, the approach can be trained to detect changes due to aging and HRT. These changes might not be interesting in themselves, but because of the supervised machine learning approach employed, the method can easily be adapted to the detection of other mammographic changes. Ultimately, it may be possible to train the system to accurately quantify breast cancer risk, providing better risk assessment than the standard density measures. Clearly, we may need additional features to detect these different effects, but our general and effective framework can readily cope with such extensions.
References

1. Wolfe, J.N.: Risk for breast cancer development determined by mammographic parenchymal pattern. Cancer 37(5), 2486–2498 (1976)
2. Boyd, N.F., O'Sullivan, B., Campbell, J.E., et al.: Mammographic signs as risk factors for breast cancer. British Journal of Cancer 45, 185–193 (1982)
3. Boyd, N.F., Byng, J.W., Jong, R.A., Fishell, E.K., Little, L.E., Miller, A.B., Lockwood, G.A., Trichler, D.L., Yaffe, M.J.: Quantitative classification of mammographic densities and breast cancer risk: results from the Canadian national breast screening study. Academic Radiology 87(9), 670–675 (1995)
4. van Gils, C.H., Hendriks, J.H.C.L., Holland, R., Karssemeijer, N., Otten, J.D.M., Straatman, H., Verbeek, A.L.M.: Changes in mammographic breast density and concomitant changes in breast cancer risk. European Journal of Cancer Prevention 8, 509–515 (1999)
5. Greendale, G.A., Reboussin, B.A., Slone, S., Wasilauskas, C., Pike, M.C., Ursin, G.: Postmenopausal hormone therapy and change in mammographic density. Journal of the National Cancer Institute (January 2003)
6. Colacurci, N., Fornaro, F., Franciscis, P.D., Palermo, M., del Vecchio, W.: Effects of different types of hormone replacement therapy on mammographic density. Maturitas 40 (November 2001)
7. Sendag, F., Cosan, M.T., Ozsener, S., Oztekin, K., Bilgin, O., Bilgen, I., Memis, A.: Mammographic density changes during different postmenopausal hormone replacement therapies. Annals of Internal Medicine 76 (September 2001)
8. American College of Radiology: Illustrated Breast Imaging Reporting and Data System, 3rd edn. American College of Radiology (1998)
9. Boone, J.M., Lindfors, K.K., Beatty, C.S., Seibert, J.A.: A breast density index for digital mammograms based on radiologists' ranking. Journal of Digital Imaging 11(3), 101–115 (1998)
10. Karssemeijer, N.: Automated classification of parenchymal patterns in mammograms. Physics in Medicine and Biology 43, 365–378 (1998)
11. Byng, J.W., Boyd, N.F., Fishell, E., Jong, R.A., Yaffe, M.J.: Automated analysis of mammographic densities. Physics in Medicine and Biology 41, 909–923 (1996)
12. Tromans, C., Brady, M.: An alternative approach to measuring volumetric mammographic breast density. In: Astley, S.M., Brady, M., Rose, C., Zwiggelaar, R. (eds.) International Workshop on Digital Mammography, pp. 26–33. Springer, Heidelberg (2006)
13. Petroudi, S., Brady, M.: Breast density segmentation using texture. In: Astley, S.M., Brady, M., Rose, C., Zwiggelaar, R. (eds.) International Workshop on Digital Mammography, pp. 609–615. Springer, Heidelberg (2006)
14. Byng, J.W., Boyd, N.F., Little, L., Lockwood, G., Fishell, E., Jong, R.A., Yaffe, M.J.: Symmetry of projection in the quantitative analysis of mammographic images. European Journal of Cancer Prevention 5, 319–327 (1996)
15. Boyd, N.F., Rommens, J.M., Vogt, K., Lee, V., Hopper, J.L., Yaffe, M.J., Paterson, A.D.: Mammographic breast density as an intermediate phenotype for breast cancer. The Lancet Oncology 5, 798–808 (2005)
16. Byng, J.W., Boyd, N.F., Fishell, E., Jong, R.A., Yaffe, M.J.: The quantitative analysis of mammographic densities. Physics in Medicine and Biology 39, 1629–1638 (1994)
17. Jain, A.K., Duin, R.P.W., Mao, J.: Statistical pattern recognition: a review. IEEE Tr. on PAMI 22(1), 4–37 (2000)
18. Raundahl, J., Loog, M., Nielsen, M.: Mammographic density measured as changes in tissue structure caused by HRT. SPIE Medical Imaging (2006)
19. Koenderink, J.J.: The structure of images. Biological Cybernetics 50(5), 363–370 (1984)
Revisiting the Evaluation of Segmentation Results: Introducing Confidence Maps

Christophe Restif

Department of Computing, Oxford Brookes University, Oxford OX33 1HX, UK
[email protected]
Abstract. We introduce a novel framework, called Confidence Maps Estimating True Segmentations (Comets), to store segmentation references for medical images, combine multiple references, and measure the discrepancy between a segmented object and a reference. The core feature is the use of efficiently encoded confidence maps, which reflect the local variations of blur and the presence of nearby objects. Local confidence values are defined from expert user input and used to define a new discrepancy error measure, designed to be directly interpretable both quantitatively and qualitatively. We illustrate the use of this framework to compare different segmentation methods and to tune a method's parameters.
1 Introduction
Evaluating a segmentation method is important, necessary, and difficult to define. It is important to assess a method's results quantitatively to validate it, for example on clinical data sets. It is necessary to rank segmentation results when tuning parameters or comparing methods. However, evaluation in itself is difficult to define: it depends on the definition and purpose of the segmentation, and on the types of errors relevant for the task. Additional issues affect medical imaging. Partial volume effect and blur cause ambiguity about the actual location of borders, and expert variability leads to multiple references for the same objects. Recent publications have defined evaluation criteria. In [1], Zhang describes three types of evaluation. Analysis focuses on the algorithmic part of segmentation; goodness evaluates results using image and object properties, with no external reference; discrepancy compares segmentation results to references (or ground truth). In [2], Udupa et al. emphasise that evaluation depends on the application domain: the application or task considered, the body region imaged, and the imaging protocol. Each application domain has specific properties, such as the type of noise in images, which affect the choice of an evaluation method. Yet some issues remain open: in particular, how to define a reference and cope with multiple references, and how to measure the agreement between a segmented object and a reference. This article focuses on discrepancy evaluation and presents our contributions to those open issues, in the context of two-dimensional images containing multiple objects affected by blur or partial volume effect. For illustration we use examples from cytometry, where these effects are common.
References for segmentation are often defined as binary masks, drawn by experts or computed from segmentations. Different experts, or a given expert at different times, may disagree on some parts of objects' boundaries: these effects are termed inter- and intra-user variability. Although keeping the different references of a single object can be useful, in particular to analyse these variabilities [3,4], some applications require a single reference per object, which calls for a way of merging multiple references [5,6,4]. One strategy is to average the pixel distance between the borders of the multiple references [7]. Another is to optimise the overlaps between the multiple references and the result of the merging. STAPLE [4] measures overlaps with specificities and sensitivities, while VALMET [3] uses more diverse measures. Overlap areas may depend on object and image sizes, though, which can be an issue when objects are significantly smaller than the images [8], as in many cytometry applications. Also, the results of such methods do not indicate the differences between the input references. A solution [5] is to average the binary values of the masks, resulting in a multi-valued mask with as many possible values as there are input references. The agreement between a segmented shape and a reference may be measured with a variety of methods. Area-based measures classify pixels into different regions (e.g. true and false positives and negatives) and return various ratios of region areas. Distance-based measures focus on the distance between the reference's and the segmented object's borders, and return various statistics on them (e.g. maximum, average, variance). Although they allow method comparisons, such measures may be difficult to interpret. Area-based measures may depend on object and background sizes, and they treat all pixels with the same importance (whether they are in blurred regions or well inside objects). Distance-based and fuzzy measures [9] have to be compared to some reference value, e.g. the width of blur around objects, which may vary locally. One approach to evaluation is to use several measures [2,3,10]. It produces more data, but leads to further questions on how to combine numbers in different units, or on which measures are more informative. Introducing classifiers to combine measures significantly increases the complexity of evaluation, as it creates dependencies on choices of classifiers, parameters and training sets. Our main contribution is to replace the binary-mask type of reference with a confidence map, storing more information in the reference at low cost in terms of user input and storage. Under this framework, called Confidence Maps Estimating True Segmentations (Comets), similar techniques for merging references and measuring agreements can easily be defined, as described below, with the aim of being more easily interpreted and more meaningful for some applications.
2 Method
Motivation. We focus our work on two-dimensional medical images, showing any number of objects, and affected by blur or partial volume effect. Examples from cytometry are shown in Fig. 1. In these images, objects of interest (1) are surrounded by blurred regions (2) or nearby objects (3), with background (4) further away. The location of the object's borders may be unclear in zones (2), but more precise locally (3), due to nearby objects or lower blur. The distance to the object (1) is not enough to discriminate zones (2) and (3); and while the area of zone (2) may be significant compared to zone (1), slight variations of segmentation in zone (2), where a correct reference is difficult to define, may still be acceptable for segmentation applications. The motivation for using confidence maps as references is to define an evaluation measure such that the penalty for mis-segmenting the zones of ambiguity (2) is lower than for the other zones, and the penalties for mis-segmenting zones (3) and (4), i.e. under-segmenting the object, can be differentiated from the penalty for mis-segmenting zone (1), i.e. over-segmenting the object. Specifying where these zones stand is left to the expert user defining the references for the objects. We call the framework Comets, for Confidence maps estimating true segmentations: although the true segmentation of an object may not be known with one-pixel precision, it may be estimated with a confidence map.

Fig. 1. Local variations of ambiguity around an object. See Section 2 for details.

Confidence map. For each object, every pixel P is given a real number Conf(P) reflecting the expert's confidence in where P stands, as follows. Qualitatively,

- if Conf(P) = 0: P is believed to be on the border of the object;
- if −1 ≤ Conf(P) ≤ 1: P is believed to be on or close to the border;
- if Conf(P) > 1: P is strongly believed to be part of the object;
- if Conf(P) < −1: P is strongly believed to be out of the object.

Quantitatively, the higher |Conf(P)|, the more confident the expert is that P belongs to the object or to the background. The pixels with low confidence, between -1 and 1, form a band around the object's border (see the white bands in Fig. 2c and Fig. 4d). The local width of the band reflects the expert's local confidence in the border's actual location: the narrower it is locally, the more confident the expert is that the border drawn corresponds to the actual border. It corresponds to zone (2) described above. Pixels with confidence above 1 are in zone (1), while pixels with confidence below -1 are in zones (3) and (4). To avoid storing the whole map over the whole image for each object, we use the following model. Along a line normal to the object's border, the confidences grow linearly, with a linear factor depending both on the location of that line (i.e. where the line intersects the border) and on the direction of the line, either inward or outward. This allows local variations of confidence around the object, and independent confidences for the inside and outside of objects. With this model, each pixel B on the border is assigned two positive parameters, called the inner and outer confidence factors. This is enough to compute the confidence of any pixel P.
Fig. 2. Construction of a Comets. a: original image; b: user input (border and limit pixels); c: confidence map. Lines l1 and l2 show regions with different variations of confidence, reflecting the expert's local confidence in the location of the border.
If P is on the border, Conf(P) = 0. Otherwise, let B be the border pixel closest to P, at distance d and with confidence factors IC(B) and OC(B). If P is inside the object, Conf(P) = d/IC(B); if P is outside the object, Conf(P) = −d/OC(B). Visualising a confidence map as a height map, the border is the level set 0, the object stands at positive height (mountain-shaped) and the rest of the image at negative heights. In this height map, the confidence factors stored at each border pixel correspond to the gradients of the slopes, inwards and outwards. These two slopes are independent and may vary along the border. This is illustrated in Fig. 2c: the gradient is sharper along line l1 than along l2.

User input to define Comets. The previous paragraph details how to compute the confidence of any pixel given the border pixels and their confidence factors. However, these factors may not be easy to define manually, so we use the following scheme instead. An expert draws a continuous line of pixels (called border pixels) where they think the actual border stands. Then, the expert selects any number of pixels inside the object, called inner limit pixels, which they are certain belong to the object and which are as close to the border as possible. These will receive confidence value 1. Similarly, they select pixels outside the object (referred to as outer limit pixels), as close to the border as possible while confidently belonging to the background. These pixels will have confidence value -1. An example of user input is shown in Fig. 2b. Any number of limit pixels can be selected. With this input, the confidence factors are computed as follows. First, for each limit pixel P, the closest border pixel E, at distance d, is found. If P lies outside the object, E is assigned an outer confidence factor of d. If P is inside, E is assigned an inner confidence factor of d. The resulting inner and outer confidence factors are both linearly interpolated along the curvilinear coordinate describing the border. This method is illustrated in Fig. 3: Fig. 3a shows part of the border and two outer limit pixels; Fig. 3b shows the assignment of the outer confidence factor to the two closest border pixels, and the linear interpolation between them; Fig. 3c illustrates how the confidence of a pixel is computed with the confidence factors.
Fig. 3. Linearisation of confidence factors. a: user input; b: outer confidence factors OC1 = d1 and OC2 = d2 assigned to the border pixels closest to the outer limit pixels, then interpolated linearly along the curvilinear coordinate as OC = (d1·s2 + d2·s1)/(s1 + s2); c: confidence Conf(P) = −d/OC of any outside pixel P, based on the outer confidence factor of the closest border pixel.
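The per-pixel confidence computation described above can be sketched with a distance transform; the array layout and function interface below are our own illustration, not code from the paper.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def confidence_map(border_mask, inside_mask, ic, oc):
    """Conf(P) = d / IC(B) inside the object and -d / OC(B) outside, where
    B is the border pixel nearest to P at distance d. `ic` and `oc` hold
    the (positive) confidence factors at the border pixels."""
    dist, idx = distance_transform_edt(~border_mask, return_indices=True)
    iy, ix = idx  # coordinates of the nearest border pixel for every pixel
    conf = np.where(inside_mask, dist / ic[iy, ix], -dist / oc[iy, ix])
    conf[border_mask] = 0.0
    return conf
```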
Evaluating segmentation results. We present a single measure to reflect the severity and type of a segmentation error. Let C be a Comets reference and Obj_C = {P : Conf(P) ≥ 0} the set of pixels referenced as the object. Let S be the set of pixels segmented as the object, and Mis = (S ∪ Obj_C) − (S ∩ Obj_C) the set of mislabeled pixels. The error of S against C can be defined as:

$$\mathrm{error}(S, C) = \begin{cases} \infty & \text{if } S \cap Obj_C = \emptyset, \\ \mathrm{Conf}(P_0), \; P_0 = \arg\max_{P \in Mis} |\mathrm{Conf}(P)| & \text{otherwise.} \end{cases} \qquad (1)$$

Intuitively, if there is no overlap between the segmented object and the reference, they are not related and the error is ∞. Otherwise, mislabeling only pixels of low confidence (near the object boundary) leads to a small error, while mislabeling higher-confidence pixels causes a greater error. Moreover, the value of this error can be interpreted in terms of qualitative segmentation error [11], as follows. A shape with error 0 is a perfect match for the reference. An error within -1 and 1 indicates that only pixels of low confidence were mislabeled: the segmentation can be considered correct. An error greater than 1 shows that object pixels were labeled as background, so the shape is over-segmented. An error below -1 is due to background pixels labeled as object: the shape is under-segmented. When a segmented object both over- and under-segments a reference, this single measure keeps the worse of the two errors. Should more information be needed, the error measure can be extended, for example by returning a pair of values: the maximum positive and the minimum negative confidence of the mislabeled pixels. This reflects the severity of the over- and under-segmentation of each object, with the same straightforward interpretation as above.
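A direct NumPy sketch of Eq. (1) follows, assuming the reference is given as a dense confidence map; the function name and interface are illustrative.

```python
import numpy as np

def comets_error(segmented_mask, conf):
    """Eq. (1): infinity if the segmentation misses the object entirely,
    otherwise the signed confidence of the worst mislabeled pixel."""
    obj = conf >= 0.0                  # pixels referenced as the object
    if not np.any(segmented_mask & obj):
        return np.inf
    mis = segmented_mask ^ obj         # mislabeled pixels (symmetric difference)
    if not np.any(mis):
        return 0.0                     # perfect match
    mis_conf = conf[mis]
    return mis_conf[np.argmax(np.abs(mis_conf))]
```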
Fig. 4. Combining multiple segmentations. a: original image; b: input from expert 1; c: input from expert 2; d: resulting average Comets.
Combining multiple references. As explained in the introduction, it is sometimes necessary to merge multiple expert references into a single entity before evaluating a segmentation result. Under the Comets framework, references can be weighted with their local confidences by simply averaging the confidence values of each map. Let $\{C_i\}_{1 \le i \le n}$ be the set of $n$ expert segmentations to combine, encoded as Comets. Let $\Omega = \{P : \exists\, i,\ \mathrm{Conf}_i(P) \ge -1\}$, where $\mathrm{Conf}_i(P)$ is the confidence of pixel P with respect to $C_i$. All pixels outside $\Omega$ are confidently considered background by all the experts, so they are discarded for the rest of the construction. The combined confidence map on the domain $\Omega$ is the average of all the experts' confidence maps: $\forall P \in \Omega,\ \mathrm{Conf}(P) = \frac{1}{n}\sum_{i=1}^{n} \mathrm{Conf}_i(P)$. Let $\{P \in \Omega : \mathrm{Conf}(P) = 0\}$ be the resulting border pixels, $\{P \in \Omega : \mathrm{Conf}(P) = -1\}$ the resulting outer limit pixels, and $\{P \in \Omega : \mathrm{Conf}(P) = 1\}$ the resulting inner limit pixels. The confidence factors of the resulting border pixels are computed as detailed above. An example of reference combination is shown in Fig. 4. The zones of disagreement receive a lower confidence, in absolute value, than the zones of agreement, allowing the segmentation errors there to be penalised less.

Comparison and tuning of segmentation methods. Quantitative evaluation allows the comparison of different segmentation methods, and of different sets of parameters for each method. To illustrate the comparatively straightforward interpretation of the Comets measure defined in Eq. (1), we use three segmentation methods on the same data set of cell images, based on local thresholding [12], watershed, and active contours [13]. The results for the first two are shown in the graph of Fig. 5a, as the number of objects (y axis) having an error between -10 and 10 (x axis). The orange rectangle shows the acceptable error zone between -1 and 1. The watershed, in blue, has a high number of errors between 1 and 3, reflecting over-segmentation, and on this data set the thresholding technique performs better than the watershed. In Fig. 5b, four sets of parameters for the active contours are compared. Set 1, in blue, has the highest proportion of correctly segmented shapes (in the orange zone) and is better than the others on the data set used. Although these results could be analyzed in more detail, that is beyond the scope of this article.
3 Discussion
As detailed in [2], an evaluation method should be designed according to a segmentation method, which makes it difficult to compare evaluation methods in general. In this section we examine their differing principles. Tools such as STAPLE [4] and VALMET [3] are designed for 3D data sets, where objects occupy the major part of the image. This assumption does not always hold for 2D images, especially from microscopy. Also, blur and multiple nearby objects can cause local variations of ambiguity. Two important consequences for evaluation are that some pixels may be less important to segment correctly than others, and that those pixels are not merely characterised by their distances to the object's border, as illustrated in Fig. 1. The former observation lowers the relevance of area-based evaluation measures, and the latter, that of distance-based measures. Our approach is to encode these variations within the references, to simplify the choice and interpretation of evaluation measures. To the best of our knowledge, the closest work is from Jin et al. [5], who define multi-valued references as the average of binary masks and use these values to weight the measure. However, the
Fig. 5. Use of the single Comets error measure to (a) compare segmentation methods and (b) tune one method’s parameters. See Section 2 for details.
values are discrete, and are only the result of reference merging. Also, non-binary values only occur where different experts disagree: this method is not designed to encode blur, only user variability, and it requires several references for each object. Our approach is more general from the beginning: the confidence maps use a continuous range of numbers, and do so for every reference. We presented a single error measure to reflect the severity and type of a segmentation error. With the type of images we use here, such an interpretation is difficult to obtain from single area- or distance-based measures. Although combining multiple measures may improve the interpretation, it also increases the complexity of the evaluation. Another approach to reflect the types and severities of errors from evaluation results is presented in [11,14]. Each segmented object is classified (as over-segmented, under-segmented, noise, etc.) by comparing its relative overlap with a reference to a threshold value between 50% and 100%. Plotting the number of objects in one category as a function of the chosen threshold produces a curve reflecting the quality of the segmentation on the whole data set. For comparison or tuning of segmentation methods, the areas below several sets of 2D curves have to be compared. Although this method presents more data, it may not be straightforward to use because of its complexity. With graphs as in Fig. 5, the performance of different methods may be easier to compare.
4 Conclusion
We have presented a novel framework, Comets, that allows:

– the encoding of segmentation references with local confidence factors, storing more expert knowledge than a binary mask at low cost;
– the combination of multiple references, with confidence-based local weights;
– a single discrepancy measure with a straightforward interpretation, both qualitative (e.g. under- or over-segmented) and quantitative (severity).

The framework is based on local confidence factors, which are derived from an expert user's input. The extra input needed, namely the selection of limit pixels, is less demanding than landmark selection, and could readily be added to existing systems. We have illustrated how this framework can be used to compare qualitatively and quantitatively different segmentation methods applied to
the same set of images, or to tune one method's parameters. In case more information is needed from the evaluation results, more complex measures can easily be defined, based on the same intuitive interpretation of confidence. Future work includes a quantitative comparison with other evaluation measures, which amounts to evaluating different evaluation methods and is beyond the scope of this article. We also consider using non-linear functions of d to define the confidence Conf(P), in particular using logarithm-odds maps [15].
References

1. Zhang, Y.J.: A review of recent evaluation methods for image segmentation. In: Intl. Symposium on Signal Processing and Its Applications, Malaysia, pp. 148–151 (2001)
2. Udupa, J., LaBlanc, V., Schmidt, H., Imielinska, C., Saha, P., Grevera, G., Zhuge, Y., Currie, L., Molholt, P., Jin, Y.: Methodology for evaluating image-segmentation algorithms. In: SPIE Medical Imaging, USA, pp. 266–277 (2002)
3. Gerig, G., Jomier, M., Chakos, M.: Valmet: A new validation tool for assessing and improving 3D object segmentation. In: Niessen, W.J., Viergever, M.A. (eds.) MICCAI 2001. LNCS, vol. 2208, Springer, Heidelberg (2001)
4. Warfield, S.K., Zou, K.H., Wells, W.M.: STAPLE: An algorithm for the validation of image segmentation. IEEE Trans. on Medical Imaging 23(7), 903–921 (2004)
5. Jin, Y., Imielinska, C., Laine, A.F., Udupa, J.K., Shen, W., Heymsfield, S.B.: Segmentation and evaluation of adipose tissue from whole body MRI scans. In: Ellis, R.E., Peters, T.M. (eds.) MICCAI 2003. LNCS, vol. 2878, pp. 635–642. Springer, Heidelberg (2003)
6. Rohlfing, T., Maurer Jr., C.R.: Shape-based averaging for combination of multiple segmentations. In: MICCAI, vol. 2, pp. 838–845, Palm Springs, USA (2005)
7. Tsai, A., Yezzi Jr., A., Wells, W., Tempany, C., Tucker, D., Fan, A., Grimson, W.E., Willsky, A.: A shape-based approach to the segmentation of medical imagery using level sets. IEEE Trans. on Medical Imaging 22(2), 137–154 (2003)
8. Davis, J., Goadrich, M.: The relationship between precision-recall and ROC curves. In: Intl. Conference on Machine Learning, Pittsburgh, USA, pp. 233–240 (2006)
9. Crum, W., Camara, O., Hill, D.: Generalized overlap measures for evaluation and validation in medical image analysis. IEEE Trans. on Medical Imaging 25(11), 1451–1461 (2006)
10. Klingensmith, J., Shekhar, R., Vince, D.: Evaluation of 3D segmentation algorithms for the identification of luminal and medial–adventitial borders in intravascular ultrasound images. IEEE Trans. on Medical Imaging 19(10), 996–1011 (2000)
11. Hoover, A., Jean-Baptiste, G., Jiang, X., Flynn, P.J., Bunke, H., Goldgof, D.B., Bowyer, K., Eggert, D.W., Fitzgibbon, A., Fisher, R.B.: An experimental comparison of range image segmentation algorithms. IEEE PAMI 18(7), 673–689 (1996)
12. Restif, C.: Towards safer, faster prenatal genetic tests: novel unsupervised, automatic and robust methods of segmentation of nuclei and probes. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 437–450. Springer, Heidelberg (2006)
13. Restif, C., Clocksin, W.: Comparison of segmentation methods for cytometric assay. In: MIUA, London, UK, pp. 153–156 (2004)
14. Min, J., Powell, M., Bowyer, K.: Automated performance evaluation of range image segmentation algorithms. IEEE Trans. on Systems, Man, and Cybernetics – Part B: Cybernetics 34(1), 263–271 (2004)
15. Pohl, K., Fisher, J., Shenton, M., McCarley, R., Grimson, E., Kikinis, R., Wells, W.: Logarithm odds maps for shape representation. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 955–963. Springer, Heidelberg (2006)
Error Analysis of Calibration Materials on Dual-Energy Mammography
Xuanqin Mou and Xi Chen
Institute of Image Processing and Pattern Recognition, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
[email protected], [email protected]
Abstract. Dual-energy mammography can suppress the contrast between adipose and glandular tissues and improve the detectability of microcalcifications (MCs). In clinical dual-energy mammography the imaging object is the human breast, while in calibration measurements only phantoms of breast-tissue-equivalent material can be used. The composition and density differences between calibration materials and the human breast produce differences in linear attenuation coefficients, which lead to calculation errors in dual-energy imaging. In this paper, the magnitude of the MC thickness error due to calibration materials is analyzed using a first-order propagation-of-error analysis. The analysis shows that the thickness error due to calibration materials ranges from dozens to thousands of microns and therefore cannot be ignored when carrying out dual-energy calculations. The evaluation of several popular phantoms shows that it is of great importance to adopt phantom materials that approximate the human breast as closely as possible. Keywords: dual-energy, mammography, calibration materials, error analysis.
1 Introduction
Microcalcifications (MCs) are the principal indicator of breast cancer. Thus the visualization and detection of MCs in mammography play a crucial role in reducing the mortality rate of breast cancer. MCs are composed mainly of calcium compounds. Although MCs are small in size, almost always below 1 mm, they have greater x-ray attenuation than the surrounding breast tissue. This makes MCs relatively apparent on homogeneous soft-tissue backgrounds. However, visualization of MCs may be obscured in mammograms by overlapping tissue structures. In dual-energy digital mammography, high- and low-energy images are acquired and synthesized to suppress the contrast between adipose and glandular tissues, improve the detectability of MCs and calculate the thickness of the MC. However, quantitative dual-energy imaging can be influenced by many factors in practical application, such as the selection of high- and low-energy spectra, scatter, quantum noise, the DQE (detective quantum efficiency) of detectors and the choice of calibration polynomials, all of which decrease the calculation precision
and render the quantitative information inaccurate. Research in dual-energy imaging has mainly focused on the selection and optimization of high- and low-energy spectra [1], scatter corrections [2], analysis of detectors [3] and selection of inverse-map functions [4]. In clinical dual-energy mammography, the imaging object is the human breast, while in calibration measurements only phantoms of breast-tissue-equivalent material can be used. Moreover, the elemental composition of the human breast presents a wide range of possible values [5]. The composition and density differences between the phantom materials and the human breast produce differences in linear attenuation coefficients, which lead to calculation errors in dual-energy imaging. However, such errors have not been accounted for in the published literature on dual-energy imaging to date. In this paper, we assess the magnitude of the calculation error introduced by calibration materials using an error propagation analysis. Simulated experiments verify the magnitude of this error in dual-energy mammography.
2 Theory
2.1 Physical Model of Dual-Energy Imaging
In mammography, the breast is compressed to a thickness T and is assumed to be composed of adipose tissue (thickness ta), glandular tissue (thickness tg) and MC (thickness tc). As the total breast thickness T is known and the contribution of MCs to the total breast thickness can be ignored, the three unknowns ta, tg, tc can be reduced to two: the glandular ratio g = tg/(ta + tg) ≅ tg/T and the MC thickness tc. If P0(E) is the incident x-ray photon fluence at energy E and P(E) is the transmitted fluence, then:

$$P(E) = P_0(E)\exp\left[-\mu_a(E)T - g(\mu_g(E) - \mu_a(E))T - \mu_c(E)t_c\right] \quad (1)$$
μa(E), μg(E) and μc(E) are the linear attenuation coefficients of adipose tissue, glandular tissue and MCs, respectively. In dual-energy imaging calculations, a reference signal Ir is needed to increase the dynamic range of the logarithmic intensity value. We define the exposure data f as the log-value of the ratio of the transmitted exposure I to the reference signal Ir. The low- and high-energy log-values fl(tc, g), fh(tc, g) are measured independently using x-ray beams of different kVps:

$$f_l = \ln(I_{rl}) - \ln\left\{\int P_{0l}(E)\exp\left[-\mu_a(E)T - g(\mu_g(E)-\mu_a(E))T - \mu_c(E)t_c\right]Q(E)\,dE\right\}$$
$$f_h = \ln(I_{rh}) - \ln\left\{\int P_{0h}(E)\exp\left[-\mu_a(E)T - g(\mu_g(E)-\mu_a(E))T - \mu_c(E)t_c\right]Q(E)\,dE\right\} \quad (2)$$

Q(E) is the detector response. The goal of dual-energy mammography is to convert the log-value functions fl(tc, g) and fh(tc, g) to the MC thickness tc(fl, fh) and glandular ratio g(fl, fh).
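To make Eq. 2 concrete, the following is a minimal sketch that evaluates one log-value on a discretized spectrum; the array names (P0, Q, mu_a, ...) are illustrative assumptions, not the authors' simulation code.

```python
import numpy as np

def log_value(P0, Q, mu_a, mu_g, mu_c, T, g, tc, Ir):
    """Discrete evaluation of Eq. 2 for one spectrum.

    P0, Q, mu_a, mu_g, mu_c: arrays sampled on the same energy grid
    (photon fluence, detector response, linear attenuation in 1/cm);
    T and tc in cm; g dimensionless; Ir is the reference signal.
    """
    attenuation = np.exp(-(mu_a * T + g * (mu_g - mu_a) * T + mu_c * tc))
    I = np.sum(P0 * attenuation * Q)  # transmitted, detected signal
    return np.log(Ir) - np.log(I)
```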
Polynomials are often used to describe the relationship between (fl, fh) and (tc, g) in diagnostic dual-energy imaging calculations. In this paper, cubic polynomials are used to describe the inverse-map functions:

$$t_c = k_{c0} + k_{c1}f_l + k_{c2}f_h + k_{c3}f_l^2 + k_{c4}f_h^2 + k_{c5}f_lf_h + k_{c6}f_l^3 + k_{c7}f_h^3$$
$$g = k_{g0} + k_{g1}f_l + k_{g2}f_h + k_{g3}f_l^2 + k_{g4}f_h^2 + k_{g5}f_lf_h + k_{g6}f_l^3 + k_{g7}f_h^3 \quad (3)$$

The coefficients kij (i = c, g; j = 0, 1, ..., 7) are approximated numerically by a least-squares technique from the calibration data.
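As a sketch of how the coefficients of Eq. 3 could be fitted from calibration measurements (the function and variable names are assumptions for illustration):

```python
import numpy as np

def fit_inverse_map(fl, fh, target):
    """Least-squares fit of the eight cubic coefficients of Eq. 3.

    fl, fh: 1-D arrays of calibration log-values; target: the known
    MC thicknesses t_c (or glandular ratios g) of the calibration set.
    Returns k = [k0, ..., k7].
    """
    A = np.column_stack([np.ones_like(fl), fl, fh,
                         fl**2, fh**2, fl * fh, fl**3, fh**3])
    k, *_ = np.linalg.lstsq(A, target, rcond=None)
    return k
```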
2.2 Calculation Errors from Calibration Materials
Even after the necessary correction procedures, when the image data (fl, fh) are free of scatter and noise, a systematic calculation bias Δtc of the MC thickness still exists in dual-energy imaging. Δtc is composed of two portions. First, representing the MC thickness and glandular ratio by polynomials is an approximation: substituting the log-values (fl, fh) into the calibrated polynomials to calculate the MC thickness is intrinsically an interpolation of calibration data, which results in the fitting error Δtcp. Second, the materials used for calibration always differ from the materials imaged. The differences in linear attenuation coefficient between the calibration materials and the imaged human breast lead to the calibration material error Δtcc of the MC thickness. The isolated calibration material error Δtcc cannot be obtained directly from the inverse-map polynomials of Eq. 3, but it can be deduced from the dual-energy physical model of Eq. 2. Physically, the differences between calibration materials and the imaged human breast can be represented in Eq. 2, where fl and fh are log-values of the imaged human breast and μa(E), μg(E) and μc(E) are linear attenuation coefficients of the calibration materials. At energy E, the partial derivative of tc with respect to μa(E) can be deduced from the physical model of Eq. 2 by implicit differentiation:

$$\frac{\partial t_c}{\partial \mu_a(E)} = \frac{-\frac{\partial f_l}{\partial \mu_a(E)}\times S_g + \frac{\partial f_h}{\partial \mu_a(E)}\times R_g}{R_c \times S_g - R_g \times S_c} \quad (4)$$
where Rc = ∂fl/∂tc, Rg = ∂fl/∂g, Sc = ∂fh/∂tc, Sg = ∂fh/∂g.
The partial derivatives of tc with respect to μg(E) and μc(E) can be obtained in the same way. The first-order terms of a Taylor series expansion can be used for error estimation. The magnitude of the calibration material error in MC thickness, Δtcc, can therefore be approximated using the first-order terms of a Taylor series expansion of tc in terms of the variables μi(E) (i = g, a, c):

$$\Delta t_{cc} \approx \sum_E \frac{\partial t_c}{\partial \mu_g(E)}\Delta\mu_g(E) + \sum_E \frac{\partial t_c}{\partial \mu_a(E)}\Delta\mu_a(E) + \sum_E \frac{\partial t_c}{\partial \mu_c(E)}\Delta\mu_c(E) \quad (5)$$

where Δμi(E) (i = g, a, c) are the differences in linear attenuation coefficient between the calibration materials and the imaged human breast.
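A minimal sketch of the propagation in Eq. 5, assuming the partial derivatives and attenuation differences have already been tabulated per energy bin (the data layout is an assumption):

```python
import numpy as np

def calibration_material_error(dtc_dmu, dmu):
    """First-order propagation of Eq. 5.

    dtc_dmu, dmu: dicts keyed by tissue ('g', 'a', 'c'), each holding an
    array over the energy bins with the partial derivative dt_c/dmu_i(E)
    and the attenuation difference dmu_i(E), respectively.
    """
    return sum(np.sum(dtc_dmu[i] * dmu[i]) for i in ('g', 'a', 'c'))
```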
3 Materials and Methods
The simulation conditions, which are in accordance with clinical equipment, are the same as those in the papers of Kappadath [2], [4]. The x-ray spectra used for the low- and high-energy measurements were 25 kVp and 50 kVp. The spectral data were taken from the Handbook of Mammographic X-ray Spectra [6] with a resolution of 1 keV. The detector consists of a CsI:Tl converter layer coupled with an a-Si:H flat-panel detector. All photons transmitted through the imaged object were assumed to be absorbed completely in the perfectly efficient converter layer [7]. Hammerstein [5] determined the elemental compositions of glandular and adipose tissues in the human breast. The composition per weight of H, N, and P seems to be well determined; however, those of C and O present a wide range of possible values. Table 1 lists two extreme compositions (1 and 2) and one general composition (3). In this paper, the composition per weight of C and O refers to composition 3 unless otherwise noted. The densities are 0.93 g/cm³ for adipose tissue and 1.04 g/cm³ for glandular tissue.

Table 1. Elemental compositions (percentage per weight) of C and O for glandular and adipose tissues

|         | Composition 1       | Composition 2       | Composition 3       |
| Element | Adipose | Glandular | Adipose | Glandular | Adipose | Glandular |
| C       | 68.1    | 30.5      | 51.3    | 10.8      | 61.9    | 18.4      |
| O       | 18.9    | 55.6      | 35.7    | 75.3      | 25.1    | 67.7      |
Phantom materials were used for calibration. As a simple approach, polyethylene (CH2) is commonly used as the adipose phantom material, while PMMA (C5H8O2) and acrylic (C3H4O2) are often used as the glandular phantom material. Breast-tissue-equivalent materials are available commercially; for example, the breast-tissue-equivalent materials from CIRS (Computerized Imaging Reference Systems, Inc., Norfolk, VA, USA) are based on Hammerstein and those from RMI (Radiation Measurements Inc., Madison, WI, USA) are based on White [8]. The elemental compositions of the phantom materials [9] and of the human breast are listed in Table 2. In this paper, MCs were assumed to be composed mainly of calcium oxalate (CaC2O4), with a density of 2.20 g/cm³. Aluminum was used to calibrate MCs. The mass attenuation coefficients of the materials in this paper were computed with the XCOM software from NIST [10]. The calibration data consisted of 30 different combinations of MC thicknesses and glandular ratios: the glandular ratio took five steps (0%, 25%, 50%, 75% and 100%) at a fixed total breast thickness of 5 cm, and the MC thickness took six steps (0, 380, 760, 1140, 1520, 1900 μm). The reference signal Ir was measured through a 5 cm breast phantom with 50% glandular ratio.
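The attenuation differences Δμi(E) entering Eq. 5 can be obtained from tabulated elemental data via the standard mixture rule; the sketch below illustrates that step with hypothetical variable names (the paper itself relies on XCOM output):

```python
import numpy as np

def linear_attenuation(mass_atten, weights, density):
    """Mixture rule: (mu/rho)_mix = sum_i w_i (mu/rho)_i, then mu = (mu/rho) * rho.

    mass_atten: dict element -> array of mass attenuation coefficients
    (cm^2/g) over the energy grid, e.g. tabulated from XCOM;
    weights: dict element -> mass fraction; density in g/cm^3.
    """
    mu_over_rho = sum(w * mass_atten[el] for el, w in weights.items())
    return mu_over_rho * density

# Hypothetical usage: difference between CIRS adipose phantom and breast adipose
# delta_mu_a = (linear_attenuation(tables, cirs_adipose, 0.924)
#               - linear_attenuation(tables, human_adipose, 0.93))
```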
Table 2. Elemental compositions (percentage per weight) of phantom materials and human breast

| Materials     | Density (g/cm³) | H    | C    | N   | O    | F   | Cl  | Ca  | P   | Al  |
| For adipose   |                 |      |      |     |      |     |     |     |     |     |
| Polyethylene  | 0.93            | 14.4 | 85.6 | -   | -    | -   | -   | -   | -   | -   |
| CIRS          | 0.924           | 11.8 | 76.0 | 1.2 | 9.8  | -   | 1.2 | -   | -   | -   |
| RMI           | 0.92            | 8.4  | 69.1 | 2.4 | 16.9 | 3.1 | 0.1 | -   | -   | -   |
| Human         | 0.93            | 11.2 | 61.9 | 1.7 | 25.1 | -   | -   | -   | 0.1 | -   |
| For glandular |                 |      |      |     |      |     |     |     |     |     |
| PMMA          | 1.19            | 8.0  | 60.0 | -   | 32.0 | -   | -   | -   | -   | -   |
| Acrylic       | 1.19            | 5.6  | 50.0 | -   | 44.4 | -   | -   | -   | -   | -   |
| CIRS          | 1.04            | 10.9 | 70.2 | 1.2 | 12.5 | -   | 1.1 | 0.6 | -   | 3.5 |
| RMI           | 0.99            | 8.4  | 68.0 | 2.3 | 18.9 | -   | 0.1 | 2.3 | -   | -   |
| Human         | 1.04            | 10.2 | 18.4 | 3.2 | 67.7 | -   | -   | -   | 0.5 | -   |

4 Results
4.1 Calibration Material Errors
After calibration, we substitute the log-values (fl, fh) into the calibrated polynomials and calculate the MC thickness t'c; the systematic calculation bias is Δtc = t'c − tc, where t'c denotes the calculated thickness. When the calibration materials are identical to the human breast, the calibration material error Δtcc is zero and the fitting error Δtcp equals Δtc. When the calibration materials are phantom materials, Δtc consists of Δtcp and Δtcc, and Δtcc can be calculated according to Eq. 5. Table 3 lists Δtcc, Δtcp and Δtc for different calibration materials, with an MC of 250 μm and a glandular ratio of 70%. In the first three rows, only one calibration material differs from the composition of the human breast; these rows give the MC thickness errors when a 1% linear attenuation coefficient difference exists. In the last three rows, phantom materials are adopted. In Table 3, the fitting error is only -2 μm. The calibration material errors are −33% (-83 μm), −138% (-346 μm) and 1.2% (3 μm) when 1% linear attenuation coefficient differences exist between the calibration materials for adipose, glandular and MC, respectively, and the human breast. The errors coming from the calibration materials for glandular and adipose tissues are larger than the errors from the MC calibration material. If CIRS phantoms and aluminum are adopted as calibration materials, the calibration error of MC thickness is 169% (423 μm). CIRS phantoms are made based on the human breast composition; however, composition differences still exist. In the energy range 10 keV–50 keV, the average linear attenuation coefficient differences are 0.95% between the CIRS adipose phantom and breast adipose tissue, 1.14% between the CIRS glandular phantom and breast glandular tissue, and 0.95% between aluminum and CaC2O4.
Table 3. MC thickness errors using various calibration materials (tc = 250 μm, g = 70%)

| Calibration materials                            ||| MC thickness errors (μm)  |||
| Adipose      | Glandular | Microcalcification | Δtcc  | Δtcp | Δtc   |
| 0.99HM       | HM¹       | CaC2O4             | -83   | -2   | -84   |
| HM           | 0.99HM    | CaC2O4             | -346  | -2   | -310  |
| HM           | HM        | 0.99CaC2O4         | 3     | -2   | -0.3  |
| CIRS²        | CIRS²     | Al                 | 423   | -2   | 404   |
| RMI³         | RMI³      | Al                 | -9419 | -2   | -9163 |
| Polyethylene | PMMA      | Al                 | 2066  | -2   | 2134  |

¹ HM: human breast composition determined by Hammerstein
² phantom material produced by CIRS
³ phantom material produced by RMI
If RMI phantoms, or polyethylene and PMMA, are used for calibration, the tremendous calibration material errors render the calculated MC thickness meaningless.
4.2 Calibration Material Errors of Different Breast Compositions
The breast composition of different individuals is variable, while the calibration materials are always fixed in dual-energy imaging. The variations in breast composition between individuals therefore also bring about MC thickness errors. Table 4 lists the errors for the three compositions given in Table 1; the calibration materials are CIRS phantoms and aluminum. The MC thickness errors differ widely when the calibration materials are fixed and the breast composition changes.

Table 4. Calibration material errors of different breast compositions (tc = 250 μm, g = 70%)

| Breast compositions | MC thickness errors (μm)           |||
|                     | Δtcc | Δtcp | Δtc  |
| Composition 1       | -182 | -6   | -182 |
| Composition 2       | 912  | -6   | 905  |
| Composition 3       | 423  | -6   | 404  |
5 Discussion
In this paper, we analyze the MC thickness calculation errors brought about by the attenuation differences between calibration materials and the human breast. Comparing Δtcc and Δtcp, the fitting error Δtcp is relatively small. The calibration material error Δtcc dominates the systematic thickness bias Δtc; it may sometimes amount to thousands of microns, which renders the calculated MC thickness meaningless.
As can be seen in Table 3, it is impractical to use RMI phantoms, polyethylene or PMMA as calibration materials, since they may bring about huge calibration material errors. In contrast, CIRS phantom materials are made based on the human breast and, among the published literature, approximate the compositions of the human breast most closely. Even so, they bring about calibration material errors of roughly 160%. Figure 1 presents the calibration material percent errors when the glandular ratio ranges from 0% to 100% and the MC thickness ranges from 0 μm to 1000 μm. The errors are almost always positive, which means that calibration using CIRS phantoms and aluminum will make the calculated MC thicker and will not render the MC invisible. These calibration material errors may be ignored when dual-energy mammography is used only to enhance the detectability of MCs. The calibration material percent error increases as the glandular ratio increases, and decreases as the MC thickness increases.
Fig. 1. MC thickness percent errors from calibration materials CIRS phantoms and aluminum
6 Conclusion
Since the attenuation characteristics of calibration materials are not exactly equal to those of the human breast, and the breast composition of different individuals is variable, MC thickness errors from calibration materials are unavoidable. From the point of view of imaging physics, this paper investigates the magnitude of the calibration material error, which ranges from dozens to thousands of microns. The errors from calibration materials cannot be ignored when carrying out dual-energy calculations. Simple phantom materials, such as PMMA or polyethylene, are not suitable for calibration in clinical dual-energy mammography; CIRS phantoms are the better choice. In this paper, we analyze the physical model of dual-energy imaging and derive the expression that quantifies the calibration material error. Calculating
based on the physical model is an effective way to analyze errors in dual-energy imaging.

Acknowledgments. The project is partially supported by the National Natural Science Foundation of China (No. 60472004 and No. 60551003) and by a fund of the Ministry of Education of China (No. 106143).
References
1. Johns, P.C., Yaffe, M.J.: Theoretical Optimization of Dual-energy X-ray Imaging with Application to Mammography. Med. Phys. 13, 289–296 (1985)
2. Kappadath, S.C., Shaw, C.C.: Dual-energy Digital Mammography for Calcification Imaging: Scatter and Nonuniformity Corrections. Med. Phys. 32, 3395–3408 (2005)
3. Guillemaud, R., Robert-Coutant, C., Darboux, M., Gagein, J.J., Dinten, J.M.: Evaluation of Dual-energy Radiography with a Digital X-ray Detector. Proc. SPIE 4320, 469–478 (2001)
4. Kappadath, S.C., Shaw, C.C.: Dual-energy Digital Mammography: Calibration and Inverse-mapping Techniques to Estimate Calcification Thickness and Glandular-tissue Ratio. Med. Phys. 30, 1110–1117 (2003)
5. Hammerstein, G.R., Miller, D.W., White, D.R., Masterson, M.E., Woodard, H.Q., Laughlin, J.S.: Absorbed Radiation Dose in Mammography. Radiology 130, 485–491 (1979)
6. Fewell, T.R., Shuping, R.E.: Handbook of Mammographic X-ray Spectra. HEW Publication (FDA), Washington, D.C. (1978)
7. Lemacks, M.R., Kappadath, S.C., Shaw, C.C., Liu, X., Whitman, G.: A Dual-energy Subtraction Technique for Microcalcification Imaging in Digital Mammography – A Signal-to-noise Analysis. Med. Phys. 29, 1739–1751 (2002)
8. White, D.R.: The Formulation of Tissue Substitute Materials Using Basic Interaction Data. Phys. Med. Biol. 22, 889–899 (1977)
9. Byng, J.W., Mainprize, J.G., Yaffe, M.J.: X-ray Characterization of Breast Phantom Materials. Phys. Med. Biol. 43, 1367–1377 (1998)
10. XCOM, http://physics.nist.gov/PhysRefData/Xcom/Text/XCOM.html
A MR Compatible Mechatronic System to Facilitate Magic Angle Experiments in Vivo
Haytham Elhawary1, Aleksandar Zivanovic1, Marc Rea1,2, Zion Tsz Ho Tse1, Donald McRobbie2, Ian Young3, Martyn Paley4, Brian Davies1, and Michael Lampérth1
1 Mechanical Engineering Department, Imperial College London, South Kensington Campus, SW7 2AZ, London, UK {h.elhawary, a.zivanovic, m.rea, t.tse06, b.davies, m.lamperth}@imperial.ac.uk
2 Faculty of Medicine, Clinical Sciences Centre, Charing Cross Hospital, Imperial College London, W6 8RF, London, UK
3 Electrical Engineering Department, Imperial College London
4 Academic Radiology, Sheffield University, Sheffield, UK
Abstract. When imaging tendons and cartilage in an MRI scanner, an increase in signal intensity is observed when they are oriented at 55 degrees with respect to B0 (the "magic angle"). There is a clear clinical importance in considering this effect as part of the diagnosis of orthopaedic and other injury. Experimental studies of this phenomenon have been made harder by the practical difficulties of tissue positioning and orientation in the confined environment of cylindrical scanners. An MRI compatible mechatronic system has been developed to position a variety of limbs inside the field of view of the scanner, to be used as a diagnostic and research tool. It is actuated with a novel pneumatic motor comprising a heavily geared-down air turbine, and is controlled in a closed loop using standard optical encoders. MR compatibility is demonstrated, as well as the results of preliminary trials used to image the Achilles tendon of human volunteers at different orientations. A 4 to 13 fold increase in signal at the tendon is observed at the magic angle. Keywords: MRI, magic angle, MR compatible robotics, MRI compatibility, medical robotics.
1 Introduction
With conventional MR sequences, normal tendons exhibit very little or no signal intensity because they are comprised predominantly of dense collagen with a highly ordered structure [1, 2]. As a result of this structure, dipole-dipole interactions between nuclei are greatly enhanced, which leads to a rapid loss of signal directly after excitation, i.e., short T2 times. Therefore, very little or no signal is detectable with the Echo Times (TE) available on most clinical systems [2]. As indicated in [1], the dipolar interaction between two nuclei is proportional to a factor of 3cos²θ − 1, where θ is the angle between the direction along which the spins
are coupled in the structure in which they are held (in this case the tendon) and the direction of the main magnetic field B0. This dipolar interaction disappears when the above factor becomes zero, which is satisfied for θ = 54.74°. This value is generally approximated to 55 degrees and termed the "magic angle" [1], and the phenomenon of increased signal intensity at the magic angle has been coined the "magic angle effect" [3]. It is therefore important to consider the orientation of tissue with respect to the main field when observing increased signal intensity in tendons. Recent studies have demonstrated the clinical and functional importance of exploiting this dipolar coupling effect in cartilage and tendons as part of the evaluation of orthopaedic and other injury [2, 4-6]. To those unfamiliar with this effect, these signal variations may suggest disease rather than an imaging artifact [7]. The work described in the body of literature documenting the magic angle phenomenon was made very much harder by the practical difficulties of placing a tissue at a specific angle to B0 in the confined environment of a closed bore MRI system. An actuated system introduced inside the scanner bore which can move the anatomy to a desired orientation would make the process quicker, more accurate and more reliable, and would allow the position of the tissue to be modified remotely without patient repositioning or scanner bed movement. These advantages would justify the increased cost of such a system with respect to inflexible manual jigs and fixtures. This paper describes the development of a new MRI compatible system designed to assist magic angle related experiments in vivo, as a research and diagnostic tool. The intense restrictions imposed by the MRI environment on mechatronic systems and the high forces required to position a limb motivated the development of a novel pneumatic rotary motor. The system has been shown to work reliably, and the results of preclinical trials with human volunteers are presented, showing the increase in signal intensity in the Achilles tendon at and around the magic angle.
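The magic angle follows directly from setting the dipolar factor to zero; a two-line numerical check (purely illustrative):

```python
import numpy as np

# Solve 3*cos(theta)**2 - 1 = 0 for the "magic angle"
theta = np.degrees(np.arccos(np.sqrt(1.0 / 3.0)))
print(theta)  # 54.7356..., commonly rounded to 55 degrees
```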
2 Description of Limb Positioning System
2.1 System Specifications
The objective of this work is to develop a system which is capable of positioning a variety of target tissues relative to the main field inside an MRI scanner bore. The following specifications were set: (i) the system will be located inside the imaging volume of a closed bore 1.5T or 3T MRI scanner, and should be independent of the scanner make and model; (ii) due to spatial constraints it will be able to position legs from the knee down and arms from the elbow down; (iii) its range of motion should allow orientation changes of at least 60 degrees with respect to the main field B0 while still ensuring patient comfort; (iv) the system should be controlled remotely via an intuitive graphical user interface; (v) it should not be affected by the fields from the scanner and must not deteriorate the MR images.
2.2 Actuation Method
MRI imposes severe restrictions on any mechatronic device introduced in the MR environment [8]. The magnetic fields used by the scanner require the selection of suitable materials, actuators and sensors [9, 10]. Piezoceramic motors have been the most
common substitute for regular DC motors, and are used in a number of systems [11-13]. The high forces required to move human limbs (especially the leg) ruled out the use of piezoceramic actuators in this application. A novel pneumatic air motor system was developed, where air is introduced through a nozzle onto the blades of a turbine rotor, as shown in Fig. 1(a). The output of the rotor is transmitted through a gear train (ratio 4616:1) to produce an adequate speed and torque at the output shaft. The gearbox contains plastic gears (Delrin®), aluminium shafts and plastic bearings with glass balls. Both the turbine rotor and the gear box casing (Fig. 1b and c) are manufactured using an epoxy-resin rapid prototyping technique. The operating pressure was chosen to be 1 bar for safety. The motor has a maximum speed of 6 rpm and can output a torque of up to 0.35 Nm (using a flow rate of 40 litres/min). The torque is enough to move a subject's limbs as long as he/she is cooperative.
Fig. 1. (a) CAD representation of the operating principle of the pneumatic motor; a flow of air at 1 bar is introduced through a nozzle to move a turbine rotor. Two inlets allow bidirectional motion. (b) and (c) Photographs of turbine, rotor and gear box: 75x75x50mm.
2.3 System Hardware
The system consists of a 1 DOF platform, as can be seen in Fig. 2(a). The patient's leg is strapped to the platform frame, which is then moved by the air motor via a rack and pinion through the linkage. The slider rail is made of aluminium, and the rest of the hardware is manufactured using Ertalon® and Delrin®, both MR compatible engineering plastics. The base plate is fixed to the scanner bed in one of a number of different initial positions (depending on the size of the patient). The platform can be positioned vertically or horizontally depending on the limb which will be oriented in the scanner. A surface mount optical encoder, shown in Fig. 2(b), is used to detect the position of the platform (Agilent Technologies AEDR-8000 series). Cables from the encoder are directed into a shielded aluminium box (which acts as a Faraday cage) located inside the scanner room, 2 m away from the entrance to the bore. Circuits inside the box (powered by alkaline batteries) convert the encoder electrical pulses to fibre optic
signals, which are transmitted to the control room through the wave guides to circuits which transform the optic signals back into electrical pulses. A motion controller (National Instruments CompactRIO) then decodes the pulses into position, giving the angle of the platform. The controller runs a proportional-integral-derivative (PID) position control algorithm which controls the air supply to the air turbine by pulse-width-modulating two solenoid valves. The compressed air is supplied by a portable silent compressor in the scanner control room. The position of the platform is specified from a graphical user interface running on a PC inside the control room, developed in LabVIEW (National Instruments), which communicates with the motion controller. A block diagram of the system is presented in Fig. 2(c).
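The paper does not give the control code; the following is a minimal sketch of a PID position loop driving the two solenoid valves via PWM duty cycles, with all gains, names and timing chosen purely for illustration:

```python
class PIDValveController:
    """Illustrative PID loop: the signed control effort is split into
    PWM duty cycles for the two solenoid valves (one per direction)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, target_deg, measured_deg):
        error = target_deg - measured_deg
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        u = (self.kp * error + self.ki * self.integral
             + self.kd * derivative)
        duty = min(abs(u), 1.0)  # clamp to a full duty cycle
        # positive effort opens valve A, negative effort opens valve B
        return (duty, 0.0) if u >= 0 else (0.0, duty)
```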
Fig. 2. (a) A photograph of the one DOF MRI compatible manipulator and its parts, designed for limb positioning inside a closed bore scanner. (b) Surface mount optical encoder on its PCB and MR images showing 20mm maximum artifact produced and the reference object as specified by the ASTM protocol F2119. (c) Block diagram of limb positioning system. The manipulator is located inside the scanner bore with the encoder electronics inside the room. The rest of the system is inside the control room.
To evaluate the accuracy of the system, the position indicated by the encoders was compared to that of an external mechanical tracking arm attached to the platform. The platform was always positioned to within an accuracy of ±0.5 degrees. Once the limb is located at a desirable position, the image is acquired. As the motor is non-backdrivable, that is, its shaft cannot be moved by an external force, the leg is kept still while scanning takes place, so no braking mechanism is required.
3 MR Compatibility Tests
As the manipulator contains no ferromagnetic or highly paramagnetic materials, it will not suffer any forces or torques when introduced into the scanner room. To evaluate MR compatibility two tests were performed: (i) quantifying any image artifacts produced by the system when introduced into the scanner bore, and (ii) measuring the impact the system had on the Signal to Noise Ratio (SNR) of the images. The optical encoders were evaluated for artifacts under the protocol described in the ASTM standard F2119 [14]. The encoder gave a maximum artifact of 20 mm, as can be seen in Fig. 2(b). This size indicates the minimum distance by which this element must be separated from the region of interest of the MR image. To evaluate the impact of the limb positioning system on the SNR of the MR scans, the following images were taken: (i) a control scan of a bottled cylindrical phantom (with a CuSO4 solution) at the magnet isocentre, (ii) the phantom placed next to the manipulator when powered but not actuated, and (iii) the phantom with the manipulator powered and actuated. In each case the SNR was calculated using the definition given by the following formula [11]:
$$SNR = \frac{P_{centre}}{SD_{noise}} \quad (1)$$
where Pcentre is the mean signal in the 40x40 pixel region at the centre of the phantom, and SDnoise is the standard deviation of the signal in the 40x40 pixel region at the bottom right hand corner of the image. The tests were performed on a 1.5T Siemens Magnetom Vision using a Turbo Spin Echo (TSE) sequence (TR=2000ms, TE=120ms, 5mm slice thickness, 0.5 distance factor, 256 matrix, FOV=300mm, BW=130Hz/px) and a Gradient Echo (GE) Fast Low Angle Shot (FLASH) sequence (TR=22ms, TE=10ms, Flip Angle=80°, 5mm slice thickness, 0.5 distance factor, FOV=300mm, 256 matrix, BW=130Hz/px). In Table 1, the values of the SNR for each sequence are presented along with the variation of SNR with respect to the phantom alone. The results show that the impact of the system on the SNR of the images is virtually negligible, and it can therefore be considered fully MR-compatible under the reported conditions.

Table 1. Results of the SNR measurement test with the system powered and actuated, compared with only a phantom in the scanner
| Test            | SNR TSE | Variation (%) | SNR GE | Variation (%) |
| Phantom alone   | 81.5    | -             | 25.4   | -             |
| System Powered  | 80.1    | 1.71          | 25.9   | -1.04         |
| System Actuated | 81      | 0.6           | 24.5   | 3.54          |
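Eq. 1 amounts to a ratio of two ROI statistics; a minimal sketch, assuming the phantom is centred in the image (the function name and ROI placement are illustrative):

```python
import numpy as np

def snr(image, size=40):
    """SNR per Eq. 1: mean of a size x size ROI at the image centre over the
    standard deviation of a size x size ROI at the bottom-right corner."""
    h, w = image.shape
    cy, cx = h // 2, w // 2
    centre = image[cy - size // 2:cy + size // 2,
                   cx - size // 2:cx + size // 2]
    noise = image[h - size:h, w - size:w]
    return centre.mean() / noise.std()
```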
4 Preliminary Trials with Human Volunteers
Tests were performed on three human volunteers (on the Achilles tendon of the right leg). This tendon was chosen due to its size and its ease of location on the body and in the
MR images. Initial trials were undertaken in a 1.5T Siemens Magnetom Vision scanner. The unit was mounted vertically in the scanner, to the left of the centre of the machine (see Fig. 3). The subject was positioned in the scanner bore in the left lateral decubitus position. The right leg was strapped to the platform and a flexible surface coil was wrapped around the Achilles tendon to improve imaging. The desirable image pattern is one in which the plane of the images is constant relative to the target tissue. This means that, in the case of the Achilles tendon, the initial images (when the tendon is approximately parallel to the main field) are transverse but, as the angle between tendon and field increases, they become increasingly nearer sagittal, and the main images of interest lie increasingly further from the centre of the machine. In order to scan the same cross sections of the leg as it is oriented at different angles, a passive fiducial marker made from Spenco® [15] was fixed to the subject's leg at the beginning of the session. This served as a reference location on the leg, from which the planes of the tendon to be imaged were defined. The sequence used in the study was a Spin Echo (TR=300ms, TE=20ms, slice thickness 5mm, 5 slices, FOV=250mm, 256 matrix, space between slices 2.5mm). With the leg in place, the platform was actuated in five degree increments starting at zero, and images were taken at each angle. Due to the pronounced increase in signal after an angle of 45 degrees, an image was taken every two degrees from 45 to 61 degrees.
Fig. 3. The subject's right leg is strapped to the platform and a surface coil is wrapped around the Achilles tendon before being introduced into the scanner bore for imaging
In each case the system could position the lower leg at the desired angle. In Fig. 4(a) and (b) the magic angle effect can clearly be seen on the two MR images. The first displays a transverse scan of the tendon aligned with the main field, showing virtually no signal; the second shows the tendon at the magic angle with its corresponding increase in signal. Fig. 4(c) presents a graph displaying the average normalised signal of the Achilles tendon (taken over three contiguous slices) for each human volunteer. To compensate for the fact that the sensitivity to magnetisation of the surface coil changes as the platform is rotated, the signal at the tendon is normalised at each orientation by dividing it by the signal at the tibia. In this way the increase in normalised signal obtained is solely due to the magic angle effect. The graph shows a small signal in the tendon up to an angle of approximately 45°, after which there is a sharp increase in signal that peaks at 55° and then rapidly reduces again. At 55° the signal intensity varies between subjects, but ranges from a 4 to 13 times increase compared to when the tendon is aligned with B0.
| Subject | Signal (0°) | Signal (55°) | Ratio |
| 1       | 0.33        | 1.3          | 3.94  |
| 2       | 0.21        | 1.4          | 6.67  |
| 3       | 0.13        | 1.78         | 13.69 |
Fig. 4. (a) Achilles tendon shown at 0° with respect to the main field and (b) at 55°. There is a clear increase in signal intensity in the Achilles tendon. (c) The graph shows the average values of normalised signal for each subject. The signal in the tendon peaks when the dipole interaction is zero, showing a 4 to 14 fold increase in SNR.
The graph also shows the dipole-dipole interaction: the signal peaks when this interaction is zero.
5 Discussion and Conclusions
This paper has described an MRI compatible mechatronic system, actuated with a novel pneumatic motor, for positioning a subject's limb inside the field of view of an MRI scanner. It is envisaged to be used as a research and diagnostic tool to aid orthopaedic diagnosis and other magic angle related studies, independent of the make and model of the MRI scanner. The conditions under which MRI compatibility is assured have been indicated, and preliminary trials with human volunteers prove its functionality, showing a 4 to 13 fold increase in signal at the tendon. However, the system has some limitations in its present implementation due to its single degree of freedom. Current development includes adding further degrees of freedom to overcome these shortcomings.
References
[1] Erickson, S.J., Prost, R.W., Timins, M.E.: The "magic angle" effect: background physics and clinical relevance. Radiology 188, 23–25 (1993)
[2] Marshall, H., Howarth, C., Larkman, D.J., Herlihy, A.H., Oatridge, A., Bydder, G.M.: Contrast-enhanced magic-angle MR imaging of the Achilles tendon. AJR Am. J. Roentgenol. 179, 187–192 (2002)
[3] Xia, Y.: Magic-angle effect in magnetic resonance imaging of articular cartilage: a review. Investigative Radiology 35, 602–621 (2000)
[4] Oatridge, A., Herlihy, A.H., Thomas, R.W., Wallace, A.L., Curati, W.L., Hajnal, J.V., Bydder, G.M.: Magnetic resonance: magic angle imaging of the Achilles tendon. The Lancet 358, 1610–1611 (2001)
[5] Chappell, K.E., Robson, M.D., Stonebridge-Foster, A., Glover, A., Allsop, J.M., Williams, A.D., Herlihy, A.H., Moss, J., Gishen, P., Bydder, G.M.: Magic angle effects in MR neurography. AJNR Am. J. Neuroradiol. 25, 431–440 (2004)
[6] Fullerton, G.D., Cameron, I.L., Ord, V.A.: Orientation of tendons in the magnetic field and its effect on T2 relaxation times. Radiology 155, 433–435 (1985)
[7] Bydder, M., Rahal, A., Fullerton, G.D., Bydder, G.M.: The magic angle effect: a source of artifact, determinant of image contrast, and technique for imaging. J. Magn. Reson. Imaging 25, 290–300 (2007)
[8] FDA, A Primer on Medical Device Interactions with Magnetic Resonance Imaging Systems, http://www.fda.gov/cdrh/ode/primerf6.html
[9] Elhawary, H., Zivanovic, A., Davies, B., Lamperth, M.: A Review of Magnetic Resonance Imaging Compatible Manipulators in Surgery. In: Proceedings of the Institution of Mechanical Engineers, Part H: Journal of Engineering in Medicine, vol. 220, pp. 413–424 (2006)
[10] Roger, G., Yamamoto, A., Chapuis, D., Dovat, L., Bleuler, H., Burdet, E.: Actuation methods for applications in MR environments. Concepts in Magnetic Resonance Part B: Magnetic Resonance Engineering 29B, 191–209 (2006)
[11] Chinzei, K., Hata, N., Jolesz, F.A., Kikinis, R.: Surgical Assist Robot for the Active Navigation in the Intraoperative MRI: Hardware Design Issues. In: Proc. 2000 IEEE/RSJ International Conf. Intelligent Robots and Systems, Maui, HI, USA (2000)
[12] Larson, B.T., Erdman, A.G., Tsekos, N.V., Yacoub, E., Tsekos, P.V., Koutlas, I.G.: Design of an MRI-Compatible Robotic Stereotactic Device for Minimally Invasive Interventions in the Breast. Journal of Biomechanical Engineering - Transactions of the ASME 126, 458–465 (2004)
[13] Tsekos, N.V., Ozcan, A., Christoforou, E.: A prototype manipulator for magnetic resonance-guided interventions inside standard cylindrical magnetic resonance imaging scanners. J. Biomech. Eng. 127, 972–980 (2005)
[14] ASTM, Standard Test Method for Evaluation of MR Image Artifacts from Passive Implants, Designation F2119-01 (2001)
[15] Spenco Healthcare International Limited, http://www.spencohealthcare.co.uk/
Variational Guidewire Tracking Using Phase Congruency
Greg Slabaugh1, Koon Kong2, Gozde Unal1, and Tong Fang1
1 Intelligent Vision and Reasoning Department, Siemens Corporate Research, USA {greg.slabaugh,gozde.unal,tong.fang}@siemens.com
2 School of Electrical and Comp. Engineering, Georgia Institute of Technology, USA [email protected]
Abstract. We present a novel method to track a guidewire in cardiac x-ray video. Using variational calculus, we derive differential equations that deform a spline, subject to intrinsic and extrinsic forces, so that it matches the image data, remains smooth, and preserves an a priori length. We derive these equations analytically from first principles, and show that they include tangential terms, which we retain in our model. To address the poor contrast often observed in x-ray video, we propose using phase congruency as an image-based feature. Experimental results demonstrate the success of the method in tracking guidewires in low contrast x-ray video.
1 Introduction
Endovascular interventions are becoming increasingly common in the treatment of arterial diseases like atherosclerosis. In such procedures, a guidewire is placed in the groin and advanced towards the heart. Critical to this process is accurate placement of the guidewire with respect to the vascular anatomy, which is typically imaged using x-ray fluoroscopy. However, placement is often difficult due to the complexity of the vasculature, patient motion, and the low signal to noise ratio of the video that results from trying to minimize the radiation exposure to the patient. In this paper, we present a method to track a guidewire in cardiac x-ray video. Tracking the guidewire has many applications, including interventional navigation and adaptive image enhancement of the guidewire. While there exist numerous papers on the subject of line detection in noisy images, there is relatively little literature devoted to the more specific topic of guidewire tracking, which is perhaps unexpected given the clinical importance of endovascular interventions. Palti-Wasserman et al. [1] were, to our knowledge, the first to consider the problem; in their approach the guidewire is modeled using a second degree polynomial extracted from consecutive frames of a video. Recent work by Baert et al. [2] models the guidewire using a spline, and then optimizes the spline position numerically using Powell's direction set method. The optimization is designed to deform the spline so that it has minimal length, remains smooth, and matches the guidewire position in the image.
Several authors have considered evolving a contour subject to various intrinsic and image-based forces; snakes [3], [4] being a classic example. When the contour is represented using a spline interpolated from discrete control points, the problem then becomes one of evolving the control points, which in turn evolve the spline. Typically, authors consider closed contours [5] or open contours with boundary conditions [6], such as forcing the endpoints to be fixed or have mirror symmetry. However, such boundary conditions are not suitable for guidewire tracking. When evolving closed contours, tangential forces on the contour are typically ignored as they do not change the contour geometry. However, these terms have an effect on an open contour and therefore should be addressed.
1.1 Our Contribution
Our work is inspired by [1] and [2] but has significant differences. We also model the geometry of the guidewire as a spline, defined as a smooth curve that interpolates control points. We compose our energy functional of three terms: one designed to force the guidewire to match the edge-detected pixels in the image, another to keep the spline smooth, and a third designed to retain an a priori length (as opposed to the minimal length of [2]). Unlike [1], [2], we analytically derive, using variational calculus, differential equations that describe the flow of the spline to minimize the energy functional and achieve a locally optimal position of the spline on the current frame. By sampling the spline sufficiently, we obtain an over-determined system of linear equations which can be inverted to relate the motion of the spline to that of the control points. This then gives us a simple mechanism to evolve the control points of an open spline without enforcing any unnatural boundary conditions. We derive the differential equations from an energy formulation, and from this we see tangential terms that are typically ignored for closed contour evolutions. For the image-based terms used to align the spline with the guidewire, we propose the use of phase congruency [7], which is able to accurately detect guidewire pixels in x-ray images with low contrast.
2 Variational Formulation
In this section we derive the evolution of the spline using variational calculus. We represent the guidewire as an open curve C = [x(s), y(s)] in the image plane, where s ∈ [0, L] is an arc length parameter and L is the contour length. We begin by defining the energy E of the curve as

$$E(C) = w_1\cdot\text{data} + w_2\cdot\text{smoothness} + w_3\cdot\text{length constraint} = w_1\int_C F\,ds + w_2\int_C ds + w_3\left(\int_C ds - L_0\right)^2 \quad (1)$$
where w1 , w2 , and w3 are constants used to weigh the terms relative to one another. The data term will require that the spline adhere to the features detected from the x-ray image and is based on F (x, y), which is a conformal factor computed from a feature map of the x-ray image. The smoothness term will require
the curve to be smooth, and the length constraint term penalizes deviations of the curve's length from an a priori length L0.
2.1 Regularization
Let us consider the second term of Equation 1 first. Taking the partial derivative with respect to an independent time parameter t, and reparameterizing with p ∈ [0, 1], gives

$$\frac{\partial}{\partial t}w_2\int_C ds = w_2\frac{\partial}{\partial t}\int_0^1 \|C_p\|\,dp = w_2\frac{\partial}{\partial t}\int_0^1 \langle C_p, C_p\rangle^{\frac{1}{2}}\,dp$$
$$= w_2\int_0^1 \frac{\langle C_{pt}, C_p\rangle}{\|C_p\|}\,dp = w_2\int_0^1 \langle C_{pt}, T\rangle\,dp$$
$$= w_2\langle C_t, T\rangle\Big|_{p=0}^{1} - w_2\int_0^L \langle C_t, \kappa N\rangle\,ds, \quad (2)$$
where κ is the curvature of the curve, T is the tangent, N is the normal, and ⟨·,·⟩ denotes an inner product. Thus, to minimize the length of the curve, we evolve the curve using

$$\frac{\partial C}{\partial t} = w_2\kappa N + w_2\delta(p)T - w_2\delta(p-1)T \quad (3)$$
where δ is a delta function. Intuitively, this flow indicates that to minimize the curve's length, one can simply move each point in the normal direction weighted by curvature, and at the endpoints of the curve, move the curve inwards along the tangent. Note that these tangential terms cancel in closed contour evolutions, for which the endpoint p = 0 coincides with p = 1. Evolving the contour with this differential equation will cause it to both shrink and become smoother. Later, our length preserving term will force the contour to retain an a priori length, so this term will effectively regularize the contour, keeping it smooth.
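A discrete version of the flow in Eq. 3 on a sampled open curve might look as follows (finite-difference approximations; the step size and helper names are assumptions):

```python
import numpy as np

def curvature_normal(C):
    """Approximate kappa*N at interior samples of an open polyline C (K x 2)
    using the second difference, which for an arc-length sampled curve
    approximates the curvature vector C_ss = kappa * N."""
    kN = np.zeros_like(C)
    kN[1:-1] = C[:-2] - 2 * C[1:-1] + C[2:]
    return kN

def length_step(C, w2, dt):
    """One explicit Euler step of Eq. 3: curvature motion in the interior,
    inward tangential motion at the two endpoints."""
    Cn = C + dt * w2 * curvature_normal(C)
    t0 = (C[1] - C[0]) / np.linalg.norm(C[1] - C[0])      # tangent at p = 0
    t1 = (C[-1] - C[-2]) / np.linalg.norm(C[-1] - C[-2])  # tangent at p = 1
    Cn[0] += dt * w2 * t0   # move first endpoint inwards
    Cn[-1] -= dt * w2 * t1  # move last endpoint inwards
    return Cn
```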
2.2 Geodesic Flow
Now let us consider the first term of Equation 1. Taking the partial derivative with respect to an independent time parameter t gives

$$\frac{\partial}{\partial t}w_1\int_C F\,ds = w_1\frac{\partial}{\partial t}\int_0^1 F\|C_p\|\,dp = w_1\int_0^1 \left(F\frac{\partial}{\partial t}\langle C_p, C_p\rangle^{\frac{1}{2}} + \frac{\partial F}{\partial t}\|C_p\|\right)dp$$
$$= w_1\int_0^1 F\langle C_{pt}, T\rangle\,dp + w_1\int_0^L \langle\nabla F, C_t\rangle\,ds$$
$$= w_1 F\langle C_t, T\rangle\Big|_{p=0}^{1} - w_1\int_0^L F\langle C_t, \kappa N\rangle\,ds + w_1\int_0^L \langle\nabla F, C_t\rangle\,ds \quad (4)$$
Thus, to minimize the conformally weighted length of the curve, we get

$$\frac{\partial C}{\partial t} = w_1 F\kappa N - w_1\nabla F + w_1\delta(p)FT - w_1\delta(p-1)FT \quad (5)$$
This derivation is similar to that of [4], except we now consider the tangential terms that act on the open contour.
2.3 Length Preserving Flow
Finally, let us consider the last term of Equation 1. Taking the partial derivative with respect to an independent time parameter t gives

$$\frac{\partial}{\partial t}w_3\left(\int_C ds - L_0\right)^2 = 2w_3\left(\int_C ds - L_0\right)\frac{\partial}{\partial t}\int_C ds \quad (6)$$

Using the result from the length derivation in Section 2.1, we get the curve evolution

$$\frac{\partial C}{\partial t} = 2w_3\left(\int_C ds - L_0\right)\left[\kappa N + \delta(p)T - \delta(p-1)T\right] \quad (7)$$
2.4 Complete Flow
Combining Equations 3, 5, and 7, we get the final curve evolution

$$\frac{\partial C}{\partial t} = -w_1\nabla F + \kappa\left(w_2 + w_1 F + 2w_3\left(\int_C ds - L_0\right)\right)N$$
$$+ \left[\delta(p)\left(w_2 + w_1 F + 2w_3\left(\int_C ds - L_0\right)\right) + \delta(p-1)\left(-w_2 - w_1 F - 2w_3\left(\int_C ds - L_0\right)\right)\right]T \quad (8)$$
Equation 8 is the contour evolution equation and is independent of the representation of the contour. That is, the equation would apply to any open contour representation, whether it be described as a polyline, implicit curve, using Fourier descriptors, etc. We choose to model the curve as a spline, with control points P = [P1 . . . PN ]T . Our objective is to derive an equation to evolve the control points, which then update the contour to track the guidewire. To do this, we must relate the differential motion of the control points to that of the contour. This depends on the spline representation. We model the contour geometry using a uniform rational B-spline [8]. In this representation, the contour is represented by M segments that interpolate the N = M + 3 control points. In this paper, we set M = 2 and thus N = 5. This results in a curve with enough degrees of freedom to bend and follow the guidewire, but not too many to result in spurious local minima. The jth segment j+3 Bj (p)Pj , is a weighted combination of four control points, as Cj (p) = j where j = 1 · · · M , p ∈ [0, 1], and is a parametrization variable used to sample Bj , which are third order blending functions. We would like to this equation to express Pj as a function of Cj (p) and differentiate, to yield a differential relationship describing how the motion of the curve segment affects the control points. For example, consider the case when we have spline consisting of M = 2
segments, corresponding to N = 5 control points, where we sample each segment L = 4 times. This results in the system of equations:

$$\begin{bmatrix} C_1(p_1)\\ C_1(p_2)\\ C_1(p_3)\\ C_1(p_4)\\ C_2(p_1)\\ C_2(p_2)\\ C_2(p_3)\\ C_2(p_4) \end{bmatrix} = \begin{bmatrix} a(p_1) & b(p_1) & c(p_1) & d(p_1) & 0\\ a(p_2) & b(p_2) & c(p_2) & d(p_2) & 0\\ a(p_3) & b(p_3) & c(p_3) & d(p_3) & 0\\ a(p_4) & b(p_4) & c(p_4) & d(p_4) & 0\\ 0 & a(p_1) & b(p_1) & c(p_1) & d(p_1)\\ 0 & a(p_2) & b(p_2) & c(p_2) & d(p_2)\\ 0 & a(p_3) & b(p_3) & c(p_3) & d(p_3)\\ 0 & a(p_4) & b(p_4) & c(p_4) & d(p_4) \end{bmatrix} \begin{bmatrix} P_1\\ P_2\\ P_3\\ P_4\\ P_5 \end{bmatrix} \quad (9)$$

where a(p) = −p³ + 3p² − 3p + 1, b(p) = 3p³ − 6p² + 4, c(p) = −3p³ + 3p² + 3p + 1, and d(p) = p³ are the elements of the blending function. This system of equations takes the form C = BP, where C is an ML x 2 matrix, B is an ML x N matrix, and P is an N x 2 matrix. As long as the total number of samples ML ≥ N, the system is (possibly over-) determined and we can express P as a function of C using the pseudo-inverse, P = (BᵀB)⁻¹BᵀC. Thus, we can write the evolution of the control points simply as ∂P/∂t = (BᵀB)⁻¹Bᵀ ∂C/∂t, or

$$\frac{\partial P}{\partial t} = (B^TB)^{-1}B^T\Bigg[-w_1\nabla F + \kappa\left(w_2 + w_1 F + 2w_3\left(\int_C ds - L_0\right)\right)N$$
$$+ \left(\delta(p)\left(w_2 + w_1 F + 2w_3\left(\int_C ds - L_0\right)\right) + \delta(p-1)\left(-w_2 - w_1 F - 2w_3\left(\int_C ds - L_0\right)\right)\right)T\Bigg] \quad (10)$$
We note that the matrix BᵀB has size N x N, which is typically quite small, so the pseudo-inverse can be computed efficiently. Equation 10 gives us a differential equation that evolves the control points P in order to reposition the spline C so that it fits the data, remains smooth, and retains an a priori length.
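As a sketch of the control point update of Eq. 10 (the blending functions are those printed above; the curve velocity dC is assumed to have been evaluated at the same samples):

```python
import numpy as np

def blending(p):
    """Blending function values a, b, c, d at parameter p (as in Eq. 9)."""
    return (-p**3 + 3*p**2 - 3*p + 1,
            3*p**3 - 6*p**2 + 4,
            -3*p**3 + 3*p**2 + 3*p + 1,
            p**3)

def build_B(M=2, L=4):
    """Assemble the ML x (M+3) sampling matrix B of Eq. 9."""
    N = M + 3
    ps = np.linspace(0.0, 1.0, L)
    B = np.zeros((M * L, N))
    for j in range(M):
        for k, p in enumerate(ps):
            B[j * L + k, j:j + 4] = blending(p)
    return B

def step_control_points(P, dC, B, dt):
    """One explicit step of Eq. 10: P <- P + dt * (B^T B)^-1 B^T dC,
    where dC (ML x 2) is the curve velocity of Eq. 8 at the samples."""
    return P + dt * np.linalg.solve(B.T @ B, B.T @ dC)
```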
3 Phase Congruency
We base the data term in our technique on a function F that is computed using phase congruency for guidewire detection. As discussed in [7], phase congruency is a dimensionless measure of feature significance that is less sensitive to contrast than differential techniques (gradient, Hessian, etc.). At an edge in an image, phase information is locally congruent, and the degree of this congruency can serve as an edge detector response. In [7], phase congruency is computed over multiple scales and orientations via a wavelet technique using log-Gabor functions. Through experimentation, we use three scales, 6 orientations, σ = 0.7 for the Gaussian in the log-Gabor function, and k = 7.5 standard deviations for the noise energy threshold (see [7] for an explanation). In Figure 1, we provide a comparison of the phase congruency detector vs. the edge maps produced using the Mexican-hat operator of [1] and the coherence-enhancing diffusion followed by Hessian computations of [2]. We observe that the phase congruency result has a sharp edge response and is not plagued by clutter.
Fig. 1. An example image (a) and its corresponding feature map computed using the technique of [1] (b), [2] (c), and phase congruency (d). Notice the stronger contrast of the guidewire in the phase congruency result.
Let E be the edge response computed by phase congruency. We then form F = 1/(1 + E²). We compute ∇F using a smoothed derivative operator, and then run a GVF diffusion [9] on ∇F to increase the capture range of the gradient field.
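The conversion from the phase congruency response to the conformal factor and its gradient could be sketched as follows (the phase congruency map itself comes from [7]; the smoothing scale is an assumption, and the GVF diffusion step is omitted here):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def conformal_factor(E, sigma=1.0):
    """F = 1 / (1 + E^2) from the phase congruency edge response E,
    plus a Gaussian-smoothed gradient of F (to be refined by GVF)."""
    F = 1.0 / (1.0 + E**2)
    Fy = gaussian_filter(F, sigma, order=(1, 0))  # dF/dy
    Fx = gaussian_filter(F, sigma, order=(0, 1))  # dF/dx
    return F, Fx, Fy
```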
4 Experimental Results
We manually initialize the contour on the first frame of the sequence by clicking five control points, and the length of this contour is the a priori length L0. We then execute Equation 10, using w1 = 1, w2 = 1, w3 = 0.1, for a fixed number of iterations, 300 in our case. In our C++ implementation, 300 iterations typically take approximately 175 milliseconds to run. An example is provided in Figure 2. Note that the spline updates its position to lock on to the guidewire detected using phase congruency, and also expands its length to match the a priori length (105 pixels in this case) while remaining smooth. Upon convergence, we advance to the next frame and use the cross-correlation technique of [2] to shift the spline to a new location, providing the initialization for the next frame. We do this because when the frame rate is low, the guidewire can jump a considerable distance from one image to the next. The method then continues this process for the entire video. We have tested our technique on 158 images coming from three different video sequences. We show the results of our tracking algorithm in Figure 3 for three consecutive frames from each sequence. The tracking is successful, matching the spline position with the guidewire in each image. The spline remains smooth but follows the data, and the a priori length constraint discourages the spline from shrinking or expanding away from its known length (100, 105, and 95 pixels, respectively, for these three examples). Once initialized, the tracking is automatic.
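The per-frame procedure can be summarized by the following pseudo-Python loop; cross_correlation_shift, evolve_spline, phase_congruency and the frame source are placeholders for the steps described in the text, not functions from the authors' implementation:

```python
def track_sequence(frames, P0, L0, iters=300, dt=0.1):
    """Track the guidewire over a video: shift the previous spline by
    cross-correlation, then run the variational evolution to convergence."""
    P = P0  # control points clicked on the first frame
    results = []
    for frame in frames:
        F, Fx, Fy = conformal_factor(phase_congruency(frame))
        P = cross_correlation_shift(P, frame)   # coarse inter-frame motion
        for _ in range(iters):                  # explicit steps of Eq. 10
            P = evolve_spline(P, F, Fx, Fy, L0, dt)
        results.append(P)
    return results
```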
Fig. 2. Spline evolution (zoomed in to a region of interest). Given the initial spline (a), we evolve the control points, producing an intermediate result (b) and the final result upon convergence (c). We find it helpful to periodically recompute the control points from the contour, as shown in (d) using the spline from (c). We show the contour evolution overlaid on the phase congruency image in this figure. The spline is shown in green and the control points in red. Some control points have moved off the region of interest and are not visualized in (b) and (c).
Fig. 3. Results of the proposed guidewire tracking algorithm on three different endovascular videos. We show the original first image (left) and tracking result (right three images) for three consecutive frames. In each frame the contour is overlaid on top of the original image.
However, we have observed a small number of failures in the tracking, mainly due to motion blur and clutter. In the motion blur case, the guidewire is not distinctly visible, and the phase congruency image does not provide sufficient information. In the clutter case, edges corresponding to anatomic structures or surgical instruments near the guidewire look like the guidewire, and the tracker starts to follow these false edges instead. Out of the 158 frames, the tracker failed for 10 frames, giving an accuracy of 93%. These results are comparable to [2]; however, a detailed comparison would require running both algorithms on the same data, a subject left for future work. In the case of a tracking failure, the user may reinitialize the contour.
5 Conclusion
This paper presented a variational approach to guidewire tracking in cardiac x-ray video. We derived analytic equations to evolve the control points of a spline in order for the spline to match the image data, remain smooth, and preserve its length, and demonstrated the method's usefulness by tracking guidewires in several endovascular x-ray videos. While further experimentation is ongoing, our current experiments demonstrate much promise for this method to accurately track the guidewire, even in low-contrast videos. Further work on this method will include investigations into increasing robustness and detailed comparisons with alternate techniques, as well as modeling the spline as a 3D curve projected into multiple x-ray images. We believe the theory behind this work is quite general and applicable to numerous spline optimization problems.
References
1. Palti-Wasserman, D., Brukstein, A., Beyar, R.: Identifying and Tracking a Guide Wire in the Coronary Arteries During Angioplasty from X-Ray Images. IEEE Trans. on Biomedical Engineering 44(2), 152–164 (1997)
2. Baert, S., Viergever, M., Niessen, W.: Guide-Wire Tracking During Endovascular Interventions. IEEE Trans. on Medical Imaging 22(8), 965–972 (2003)
3. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active Contour Models. Intl. Journal of Computer Vision 1(4), 321–331 (1987)
4. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic Active Contours. The Intl. Journal of Computer Vision 22(1), 61–79 (1997)
5. Cremers, D., Tischhauser, F., Weickert, J., Schnorr, C.: Diffusion Snakes: Introducing Statistical Shape Knowledge into the Mumford-Shah Functional. International Journal of Computer Vision 50(3), 295–313 (2002)
6. Brigger, P., Hoeg, J., Unser, M.: B-Spline Snakes: A Flexible Tool for Parametric Contour Detection. IEEE Trans. on Image Processing 9(9), 1484–1496 (2000)
7. Kovesi, P.: Image Features From Phase Congruency. Videre: A Journal of Computer Vision Research 1(3) (1999)
8. Foley, J., van Dam, A., Feiner, S., Hughes, J.: Computer Graphics: Principles and Practice, 2nd edn. Addison-Wesley, Reading (1996)
9. Xu, C., Prince, J.L.: Snakes, shapes, and gradient vector flow. IEEE Trans. Image Process 7(3), 359–369 (1998)
Endoscopic Navigation for Minimally Invasive Suturing

Christian Wengert¹, Lukas Bossard¹, Armin Häberling¹, Charles Baur², Gábor Székely¹, and Philippe C. Cattin¹
¹ Computer Vision Laboratory, ETH Zurich, 8092 Zurich, Switzerland
[email protected]
² Virtual Reality and Active Interfaces, EPFL Lausanne, 1015 Lausanne, Switzerland
Abstract. Manipulating small objects such as needles, screws or plates inside the human body during minimally invasive surgery can be very difficult for less experienced surgeons, due to the loss of 3D depth perception. This paper presents an approach for tracking a suturing needle using a standard endoscope. The resulting pose information of the needle is then used to generate artificial 3D cues on the 2D screen to optimally support surgeons during tissue suturing. Additionally, if an external tracking device is provided to report the endoscope’s position, the suturing needle can be tracked in a hybrid fashion with sub-millimeter accuracy. Finally, a visual navigation aid can be incorporated, if a 3D surface is intraoperatively reconstructed from video or registered from preoperative imaging.
1 Introduction
Surgical interventions are increasingly being performed in a minimally invasive fashion. The main advantages of causing less trauma to the patient, smaller infection rates, and faster recovery are well documented. Minimally invasive surgeries (MIS), however, require a lot of experience and pronounced skills from the surgeon. One of the main reasons lies in the almost complete loss of depth perception, which makes the manipulation of small objects such as needles, screws, or plates very difficult.

This paper presents a framework for tracking small objects and estimating their pose using a standard, monocular endoscope. As one example application, the tracking and pose estimation of a suturing needle is presented. The resulting pose information is then used for augmenting the surgeon's view by generating artificial 3D cues on the 2D display and hence providing additional support during suturing. Such augmentation techniques are especially helpful for trainees and less experienced surgeons. The presented approach is general and can be applied to any object with known geometry by performing minor adaptations of the pose estimation module.

Traditional setups usually cannot support the navigation process, as it is often impossible to track small objects due to the missing line of sight and/or because their dimensions or function do not allow marker attachment. While magnetic markers of small dimensions exist, the presence of metallic objects
in their vicinity severely deteriorates their accuracy. Our method can offer a solution by relying on an external tracking device reporting the endoscope's pose. Based on this information, the needle can be fully tracked in a hybrid fashion in the world coordinate system. Finally, if pre- or intra-operative data are registered to the system, the possible interaction between the tracked object and the registered data can also be visualized and used as a navigation aid.

The methods proposed in this paper rely on the combination of object tracking, pose estimation and view augmentation. Color-coded instrument tracking has been presented in [1] for positioning surgical instruments in a robotized laparoscopic surgical environment. Another color-coded approach for tracking laparoscopic instruments was proposed in [2]. Pose computation from objects in 2D images has been explored given 2D to 3D point correspondences in [3], using textures in [4] and from parameterized geometry in [5]. Pose computation from different geometric entities such as ellipses, lines, points and their combinations was presented in [6]. An endoscopic hybrid tracking approach used for spine surgery has been presented in [7]. Passive fiducials are attached to the vertebrae and their pose detected by a tracked endoscope. However, the registration of the fiducials with the target anatomy adds another step to the navigation procedure. In [8], fiducials are attached to the tools and their 3D pose measured by a head-mounted monocular camera. The results are displayed using an HMD. Objects without a direct line of sight to the camera cannot be handled using this system. Other approaches for tracking and pose estimation of surgical instruments include structured light endoscopes [9] and ultrasound probes [10]. An approach for facilitating depth perception was presented in [11], where the invisible shadows of the tools were artificially rendered into the scene.

We propose a system for estimating the pose of very small colored objects in order to augment the surgeon's view by relying only on the standard environment of endoscopic surgeries. No modifications of the instruments involved (like marker attachment) are needed; only the surface color of the object to be identified has to be adjusted. This approach enables tracking of these objects in a hybrid fashion and enhances existing navigation systems.
2 Methods
For all experiments, a 10 mm radial-distortion-corrected endoscope¹ with an oblique viewing angle of 25° was used. In order to avoid interlacing artifacts, a progressive-frame color CCD camera with a resolution of 800 × 600 pixels at 30 fps was incorporated. The camera can be calibrated pre-operatively by the surgeon without requiring technical assistance [12]. As the endoscope/camera combination provides a depth of field in the range of 3–7 cm, the focal length of the camera can be kept constant during the entire procedure, which avoids recalibration of the system during surgery.

¹ Panoview Plus, Richard Wolf GmbH, http://www.richard-wolf.com/
The surgical needle² has a nearly half-circular shape with a radius r = 8.5 mm and a needle diameter d of 1 mm; see Fig. 4b. Tracking such a needle in a surgical environment is a challenging task due to the moving camera, the cluttered background, and the unconstrained motion. In addition, partial occlusions of the needle occur during the suturing process and while it is being held by the gripper. Finally, the coaxial illumination yields strong specularities and an inhomogeneous light distribution. The needle was painted using a light matt green color finish, which makes it possible to deal robustly with the problems listed above.

On startup, the system enters an interactive phase, where the needle first needs to pass through the central search area in the image. Once the needle is found, the tracking and pose estimation starts. Due to the inherently ambiguous solution of the orientation of the needle, the surgeon quickly needs to verify whether the correct orientation has been chosen and manually switch to the correct solution if necessary. The whole algorithm is depicted in Fig. 1 and presented in more detail in the following paragraph.
Fig. 1. Needle detection algorithm after interactive initialization
A color segmentation is applied to the search area, selecting all green pixels. The RGB image is converted to the HSI color space and all pixels within the hue range 65° ≤ H ≤ 160° are selected. Additional constraints for the saturation, S ≥ 0.15, and intensity, I ≥ 0.06, reduce the specularities and help to remove dark areas. All of these values were empirically determined but remained constant through all experiments. The segmented pixels are labeled using a connected component labeling algorithm, while small components of less than 50 pixels are discarded. An ellipse is fitted to all remaining components using the method proposed in [13]. This results in the center of the ellipse ce = [x0, y0]^T, the major and minor axes a and b, and the tilt angle θ, defined as the angle between the major axis a and the x-axis, as shown in Fig. 2a. The actual parameter set pt = [x0, y0, a, b, θ]^T is compared to the previously computed set pt−1. If the new ellipse center is within the bounds defined by the predicted motion and the change of the parameters a, b and θ is below 30%, the new model is regarded as valid and a new motion vector is computed for adjusting the search area in the next image. Otherwise, the found ellipse is discarded, the next image is acquired, and the ellipse's parameters are estimated under the same conditions as before.

² Atraloc, Ethicon Ltd., http://www.ethicon.com
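As an illustration of the segmentation and gating steps just described, the following is a minimal sketch, not the authors' implementation: OpenCV's HSV space is used as a stand-in for the paper's HSI space, so the numeric thresholds are rescaled approximations of the values above, and `prev_params` is a hypothetical carry-over of the previous frame's ellipse fit.

```python
# Sketch of color segmentation + ellipse fit + temporal validity gating.
# Thresholds: hue 65-160 deg -> 32-80 on OpenCV's 0-179 hue scale;
# S >= 0.15 -> 38/255; I >= 0.06 -> 15/255 (approximations, HSV != HSI).
import cv2
import numpy as np

def detect_needle(bgr, prev_params=None, max_change=0.30, min_area=50):
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    mask = ((h >= 32) & (h <= 80) & (s >= 38) & (v >= 15)).astype(np.uint8)
    # Connected-component labeling; discard components below min_area pixels.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    keep = np.zeros_like(mask)
    for i in range(1, n):
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            keep[labels == i] = 1
    pts = cv2.findNonZero(keep)
    if pts is None or len(pts) < 5:          # fitEllipse needs >= 5 points
        return None
    (x0, y0), (major, minor), theta = cv2.fitEllipse(pts)
    params = np.array([x0, y0, major, minor, theta])
    # Reject the fit if a, b or theta changed by more than 30 % (crude gate;
    # angle wrap-around is ignored in this sketch).
    if prev_params is not None:
        rel = np.abs(params[2:] - prev_params[2:]) / np.maximum(np.abs(prev_params[2:]), 1e-6)
        if np.any(rel > max_change):
            return None
    return params
```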
Fig. 2. a) 2D ellipse parameters ce = [x0, y0]^T, a, b, θ; b) projection of a 3D circle, defined by C = [Xc, Yc, Zc]^T and n, to the image plane
From the found 2D ellipse parameters, the corresponding 3D circle is computed using the method from [14]. This results in the location of the circle center C = [Xc, Yc, Zc]^T and the normal n of the plane containing the needle, as depicted in Fig. 2b. The computation of the needle pose is, however, inherently ambiguous and results in two distinct solutions for the center (C′, C″) and for the normal (n′, n″). The correct circle center can be determined by backprojecting both solutions, c′ = P C′ and c″ = P C″, to the image and choosing the solution yielding the smaller distance to the previously computed ellipse center ce, with P being the projection matrix of the camera containing the intrinsic parameters. In order to choose the correct circle plane normal nt, the current plane normals (n′, n″) are compared to the previously computed normal nt−1, and the solution with the smaller angular difference d = cos⁻¹(nt · nt−1) is used. In cases where the needle passes through a degenerate pose, as shown in Fig. 3d, the current normal is selected to move consistently with the prior motion.

The methods presented above can be used to compensate the loss of 3D depth perception by rendering artificial orientation cues. This does not require significant adaptations from the surgeon, as his working environment remains unchanged. For example, a semi-transparent plane containing the needle can be projected onto the 2D image, indicating the relative pose of the needle with respect to the camera, as can be seen in Fig. 3. The plane is represented by the square surrounding the circle containing the needle, whose vertices Xi are projected to the image as xi = P Xi. The square is displayed in a semi-transparent way in order to minimize the loss of information due to occlusion, and its projective distortion indicates which part of the needle is closer to the camera.

As mentioned above, the described system can be extended to allow object tracking, too. If the endoscope is tracked externally and the needle pose is computed in the camera coordinate system, as indicated in Fig. 4a, hybrid tracking of the needle is possible: $H^{needle}_{world} = H^{needle}_{cam}\,H^{cam}_{marker}\,H^{marker}_{world}$, with $H^{marker}_{world}$ the pose of the marker in the world coordinate system, $H^{cam}_{marker}$ the transformation relating the camera with the marker coordinate system as computed during the calibration process, and $H^{needle}_{cam}$ the resulting transformation from the pose estimation process as described above.

For these experiments, a marker is attached to the endoscope that is followed by an active optical tracker³ providing accurate position (< 0.2 mm
Fig. 3. a) Example visualization of the needle showing the detected ellipse and the plane, b) needle held by gripper, c) example showing partial occlusion during the suturing process on a phantom mockup, d) degenerate case with the needle plane being almost perpendicular to the image plane
localization error) and orientation information in a working volume of 50 × 50 × 50 cm³. An external hardware triggering logic ensures the synchronized acquisition of the tracking data and the camera images during dynamic freehand manipulation.

³ EasyTrack500, Atracsys LLC., http://www.atracsys.com
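For concreteness, the two disambiguation rules from the pose estimation step can be written as a minimal sketch; this is not the authors' code, and the candidate solutions and previous-frame quantities are assumed given from the ellipse-to-circle computation of [14].

```python
# Sketch of resolving the two-fold pose ambiguity: choose the circle center by
# back-projection distance and the plane normal by angular difference.
import numpy as np

def pick_center(C1, C2, P, ce_prev):
    """Keep the center whose back-projection is nearer the last ellipse center."""
    def project(C):
        p = P @ np.append(C, 1.0)   # P: 3x4 camera projection matrix
        return p[:2] / p[2]
    d1 = np.linalg.norm(project(C1) - ce_prev)
    d2 = np.linalg.norm(project(C2) - ce_prev)
    return C1 if d1 <= d2 else C2

def pick_normal(n1, n2, n_prev):
    """Keep the normal with the smaller angular difference d = arccos(n . n_prev)."""
    a1 = np.arccos(np.clip(np.dot(n1, n_prev), -1.0, 1.0))
    a2 = np.arccos(np.clip(np.dot(n2, n_prev), -1.0, 1.0))
    return n1 if a1 <= a2 else n2
```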
Fig. 4. a) Overview of the different components of the presented system, b) close-up of the needle held by a gripper
Even more navigation cues can be integrated if registered 3D data such as CT or MRI are available and if the needle is tracked. This makes it possible to visualize the interactions between the tracked object and the 3D data, resulting in a navigation aid. In this paper, such static 3D models are created by the same endoscope, using an intra-operative 3D reconstruction method, as described in [15]. As the endoscope is moved over the surgical scene, the images from this sequence are used to build a 3D model of the visible anatomy online during the intervention, resulting in a 3D triangle mesh Mworld. The plane πworld defined by (C, n) in the world coordinate system can then be cut with the 3D triangle mesh Mworld, resulting in a line set l = πworld ∩ Mworld. A visibility filter vcam returns those lines from this set that are visible in the current view: lvis = vcam(l, Mworld, [R, t]), with [R, t] being the camera pose in the world coordinate system. vcam casts rays from the camera position to both vertices of each line. If a face is encountered between the camera position and one of the line vertices, this line is set to invisible. This pessimistic approach may lead to the complete
Fig. 5. a) 2D augmented view with needle plane and cutting line, b) internal 3D representation of the same scene showing the intra-operatively reconstructed texture-mapped 3D model, a disk representing the needle and the 3D cutting line
loss of only partially visible line segments. However, this is an improbable event considering usual organ topography. In rare cases where multiple cutting lines result from this procedure, the solution closer to the needle is selected. Please note that, while the reconstructed 3D model is essential for these operations, it remains hidden from the surgeon and the system only presents the final result (i.e. the cutting line), overlaid on the endoscopic view (see Fig. 5a).
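The cutting-line computation l = πworld ∩ Mworld can be illustrated with a short sketch. This is a simplified stand-in, not the authors' implementation: it intersects the needle plane with each mesh triangle and omits the ray-casting visibility filter vcam as well as the degenerate case of vertices lying exactly on the plane.

```python
# Sketch of cutting the needle plane (C, n) against a triangle mesh.
import numpy as np

def cut_plane_mesh(C, n, vertices, faces):
    """Return 3D line segments where the plane {x : n.(x - C) = 0} crosses triangles."""
    segments = []
    d = (vertices - C) @ n                      # signed vertex-plane distances
    for tri in faces:
        pts = []
        for i in range(3):
            a, b = tri[i], tri[(i + 1) % 3]
            if d[a] * d[b] < 0:                 # this edge crosses the plane
                t = d[a] / (d[a] - d[b])
                pts.append(vertices[a] + t * (vertices[b] - vertices[a]))
        if len(pts) == 2:
            segments.append((pts[0], pts[1]))
    return segments
```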
3 Results
The frame rate for the hybrid tracking is mainly determined by the needle detection and the ellipse fitting process. Both depend on the number of pixels; thus, the processing time decreases with the distance between the needle and the camera. In the working range of 3–7 cm the system runs in real time, and the frame rate on a 2.8 GHz CPU is between 15–30 fps. The frame rate for the virtual interaction depends on the number of vertices and faces of the 3D model. For the 3D models in our experiments with approximately 500–1000 faces, the frame rate dropped to 10–15 fps.

In order to assess the accuracy of the pose estimation, a 2-DOF positioning table with a resolution of 0.02 mm was used to move the needle to distinct positions covering the whole working volume, as depicted in Fig. 6a. As the errors in x and y are very similar (both depend mainly on the in-plane resolution of the camera), only the errors in x and z are presented here. Around each distinct position, defining the reference coordinate system, eight shifts of 0.1 mm were introduced and the pose estimated by the system. The measurements were then compared to the manually set (ground truth) position. The standard deviations of the errors in the z = 30 mm plane were 0.003 ± 0.065 mm along the optical z-axis and −0.01 ± 0.018 mm along the x-axis. On the lateral positions the errors were −0.02 ± 0.024 mm along the optical z-axis and −0.01 ± 0.015 mm along
Fig. 6. a) Setup for measuring the errors, b) needle plane parallel to image plane and with an inclination of 60◦
the x-axis. For the z = 70 mm plane, the errors along the optical axis were 0.053 ± 0.05 mm in the z-direction and 0.04 ± 0.03 mm in the x-direction. The measurements on the lateral positions resulted in errors of 0.037 ± 0.024 mm and 0.026 ± 0.043 mm respectively. The angular accuracy was quantified in a similar way as in the above experiment by introducing controlled rotations (5◦ ) of the needle with respect to the image plane. This has been repeated with the needle being parallel to the image plane, and inclined by 30◦ and 60◦ respectively. For the needle plane being parallel to the image plane, the standard deviations for the angular errors were 2.3 ± 0.5◦ in the z = 30 mm plane and 2.5 ± 0.5◦ in the z = 70 mm plane. For the needle plane inclined by 30◦ the standard deviations were 1.1 ± 0.7◦ close to the camera and 1.3 ± 0.7◦ at the farthest position. Finally, for the 60◦ case, the errors were 0.7 ± 0.5◦ and 1.0 ± 0.5◦ respectively. As expected from the projective nature of the image formation, the error decreases with the increasing inclination of the needle plane.
4 Conclusions
In this paper a multi-purpose tracking system for small artificial objects was presented that can cope with partial occlusions, cluttered background and fast object movement. The proposed system allows real-time tracking of a suturing needle with sub-millimeter accuracy. The needle tracking is very robust and quickly recovers from occlusions; however, their influence on the accuracy still needs to be carefully investigated. The augmented visualization helps the surgeon to perceive 3D cues from 2D images using the existing instrumentation. The hybrid tracking can improve existing navigation systems by adding the possibility to handle small objects. The color dependence of the system requires the white balance to be set accurately in advance, which has to be integrated into the calibration procedure. The pose ambiguity cannot always be resolved correctly; therefore, the system allows the user to manually switch to the correct solution. It is planned to solve this ambiguity by using other visual cues.

Future work includes tracking and pose estimation for other objects such as screws and plates, leading to greater flexibility of the system, as well as enabling the system
to simultaneously track multiple objects. The most important step to come is the clinical validation of the system and its impact on the surgeons' performance. We are currently setting up studies for this purpose, which will also enable us to investigate which augmentation technique offers the most benefit.

Acknowledgment. This work has been supported by the CO-ME/NCCR research network of the Swiss National Science Foundation (http://co-me.ch).
References

1. Krupa, A., et al.: Automatic 3-d positioning of surgical instruments during robotized laparoscopic surgery using automatic visual feedback. In: Dohi, T., Kikinis, R. (eds.) MICCAI 2002. LNCS, vol. 2488, Springer, Heidelberg (2002)
2. Wei, G.Q., et al.: Automatic tracking of laparoscopic instruments by color coding. In: First Joint Conference on Computer Vision, Virtual Reality and Robotics in Medicine and Medical Robotics and Computer-Assisted Surgery, London, UK (1997)
3. Haralick, R.M., et al.: Pose estimation from corresponding point data. IEEE Transactions on Systems, Man and Cybernetics, 1426–1446 (1989)
4. Rosenhahn, B., et al.: Texture driven pose estimation. In: Proc. of the Int. Conference on Computer Graphics, Imaging and Visualization, pp. 271–277 (2005)
5. Lowe, D.G.: Fitting parameterized three-dimensional models to images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 441–450 (1991)
6. Ji, Q., Haralick, R.: A statistically efficient method for ellipse detection. In: Int. Conference on Image Processing, pp. 730–734 (1999)
7. Thoranaghatte, R.U., et al.: Endoscope based hybrid-navigation system for minimally invasive ventral-spine surgeries. Computer Aided Surgery, 351–356 (2005)
8. Sauer, F., Khamene, A., Vogt, S.: An Augmented Reality Navigation System with a Single-Camera Tracker: System Design and Needle Biopsy Phantom Trial. In: Dohi, T., Kikinis, R. (eds.) MICCAI 2002. LNCS, vol. 2488, pp. 116–124. Springer, Heidelberg (2002)
9. Fuchs, H., et al.: Augmented reality visualization for laparoscopic surgery. In: Wells, W.M., Colchester, A.C.F., Delp, S.L. (eds.) MICCAI 1998. LNCS, vol. 1496, pp. 934–943. Springer, Heidelberg (1998)
10. Novotny, P.M., et al.: GPU Based Real-Time Instrument Tracking with Three Dimensional Ultrasound. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 58–65. Springer, Heidelberg (2006)
11. Nicolaou, M., James, A., Lo, B., Darzi, A., Guang-Zhong, Y.: Invisible shadow for navigation and planning in minimal invasive surgery. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 25–32. Springer, Heidelberg (2005)
12. Wengert, C., Reeff, M., Cattin, P., Székely, G.: Fully Automatic Endoscope Calibration for Intraoperative Use. In: Bildverarbeitung für die Medizin, pp. 419–423 (2006)
13. Fitzgibbon, A.W., Pilu, M., Fisher, R.B.: Direct least squares fitting of ellipses. IEEE Transactions on Pattern Analysis and Machine Intelligence (1999)
14. De Ipiña, D.L., Mendonça, P.R.S., Hopper, A.: TRIP: A low-cost vision-based location system for ubiquitous computing. Personal Ubiquitous Computing (2002)
15. Wengert, C., Cattin, P.C., Duff, J.M., Székely, G.: Markerless endoscopic registration and referencing. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 806–814. Springer, Heidelberg (2006)
On Fiducial Target Registration Error in the Presence of Anisotropic Noise

Burton Ma¹, Mehdi H. Moghari², Randy E. Ellis¹,³, and Purang Abolmaesumi¹,²,³
¹ Human Mobility Research Centre, Kingston General Hospital, Kingston, Ontario, Canada
² Department of Electrical Engineering, Queen's University, Kingston, Ontario, Canada
³ School of Computing, Queen's University, Kingston, Ontario, Canada
mab,hedjazi,ellis,[email protected]
Abstract. We study the effect of anisotropic noise on target registration error (TRE) by using a tracked and calibrated stylus tip as the fiducial registration application. We present a simple, efficient unscented Kalman filter algorithm that is suitable for fiducial registration even with a small number of fiducials. We also derive an equation that predicts TRE under anisotropic noise. The predicted TRE values are shown to closely match the simulated TRE values achieved using our UKF-based algorithm.
1 Introduction

Many least-squares solutions have been proposed for the problem of fiducial (paired-point) registration. The use of least-squares assumes that one set of points is noise free and the other set of points is contaminated with isotropic, zero-mean, independent, identically distributed (iid) Gaussian noise.

Optical tracking systems that sense points of infrared light are commonly used in commercial navigated surgical systems. These systems measure coordinate reference frames (CRFs), which are essentially a set of infrared emitting/reflecting fiducial markers rigidly attached to the tracked object. The measurement precision is typically worse in the viewing direction of the cameras for such tracking systems. Khadem and colleagues [1] found that the jitter in the measured position of a static target was anisotropic, with the greatest deviation occurring in the viewing direction of the tracking system. Their results showed an anisotropy as large as a factor of five or more when using a Polaris tracking system with a passive target.

Ohta and Kanatani [2] described an algorithm designed to accommodate anisotropic, non-identical Gaussian noise in both the model and measurement coordinate systems. This algorithm was used in a modified version of the ICP algorithm [3]. Pennec and Thirion [4] used an extended Kalman filter as part of a framework for registration using points and frames. Their approach accommodated anisotropic noise in both sets of points to be registered.

Fitzpatrick and colleagues [5] derived an expression for fiducial target registration error (TRE) in k dimensions. Target registration error is simply the magnitude $\|r - r'\|$, where $r$ is the expected location of a target point and $r'$ is the registered location of the target point. Their derivation was performed assuming zero-mean, isotropic, iid
Gaussian noise. West and Maurer [6] described how to use these results to design targets for optically tracked surgical instruments. Ma and Ellis [7] presented analytic expressions based on a spatial stiffness model for TRE for both fiducial and surface-based registration. Their expression for fiducial TRE was identical to that published by [5]; thus, their approach was only applicable to isotropic Gaussian iid noise. It can be shown that the stiffness matrices they derived are based on first-order Taylor series approximations of rotation and translation [8].

Moghari and Abolmaesumi [9] used the unscented Kalman filter (UKF) to solve the fiducial registration problem and estimate the covariance of the state parameters $[t_x, t_y, t_z, \theta_x, \theta_y, \theta_z]^T$, where $[t_x, t_y, t_z]^T$ is the translation and $[\theta_x, \theta_y, \theta_z]^T$ is the vector of ZYX Euler angles. Given a sufficient number of markers, their algorithm was able to estimate the mean squared TRE and the distribution of TRE. Their work appears to be an improvement over [4] with regard to estimating TRE and its distribution [10].

We present three significant contributions over prior art in this article. The first contribution is a comparison of fiducial registration algorithms when there is anisotropic, identically distributed noise in the fiducial measurements. The second contribution is the derivation of an equation that predicts the expected root mean square (RMS) TRE for fiducial registration with anisotropic noise. The third contribution is the introduction of a simple, UKF-based fiducial registration algorithm that, as we demonstrate, achieves the TRE predicted by our derived equation. We use simulations of a pointing stylus and an image registration problem to demonstrate the effects of anisotropic noise on TRE.
2 Method

2.1 UKF Fiducial Registration Algorithm

A conventional filtering algorithm processes observations as they are made available, and then does not reconsider them; in this way, it is able to efficiently perform a sequential update of the state estimate. The UKF algorithm described in [9] is unusual in that it continually reprocesses previous observations by appending new observations to the vector of old observations, lengthening the observation vector with each marker observation; this negates the advantage of efficient state estimate updates. Our UKF algorithm processes all fiducial marker observations in one update.

The state model is $x_{i+1} = x_i + v_i$, where $x_i = [t_x, t_y, t_z, \theta_x, \theta_y, \theta_z]_i^T$ is the vector of registration parameters (identical to that in [9]) at time $i$ and $v_i$ is the noise associated with the uncertainty of the state estimate. We assume that $v_i$ is drawn from a zero-mean Gaussian with covariance matrix $V_i$. Our observation model for $n$ fiducial markers is

$$
y_i = \begin{bmatrix} g_1 \\ \vdots \\ g_n \end{bmatrix}
    = \begin{bmatrix} R(\theta_x, \theta_y, \theta_z)_i\, f_1 + [t_x, t_y, t_z]_i^T \\ \vdots \\ R(\theta_x, \theta_y, \theta_z)_i\, f_n + [t_x, t_y, t_z]_i^T \end{bmatrix} + n_i
$$

where $g_j$ is the $j$th measured fiducial location, $f_j$ is the $j$th model fiducial location, $R(\theta_x, \theta_y, \theta_z)_i$ is the ZYX Euler rotation matrix computed using the estimated rotation state at time $i$, $[t_x, t_y, t_z]_i^T$ is the estimated translation at time $i$, and $n_i$ is the measurement noise.
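As a concrete illustration of this measurement model, the following is a minimal sketch (not the authors' implementation), assuming the common ZYX convention $R = R_z(\theta_z) R_y(\theta_y) R_x(\theta_x)$:

```python
# Sketch of the stacked measurement function h(x) for n model fiducials.
import numpy as np

def rot_zyx(thx, thy, thz):
    cx, sx = np.cos(thx), np.sin(thx)
    cy, sy = np.cos(thy), np.sin(thy)
    cz, sz = np.cos(thz), np.sin(thz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def h(x, model_fiducials):
    """Predict all marker positions for state x = [tx, ty, tz, thx, thy, thz]."""
    t = x[:3]
    R = rot_zyx(*x[3:])
    return np.concatenate([R @ f + t for f in model_fiducials])
```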
We can compute a good initial state estimate $x_0$ using a least-squares algorithm such as Horn's method [11]. One iteration of the UKF is performed, producing a new estimate of the registration parameters and parameter covariances. The estimate is accurate enough that it can be corrected by the filter, even though Horn's method is known to be suboptimal under anisotropic noise. We initialize the state parameter covariance $V_0$ with a reasonable guess based on the expected measurement noise magnitude. All simulations in this article used a diagonal matrix for $V_0$ with elements of $(1\,\mathrm{mm})^2$ for translation and $(1\,\mathrm{rad})^2$ for rotation.

2.2 Spatial Stiffness Analysis for Anisotropic Noise

The model of fiducial registration published in [7] treated each noise-free fiducial location as an end point of a zero-length linear spring. Noise in the registered marker location was considered as a small extension of the spring. The springs had no preferred direction, which was appropriate for isotropic noise. We modify their model to accommodate anisotropic noise by replacing the single spring with three directional springs with appropriate spring constants. In general, the spring constants and directions are related to the principal components of the noise covariance matrix for the fiducial marker: the spring constants are the reciprocals of the eigenvalues of the covariance matrix, and the directions are the eigenvectors of the covariance matrix. If the covariance matrix is diagonal, then the spring constants are the reciprocals of the variances; that is, we weight the stiffness of each spring in inverse proportion to the noise variance, just as we would in a typical weighted least-squares solution. For the purposes of this article, we will assume that we are working with fiducials measured in the optical tracker coordinate system, so that the noise covariance matrix for the marker locations is diagonal.

Our derivation of the anisotropic stiffness matrix follows that of [7]. Let the $j$th fiducial be $f_j = [x_j, y_j, z_j]^T$. If $f_j$ is perturbed by a small rotation $R(\theta_x, \theta_y, \theta_z)$ and translation $t = [t_x, t_y, t_z]^T$, its new position is $g_j = R(\theta_x, \theta_y, \theta_z) f_j + t$. The potential energy stored in the springs associated with the marker can be written as $U_j = \frac{1}{2}(g_j - f_j)^T \operatorname{diag}(k_{x_j}, k_{y_j}, k_{z_j})(g_j - f_j)$, where $\operatorname{diag}(k_{x_j}, k_{y_j}, k_{z_j})$ is the $3 \times 3$ diagonal matrix of spring constants. The Hessian $H_j$ of $U_j$ evaluated at zero displacement, $H_j = H(U_j;\ \theta_x = \theta_y = \theta_z = t_x = t_y = t_z = 0)$, is

$$
H_j = \begin{bmatrix}
k_{x_j} & 0 & 0 & 0 & k_{x_j} z_j & -k_{x_j} y_j \\
0 & k_{y_j} & 0 & -k_{y_j} z_j & 0 & k_{y_j} x_j \\
0 & 0 & k_{z_j} & k_{z_j} y_j & -k_{z_j} x_j & 0 \\
0 & -k_{y_j} z_j & k_{z_j} y_j & k_{z_j} y_j^2 + k_{y_j} z_j^2 & -k_{z_j} x_j y_j & -k_{y_j} x_j z_j \\
k_{x_j} z_j & 0 & -k_{z_j} x_j & -k_{z_j} x_j y_j & k_{z_j} x_j^2 + k_{x_j} z_j^2 & -k_{x_j} y_j z_j \\
-k_{x_j} y_j & k_{y_j} x_j & 0 & -k_{y_j} x_j z_j & -k_{x_j} y_j z_j & k_{y_j} x_j^2 + k_{x_j} y_j^2
\end{bmatrix}
$$
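The per-marker Hessian above is straightforward to assemble numerically. The following is a minimal numpy sketch (names are ours) that builds $H_j$ and sums the contributions over all markers, weighting each spring inversely to its noise variance as described above.

```python
# Sketch of assembling H_j and the summed 6x6 stiffness matrix.
import numpy as np

def marker_hessian(f, k):
    """H_j for marker f = (x, y, z) and spring constants k = (kx, ky, kz)."""
    x, y, z = f
    kx, ky, kz = k
    return np.array([
        [kx,      0,       0,       0,                     kx * z,                -kx * y               ],
        [0,       ky,      0,      -ky * z,                0,                      ky * x               ],
        [0,       0,       kz,      kz * y,               -kz * x,                 0                    ],
        [0,      -ky * z,  kz * y,  kz * y**2 + ky * z**2, -kz * x * y,           -ky * x * z           ],
        [kx * z,  0,      -kz * x, -kz * x * y,            kz * x**2 + kx * z**2, -kx * y * z           ],
        [-kx * y, ky * x,  0,      -ky * x * z,           -kx * y * z,             ky * x**2 + kx * y**2],
    ])

def stiffness_matrix(fiducials, variances):
    """Sum of H_j over markers, with spring constants 1/variance per axis."""
    K = np.zeros((6, 6))
    for f, v in zip(fiducials, variances):
        K += marker_hessian(np.asarray(f, float), 1.0 / np.asarray(v, float))
    return K
```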
The stiffness matrix for $n$ markers is

$$
K = \sum_{j=1}^{n} H_j = \begin{bmatrix} A & B \\ B^T & D \end{bmatrix}
$$

where $A$, $B$, and $D$ are $3 \times 3$ block matrices. Following [7], we can write the mean squared TRE as

$$
\mathrm{TRE}^2(r) \propto \underbrace{\frac{1}{\sigma_1} + \frac{1}{\sigma_2} + \frac{1}{\sigma_3}}_{\text{translational component}} + \underbrace{\frac{1}{\mu_{eq,1}} + \frac{1}{\mu_{eq,2}} + \frac{1}{\mu_{eq,3}}}_{\text{rotational component}} \qquad (1)
$$

where $r$ is the target location, $\sigma_1, \sigma_2, \sigma_3$ are called the principal translational stiffnesses, and $\mu_{eq,1}, \mu_{eq,2}, \mu_{eq,3}$ are called the equivalent principal rotational stiffnesses. The stiffness quantities were first described by Lin and colleagues [12]; refer to [12] for details on computing the principal stiffnesses. It was previously shown that the rotational and
translational stiffnesses are independent for fiducial registration [7]; thus, we are justified in using addition in quadrature of the two components of TRE in Equation (1). Only the rotational component of Equation (1) depends on the target location $r$.

Equation (1) only gives the mean squared TRE to within a constant factor, because the spring constants were scaled by the inverse of the variances. In the case of identical noise in all of the fiducials, we can recover the constant factor because the magnitude of the translational component can be computed in a different way. Let the identically distributed, zero-mean noise have covariance matrix $N = \operatorname{diag}(n_x^2, n_y^2, n_z^2)$. Let the mean of the $n$ noisy marker locations be $\bar{F} = [\bar{F}_x, \bar{F}_y, \bar{F}_z]^T$. The mean of the noise-free marker locations is the expected value $E[\bar{F}]$. The expected squared translation magnitude of the mean marker location is simply the expected magnitude of $\bar{F} - E[\bar{F}]$:

$$
E[\delta^2] = E\big[\|\bar{F} - E[\bar{F}]\|^2\big]
            = E\big[(\bar{F}_x - E[\bar{F}_x])^2\big] + E\big[(\bar{F}_y - E[\bar{F}_y])^2\big] + E\big[(\bar{F}_z - E[\bar{F}_z])^2\big]
            = \operatorname{var}(\bar{F}_x) + \operatorname{var}(\bar{F}_y) + \operatorname{var}(\bar{F}_z)
            = \tfrac{1}{n}\left(n_x^2 + n_y^2 + n_z^2\right) \qquad (2)
$$

Equation (1) can now be rewritten to give the estimated squared TRE as

$$
\mathrm{TRE}^2(r) = f\left(\frac{1}{\sigma_1} + \frac{1}{\sigma_2} + \frac{1}{\sigma_3} + \frac{1}{\mu_{eq,1}} + \frac{1}{\mu_{eq,2}} + \frac{1}{\mu_{eq,3}}\right),
\qquad
f = \frac{\tfrac{1}{n}\left(n_x^2 + n_y^2 + n_z^2\right)}{\tfrac{1}{\sigma_1} + \tfrac{1}{\sigma_2} + \tfrac{1}{\sigma_3}} \qquad (3)
$$
2.3 Experimental Validation

Consider an optical tracking system and a calibrated digitizing stylus like the one shown in Figure 1. Suppose that the stylus is oriented so that its $z = 0$ plane is perpendicular to the viewing direction of the optical tracker (i.e., directly facing the tracker). In our simulations, we rotated the stylus about its $x$ axis from $-45°$ to $45°$ in increments of $15°$. At each angle of rotation, we generated 10,000 sets of measured marker locations for the CRF, where each measured marker location $g_j$ was the model marker location $f_j$ rotated by the angle of rotation and contaminated with zero-mean, additive Gaussian noise of covariance $N_i = N = \operatorname{diag}(n_{x_c}^2, n_{y_c}^2, n_{z_c}^2)$, where $n_{x_c} = n_{y_c}$ (isotropic noise in the camera viewing plane), $n_{z_c} = s\,n_{x_c}$ for some scalar $s \geq 1$ (anisotropic noise in the viewing direction), and $n_{x_c}^2 + n_{y_c}^2 + n_{z_c}^2 = c$ for a constant value $c = 0.1^2 + 0.1^2 + 0.3^2\ \mathrm{mm}^2$ (constant total noise magnitude). All noise variances are given in the tracking camera coordinate system. The model marker locations were registered to the noisy measured marker locations using Horn's method (Horn), Ohta and Kanatani's method (Ohta), and our UKF method (UKFreg). For each registration, we computed TRE using the tip of the stylus as the target. We also computed the expected mean squared TRE for each trial of the UKF method using the calculation described by [9] (UKFest). At each angle, we computed TRE using Equation (3).

We performed the simulations with two different stylus CRFs. The first CRF was identical to that shown in Figure 1. The second CRF also used four markers but in a tetrahedral arrangement; the marker coordinates in units of millimeters were $[45, 25, 0]^T$, $[0, -50, 0]^T$, $[-45, 25, 0]^T$, and $[0, 0, 50]^T$.
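A minimal sketch of this noise model follows (function names are ours): the in-plane standard deviations are equal, the viewing-direction component is scaled by $s$, and the total variance is held at the constant $c$.

```python
# Sketch of the anisotropic noise model used in the simulations.
import numpy as np

def noise_covariance(s, c=0.1**2 + 0.1**2 + 0.3**2):
    """diag(nx^2, ny^2, nz^2) with nx = ny, nz = s*nx and nx^2 + ny^2 + nz^2 = c."""
    nx2 = c / (2.0 + s**2)
    return np.diag([nx2, nx2, s**2 * nx2])

def perturb_markers(markers, s, rng=None):
    """Add zero-mean anisotropic Gaussian noise to an (m, 3) array of markers."""
    if rng is None:
        rng = np.random.default_rng(0)
    N = noise_covariance(s)
    return markers + rng.multivariate_normal(np.zeros(3), N, size=len(markers))
```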
Fig. 1. (Left) Model of the pointing stylus used in our simulations; all units are in millimeters. (Right) Optical tracking system and stylus orientation used in our simulations. The measurement noise variance in the viewing direction −zc is typically greater than the variances in the viewing plane.
Fig. 2. Simulation results for CRF 1. (Left) TRE versus rotation angle for isotropic noise, s = 1. (Middle) TRE versus rotation angle for anisotropic noise, s = 5. (Right) Square root of the largest principal component of the tip error covariance matrix for anisotropic noise s = 3.
We also performed registration simulations based on identifying fiducials in CT images where the slice spacing is much greater than the pixel size. The Ilizarov frame registration problem described by Ma and colleagues [13] was studied. Only the proximal ring was used, to which we applied a 30° rotation about the $x$-axis. We used uniformly distributed noise in the range of $[-0.5, 0.5]$ mm for the $x$ and $y$ directions and $[-2.5, 2.5]$ mm for the $z$ direction. TRE was computed on a regular grid with 20 mm spacing in the plane of the ring.
3 Results

The RMS TRE as a function of the stylus rotation angle is shown in Figure 2 for CRF 1. Note that the TRE curves are identical and constant for isotropic noise; the RMS value of 0.68 mm matches the value predicted by West and Maurer [6, Equation (16)]. We computed the covariance of the tip registration error vector $p_\theta - (\hat{R}_i p_0 + \hat{t}_i)$, where $p_\theta$ is the true position of the tip at rotation angle $\theta$, $\hat{R}_i$ and $\hat{t}_i$ are the estimated rotation and translation for trial $i$, and $p_0 = [0, -200, 0]^T$ is the model tip location in millimeters. We then computed the eigenvalues of the covariance matrix (the decorrelated variances); the square roots of the two largest eigenvalues (the two largest "standard deviations") as a function of rotation angle are also shown in Figure 2.

The RMS TRE computed from the simulation using UKFreg, UKFest, and Equation (3) are shown in Figure 3. UKFest tended to overestimate TRE, whereas Equation (3) predicted a value that agreed surprisingly well with the simulation results of UKFreg. We computed the confidence intervals for the RMS TRE values of UKFreg; we used the BCa bootstrap [14] to deal with the asymmetry of the TRE distributions.
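The eigen-analysis of the tip error covariance takes only a few lines; a minimal sketch, assuming the per-trial tip error vectors are collected in an (n_trials × 3) array:

```python
# Sketch of computing the decorrelated "standard deviations" of the tip error.
import numpy as np

def decorrelated_stds(tip_errors):
    """tip_errors: (n_trials, 3) array of p_theta - (R_i p0 + t_i) vectors."""
    cov = np.cov(tip_errors, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)      # ascending eigenvalues of the covariance
    return np.sqrt(eigvals[::-1])          # largest "standard deviation" first
```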
Fig. 3. Root mean squared TRE versus stylus rotation angle for CRF 1. (Left) TRE estimated using UKFest and simulated TRE values. (Right) TRE estimated using Equation (3) and simulated TRE values.
Fig. 4. Root mean squared TRE versus stylus rotation angle for CRF 2. (Left) Simulated values of RMS TRE for s = 3. (Right) Predicted values of RMS TRE.
Fig. 5. Ilizarov frame experiment results. (Left) Difference in TRE between Horn’s method and UKFreg. (Right) Difference in TRE between UKFest and UKFreg.
We found that the value given by Equation (3) was always within the 95% confidence interval for isotropic noise. For anisotropic noise, Equation (3) gave a value that was slightly below the lower limits of the confidence intervals; however, the difference between the RMS value and Equation (3) was never more than 7% of the RMS value. The results for the second stylus CRF configuration are shown in Figure 4. The TRE behavior for this stylus was quite uniform over the range of rotation angle. UKFreg had the best TRE performance, but there was dramatically less difference between the registration algorithms.
Fig. 6. (Left) Tip TRE is worst when the stylus is oriented to face the direction of greatest noise anisotropy (typically the viewing direction of the camera), because such an orientation results in the greatest expected rotational error. (Right) Tip TRE is minimized by orienting the stylus face away from the camera viewing direction, which minimizes the contribution of the rotational error.
The results for the Ilizarov frame example are shown in Figure 5. UKFreg always produced a smaller RMS TRE value than Horn’s method. UKFest predicted the simulation results of UKFreg to within 12% of the RMS TRE value even though we used uniformly distributed noise.
4 Discussion and Conclusion

All three estimators used in the simulations had identical worst performances when the CRF of the stylus was directly facing the tracking system (rotation angle of 0°). This result is easily explained with reference to Figure 6. Aligning the CRF to face the tracking system produces the situation that allows for the greatest expected rotational error, which causes a displacement of the apparent tip location that is proportional to the length of the stylus. Clearly, the stylus orientation that produces the least expected rotation error is that where the face of the CRF is perpendicular to the viewing direction; of course, in practice the CRF would not be visible to the camera in this orientation.

Horn's method, which assumes isotropic noise, performed the worst in our simulations. Ohta and Kanatani's method, which is optimal with respect to their definition of rotation covariance, performed better than Horn's method but not as well as our UKFreg algorithm; also, it did not produce results consistent with our theoretical prediction given by Equation (3). We are continuing to investigate the cause of this discrepancy.

We implemented the method, UKFest, described by Moghari and Abolmaesumi [9,10] to estimate TRE. UKFest uses the state covariance estimate of the UKF to predict TRE. We found that it overestimates TRE when using a small number of fiducials. This result was not surprising because the estimated covariance is unlikely to be very accurate given the small number of fiducials and the single-step update we used.

We were pleasantly surprised by the degree of similarity between the predicted RMS TRE of Equation (3) and the simulated RMS TRE of our UKF algorithm. The superior performance of our algorithm in terms of TRE versus the other algorithms we tested, combined with the strong agreement with Equation (3), leads us to speculate that our algorithm is almost optimal under conditions of identically distributed, anisotropic noise. Note that UKFreg does not differ substantially from the EKF algorithm of [4]. Arguably, the UKF is easier to implement because it does not require the computation of Jacobians. We have demonstrated that UKFreg achieves the TRE predicted by our stiffness model, and we expect that the EKF algorithm would perform similarly.
The simulations using the CRF with one out-of-plane marker demonstrated the superiority of this design over the flat CRF. West and Maurer [6] showed that a regular tetrahedron was the ideal configuration of fiducials for isotropic noise. Our results show that such a configuration is also preferred over a flat CRF for anisotropic noise.

We have studied the case where a registration algorithm is used to match a model of a target to the measurements made by a tracking system. An alternative method is to use a Kalman-type filter to perform the tracking, which removes the need for an explicit registration algorithm. The UKF we used is easily adapted to such a purpose.

In summary, we have demonstrated that anisotropic noise can have a significant effect on TRE, especially when a suboptimal registration algorithm is used. Our registration algorithm works well in the presence of anisotropic noise, and it produces results consistent with a theoretical model of fiducial registration TRE.
References

1. Khadem, R., Yeh, C.C., Sadeghi-Tehrani, M., Bax, M.R., Johnson, J.A., Welch, J.N., Wilkinson, E.P., Shahidi, R.: Comparative tracking error analysis of five different optical tracking systems. Computer Aided Surgery 5, 98–107 (2000)
2. Ohta, N., Kanatani, K.: Optimal estimation of three-dimensional rotation and reliability evaluation. IEICE Transactions on Information and Systems E82-D(11), 1247–1252 (1998)
3. Estépar, R.S.J., Brun, A., Westin, C.F.: Robust generalized total least squares iterative closest point registration. In: Barillot, C., Haynor, D.R., Hellier, P. (eds.) MICCAI 2004. LNCS, vol. 3216, pp. 234–241. Springer, Heidelberg (2004)
4. Pennec, X., Thirion, J.P.: A framework for uncertainty and validation of 3D registration methods based on points and frames. IJCV 25(3), 203–229 (1997)
5. Fitzpatrick, J.M., West, J.B., Maurer Jr., C.R.: Predicting error in rigid-body point-based registration. IEEE Transactions on Medical Imaging 17(5), 694–702 (1998)
6. West, J.B., Maurer Jr., C.R.: Designing optically tracked instruments for image-guided surgery. IEEE Transactions on Medical Imaging 23(5), 533–545 (2004)
7. Ma, B., Ellis, R.E.: Analytic expressions for fiducial and surface target registration error. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 637–644. Springer, Heidelberg (2006)
8. Huang, S., Schimmels, J.M.: The bounds and realization of spatial stiffness achieved with simple springs connected in parallel. IEEE Trans. Robot. and Automat. 14(3), 466–475 (1998)
9. Moghari, M.H., Abolmaesumi, P.: A high-order solution for the distribution of TRE in rigid-body point-based registration. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 603–700. Springer, Heidelberg (2006)
10. Moghari, M.H., Abolmaesumi, P.: Comparing the unscented and extended Kalman filter algorithms in the rigid-body point-based registration. In: IEEE EMB, pp. 497–500. IEEE Computer Society Press, Los Alamitos (2006)
11. Horn, B.K.P.: Closed-form solution of absolute orientation using unit quaternions. Journal of the Optical Society of America A 4, 629–642 (1987)
12. Lin, Q., Burdick, J., Rimon, E.: A stiffness-based quality measure for compliant grasps and fixtures. IEEE Transactions on Robotics and Automation 16(6), 675–688 (2000)
13. Ma, B., Simpson, A.L., Ellis, R.E.: Proof of concept of a simple computer-assisted technique for correcting bone deformities. In: Ayache, N., Ourselin, S., Maeder, A. (eds.) MICCAI 2007. LNCS, vol. 4792, pp. 935–942. Springer, Heidelberg (2007)
14. DiCiccio, T.J., Efron, B.: Bootstrap confidence intervals. Statistical Science 11(2), 189–228 (1996)
Rotational Roadmapping: A New Image-Based Navigation Technique for the Interventional Room

Markus Kukuk¹,² and Sandy Napel²
¹ Siemens Medical Solutions, Forchheim, Germany
² Stanford University, Department of Radiology, USA
[email protected]
Abstract. For decades, conventional 2D-roadmapping has been the method of choice for image-based guidewire navigation during endovascular procedures. Only recently have 3D-roadmapping techniques become available that are based on the acquisition and reconstruction of a 3D image of the vascular tree. In this paper, we present a new image-based navigation technique called RoRo (Rotational Roadmapping) that eliminates the guess-work inherent to the conventional 2D method, but does not require a 3D image. Our preliminary clinical results show that there are situations in which RoRo is preferred over the existing two methods, thus demonstrating potential for filling a clinical niche and complementing the spectrum of available navigation tools.
1 Introduction

The number and breadth of minimally invasive, image-guided therapies is ever increasing. Of particular interest are endovascular procedures, which allow minimally invasive access to all areas of the human body through the vascular system, for example angioplasty, vascular stenting, embolization, chemoembolization, thrombolysis and TIPS (Transjugular Intrahepatic Portosystemic Shunt). These procedures are typically performed in an interventional room using C-arm based X-ray imaging (see Fig. 1), together with the selective injection of contrast material for blood vessel opacification. Common to all endovascular procedures is the navigation of a guidewire or catheter through the vasculature to a target site. Depending on the degree of tortuosity and structural complexity of the vascular tree, especially in diseased vasculature, guidance (image- or sensor-based [1]) is often required for targeted steering.

Essentially unchanged since its first introduction in the early 1980s, a technique called 2D-roadmapping [2] has long become clinical routine for image-based guidewire navigation. The basic idea is to acquire an image of the vasculature of interest and to store it as a "roadmap" image. Then, the guidewire or other instrument as shown under live fluoroscopy is continuously superimposed onto the roadmap image, thus visualizing the instrument with respect to the vasculature. For acquiring the roadmap image, the interventionalist estimates how to position the C-arm for finding a suitable working view, which is considered ideal if it shows the vessel bifurcation to be negotiated perpendicular to the viewing direction, thus eliminating self-occlusion. However, the vessel tree is visible only after the injection of contrast media, which
Fig. 1. Left: C-arm system “Siemens Axiom Artis dTA” (top) and its table side controls (bottom). Arrows indicate the C-arm joystick (left) and RoRo joystick (right). Right: RoRo screen, consisting of three columns: (a) image panel, (b) alignment signal, (c) control panel.
may or may not reveal the desired viewing angle. During the course of a lengthy intervention, the individual injections of contrast material, which is toxic in large doses, can add up to a significant amount due to the trial-and-error nature of the approach.

Only recently, a new image-based navigation technique has been introduced: 3D-roadmapping [3]. This technique is based on the acquisition and reconstruction of a 3D image (C-arm CT, CTA, MRA) of the vessel tree, which then serves as the roadmap image. CTA and MRA are acquired during an intra-venous contrast injection and require registration with the C-arm system (2D/3D registration [4]). C-arm CT images, acquired during a selective, intra-arterial injection during the procedure, are obtained from a rotational acquisition of projection images and the subsequent application of a cone-beam reconstruction method [5]. Following a one-time calibration step, one can correctly render an image of the 3D volume as would be "seen" by any given C-arm configuration. In other words, roadmap images are available from any angle, thus eliminating the guess-work in finding an ideal working view.

In situations where a 3D angiographic acquisition is routinely performed for diagnostic purposes, 3D-roadmapping appears to be the perfect navigation tool at no extra "cost" regarding dose or contrast agent. However, there are clinical situations in which a 3D acquisition for the purpose of navigation is either not justified or technically challenging. The following represents a list of situations that may cause reconstruction artifacts and may therefore result in compromised image quality: cardiac and respiratory motion during acquisition, inhomogeneous flow of contrast material due to cardiac pulsation, high-velocity flow of contrast material, photon starvation along the shoulder axis, and the presence of indwelling metal such as coils and stents. Furthermore, image quality is directly related to the number of acquired projections and therefore a function of the volume of contrast media injected.

In this paper we present a new image-based navigation technique that fits technically and clinically between the described 2D- and 3D-roadmapping techniques:
Rotational Roadmapping (RoRo). RoRo is based on a single rotational acquisition of multiple views (2.5D-roadmapping). Because 3D reconstructions are not performed, acquisitions of any length and any number of projections are possible. Instead, the projections are directly used for roadmap navigation. RoRo can be regarded as a natural extension of the conventional 2D-roadmapping technique, as it allows rotating the C-arm during contrast injection, instead of keeping it stationary. Thus, more information is acquired per unit of contrast agent, producing a "rotatable roadmap" that shows the vessel tree from multiple views. After the acquisition, each projection of the rotatable roadmap can be selected to serve as a roadmap image for classic 2D-roadmapping navigation. The same rotatable roadmap can be used again and again to find the ideal working view for each segment of the vessel tree, thus rendering the injection of additional contrast media unnecessary. Furthermore, the vessel trajectories can be viewed in 3D, using stereographic visualization (3D glasses) with suitably selected pairs of views separated by a small angle [6].

We have developed a research prototype that implements the RoRo approach, with software and hardware fully integrated into a C-arm system (Axiom Artis dTA, dBA, Siemens Medical Solutions) that can be controlled during an intervention from the patient table side (see Fig. 1). We present first clinical results that demonstrate RoRo's potential for filling the gap between the existing methods.
2 Material and Methods

Although RoRo's principal idea is straightforward, several challenges need to be addressed in order to provide a solution that meets the high demands of clinical, intraoperative software: seamless workflow integration, accurate C-arm/image alignment and vessel tree/instrument visualization.

2.1 Clinical Workflow

1. Image acquisition (not part of RoRo)
2. Image transfer to RoRo
3. RoRo Phase 1: Find (suitable) working view
4. RoRo Phase 2: Align C-arm with selected working view
5. RoRo Phase 3: Roadmapping (instrument guidance using selected working view)
6. If a different working view is needed, return to RoRo Phase 1
Image acquisition consists of a conventional rotational acquisition of any length and angular increment. For example, in case of a rotational DSA (Digital Subtraction Angiography) acquisition, two identical rotational runs are performed: the first without (mask run) and the second with the injection of contrast media (fill run). After acquisition, the study consisting of two runs of 2D projections is sent to the RoRo application by means of a DICOM transfer. Upon completion of the image transfer, RoRo computes a third "DSA run" by subtracting corresponding fill from mask images. RoRo then automatically enters Phase 1 by displaying the DSA run as a "rotatable roadmap." For finding a suitable working view, the C-arm and the rotatable roadmap are linked to each other in two
Fig. 2. Interactive C-arm/image alignment. Left: C-arm joystick; Center: Alignment signal. Right: C-arm’s Left/Right (top) and Head/Feet (bottom) plane, indicating the current C-arm position by an arrow and the currently displayed image by a box. Images were acquired every 1.5˚ (dots) along the Left/Right plane at 0˚ Head/Feet angulation. The deviation between the C-arm position and the currently displayed image is indicated by the alignment signal. The top half shows the alignment for the Left/Right plane to be within 1.0˚ (light blue) while the bottom half shows the alignment for the Head/Feet plane to be within 0.5˚ (dark blue). The two arrows in the alignment signal indicate the direction in which to move the joystick to improve alignment for the respective plane: “upward” and “left” will move the C-arm closer to 93˚L/R and 0˚ H/F.
ways, as described in the next section. At this point, zoom formats and SID (Source Image Distance) can be adjusted as needed. Once the C-arm and the selected working view are aligned, RoRo enters Phase 3, "roadmapping." By pressing the fluoro footswitch, live fluoroscopic images of the instruments being guided are acquired and overlaid onto the selected view of the vessel tree, similar to the classic 2D-roadmapping technique. At any time during Phase 3, the user can return to Phase 1 for finding a new working view. This is done either by moving the C-arm into a new position (roadmap follows C-arm), or by using the RoRo joystick to rotate the roadmap (C-arm follows roadmap). The RoRo joystick is sterilized together with the standard controls by draping them in clear plastic sheets.

2.2 C-Arm/Image Alignment

One principal challenge with the RoRo approach is to provide for a fast and accurate alignment of the C-arm with the currently displayed working view. After image acquisition, vessel images are only available at discrete points along the acquisition trajectory (see Fig. 2, right), while the C-arm operates in continuous space and can therefore be moved to any position. Accurate roadmap visualization can only be provided if the C-arm is positioned close enough to the exact position at which the currently displayed vessel image was acquired. To provide visual feedback regarding the current C-arm/image alignment, an "alignment signal" (Fig. 1(b)), shaped like an arrowhead pointing to the left, is used. It is divided into two parts: the upper half represents the alignment with respect to the Left/Right image plane, while the lower half represents the alignment with respect to the Head/Feet image plane. The degree of alignment is expressed in a color code as well as in an exact angle value. A dark blue and light blue color represents a "very good" (≤ 0.5˚) and "good" (≤ 1.0˚) alignment, respectively, while the color red signals an "unacceptable" (> 1.0˚) alignment. Roadmapping is
only possible for a "very good" or "good" alignment. We implemented two solutions to the problem of fast and accurate C-arm alignment that appear to have their respective advantages and shortcomings and are therefore in practice commonly used interchangeably: automatic and interactive alignment.

Automatic alignment is performed in two steps, corresponding to RoRo Phases 1 and 2. First, the interventionalist uses the RoRo joystick to browse through the available projections in order to find the most suitable working view. Then, he/she clicks on the "2 - Move-C-arm" button in the control panel (Fig. 1(c)). This sets the C-arm system into "automatic run" mode. Simply deflecting the C-arm joystick will automatically move and stop the C-arm at the correct position. After the position has been reached, the alignment signal switches to "dark blue" and roadmapping can begin.

Interactive alignment is done in only one step. As the interventionalist moves the C-arm back and forth along the acquisition plane, the image closest to the current C-arm position is displayed. This is perceived as the vessel tree "following" the C-arm. In other words, browsing and aligning are done in parallel. As can be seen in Fig. 2, top right, if the difference d between two images is sufficiently small (e.g., 1.5˚), roadmapping will be possible for any C-arm position within the acquisition range. If d is greater than 2.0˚, there will be C-arm positions for which the alignment is "unacceptable" (red signal) and therefore roadmapping will not be allowed. In this case, the interventionalist interactively fine-tunes the C-arm position using the alignment signal until a "very good" or "good" alignment is reached. While automatic alignment is exact, it requires the use of two different joysticks and is therefore relatively slow. On the other hand, interactive alignment only requires the use of the C-arm joystick but has an associated "learning curve". In most cases, images can be acquired at small enough intervals (d ≤ 1.5˚), in which case no fine-tuning is required and alignment is achieved once the working view has been found.

2.3 Visualization

Although visualization is limited to 2D image display, several challenges for a variety of display modes need to be addressed. For a smooth display of all guidewire manipulations, fluoroscopy frame rates of 30 fps are often used. To match this frame rate, OpenGL texture mapping was employed for image display. To additionally allow for the display of non-square, non-power-of-two projections (a limitation of OpenGL versions before 2.0), images are broken up into square, power-of-two tiles, which are then individually mapped to the screen. During all display modes, basic display functionality such as window/level, zoom and pan is available. It is of particular clinical importance to allow the interventionalist to change the imaging system's zoom format or SID (source image distance) after the acquisition of the rotational run. RoRo then automatically reformats the DSA projections in real time to reflect the changes. Additionally, a stereo visualization mode allows the interventionalist to perceive vessel trajectories in 3D. This is achieved by simply selecting two projections as a left and right eye stereo pair and displaying them such that each eye only sees its respective image. We implemented a stereo projection system using polarizing filters in front of two DLP projectors with matching polarized glasses, as well as the red/blue (anaglyph) technique using red/blue glasses.
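A minimal sketch of the interactive alignment logic described above follows; the data layout and function names are ours, with angles in degrees and the color thresholds taken from the text.

```python
# Sketch: pick the acquired projection nearest the current C-arm angulation
# and color-code the per-plane residual deviation.
import numpy as np

def classify(deviation):
    if deviation <= 0.5:
        return "dark blue"     # "very good"
    if deviation <= 1.0:
        return "light blue"    # "good"
    return "red"               # "unacceptable": roadmapping disabled

def align(carm_lr, carm_hf, acq_lr, acq_hf):
    """Return (index of nearest projection, L/R color code, H/F color code)."""
    acq_lr, acq_hf = np.asarray(acq_lr), np.asarray(acq_hf)
    idx = int(np.argmin(np.hypot(acq_lr - carm_lr, acq_hf - carm_hf)))
    return idx, classify(abs(acq_lr[idx] - carm_lr)), classify(abs(acq_hf[idx] - carm_hf))
```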
Fig. 3. Clinical results. Case #1: Neuro interventional room (top, left). Case #2: RoRo during Phase 1 - find working view, displaying the “rotatable roadmap” as obtained from a rotational DSA acquisition. Fill image (top, right) and subtracted images (bottom row) at different viewing angles. Note inhomogeneously opacified vessel (arrows) caused by pulsation.
During Phase 1 (find working view), the interventionalist can toggle through all types of images available. In the case of a DSA acquisition (see Fig. 3), these are: native, fill, and DSA run. During Phase 3 (roadmapping), three display modes are available: subtraction, anatomical background, and fluoro-fade. In subtraction mode, the guidewire is segmented by continuously subtracting the live fluoroscopic image from a previously acquired fluoro mask. The image showing the segmented wire is then subtracted from the currently selected DSA working view (see Fig. 4, left). Since subtraction increases noise, a temporal filter is used in the segmentation of the wire. The same technique is used for the anatomical background display mode (see Fig. 4, right), with the difference that the segmented wire is subtracted from the fill image of the currently selected working view, thus showing the instrument and vessels in relation to bony structures. However, fluoro mask creation fails in the presence of motion, in which case the fluoro-fade mode should be used. Here, the live fluoroscopic image and the current DSA working view are directly merged by displaying them in different color channels using OpenGL color blending.
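The sketch below illustrates these three display modes on synthetic float images in [0, 1]. The recursive temporal filter, the filter constant, and all function names are assumptions for illustration, not the product implementation:

```python
import numpy as np

def segment_wire(live, fluoro_mask, prev_wire=None, alpha=0.3):
    # The guidewire appears dark in the live image, so mask - live isolates it.
    wire = np.clip(fluoro_mask - live, 0.0, 1.0)
    if prev_wire is not None:              # simple temporal filter against noise
        wire = alpha * wire + (1.0 - alpha) * prev_wire
    return wire

def subtraction_mode(wire, dsa_view):
    return np.clip(dsa_view - wire, 0.0, 1.0)      # wire over subtracted vessels

def anatomical_background_mode(wire, fill_view):
    return np.clip(fill_view - wire, 0.0, 1.0)     # wire over bony anatomy

def fluoro_fade_mode(live, dsa_view):
    # Merge live fluoro and the DSA working view in separate color channels,
    # analogous to the OpenGL color blending mentioned in the text.
    h, w = live.shape
    rgb = np.zeros((h, w, 3))
    rgb[..., 0] = live
    rgb[..., 2] = dsa_view
    return rgb

rng = np.random.default_rng(0)
live, mask, dsa = (rng.random((64, 64)) for _ in range(3))
wire = segment_wire(live, mask)
print(subtraction_mode(wire, dsa).shape, fluoro_fade_mode(live, dsa).shape)
```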
Fig. 4. RoRo Phase 3 - roadmapping. A guidewire is navigated into the feeding vessels of the tumor using subtraction mode (left) and anatomical background mode (right). Note that the alignment signal is on “blue”, indicating that the C-arm is aligned with the displayed image.
3 Clinical Results

RoRo has been used for advanced guidewire navigation in two clinical cases. As a start, a routine and relatively straightforward case was selected. An 81-year-old male presented with a cerebral aneurysm. As a routine measure, a 3D acquisition was performed to assess the neck of the aneurysm in 3D. After the treatment plan was determined, the acquired projections were sent to RoRo. A guidewire was then navigated into the aneurysm by selecting the best working view and automatically aligning the C-arm using RoRo. However, when a 0.3mm micro-guidewire was used, it became evident that for this research prototype the visualization of very thin instruments needs improvement.

In the second case, a 41-year-old male presented with a spinal cord tumor. The treatment plan involved embolization of the tumor to facilitate later resection. A 3D acquisition was not considered due to the proximity of the tumor to the shoulder axis: acquisitions along the shoulder axis are typically very dark due to photon starvation and may therefore result in compromised 3D image quality. Instead, a short 82° rotational acquisition (55 projections) was performed, sent to RoRo, and displayed within 20s. Image quality was excellent, with a pixel size of 0.2mm. After the rotational acquisition, but before treatment started, zoom formats and SID (source image distance) were adjusted as needed. RoRo was then controlled from the tableside and repeatedly used for finding the ideal working view for guidewire navigation and embolization of the tumor's feeding vessels. Interactive C-arm alignment was found to be ideal, since fine-tuning was not required: due to an image spacing of 1.5°, all C-arm positions within the acquisition range could directly be used for roadmap navigation. Closer examination of the angiographic images revealed another reason why a 3D acquisition and subsequent reconstruction would have been challenging. As shown by the arrows in Fig. 3, one large vessel segment was inhomogeneously opacified over time, resulting in inconsistent views, which typically creates reconstruction artifacts and compromises image quality.
4 Discussion and Conclusion

RoRo effectively eliminates the main limitation of the 2D roadmapping technique by allowing the interventionalist to visually select the best working view. In addition, RoRo provides depth perception through stereoscopic viewing, allowing the assessment of vessel trajectories in 3D. While the 3D roadmapping technique appears to be the method of choice in cases where a high-quality 3D image can be acquired, in clinical practice this is a challenging task, since several factors have to be taken into account: cardiac and respiratory motion, bolus timing, filling artifacts, acquisition speed, washout rate, acquisition range, and the presence of metal artifacts. Each of these factors can greatly compromise image quality and therefore the use of the image for roadmap navigation. Unfortunately, their presence and extent are often realized only after image acquisition, when the contrast material and radiation have already been administered. Until now, in such cases, the interventionalist has been forced to fall back to the 2D roadmapping technique.

In contrast, we propose a roadmapping technique that is robust, flexible, and contrast-media efficient with respect to image acquisition. "Robust," because roadmapping is done directly on the 2D projection images without performing image reconstruction, and "flexible," because acquisition parameters can be adapted. For example, a fast 2s acquisition can be performed, covering 72°, while the fastest 3D acquisition currently takes 5s. At the same time, image quality can be optimized by performing a 3s acquisition at a 2k matrix size and 0.15mm pixel size, covering 45°. RoRo can be considered "contrast-media efficient" since it requires an amount of contrast equivalent to the acquisition of only 2-4 conventional 2D roadmaps. RoRo can be used directly, if a 3D acquisition is not considered, or as a fall-back method by reusing the projections that resulted in a reconstruction of insufficient image quality. RoRo has the potential of filling the gap between the 2D and 3D roadmapping techniques. More clinical studies are currently under way to explore applications in neuro and body imaging.
Bronchoscope Tracking Without Fiducial Markers Using Ultra-tiny Electromagnetic Tracking System and Its Evaluation in Different Environments

Kensaku Mori1,2, Daisuke Deguchi2, Kazuyoshi Ishitani1, Takayuki Kitasaka1,2, Yasuhito Suenaga1,2, Yoshinori Hasegawa3,2, Kazuyoshi Imaizumi3, and Hirotsugu Takabatake4

1 Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi, 464-8603, Japan, [email protected]
2 Innovative Research Center for Preventive Medical Engineering, Nagoya University
3 Graduate School of Medicine, Nagoya University
4 Sapporo Minami-Sanjyo Hospital
Abstract. This paper presents a method for bronchoscope tracking without any fiducial markers, using an ultra-tiny electromagnetic tracker (UEMT) for a bronchoscopy guidance system. The proposed method calculates the transformation matrix describing the relationship between the coordinate systems of the pre-operative CT images and the UEMT by registering bronchial branches segmented from the CT images to points measured by the UEMT attached at the tip of a bronchoscope. We dynamically recompute the transformation matrix after every pre-defined number of measurements. We applied the proposed method to a bronchial phantom in several experimental environments. The experimental results showed that the proposed method can track a bronchoscope camera with a target registration error (TRE) of about 3.3mm in the wood table environment and about 4.0mm in the examination table environment.
1 Introduction
A bronchoscope is a flexible endoscope used to observe the inside of the bronchus. A physician inserts a bronchoscope into a patient's airway and examines the inside of it while watching a TV monitor showing video images captured by the camera. Since the bronchus has a complex tree structure, a physician easily gets disoriented during bronchoscopy. It is therefore strongly desirable to develop a bronchoscopy guidance system that assists a physician during bronchoscopy. In the construction of a bronchoscopy guidance system, a function for bronchoscope tracking is quite important. There are two types of methods for bronchoscope tracking: (a) methods based on image registration and (b) methods using trackers (positional sensors). Bronchoscope tracking based on image registration basically tracks the tip of the real bronchoscope (RB) by finding a
virtual bronchoscopic (VB) image that is the most similar to the current frame of the real bronchoscope. VB images are generated from CT images. Mori et al. reported a method for tracking a bronchoscope using epipolar geometry analysis and intensity-based image registration of real and virtual endoscopic images [1]. However, image-based registration is very weak against occlusion of RB views. Methods for bronchoscope tracking using positional trackers try to directly capture the motion of the RB camera by attaching trackers at the tip of the bronchoscope. They are very robust to occlusion of views. Ultra-tiny electromagnetic trackers (UEMT) are now available from several companies; these trackers can be inserted into the working channel of the bronchoscope. Schneider et al. used a UEMT sensor to obtain the bronchoscope camera position [2]. Their registration errors were about 4mm at the trachea and 15mm at the right upper lobe bronchus. Wegner et al. also tried to track a bronchoscope with a UEMT, but details and performance are not presented in [3]. Deligianni et al. proposed bronchoscope tracking using the UEMT sensor, with pq-based registration techniques applied to improve tracking accuracy [4]. In general, most surgical navigation systems utilizing a tracker require fiducial markers attached to a patient's body for registering the coordinate system (CS) of a pre-operative image (i.e., CT images) and the CS of the tracker. In the case of bronchoscopy, effective placement of fiducial markers is impossible, since we would need to place fiducial markers inside the bronchus for better tracking performance. This paper develops a method for tracking an RB camera by using a UEMT tracker. Here, we develop a fiducial-marker-free algorithm that registers the CSs of a tracker and pre-operative CT images. The proposed method calculates the transformation matrix showing the relationship between the CSs of the pre-operative CT image and the UEMT by dynamically finding the relation between bronchial branches segmented from CT images and the positions of the RB tip. Also, we evaluate the proposed method in different environments, including a wood table and a real examination table.
2 Method

2.1 Overview
We attach the sensing coil of a UEMT at the tip of the bronchoscope. The location of the sensing coil is very close to the RB camera. We assume that the RB camera moves only inside the bronchus and along its medial axis. We calculate the transformation matrix describing the relationship between the CSs of the UEMT and the pre-operative CT images by registering bronchial branches acquired from the CT images to the points measured by the UEMT. In this method, the transformation matrix relating the two CSs is gradually updated as the bronchoscope moves inside the bronchus. The points used in the registration are also dynamically updated.
Fig. 1. Illustration showing the relationship between coordinate systems
2.2 Definition of Coordinate Systems
Figure 1 shows the relationship between each CS. We denote a position in the CS of the RB camera, C, as $p_C$. The position of the corresponding point in the CT CS is described as $p_{CT}$. We consider the CS defined by the sensor attached on the patient's body for tracking patient movement (usually called the DRF) as the world CS W. The position of the point $p_C$ in the world CS is described as $p_W$. At time k, the relationship between $p_C$ and $p_{CT}$ is described as

$$p_{CT} = {}^{CT}_{W}T\,{}^{W}_{F}T\,{}^{F}_{S}T^k\,{}^{S}_{C}T\,p_C = {}^{CT}_{W}T\,{}^{W}_{S}T^k\,{}^{S}_{C}T\,p_C = {}^{CT}_{W}T \begin{bmatrix} {}^{W}R^k_S & {}^{W}t^k_S \\ 0 & 1 \end{bmatrix} {}^{S}_{C}T\,p_C = {}^{CT}_{W}T\,p^k_W, \qquad (1)$$
where the sensor CS S is the CS defined by the UEMT sensor attached at the RB camera and ${}^{S}_{C}T$ is the transformation matrix from CS C to S. The CS of the magnetic field generator (MFG) of the UEMT is described as F. Transformation ${}^{W}_{F}T$ is the transformation from the UEMT CS F to the world CS W, and ${}^{CT}_{W}T$ is the transformation matrix from the world CS W to the CT CS. The transformation from S to W is given by ${}^{W}_{S}T^k = {}^{W}_{F}T\,{}^{F}_{S}T^k$. In Eq. (1), ${}^{W}R^k_S$ and ${}^{W}t^k_S$ are the rotation matrix and translation vector measured by the UEMT sensor at time k, respectively. The transformation ${}^{S}_{C}T$ describes the relationship between the CSs C and S; it can be obtained when the sensor is attached at the tip of the RB. The position and the orientation of the RB camera in the world CS W can thus be obtained from the output of the UEMT and ${}^{S}_{C}T$. By finding the transformation ${}^{CT}_{W}T$, it is possible to register the CSs C and CT. In this paper, we use the coordinate system CT as the CS of the VB camera.
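A small numerical sketch of this coordinate-system chain, composing 4x4 homogeneous transforms as in Eqs. (1) and (2); all matrices are placeholder values rather than calibrated ones:

```python
import numpy as np

def homogeneous(R, t):
    # Build a 4x4 homogeneous transform from rotation R and translation t.
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

S_C_T  = homogeneous(np.eye(3), np.array([0.0, 0.0, 5.0]))   # sensor offset from camera (assumed)
W_S_Tk = homogeneous(np.eye(3), np.array([10.0, 0.0, 0.0]))  # k-th UEMT measurement (toy value)
CT_W_T = homogeneous(np.eye(3), np.array([0.0, -20.0, 0.0])) # estimated registration (toy value)

p_C  = np.array([0.0, 0.0, 0.0, 1.0])   # RB camera origin in homogeneous coordinates
p_W  = W_S_Tk @ S_C_T @ p_C             # camera position in the world CS (Eq. 2)
p_CT = CT_W_T @ p_W                     # same point expressed in the CT CS (Eq. 1)
print(p_W[:3], p_CT[:3])
```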
2.3 Bronchial Tree Representation
The proposed method uses bronchial branches extracted from pre-operative CT images to register the CSs. Here, we represent the i-th bronchial branch as $b_i = \{b^s_i, b^e_i\}$, where $b^s_i$ and $b^e_i$ are the start and end positions of $b_i$ in the CT CS. Also, we represent the set of bronchial branches as $B = \{b_i \mid i = 1, \dots, n\}$, where n is the number of bronchial branches.
2.4 Processing Procedure
Extraction of bronchial branches. To obtain bronchial branches from CT images, we utilize Kitasaka's method [5]. This method can extract bronchial regions and their tree representations simultaneously.

Estimation of ${}^{CT}_{W}T$. During bronchoscopy, we obtain ${}^{W}_{S}T^k$ from the k-th output of the UEMT attached at the tip of the bronchoscope. The RB camera position $p^k_W$ at time k in the world coordinate system is computed by

$$p^k_W = {}^{W}_{S}T^k\,{}^{S}_{C}T\,p_C. \qquad (2)$$

During bronchoscopy, we record a set of RB camera positions $P = \{p^k_W\}$ $(k = 1, \dots, N_1)$. The l-th updated transformation matrix is denoted as ${}^{CT}_{W}T^l$. Updates are performed for every pre-defined number of measurements. The set of RB camera positions used for the l-th update of ${}^{CT}_{W}T$ is represented as $P^l$. $\tilde{P}$ and $\hat{P}$ are temporary sets used to store RB camera positions. $Q$ is the set of positions on the bronchial branches $B$ corresponding to the $p^k_W$. The procedure for computing ${}^{CT}_{W}T$ is described below.

Procedure for estimating the matrix ${}^{CT}_{W}T$:

[Step 1] Initialize variables as $k = 1$, $l = 0$, and $P^0 = \phi$.
[Step 2] $\tilde{P} = \phi$ and $l = l + 1$.
[Step 3] Compute $p^k_W$ from the output ${}^{W}_{S}T^k$ of the UEMT sensor by Eq. (2) and append it to the temporary set $\tilde{P}$.
[Step 4] $k = k + 1$.
[Step 5] Repeat Steps 3 and 4 until $|P^{l-1} \cup \tilde{P}| \geq N_1$ is satisfied ($\cup$: set union; $|A|$: the number of elements in the set $A$).
[Step 6] Compute $P^l$ as $P^l = P^{l-1} \cup \tilde{P}$.
[Step 7] Initialize ${}^{CT}_{W}T^l$ as ${}^{CT}_{W}T^l = {}^{CT}_{W}T^{l-1}$.
[Step 8] Compute ${}^{CT}_{W}T^l$ by the well-known ICP-like algorithm (called the ICB algorithm). The difference from the original is that we measure the distance between a point and a tree structure (branches). The inputs of the ICB algorithm are (a) a set of points $P^l$ and (b) a set of branches $B$. The distance between a point and a branch is calculated by finding the closest branch to the target point and computing the minimum distance between the selected branch and the target point. The ICB algorithm is terminated when $\sum_{p_j \in P^l,\, q_j \in Q} \| q_j - {}^{CT}_{W}T^l p_j \|^2$ does not change. Here, $Q$ is the set of points $q_j$ corresponding to $p_j$ that lie on the selected branch.
[Step 9] Reduce the number of elements in the set $P^l$ to $N_2$ by the following procedure: (a) select the point whose transformed position is closest to each branch; (b) store the selected point in $\hat{P}$; (c) repeat (a) and (b) while $|\hat{P}| < N_2$.
[Step 10] $P^l = \hat{P}$. Return to Step 3.
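The core geometric operation of the ICB algorithm is the distance between a measured point and a branch (a line segment). The sketch below implements only this matching step, with toy branch coordinates; the alternating rigid-pose update of a full ICP-style loop is omitted:

```python
import numpy as np

def closest_point_on_segment(p, a, b):
    # Project p onto segment ab, clamped to the segment's endpoints.
    ab = b - a
    s = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return a + s * ab

def match_to_branches(points, branches):
    """branches: list of (start, end) arrays; returns Q, the matched points."""
    Q = []
    for p in points:
        candidates = [closest_point_on_segment(p, a, b) for a, b in branches]
        dists = [np.linalg.norm(p - q) for q in candidates]
        Q.append(candidates[int(np.argmin(dists))])   # closest branch wins
    return np.asarray(Q)

branches = [(np.zeros(3), np.array([0.0, 0.0, 50.0])),                     # "trachea"
            (np.array([0.0, 0.0, 50.0]), np.array([20.0, 0.0, 80.0]))]     # "main bronchus"
P = np.array([[1.0, 0.5, 10.0], [9.0, 0.2, 62.0]])
print(match_to_branches(P, branches))
```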
3 Experiments and Results
We applied the proposed method to a rubber bronchial phantom. Two different UEMTs, the microBird (Tracker1) and the 3D-Guidance (Tracker2) (Ascension Technology Inc., Burlington, VT, USA), were used. Tracker1 has a cubic MFG, while Tracker2 has a flatbed MFG that can be placed underneath a patient. Sensing coils were inserted into the working channel of the bronchoscope BF-200 (Olympus, Tokyo). For each tracker, we evaluated the proposed method in two types of environments: (a) a wood table and (b) a real examination table (see Table 1). Also, we utilized a method for compensating UEMT outputs (called tracker compensation) [6].

To evaluate the stability and effectiveness of the proposed method, we used a rubber bronchial phantom. CT images of the phantom were taken by a multi-detector CT scanner (512 × 512 pixels, 341 slices, 0.684mm resolution on a slice image, 1.25mm X-ray collimation, 0.5mm reconstruction pitch). The bronchial region was segmented by Kitasaka's method [5]. In the experiments, we inserted the RB into several branches including the trachea, the right main bronchus, and the left main bronchus. We obtained 1000 pairs of UEMT outputs and RB images. The transformation matrix ${}^{CT}_{W}T^l$ was dynamically updated under the parameters $N_1 = 200$ and $N_2 = 180$. In real bronchoscopy, the posture of the RB at the trachea is almost the same for all patients. The algorithm initialized ${}^{CT}_{W}T^0$ by using the output of the UEMT at the trachea: the system shows a virtual bronchoscopic view at the carina (first bifurcation), an operator inserts the bronchoscope to this point and sets it up so that the real bronchoscopic view is similar to the virtual bronchoscopic view, and by pushing a button the system calculates ${}^{CT}_{W}T^0$. Figure 5 shows the results of the proposed method in the examination table environment with Tracker2; the UEMT compensation technique was used in this case. The attached video was captured in a wood table environment with Tracker2 without compensation. Intrinsic parameters of the RB camera obtained by Zhang's method were used to generate VB images.

We measured the target registration error (TRE) to evaluate the accuracy of the proposed method during bronchoscopy. Two kinds of TRE, (a) internal TRE and (b) external TRE, were measured. For the internal TRE, we held the RB camera at five branching points. Then, the RB camera position $p^i_W$ $(i = 1, \dots, 5)$ was computed from the outputs of the UEMT. On the other hand, the positions of the corresponding points $p^i_{CT}$ $(i = 1, \dots, 5)$ on the CT images were manually identified by three engineers; the averaged positions were used as the gold standard. The internal TRE at the l-th update ($Err^l$) was calculated by

$$Err^l = \frac{1}{5}\sum_{i=1}^{5} \left\| p^i_{CT} - {}^{CT}_{W}T^l\, p^i_W \right\|.$$

For the external TRE, we used 18 fiducials allocated outside of the bronchus phantom. The positions of these fiducials were measured both by the UEMT and on the CT images, and the external TREs were calculated in the same way. Internal TRE measurements were performed in the four different environments shown in Table 1, where the results are also presented. Figure 3 shows TREs measured in the different experimental environments. Figure 4 shows external TREs measured in Environment 3 without compensation while changing $N_2$ for fixed $N_1 = 200$.
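For concreteness, a sketch of the internal TRE computation $Err^l$ with fabricated landmark positions and a toy registration estimate:

```python
import numpy as np

def internal_tre(p_CT, p_W, T_l):
    """p_CT, p_W: (5, 3) landmark arrays; T_l: 4x4 estimate of CT_W_T at update l."""
    p_W_h = np.hstack([p_W, np.ones((len(p_W), 1))])       # homogeneous coordinates
    mapped = (T_l @ p_W_h.T).T[:, :3]                      # UEMT landmarks in the CT CS
    return np.mean(np.linalg.norm(p_CT - mapped, axis=1))  # Err^l

T_l = np.eye(4); T_l[:3, 3] = [0.0, 1.0, -2.0]             # toy registration
rng = np.random.default_rng(1)
p_W = rng.random((5, 3)) * 100.0
p_CT = (T_l @ np.hstack([p_W, np.ones((5, 1))]).T).T[:, :3] + rng.normal(0, 2.0, (5, 3))
print(f"Err^l = {internal_tre(p_CT, p_W, T_l):.2f} mm")
```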
Fig. 2. Results of camera tracking. The top row shows the RB images at four positions of the phantom: the trachea, the left main bronchus, the right lower lobe bronchus, and the left lower lobe bronchus. The bottom row shows VB images generated using the transformation matrix ${}^{CT}_{W}T^{40}$, which is estimated by the proposed method (Tracker2 / Examination table / Output Compensation).

Table 1. TREs for each experimental environment (Env.: Environment)

Env.   Equipment                     Min. Internal TRE (mm)   Max. Internal TRE (l > 10) (mm)
                                     w/o Comp.   w/ Comp.     w/o Comp.   w/ Comp.
Env.1  Tracker1/Wood table           4.5         4.6          5.7         7.2
Env.2  Tracker1/Examination table    4.4         6.1          6.9         8.3
Env.3  Tracker2/Wood table           3.3         3.4          4.7         4.2
Env.4  Tracker2/Examination table    3.9         4.0          5.2         5.2

4 Discussion
As shown in Fig. 2, the VB images generated from the transformation matrix ${}^{CT}_{W}T^{40}$ are quite similar to the RB images. This figure shows that the proposed method can properly register the coordinate systems of the CT image and the UEMT. The TREs of the initial transformation matrix ${}^{CT}_{W}T^0$ are about 15mm to 30mm; however, they gradually decrease as a sufficient number of updates are performed (Fig. 3). For example, the TREs take minimum values of 3.3mm at l = 38 in Environment 3 and 3.8mm at l = 11 in Environment 4, respectively. This result is almost the same as or better than Schneider's experiment [2]. While Schneider's method requires fiducial markers or reference positions to obtain the RB camera position, the proposed method requires no fiducial markers. This is a great advantage of the proposed method, because it is quite difficult to place fiducial markers inside the bronchus. From Fig. 4, we can say that the errors are relatively small for larger $N_2$; this is because more global information is used for larger $N_2$. Since we tested the proposed method using only a static phantom, it is necessary to evaluate it with a dynamic phantom simulating breathing motion.
Fig. 3. Internal TREs of the estimated ${}^{CT}_{W}T^l$ versus the number of updates (l), without compensation of the UEMT output (Environment 1: Tracker1/Wood table; Environment 2: Tracker1/Examination table; Environment 3: Tracker2/Wood table; Environment 4: Tracker2/Examination table).

Fig. 4. External TREs of the estimated ${}^{CT}_{W}T^l$ versus the number of sensor outputs, with fixed $N_1 = 200$ and changing $N_2$ ($N_2$ = 140, 160, 180, 190).
Fig. 5. Results of VB images generated by the transformation matrix ${}^{CT}_{W}T^l$ at different branching points as the number of updates l increases.
Also, we need to investigate the influence of deformation of the bronchus in future experiments. We obtained UEMT sensor outputs by inserting the RB camera in the order of the trachea, the right main bronchus, the left main bronchus, the right upper lobe bronchus, the right lower lobe bronchus, and the left lower lobe bronchus. In Fig. 5, the transformation matrix ${}^{CT}_{W}T$ is calculated from only the UEMT outputs around the trachea and the right main bronchus for low l. Since the RB camera goes through the right main bronchus in later updates, the VB images generated from the RB camera positions in the CT CS become very close to the RB images in later updates. As seen in Table 1, the proposed method worked well in several environments. Even in the environment using the examination table, the proposed method can track the RB camera with an error of 5mm or less. Compensation of UEMT outputs
did not improve tracking performance. Since the proposed method registers measured points to the medial axis of the bronchus, it can already correct sensing errors of the UEMT through geometric constraints (the bronchial branches); the compensation method actually made the results worse. Tracking performance of Tracker2 was better than that of Tracker1.

We assumed that the bronchial branches are accurately segmented by Kitasaka's method. Therefore, the proposed method cannot estimate the correct ${}^{CT}_{W}T^l$ if extraction of the bronchial branches fails. Also, the proposed method estimates the transformation matrix by assuming that the RB camera moves on the medial line of the bronchial region, or almost at the center of the bronchial region. Precise evaluation of this assumption should be performed. The proposed method also requires that the bronchoscope be inserted into both the right and left lungs. We think this requirement is acceptable, since a bronchoscopist normally inserts a bronchoscope into both sides of the bronchus. Future work includes (a) precise analysis of the stability of the proposed method, (b) tests on real patients, and (c) development of a precise validation method.

Acknowledgments. The authors would like to thank Dr. Hiroshi Natori of Keiwakai Nishioka Hospital and Dr. Masaki Mori of Sapporo Kosei General Hospital for their advice. This work is supported by the program of the formation of innovation center for fusion of advanced technologies "Establishment of Early Preventing Medical Treatment Based on Medical Engineering for Analysis and Diagnosis" funded by MEXT.
References

1. Mori, K., Deguchi, D., Sugiyama, J., et al.: Tracking of a bronchoscope using epipolar geometry analysis and intensity-based image registration of real and virtual endoscopic images. Medical Image Analysis 6, 321–336 (2002)
2. Schneider, A., Hautmann, H., Barfuss, H., et al.: Real-time image tracking of a flexible bronchoscope. In: Proc. of CARS 2004, vol. 1268, pp. 753–757 (2004)
3. Wegner, I., Vetter, M., Schoebinger, M., et al.: Development of a Navigation System for Endoluminal Brachytherapy in Human Lungs. In: Proc. of SPIE, vol. 6141, 614105-1 (2006)
4. Deligianni, F., Chung, A.J., Yang, G.Z.: Nonrigid 2-D/3-D Registration for Patient Specific Bronchoscopy Simulation With Statistical Shape Modeling: Phantom Validation. IEEE TMI 25(11), 1462–1471 (2006)
5. Kitasaka, T., Mori, K., Hasegawa, J., et al.: A Method for Extraction of Bronchus Regions from 3D Chest X-ray CT Images by Analyzing Structural Features of the Bronchus. FORMA 17(4), 321–338 (2002)
6. Nakada, K., Nakamoto, M., Sato, Y., et al.: A Rapid Method for Magnetic Tracker Calibration Using a Magneto-Optic Hybrid Tracker. In: Ellis, R.E., Peters, T.M. (eds.) MICCAI 2003. LNCS, vol. 2879, pp. 285–293. Springer, Heidelberg (2003)
Online Estimation of the Target Registration Error for n-Ocular Optical Tracking Systems

Tobias Sielhorst1, Martin Bauer1, Oliver Wenisch2, Gudrun Klinker1, and Nassir Navab1

1 Chair for Computer Aided Medical Procedures (CAMP), TU Munich, Germany, {sielhors,bauerma,klinker,navab}@cs.tum.edu
2 Advanced Realtime Tracking GmbH, Weilheim, Germany, [email protected]
Abstract. For current surgical navigation systems, optical tracking is the state of the art. The accuracy of these tracking systems is currently determined statically for the case of full visibility of all tracking targets. We propose a dynamic determination of the accuracy based on the visibility and geometry of the tracking setup. This real-time estimation of accuracy has a multitude of applications. For multiple-camera systems it allows reducing line-of-sight problems and guaranteeing a certain accuracy. The visualization of these accuracies allows surgeons to perform procedures taking the tracking accuracy into account. It also allows engineers to design tracking setups interactively while guaranteeing a certain accuracy. Our model is an extension of the state-of-the-art models of Fitzpatrick et al. [1] and Hoff et al. [2]. We model the error in the camera sensor plane. The error is propagated using the internal camera parameters, camera poses, tracking target poses, target geometry, and marker visibility, in order to estimate the final accuracy of the tracked instrument.
1 Introduction
Intraoperative guidance systems (IGS) require the pose estimation of the patient and instruments. Most commercially available systems use optical tracking technology. The tracking system consists of an n-ocular camera system using linear CCD cameras or matrix cameras. The cameras detect the pose of the instruments via a rigidly attached set of fiducials. These fiducials are either light-emitting diodes or retroreflective objects reflecting light from flashes attached to the tracking cameras. In this paper we refer to a set of fiducials as a tracking target. The accuracy of such a system depends on a number of variables that interact in a non-obvious manner. The accuracy of the tracking system strongly depends on the geometry of the system. The measurement technology, such as the cameras, the computer vision and optimization algorithms, the light conditions, and the physical properties of the fiducials, also plays an important role. The geometry includes the number of visible fiducials, the spatial arrangement of these fiducials, and the pose of the tracking cameras.
We model the tracking system error as follows:

IPE. The image plane error (IPE) is the measurement error on the camera sensor. In our previous work we have shown that it is valid to assume that a variety of systematic and random errors can be modeled by a Gaussian error distribution in the image plane [].

FLE. The fiducial location error (FLE) represents the three-dimensional error of a single fiducial. It depends on the IPE and the geometrical setup of the cameras.

MTE. The mean target error (MTE) represents the spatial error of a rigid tracking target at the centroid of the fiducials. It comprises rotational and translational parts. It depends on the FLE of each fiducial and the spatial arrangement of the tracking target.

TRE. The target registration error (TRE) represents the actual error of a point in the tracking target coordinate system. The point of interest is generally not the centroid of the fiducials, but a point on the attached object. The error is amplified by the rotational part of the MTE.

The error of a system can be described as the root-mean-square (RMS) error, which is the standard deviation of a measurement. By definition, the RMS cannot indicate whether there is one direction with a higher variance than the others; it therefore implicitly models an isotropic and independent error. If the error is expected to be anisotropic, it can be described with the covariance matrix of the distribution. These descriptions of the error implicitly model the error with a zero-mean Gaussian distribution.

In current systems, the accuracy of a tracking system is specified statistically by moving fiducials through the specified measurement volume and computing the root-mean-square error [3]. This error is propagated to the point of interest according to the spatial arrangement of the markers; the point of interest may be the tip of an instrument. Fitzpatrick et al. [1] describe a simple formula to predict the target registration error (TRE) from the fiducial location errors (FLE) given as a one-dimensional RMS, which is provided by the tracking system manufacturer. However, it is known that due to the underlying geometrical situation, the tracking errors have non-uniform error distributions throughout their measurement volumes [4,1]. According to Khadem et al. [4], the errors of different commercial systems show a significant anisotropy. Hoff et al. [2] model the TRE with covariances; they measure the root-mean-square deviation of the FLE and propagate it. In our previous work we have proposed a model for the estimation of the fiducial location error from the camera geometry [5]. The FLE is modeled using covariances, since the error is anisotropic (see Fig. 1).
2 Methods
In our approach, the error and its propagation are modeled as a Gaussian distribution. The error is defined by its expected value and its covariance.
The error propagation follows the formula $\mathrm{cov}(Ax) = A\,\mathrm{cov}(x)\,A^T$ for affine transformations A. The propagation of the expected value is straightforward, so we focus on propagating the covariances. In the case of a non-linear propagation of the error, we can use the first-order approximation $\mathrm{cov}(f(x)) = J_f\,\mathrm{cov}(x)\,J_f^T$, with $J_f$ the Jacobian of f at x, if the function is locally linear around x (a short numerical sketch of both rules follows the list below). The following steps have to be performed:

– The estimation of the IPE is described in Section 2.1. We create a 2D covariance matrix for each camera.
– The propagation from IPE to FLE is described in Section 2.2. We create a 3D covariance matrix for each fiducial.
– The propagation from FLE to MTE is described in Section 2.3. We create a 6D covariance matrix for each target.
– The propagation from MTE to TRE is computed with the error propagation formula for affine transformations. We create a 3D covariance matrix for the point of interest.
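The sketch below illustrates the two propagation rules with a numerically estimated Jacobian; the nonlinear example function is arbitrary and chosen only for illustration:

```python
import numpy as np

def propagate_affine(A, cov_x):
    return A @ cov_x @ A.T                      # cov(Ax) = A cov(x) A^T

def propagate_nonlinear(f, x, cov_x, eps=1e-6):
    """First-order propagation with a finite-difference Jacobian."""
    fx = f(x)
    J = np.zeros((len(fx), len(x)))
    for j in range(len(x)):                     # build Jacobian column by column
        dx = np.zeros_like(x); dx[j] = eps
        J[:, j] = (f(x + dx) - fx) / eps
    return J @ cov_x @ J.T                      # cov(f(x)) ~ J cov(x) J^T

cov = np.diag([0.1, 0.2, 0.3])
A = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]])
print(propagate_affine(A, cov))
print(propagate_nonlinear(lambda v: np.array([v[0] * v[1], v[2] ** 2]),
                          np.array([1.0, 2.0, 3.0]), cov))
```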
2.1 Measurement of the Image Error
The errors of an optical tracking system have different sources: the internal calibration of the cameras, imperfect lenses, the computer vision algorithm, and image blur. All of these can be modeled as a 2D error in the plane of the camera sensor. As an approximation, we regard the error as constant on the camera sensor for a certain fiducial size. For the estimation of errors in the sensor plane we used covariance measurements of 3D fiducials in the tracking space. We calculated the noise in the sensor plane of each camera by inverting the propagation described in Section 2.2 via numerical optimization using the Levenberg-Marquardt method. The covariance was constrained to be isotropic and uniform for a numerically stable solution. The measured standard deviation in our setup is approximately 1/52 pixel for each fiducial.
2.2 From Image Error to Fiducial Error
We model this transformation as in our previous work [5], using the geometry of the cameras. Consider an n-ocular stereo system of pinhole cameras, with intrinsic parameters $K_i$ and extrinsic parameters $T_i$ for the i-th camera, detecting the same point x at the location u. We get the measurement function for the triangulation, a set of nonlinear camera equations p:

$$p: \quad u_1 = \tfrac{1}{\rho_1} K_1 T_1 x, \;\; \dots, \;\; u_n = \tfrac{1}{\rho_n} K_n T_n x, \qquad \text{using} \quad \rho \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = K T x$$

as the projection function, where $\rho_i$ denotes the normalization factor needed for homogeneous coordinates. We use linear models throughout the error propagation, so we can safely assume a pinhole camera model here.
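A sketch of this projection function and its inversion by linear (DLT) triangulation, with toy camera matrices rather than a calibrated rig:

```python
import numpy as np

def project(P, x):
    # Pinhole projection: divide by rho for homogeneous normalization.
    u = P @ np.append(x, 1.0)
    return u[:2] / u[2]

def triangulate(Ps, us):
    """Linear DLT triangulation of one point from n >= 2 views."""
    rows = []
    for P, (u, v) in zip(Ps, us):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    X = Vt[-1]
    return X[:3] / X[3]

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])  # 0.5 m baseline
x_true = np.array([0.1, -0.05, 2.0])
us = [project(P, x_true) for P in (P1, P2)]
print(triangulate([P1, P2], us))   # recovers x_true
```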
Fig. 1. Predicted FLE accuracy of a two camera setup. Covariance ellipsoids (left) and RMS error (right). The RMS is close to constant while the covariances reveal significantly changing directions of the estimated errors. Coordinate systems depict the camera positions.
In order to compute the FLE, we build the Jacobian $J_p = \frac{\delta p}{\delta x}$ and apply the backward propagation formula:

$$\Sigma_x = \left( J_p^T \begin{bmatrix} \Sigma_{u_1} & & 0 \\ & \ddots & \\ 0 & & \Sigma_{u_n} \end{bmatrix}^{-1} J_p \right)^{+} \qquad (1)$$
The resulting equations are analytically computed using a computer algebra system and exported to C code.
2.3 From Fiducial Location Error to Tracking Target Error
We assume that all coordinates and covariances are given in target coordinates, so $[R|t] = I$. Without loss of generality, the origin is at the centroid of the fiducial positions. The following function maps the six-dimensional marker target error $[\Delta_R \,|\, \Delta_t]$ to the three-dimensional fiducial location error $\Delta t_c$ at the point $q = (q_x, q_y, q_z)$:

$$\Delta t_c = \Delta_R\, q + \Delta_t \qquad (2)$$

We linearize this function around the zero-mean error $\Delta_t$:

$$J_f(q) = \left. \frac{\delta\, \Delta t_c}{\delta(\Delta_t)} \right|_{\Delta_t = 0} = \begin{bmatrix} 1 & 0 & 0 & 0 & q_z & -q_y \\ 0 & 1 & 0 & -q_z & 0 & q_x \\ 0 & 0 & 1 & q_y & -q_x & 0 \end{bmatrix} \qquad (3)$$

We can now stack the equations for all fiducials together in a single matrix

$$M = [J_f(q_1), \dots, J_f(q_n)]^T \qquad (4)$$

and apply the backward propagation formula

$$\mathrm{cov}(f^{-1}(x)) = J_{f^{-1}}\, \mathrm{cov}(x)\, J_{f^{-1}}^T, \qquad (5)$$
T. Sielhorst et al.
which can be easier calculated as according to Hartley et al.[6]: −1 cov(f −1 (x)) = JTf cov(x)−1 Jf ⎛
⎡
⎜ T⎢ Σc = ⎜ ⎝M ⎣
0
Σp1 .. 0
.
⎞−1
⎤−1 ⎥ ⎦
⎟ M⎟ ⎠
Σpn
⎡ ⎢ = M+ ⎣
0
Σp1 .. 0
.
(6) ⎤ ⎥ ⎦ (M T )+
(7)
Σpn
where Σpi is the covariance of the i-th fiducial in target coordinates and Σc represents the covariance matrix of the MTE at the centroid. 2.4
Visualization Setup
The estimated error are visualized in the video and in the following images in an augmented reality system using a second, independent tracking system. We visualize only the tracking error. The overall application error could be estimated by additional error propagation using our model, but is not the scope of this work. Addition of errors and further propagation is well described in the work of Hoff et al.[2]. In the images the tracking error is visualized as the 95% confidence level of the estimated error amplified by 33.
3
Results
The error estimation in this paper is an extension of the estimation by Fitzpatrick et al. [1] and Hoff et al. [2]. Using the same assumptions as in their work, our model provides mathematically the same results. However, these assumptions are too strict according to our computations, because the FLE has a non uniform and anisotropic error. Figure 2 shows the behavior of the different error propagations. This figure depicts the expected error of a four camera setup where the fiducial error is nearly isotropic. Therefore the middle and the right image show almost the same expected error. The difference between the corresponding becomes obvious when the one or more fiducials or cameras are occluded (see section 3.3 and 3.2). 3.1
Inclusion of Visibility Data
Optical tracking systems have the inherent property, that the cameras need to see the fiducials to be able to compute a pose estimation. If this line of sight is blocked between one camera and a target the accuracy of measurement degrades. Figure 3 depicts the RMS in a plane of the same four camera setup using with one camera occluded in each image. The depicted plane is parallel to the plane including the four cameras at 1 meter distance. The error has been computed according to the proposed model in section 2.2. The images clearly show that assuming a constant RMS of fiducials in the volume is not sufficient. It also shows that the visibility plays an important role in the accuracy of each fiducial.
Online Estimation of the Target Registration Error
657
Fig. 2. Estimation of RMS at point of interest using target RMS [1] (left), estimation of covariance using fiducial RMS [2] (middle), estimation of covariance using fiducial covariance (right) - covariance of fiducial (a), covariance at point of interest (b), covariance at center of fiducials (c) RMS mm
meter 1.5
meter 1.5
meter 1.5
meter 1.5
1
1
1
1
0.5
0.5
0.5
0.5
0
0
0
0
-0.5
-0.5
-0.5
-0.5
0.1
0.08
0.06
0.04
0.02
-1 -0.5
0
0.5
1
1.5 meter
-1 -0.5
0
0.5
1
1.5 meter
-1 -0.5
0
0.5
1 1.5 meter
-1 -0.5
0
0.5
1 1.5 meter
0
Fig. 3. Occlusion of cameras changes the expected fiducial location error significantly: The coordinate systems depict the camera positions. The grey level indicates the estimated root mean square error of a fiducial in a plane of 1m distance. The asymmetry is due to asymmetric camera orientation.
3.2 Occlusion of Single Fiducials
Current navigation systems request full visibility of each fiducial in each camera to maintain the required accuracy. Some tracking systems already provide error estimates based on the work of Fitzpatrick et al. [1]. Using our proposed method, we can now provide more accurate error estimates. We can show that in typical configurations the actual error will be larger than previously estimated, while the average error estimate over the whole tracking space stays the same. Figure 4 shows the effect of occluding one fiducial on the error estimate at the instrument tip.
Fig. 4. Effects of occlusion: no occlusion (left), occlusion of one fiducial in all four cameras (middle), several fiducials occluded (right); each circle depicts the visibility status of the fiducial in each of the four cameras
Fig. 5. Effects of camera occlusion: no camera occluded (left), one camera occluded (middle), two cameras occluded (right)
3.3 Occlusion of Cameras
Occlusion of cameras is a problem that appears only in multi-camera setups. In current tracking systems for surgical navigation, pose estimation is stopped in case of camera occlusion. The occlusion of cameras has a significant influence on the tracking accuracy (see Figs. 3 and 5). Tracking systems in commercial medical applications need to be accuracy-certified. In order to achieve this certification, the manufacturer has to guarantee a certain level of accuracy. Right now, this is done by mounting the cameras in a mechanically rigid configuration and then calibrating and measuring the error in a predefined working volume. For this reason, up to now all commercially used tracking systems use a configuration with two matrix cameras or three linear cameras. Using multi-camera setups would provide redundant cameras to avoid occlusions. Unfortunately, guaranteeing accuracy for flexible setups is a complex task without taking the visibility into account. The online error estimation presented in this work provides an important step towards accuracy-certified multi-camera setups. If we can guarantee a certain accuracy on the image plane, this accuracy can be propagated to provide accuracy estimates at a point of interest. With many distributed tracking cameras in the operating room, the proposed method would allow tracking and navigation to continue even if one or more camera views are occluded, as long as the system can guarantee the required level of accuracy.
4 Conclusion
We have proposed a novel way of estimating the error in optical tracking systems. We model the FLE and the TRE as anisotropic, non-uniform errors and deduce them from the error in the tracking camera plane. The error is computed online based on the visibility of each fiducial. The effects can be appreciated best in the supplementary material. The interactive propagation of errors in real time based on the geometry allows for a multitude of applications. Engineers designing tracking systems [7] have the opportunity to get direct feedback regarding the final accuracy. Physicians may benefit from the online generated information in intraoperative visualization.
The accuracy can be visualized in the navigation system, rather than the current approach of not showing any pose information when the accuracy falls below a certain threshold. The online estimation of error based on the visibility of fiducials allows for creating a multiple-camera tracking setup for surgical navigation. In order to guarantee a certain accuracy, which is necessary for navigation systems, we can compute the expected error based on the visibility. The proposed method allows online estimation of the final accuracy while one or more fiducials or camera views are occluded. This could give computer-aided solutions considerable flexibility: they could in fact continue functioning as long as such occlusions do not cause the level of accuracy to decrease below the acceptable limits.
5 Future Work
The model of the error estimation has been improved; however, there are some approximations that we are going to investigate. We estimate the accuracy in the image plane to be constant for a certain fiducial size, yet the accuracy depends on the fiducial size in the image. We have to investigate whether this approximation is valid for arbitrary setups and how we can integrate the fiducial size into our model. Furthermore, we model the computation of the target pose as a least-squares optimization of 3D fiducials whose positions are triangulated from the cameras. This linear model is not what is used in tracking system software, which instead performs a non-linear optimization of a reprojection of the target pose in the image. This allows for a special weighting favoring rotational values over translational ones. We are going to investigate these effects in our further research.
References

1. Fitzpatrick, J.M., West, J.B., Maurer Jr., C.R.: Predicting error in rigid-body point-based registration. IEEE Trans. Med. Imag. 14, 694–702 (1998)
2. Hoff, W.A., Vincent, T.L.: Analysis of head pose accuracy in augmented reality. IEEE Trans. Visualization and Computer Graphics 6 (2000)
3. Wiles, A., Thompson, D., Frantz, D.: Accuracy assessment and interpretation for optical tracking systems. SPIE Medical Imaging 5367 (2004)
4. Khadem, R., Yeh, C.C., Sadeghi-Tehrani, M., Bax, M.R., Johnson, J.A., Welch, J.N., Wilkinson, E.P., Shahidi, R.: Comparative tracking error analysis of five different optical tracking systems. Computer Aided Surgery 5, 98–107 (2000)
5. Bauer, M., Schlegel, M., Pustka, D., Navab, N., Klinker, G.: Predicting and estimating the accuracy of vision-based optical tracking systems. In: Proc. IEEE and ACM Int'l. Symp. on Mixed and Augmented Reality (ISMAR), Santa Barbara (CA), USA, pp. 43–51. IEEE Computer Society Press, Los Alamitos (2006)
6. Hartley, R.I.: Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge (2000)
7. Davis, L., Clarkson, E., Rolland, J.: Predicting accuracy in pose estimation for marker-based tracking, pp. 28–35 (2003)
Assessment of Perceptual Quality for Gaze-Contingent Motion Stabilization in Robotic Assisted Minimally Invasive Surgery

George P. Mylonas, Danail Stoyanov, Ara Darzi, and Guang-Zhong Yang

Institute of Biomedical Engineering and Department of Computing, Imperial College London, London, United Kingdom
{george.mylonas, danail.stoyanov, a.darzi, g.z.yang}@imperial.ac.uk
Abstract. With the increasing sophistication of surgical robots, the use of motion stabilisation for enhancing the performance of micro-surgical tasks is an actively pursued research topic. The use of mechanical stabilisation devices has certain advantages, in terms of both simplicity and consistency. The technique, however, can complicate the existing surgical workflow and interfere with an already crowded MIS-operated cavity. With the advent of reliable vision-based real-time, in situ, in vivo techniques for 3D deformation recovery, current effort is being directed towards the use of optical techniques for achieving adaptive motion stabilisation. The purpose of this paper is to assess the effect of virtual stabilization on foveal/parafoveal vision during robotic assisted MIS. Detailed psychovisual experiments have been performed. Results show that stabilisation of the whole visual field is not necessary, and that it is sufficient to perform accurate motion tracking and deformation compensation within a relatively small area that is directly under foveal vision. The results have also confirmed that, under the current motion stabilisation regime, the deformation of the periphery does not affect visual acuity, and there is no indication of the deformation velocity of the periphery affecting foveal sensitivity. These findings are expected to have a direct implication on the future design of visual stabilisation methods for robotic assisted MIS.
1 Introduction

One of the main challenges of robotic assisted Minimally Invasive Surgery (MIS) is to perform Totally Endoscopic Coronary Artery Bypass (TECAB) grafting on a beating heart. Such a delicate task is complicated by the destabilization of the heart due to cardiac and respiratory motion, compounded by a high degree of visual magnification through the use of immersive optics. This significantly affects precise hand-eye coordination and tissue-instrument interaction. In current surgical practice, it is common to use epicardial mechanical stabilizers to dampen the cardiac motion. However, the residual deformation may still be large enough to hinder tasks such as small vessel anastomosis [1][2]. To overcome this problem, a number of techniques have been proposed for introducing virtual
stabilisation to the surgical scene based on soft-tissue deformation tracking and image warping [3][4][5][6]. The application of these techniques to real-time in situ, in vivo settings, however, still remains a challenge due to the morphological complexity of the tissue. Furthermore, no detailed psychovisual experiments have been performed to assess the effect of virtual stabilization on foveal/parafoveal vision and general visual acuity during MIS. The purpose of this paper is to evaluate the hypothesis of whether foveal motion stabilisation is sufficient for robotic assisted MIS and to examine the effect of peripheral motion on visual acuity. Detailed psychovisual experiments have shown that stabilisation of the whole visual field is not necessary for MIS, and it is sufficient to perform accurate motion tracking and deformation compensation within a relatively small area that is directly under foveal vision. Simple rigid body motion of the camera can therefore be used to provide a perceptually stable operating field-of-view. This also avoids the use of large-area 3D tissue deformation recovery, which tends to be error-prone and limited by the paucity of reliable anatomical landmarks. The finding is expected to underpin the synergistic use of computer vision based feature tracking and deformation recovery combined with real-time gaze tracking for robotic assisted MIS. Given the complexity of robotic control in surgical environments, this is expected to facilitate effective hand-eye coordination for improved surgical performance. This research further extends the current work on real-time eye tracking and saccadic eye movement analysis for investigating gaze-contingent approaches for robotic control in surgery [7][8].
2 Methods

2.1 Visual Acuity with Foveal and Parafoveal Stabilization

In order to simulate gaze-contingent motion stabilization, it is necessary to obtain a robust motion map around the fixation point. To this end, feature-based tracking using stereo laparoscopic images recorded from a daVinci robot was used in this study. Gradient-based landmarks using the Shi and Tomasi operator [11] were implemented for performing feature tracking using a variant of the Lucas-Kanade tracker. The method minimises the squared residual error $\varepsilon$ between feature location x and feature template $T_n$ for the left and right images $I_n$ [4]:

$$\varepsilon = \sum_{n=1}^{2}\sum_{x}\left[ I_n\!\left( W_n(x; p) \right) - T_n(x) \right]^2 \qquad (1)$$

where $p = [X\ Y\ Z]^T$ and $W_n$ is the warp function. The camera was calibrated with one of the standard calibration methods available [13]. The 3D coordinates of the points recovered from the calibrated camera matrices $P_n = K_n[R_n \,|\, t_n]$ describe the internal and external optics of the cameras. They were used directly to parameterize the translational warping function. To obtain a dense motion map, it is possible to combine feature detectors as shown in [4], and this approach can also be used to identify landmarks that are less prone to view-dependent reflections. For performance considerations, however, only image derivative-based operators were
used, and a threshold filter was applied to the correlation measurement during the tracking process, combined with the epipolar constraint.

The features tracked with the above technique were used as the control points in the subsequent Radial Basis Function (RBF) texture warping. To stabilize the deforming tissue, the video texture was first mapped to a grid of regularly spaced vertices and the grid vertices were interpolated and warped according to the RBF so that the target tissue area appeared static (a minimal sketch of this interpolation is given at the end of this subsection).

In order to assess the visual acuity with and without foveal and parafoveal stabilization, a pre-recorded video clip of a TECAB surgery with the daVinci surgical robot was used. A total of 13 fixation points (each lasting 4s) were created to direct the observers' foveal vision. Motion compensation was then applied to each of the fixation points in sequence to cancel out tissue deformation. In order to maintain strict visual fidelity of the foveal area, only rigid body motion cancellation was applied. This avoided the use of any image warping that could inadvertently introduce visual artefacts. Since the foveal area only corresponds to an area of 2° visual angle [9], this linear motion cancellation is generally sufficient for most in vivo applications. However, the parafoveal area that surrounds the fovea is usually large, and non-linear motion compensation must be applied in order to stabilise the tissue motion in this region. To this end, sparse features were first tracked using the aforementioned stereo tracking algorithm. They were subsequently used as the control points for RBF-based warping. The outer region surrounding the parafoveal area corresponds to the visual periphery. Controlled motion compensation, in terms of tracked sparse features, was not performed within this area. Only a global free warping was applied, resulting from the rigid transformation of the parafoveal boundary control points with respect to the stationary image boundary. It should be noted that the boundary points defining the fovea and parafovea were also considered as warping control points. In this way, smooth transitions were achieved from the area of translational motion cancellation (fovea), to the area of dampened motion with controlled warping (parafovea), to free transformation (far periphery).

For experimental validation, 13 subjects were asked to foveate on the suggested fixation points on the screen with and without the above motion stabilisation scheme being introduced. The videos of the stabilized and the non-stabilized tissue were played back-to-back in random succession. At random intervals, and for a period of 8 video frames (320ms), a total of 16 epicardial stimuli were randomly introduced. For the stabilized tissue, the stimulus remained within the foveal region for the entire 320ms period. For the non-stabilized tissue, the stimulus may or may not have remained within the foveal region, depending on the amplitude and the velocity of the tissue deformation when the stimulus was first introduced. The observers were instructed to signal (press the spacebar) if they spotted a stimulus appearing in or close to the foveal window. Fig. 1 schematically defines the foveal, parafoveal, and peripheral regions used for motion cancellation and image warping. It also illustrates the visual stimuli introduced within the dotted circles that delineate the foveal window.
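As a rough illustration of the RBF-based stabilisation warp described above, the following sketch interpolates displacements known at tracked control points onto a regular grid of texture vertices. A Gaussian kernel is used here for brevity (a thin-plate spline with affine terms would be the more classical choice); nothing below is taken from the paper's implementation:

```python
import numpy as np

def rbf_warp(control_pts, control_disp, grid_pts, sigma=40.0):
    # Fit RBF weights so the warp reproduces the tracked displacements at the
    # control points, then evaluate the interpolated warp at the grid vertices.
    d2 = ((grid_pts[:, None, :] - control_pts[None, :, :]) ** 2).sum(-1)
    Phi = np.exp(-d2 / (2.0 * sigma ** 2))                 # grid-to-control kernel
    d2c = ((control_pts[:, None, :] - control_pts[None, :, :]) ** 2).sum(-1)
    Phic = np.exp(-d2c / (2.0 * sigma ** 2))
    w = np.linalg.solve(Phic + 1e-8 * np.eye(len(control_pts)), control_disp)
    return grid_pts + Phi @ w                              # warped vertex positions

ctrl = np.array([[100.0, 100.0], [300.0, 120.0], [200.0, 260.0]])
disp = np.array([[2.0, -1.0], [-1.5, 0.5], [0.0, 3.0]])    # tracked tissue motion
ys, xs = np.mgrid[0:480:40, 0:640:40]
grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
print(rbf_warp(ctrl, disp, grid)[:4])
```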
To analyse the underlying visual behaviour of the subjects and to ensure that their actual fixation points corresponded to those suggested, gaze tracking was performed using a Tobii ET-1750 eye-tracker. This is an infrared video-based binocular eye-tracking system recording the position of gaze in the work plane (screen) at up to
50Hz, with an accuracy of 1° visual angle [10]. Fixations were identified from the raw eye-gaze data, and keyboard events were obtained and time-stamped using the Clearview software (Tobii Technology, Sweden). The minimum fixation duration filter was set to 100ms. Receiver Operating Characteristic (ROC) analysis was used to analyse the effect of the tissue stabilization in the fovea and the performance of the subjects in identifying the introduced stimuli.
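A simple sketch of how sensitivity figures of this kind can be derived from the logged events, matching keypresses to stimulus onsets within a response window; the window length and all event times are fabricated for illustration (specificity additionally requires the catch presentations without stimuli):

```python
import numpy as np

def hit_rates(stimulus_times, keypress_times, window_s=1.0):
    # A keypress within window_s after a stimulus onset counts as a hit;
    # a keypress with no stimulus in the preceding window is a false alarm.
    stimulus_times = np.asarray(stimulus_times, dtype=float)
    keypress_times = np.asarray(keypress_times, dtype=float)
    hits = sum(np.any((keypress_times >= t) & (keypress_times <= t + window_s))
               for t in stimulus_times)
    false_alarms = sum(not np.any((t - window_s <= stimulus_times) & (stimulus_times <= t))
                       for t in keypress_times)
    sensitivity = hits / len(stimulus_times)
    return sensitivity, hits, false_alarms

stims = [2.0, 6.5, 11.0, 15.2]          # stimulus onsets (s)
keys  = [2.4, 11.6, 20.0]               # spacebar presses (s)
print(hit_rates(stims, keys))           # (0.5, 2, 1)
```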
Fig. 1. (left) Illustration of the foveal, parafoveal, and peripheral regions, to which translational motion cancellation, controlled warping, and free transformation are respectively applied in the proposed motion cancellation scheme. (right) Example foveal regions with (bottom) and without (top) stimuli introduced.
2.2 The Effect of Peripheral Motion on Visual Sensitivity

In the previous experiment, the peripheral vision was not stabilised. In order to assess the effect of peripheral motion on visual sensitivity, a further experiment was conducted with controlled velocity of the peripheral motion. Images from a robotic assisted TECAB procedure were warped to simulate different amounts of respiratory and cardiac induced tissue deformation. For simplicity, the respiratory component was assumed to cause only a rigid translation. The cardiac component was represented by a Gaussian mixture model. The two deformations were combined linearly with different weighting factors. The equation for the deformation applied to a vertex at time t is:
Vt +1
⎡ s −Wxd (Vx −C x ) ⎤ ⎡W rT ⎤ ⎢Wx e ⎥ ⎢ x x⎥ ⎢ ⎥ d ⎢ r ⎥ s −Wy (Vy −C y ) ⎥ ⎢ = V0 + sin(φ) ⎢Wy e ⎥ + sin(ϕ) ⎢⎢Wy Ty ⎥⎥ ⎢ s −W d (V −C ) ⎥ ⎢W rT ⎥ ⎢Wz e z z z ⎥ ⎢⎣ z z ⎥⎦ ⎢⎣ ⎥⎦
(2)
in which the first sinusoidal product represents the cardiac deformation and the second one a time-varying translational motion. In the above equation, Vo represents the original vertex of the surface and C is the centre of deformation. The sinusoidal terms specify the frequency of oscillation in such a way that a full sinusoidal cycle is completed in N steps. A looped-over video from the sequence of N images was
664
G.P. Mylonas et al.
created and used for assessing the effect of peripheral motion on visual acuity. The parameters used include:
C x = 400, C y = 300, Wxs = Wys = 0 Wzs = 2.0 , Wzd = −0.07 r Wx = Wyr = Wzr = 0 , N = 25 The above parameters are designed to eliminate the influence of breathing which is reasonable due to its relatively slow evolution. The resulting warped video effectively stabilises the central area with increasing deformation amplitude towards the periphery. Similar to the previous experiment, 30 suggested fixation points were prescribed and each fixation lasted for 4s. The fixations were presented in random order in such a way that 10 were displayed in the apparently stationary central area of the video and the remaining 20 in the deforming periphery. Profile 1 Profile 2 Profile 3 Profile 4 Profile 5
Normalized Velocity Profiles 1.2
Velocity Amplitude .
1 0.8 0.6 0.4 0.2 0 0
1
2
3
4
5
6
7
8
9
Video Frames (25/sec)
Fig. 2. (left) The prescribed peripheral motion for assessing the effect of different peripheral motion velocity on visual acuity, where three instants over a half sinusoidal cycle are depicted (~12 video frames at 25fps). (right) Five peripheral velocity profiles over which the central fovea stimuli were introduced.
Ten subjects were involved in this study and asked to foveate to each of the suggested fixation points. All subjects were eye-tracked as in the previous experiment. At random intervals a total of 20 visual stimuli were introduced inside some of the 30 parafoveal areas, with one stimulus per simulated fixation. The study was designed in such a way that the central 10 parafoveal areas are presented 5 times with a stimulus and 5 times without. Also, care was taken so that the stimuli appear under different peripheral deformation velocities. A total of five velocity profiles were used as shown in Fig. 2. Considering the apparent size of the video frame and the viewing distance maintained, the amplitude of deformation of a feature point in the far periphery over a full sinusoidal cycle was approximately 3.8o of visual angle. Similarly to the previous study, the subjects were asked to press the spacebar on a computer keyboard when they saw a stimulus. In this study, only the data concerned with the stabilized central area are analyzed. The simulated fixations introduced in the deforming periphery are merely used as distracters. The effect of the deforming periphery in the stabilized fovea and parafovea was analysed for assessing the performance of the subjects towards identifying the introduced stimuli.
Assessment of Perceptual Quality for Gaze-Contingent Motion Stabilization
665
3 Results For the experiment on foveal and parafoveal stabilisation, Fig. 3 depicts the specificity against the sensitivity data for the subjects observing the stabilized and the non-stabilized tissues. It is evident that motion stabilisation improves the foveal sensitivity significantly, which confirms that stabilization facilitates the identification of visual stimuli. On the other hand, it is also evident from the figure that stabilization didn’t improve the specificity greatly, which was already relatively high for most subjects. This indicates that the subjects were generally competent in confirming the absence of a stimulus. Data from Subject 12 was discarded as he failed to adhere to the experiment protocol. Fig. 3 also shows a comparison of each subject’s number of fixations for the stabilized and non-stabilized experiments. The total number of fixations for all the subjects studied is summarised in Table I, in which the average duration and standard deviation are also provided. It is clear that performing the same task on the non-stabilized tissue requires a considerable amount of short fixations. Data from Subject 10 was discarded because eyetracking was not perfectly stable. Stabilized Tissue
Subject 13
Subject 9
Subject 10
Subject 8
Subject 7
Subject 6
Subject 5
Subject 4
Subject 3
Subject 1
Subject 13
Subject 11
Subject 9
Subject 10
0
Subject 8
0.2
0
Subject 7
0.2
Subject 6
0.4
Subject 5
0.6
0.4
Subject 4
0.6
Subject 3
1
0.8
Subject 2
1
0.8
Subject 1
sensitivity specificity
1.2
Subject 11
sensitivity specificity
Subject 2
Non-stabilized Tissue 1.2
T o ta l n u m b e r o f n u m b er o f p er fo rm e d fix a tio n s
80
Stabilized tissue Non-stabilized tissue
70 60 50 40 30 20 10 0 Subject 1
Subject 2
Subject 3
Subject 4
Subject 5
Subject 6
Subject 7
Subject 8
Subject 9
Subject 11 Subject 13
Fig. 3. (top) Sensitivity vs specificity of the subjects in identifying the presence/absence of the visual stimuli under non-stabilized (left) and stabilized (right) views. (bottom) The number of fixations for the stabilized and non-stabilized tissue experiments.
Similar analysis was performed for the second experiment concerning the sensitivity in the fovea and the parafovea as a function of the peripheral deformation velocity. The results are shown in Table II, which clearly demonstrates that all the subjects were consistent in identifying the introduced stimuli irrespective of the deformation velocity of the periphery.
666
G.P. Mylonas et al. Table I. Fixations statistics for all subjects over the stabilized and non-stabilized tissues
Number of fixations Average duration (ms) Standard Deviation
Stabilized tissue 258 2240.99 476.37
Non-stabilized tissue 381 1593.66 511.49
Table II. The score of all subjects for the visual sensitivity study on a stabilized fovea and parafovea during peripheral warping
Subject
1 2 3 4 5 6 7 8 9 10 Totals
True Positive 5 5 5 4 5 4 5 5 4 4 46
False Positive 0 0 1 0 0 0 0 0 0 0 1
True Negative 5 5 4 5 5 5 5 5 5 5 49
False Negative 0 0 0 1 0 1 0 0 1 1 4
Sensitivity
Specificity
1 1 1 0.8 1 0.8 1 1 0.8 0.8 0.92
1 1 0.8 1 1 1 1 1 1 1 0.98
4 Discussion and Conclusions In this study, we have assessed the effect of motion stabilisation in robotic assisted MIS. The results have shown that gaze-contingent soft-tissue stabilization can significantly increase the visual acuity and the method is relatively immune to peripheral motion. It is very interesting to note the reduced number of required fixations and their duration increase by almost 30% when operating on a virtually stabilized tissue. Research in psychophysiology has shown that an increase in the fixation duration is more efficient for performance improvement rather then darting around the visual field [12]. This is particularly relevant for identifying subtle, transient events. The findings have obvious consequences on performing microsurgical tasks such as small vessel anastomosis. This study has also confirmed that under the current motion stabilisation regime, the deformation of the periphery does not affect the visual sensitivity of the stabilized foveal and parafoveal regions. Furthermore, there is no indication of the velocity of the deformation affecting the foveal sensitivity. These results should have a direct implication on the future design of effective visual stabilisation methods in robotic assisted MIS. As a final point, we should mention that the above study does not consider the presence of surgical tools in the visual field. Under the virtual stabilization framework this would require accurate 3D instrument segmentation and AR rendering which is outside the scope of a study on perceptual quality like the one presented here. Acknowledgements. the authors would like to thank Fani Deligianni, Marios Nicolaou and Adam James for their help with this study.
Assessment of Perceptual Quality for Gaze-Contingent Motion Stabilization
667
References 1. Wimmer-Greinecker, G., Deschka, H., Aybek, T., Mierdl, S., Moritz, A., Dogan, S.: Current status of robotically assisted coronary revascularization. Am. J. Surg. 188(Suppl.4A), 76S–82S (2004) 2. Purkayastha, S., Athanasiou, T., Casula, R., Darzi, A.: Robotic surgery: a review. Hospital Medical 65(3), 153–159 (2004) 3. Cuvillon, L., Gangloff, J., de Mathelin, M., Forgione, A.: Toward Robotized Beating Heart TECABG: Assessment of the Heart Dynamics Using High-Speed Vision. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 551–558. Springer, Heidelberg (2005) 4. Stoyanov, D., Mylonas, G.P., Deligianni, F., Darzi, A., Yang, G.Z.: Soft-Tissue Motion Tracking in Robotic MIS Procedures. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 26–29. Springer, Heidelberg (2005) 5. Stoyanov, D., Darzi, A., Yang, G.-Z.: A practical approach towards accurate dense 3D depth recovery for robotic laparoscopic surgery. Comp. Aided Sur. 10(4), 199–208 (2005) 6. Stoyanov, D., Darzi, A., Yang, G.-Z.: Dense Depth Recovery for Robotic Assisted Laparoscopic Surgery. In: Barillot, C., Haynor, D.R., Hellier, P. (eds.) MICCAI 2004. LNCS, vol. 3216, pp. 41–48. Springer, Heidelberg (2004) 7. Mylonas, G.P., Stoyanov, D., Deligianni, F., Darzi, A., Yang, G.-Z.: Gaze-Contingent Soft Tissue Deformation Tracking for Minimally Invasive Robotic SUrgery. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 843–850. Springer, Heidelberg (2005) 8. Mylonas, G.P., Darzi, A., Yang, G.-Z.: Gaze Contingent Depth Recovery and Motion Stabilisation for Minimally Invasive Robotic Surgery. In: Yang, G.-Z., Jiang, T. (eds.) MIAR 2004. LNCS, vol. 3150, pp. 311–319. Springer, Heidelberg (2004) 9. Yang, G.-Z., Dempere-Marco, L., Hu, X.-P., Rowe, A.: Visual search: psychophysical models and practical applications. Image and Vision Computing 20, 291–305 (2002) 10. Tobii technology. User Manual (2003), http://www.tobii.se 11. Shi, J., Tomasi, C.: Good features to track. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 593–600 (1994) 12. Togami, H.: Effects on visual search performance of individual differences in fixation time and number of fixations. Ergonomics 27, 789–799 (1984) 13. Tsai, R.: A versatile camera calibration technique for high accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE J. Robotics Automation 3(4), 244–323 (1987)
Prediction of Respiratory Motion with Wavelet-Based Multiscale Autoregression Floris Ernst, Alexander Schlaefer, and Achim Schweikard Institute of Robotics and Cognitive Systems, University of L¨ ubeck, DE {ernst,schlaefer,schweikard}@rob.uni-luebeck.de
Abstract. In robotic radiosurgery, a photon beam source, moved by a robot arm, is used to ablate tumors. The accuracy of the treatment can be improved by predicting respiratory motion to compensate for system delay. We consider a wavelet-based multiscale autoregressive prediction method. The algorithm is extended by introducing a new exponential averaging parameter and the use of the Moore-Penrose pseudo inverse to cope with long-term signal dependencies and system matrix irregularity, respectively. In test cases, this new algorithm outperforms normalized LMS predictors by as much as 50%. With real patient data, we achieve an improvement of around 5 to 10%.
1
Introduction
To successfully ablate tumors in the body stem using radiosurgery, it is necessary to move the beam source to compensate for the motion of internal organs. This is achieved by recording the motion of the body surface and drawing conclusions about the tumor position [1], [2]. While this method significantly increases the targeting accuracy, the system delay arising from data acquisition and processing and positioning of the treatment beam results in a systematic error. This error can be decreased by predicting the time series of human respiration [3]. We propose to use the wavelet-based multiscale autoregression prediction method introduced in [4] and extend it by introducing an averaging parameter to capture long-term signal dependencies.
2
Prediction Algorithms
Let y be the signal we want to predict. Since the respiration time series can be approximated by sinusoidal models, we assume an autoregressive (AR) property: Definition 1. Let y be a uniformly sampled signal. We say that y is an AR(M )process if there is an integer M such that, when the last M values yn−M+1 , . . . , yn of the signal y are known, for each positive integer k there are weights w = T (w1 , . . . , wM ) such that for all m ≥ n + k, ym = wT um−k . Here, um−k = T (ym−k−M+1 , . . . , ym−k ) . N. Ayache, S. Ourselin, A. Maeder (Eds.): MICCAI 2007, Part II, LNCS 4792, pp. 668–675, 2007. c Springer-Verlag Berlin Heidelberg 2007
Prediction of Respiratory Motion
669
If breathing motion was perfectly sinusoidal and the values of the weights were known, it would be possible to exactly predict the time series. Since real data is not perfect, we cannot expect to obtain perfect prediction. At best, we can model the current, underlying sinusoidals and automatically adapt the weights to compensate for changes in breathing behavior. 2.1
The LMS Algorithm
One way of learning the weights of an AR(M )-process is the LMS algorithm [5], which has been used for a long time in time series prediction and signal analysis. LMS = wnT un , yˆn+k
wn+1 = wn + μ (ˆ ynLMS − yn ) un , T
n ≥ k,
T
un = (un,1 , . . . , un,M ) = (yn−M+1 , . . . , yn ) , T yˆ1 = · · · = yˆkLMS = y1 , w1 = (0, . . . , 0, 1) LMS
(1)
yn is the measured signal at step n, wn the corresponding weight vector and un is the signal history used in step n to compute yˆn+k , the prediction for step n+k. For stationary signals, the LMS algorithm is known to perfectly adapt to the system if the learning parameter μ is chosen properly. Difficulties arise from the fact that in the update term for the weight vector w large signal values lead to a larger correction term, i.e. when provided with two differently scaled versions of a signal, the algorithm produces different results. This problem is somewhat alleviated by using a normalized LMS algorithm (nLMS). 2.2
The nLMS Algorithms
To improve the convergence properties of the LMS algorithm (i.e. to make it independent from scaling and increase the rate of convergence), normalized LMS algorithms are used (see [6]). nLMS
T
T
T un , un = (un,1 , . . . , un,M ) = (yn−M+1 , . . . , yn ) , yˆn+k p = wp,n ynnLMSp − yn ) fp,n , wp,n+1 = wp,n + μ (ˆ nLMSp
yˆ1
= · · · = yˆk
nLMSp
= y1 ,
T
wp,1 = (0, . . . , 0, 1)
(2)
Here, the error correction term fp,n is defined as follows: (fp,n )i =
|un,i |p−1 sgn (un,i ) , un pp
(f∞,n )i =
δi,l , un,l
l = max ind|un,j | j=1,...,M
(3)
In our case, we only considered the special case of p = 2 (i.e. normalization with respect to the Euclidean norm), whence the algorithm reduces to the following: nLMS2 T yˆn+k = w2,n un
w2,n+1 = w2,n + μ (ˆ ynnLMS2 − yn ) f2,n = w2,n + μ (ˆ ynnLMS2 − yn )
un (4) α + un 22
To avoid division by zero, a small parameter α (typically 10−4 ) is introduced in the denominator of the error term f2,n (Equation 4). Further improvements are possible by preprocessing the signal. This is done with the wavelet based LMS algorithm.
670
2.3
F. Ernst, A. Schlaefer, and A. Schweikard
The wLMS Algorithm
Renaud, Starck and Murtagh in [4] propose to predict an autoregressive signal using its ´a trous wavelet decomposition. It is computed in a sequential way (Equation 5); therefore, on-line processing of the signal is possible. 1 j + cj,n , c0,n = yn , cj+1,n = c Wj+1,n = cj,n − cj+1,n (5) 2 j,n−2 Computing this decomposition up level J, we get J + 1 bands representing the signal: W1 , . . . , WJ , the wavelet scales, and cJ , the smoothed signal, such that yn = W1,n + · · · + WJ,n + cJ,n . Hence, it is possible to separately predict the individual bands. This is done by selecting regression depths aj for each level Wj and aJ+1 for the band cJ . Assuming the AR(2aj )-property on the individual bands and knowledge of the weight vectors wj , we get the multiscale autoregression (MAR) forecasting formula: MAR = yˆn+k
J
T ˜ n,j + wJ+1 wjT W c˜n
j=1
˜ n,j = Wj,n−2j ·0 , Wj,n−2j ·1 , . . . , Wj,n−2j (a −1) T W j T c˜n = cJ,n−2J ·0 , cJ,n−2J ·1 , . . . , cJ,n−2J (aJ+1 −1)
(6)
˜ n,j and c˜n,j take the role of the vector un (the signal history). Here, the vectors W However, the correct values for the weight vectors wj are not known and need to be adaptively learned. This is done by least mean squares fitting to the last M signal steps, completing the description of the wLMS algorithm: T T ˜ T ,...,W ˜ T , c˜T B = (ln−k , . . . , ln−k−M+1 ) , lt = W t,1 t,J t T T T w = w1T , . . . , wJ+1 , sn = (yn , . . . , yn−M+1 ) solve Bw = sn (by normal equation) to update w (7) We extend this approach: Since the matrix of the normal equation used in (7) might not be regular, we have improved the algorithm to cope with irregularity. −1 This is done by replacing (B T B) B T by the Moore-Penrose pseudo inverse of B. This does not alter the results whenever the rank of B is maximal, it only improves numerical stability. Obviously, for each point in time there is a maximum number of past observations which influence the prediction. However, it is ˜ n,j possible that information not contained in the signal history (the vectors W and c˜n,j ) still influences the future of the time series. To include this information in the prediction, we introduce an exponential averaging parameter μ and modify the method as follows: wLMS = yˆn+k
J
T T ˜ n,j + wJ+1 W wn,j c˜
j=1
T T wn = wn,1 , , . . . , wn,J+1 wn+1 = (1 − μ)wn + μB + sn ,
w1 = · · · = wk+M−1 = (0, . . . , 0)
T
μ ∈ [0, 1],
n≥k+M
(8)
Prediction of Respiratory Motion
671
Here, the symbol + denotes the Moore-Penrose pseudo inverse of a matrix. Small values of μ correspond to a prediction with high confidence in the past while large values correspond to high confidence in recent observations. Note that with the introduction of μ there is no explicit signal history length and that setting μ = 1 will yield the results of the original MAR algorithm.
3
Experiments
The algorithms presented above were applied to several simulated and real respiratory signals to compare their performance. For prediction tests, the algorithms were allowed to stabilize on the first 2,000 sampling points and the prediction result was evaluated on the remaining data, both in absolute numbers [μm] and as a percentage of the nRMS error of the delayed curve. 3.1
Simulated Data
Sinusoidal signals were used to simulate breathing data. The signals are shown in Figure 1. Part (e) was generated from two breathing cycles recorded during a R treatment session. This raw data was then smoothed and repeated. CyberKnife All signals are sampled at 40 Hz and have a length of 10,000 sampling points. In a second step, to simulate real-world data, the signals from Figure 1 were corrupted by additive Gaussian noise of four different amplitudes (with standard deviations of 0.05, 0.10, 0.20 and 0.50 mm). For the wLMS algorithm, M was set to 2 and J was set to 7. The values of the aj were automatically determined: the ´a trous wavelet decomposition of the first 2,000 points was computed and then the coefficients were set according to Equation 9. WjT Wj , aJ+1 = 2 aj = 15 (9) (y − cJ )T (y − cJ ) In case of the uncorrupted signals, for both the LMS and the nLMS2 algorithms, M was set to 6; in case of the noisy signals, M was set to 8. Furthermore, to take advantage of the noise filtering ability of the wLMS algorithm, the coefficients
amplitude [mm]
1 0 −1
0
1 0 −1
5 10 15 20 25 30 35 time [s] (d) − (b) plus linear trend of 9 mm in 250 s
2
5
10
15 20 time [s]
25
30
35
(c) − two frequencies with different phases amplitude [mm]
0 −1 0
amplitude [mm]
(b) − three frequencies amplitude [mm]
amplitude [mm]
(a) − pure sinusoidal signal 1
0
5
0
5
10 15 20 25 30 time [s] (e) − simulated breathing signal
35
2 0 −2 −4 10
15 20 time [s]
25
30
Fig. 1. Simulated data
35
1 0 −1 0
5
10
15 20 time [s]
25
30
35
672
F. Ernst, A. Schlaefer, and A. Schweikard Table 1. nRMS errors on simulated data (optimal values are in bold print)
sig. delay
(a)
(b)
(c)
(d)
(e)
5 10 15 20 5 10 15 20 5 10 15 20 5 10 15 20 5 10 15 20
abs. 0.04 81.84 26.05 0.00 28.05 114.80 216.32 304.25 29.42 65.66 232.36 301.93 92.98 230.80 395.33 521.65 148.84 421.10 687.55 919.61
LMS rel. 0.03 30.88 6.60 0.00 21.05 43.26 54.75 58.36 21.79 24.43 58.06 57.16 69.73 86.93 100.00 100.00 30.52 43.70 48.24 49.22
μ 0.0872 0.0424 0.0260 0.0340 0.0400 0.0184 0.0128 0.0124 0.0388 0.0416 0.0108 0.0132 0.0026 0.0012 0.0000 0.0000 0.0038 0.0016 0.0012 0.0012
nLMS2 wLMS abs. rel. μ abs. [μm] rel. [%] μ 0 0 0.1472 0 0 0.0632 0 0 0.1152 0 0 0.0272 0.00 0.00 0.0960 0 0 0.0552 0 0 0.0592 0 0 0.0264 4.05 3.04 0.0536 2.92 2.19 1.0000 15.42 5.81 0.0336 10.47 3.94 1.0000 36.43 9.22 0.0248 22.29 5.64 1.0000 67.61 12.97 0.0172 38.12 7.31 1.0000 3.95 2.92 0.0488 2.87 2.12 1.0000 14.33 5.33 0.0272 10.46 3.89 1.0000 32.28 8.06 0.0216 22.70 5.67 1.0000 57.45 10.87 0.0192 39.44 7.47 1.0000 29.61 22.21 0.1344 13.96 10.47 1.0000 147.41 55.52 0.0632 51.52 19.40 1.0000 380.35 96.21 0.0056 112.42 28.44 1.0000 520.79 99.84 0.0002 195.42 37.46 1.0000 121.96 25.01 0.0160 99.90 20.48 0.4360 255.05 26.47 0.0136 213.76 22.18 0.3472 395.08 27.72 0.0120 336.87 23.63 0.3128 550.66 29.48 0.0108 258.25 13.82 0.0024
a1 and a2 were, in this case, chosen to be zero. The predictors were tested with μ from 0 to 1 (in steps of 0.0001) and the best result was selected. The absolute [μm] and relative [%] nRMS errors on the signals without noise are shown in Table 1. The computations with a prediction horizon of 5 time steps were repeated on the signals corrupted by noise. The predicted curves were compared to the uncorrupted signals. Noise amplitude at level 1 is lower than at level 2 and so on (see Table 2). Thirdly, the computations for the wLMS algorithm were repeated after setting coefficients a1 through a4 to zero (See Table 3). In all but four cases the wLMS algorithm was able to outperform the nLMS2 algorithm by up to 50%. As a drawback, we see that discarding the first four bands of the wavelet coefficients can lead to a loss of information, since high-frequency parts of the real signal can be present in these bands. This can be seen in Table 3: the nRMS value for noise level 1 of signal (e), in italics, has increased in comparison to the value in Table 2. 3.2
Real Data
The algorithms were also tested on breathing signals of three patients treated with R system. The signals all have a length of 30,000 sampling points the CyberKnife at a rate of approximately 26 Hz, Figure 2. In this test, a grid search for the optimal values of M and μ was performed; the prediction horizon was set to five time steps. The wLMS coefficients a were selected according to Equation 9. The results
Prediction of Respiratory Motion
673
Table 2. nRMS errors on simulated data, corrupted by noise, at a delay of 5 sig.
(a)
(b)
(c)
(d)
(e)
noise level 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
abs. 83.02 150.16 240.66 519.40 83.26 150.19 240.60 519.33 83.41 150.45 241.37 520.43 113.00 147.05 240.67 519.40 161.13 208.63 340.06 707.60
LMS rel. 58.30 90.02 100.00 100.00 58.40 89.96 100.00 100.00 57.90 89.43 100.00 100.00 79.22 88.054 100.00 100.00 32.85 42.03 64.64 100.00
μ 0.0088 0.0020 0.0000 0.0000 0.0084 0.0020 0.0000 0.0000 0.0084 0.0020 0.0000 0.0000 0.0012 0.0012 0.0000 0.0000 0.0016 0.0012 0.0006 0.0000
nLMS2 abs. rel. μ 74.77 52.51 0.0248 131.07 78.58 0.0168 199.88 83.06 0.0136 279.70 53.85 0.0112 75.87 53.22 0.0136 131.40 78.70 0.0152 198.04 82.31 0.0128 280.77 54.06 0.0104 76.06 52.80 0.0240 131.29 78.04 0.0176 200.16 82.93 0.0112 282.72 54.32 0.0104 75.50 52.93 0.1112 116.91 70.00 0.1024 179.44 74.56 0.0880 314.46 60.54 0.0536 147.10 29.99 0.0112 194.32 39.14 0.0104 301.66 57.34 0.0160 590.65 83.47 0.0120
wLMS abs. rel. μ 34.26 24.06 0.0040 97.82 58.64 0.0032 194.31 80.74 0.0032 348.74 67.14 0.0016 51.03 35.79 0.1112 116.67 69.89 0.1144 234.91 97.63 0.1168 389.95 75.09 0.0016 50.69 35.19 0.1096 116.11 69.01 0.1136 236.41 97.94 0.1200 436.60 83.89 0.0896 80.65 56.54 0.2024 139.32 83.42 0.1736 258.48 107.40 0.1648 456.40 87.87 0.1208 129.46 26.39 0.2128 157.95 31.82 0.1472 230.39 43.79 0.1088 503.16 71.11 0.0024
Table 3. wLMS prediction with coefficients a1 through a4 set to 0 signal
(a)
(b)
(c)
noise wLMS level abs. rel. μ 1 5.86 4.12 0.0040 2 18.96 11.37 0.0032 3 56.94 23.66 0.0024 4 172.12 33.14 0.0016 1 41.93 29.41 0.1000 2 66.08 39.58 0.0744 3 123.08 51.16 0.0552 4 237.34 45.70 0.0016 1 41.35 28.70 0.0976 2 65.60 38.99 0.0728 3 121.99 50.54 0.0552 4 306.53 58.90 0.0016
signal
(d)
(e)
noise wLMS level abs. rel. μ 1 77.85 54.58 0.1896 2 116.11 69.53 0.1448 3 182.24 75.72 0.1120 4 418.66 80.60 0.0920 1 130.23 26.55 0.2152 2 156.52 31.53 0.1432 3 200.57 38.13 0.0896 4 279.78 39.54 0.0032
obtained are shown in Table 4. Furthermore, a certain amount of smoothing is permitted, since the real data is inevitably corrupted by measurement noise. Hence, we repeated our computation of the prediction obtained using the wLMS algorithm for two cases: coefficients a1 and a2 set to zero and coefficients a1 through a4 set to zero. These results are shown in Table 5. As with the simulated data, the wLMS algorithm again outperforms the nLMS2 algorithm which,
674
F. Ernst, A. Schlaefer, and A. Schweikard signal (I)
signal (II)
0
10
20 time [s]
30
40
2 amplitude [mm]
0
−5
signal (III)
5 amplitude [mm]
amplitude [mm]
5
0
−5
0
10
20 time [s]
30
40
1 0 −1 −2
0
10
20 time [s]
30
40
Fig. 2. Real data of three different, rather difficult, breathing signals Table 4. nRMS errors on real data at a prediction horizon of 5 sig.
abs. (I) 293.56 (II) 245.77 (III) 415.98
LMS rel. μ 52.33 0.0010 62.36 0.0014 70.22 0.0001
M 5 5 17
abs. 260.64 230.82 398.97
nLMS2 rel. μ 46.47 0.0064 58.57 0.0032 67.35 0.0032
M 4 4 9
abs. 237.11 216.18 383.95
wLMS rel. μ 42.27 0.6184 54.85 0.5456 64.81 0.3544
M 2 2 2
Table 5. wLMS on real data. a1 and a2 set to 0 (left); a1 through a4 set to 0 (right). sig.
abs. (I) 237.87 (II) 218.52 (III) 389.32
wLMS rel. μ 42.41 0.6112 55.45 0.5248 65.72 0.3208
M 2 2 2
sig.
abs. (I) 241.22 (II) 221.01 (III) 391.00
wLMS rel. μ 43.00 0.6080 56.08 0.5208 66.00 0.3200
M 2 2 2
in turn, yields better results than the LMS algorithm. It also becomes clear that this improvement is not as pronounced as for the simulated signals: it lies around 5 to 10%. Furthermore, the smoothing does not lead to further improvement. Unfortunately, we do not know the signal uncorrupted by measurement noise and thus can only compare the predicted signal to the noisy signal. As a result, there is hardly any change in the nRMS error. The predicted signal, however, becomes much smoother and hence is more suitable for determining correlation between chest and tumor motion. Summarizing, smoothing does not noticeably degrade and actually improves the results, whence using it in prediction seems reasonable. Although the improvement was not huge when working with real world data, it is now clear that selecting the correct parameters for LMS-based algorithms is crucial. To underline this, consider Figure 3. Here, the relative nRMS error of the three algorithms are shown as a function of μ and M . Obviously, selecting μ and M for the LMS and the nLMS2 algorithms is very difficult: a slightly off-optimum value can lead to complete breakdown of the prediction. On the other hand, working with the wLMS algorithm is by far easier. All our results show that the optimal value for M is 2 – for both simulated and real signals.
Prediction of Respiratory Motion
675
Fig. 3. Relative nRMS error of the LMS (left), nLMS2 (center) and wLMS algorithms on signal (a) at noise level 1 with a prediction horizon of 5 (logarithmic scale in μ)
4
Conclusions
The presented extension of the wavelet-based MAR algorithm leads to improved results compared with the LMS and the nLMS2 algorithms for both simulated and real signals. For real data, the improvement is less pronounced. However, we have shown that, for the wLMS algorithm, selecting the correct parameters is by far easier. Therefore, the use of the wLMS algorithm in prediction of human respiration is a viable alternative to the nLMS2 algorithm.
References 1. Schweikard, A., Glosser, G., Bodduluri, M., Adler, J.: Robotic motion compensation for respiratory motion during radiosurgery. Journal of Computer-Aided Surgery 5(4), 263–277 (2000) 2. Schweikard, A., Shiomi, H., Adler, J.: Respiration tracking in radiosurgery. Medical Physics 31(10), 2738–2741 (2004) 3. Sharp, G.C., Jiang, S.B., Shimizu, S., Shirato, H.: Prediction of respiratory tumour motion for real-time image-guided radiotherapy. Physics in Medicine and Biology 49, 425–440 (2004) 4. Renaud, O., Starck, J.L., Murtagh, F.: Prediction based on a multiscale decomposition. International Journal of Wavelets, Multiresolution and Information Processing 1(2), 217–232 (2003) 5. Haykin, S.: Adaptive Filter Theory, 4th edn. Prentice Hall, Englewood Cliffs, NJ (2002) 6. Douglas, S.C.: A family of normalized LMS algorithms. IEEE Signal Processing Letters 1(3), 49–55 (1994)
Multi-criteria Trajectory Planning for Hepatic Radiofrequency Ablation Claire Baegert1,2 , Caroline Villard2 , Pascal Schreck2 , and Luc Soler1 1
IRCAD, 1 place de l’Hˆopital, 67000 Strasbourg LSIIT, Pˆole API, Bd Sebastien Brand, 67412 Illkirch {baegert, villard, schreck}@lsiit.u-strasbg.fr, [email protected] 2
Abstract. In this paper, we propose a method based on multiple criteria to assist physicians in planning percutaneous RFA on liver. We explain how we extracted information from literature and interviews with radiologists, and formalized them into geometric constraints. We expose then our method to compute the most suitable needle insertion in two steps: computation of authorized insertion zones and multi-criteria optimization of the trajectory within this zones. We focus on the combination of the criteria to optimize and on the optimization step.
1 Introduction New minimally invasive methods have recently been developed to treat patients with not resectable liver tumors. These methods achieve in situ destruction thanks to chemical agents or temperature. We focus on percutaneous radiofrequency ablation (RFA) that is one of the most used and effective methods, and involves only a short hospital stay and a reduced cost. Guided by CT or ultrasound images, the physician inserts a needle through the patient’s skin. A radiofrequency alternating current flow is then delivered so that tissues close to the needle tip are heated and coagulate above 60◦ C. Experience of the operator highly affects the chances of a complete ablation and the risks of complications [1]. Our work is aiming at providing a planning software based on collected physicians expertises to recommend an optimal strategy for each operation. In the following, we first recall other works that have already been accomplished in the domain of minimally invasive surgery planning. Secondly, we expose how we have formalized some information from the expertise of radiologists to define criteria influencing the strategies. Then we explain our methods to merge and solve them in order to propose an adapted solution for each operation. Finally, we evaluate our results on virtual patients and we discuss the further developments.
2 Related Works We wish our planning software to be used in the operating room, after the CT acquisition and before the intervention in order to work on accurate data. This gives about 30 minutes for both reconstruction and planning processes. The planning process consists of choosing the best strategy among a lot of candidate trajectories, taking into N. Ayache, S. Ourselin, A. Maeder (Eds.): MICCAI 2007, Part II, LNCS 4792, pp. 676–684, 2007. c Springer-Verlag Berlin Heidelberg 2007
Multi-criteria Trajectory Planning for Hepatic Radiofrequency Ablation
677
account the whole anatomy of the patient. That is why, for each considered trajectory, the corresponding necrosis zone (that we will call lesion in the rest of the paper) must be predicted as precisely and quickly as possible. Several studies focused on automatically optimizing tools position in order to minimize damage to surrounding tissues. Altrogge et al. [2] proposed an optimization method based on the simulation of temperature within tissues. However the resulting trajectory does not take into account surrounding organs and no computation time is indicated. Butz et al. [3] focused on cryoablation. They proposed an optimization of probes position in a secure window provided by the physician. Other studies related to prostate cryosurgery also have to be mentioned [4] [5]. Both works combine planning with thermal exchange simulation, with the drawback of a long computation time. Moreover one of the models is restricted to 2D and the second requires manual interventions for optimization. Interesting works have been performed on computer planning of robotically assisted minimally invasive surgery for heart interventions [6]. Like in RFA planning, the issue is to propose a strategy that respects several constraints and optimizes several criteria. However, optimization is performed by an exhaustive search within a very limited number of incision sites, that would not be possible in a reasonable time in our case, as we showed in [7], as the search domain is too wide. For this reason, we focused our studies on a multi-criteria optimization process.
3 Characterization of the Constraints Governing RFA Planning In an earlier paper [8] we were taking into account one single constraint: the inclusion of the entire tumor while minimizing the amount of destroyed healthy tissues. However in practice several criteria are considered, some of them being disqualifying, others being to optimize. The rules that motivate the strategy for each operation are not clearly enunciated and may vary between specialists. However, the most essential of them appear recurrently. In this section we selected some of those recurrent criteria from medical literature, that were confirmed by our expert practitioners. Of course the weights of the criteria can be adjusted and extra criteria could be added (see section 3.3). 3.1 Analysis of Medical Literature Although RFA is a recent technique, many medical studies that detail the different aspects of this operation have been published. We focus on liver tumors RFA, but our work could easily be adapted to other cancer location. We consider the percutaneous approach for which the preoperative planning takes an important place because of the limited visibility during the intervention. RFA is generally conceivable for non-resectable small tumors (smaller than 5 cm). The ablation of bigger tumors is possible but often requires multiple needle insertions that raise the risk of misplacement and incomplete tumor destruction [9]. The RF-lesion has to include a 0.5-1cm margin around the tumor. Lesion shape and size vary according to specific material used [10]. The theoretical shape of the lesion is a spheroid, with different small/big axis ratio according to the needle model. In practice shape is influenced by the cooling effect of large vessels in the neighbourhood [11].
678
C. Baegert et al.
The operation is successful if no recurrent tumor is noticed at the original site during the follow-up. Rates of local recurrence vary between studies depending on different parameters [12], but tumor’s size, location and physician’s experience highly influence chances of success. Different kinds of complication can occur [13]. Patient’s organs, vessels, or bile ducts can be damaged during needle placement or thermal ablation. Cancerous cells may adhere to the needle during its removal and result in the development of a new tumor along the needle path. Remaining functional liver may have been overestimated. The needle trajectory must be chosen in order to minimize these risks. 3.2 Selected Constraints The constraints cited by specialists can be classified into 2 categories: strict constraints and soft constraints. A needle trajectory (considered in a first approach as rectilinear) has to fulfill all strict constraints to belong to the solution space. Among all solutions, the proposed trajectory has to satisfy at best the soft constraints. We selected the following four strict constraints that have to be fulfilled: – The trajectory must not cross any vital organ, bone or major blood vessel, – The insertion depth must be below needle size, – The insertion angle in the liver must not be to tangent to liver surface in order to prevent risks of gliding on the surface during insertion, – The trajectory must include a safe portion of the liver in order to enable cauterization of the path. Among all solutions, the final strategy has to optimize soft constraints. We selected the three following ones: – Volume of healthy tissues ablated: the needle should be placed so that the shape of the RF-lesion is as close as possible to target volume: remaining hepatic reserve is maximized and ablation can be done in a minimal number of needle insertions. – Distance to vital organs: a trajectory that is very close to vital organs should often be penalized because in practice a little deviation from the planned trajectory is unavoidable. It is important to minimize risks of fatal injury. – Insertion depth: short trajectories are often privileged because long trajectories increase risks of imprecision. These categories of contraints are different by nature: soft constraints are continuous, whereas strict constraints are boolean. Strict contraints are combined using a simple intersection of their solution spaces (see section 4.1). For soft constraints, we chose a global approach that merges them into a unique function to optimize. In the following section we present this approach and discuss about other combination possibilities. 3.3 Determination of the Global Minimization Function First, let us describe more precisely our optimization problem. The following functions express formally the different constraints we want to optimize:
Multi-criteria Trajectory Planning for Hepatic Radiofrequency Ablation
679
– lesion volume : R5 → R+ , denoting the volume of the minimal lesion including the tumor and margin, according to the 5 degrees of freedom of the needle (3 for the needle tip position and 2 for the angles), – depth : R5 → R+ , denoting the depth of insertion (distance between insertion point and target point), – distance : R5 × O → R+ , denoting the minimal distance from the needle to an organ belonging to the set O of surrounding organs. The rough combination of these functions would be meaningless, since they do not have the same order of magnitude. We then consider pseudo-normalizations performed specifically for each function in adequacy with its semantic before combining them. For function lesion volume, we define function fv by the formula fv (X) =
lesion volume(X) − minx∈D (lesion volume(x)) P.minx∈D (lesion volume(x))
where X represents any needle placement, D the set of eligible placements that correspond to an appropriate access to the tumor. P is a critic proportion of volume above which the volume loss is considered too important (experimental value: 60%). Function depth is also linear, because we think that the penalty increases linearly according to the depth of insertion as well. We consider function fd defined by fd (X) =
depth(X) − minx∈D (depth(x)) needle length − minx∈D (depth(x))
For function distance, we want the measure of the risk to increase more quickly when the needle comes close to an organ. To this end, it is convenient to use a square root function. We also want to add the collision risks for all the organs, so we simply use a sum. Therefore, we obtain function fr defined by dist limito − distance(X, o) max(0, ) fr (X) = dist limito o∈O
where dist limito represents a set of parameters representing a security distance that sould be observed for each organ. In order to eliminate the risk of a negative value inside the square root, we take the max between 0 and the value found. A negative value would occur if the needle is significantly far from the organ, so it is acceptable to consider in that case that the function is minimized and equal to zero. We then define function fall that is a linear combination of the three others: fall (X) = av .fv (X) + ar .fr (X) + ad .fd (X) with av , ar , ad ∈ [0, 1] and av + ar + ad = 1. These three weights represent the importance of each criteria in the final decision. A linear combination has been chosen because it is a predictable and intuitive function, and provides weights to act on each constraint. It can be objected that the minimization of this function could result in a trajectory that satisfies badly one of the constraints. However if one soft constraint must be satisfied more than the others, the corresponding weight should be set in consequence
680
C. Baegert et al.
Fig. 1. Examples of insertion zones: Holes in the skin represent possible needle access to the tumor
or it should be included in strict constraints. Finally, the proposed trajectory respects the strict constraints and minimizes the final function fall . In the following section, we present the method we developed to compute this trajectory.
4 Determination of the Needle Insertion Strategy 4.1 Determination of Possible Solutions First of all, before choosing an optimal path, it is important to determine the set of all possible trajectories with precision. We developed a method that automatically computes the possible insertion zones on the skin and that has been the subject of an earlier publication [14]. A needle insertion in the resulting zone guarantees that the strict constraints presented earlier are satisfied. Then a trajectory is considered as eligible and belongs to D only if it crosses the insertion zone (see examples of zones on Fig. 1). 4.2 Optimization Phase Like in most of optimization problems, we face a large number of possibilities that cannot be entirely studied in a reasonable computation time. Moreover most of existing optimization methods do not completely avoid the problem of local minima. We developed a method in two steps. The initialization step finds, after a rough discretization of the angular space, one or more trajectories that seem close to the minimal. Then these trajectories are used as starting points for a local minimization and resulting values are compared to select the best choice. In a previous paper [7], we showed that this two-step method was fast and efficient on local minima, by comparing with the exhaustive one, with one single criterion. The initialization step of multi-criteria optimization consists in discretizing the angular parameters space, by evaluating fall only for trajectories with a chosen step of 6◦ with a fixed needle tip position. In this discretization we select trajectories resulting in a nearly minimal evaluation (fall (t) = minu∈zone (fall (u)) + )). From each selected
Multi-criteria Trajectory Planning for Hepatic Radiofrequency Ablation
681
trajectory, we perform a local optimization thanks to downhill simplex method that has proven to converge quickly and to provide precise results in our earlier works. 4.3 Results Our method has been tested on 16 tumors in 7 virtual reconstructions of real patient cases and results are shown on Tab. 1. For all these patients, the total optimization phase (initialization + local optimization for each initial trajectory) took around 30 seconds, with a pentium 4 3.2GHz, 2Go RAM with a GeForce 7800GT. Most of the time, one or two trajectories are selected by the initialization step. The trajectory proposed for each case is quite satisfying with respect to the three criteria. If we compare the volume of the lesion of the proposed trajectory with the minimal lesion possible volume in the insertion zone, we notice that the volume loss is low (average 4,4% more than min. vol.). Insertion depth of the optimal trajectories are around 5 cm that should facilitate a precise placement of the needle during the operation. In most of the case no vital structures are approached within 1cm (0.5cm for vessels) of the needle trajectory. In case 13 vena cava is 5mm and in case 15 lung is 6.6mm close to the final trajectory. In both cases, proximity is unavoidable because of the location of the tumors. In all cases it is possible to determine quickly what are the best access points. The results of the initialization can be visualized as color maps on the skin (see Fig 2). The software also provides the possibility to interfere in the proposed solution: the weights of each criteria (default: 13 ) can be modified by the physician if he wishes to privilege one criteria. Once the optimization phase has been performed, a modification of the weights results in a real-time update of the initialization and of the color maps. An additional local optimization is sufficient to update the optimal trajectory. Table 1. Evaluation of the optimal trajectory regarding the soft constraints case 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
no. of traj. 1 2 1 1 2 2 3 2 1 2 1 2 2 3 2 1
tot. opt. time (s) 27 24 20 18 34 32 32 24 17 31 24 24 22 29 21 17
lesion vol. (min. vol.) (cm3) 12.3 (11.7) 3.3 (3.3) 3.2 (3.0) 2.5 (2.5) 2.7 (2.7) 9.4 (9.3) 10.1 (9.6) 5.9 (5.9) 6.6 (6.0) 3.3 (3.1 4.8 (4.6) 5.0 (5.0) 3.9 (3.4) 3.1 (2.8) 8.5 (8.4) 5.7 (5.4)
close vital structures (mm) none none none none none none none none none none none none vena cava: 5.0 none right lung: 6.6 none
insertion depth (cm) 4.2 4.9 4.3 5.9 (5.4) 6.5 5.0 5.32 4.2 6.6 5.3 5.7 7.1 8.7 12.2 6.3
682
C. Baegert et al.
(a) Volume constraint
(b) Risk constraint
(c) Depth constraint
(d) Mix of the 3 weighted constraints
Fig. 2. Proposed trajectory and accuracy regarding the different constraints. Best locations are light-colored and worst locations are dark-colored.
5 Conclusion and Future Works We proposed two kinds of geometric constraints to formalize medical expertise in planning RFA treatment. The resolution of strict constraints results in an insertion zone on the skin representing possible trajectories for the specific case. Among them a trajectory that satisfies at best the soft constraints has to be chosen. We proposed a minimization function that represents the different constraints affected by their respective weights. We solved the optimization problem in two steps: initialization thanks to a global study of the problem then local optimization from interesting needle positions. We showed that our method proposes a satisfying result regarding selected constraints in a few minutes. Our results were shown to radiologists. For a few cases, we compared informally their estimated strategy with our computed trajectory. Although we did not perform numerical measurements, we noticed that we obtained close results, except once where we gave a better proposition according to the clinician. In addition to the theoretical validation exposed in this paper and the assessment by experts, we plan to implement a functionality to compare numerically our result with per-operative images. Further discussions with radiologists will also allow us to continue studying with more precision the implicit rules governing RFA, and even to make new constraints appear.
Multi-criteria Trajectory Planning for Hepatic Radiofrequency Ablation
683
RFA is a recent technique and corresponding rules and devices can evolve in the future. Moreover, the approach to each operation can vary between physicians. For these reasons, we want to have a flexible software that can adapt itself to changing constraints. Constraints definition is currently directly integrated in the software’s code and cannot easily be modified. It could be worthwile to separate it from the software and to make it accessible for enhancements by expert users. We are thinking about using declarative modeling to achieve this. This way, our software could be easily adapted to other cancer location or to other kind of minimally invasive therapies.
Acknowledgment We would like to thank Pr J. Marescaux, Region Alsace and research program IRMC for their financial support. We are also grateful to Pr Gangi, Pr Pereira, and Dr Buy for sharing their expertise.
References 1. Poon, R., Ng, K., Lam, C., Ai, V., Yuen, J., Fan, S., Wong, J.: Learning curve for radiofrequency ablation of liver tumors: Prospective analysis of initial 100 patients in a tertiary institution. Annals of Surgery 239, 441–449 (2004) 2. Altrogge, I., Kr¨oger, T., Preusser, T., B¨uskens, C., Pereira, P., Schmidt, D., Weihusen, A., Peitgen, H.O.: Towards optimization of probe placement for radio-frequency ablation. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 486–493. Springer, Heidelberg (2006) 3. Butz, T., Warfield, S.K., Tuncali, K., Silverman, S.G., Van Sonnenberg, E., Jolesz, R., Kikinis, F.A.: Pre- and intra-operative planning and simulation of percutaneous tumor ablation. In: Delp, S.L., DiGoia, A.M., Jaramaz, B. (eds.) MICCAI 2000. LNCS, vol. 1935, pp. 317– 326. Springer, Heidelberg (2000) 4. Lung, D.C., Stahovich, T.F., Rabin, Y.: Computerized planning for multiprobe cryosurgery using a force-field analogy. Computer Methods in Biomechanics and Biomedical Engineering 7(2), 101–110 (2004) 5. Baissalov, R., Sandison, G.A., Donnelly, B.J., Saliken, J.C., McKinnon, J.G., Muldrew, K., Rewcastle, J.C.: A semi-empirical treatment planning model for optimization of multiprobe cryosurgery. Physics in Medicine and Biology 45, 1085–1098 (2000) 6. Adhami, L., Coste-Mani`ere, E.: Optimal planning for minimally invasive surgical robots. IEEE Transactions on Robotics and Automation: Special Issue on Medical Robotics 19(5), 854–863 (2003) 7. Baegert, C., Villard, C., Schreck, P., Soler, L.: Trajectory optimization for the planning of percutaneous radiofrequeny ablation on hepatic tumors. Computer Aided Surgery (2007) 8. Villard, C., Baegert, C., Schreck, P., Soler, L., Gangi, A.: Optimal trajectories computation within regions of interest for hepatic rfa planning. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3750, pp. 49–56. Springer, Heidelberg (2005) 9. Curley, A.: Radiofrequency ablation of malignant liver tumors. Annals of Surgical Oncology 10(4), 338–347 (2003) 10. Mulier, S., Ni, Y., Miao, Y., Rosi`ere, A., Khoury, A., Marchal, G., Michel, L.: Size and geometry of hepatic radiofrequency lesions. EJSO 29, 867–878 (2003)
684
C. Baegert et al.
11. De Baere, T., Denys, A., Wood, B.J., Lassau, N., Kardache, M., Vilgrain, V., Menu, Y., Roche, A.: Radiofrequency liver ablation: experimental comparative study of water-cooled versus expandable systems. AJR 176(1), 187–192 (2001) 12. Mulier, S., Ni, Y., Jamart, J., Ruers, T., Marchal, G., Michel, L.: Local recurrence after hepatic radiofrequency coagulation. Annals of surgery 242, 158–171 (2005) 13. Rhim, H., Dodd, G.D., Chintapalli, K., Wood, B.J., Dupuy, D.E., Hvizda, J.L., Sewell, P.E., Goldberg, S.N.: Radiofrequency thermal ablation of abdominal tumors: Lessons learned from complications. Radiographics 24, 41–52 (2004) 14. Baegert, C., Villard, C., Schreck, P., Soler, L.: Precise determination of regions of interest for hepatic rfa planning. In: SPIE Medical Imaging (2007)
A Bayesian 3D Volume Reconstruction for Confocal Micro-rotation Cell Imaging Yong Yu, Alain Trouv´e, and Bernard Chalemond Ecole Normale Sup´erieure de Cachan, France {yu,trouve,Bernard.chalmond}@cmla.ens-cachan.fr
Abstract. Recently, micro-rotation confocal microscopy has enabled the acquisition of a sequence of slices for a non-adherent living cells where the slices’ positions are roughly controlled by a dielectric-field biological cage. The high resolution volume reconstruction requires then the integration of precise alignment of slice positions. We propose in the Bayesian context, a new method combining both slice positioning and 3D volume reconstruction simultaneously, which leads naturally to an energy minimization procedure of a variational problem. An automatic calibration paradigm via Maximum Likelihood estimation (MLE) principle is used for the relative hyper-parameter determination. We provide finally experimental comparison results on both conventional z-stack confocal images and 3D volume reconstruction from micro-rotation slices of the same non-adherent living cell to show its potential biomedical application.
1 Introduction Recently, thanks to the combining efforts of both biological and physical research, it emerges a novel specification and design methodology [5][8] for manipulating microscopic objects by a dielectric-field micro-rotation cage. One of its immediate impact is its fruitful application to non-adherent living cell imaging without sticking to a glass capillary [2]. Although there already exists axial tomographic confocal microscopy techniques [4] improving the imaging resolution by physically rotating the objects, the inherent defocus aberration of conventional z-stack imaging is not yet avoided. As a result, a sophisticated deconvolution process with depth-dependant point spread function is needed to remove optical artifacts. The micro-rotation cage yields continuous rotation movement of a captured object while the focal plane position of the confocal microscopy is fixed (there is no z-direction displacement). In such a way, a sequence of high resolution and isotropic (the point spread function is constant for each slice) 2D optical cross-section images called slices is obtained. The arising challenge of this new imaging system is to determine precisely the position of each slice before we can reconstruct a high resolution 3D fluorescence intensity volume. Indeed, its novelty comes from coupling two problems which are intensively surveyed in medical imaging processing domain: If these positions were known, the problem would be similar to the classical interpolation problem in the simplest case, and in more complicated cases to the deconvolution or tomography problems [6]. On the other hand, if the 3-D intensity volume is known, the estimation of the positions of a particular slice, would also reduce to the classical problem of registration [1,2,4]. So, we N. Ayache, S. Ourselin, A. Maeder (Eds.): MICCAI 2007, Part II, LNCS 4792, pp. 685–692, 2007. c Springer-Verlag Berlin Heidelberg 2007
686
Y. Yu, A. Trouv´e, and B. Chalemond
need to integrate these two sub-problems into a common formulation which implies performing simultaneously registration (slice positioning in our case) and reconstruction. As a first step, being aware of the computational overhead introduced by any deconvolution procedure for the sequence of micro-rotation slices, we have, for the moment, emphasized the estimation of slices’ geometry parameters without taking the PSF into account in the imaging process. The reconstructed volume is modeled as a Gaussian process [7] to characterize the spatial coherence between slices. As a result, cell reconstruction including parameters estimation and slice positioning is performed in a statistical framework from which derives naturally a variational formulation. We will show in the experiment section that our simplified volume model does not degrade its high resolution reconstruction comparing with the conventional z-stack result on a real cell example. The paper is organised as following: In section 2, a MLE parameter calibration paradigm is proposed to tune automatically the hyper-parameters of the statistical modelling. In section 3, the variational formulation is derived from Bayesian inference. Moreover, The Fast Gaussian Transform (FGT) method is shortly summarized for the sake of its role in our numeric solution. Finally, in section 4, the visualization experiment on real non-adherent living cell is demonstrated with the comparison to conventional z-stack images.
2 Statistical Modelling and MLE Parameter Calibration

We fix now the notations. We denote $(I_i)_{1\le i\le N}$ the sequence of $N$ image slices. The slice positions are coded by $N$ rigid transformations $\Phi = (\varphi_i)_{1\le i\le N} = (R_i, b_i)_{1\le i\le N}$, which are pairs of a rotation matrix and a translation vector acting on a reference plane $H_0$ after choosing a space frame. We denote $f$ the continuous 3D intensity volume to be reconstructed. In our statistical model, the volume $f$ is modeled as a centered¹ Gaussian field with a translation and rotation invariant covariance function $k(\cdot,\cdot)$. For the sake of simplicity, we choose a Gaussian kernel, so that $k(x,y) = \sigma_f^2 \exp(-\|y-x\|^2/(2\lambda_f^2))$, where $\lambda_f$ plays the role of a scale parameter and $\sigma_f^2$ is the variance of the induced stationary process. Any observed slice $I_i$ is modeled, given the 3D positioning $\varphi_i$, as a noisy version of the restriction $(f(\varphi_i(x_s)))_{x_s\in H_0}$, i.e.,
$$I_i(x_s) = f(\varphi_i(x_s)) + \sigma_\epsilon\, \epsilon_{i,s}, \qquad (1)$$
where $\epsilon_{i,s}$ is a Gaussian white noise. Since $f\circ\varphi$ and $f$, for $\varphi$ fixed, have the same distribution as a Gaussian process, we deduce easily that $I_i \sim \mathcal{N}(0,\Gamma)$ with
$$\Gamma(x_s, x_t) = \sigma_f^2 \exp\Big(-\frac{\|x_s-x_t\|^2}{2\lambda_f^2}\Big) + \sigma_\epsilon^2\, \mathbf{1}_{s=t}. \qquad (2)$$
¹ It is not a strong assumption since the constant gray level of the background is easily measured and then subtracted.
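To make the covariance structure of Eq. (2) concrete, here is a minimal sketch (ours, not the authors' code) that assembles $\Gamma$ on a small pixel grid and evaluates the centered Gaussian log-likelihood of one slice; the parameter values are illustrative only.

```python
import numpy as np
from scipy.spatial.distance import cdist

def slice_covariance(coords, sigma_f2, lambda_f, sigma_eps2):
    """Covariance of Eq. (2): Gaussian kernel plus white-noise diagonal."""
    d2 = cdist(coords, coords, "sqeuclidean")
    return sigma_f2 * np.exp(-d2 / (2.0 * lambda_f**2)) + sigma_eps2 * np.eye(len(coords))

def slice_log_likelihood(I, coords, sigma_f2, lambda_f, sigma_eps2):
    """Log-density of a centered Gaussian slice I ~ N(0, Gamma)."""
    G = slice_covariance(coords, sigma_f2, lambda_f, sigma_eps2)
    _, logdet = np.linalg.slogdet(G)
    alpha = np.linalg.solve(G, I)
    return -0.5 * (I @ alpha + logdet + len(I) * np.log(2.0 * np.pi))

# Toy example: an 8x8 slice on the reference plane H0 (parameter values are hypothetical).
xs, ys = np.meshgrid(np.arange(8.0), np.arange(8.0))
coords = np.c_[xs.ravel(), ys.ravel()]
I = np.random.default_rng(0).normal(size=64)
print(slice_log_likelihood(I, coords, sigma_f2=4.0, lambda_f=3.5, sigma_eps2=1.0))
```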
Let $\theta = (\tilde\sigma_f^2, \lambda_f, \sigma_\epsilon^2)$ where $\tilde\sigma_f^2 = \sigma_f^2/\sigma_\epsilon^2$. We have $\Gamma(x_s,x_t) = \sigma_\epsilon^2\,\tilde\Gamma(x_s,x_t)$ with $\tilde\Gamma(x_s,x_t) = \tilde\sigma_f^2 \exp(-\|x_s-x_t\|^2/(2\lambda_f^2)) + \mathbf{1}_{s=t}$. Making the simplifying assumption of conditional independence of the slices $I_i$ given $\Phi$, the log-likelihood of the whole sequence of slices is
$$\log P(I|\theta,\Phi) = \sum_{i=1}^N \log P(I_i|\theta) = -\sum_{i=1}^N \Big( \frac{1}{2\sigma_\epsilon^2}\, I_i^t \tilde\Gamma^{-1} I_i + \frac{1}{2}\log\big(|\sigma_\epsilon^2\tilde\Gamma|\big) \Big) + C^{te}, \qquad (3)$$
where $I \doteq (I_i(s))_{1\le i\le N,\, s\in H_0}$ and $C^{te}$ is a constant factor. The MLE estimate $\hat\theta$ of $\theta$ requires maximizing the term $\log P(I|\theta)$. Note that the optimisation on $\sigma_\epsilon^2$ is straightforward and gives
$$\hat\sigma_\epsilon^2 = \frac{1}{MN}\sum_{i=1}^N I_i^t\, \tilde\Gamma^{-1} I_i, \qquad (4)$$
where $M$ is the number of pixels in each slice, so that the MLE reduces to the optimisation of the two parameters $(\tilde\sigma_f^2, \lambda_f)$, done by exhaustive search on a grid. To save computation, we perform the estimation on a family of sub-regions of moderate size, so that the inversions of the $\tilde\Gamma$'s are easily feasible.
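The grid-search calibration can then be sketched as follows, profiling out $\sigma_\epsilon^2$ with Eq. (4); the grid ranges and block handling are our assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def mle_calibrate(blocks, coords, sf2_grid, lf_grid):
    """Grid search over (sigma_f~^2, lambda_f); sigma_eps^2 follows in closed form from Eq. (4).
    `blocks`: list of N vectorized sub-region intensities, all on the same pixel grid `coords`."""
    N, M = len(blocks), len(coords)
    d2 = cdist(coords, coords, "sqeuclidean")
    best = None
    for sf2 in sf2_grid:
        for lf in lf_grid:
            # Normalized covariance Gamma~ = sigma_f~^2 exp(-d^2/(2 lambda_f^2)) + Id
            G = sf2 * np.exp(-d2 / (2.0 * lf**2)) + np.eye(M)
            _, logdet = np.linalg.slogdet(G)
            quad = sum(I @ np.linalg.solve(G, I) for I in blocks)
            eps2 = quad / (M * N)                       # Eq. (4)
            # Profile log-likelihood of Eq. (3), dropping the additive constant.
            ll = -0.5 * (quad / eps2 + N * (M * np.log(eps2) + logdet))
            if best is None or ll > best[0]:
                best = (ll, sf2, lf, eps2)
    return best  # (log-likelihood, sigma_f~^2, lambda_f, sigma_eps^2)
```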
3 MAP Estimation

We recover $(f,\Phi)$ given $I$ by maximum a posteriori (MAP) estimation. We assume that $f$ and $\Phi$ are independent. The distribution of $f$ has been defined above, so we only need to specify the prior distribution for $\Phi$. Since it is relatively easy to determine both the mean axis orientation and the angular speed of the micro-rotation movement, we start from an ideal movement trajectory $\Phi^0$, and $\Phi$ is modelled as a random perturbation of $\Phi^0$. More precisely,
$$P_{\Phi^0}(\Phi) = \prod_{i=1}^N P_{\varphi_i^0}(\varphi_i), \qquad P_{\varphi_i^0}(\varphi_i) \propto \exp\big(-d^2(\varphi_i,\varphi_i^0)\big), \qquad (5)$$
where
$$d^2(\varphi_i,\varphi_i^0) = \frac{d^2(R_i, R_i^0)}{\sigma_\omega^2} + \frac{d^2(b_i, b_i^0)}{\sigma_b^2}, \qquad (6)$$
and the two variance parameters $\sigma_\omega^2$ and $\sigma_b^2$ describe the perturbation strength. The distance between two rotation matrices $R_1$ and $R_2$ in 3D space is defined as the common geodesic distance, which is invariant to right/left rotation multiplication:
$$d(R_1, R_2) = \cos^{-1}\Big(\frac{\mathrm{trace}(R_1 R_2^{-1}) - 1}{2}\Big).$$
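The prior of Eqs. (5)-(6) is cheap to evaluate; a short sketch (ours) of the rotation geodesic distance, with a clamp guarding against round-off, and the combined prior distance, using the variance values quoted later in the experiments ($\sigma_\omega^2 = \sigma_b^2 = 10$):

```python
import numpy as np

def rotation_geodesic(R1, R2):
    """Geodesic distance on SO(3): arccos((trace(R1 R2^-1) - 1) / 2)."""
    c = (np.trace(R1 @ R2.T) - 1.0) / 2.0    # R2^-1 = R2^T for rotation matrices
    return np.arccos(np.clip(c, -1.0, 1.0))  # clip guards against round-off

def prior_distance2(R, b, R0, b0, sigma_w2=10.0, sigma_b2=10.0):
    """Squared prior distance of Eq. (6) to the ideal pose (R0, b0);
    the translation term is the plain squared Euclidean distance."""
    return rotation_geodesic(R, R0) ** 2 / sigma_w2 + np.sum((b - b0) ** 2) / sigma_b2
```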
Moreover, the distance between two translations $b_1$ and $b_2$ is the common Euclidean distance $d^2(b_1,b_2) = \|b_1-b_2\|^2_{\mathbb{R}^3}$. Finally, the MAP estimation gives us an equivalent variational problem:
$$\mathcal{J}(\Phi,f) = \frac{1}{2}\sum_{i=1}^N d^2(\varphi_i,\varphi_i^0) + \frac{1}{2}\|f\|_H^2 + \frac{1}{2}\sum_{i=1}^N \sum_{x\in H_0} |f(\varphi_i(x)) - I_i(x)|^2/\sigma_\epsilon^2, \qquad (7)$$
where $H$ is the reproducing kernel Hilbert space (RKHS) associated with the covariance function $k$.

3.1 Gradient Computations

To minimize $\mathcal{J}$, we use a gradient-descent based method defined as
$$\begin{pmatrix} \Phi(t+\delta t) \\ f(t+\delta t) \end{pmatrix} = \begin{pmatrix} \Phi(t) \\ f(t) \end{pmatrix} - \begin{pmatrix} \nabla_{\Phi(t)}\mathcal{J} \\ \nabla_{f(t)}\mathcal{J} \end{pmatrix}\,\delta t. \qquad (8)$$
As is well known with RKHS (see [9]), one can introduce a finite family $(x_i^c)_{1\le i\le N_C}$ of control points defined on a grid and approximate $f\in H$ by projection as a linear combination
$$f(x) = \sum_{x_i^c} \alpha_i\, k(x, x_i^c). \qquad (9)$$
Now the differential $\nabla_f\mathcal{J}$ is reduced to a finite dimensional expression $\nabla_\alpha\mathcal{J}$ due to Eq. (9):
$$\nabla_\alpha\mathcal{J} = K\alpha + A^T(A\alpha - I)/\sigma_\epsilon^2, \qquad (10)$$
where
$$K \doteq \big(k(x_i^c, x_j^c)\big)_{1\le i,j\le N_C}, \qquad A \doteq \big(k(\varphi_i(s), x_j^c)\big)_{1\le i\le N,\, s\in H_0,\, 1\le j\le N_C}.$$
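A direct, dense evaluation of Eq. (10) (without FGT acceleration) can be sketched as follows; the kernel's variance factor is absorbed into $\alpha$ here, and the stacking of slice pixels is our convention:

```python
import numpy as np
from scipy.spatial.distance import cdist

def gaussian_kernel(X, Y, lambda_f):
    """k(x, y) = exp(-||x - y||^2 / (2 lambda_f^2)) between two point sets."""
    return np.exp(-cdist(X, Y, "sqeuclidean") / (2.0 * lambda_f**2))

def grad_alpha(alpha, controls, warped_pixels, I, lambda_f, eps2):
    """Eq. (10): grad_alpha J = K alpha + A^T (A alpha - I) / sigma_eps^2.
    `warped_pixels` stacks phi_i(s) for all slices i and pixels s; `I` stacks intensities."""
    K = gaussian_kernel(controls, controls, lambda_f)        # N_C x N_C
    A = gaussian_kernel(warped_pixels, controls, lambda_f)   # (N*M) x N_C
    return K @ alpha + A.T @ (A @ alpha - I) / eps2
```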
We decompose the partial gradient $\nabla_\Phi\mathcal{J}$ into a rotation related term $(\nabla_{R_i}\mathcal{J})_{1\le i\le N}$ and a translation related term $(\nabla_{b_i}\mathcal{J})_{1\le i\le N}$, which are calculated directly:
$$\nabla_{b_i}\mathcal{J} = \frac{b_i - b_i^0}{\sigma_b^2} + \sum_{x\in H_0} \frac{(f(\varphi_i(x)) - I_i(x))}{\sigma_\epsilon^2}\,\nabla f(\varphi_i(x)), \qquad (11)$$
and
$$\nabla_{R_i}\mathcal{J} = \frac{1}{2}\sum_{j=1}^3 (R_i)_{.j} \wedge \Big(\frac{\partial\mathcal{J}}{\partial R_i}\Big)_{.j} \wedge R_i, \qquad (12)$$
where $\frac{\partial\mathcal{J}}{\partial R_i}$ is the coefficient-wise differential of $\mathcal{J}$ with respect to the coefficients of $R_i$, the operator $(\cdot)_{.j}$ extracts the $j$-th column of a matrix, and $\wedge$ is the common cross product.
3.2 Fast Gaussian Transform

Since the computation of $\nabla_\alpha\mathcal{J}$ and $\nabla_\Phi\mathcal{J}$ involves intensive evaluation of $f(x)$ in Eq. (9), a brute force implementation is hopeless for the volume reconstruction. We have used the recently popularized Fast Gaussian Transform method improved by C. Yang et al. [10], which exploits the fast decay of the Gaussian kernel, based on the work of Greengard and Strain [3]. We mention only its basic principle; more implementation details can be found in the original paper [10]. As with all multipole methods, the FGT expands the evaluation function around a chosen pole $x_*$ into two separate terms:
$$f(x) = \sum_{x_i} \alpha_i\, e^{-\frac{\|x-x_i\|^2}{2}} = e^{-\frac{\|x-x_*\|^2}{2}} \sum_{x_i} \alpha_i\, e^{-\frac{\|x_i-x_*\|^2}{2}}\, e^{\langle x-x_*,\, x_i-x_*\rangle} \approx \sum_{0\le a_i\le p} \frac{1}{a!}\, e^{-\frac{\|x-x_*\|^2}{2}} (x-x_*)^a \underbrace{\sum_{x_i} \alpha_i\, e^{-\frac{\|x_i-x_*\|^2}{2}} (x_i-x_*)^a}_{\text{term independent of } x} \qquad (13)$$
where we denote by $a = (a_1,\dots,a_d)$ the polynomial exponent multi-indices, $a! \doteq \prod_{i=1}^d a_i!$, $x^a \doteq \prod_{i=1}^d x_i^{a_i}$ ($d=3$ for 3D points), and $p$ the polynomial approximation order ($p=6$ in our case). Using the FGT approach, the complexity of the computation of a product $K\alpha$ is reduced from $O(N_C^2)$ to $O(N_C)$.
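To illustrate the principle of Eq. (13) only, not the clustered multi-pole algorithm of [10], here is a single-pole truncated expansion compared against the direct sum on points clustered near the pole:

```python
import numpy as np
from itertools import product
from math import factorial

def fgt_moments(sources, alphas, pole, p):
    """x-independent moments of Eq. (13): sum_i alpha_i exp(-||x_i - x*||^2 / 2) (x_i - x*)^a."""
    diff = sources - pole
    w = alphas * np.exp(-np.sum(diff**2, axis=1) / 2.0)
    d = sources.shape[1]
    return {a: np.sum(w * np.prod(diff**np.array(a), axis=1))
            for a in product(range(p + 1), repeat=d)}

def fgt_eval(x, moments, pole):
    """Truncated expansion of Eq. (13) at a target point x."""
    u = x - pole
    pref = np.exp(-np.sum(u**2) / 2.0)
    total = 0.0
    for a, m in moments.items():
        afact = np.prod([factorial(ai) for ai in a])   # a! = prod a_i!
        total += m * np.prod(u**np.array(a)) / afact
    return pref * total

# Tiny 2D check: sources clustered around the pole, so the expansion converges quickly.
rng = np.random.default_rng(1)
src = rng.normal(scale=0.3, size=(50, 2)); alph = rng.normal(size=50)
pole = src.mean(axis=0); x = np.array([0.4, -0.2])
direct = np.sum(alph * np.exp(-np.sum((x - src)**2, axis=1) / 2.0))
print(direct, fgt_eval(x, fgt_moments(src, alph, pole, p=6), pole))
```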
4 Experimental Results

We have performed experiments on image sequences from both simulated and real imaging data, but to save space we report here only the results on the real imaging data. The real imaging data shown in Fig. 1 (sampled from one tour of 340 slices) were acquired by our biologist collaborators from a sw13/20 living cell caged and suspended in a Cytocon™ chip (Evotec Technologies, Germany), in order to investigate the localization and dynamics of the nuclear lamina and green fluorescent protein (GFP). These confocal images were collected using a Zeiss Axiovert™ 200 confocal microscope. For the optical parameter setting, a 63x water immersion objective was used and the numerical aperture (NA) was set to 1.2. The resolution of each optical section image is 129 nm, and the chip driver gives us the mean rotation direction projected in the 2D optical section (the y axis, or vertical direction, in this case study). Before launching the coupled reconstruction-alignment processing, the parameters $\theta$ needed for the variational formula were estimated by the method proposed in Section 2 on 100 blocks of $30\times 30$ pixels uniformly distributed over all 340 slices (the size of each slice is $156\times 156$). The MLE estimates are $\sigma_f^2 = 9.75\times 10^6$, $\sigma_\epsilon^2 = 3.36\times 10^5$ and $\lambda_f = 3.5$. The two remaining variance parameters, coding the instability of the movement away from the ideal trajectory determined by the mean rotation movement, were set to $\sigma_\omega^2 = 10.0$ and $\sigma_b^2 = 10.0$. We then ran the optimisation procedure of Eq. (8) for 5 iterations.
Fig. 1. 12 micro-rotation slices (panels (a)-(l)) from a real confocal microscopy imaging data sequence
Fig. 2. A 3D volume rendering case using OsiriX software, projected from the same viewing position. (a) From conventional z-stacks already deconvolved by SVI Huygens deconvolution software. (b) From the reconstruction result of micro-rotation slices.
Each iteration contains a subroutine of volume reconstruction driven by a conjugate gradient method with a fixed 20 iterations, and a subroutine of slice alignment driven by a Levenberg-Marquardt method with 20-200 iterations, depending on the distance between the initial and final values of the slice positions. In order to have a fair validation of the reconstruction based on micro-rotation data, we also provide a reconstruction based on the state-of-the-art z-stack imaging technique. The z-stack data were acquired in the suspension mode of the Cytocon™ chip, now controlled by a piezo motor displacing the whole cage. The step between two planes along the z direction was set to 100 nm, and 181 slices were obtained for the same living cell as in the micro-rotation mode.
Fig. 3. Rigid transformation estimation of the 340 micro-rotation slice position parameters: (a) translation of each slice (initial versus estimated x-, y- and z-axis coordinates); (b) initial and estimated rotation trajectories generated from the 340 rotation matrices given by each slice position, acting on a unit vector $[1\ 0\ 0]^t$
The final volume reconstruction results are rendered from the same viewing direction as the z-stack data (the volume size is $109\times 109\times 181$), as shown in Fig. 2. The positions of the slices, coded by rigid transformation parameters, are shown in Fig. 3; they exhibit irregular perturbations in agreement with physical models. This irregular perturbation is apparent in Fig. 1, in particular from instant (j) to (k) (an upward jump). Note that this jump is well detected in the position parameters of Fig. 3. A deeper biological evaluation is beyond the scope of this paper. However, it is clear from the volume renderings shown in Fig. 2 that the reconstruction quality from micro-rotation slices improves upon that from deconvolved z-stack slices: not only does it recover the cellular membrane, which is missing in the z-stack volume, but the geometric distortion caused by aberration effects is also diminished.
5 Conclusion

In this paper we have proposed, for a novel micro-rotation confocal microscopy imaging system, a high resolution 3D volume reconstruction method with simultaneous alignment of the rotational slices. The parameter calibration is performed within a statistical framework via the MLE principle. The slices' relative positions are well aligned, so that the reconstruction gives more detail than the conventional z-stack volume, even without deconvolution refinement. The immediate on-going improvement is to adopt a multi-resolution strategy by replacing the optimal mono-scale kernel with a combination of kernels at different scales. The multi-resolution strategy not only relieves the computational burden, but also helps to avoid local minima during the optimization phase.
Acknowledgements. This research is supported by the European Commission (NEST 2005 Programme) in consortium AUTOMATION, coordinated by S.L. Shorte (Institut Pasteur, http://www.pfid.org/AUTOMATION/home/), and by the French Ministry of Research (grant ACINIM FLUTOMY 2003 and Postdoctoral Fellowship 2004). We thank the PFID of Institut Pasteur for the supply of the micro-rotation images.
References
1. Baheerathan, S., Albregtsen, F., Danielsen, H.E.: Registration of serial sections of mouse liver cell nuclei. Journal of Microscopy 192(1), 37–53 (1998)
2. Bradl, J., Rinke, B., Schneider, B., Edelman, P., Hausmann, M., Cremer, C.: Resolution improvement in 3-D microscopy by object tilting. Microsc. Anal. 44, 9–11 (1996)
3. Greengard, L., Strain, J.: A fast algorithm for the evaluation of heat potentials. Comm. Pure Appl. Math. 43, 949–963 (1990)
4. Heintzmann, R., Cremer, C.: Axial tomographic confocal fluorescence microscopy. Journal of Microscopy 206, 7–23 (2002)
5. Shorte, S.L., Muller, T., Schnelle, T.: Method and device for three dimensional imaging of suspended micro-objects providing for high-resolution microscopy. European patent No. 1 413 911 B1 (2002)
6. Pawley, J.B. (ed.): Handbook of Biological Confocal Microscopy. Springer, Heidelberg (2006)
7. Rasmussen, C.E., Williams, C.: Gaussian Processes for Machine Learning. The MIT Press, Cambridge (2005)
8. Schnelle, T., Hagedorn, R., Fuhr, G., Fiedler, S., Muller, T.: Three-dimensional electric field traps for manipulation of cells - calculation and experimental verification. Biochimica et Biophysica Acta 1157, 127–140 (1993)
9. Wahba, G.: Support vector machines, reproducing kernel Hilbert spaces and the randomized GACV. In: Advances in Kernel Methods, pp. 69–88. MIT Press, Cambridge (1999)
10. Yang, C., Duraiswami, R., Gumerov, N., Davis, L.: Improved fast Gauss transform and efficient kernel density estimation. In: IEEE ICCV, pp. 464–471. IEEE Computer Society Press, Los Alamitos (2003)
Bias Image Correction Via Stationarity Maximization

T. Dorval, A. Ogier, and A. Genovesio

Image Mining Group, Institut Pasteur Korea, 39-1, Hawolgok-dong, Seongbuk-gu, Seoul, 136-791, Korea

* The authors thank T. Christophe, R. Grailhe and P. Sommer from Institut Pasteur Korea for providing valuable confocal microscopic images.
Abstract. Automated acquisitions in microscopy may come with strong illumination artifacts due to poor physical imaging conditions. Such artifacts have direct consequences on the efficiency of image analysis algorithms and on quantitative measures. In this paper, we propose a method to correct illumination artifacts in biological images. The correction is based on orthogonal polynomial modeling, combined with a stationarity maximization criterion. To validate the proposed method, we show that it improves a particle detection algorithm.
Index Terms: Biomedical Image Processing, Image Analysis, Image Enhancement, Object Detection, Biomedical Microscopy.
1 Introduction

Modern microscopy and robotic technologies allow very large amounts of visual information to be collected and analyzed automatically. These new systems make the visual inspection of the pictures totally obsolete, and also give a chance for objective quantitative measurements on cell experiments. The detection of nuclei, endosomes and other particles is a common request in biological image analysis. However, illumination artifacts systematically occur on 2D cross-section confocal microscopy imaging platforms. These biases can strongly corrupt higher level image analysis such as segmentation, fluorescence evaluation or even pattern extraction/recognition [1,2,3]. To overcome this drawback, many methods have already been proposed in the literature; the reader can refer to [4] for a comparative evaluation of the most common intensity inhomogeneity correction techniques. In this paper, we make the assumption that the bias generates a non-stationary process which can be corrected by orthogonal polynomial modeling. This paper presents a new, fully automated bias correction methodology, which improves low level biological image analysis such as segmentation or particle detection. A relevant protocol validates the correction algorithm and shows outperforming extraction on corrupted images. This paper deals with 2D fluorescence confocal microscopy, but the framework can be easily extended to many other biological imaging techniques. In Section 2 the stationarity definition is recalled. In Section 3 we describe the polynomial correction framework. In Section 4 we propose to validate this methodology by classical feature detection in cell images. Finally, we conclude in Section 5.
2 Stationarity Definitions

In signal processing, according to the traditional definition, a time series $X_t$ is stationary in the wide sense if:
$$\begin{cases} \forall t\in\mathbb{Z},\ E(|X_t|^2) < \infty, \\ \forall t\in\mathbb{Z},\ E(X_t) = m, \\ \forall (t,h)\in\mathbb{Z}^2,\ \mathrm{Cov}(X_t, X_{t+h}) = E[(X_{t+h}-m)(X_t-m)] = \gamma(h), \end{cases} \qquad (1)$$
in which $E(\cdot)$ is the expectation and $\mathrm{Cov}(\cdot)$ is the covariance function. If we refer to the terminology of Nelson and Plosser [5], two classes of non-stationary processes exist: the Trend Stationary (TS) process and the Difference Stationary (DS) process of order $d$. For a TS process, the non-stationarity follows a deterministic model and can be written $X_t = g(t) + \epsilon_t$, with $g(t)$ a time dependent function and $\epsilon_t$ a stationary stochastic process. A simple example of a TS process is a linear trend disturbed by white noise; in this case $g(t) = a_0 + a_1 t$ with $a_0, a_1 \in \mathbb{R}$ and $\epsilon_t$ i.i.d. $\mathcal{N}(0,\sigma^2)$. A DS process is one for which $(1-L)^d X_t$ is stationary, with $L$ the lag operator ($L X_t = X_{t-1}$, $\forall t\in\mathbb{Z}$) and $d\in\mathbb{Z}$; for a DS process the trend is thus not deterministic but stochastic. A famous example is the pure random walk defined by $X_t = X_{t-1} + \epsilon_t$. The shading phenomenon is often defined as a smooth intensity variation, leading to a nonuniform illumination of the image. Based on this definition, we assume that corrupted images can be seen as a TS process. Moreover, we consider that $g(t)$ can be modelled by a polynomial $P(t)$. As we are dealing with 2D images, the time $t$ is replaced by the spatial coordinates $(x,y)$; in the following, $P(t)$ and $\epsilon_t$ will respectively be noted $P(x,y)$ and $\epsilon_{x,y}$, where $(x,y)$ is the spatial location of a specific point within the image.
3 Bias Image Correction Via Stationarity Maximization

3.1 Legendre Polynomials Approximation

In our context, we consider that each pixel $f(x,y)$ of an image $f$ is a combination of its real intensity $u(x,y)$, an illumination bias artifact $b(x,y)$, and an additive white Gaussian noise $\epsilon_{x,y} \sim \mathcal{N}(0,\sigma^2_{noise})$ [6]. The relation is given by:
$$f = ub + \epsilon. \qquad (2)$$
According to (2), to correct each picture we divide the observed signal $f$ by the estimated bias $\tilde b_{m,n}$:
$$\frac{f}{\tilde b_{m,n}} = \frac{ub}{\tilde b_{m,n}} + \frac{\epsilon}{\tilde b_{m,n}},$$
where $(m,n)$ are respectively the $x$ and $y$ polynomial orders. We systematically apply a Gaussian filtering to the image prior to estimating the bias. Hence $\epsilon \ll \tilde b_{m,n}$, the term $\epsilon/\tilde b_{m,n}$ can be omitted, and we obtain:
$$\frac{f}{\tilde b_{m,n}} \approx \frac{ub}{\tilde b_{m,n}} \approx u.$$
To perform this correction, we model the trend $\tilde b_{m,n}$ of the intensity distribution using orthogonal polynomial functions [7]. The orthogonal polynomials $p_\cdot(x)$ are computed according to the following recurrence relation [8]:
$$\begin{cases} p_0(x) = 1, \\ p_1(x) = x, \\ p_{m+1}(x) = (a_m + x\, b_m)\, p_m - c_m\, p_{m-1}, \end{cases}$$
where the triplet $(a_m; b_m; c_m)$ defines a specific polynomial family. In addition to its orthogonality properties, the Legendre polynomial family is particularly well suited to our case because of its constant density and its limited interval ($x\in[-1;+1]$). A 2D orthogonal polynomial basis can be computed by linear combination of 1D Legendre polynomials. Thus, polynomial images $P_{m,n}(x,y;A)$ ($(m,n)\in\mathbb{N}^+$) are computed with the following formula:
$$P_{m,n}(x,y;A) = \frac{1}{(n+1)(m+1)}\sum_{j=0}^{n}\sum_{i=0}^{m} \alpha_{i,j}\, p_i(x)\, p_j(y), \qquad (3)$$
where $\alpha_{i,j}\in\mathbb{R}$, $(x,y)\in[-1,+1]$, and $A$ is a matrix containing the $\alpha_{i,j}$ values. Figure 1 shows the 2D Legendre polynomial basis from degree 0 to degree 3. For each corrupted image $f$, the evaluation of the $(n+1)(m+1)$ parameters $\alpha_{i,j}$ of (3) is based on a least-squares minimization of the functional $E_{m,n}(A)$ given by:
$$E_{m,n}(A) = \Big(\sum_y \sum_x \big(P_{m,n}(x,y;A) - f(x,y)\big)^2\Big)^{\frac{1}{2}}, \qquad (4)$$
using the multi-dimensional Polak-Ribiere conjugate gradient minimization method [9]. The minimization result $\min(E_{m,n}(A))$ gives us the $\alpha_{i,j}$ values corresponding to the estimated bias $\tilde b_{m,n}$ (see Figures 2 & 3) for a specific $(m,n)$ combination.
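A compact sketch of this fitting step, using NumPy's Legendre utilities and an ordinary least-squares solve in place of the Polak-Ribiere conjugate gradient (the functional (4) is quadratic in $A$, so both reach the same minimum); the normalization factor of (3) is absorbed into the coefficients:

```python
import numpy as np
from numpy.polynomial import legendre

def estimate_bias(f, m, n):
    """Fit the 2D Legendre model of Eq. (3) to image f and return the bias b~_{m,n}."""
    h, w = f.shape
    x = np.linspace(-1.0, 1.0, w)
    y = np.linspace(-1.0, 1.0, h)
    # 1D Legendre polynomials p_0..p_m evaluated on the pixel grid.
    Px = np.stack([legendre.legval(x, np.eye(m + 1)[i]) for i in range(m + 1)])
    Py = np.stack([legendre.legval(y, np.eye(n + 1)[j]) for j in range(n + 1)])
    # Design matrix: one column per basis image p_i(x) p_j(y).
    B = np.stack([np.outer(Py[j], Px[i]).ravel()
                  for j in range(n + 1) for i in range(m + 1)], axis=1)
    alpha, *_ = np.linalg.lstsq(B, f.ravel(), rcond=None)
    return (B @ alpha).reshape(h, w)

def correct_image(f, m, n):
    """Divide out the estimated bias: u ~ f / b~_{m,n} (Sect. 3.1)."""
    b = estimate_bias(f, m, n)
    return f / np.maximum(b, 1e-6)   # guard against divide-by-zero
```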
Fig. 1. 2D Legendre polynomial basis for low degree polynomials ($i, j = 0,\dots,3$)
Fig. 2. Bias estimations with $(m,n)\in[0;4]$ corresponding to $I_1$ in Figure 3, with $\sigma(\mu_t)$ and $\sigma(\sigma_t)$ reported for each $(m,n)$. The outlined bias picture ($m=3$; $n=2$) corresponds to the optimal correction map according to the stationarity criterion.
3.2 Stationarity Maximization
Obviously, the best correction results (i.e. those satisfying the stationarity assumptions as well as possible) are not given by the highest values of the $(m,n)$ couple. Indeed, with too high a polynomial degree, the correction map fits not only the trend of the illumination artifact but also the image details, which are not relevant for the correction. Experiments corroborate this assumption and led us to determine these degrees in an automated way: we extract the optimal result from among the optimizations of (4) realized for $(m,n)\in[0;4]$, by checking the stationarity conditions of (1) on the corrected image. The first condition imposes that the energy of the signal is finite; this property is always verified due to the discretization phenomenon. The second condition, the most important in the case of a trend stationary process, imposes that the expectation does not depend on the temporal (or spatial) variable. To evaluate the spatial variation of the local mean $\mu_t$ over the image, we compute the standard deviation $\sigma(\mu_t)$ of this signal on a sliding window. The size of the window must be large enough to provide a significant statistical representation of the imaged phenomena. Removing the trend in a TS process can introduce a unit root (i.e. a correlation between the variable at time $t$ and at $t-h$, $h\in\mathbb{N}^+$) and thus turn a TS into a DS process. In order to discriminate these two cases, we verify the independence of the second order moment $\sigma^2$ over time, satisfying the third property of (1) for $h=0$.
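A sketch of this selection criterion under our reading of the text; the window size, the stride, and the way the two moment variations are combined into a single score are our assumptions:

```python
import numpy as np

def moment_variation(u, win=32):
    """sigma(mu_t), sigma(sigma_t): std of the local means and local stds
    computed on half-overlapping sliding windows."""
    mus, sigmas = [], []
    for i in range(0, u.shape[0] - win + 1, win // 2):
        for j in range(0, u.shape[1] - win + 1, win // 2):
            patch = u[i:i + win, j:j + win]
            mus.append(patch.mean())
            sigmas.append(patch.std())
    return np.std(mus), np.std(sigmas)

def select_optimal_bias(f, max_degree=4):
    """Try every (m, n) in [0; max_degree]^2, correct the image (previous sketch),
    and keep the correction whose local moments vary least."""
    best = None
    for m in range(max_degree + 1):
        for n in range(max_degree + 1):
            u = correct_image(f, m, n)        # from the Legendre sketch above
            s_mu, s_sig = moment_variation(u)
            score = s_mu + s_sig              # assumed way of combining both criteria
            if best is None or score < best[0]:
                best = (score, m, n, u)
    return best                               # (score, m, n, corrected image)
```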
Fig. 3. Examples of biased images $f$ obtained with three different confocal microscopes for three different biological applications, with the estimated bias $\tilde b$ and the corrected images $u$. The optimal correction maps are not given by the maximal order polynomials, but really depend on the trend of the illumination artifacts.
Finally, the optimal result corresponds to the estimated bias that minimizes the variation of these two moments. Figure 2 displays a set of bias estimations corresponding to the image $I_1$ in Figure 3; the outlined bias picture gives the best result according to our stationarity criteria. Figure 3 displays three different biased images and their associated optimal correction maps. We can notice that the maximal stationarity is reached for various orthogonal polynomial degrees, demonstrating the accuracy of our primary assumption.
4 Biological Applications and Results

This section describes a framework for extracting circular objects within cells in a relevant way. This method can be applied to any kind of spot detection requirement, such as endosome localization. For our purpose, objects can be considered rotationally invariant; thus, the Hessian operator $H_\sigma$ is perfectly appropriate, where the parameter $\sigma$ is selected to match the spot candidate size. Thanks to this value, we are able to extract a wide variety of biological objects. $H_\sigma$ allows us to define two curvature maps $C_M$ and $C_G$, coming respectively from the mean and the Gaussian curvatures as defined in [10]. To discriminate the "dome" topographic class from the other classes, we keep only the positive values of $C_M$ and $C_G$. The curvature map $u_{curv}$ is computed by $u_{curv} = C_M \cdot C_G$ (see Figure 4e). As is usual in cytometry imaging, one wavelength is dedicated to nucleus and/or cytoplasm imaging. This channel allows us to create a binary cell mask $u_{mask}$ using a simple segmentation method such as K-means clustering (see Figure 4d). It is important to notice that the robustness of this step is highly correlated with the quality of the restoration step. Pixels are weighted by an approximation of the Euclidean distance obtained by computing a connectivity distance field. The distance map $u_{dist}$ obtained is combined with $u_{curv}$ by $u_{res} = u_{dist} \cdot u_{curv}$ (see Figure 4g). A threshold $\tau_{min}$ is then applied to avoid false detections within $u_{res}$. To conclude the process, a local maxima extraction is performed; in our specific application, the minimum distance allowed between two successively extracted spots is given by $3\sigma$. To assess the robustness of our correction method, we have created a ground truth based on three different biological assays, as detailed after the sketch below.

Table 1. κ index and overlap coefficient for three different experiments. κ and C_over values higher than 0.7 usually represent very good results.

              κ(f)   C_over(f)   κ(u)   C_over(u)
experiment 1  0.683  0.531       0.898  0.814
experiment 2  0.636  0.467       0.836  0.747
experiment 3  0.542  0.421       0.867  0.813
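The detection pipeline described above can be sketched as follows; the curvature maps here are simplified Hessian-based stand-ins for the mean/Gaussian curvature definitions of [10], and `tau_min` and the $3\sigma$ separation follow the text:

```python
import numpy as np
from scipy import ndimage

def dome_curvature_map(img, sigma):
    """Curvature-based spot map: positive parts of two Hessian-derived curvature
    measures, multiplied as u_curv = C_M . C_G (simplified stand-in for [10])."""
    img = np.asarray(img, dtype=float)
    fxx = ndimage.gaussian_filter(img, sigma, order=(0, 2))  # d2/dx2 (x = columns)
    fyy = ndimage.gaussian_filter(img, sigma, order=(2, 0))  # d2/dy2 (y = rows)
    fxy = ndimage.gaussian_filter(img, sigma, order=(1, 1))
    cm = -(fxx + fyy) / 2.0        # mean-curvature proxy; positive on bright domes
    cg = fxx * fyy - fxy ** 2      # Gaussian-curvature proxy (Hessian determinant)
    return np.clip(cm, 0.0, None) * np.clip(cg, 0.0, None)

def detect_spots(u_res, sigma, tau_min):
    """Threshold the weighted map and keep local maxima separated by >= 3 sigma."""
    size = max(int(round(3 * sigma)), 1) | 1        # odd neighborhood width
    peaks = (u_res == ndimage.maximum_filter(u_res, size=size)) & (u_res > tau_min)
    ys, xs = np.nonzero(peaks)
    return list(zip(xs, ys))
```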
The ground truth was built by manually selecting the right objects corresponding to the expected feature locations. Then, to qualify these results, two statistical coefficients are computed:
- the kappa index ($\kappa$) defined by [11]:
$$\kappa = 2\,\frac{\#(gt\cap d)}{\#(d) + \#(gt)}; \qquad (5)$$
- the overlapping coefficient defined by:
$$C_{over} = \frac{\#(gt\cap d)}{\#(gt\cap d) + \#(fp) + \#(fn)}, \qquad (6)$$
where $gt$ is the ground truth, $d$ the detected objects, $fp$ and $fn$ the false positives and false negatives respectively, and $\#$ the cardinality of a set. Table 1 shows the results and underlines the accuracy of the correction process for the different experiments.

Fig. 4. Step-by-step particle detection in 2-channel images: (a) original image $I$; (b) red channel $I_{red}$; (c) green channel $I_{green}$; (d) segmented image $I_{mask}$; (e) curvature map $I_{curv}$; (f) distance map $I_{dist}$; (g) weighted picture $I_{res}$; (h) detected particle locations (local maxima selection after Gaussian filtering).
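The two coefficients reduce to simple counting once detections have been paired with ground-truth spots (the pairing rule is left to the caller); a minimal sketch with hypothetical counts:

```python
def kappa_index(n_gt, n_det, n_match):
    """Eq. (5): kappa = 2 #(gt ∩ d) / (#(d) + #(gt))."""
    return 2.0 * n_match / (n_det + n_gt)

def overlap_coefficient(n_match, n_fp, n_fn):
    """Eq. (6): C_over = #(gt ∩ d) / (#(gt ∩ d) + #(fp) + #(fn))."""
    return n_match / (n_match + n_fp + n_fn)

# Hypothetical counts: 100 ground-truth spots, 95 detections, 90 of them matched.
print(kappa_index(n_gt=100, n_det=95, n_match=90))        # ~0.923
print(overlap_coefficient(n_match=90, n_fp=5, n_fn=10))   # ~0.857
```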
5 Conclusion

In this paper, we have shown that illumination bias can have important consequences on the quality of low level biological image processing. To overcome this drawback, we presented a novel shading correction approach based on image stationarity maximization via Legendre polynomial modeling. This image enhancement was used to correct different cell images. The visual quality of
the corrected results is high and was confirmed by a ground-truth based feature detection. As this method produced significant improvements of traditional biological object detection under different imaging conditions, it could be used as a pre-processing step for any kind of higher level process. In the future, we plan to propose a 3D implementation of this shading correction framework.
References
1. Jones, T.R., Carpenter, E., Sabatini, D.M., Golland, P.: Methods for high-content, high-throughput image-based cell screening. In: MIAAB 2006 Workshop Proceedings (2006)
2. Dorval, T., Ogier, A., Dusch, E., Emans, N., Genovesio, A.: Bias free features detection for high content screening. In: 4th IEEE International Symposium on Biomedical Imaging, Metro Washington, DC, USA (2007)
3. Ogier, A., Dorval, T., Genovesio, A.: Biased image correction based on empirical mode decomposition. In: IEEE International Conference on Image Processing (ICIP 2007), San Antonio, Texas, USA (2007)
4. Tomazevic, D., Likar, B., Pernus, F.: Comparative evaluation of retrospective shading correction methods. Journal of Microscopy 208(3), 212–223 (2002)
5. Nelson, C., Plosser, C.: Trends and random walks in macroeconomic time series: some evidence and implications. Journal of Monetary Economics 10, 139–162 (1982)
6. Styner, M., Brechbuhler, C., Szekely, G., Gerig, G.: Parametric estimate of intensity inhomogeneities applied to MRI. IEEE Transactions on Medical Imaging 19, 153–165 (2000)
7. Fee, G.: Linear least-squares data fitting with orthogonal polynomials. In: Maple Summer Workshop (2002)
8. Abramowitz, M., Stegun, I.A.: Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York (1970)
9. Polak, E.: Computational Methods in Optimization, Sect. 2.3. Academic Press, New York (1971)
10. Wilson, R.C., Hancock, E.: Consistent topographic surface labelling. Pattern Recognition Letters 32, 1211–1223 (1999)
11. Cohen, J.: A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20(3), 27–46 (1960)
Toward Optimal Matching for 3D Reconstruction of Brachytherapy Seeds

Christian Labat¹, Ameet Jain¹,³, Gabor Fichtinger¹, and Jerry Prince²

¹ Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
² Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
³ Philips Research North America, Briarcliff, NY

* This work has been supported by DoD PC050042, DoD PC050170 and NIH 2R44CA099374.
Abstract. X-ray C-arm fluoroscopy is a natural choice for intra-operative seed localization in prostate brachytherapy. Resolving the correspondence of seeds in the projection images can be modeled as an assignment problem that is NP-hard. Our approach rests on the practical observation that the optimal solution has almost zero cost if the pose of the C-arm is known accurately. This allowed us to derive an equivalent problem of reduced dimensionality that, with linear programming, can be solved efficiently in polynomial time. Additionally, our method demonstrates significantly increased robustness to C-arm pose errors when compared to the prior art. Because under actual clinical circumstances it is exceedingly difficult to track the C-arm, easing this constraint has additional practical utility.
1 Introduction

Intraoperative dosimetric quality assurance in prostate brachytherapy critically depends on discerning the three-dimensional (3D) locations of implanted seeds. The ability to reconstruct the implanted seeds intraoperatively will allow us to make immediate provisions for dosimetric deviations from the optimal implant plan. A method for seed reconstruction from pre-segmented C-arm fluoroscopy images has been proposed, among other works, by Jain et al. in [1], where the 3D coordinates of the implanted seeds are calculated upon resolving the matching of seeds in multiple X-ray images. At least three images are necessary to eliminate ambiguities. The resulting optimization problem is NP-hard. Heuristic approaches, such as that of Jain et al. [1], have been proposed to approximately solve the optimization problem. However, the use of a heuristic leads to algorithmic error, in addition to physical errors such as the inaccuracy in knowing the relative poses of the C-arm shots (pose error). To tackle this issue, we propose to consider all the images simultaneously, instead of suboptimal subsets of two images as proposed in [1]. The optimization problem has a salient feature: since the images represent a real situation (i.e. an existing object, the set of seeds, is being imaged), the
optimal solution has a near-zero cost when the pose error is low, and is actually zero without pose error. We propose to exploit this feature of the problem to reduce its number of variables, thereby allowing us to obtain the optimal solution at a reasonable computational cost. This exact dimensionality reduction is only possible when the pose error is sufficiently low. We claim that this is not restrictive in our framework, since a high pose error leads to high error in the estimation of the 3D coordinates of the implanted seeds, which is not acceptable. Actually, the idea of dimensionality reduction is not new: for instance, in [1, p. 3480] the original tripartite matching is projected onto bipartite matchings, introducing inaccuracy. In contrast, the proposed method performs dimensionality reduction while ensuring equivalency to the original problem. The MARSHAL method of Jain et al. has demonstrated solid performance [1] and we chose this work as the benchmark and basis of comparison for our work. A comparison between MARSHAL and our method was conducted to evaluate the sensitivity to pose errors on simulated and phantom data. The proposed method shows a significant increase in robustness to pose errors.
2 Method

Consider a collection of X-ray images of a constellation of implanted seeds. We assume that the 2D seed locations can be identified on each of the X-ray images, and we consider the problem of identifying corresponding seeds in all the images. Given these matched seed locations, a reconstruction of the seed locations in 3D can be achieved provided there are no ambiguities. Such ambiguities are more likely to be avoided when there are more X-ray images, but this in turn increases the complexity of the problem. Here we do not consider CT or limited angle tomosynthesis, as our work focuses on reconstruction from a very limited number of images. We propose a solution for three images, which is often sufficient in practice, and which is extendable to more images.

2.1 3D Reconstruction as a Matching Problem
The 3D locations of the implanted seeds, modeled as points, can be reconstructed through 3D triangulation from the X-ray images upon resolving the correspondence of seeds, which is the focus of this paper. Let $n$ be the number of points in the clinical work volume, and let $s_{lm}$ be the position of the $l$-th point in the $m$-th image. When three images are used, the matching problem can be formulated as an axial 3D assignment problem (3DAP) [1]:
$$\min_{x_{ijk}} \sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n c_{ijk}\, x_{ijk}, \qquad (1)$$
where
$$\begin{cases} x_{ijk}\in\{0,1\}, \\ \sum_{i=1}^n\sum_{j=1}^n x_{ijk} = 1, \quad \forall k, \\ \sum_{i=1}^n\sum_{k=1}^n x_{ijk} = 1, \quad \forall j, \\ \sum_{j=1}^n\sum_{k=1}^n x_{ijk} = 1, \quad \forall i. \end{cases} \qquad (2)$$
$c_{ijk}$ is the cost of matching point $s_{i1}$ to points $s_{j2}$ and $s_{k3}$, and $x_{ijk}$ is a binary variable deciding the correctness of the match $(i,j,k)$. The constraints (2) force every segmented point to be chosen exactly once. Thus, $x$ represents any feasible global match, with the cost of that correspondence given by $\sum c_{ijk}\, x_{ijk}$. One good choice for a cost metric $c$ is the projection error (PE) [1]. For any given set of poses and correspondence, the intersection of the three lines that join each projection to its respective X-ray source can be computed using a closed form solution that minimizes the $L_2$ norm of the error. PE can be computed by projecting this 3D point in each image and then measuring the distance between the projected location and the observed location of the point. A feasible solution $x$ of the above problem is a 3D permutation array, and the problem has $(n!)^2$ feasible solutions. Branch and bound is a classical algorithm for optimally solving the 3DAP, but this is generally achievable only for small $n$ because of the combinatorial explosion. Thus, heuristics have been proposed that approximately solve the 3DAP, such as MARSHAL in [1], which suboptimally projects the original 3DAP onto three distinct 2DAPs that can be solved in polynomial time using the Hungarian algorithm. We point out that the 3DAP has a salient feature that we can exploit: since the images represent a real situation, the optimal solution has a near-zero cost when the pose error is low, and the optimal cost is actually zero without pose error. In the next section, we use this feature to reduce the number of variables in the problem, thus permitting us to obtain the optimal solution at a reasonable computational cost. This new method tackles the complete optimization problem without using suboptimal projections such as in MARSHAL.
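A sketch of the projection-error cost for one candidate triple $(i,j,k)$, assuming each image provides a $3\times 4$ projection matrix (our parameterization); we substitute a standard linear least-squares (DLT-style) triangulation for the paper's closed-form $L_2$ solution:

```python
import numpy as np

def triangulate(points_2d, proj_mats):
    """Linear least-squares triangulation of one 3D point from >= 2 views."""
    rows = []
    for (u, v), P in zip(points_2d, proj_mats):
        rows.append(u * P[2] - P[0])   # u (p3 . X) = p1 . X
        rows.append(v * P[2] - P[1])   # v (p3 . X) = p2 . X
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    X = Vt[-1]
    return X[:3] / X[3]                # dehomogenize

def projection_error(points_2d, proj_mats):
    """Cost c_ijk: mean reprojection distance of the triangulated point (Sect. 2.1)."""
    X = np.append(triangulate(points_2d, proj_mats), 1.0)
    err = 0.0
    for (u, v), P in zip(points_2d, proj_mats):
        proj = P @ X
        err += np.hypot(proj[0] / proj[2] - u, proj[1] / proj[2] - v)
    return err / len(points_2d)
```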
2.2 Dimensionality Reduction
Let $N = n^3$. We rewrite the variables $x_{ijk}$ and the costs $c_{ijk}$ in vectorial form, such that $x, c \in \mathbb{R}^N$ (in the sequel we also write $u_\ell$ to denote $u_{ijk}$). The 3DAP (1)-(2) then reads as the following integer linear program:
$$P: \quad \min_{x\in C} c^t x, \qquad (3)$$
with the constraint set $C = \{x : Mx = [1,\dots,1]^t,\ x\in\{0,1\}^N\}$, where $Mx = [1,\dots,1]^t$ is a matrix form of (2).

Principle. Since the coefficients of $x$ are either 0 or 1 and there must be $n$ 1's, an optimal solution of $P$ can be thought of as the selection of $n$ cost coefficients such that the resulting cost is minimum while the constraints $C$ are satisfied. Given a feasible solution, Lemma 1 (below) states that all cost coefficients that are greater than the cost associated with this solution cannot be selected in the optimal solution. Since those coefficients can never be selected, the dimension of the problem can be reduced by removing them from further consideration. This yields an equivalent problem of reduced dimensionality. If the reduction in dimensionality is sufficiently large, then the new problem can be solved exactly in reasonable time, even though the original problem is far too costly to solve.
Lemma 1. Let us assume that the cost coefficients $c$ are positive, and let $x_0\in C$ be a feasible solution. The integer linear problem $P$ defined by (3) is equivalent to the following integer linear problem (i.e., they share the same optimal solutions):
$$P': \quad \min_{x\in C} (c')^t x, \quad \text{where} \quad c'_\ell = \begin{cases} c_\ell, & \text{if } c_\ell \le m_P(x_0), \\ \infty, & \text{if } c_\ell > m_P(x_0), \end{cases} \qquad (4)$$
and where $m_P(x) = c^t x$ is the cost of problem $P$ at the feasible solution $x$.

Proof. Let $x^*\in C$ be an optimal solution to problem $P$. We have, for all $x_0\in C$,
$$m_P(x^*) \le m_P(x_0). \qquad (5)$$
Let us consider $c_\ell > m_P(x_0)$. From (5) we have $c_\ell > m_P(x^*)$. Since the $c$ are positive and $x^*$ is binary, we necessarily have $x^*_\ell = 0$.

The dimensionality reduction can be illustrated as follows: starting from the cost vector $[c_1, c_2, \dots, c_N]^t$ of $P$, the coefficients exceeding $m_P(x_0)$ are set to $\infty$ in the intermediate problem, and only the $K$ remaining finite coefficients $[\tilde c_1, \dots, \tilde c_K]^t = [c_{i_1}, \dots, c_{i_K}]^t$ are kept. The original problem $P$ is thus equivalent to the reduced problem
$$\tilde P: \quad \min_{\tilde x\in\tilde C} \tilde c^t \tilde x,$$
where $\tilde x, \tilde c \in \mathbb{R}^K$ ($K\le N$), with the constraint set $\tilde C = \{\tilde x : \tilde M\tilde x = [1,\dots,1]^t,\ \tilde x\in\{0,1\}^K\}$, $\tilde M = MR$, and where $R$ is the dimensionality reduction matrix of size $N\times K$ such that $[x_{i_1}\ 0\ x_{i_2}\ 0\ \dots\ x_{i_K}]^t = R\,[\tilde x_1\ \tilde x_2\ \dots\ \tilde x_K]^t$. Once the reduced problem $\tilde P$ is solved, the optimal solution to the original problem $P$ is simply given by $x^* = R\tilde x^*$.

Note that a dimensionality reduction ($K < N$) is not guaranteed for the general 3DAP, even in the most favorable case $m_P(x_0) = m_P(x^*)$. However, a dimensionality reduction occurs in the case of our problem, since the range of the cost coefficients is wide and the optimal solution has a near-zero cost when the pose error is low. The practical interest clearly depends on the dimensionality reduction ratio ($K/N$). We show next that this ratio can actually be improved.

Improving dimensionality reduction. Lemma 1 uses only the integer constraint in $C$ to reach dimensionality reduction. But using the fact that the feasible set $C$ comprises permutation arrays, it is possible to reduce the dimensionality of this problem even further. To demonstrate this, we start with the following Lemma.
Lemma 2. The integer linear problem $P$ defined by (3) is equivalent to the following integer linear problem:
$$P'': \quad \min_{x\in C} (c'')^t x,$$
where the minimum cost in each row is subtracted from the entire row:
$$c''_{ijk} = c_{ijk} - \min_{j,k} c_{ijk}. \qquad (6)$$

Proof. For lack of space, the proof is not detailed.

To show that this Lemma permits further dimensionality reduction, let us apply Lemma 1 to the new problem $P''$. From (4), the cost coefficient $c''_{ijk}$ is equivalent to $\infty$ if $c''_{ijk} > m_{P''}(x)$. According to (6), the former condition is equivalent to
$$c_{ijk} > m_P(x) - \Big(\sum_{i'=1}^n \min_{j,k} c_{i'jk} - \min_{j,k} c_{ijk}\Big).$$
The latter condition is less restrictive than $c_{ijk} > m_P(x)$, since the cost coefficients are positive. As a result, the dimensionality reduction is higher within the new problem $P''$ than within the original problem $P$. It is then preferable to consider problem $P''$ instead of problem $P$, since Lemma 2 ensures that they are equivalent. It is actually possible to reduce dimensionality even further: the operation (6) can also be performed successively for the columns and depths to decrease the cost coefficients, while still ensuring equivalency to the original problem.
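Both reductions are a few lines over the $n\times n\times n$ cost tensor; a sketch (our transcription of Lemmas 1 and 2, with the feasible cost $m_P(x_0)$ supplied by the caller):

```python
import numpy as np

def reduce_costs(c, m_p_x0):
    """Lemmas 2 and 1: successively subtract per-row/column/depth minima from the
    (n, n, n) cost tensor, adjust the feasible cost m_P(x0) accordingly, and keep
    only coefficients that can still appear in an optimal solution."""
    cr = c.astype(float)
    bound = float(m_p_x0)
    for axes in [(1, 2), (0, 2), (0, 1)]:
        mins = cr.min(axis=axes, keepdims=True)
        cr = cr - mins
        bound -= mins.sum()        # each of the n minima is paid exactly once
    keep = cr <= bound             # Lemma 1 applied to the reduced problem
    return cr, keep
```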
2.3 Optimization Strategy
Integer Programming / Linear Programming. The integer program (1)-(2) can be directly solved with standard techniques such as branch and bound. IP problems are NP-hard, however, and may take an exponential amount of computational time. It has been shown that the linear program corresponding to the 2D assignment problem (2DAP) has an integer solution even without integer constraints [2], and this linear program can be solved efficiently in polynomial time using, for instance, interior point methods [3]. To our knowledge, however, there is no analogous result for the 3DAP. Nevertheless, we have implemented the linear program for the 3DAP problem, followed by a test of whether its solution is binary (up to numerical errors). In all of our experiments, we have never obtained a non-binary solution to this problem, which points to the potential validity of the 2D theoretical result in 3D as well. (We are currently investigating this theoretical issue.)

Dimensionality reduction thresholding. From Lemma 1, the degree of dimensionality reduction depends on the cost $m_P(x_0)$ of a feasible solution $x_0$. It is possible to use a suboptimal algorithm, such as MARSHAL, to find a feasible solution $x_0$. Unfortunately, when there is high pose error, MARSHAL often provides such a suboptimal solution that very little dimensionality reduction can
be achieved. Therefore, we propose instead to choose a threshold parameter $\eta$, which is essentially a "guess" as to what the cost of a feasible solution might be. This permits us to reduce the dimensionality of the problem and run the linear program on the reduced problem. If the solution of this problem is integer and has a cost lower than $\eta$, then it must be optimal. If the resulting cost is larger than $\eta$, then the solution might be optimal, but we cannot guarantee it; it is then our option whether to accept a (potentially) suboptimal solution or to increase $\eta$ and rerun the linear program until we have a guaranteed optimal solution. In our experiments, we determine $\eta$ in the following way: rank order all costs from lowest to highest, pick an integer $K$, and let $\eta$ be the value of the $K$-th smallest cost coefficient. The influence of $K$ on the rates of feasibility and optimality of the proposed method is experimentally studied in the next section.
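A sketch of this strategy with SciPy's `linprog` standing in for the Matlab solver used by the authors; the $K$ smallest coefficients define $\eta$ implicitly, as in the text:

```python
import numpy as np
from scipy.optimize import linprog

def solve_reduced_3dap(c, K):
    """LP relaxation of the 3DAP (1)-(2), restricted to the K smallest costs
    (threshold eta = K-th smallest coefficient, as in Sect. 2.3)."""
    n = c.shape[0]
    flat = c.ravel()
    keep = np.argsort(flat)[:K]                # indices of kept variables
    i, j, k = np.unravel_index(keep, c.shape)
    # Equality constraints of (2): one match per i, per j, per k.
    A = np.zeros((3 * n, K))
    A[i, np.arange(K)] = 1.0
    A[n + j, np.arange(K)] = 1.0
    A[2 * n + k, np.arange(K)] = 1.0
    res = linprog(flat[keep], A_eq=A, b_eq=np.ones(3 * n), bounds=(0.0, 1.0))
    if not res.success:
        return None                            # reduced problem infeasible: increase K
    x = res.x
    if not np.allclose(np.round(x), x, atol=1e-6):
        return None                            # non-binary LP solution (never observed)
    return list(zip(i[x > 0.5], j[x > 0.5], k[x > 0.5]))
```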
3 Simulation and Phantom Experiments

We present a comparison between the MARSHAL algorithm and the proposed method using simulation and phantom data. We obtained a copy of the MARSHAL code from the authors for comparison [1]. Both algorithms were implemented in Matlab 7.1 on a Linux PC (Intel EM64T 3.6 GHz, 24 GB RAM).

3.1 Evaluation of Pose Error Sensitivity

Two separate comparisons were performed: one compared the two algorithms with respect to translational errors and the other with respect to rotational errors, both common errors in C-arm position calibration in the operating room. Random error was modeled using a uniform probability density function; when we report results for an error $h$, each of the three components of error (in either translation or rotation) was generated as an independent random variable uniformly distributed on $[-h, h]$. Following [1], a statistical bias in translation was incorporated in the generation of the datasets to account for the expected differences in directional errors in fluoroscope tracking using a fiducial; in particular, we assumed that the in-plane error is a factor of five smaller than the through-plane error $h$. No analogous bias was used for the rotation errors. Realistic simulations of prostate brachytherapy seed implants were generated with seed densities of 2 and 2.5 seeds/cc and prostate sizes of 35 and 45 cc. The numbers of seeds in the implants were $n = \{72, 84, 96, 112\}$. A cone angle of 10° was used for the acquisition of the three simulated X-ray images. Averaged results from a total of 2,000 datasets are shown in Fig. 1. The proposed method, with $100n$ cost coefficients remaining after dimensionality reduction, performs significantly better than MARSHAL, as shown in Fig. 1(a)-(d): it still matches 89% of the seeds correctly when the rotation error reaches 4°, while MARSHAL drops to 59%, and it still matches 99% of the seeds correctly when the translation error reaches 10 mm, while MARSHAL drops to 72%. The computational time of the proposed method is higher than that of MARSHAL, as shown in Fig. 1(e)-(f). We point out that most of the computational time of the proposed method (solid line) is actually used in the
computation of the cost coefficients. However, computing all the cost coefficients is not required, since most of them will eventually be discarded by the dimensionality reduction. After further code optimization, we expect the computation time to drop close to the time required solely for cost minimization (dotted line).

Fig. 1. Performance comparison between MARSHAL and the new method for rotation pose error (°) and translation pose error (mm). (a)-(b): mean matching rate (%); (c)-(d): STD (%); (e)-(f): computation time (s); dotted line: time required solely for cost minimization (new method).

The feasibility and optimality rates of the proposed method as a function of the number of cost coefficients remaining after dimensionality reduction are shown in Tab. 1. It is expected that the feasibility rate should increase with the number of cost coefficients. This is true for smaller numbers of cost coefficients but, surprisingly, the feasibility rate decreases when the number of cost coefficients reaches $500n$ ($\approx 50{,}000$). This is due to the failure of linprog in Matlab using default parameters, because "one or more of the residuals, duality gap, or total relative error has stalled". These cases were not displayed in Fig. 1, and we are currently investigating how to cope with them. The guaranteed optimality rate increases as a function of the number of cost coefficients; for low errors (1° and 2 mm), all solutions are guaranteed optimal given an acceptable dimensionality ratio.

Table 1. Feasibility and optimality of the proposed method for pose error

                       Feasibility rate                 Guaranteed optimality rate
Number of cost coef.   20n   50n   100n  500n  1000n    20n   50n   100n  500n  1000n
Error 1°               0.9   0.96  1     1     0.98     0.88  0.91  0.94  0.98  1
Error 2°               0.9   0.98  1     0.98  0.94     0.35  0.38  0.46  0.6   0.71
Error 3°               0.58  0.94  0.98  0.94  0.92     0.1   0.13  0.13  0.22  0.27
Error 4°               0.42  0.83  0.92  0.92  0.85     0.15  0.1   0.07  0.07  0.1
Error 2 mm             0.96  1     1     1     0.94     1     1     1     1     1
Error 4 mm             0.94  0.98  1     1     0.92     0.56  0.6   0.65  0.77  0.89
Error 6 mm             0.96  0.96  1     0.96  0.88     0.35  0.41  0.46  0.63  0.64
Error 8 mm             0.77  0.9   0.98  0.94  0.88     0.14  0.19  0.19  0.31  0.38
Error 10 mm            0.58  0.92  0.96  0.94  0.77     0     0.02  0.04  0.07  0.08

We point out that the guaranteed
optimality is only a sufficient condition. As shown in Fig. 1(b), all solutions of the proposed method from 0 to 8 mm translation error correspond to perfect matching (100%), even if they are not all guaranteed optimal.

3.2 Phantom Experiments
A radiographic fiducial was used to track the C-arm (0.56 mm translation and 0.33° rotation accuracy) and was accurately attached to a random point cloud phantom. Phantoms of {40, 55, 70, 85, 100} points with 1.56 points/cc were used. Six X-ray images within a 20° cone around the AP axis were randomly taken using a Philips Integris V3000 fluoroscope and dewarped. Thus, both the seed locations and the X-ray poses were not biased/optimized in any way, closely representing an uncontrolled surgical scenario. Each image was hand segmented to establish the true segmentation and correspondence. Both MARSHAL and the proposed method perform very well on phantoms, achieving almost perfect matching, as shown in Tab. 2; note that the accuracy of the radiographic fiducial ensures a low pose error here. The proposed method is significantly slower than MARSHAL. We point out that the proposed method spends here more than 90% of the computational time on the $n^3$ cost coefficients; we expect to reduce this time significantly, as explained in Sect. 3.1.

Table 2. Performance of MARSHAL and the proposed method on phantoms

                           MARSHAL                       Proposed method
Number of seeds     40    55    70    85    100    40    55    70    85     100
Mean Match. (%)     97.6  100   98    97.7  98.2   98    99.4  97.1  100    98
STD Match. (%)      3.6   0     2.3   3.2   2.3    2.6   1.4   0     0      0
Time (s)            0.3   0.6   1     2.5   3.1    12.6  32    64.6  106.6  185
Conclusion and Future Work. In summary, we achieved a significant increase in robustness to pose errors compared to [1] by considering all images simultaneously, instead of using subsets. Experimentally, our method ensured optimality for small pose errors. C-arm tracking is a cumbersome process, and easing this constraint has great practical utility: with our method, a less accurate estimation of the pose may suffice. For example, in [4], starting from an initial guess, the pose was further estimated iteratively using the current 3D reconstruction, yet the seed matching remained susceptible to pose errors. Applying our method to [4] promises a clinically viable solution without using an external tracker or encoder on the C-arm. We are also extending the method to reconstruct overlapping seeds that are occluded in one or more X-ray images [5,6].
References
1. Jain, A., et al.: Matching and reconstruction of brachytherapy seeds using the Hungarian algorithm (MARSHAL). Med. Phys. 32(11), 3475–3492 (2005)
2. Papadimitriou, C.H., Steiglitz, K.: Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Inc., Englewood Cliffs (1982)
3. Bertsekas, D.: Nonlinear Programming. Athena Scientific, Belmont, USA (1999)
4. Jain, A., et al.: C-arm tracking and reconstruction without an external tracker. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 494–502. Springer, Heidelberg (2006)
5. Su, Y., et al.: Prostate brachytherapy seed localization by analysis of multiple projections: identifying and addressing the seed overlap problem. Med. Phys. 31(5), 1277–1287 (2004)
6. Narayanan, S., et al.: Three-dimensional seed reconstruction from an incomplete data set for prostate brachytherapy. Phys. Med. Biol. 49(15), 3483–3494 (2004)
Alignment of Large Image Series Using Cubic B-Splines Tessellation: Application to Transmission Electron Microscopy Data

Julien Dauguet¹, Davi Bock², R. Clay Reid², and Simon K. Warfield¹

¹ Computational Radiology Laboratory, Children's Hospital, Harvard Medical School, Boston, USA
² Department of Neurobiology, Harvard Medical School, Boston, USA
Abstract. 3D reconstruction from serial 2D microscopy images depends on non-linear alignment of serial sections. For some structures, such as the neuronal circuitry of the brain, very large images at very high resolution are necessary to permit reconstruction. These very large images prevent the direct use of classical registration methods. We propose in this work a method to deal with the non-linear alignment of arbitrarily large 2D images using the finite support properties of cubic B-splines. After initial affine alignment, each large image is split into a grid of smaller overlapping sub-images, which are individually registered using cubic B-splines transformations. Inside the overlapping regions between neighboring sub-images, the coefficients of the knots controlling the B-splines deformations are blended, to create a virtual large grid of knots for the whole image. The sub-images are resampled individually, using the new coefficients, and assembled together into a final large aligned image. We evaluated the method on a series of large transmission electron microscopy images and our results indicate significant improvements compared to both manual and affine alignment.
1 Introduction

Understanding the three-dimensional organization of complex structures and processes is a challenging topic in structural biology. The alignment of serial histological sections is a powerful way of performing such three-dimensional reconstruction of tissues and structures. Many linear and non-linear methods have already been proposed for the alignment of histological images at the microscopic scale, including the alignment of images across medical image modalities (for example MRI, Magnetic Resonance Imaging). Works dedicated to the brain include [1], [2] for affine transformations, and [3], [4], [5] for elastic alignments. The direct use of these methods is usually only possible on a series of images whose size is compatible with the processing capabilities of normal machines in reasonable amounts of time. However, images with very large fields of view at very high resolution are becoming more and more common in structural biology imaging, in both light and electron microscopy. Creation of these images is facilitated by motorized
microscope stages and mosaicing software. The processing of series of large images is thus difficult because of the unusual size and resolution of the data. The degree of detail contained in these images and the deformations induced by the distortion of the lens, the camera, and the montage software (mosaicing), as well as the intrinsic deformation of the tissue from one section to another, necessitate the use of non-linear transformations. Some methods have been specifically developed to process large microscopy images. Kremer [6] and Koshevoy [7] proposed methods to montage large multiple-camera images into a single large field of view image. Burt [8] worked on the continuity of intensities when combining two or more images into a large mosaic, using multiresolution splines over transition zones. Thevenaz [9] proposed a mosaicing method to assemble microscopy images. Baronio [10] used parallel splines on sub-domains to restore large images and obtained continuity of the solution by adding an overlapping area. Mikula [11] proposed a framework to share large high resolution histological images using pyramid generation for Web accessibility, whereas Kumar [12] proposed a framework to process very large images in a cluster environment. This list of dedicated methods for processing very large images is not exhaustive. However, for the specific problem of aligning large serial section images, very few solutions have been proposed: Kumar [12] proposed a section-to-atlas warping method, whereas Koshevoy [7] proposed a fiducial-point based approach to align electron microscopy sections, with no parallel implementation. We propose in this work a solution to handle this problem, based on the finite support properties of cubic B-splines (see [13] for a description of spline transformations for image processing). We applied our method to achieve the 3D reconstruction of a series of Transmission Electron Microscopy (TEM) images. The method is well suited for parallel distributed computation on a networked cluster, permitting scalability to arbitrarily large images.
2 Method

General method: To align the series of images, we perform slice-to-slice registration, taking as reference the slice in the middle of the stack. Initially, each pair of 2D images is registered using first a rigid and then an affine transformation. These initial estimates are generated using the block-matching technique described in [2] (implemented in BrainVisa: http://www.brainvisa.info). The images are first downsampled (Gaussian smoothing plus bi-linear interpolation) so that they can be processed on a regular machine; typically the resulting downsampled image is less than 1,000 pixels in both dimensions. The estimated affine transformation is then applied to the full resolution image. After initial affine alignment, each large image is split into a grid of overlapping smaller sub-images whose size is suitable for the memory capacity of the machines used. The number of desired sub-images, as well as the number of control points in each sub-image (which will be used for the subsequent cubic B-splines registration), is set by the user. These parameters virtually define a grid of control points on each large image.
An array of sub-images is thus defined both on the target image and on the image to be registered (the "floating" image). The pairs of corresponding sub-images from the target and floating images are individually registered using cubic B-splines transformations estimated by optimization of the mutual information criterion as described in [14] (implemented using the Insight ToolKit: http://www.itk.org/). A pyramidal strategy is used both for the number of control points and for the image resolution. Since cubic B-splines transformations have finite support, the deformation at a particular location is only influenced by the weights of a limited number of surrounding sub-images' individual control point grids. For each control point of absolute coordinates $(i_0, j_0)$ in the virtual global grid of the large image, the new pair of coefficients $c_{merged} = (c_{merged_x}, c_{merged_y})$ (one spline coefficient for each direction x and y in 2D) is estimated by blending the coefficients of all the sub-images whose individual grid contains location $(i_0, j_0)$, as follows:

$c_{merged}(i_0, j_0) = \sum_{(P,Q)\,\in\,\Omega(i_0, j_0)} \omega_{P,Q}(i_0, j_0)\, c_{P,Q}(i_0, j_0)$   (1)
where $c_{P,Q}(i_0, j_0)$ are the coefficients estimated from the individual registration of sub-image $(P, Q)$ in the global array of sub-images; $\omega_{P,Q}(i_0, j_0)$ are the weights controlling the contribution of the coefficient of control point $(i_0, j_0)$ from sub-image $(P, Q)$; and $\Omega(i_0, j_0)$ is the set of indexes of the sub-images $(P, Q)$ which include control point $(i_0, j_0)$ in their own individual control point grid. The parameters of the cubic B-spline transformation estimated for each sub-image are thus corrected to take into account the contribution of all the neighboring images and to guarantee the continuity of the elastic transformation at the border of each sub-image with its neighbors (see Figure 1 for the case of two images). All the sub-images are then resampled using their local corrected cubic B-spline transformation, and are assembled together into a final global large resampled image. The resampled floating image is then used as the target image for the next slice-to-slice registration of the series.

Application to Electron Microscopy image series: We tested the whole alignment method on a series of 57 TEM images of the lateral geniculate nucleus of a ferret. Each image was about 10,000 × 10,000 pixels large with a pixel resolution of 3 nm and a slice thickness of 60 nm. We used blendmont, a utility that is part of the IMOD package [6], to reconstruct the large field-of-view image from the 5 × 5 mosaic of smaller images coming from the camera (the assembly of mosaic images from the camera is a pre-processing step outside the scope of this work). The TEM images were downsampled by a factor of 10 for the initial rigid + affine estimation and were then resampled at full resolution. We chose to split the resulting large images into a P × Q grid of sub-images, with P = Q = 10 (100 sub-images in total), for the elastic registration. Each pair of sub-images (size about 1200 × 1300 pixels) was registered using two pyramid levels (level 1: 3 × 3 grid of control points and downsampling by 2 in each direction; level 2: 5 × 5 grid at full resolution) to estimate the B-spline elastic transformation. The weights from Equation 1 were simply chosen as $\omega_{P,Q}(i_0, j_0) = 1/\mathrm{card}(\Omega(i_0, j_0))$, where card stands for the number of elements of a set. Aside from the automated registrations, the sections were also manually aligned (affine transformations matching selected points, same reference slice as for the automated alignments) by a TEM expert, and several structures were manually segmented using Reconstruct [15]. We performed a quantitative comparison between the manual, affine, and elastic alignments described in this work by estimating the Dice score of superimposition between slices for three different segmented structures ([16]; the Dice score ranges from 0 (no overlap) to 1 (total overlap)). Note that we used the original non-aligned slices as input data for both the affine and the elastic alignments.
Fig. 1. The merging of the 8 × 8 grids of control points for 2 neighboring sub-images. With more images, each control point can have up to 9 contributions from neighboring sub-images’ grids (for cubic B-spline functions).
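The blending of Equation 1 can be sketched as follows; this is an illustrative NumPy implementation, not the authors' code, and the data layout (dictionaries of per-sub-image coefficient grids) is an assumption made for clarity.

```python
import numpy as np

def merge_coefficients(sub_grids, weights, global_shape):
    """Blend per-sub-image B-spline coefficients into one global grid (Eq. 1).

    sub_grids : dict mapping (P, Q) -> dict {(i0, j0): (cx, cy)} of control
                point coefficients, indexed in global grid coordinates.
    weights   : function (P, Q, i0, j0) -> weight omega_PQ(i0, j0).
    """
    merged = np.zeros(global_shape + (2,))
    total = np.zeros(global_shape)
    for (P, Q), grid in sub_grids.items():
        for (i0, j0), c in grid.items():
            w = weights(P, Q, i0, j0)
            merged[i0, j0] += w * np.asarray(c)
            total[i0, j0] += w
    # With the paper's choice omega = 1/card(Omega), the weights already sum
    # to one at every covered control point; the division below is a no-op
    # there but keeps the sketch valid for arbitrary weights.
    nz = total > 0
    merged[nz] /= total[nz][:, None]
    return merged
```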
3 Results
To qualitatively assess the manual, affine, and elastic alignments, orthogonal views of the stack were created. Since the processed slices were very large, it was not possible to create and display an actual 3D volume. The orthogonal views were thus created by extracting from each large slice the line corresponding to the orthogonal view considered, and concatenating these lines together to create a 2D slice. Although the manual and affine alignments were already good, the smoothness of the reconstruction was improved using our elastic scheme (see Figure 2). The slice-to-slice Dice score for the three alignment methods varied strongly across the series (see Figure 3). This score depends on the shape of the segmented structures and can drop dramatically at the end of a structure. The Dice score variations followed the same pattern for all the methods, which means no error propagation or particular bias was associated with our elastic method.
Fig. 2. Orthogonal views of the aligned stack using manual (a), affine (b) and elastic (c) transformations. The arrow indicates structures which appear smoother in the elastic alignment.
The mean Dice score for the elastic alignment method was always better than for the manual and affine methods. Since the Dice score of each method tested was already high (the data we processed did not suffer from very large non-linear deformations), the differences were relatively small. However, Student t-tests performed to compare the distributions of the slice-to-slice Dice scores between manual/elastic and affine/elastic alignments (paired values, two-tailed distribution) were statistically significant, except for segmentation C with manual alignment (see Table 1).

Table 1. Mean Dice score and percentage difference relative to the elastic alignment, over all slices, for segmentations A, B and C and the three alignment methods. Significant differences (p < 0.05) were obtained for all comparisons except segmentation C with manual alignment.

                    Manual                       Affine                       Elastic
Segmentation A      0.773  -2.69% (p < 10^-7)    0.784  -1.24% (p < 10^-7)    0.794
Segmentation B      0.872  -0.60% (p = 0.03)     0.873  -0.47% (p < 10^-7)    0.877
Segmentation C      0.797  -0.94% (p = 0.5)      0.793  -1.50% (p < 10^-7)    0.805

4 Discussion
Fig. 3. Slice-to-slice Dice score for segmentations A, B and C (one plot per segmentation: Dice score vs. slice number, with curves for manual, affine and elastic alignment), comparing the three methods for each pair of slices. The corresponding segmentations are displayed in red over an EM slice on the right.

We chose B-splines transformations to model the deformations between corresponding sub-images because B-splines are flexible, easy to use and robust. Moreover, B-splines are a particularly powerful tool to control the continuity of the transformation between sub-images, since just by operating on a limited number of discrete values (the control points' coefficients), the mathematical continuity of the whole image is guaranteed everywhere (see Figure 4 for an example of continuity). Because the global transformation estimate is obtained by merging independently computed transformations of each sub-image, our scheme is well suited for distributed computation. Provided enough disk space and compute nodes, our registration scheme permits registration of arbitrarily large images. It is thus particularly well adapted to the processing of very large field-of-view, high-resolution microscopy images.
Fig. 4. Assembly of the sub-images before (left) and after (right) merging the control point coefficients to guarantee continuity of the transformation (detail of a slice)
For very large images, libraries specialized in large image processing (in which the entire image is never completely loaded into memory) must be used to downsample the input image, apply the initial affine transformation, and split the image into sub-images prior to the actual elastic registration. The freely available package ImageMagick (http://www.imagemagick.org/) is one example of a library with these properties. The tuning of the parameters controlling the elastic deformation is particularly difficult when dealing with large data: the simple display of a single result image, let alone a stack of images, rapidly becomes impossible with classical tools. In our study, we performed no fewer than 5,700 elastic registrations (and as many rigid and affine registrations). The parameters we chose allowed us to guarantee robust behavior and to obtain significantly improved results compared to the manual and affine alignment methods. However, automated parameter tuning in a cluster environment would probably result in transformations flexible enough to capture and correct the finest deformations. The volume we reconstructed in this work was about 30 × 30 × 3.5 μm³, which is promising but still small for the purpose of understanding the three-dimensional organization of neuronal connectivity. Future efforts will include using our scheme on even larger images and longer series, and developing other processing methods, like automated segmentation, designed to operate on sub-domains individually with consistent global results.
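As an illustration of the out-of-core processing style this paragraph calls for, the following sketch uses NumPy's memmap so that only one tile of a large raw image is ever resident in memory; the file name, image size, tile size, and placeholder operation are all hypothetical (in the paper's pipeline, ImageMagick plays this role for standard image formats).

```python
import numpy as np

H, W = 8192, 8192                     # illustrative size; real slices are far larger
img = np.memmap('large_slice.raw', dtype=np.uint8, mode='w+', shape=(H, W))

tile = 2048
for r in range(0, H, tile):
    for c in range(0, W, tile):
        block = np.asarray(img[r:r + tile, c:c + tile])   # only this tile in RAM
        # ... downsample, resample with the local B-spline, write the tile out ...
        img[r:r + tile, c:c + tile] = block // 2          # placeholder operation
```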
Acknowledgements. This investigation was supported by The Connectome Project in the Center for Brain Science, Harvard University, and in part by NSF ITR 0426558, a research grant from CIMIT, grant RG 3478A2/2 from the NMSS, and by NIH grants R03 CA126466, R01 RR021885 and P30 HD018655.
References
1. Hibbard, L., Hawkins, R.: Objective image alignment for three-dimensional reconstruction of digital autoradiograms. J. Neurosci. Methods 26(1), 55–74 (1988)
2. Ourselin, S., Roche, A., Subsol, G., Pennec, X., Ayache, N.: Reconstructing a 3D structure from serial histological sections. Image and Vision Computing 19(1-2), 25–31 (2001)
3. Kim, B., Boes, J., Frey, K., Meyer, C.: Mutual information for automated unwarping of rat brain autoradiographs. NeuroImage 5(1), 31–40 (1997)
4. Schormann, T., Zilles, K.: Three-dimensional linear and nonlinear transformations: An integration of light microscopical and MRI data. Human Brain Mapping 6, 339–347 (1998)
5. Chakravarty, M., Bertrand, G., Hodge, C., Sadikot, A., Collins, D.: The creation of a brain atlas for image guided neurosurgery using serial histological data. NeuroImage 30(2), 359–376 (2006)
6. Kremer, J., Mastronarde, D., McIntosh, J.: Computer visualization of three-dimensional image data using IMOD. J. Struct. Biol. 116, 71–76 (1996)
7. Koshevoy, P., Tasdizen, T., Whitaker, R., Jones, B., Marc, R.: Assembly of large three-dimensional volumes from serial-section transmission electron microscopy. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 10–17. Springer, Heidelberg (2006)
8. Burt, P., Adelson, E.: A multiresolution spline with application to image mosaics. ACM Transactions on Graphics 2(4), 217–236 (1983)
9. Thévenaz, P., Unser, M.: User-friendly semiautomated assembly of accurate image mosaics in microscopy. Microscopy Research and Technique 70(2), 135–146 (2007)
10. Baronio, A., Zama, F.: A domain decomposition technique for spline image restoration on distributed memory systems. Parallel Computing 22, 101–110 (1996)
11. Mikula, S., Trotts, I., Stone, J., Jones, E.: Internet-enabled high-resolution brain mapping and virtual microscopy. NeuroImage 35 (2006)
12. Kumar, V., Rutt, B., Kurc, T., Catalyurek, U., Saltz, J.: Large image correction and warping in a cluster environment. In: Super Computing, Tampa, Florida, USA. IEEE (2006)
13. Unser, M.: Splines: A perfect fit for signal and image processing. IEEE Signal Processing Magazine 16(6), 22–38 (1999) (IEEE Signal Processing Society's 2000 magazine award)
14. Rueckert, D., Sonoda, L.I., Hayes, C., Hill, D.L., Leach, M.O., Hawkes, D.J.: Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Transactions on Medical Imaging 18(8), 712–721 (1999)
15. Fiala, J.: Reconstruct: a free editor for serial section microscopy. J. Microscopy 218, 52–61 (2005)
16. Dice, L.: Measures of the amount of ecologic association between species. Ecology 26, 297–302 (1945)
Quality-Based Registration and Reconstruction of Optical Tomography Volumes

Wolfgang Wein^{2,1}, Moritz Blume^1, Ulrich Leischner^3, Hans-Ulrich Dodt^3, and Nassir Navab^1

^1 Chair for Computer Aided Medical Procedures (CAMP), Technische Universität München, Germany, {wein,blume,navab}@cs.tum.edu
^2 Imaging & Visualization Department, Siemens Corporate Research, Princeton, NJ, USA, [email protected]
^3 Max Planck Institute of Psychiatry, Munich, Germany, {leischner,dodt}@mpipsykl.mpg.de
Abstract. Ultramicroscopy, a novel optical tomographic imaging modality related to fluorescence microscopy, makes it possible to acquire cross-sectional slices of small, specially prepared biological samples with astounding quality and resolution. However, scattering of the fluorescence light causes the quality to decrease in proportion to the depth of the currently imaged plane. Scattering and the beam thickness of the excitation laser light cause additional image degradation. We perform a physical simulation of the light scattering in order to define a quantitative function of image quality with respect to depth. This allows us to establish 3D volumes of quality information in addition to the image data. Volumes are acquired at different orientations of the sample, hence providing complementary regions of high quality. We propose an algorithm for rigid 3D-3D registration of these volumes incorporating voxel quality information, based on maximizing an adapted linear correlation term. The quality ratio of the images is then used, along with the registration result, to create improved volumes of the imaged object. The methods are applied to acquisitions of a mouse brain and a mouse embryo to create outstanding three-dimensional reconstructions.
1 Introduction
Ultramicroscopy [1] denotes a microscopy technique where the sample is illuminated from the side, perpendicular to the direction of observation (Figure 1). It combines the concept of fluorescence microscopy with a procedure that makes biological tissue transparent [2,3]. The principle of the latter is to replace the water contained in the sample with a liquid of the same refractive index as the proteins and lipids. Scattering effects can therefore be minimized and the transparency of the sample is regained; optical imaging deep inside the biological tissue is then possible. The microscope's focal plane, arbitrarily placed within the sample, is illuminated sideways with an Argon laser. Only the fluorescent light
Fig. 1. Imaging setup of the Ultramicroscopy system (488 nm laser, cylinder lens, slit aperture, sample, and GFP filter (505-555 nm) with a Zeiss Fluar 2.5x/0.12 objective, providing cellular resolution)
(generated by autofluorescence of the tissue) is measured through the microscope with a GFP filter (505-555 nm wavelength range), and stored by a digital camera (resolution 1392 × 1024, 12-bit grayscale). A micropositioning device advances the tray with the sample in steps of 12 μm, hence a stack of slices is recorded. The resulting data is a comprehensive three-dimensional reconstruction with an approximately isotropic voxel size of 10 μm. This imaging modality allows the 3D recording of large biological samples (> 1 mm) with micrometer resolution, where practically no technique existed before. A great number of biological research projects can benefit from it. Due to tissue inhomogeneity, the fluorescent light is still scattered to some extent while passing through the substance. Hence lower slices suffer from a blurring effect related to the distance that light travels through the object to the microscope. Our approach to overcome this problem is to acquire volumes with different orientations of the sample, while establishing corresponding volumes of quality information at the same time. This quality information is used both for spatial registration of the different recordings and for the reconstruction of improved volumes free of blurring.
2 Quality Function
We want to establish a function

$Q : \Omega \rightarrow [0,1], \quad \Omega \subset \mathbb{R}^3$   (1)

which returns the relative quality at any position in the image space Ω. It is determined by the amount of scattering of the measured light, which in turn depends on the depth that the light travels through the object. Assuming that light is only scattered in the sample and not in the surrounding liquid, we can reduce our problem to computing the amount of scattering with respect
Fig. 2. Results of the simulated scattering and the function of standard deviation per depth (left: cumulated weight vs. distance in μm; right: variance vs. depth in μm)
to tissue depth. This is done using a Monte Carlo simulation of light propagation similar to [4], which we briefly describe in the following. Instead of tracing single photons, we consider photon packets with a certain initial weight for efficiency. We assume that "centers" where both scattering and absorption occur are distributed uniformly throughout the tissue. Photon packets are initialized with the emitting position $x_0 = (0, 0, -z)^T$ and weight $w = 1$. They then repeatedly travel from their current position $x_i$ a certain distance $s_i$ in direction $d_i$, until a scattering and absorption event occurs. The photon absorption obeys the classical attenuation relationship

$N(s) = N_0 e^{-\mu_t s}$   (2)

where $\mu_t$ is the transmission coefficient and $N(s)$ is the number of photons remaining at distance $s$ from an original number $N_0$. An adequate generating function $g(x)$ for the probability variable $s$ from a uniformly distributed variable $X$ is

$g(x) = -\frac{1}{\mu_t} \log(1 - x)$   (3)

The mean free pathlength is $\langle s \rangle = 1/\mu_t$. The scattering in tissue can be characterized by the Henyey-Greenstein phase function, which is a probability density function of the scattering angle, given an anisotropy factor $g$:

$f_{HG}(\phi) = \frac{1 - g^2}{4\pi \left(1 + g^2 - 2g\cos\phi\right)^{3/2}}$   (4)

For our simulation it needs to be transformed to a generating function from a uniformly distributed random variable $X$ as well:

$\cos\phi = \frac{1}{2g}\left[ 1 + g^2 - \left( \frac{1 - g^2}{1 - g + 2gX} \right)^2 \right]$   (5)
In each iteration, the photon position, orientation, and weight are then updated:

$x_{i+1} = x_i + s_i d_i; \quad w_{i+1} = w_i - \frac{\mu_a}{\mu_t} w_i; \quad d_i = (d_x, d_y, d_z)^T$

$d_{i+1} = \begin{pmatrix} \frac{\sin\theta}{\sqrt{1-d_z^2}} (d_x d_z \cos\phi - d_y \sin\phi) + d_x \cos\theta \\ \frac{\sin\theta}{\sqrt{1-d_z^2}} (d_y d_z \cos\phi + d_x \sin\phi) + d_y \cos\theta \\ -\sin\theta \cos\phi \sqrt{1-d_z^2} + d_z \cos\theta \end{pmatrix}$   (6)

The simulation is terminated if the photon packet reaches the top of the object, $x_3 \geq 0$, where $x_i = (x_1, x_2, x_3)^T$. If the weight falls below a threshold $w_i < w_T$, a roulette approach decides whether the photon packet is terminated: with a defined probability $p_t$ the photon packet is discarded, otherwise it is reinserted into the simulation with a new weight $w_0 = w_i / p_t$. This makes sure that energy conservation is not violated. We are interested in the distance between the virtual point where the light seems to come from, assuming a straight line through the image plane, and the point where the simulation was started. The variance of this distance over many photon packets directly relates to the amount of blurring, i.e., our sought-after quality. Figure 2 depicts the deviation results for a simulation at a particular depth, as well as the function of quality versus depth. The latter is approximately a linear relationship, which we accordingly use for assembling volumes of quality information Q(x).
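The following is a compact Python sketch of the photon-packet simulation described by Equations 2-6, under stated assumptions: the optical coefficients are illustrative, the weight update uses the multiplicative form w ← w − w·μa/μt, and the deflection angle sampled from Equation 5 is applied with a uniform azimuth.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_t, mu_a, g = 1.0, 0.1, 0.9          # illustrative optical coefficients
w_thresh, p_term = 1e-3, 0.9           # roulette threshold and kill probability

def scatter_direction(d, cos_dev, psi):
    """Rotate direction d by deflection angle (cos_dev) and azimuth psi, as in Eq. (6)."""
    sin_dev = np.sqrt(1.0 - cos_dev ** 2)
    dx, dy, dz = d
    if abs(dz) > 0.99999:                          # avoid division by zero at the pole
        return np.array([sin_dev * np.cos(psi), sin_dev * np.sin(psi),
                         np.sign(dz) * cos_dev])
    t = sin_dev / np.sqrt(1.0 - dz ** 2)
    return np.array([t * (dx * dz * np.cos(psi) - dy * np.sin(psi)) + dx * cos_dev,
                     t * (dy * dz * np.cos(psi) + dx * np.sin(psi)) + dy * cos_dev,
                     -sin_dev * np.cos(psi) * np.sqrt(1.0 - dz ** 2) + dz * cos_dev])

def trace_packet(depth):
    x = np.array([0.0, 0.0, -depth])               # emitting position (Eq. 2 context)
    d = np.array([0.0, 0.0, 1.0])                  # toward the surface z = 0
    w = 1.0
    while True:
        x = x + (-np.log(rng.random()) / mu_t) * d  # sampled free path, Eq. (3)
        if x[2] >= 0.0:
            return x[0], x[1]                       # exit point at the surface
        w -= w * mu_a / mu_t                        # absorption at the event
        if w < w_thresh:                            # roulette termination
            if rng.random() < p_term:
                return None
            w /= (1.0 - p_term)
        f = (1.0 - g * g) / (1.0 - g + 2.0 * g * rng.random())
        cos_dev = (1.0 + g * g - f * f) / (2.0 * g)  # Henyey-Greenstein, Eq. (5)
        d = scatter_direction(d, cos_dev, 2.0 * np.pi * rng.random())

exits = [p for p in (trace_packet(5.0) for _ in range(2000)) if p is not None]
print("lateral variance:", np.var([e[0] for e in exits]))
```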
3 Quality-Based Registration and Merging
For multiple acquisitions, the preparation is carefully re-oriented within the test tube. No significant deformations occur in this context; however, the coordinate system of the second acquisition has to be mapped onto the first one with very high precision, in order to use the combined information for reconstruction. This alignment is hence performed using an automatic rigid intensity-based registration method [5]. Such methods conduct a non-linear optimization of the transformation parameters, in order to maximize a similarity criterion defined on the voxel intensities of the reference and template volumes R and T, respectively:

$\phi_{reg} = \arg\max_{\phi} S\left(\{(R(x_i), T(\phi(x_i))) \mid x_i \in \Omega_\phi\}\right)$   (7)
where $\{x_i\}$ are all discrete voxel positions of the reference volume, $\phi$ is a 6-DOF rigid transformation, and $\Omega_\phi$ is the volume overlap region for a given $\phi$. We use Normalized Cross-Correlation (NCC) as the similarity criterion, which has been used extensively for registration of 3D volumes arising in medical imaging [5]:

$r_i' = r_i - \bar{r}; \quad t_i' = t_i - \bar{t}; \quad S = \frac{\sum_i r_i' t_i'}{\sqrt{\sum_i r_i'^2 \sum_i t_i'^2}}$   (8)

For all voxels $r_i = R(x_i)$ of the reference volume, the corresponding voxel $t_i = T(\phi(x_i))$ is trilinearly interpolated from the template volume; $\bar{r}$ and $\bar{t}$ are the mean values of all reference and template intensities, respectively.
Fig. 3. Vertical slice of intensity and quality information from the registered brain data: (a) reference volume, (b) registered template volume, (c) reference quality volume, (d) registered template quality
In order to incorporate the voxel quality information $Q_R$ and $Q_T$, we do not need to alter the registration algorithm itself. Only an adapted insertion of the voxel values with a weight $w_i \in [0,1]$ into Equation 8 is needed:

$w_i = Q_R(x_i)\, Q_T(\phi(x_i)); \quad r_i^* = w_i(r_i - \bar{r}); \quad t_i^* = w_i(t_i - \bar{t}); \quad S^* = \frac{\sum_i r_i^* t_i^*}{\sqrt{\sum_i r_i^{*2} \sum_i t_i^{*2}}}$   (9)
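A minimal NumPy sketch of the WNCC of Equation 9, assuming the reference and (resampled) template volumes and their quality volumes are given as arrays of equal shape:

```python
import numpy as np

def wncc(ref, tmpl, q_ref, q_tmpl):
    """Weighted normalized cross-correlation of Eq. (9) over the overlap region."""
    w = q_ref * q_tmpl                 # per-voxel weight w_i = Q_R * Q_T
    r = w * (ref - ref.mean())
    t = w * (tmpl - tmpl.mean())
    return (r * t).sum() / np.sqrt((r * r).sum() * (t * t).sum())
```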
Using this weighting, voxels with high quality in both volumes affect the individual sums of the NCC equation more. We denote this similarity measure Weighted Normalized Cross-Correlation (WNCC). Note that a simple, approximate alternative is to use a limited joint volume of interest Ω where the quality is sufficiently high in both volumes; in our case this is a manually defined slab from the center slices. However, we would like to provide a general framework for incorporating quality information into registration rather than a quick specialized solution. In addition, the precision and especially the robustness (as large portions of the images have to be omitted) of this center-slab approach were not convincing, as we experienced in an early registration study. Eventually, when the registered datasets are to be combined, the quality information is a prerequisite for a smooth transition. For merging two registered volumes, we consider the quality information in the following way:

$M(x_i) = \frac{R(x_i)\, Q_R(x_i) + T(\phi(x_i))\, Q_T(\phi(x_i))}{Q_R(x_i) + Q_T(\phi(x_i))}$   (10)
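Equation 10 amounts to a per-voxel quality-weighted average; a sketch, with a small epsilon added as an assumption to guard against zero total quality outside both volumes:

```python
import numpy as np

def merge(ref, tmpl_warped, q_ref, q_tmpl_warped, eps=1e-12):
    """Quality-weighted combination of two registered volumes (Eq. 10)."""
    num = ref * q_ref + tmpl_warped * q_tmpl_warped
    den = q_ref + q_tmpl_warped
    return num / np.maximum(den, eps)
```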
Fig. 4. Difference of the reference and template volumes of the brain preparation after registration: (a) vertical slice, NCC; (b) vertical slice, WNCC; (c) horizontal slice, NCC; (d) horizontal slice, WNCC. The background gray value indicates no error; dark and bright regions have larger intensity differences.
4 Results

4.1 Registration Accuracy
An in-vitro preparation of a mouse brain was imaged from the top and from the bottom side (Figure 3). Its total length is 9 mm; the volume was downsampled to a size of 256 × 256 × 189 for registration. Figure 4 shows a vertical and a horizontal difference slice for the two registration methods. The standard method results in larger errors at all borders, and especially in a wrong displacement in the vertical direction, as the blurred regions, located in opposite directions in the two images, are fully considered by the similarity measure. The robustness of the registration was assessed with a randomized study: 236 registration computations were executed with initial transformations randomly displaced by up to 1 mm and 6° from the manually defined starting estimate. Both methods perform equally stably; the standard deviation of the resulting translational parameters is 6.4 μm, which corresponds to the parameter termination criterion of the Hill-Climbing optimizer used. The mean translations of the two methods are displaced by 0.1 mm from each other (Figure 5). This confirms the systematic bias due to blurring in opposite directions, which is compensated for by the weighted method.
Fig. 5. Translation vectors (x, y, z components) of repeated registrations from randomly displaced starting estimates. Blue = NCC, Red = WNCC.
Fig. 6. (a) and (b) show the intensity and quality of a slice in the center of one of the embryo volumes. A volume rendering (VRT) of this volume is shown in (c); the red line indicates the approximate location of the slices (a) and (b). A volume rendering of the reconstruction result from two flipped acquisitions of a whole mouse embryo is depicted in (d).
4.2 Merging
Figure 6 shows the result of merging two volumes of a mouse embryo; the preparation was flipped sideways (approx. 180°) between the two acquisitions. Precise image registration is crucial, as the resulting voxels are taken from both volumes. Each single data set is heavily blurred on one side, while the final reconstruction in Figure 6(d) depicts very sharp and detailed features throughout the whole volume without any visible reconstruction artifacts.
5 Conclusion
We presented an algorithm to reduce artifacts arising in a novel optical tomographic imaging modality. Depth-wise degradation of image quality can be overcome by registering multiple volumetric acquisitions. A physical simulation of the light scattering in the object allows us to derive additional volumes of relative voxel quality information. These are used both in an adapted registration algorithm and for weighting multiple intensities during merging of the volumes. We believe that this straightforward extension can be easily applied to other modalities where quality-related information is available. We demonstrated the increased precision of our quality-based registration on an optical tomography volume. The subsequent merging of registered data produces continuously high quality throughout the whole image space. The results are three-dimensional reconstructions of in-vitro biological tissue samples, with a resolution and quality which, to our knowledge, have never been achieved before.
References
1. Dodt, H.U., Leischner, U., Schierloh, A., Jährling, N., Mauch, C., Deininger, K., Deussing, J., Eder, M., Zieglgänsberger, W., Becker, K.: Ultramicroscopy: three-dimensional visualization of neuronal networks in the whole mouse brain. Nature Methods 4, 331–336 (2007)
2. Spalteholz, W.: Über das Durchsichtigmachen von menschlichen und tierischen Präparaten und seine theoretischen Bedingungen. 2nd extended edn. S. Hirzel, Leipzig (1914)
3. Williams, D.J.: The history of Werner Spalteholz's Handatlas der Anatomie des Menschen. Journal of Audiovisual Media in Medicine 22, 164–170 (1999)
4. Prahl, S.A., Keijzer, M., Jacques, S.L., Welch, A.J.: A Monte Carlo Model of Light Propagation in Tissue. SPIE Institute Series 5 (1989)
5. Maintz, J., Viergever, M.: A survey of medical image registration. Medical Image Analysis 2, 1–36 (1998)
Simultaneous Segmentation, Kinetic Parameter Estimation, and Uncertainty Visualization of Dynamic PET Images

Ahmed Saad^{1,2}, Ben Smith^{1,2}, Ghassan Hamarneh^1, and Torsten Möller^2

^1 Medical Image Analysis Lab, ^2 Graphics, Usability, and Visualization Lab, School of Computing Science, Simon Fraser University, Canada
{aasaad, brsmith, hamarneh, torsten}@cs.sfu.ca
Abstract. We develop a segmentation technique for dynamic PET incorporating the physiological parameters of different regions via kinetic modeling. We demonstrate the usefulness of our technique on fifteen [11C]Raclopride simulated PET images. We show qualitatively and quantitatively that the physiologically based algorithm outperforms two classical segmentation techniques. Further, we derive a formula to compute and visualize the uncertainty encountered during the segmentation.
1 Introduction
Positron emission tomography (PET) is a functional imaging technique that measures the local concentration of a radioactive tracer inside the body. In dynamic PET imaging, a series of 3D images is reconstructed from list-mode data obtained by gamma coincidence detectors. Kinetic modeling is the process of applying mathematical models to analyze the temporal tracer activity, in order to extract clinically or experimentally relevant information [1]. A set of kinetic parameters, resulting from the solution of the inverse problem described by the kinetic model, can describe the tracer behavior in a homogeneous region of a tissue, such as the myocardium, or quantify the densities of the neuroreceptors in the brain. Traditionally, the kinetic modeling process begins by marking a region of interest (ROI) around different functional regions. The PET activity is then averaged over the ROI at each time frame, and a single set of kinetic parameters is estimated by fitting a single kinetic model to the time sequence of average activities. ROI delineation of functional regions is a tedious, time-consuming, and error-prone task. Further, these delineations may vary depending on the quality of the PET image data and suffer from inter- and intra-operator variability, which may lead to inaccuracies in the estimated average time activity curve (TAC) and the estimated kinetic parameters. Several approaches have been proposed to identify various functional regions in dynamic PET images. Barber extracted the principal components of a gamma camera dynamic study followed by factor analysis to identify the fundamental functional changes [2]. Di Paola et al. extended Barber's work by applying
oblique rotation to the loading factors in order to make them physiologically relevant [3]. Lin et al. used a Markov random field (MRF) model to differentiate cancerous from normal tissues; they calculated the diagnostic hypoxia fraction and applied spatial constraints to reduce the effect of noise in 2D images [4]. Guo et al. utilized the activity histogram of the last frame, or the time-integration of TACs, in order to remove inactive TACs prior to performing hierarchical TAC clustering [5]. Liptrot et al. used cluster analysis in order to extract the blood vessel TAC as an alternative to blood sampling [6]. Recently, Kamasak et al. simultaneously clustered and estimated each cluster's TAC directly from the projection (sinogram) data, without the need for tomographic reconstruction [7]. All the previous methods consider the PET segmentation and the kinetic parameter estimation as two independent processes although, in reality, they are very tightly coupled. In other words, the dynamic PET data is first clustered into different homogeneous regions and then the kinetic modeling is performed based on the average TAC for each region. Ideally, the segmentation of the dynamic PET data into different functional regions should be based on the physiological processes underlying each region. However, since the physiological processes are captured via the kinetic parameters, and the estimation of kinetic parameters relies on first defining the regions, this creates a dilemma: accurate kinetic parameter estimation requires a segmentation; ideal segmentation requires knowledge of the underlying physiological parameters. In this paper, we simultaneously segment the PET data and estimate the physiological kinetic parameters in each region. An iterative procedure allows us to deal with the dilemma stated above, where, upon convergence, the segmentation will indeed be based on the underlying kinetic parameters and the kinetic parameters will model the physiological process of each segment. Further, an uncertainty visualization technique is presented to validate the segmentation process. We first briefly review kinetic parameter modeling (Section 2) and the basic building blocks for clustering TAC data (K-means, MAP-MRF) (Section 3). We then detail our approach for simultaneous PET segmentation and kinetic parameter estimation (Section 4), followed by the uncertainty visualization technique (Section 5). In Section 6, we describe the dataset used in our experiment. We validate our approach and compare it to other standard approaches in Section 7. In Section 8, we conclude with a discussion and future directions.
2 Kinetic Modeling
The PET tracer kinetics can be described using compartmental models [1]. Each compartment describes a different state of the tracer molecule inside the body. In brain PET imaging, the tracer molecule is injected through the blood and then binds to brain receptors specific to this tracer, or metabolizes inside the tissue. Once tracer paths between compartments are specified, the mass balances between compartments are modelled using a set of ordinary differential equations (ODEs). The ODEs for a corresponding two-tissue compartmental model are described in Figure 1.
Fig. 1. Two-tissue compartmental model (Cp → Cf → Cb, with rate constants K1, k2 between Cp and Cf, and k3, k4 between Cf and Cb), described by two ODEs: dCb(t)/dt = k3 Cf(t) − k4 Cb(t); dCf(t)/dt = K1 Cp(t) − k2 Cf(t) − k3 Cf(t) + k4 Cb(t)
Cp, Cf and Cb are the plasma, intracerebral non-displaceable, and specifically bound receptor compartments, respectively. K1 is the delivery rate constant (mL/min/g), and k2, k3 and k4 are the first-order kinetic rate constants (min⁻¹). Since there is no radioactivity prior to scanning, the initial conditions Cf(0) = Cb(0) = 0 are used. The estimated kinetic parameters (K1 and ks, s ∈ {2, 3, 4}) are the solution of the inverse problem capturing the relationship of the output function (Cf(t) + Cb(t)) to the input function (Cp(t)) through the unknown parameters [8]. The most common parameter estimation method is least-squares estimation, which seeks the ks that, when inserted in the model's ODEs, produce the best fit to the measured data. This can be obtained by minimizing the following objective function:

$\sum_{t=1}^{T} \left(\mu(t) - \mu_{fit}(t)\right)^2$   (1)
where T is the number of time-activity measurements at a particular voxel location, μ(t) is the mean TAC for the functional region under consideration (i.e., the mean Cf(t) + Cb(t) for each functional region), and μfit(t) is the model's prediction of the tissue activity. μfit(t) can be estimated using an ODE solver based on numerical differentiation formulas [9].
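As an illustration of this fitting step, the following SciPy sketch integrates the two-tissue model of Figure 1 and minimizes the objective of Equation 1 for a synthetic region TAC; the input function, sampling grid, and rate values are invented for the example, and COMKAT is not used here.

```python
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import least_squares

def model_tac(k, t, cp):
    """Tissue TAC C_f + C_b of the two-tissue model (Fig. 1) for rates k = (K1..k4)."""
    K1, k2, k3, k4 = k
    cp_of = lambda s: np.interp(s, t, cp)            # plasma input function C_p(t)
    def rhs(c, s):
        cf, cb = c
        return [K1 * cp_of(s) - (k2 + k3) * cf + k4 * cb,
                k3 * cf - k4 * cb]
    c = odeint(rhs, [0.0, 0.0], t)                   # C_f(0) = C_b(0) = 0
    return c.sum(axis=1)

t = np.linspace(0.0, 60.0, 26)                       # minutes (illustrative sampling)
cp = np.exp(-t / 5.0) * t                            # synthetic input function
mu = model_tac([0.5, 0.3, 0.1, 0.05], t, cp)         # synthetic "measured" region TAC

# Least-squares fit of Eq. (1): minimize sum_t (mu(t) - mu_fit(t))^2.
fit = least_squares(lambda k: model_tac(k, t, cp) - mu,
                    x0=[0.3, 0.2, 0.05, 0.02], bounds=(0.0, 5.0))
print(fit.x)
```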
3 TAC Clustering: K-Means and MAP-MRF
We base our proposed methods on two basic techniques for segmenting TACs into functional regions: Kd-tree-based K-means (KMN) [10] and Maximum a Posteriori MRF [11]. The KMN algorithm iteratively minimizes the following energy function:

$J_1 = \sum_{i=1}^{N}\sum_{j=1}^{L} \|x_i - \mu_j\|^2$   (2)
where $x_i$ is the TAC at each voxel i (e.g., a time-sampled Cf(t) + Cb(t); the time index t is dropped for clarity), N is the number of voxels, L is the number of classes, and $\mu_j$ is the mean TAC for class j. The initial values of $\mu_j$ are provided by user-defined seeds for the different classes. We use the L2 norm as the similarity metric between TACs. In MRF, besides the image likelihood term in Equation 2, a regularizer for contextual information can be added to discourage assigning different labels to
neighboring pixels. This can be done in the MRF paradigm with the following energy function:

$J_2 = \sum_{i=1}^{N}\sum_{j=1}^{L} \left( \|x_i - \mu_j\|^2 + \beta \sum_{r=1}^{R} \lambda_r(x_i, x_r) \right)$   (3)
where xr is a TAC neighboring xi and R is the size of the neighborhood set. The parameter β determines the influence of the regularizer. λ is a binary function, returning zero if its arguments belong to the same class and one otherwise.
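A sketch of evaluating the energy of Equation 3 for a given labeling; as an assumption made for the example, the data term is read as the distance of each voxel to its assigned class mean.

```python
import numpy as np

def j2(labels, tacs, means, neighbors, beta):
    """Evaluate the MRF energy of Eq. (3) for a given labeling.

    labels    : (N,) cluster index per voxel
    tacs      : (N, T) time-activity curves
    means     : (L, T) class mean TACs
    neighbors : list of index arrays; neighbors[i] = voxels adjacent to voxel i
    """
    data = ((tacs - means[labels]) ** 2).sum()                     # likelihood term
    smooth = sum((labels[nb] != labels[i]).sum()                   # lambda_r counts
                 for i, nb in enumerate(neighbors))
    return data + beta * smooth
```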
4 Simultaneous Segmentation and Estimation of the Kinetic Parameters
By applying the kinetic modeling process to the means $\mu_j$ of all L classes prior to solving Equation 2 or 3, we ensure that the resulting means capture the physiological phenomena under consideration, and not merely the observed TACs from the PET data. To this end, we replace Equations 2 and 3 with two new objective functions, Equations 4 and 5, respectively, which take the kinetic model into consideration:

$J_3 = \sum_{i=1}^{N}\sum_{j=1}^{L} \|x_i - \mu_{fit_j}\|^2$   (4)

$J_4 = \sum_{i=1}^{N}\sum_{j=1}^{L} \left( \|x_i - \mu_{fit_j}\|^2 + \beta \sum_{r=1}^{R} \lambda_r(x_i, x_r) \right)$   (5)
$\mu_{fit}$ is the activity TAC produced by solving the kinetic model for each region's mean activity μ using Equation 1. The ks, resulting from the solution of the model's ODEs, are the kinetic parameters for each region. The original versions of the KMN and MRF algorithms (Equations 2 and 3) are extended to KMN-KM and MRF-KM (Equations 4 and 5), where the suffix 'KM' indicates the incorporation of the kinetic model. The algorithm is summarized in Alg. 1.
5 Uncertainty Visualization
In order to qualitatively evaluate our method, we visualize the uncertainty encountered during the segmentation. We highlight regions where the measured TAC is dissimilar from the kinetic model-based activity. The result of the segmentation technique is the assignment of each TAC $x_i$ to a certain cluster j, each of which has a class mean $\mu_j$. For each $x_i$, we consider $\epsilon_{ij}$ as a measure of uncertainty:

$\epsilon_{ij} = \frac{d_{ij}}{1/L}$   (6)

$d_{ij} = \frac{\|x_i - \mu_j\|^2}{\sum_{l=1}^{L} \|x_i - \mu_l\|^2}$   (7)
Algorithm 1. KMN-KM and MRF-KM
  Initialize μ_j, j = 1..L.
  μ_fit_j ⇐ find the ks producing the μ_fit_j closest to μ_j in the LS sense (Equation 1).
  repeat
    new labels for each x_i ⇐ apply KMN-KM (Equation 4) or MRF-KM (Equation 5) ∀ x_i.
    new μ_j, j = 1..L ⇐ recalculate each region's mean activity with the new labeling.
    μ_fit_j ⇐ find the ks producing the μ_fit_j closest to μ_j in the LS sense (Equation 1).
  until convergence (no significant change in the μ_fit_j)
  Report the final values k_s_j, s = 1..4 and j = 1..L.
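A schematic Python rendering of Algorithm 1; fit_kinetic stands in for the least-squares kinetic fit of Equation 1, and a fixed iteration count replaces the convergence test, both assumptions made for brevity.

```python
import numpy as np

def kmn_km(tacs, mu, fit_kinetic, n_iter=10):
    """Sketch of Algorithm 1 (KMN-KM). fit_kinetic(mu_j) is assumed to return
    the model-fitted TAC closest to mu_j in the LS sense (Eq. 1)."""
    L = len(mu)
    mu_fit = np.array([fit_kinetic(m) for m in mu])
    for _ in range(n_iter):
        # label each voxel by the nearest model-fitted mean TAC (Eq. 4)
        d = ((tacs[:, None, :] - mu_fit[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # recompute region means and re-fit the kinetic model
        mu = np.array([tacs[labels == j].mean(axis=0) if (labels == j).any()
                       else mu_fit[j] for j in range(L)])
        mu_fit = np.array([fit_kinetic(m) for m in mu])
    return labels, mu_fit
```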
This uncertainty measure $\epsilon_{ij}$ satisfies the two requirements:
- if $x_i$ is 100% certain to belong to class j with mean $\mu_j$, then we expect $x_i = \mu_j$ and therefore $d_{ij} = 0$, which gives $\epsilon_{ij} = 0$, i.e., no uncertainty, as desired;
- if $x_i$ has equal probability of belonging to all the clusters, we would expect the distances $\|x_i - \mu_j\|$ to be equal, independent of the cluster j. Therefore $d_{ij} = 1/L$, which gives $\epsilon_{ij} = 1$, i.e., completely uncertain, as desired.
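The uncertainty of Equations 6-7 is straightforward to vectorize; a minimal sketch (it assumes at least one nonzero distance per voxel):

```python
import numpy as np

def uncertainty(tacs, means):
    """Per-voxel uncertainty of Eqs. (6)-(7): eps_ij = d_ij / (1/L) = L * d_ij."""
    L = means.shape[0]
    d2 = ((tacs[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)   # (N, L)
    d = d2 / d2.sum(axis=1, keepdims=True)
    return L * d     # eps[i, j]: 0 = fully certain, 1 = maximally uncertain
```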
6 Materials and Implementation
We used the publicly available simulated PET dataset PET-SORTEO [12], which provides the ground truth labeling for each TAC. Fifteen [11C]Raclopride simulated PET brain studies accounting for inter-subject anatomical variability were used. Each data set has dimensions of 128 × 128 × 63 with 26 time steps (6 at a 30 s interval, 7 at a 60 s interval, 5 at a 120 s interval, and finally 8 at a 300 s interval) and a voxel size of 2.11 × 2.11 × 2.42 mm³. Each dataset is clustered into 6 different regions (i.e., L = 6): background (BG), skull (SK), grey matter (GM), white matter (WM), cerebellum (CM), and putamen (PN). We used a two-tissue compartmental model for its suitability for the modeling of [11C]Raclopride datasets [13]. The implementation of our algorithms relies on the Insight Toolkit (ITK, http://www.itk.org/). The kinetic modeling process is based on the compartment model kinetic analysis tool (COMKAT) [9].
7 Results and Discussion
Figure 2 (left) shows the input function simulated with PET-SORTEO, which represents Cp(t) in Figure 1. Figure 2 (middle) shows the average TAC for each region calculated using the ground truth labeling. Figure 2 (right) shows the model-fitted average TAC for each region after applying the kinetic modeling process to the ground truth. The difference between Figure 2 (middle) and Figure 2 (right) is due to the physiological assumptions about the interaction between the tracer and the different tissues in the kinetic modeling process.

Fig. 2. Left: Input function. Middle: Average TAC for each functional region in the ground truth. Right: Average TAC for each functional region after applying the kinetic modeling process to the ground truth.

We used the same initial seeds for each functional region and the same β matrix in Equations 3 and 5 when evaluating each of the four methods KMN, MRF, KMN-KM and MRF-KM. Figure 3 shows a qualitative comparison between the four algorithms for slice #40 of patient01. It can be clearly seen that the WM is totally missing in the KMN and MRF results, but KMN-KM and MRF-KM were able to capture it. Further, we see that the PN is totally misclassified as WM, but it is better captured using the physiological information in KMN-KM and MRF-KM. In order to compare the algorithms quantitatively over the whole volume, we applied the Dice metric [14] to measure the overlap between different regions, as shown in Figure 4. It shows how the KMN-KM and MRF-KM algorithms outperform the classical algorithms KMN and MRF, especially in the WM and PN regions. The BG region is excluded from Figure 4 in order to compare the remaining active regions, as all the algorithms perform quite well in capturing that region, as seen in Figure 3. It can be seen from the Dice metric that the physiologically based algorithms can capture the PN region, but the value is still low (∼0.21). The reason can be shown using the uncertainty visualization technique. Figure 5 (left) shows the result of applying an uncertainty threshold value of ε = 0.5 to the ground truth labeling after normalizing ε for each region between 0 and 1. Voxels with ε ≤ 0.5 are selected as "certain" and are rendered in normal colors (as in Figure 3); the remaining "uncertain" voxels are rendered in black. The area around the PN region in the ground truth image is completely uncertain, which explains why it is completely misclassified by the classical algorithms and partially misclassified by the physiologically based algorithms. The uncertainty visualization guides the user to regions of high uncertainty in the segmentation process, as shown in Figure 5 (right), which shows the uncertainty image of the KMN-KM algorithm with 50% uncertainty. Doctors are then able to intervene, utilizing their expert knowledge, to provide corrections and iteratively improve and validate the results. The uncertainty metric failed to explain why there is higher confidence in the PN voxels in the main arteries region in our segmentation result
Fig. 3. Comparison between the four algorithms after 10 iterations, from left to right: slice #40 from the axial view of the last time frame of the original PET dataset, ground truth labeling, KMN, MRF, KMN-KM, MRF-KM
Fig. 4. Performance evaluation between the four algorithms using the Dice metric (Dice coefficient per functional region: SK, GM, WM, CM, PN)
Fig. 5. Uncertainty visualization. Left: Uncertainty image for the ground truth with 50% uncertainty showing complete uncertainty around the PN. Right: Uncertainty image for the KMN-KM algorithm with 50% uncertainty. The background voxels are colored in white to emphasize the uncertainty values in black.
in Figure 5 (right). This results from the fact that the PN and the main arteries have very similar activities under the TAC L2 distance used.
8 Conclusion and Future Work
In this paper, we showed qualitatively and quantitatively that incorporating the physiological model that describes the kinetics of the radioactive tracer into the segmentation technique produces better results than the classical segmentation techniques. We also showed how we can visualize the uncertainty encountered during the segmentation process. This provides an efficient way to incorporate user interaction and to validate the segmentation results. Our algorithm depends on the presence of the input function, which describes the amount of tracer in the plasma, to fully solve the kinetic modeling process. In clinical settings, invasive methods of extracting the input function are the gold standard [1]. We plan to include non-invasive kinetic models in
the segmentation technique. Further, we need to investigate the performance of our algorithm on real datasets with different noise levels. Our uncertainty visualization technique needs to be incorporated into an efficient user interaction model for editing the segmentation results, instead of the standard relabeling on a per-voxel basis.

Acknowledgements. This work has been supported in part by NSERC.
References
1. Morris, E.D., Endres, C.J., Schmidt, K.C., Christian, B.T.: Kinetic modeling in Positron Emission Tomography. In: Wernick, M., Aarsvold, J.N. (eds.) Emission Tomography: The Fundamentals of PET and SPECT. Academic, San Diego (2004)
2. Barber, D.C.: The use of principal components in the quantitative analysis of gamma camera dynamic studies. Phys. Med. Biol. 25, 283–292 (1980)
3. Di Paola, R., Bazin, J.P., Aubry, F., Aurengo, A., Cavailloles, F., Herry, J.Y., Kahn, E.: Handling of dynamic sequences in nuclear medicine. IEEE Trans. Nucl. Sci. NS-29, 1310–1321 (1982)
4. Lin, K.P., Lou, S.L., Yu, C.L., Chung, B.T., Wu, L.C., Liu, R.S.: Markov random field method for dynamic PET image segmentation. In: Proceedings of the SPIE Image Processing, pp. 1198–1204 (1998)
5. Guo, H., Renaut, R., Chen, K., Reiman, E.: Clustering huge data sets for parametric PET imaging. Journal of Biosystems 71, 81–92 (2001)
6. Liptrot, M., Adams, K., Martiny, L., Pinborg, L., Lonsdale, M., Olsen, N., Holm, S., Svarer, C., Knudsen, G.M.: Cluster analysis in kinetic modelling of the brain: a noninvasive alternative to arterial sampling. NeuroImage 21(2), 483–493 (2004)
7. Kamasak, M.E., Bayraktar, B.: Unsupervised clustering of dynamic PET images on the projection domain. In: Proceedings of the SPIE Medical Imaging, pp. 1539–1548 (2006)
8. Carson, R.E.: Tracer kinetic modeling in PET. In: Bailey, D.L., Townsend, D.W., Walk, P.E., Maisey, M.N. (eds.) Positron Emission Tomography, Basic Sciences. Springer, Heidelberg (2005)
9. Muzic, R., Cornelius, S.: COMKAT: compartment model kinetic analysis tool. J. Nucl. Med. 42(4), 636–645 (2001)
10. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An efficient k-means clustering algorithm: Analysis and implementation. IEEE PAMI 24, 881–892 (2002)
11. Besag, J.: On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society 48, 259–302 (1986)
12. Reilhac, A., Batan, G., Michel, C., Grova, C., Tohka, J., Costes, N., Evans, A.C.: Validation of PET SORTEO: a platform for simulating realistic PET studies and development of a database of simulated PET volumes. IEEE Trans. Nucl. Sci. 52, 1321–1328 (2004)
13. Lammertsma, A., Bench, C., Hume, S., Osman, S., Gunn, K., Brooks, D., Frackowiak, R.: Comparison of methods for analysis of clinical [11C]raclopride studies. J. Cereb. Blood Flow Metab. 16, 42–52 (1996)
14. Dice, L.R.: Measures of the amount of ecologic association between species. Ecology 26, 297–302 (1945)
Nonlinear Analysis of BOLD Signal: Biophysical Modeling, Physiological States, and Functional Activation

Zhenghui Hu and Pengcheng Shi

Medical Image Computing Group, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
{eezhhu,eeship}@ust.hk
Abstract. There is increasing interest in exploiting biophysically plausible models to investigate the physiological mechanisms that underlie the observed BOLD response. However, most existing studies do not produce reliable model parameter estimates, are not robust because they linearize the nonlinear model, and do not perform statistical tests to detect functional activation. To overcome these limitations, we developed a general framework for the analysis of fMRI data based on nonlinear physiological models. It performs system dynamics analysis to gain meaningful insight, followed by global sensitivity analysis for model reduction, which leads to better system identifiability. Subsequently, a nonlinear filter is used to simultaneously estimate the state and the parameters of the dynamic system, and a statistical test is performed to derive activation maps based on this model. Furthermore, we investigate how the activation maps of these hidden physiological variables change with the experimental paradigm through time as well.
1 Introduction
A major problem in the interpretation of the fMRI BOLD signal is that the measurements are only indirectly related to the neural activity and interregional interactions from which they derive. Most current approaches to fMRI analysis use linear convolution models that relate experimentally designed inputs, through an empirical hemodynamic response function (hrf), to observed BOLD signals. Such approaches, however, are blind to the mechanisms that underlie physiological changes, while it is important to have a quantitative understanding of those factors that are more directly related to the neural activity, such as changes in flow, oxygen extraction, blood volume, and their combined effects. These physically meaningful measures are needed to clarify the relationship between neural activation and the experimental paradigm, and the significance of the observed transients in the BOLD signal. The Balloon Model has been developed as a comprehensive biophysical model of hemodynamic modulation, and provides a possible platform to understand the changes of physiological variables during brain activation [1]. It combines
the coupling mechanisms of manifold physiological variables, and has successfully simulated pronounced transients in the BOLD signal, including initial dips, overshoots and a prolonged post-stimulus undershoot. This model has been extended to include the relationship between evoked neural activity and blood flow [2], and transformed into a nonlinear state-space representation [3]. There have been several attempts to apply such a biophysical model to the analysis of fMRI data, with some limitations. First, the typical fMRI response to brief neuronal events lasts 12-16 s (with 6-8 observations), while the balloon model includes six unknown constant hemodynamic parameters and a neural parameter varying from trial to trial. It is thus difficult to estimate all seven variables at the same time (at worst, the model can be underdetermined if one has fewer than seven image volumes). It is therefore of great benefit to reasonably reduce the number of parameters to be estimated in order to increase the system identifiability. Some of the existing efforts disregard this concern and attempt to estimate all parameters [2] [4] [3], while others try to reduce the parameters through a regional linear analysis [5] or arbitrary assignment [6]. Secondly, the hrf typically possesses strong nonlinear characteristics. Linearized approximation methods [2] [4] [3] [5] are only reliable when the time scale is discretized finely enough that the system is almost linear, and they can result in unstable estimates if the assumption of local linearity is violated. A nonlinear estimation algorithm should thus be considered a natural choice over linearized approximation approaches. Thirdly, neuroimaging has been concerned predominantly with the localization of function. Most existing studies assume that the locus of a functional activation is known and determined by other means; the estimation is then limited to a known activated voxel, rather than building a statistic to detect activation [2] [3] [6]. To overcome these shortcomings, we present a general framework for using the Balloon model in the analysis of fMRI data. Firstly, we propose a quantitative model reduction method, relying on a global, variance-based sensitivity analysis (SA) method, to assess the relative importance of the system parameters in the Balloon model. Subsequently, we present a nonlinear filter approach for joint state and parameter estimation of the nonlinear dynamic system and provide a corresponding statistical test to derive activation maps based on this model. Furthermore, thanks to this estimation process, we provide a possible approach for detecting how the activation maps of these hidden physiological variables change with the experimental paradigm through time as well.
2 Hemodynamics Balloon Model
The Balloon model consists of three subsystems linking: (1) neural activity to changes in flow; (2) changes in flow to changes in blood volume and venous outflow; and (3) changes in flow, volume and oxygen extraction fraction to changes in deoxyhemoglobin (dHb). It describes the dynamic intertwinement between
the blood flow f, the venous blood volume v and the venous dHb content q, and can be given as follows [1]:

$$\begin{cases} \ddot{f} = \epsilon u(t) - \dfrac{\dot{f}}{\tau_s} - \dfrac{f - 1}{\tau_f} \\ \dot{v} = \dfrac{1}{\tau_0}\left(f - v^{1/\alpha}\right) \\ \dot{q} = \dfrac{1}{\tau_0}\left(f \, \dfrac{1 - (1 - E_0)^{1/f}}{E_0} - v^{1/\alpha} \dfrac{q}{v}\right) \end{cases}$$   (1)
v ∼ N (0, Q)
(3)
w ∼ N (0, R)
(4)
where f and h are nonlinear equations, x(t) = [s(f˙), f, v, q] is the state of the system, β = {, τs , τf , τ0 , α, E0 , V0 } ∈ Rl is system parameters, the neuronal inputs u represents system input, v is the noise process caused by disturbances and modeling errors, y is the observation vector, and w is measurement noise. Eqs (3) and (4) constitute a state-space representation of fMRI BOLD responses to given stimulation T
3
Dynamics in State Space
A state space (x(t) = [s(f˙), f, v, q]T ) and a rule (Eqs. 1.) for following the evolution of trajectories starting at various initial conditions constitute a dynamical system. It is interesting to build some intuition for its dynamics. The fixed point of the system evolution can be found, by setting the four ˙ x=x0 = f (x0 ) = 0. Thus, we have a equilibrium state, x0 = time derivatives x| 1/(τf u0 +1)
[0, τf u0 + 1, (τf u0 + 1)α , (τf u0 + 1)α 1−(1−E0 )E0 ]T . The nature of the fixed point is determined by the characteristic values of the Jacobian matrix of
Nonlinear Analysis of BOLD Signal 1
737
1.6
0.9
1.4
0.8
Main effects ( Si )
0.6
0.5
0.4
0.3
0.2
0.1
0
0
2
4
6
8
10
12
14
16
18
20
1.2
Total effects ( STi )
V0 E0 α τ0 τf τs
0.7
1
0.8
0.6
0.4
0.2
0
0
2
4
6
8
Time
10
12
14
16
18
20
Time
Fig. 1. Area plot of the main effects (left) and total effects (right) sensitivity indices (obtained with N (d + 2) simulations, where N = 200, d is the dimensionality of parameters) to a 2s stimulated stimulus for the balloon model
partial derivatives evaluated at the fixed point. The Jacobian matrix for the set of equations is defined as:

$$J = \begin{pmatrix} -\frac{1}{\tau_s} & -\frac{1}{\tau_f} & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & \frac{1}{\tau_0} & -\frac{v^{1/\alpha-1}}{\alpha\tau_0} & 0 \\ 0 & \frac{1}{\tau_0}\left(\frac{1-(1-E_0)^{1/f}}{E_0} - \frac{(1-E_0)^{1/f}\ln(1-E_0)}{f E_0}\right) & \frac{q}{\tau_0}\left(1-\frac{1}{\alpha}\right)v^{1/\alpha-2} & -\frac{v^{1/\alpha-1}}{\tau_0} \end{pmatrix} \qquad (5)$$

Its eigenvalues $\lambda$ evaluated at the fixed point $x_0$ are:
$$\lambda = \left\{ \frac{-\frac{1}{\tau_s} + \sqrt{\frac{1}{\tau_s^2} - \frac{4}{\tau_f}}}{2},\quad \frac{-\frac{1}{\tau_s} - \sqrt{\frac{1}{\tau_s^2} - \frac{4}{\tau_f}}}{2},\quad -\frac{(\tau_f u_0 + 1)^{1-\alpha}}{\alpha\tau_0},\quad -\frac{(\tau_f u_0 + 1)^{1-\alpha}}{\tau_0} \right\}$$

All $\lambda$ have negative real parts, which dictates that volumes contract along all coordinate directions of the phase space: any phase-space volume shrinks to a point over time. Since the sum of the eigenvalues satisfies $\mathrm{Tr}(J) = \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 < 0$, it is a dissipative system, with no long-term dynamic behavior.
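As a quick numerical check of this fixed-point analysis (our illustration, not part of the original paper), the sketch below evaluates the Jacobian eigenvalues at the fixed point, with the typical parameter values quoted in Section 4 and u0 = 1 assumed:

```python
import numpy as np

# Typical parameter values from Section 4 (assumed here for illustration).
tau_s, tau_f, tau_0, alpha, E0 = 1.54, 2.46, 0.98, 0.33, 0.34
u0 = 1.0

# Fixed point x0 for constant input u0, following the expression above.
f0 = tau_f * u0 + 1
v0 = f0 ** alpha
q0 = v0 * (1 - (1 - E0) ** (1 / f0)) / E0

# Jacobian of Eq. (1), written as four first-order ODEs, at the fixed point.
dq_df = ((1 - (1 - E0) ** (1 / f0)) / E0
         - (1 - E0) ** (1 / f0) * np.log(1 - E0) / (f0 * E0))
J = np.array([
    [-1 / tau_s, -1 / tau_f, 0.0, 0.0],
    [1.0,        0.0,        0.0, 0.0],
    [0.0, 1 / tau_0, -v0 ** (1 / alpha - 1) / (alpha * tau_0), 0.0],
    [0.0, dq_df / tau_0,
     q0 * (1 - 1 / alpha) * v0 ** (1 / alpha - 2) / tau_0,
     -v0 ** (1 / alpha - 1) / tau_0],
])

print(np.linalg.eigvals(J))  # all real parts negative: stable fixed point
print(np.trace(J))           # Tr(J) < 0: dissipative system
```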
4 Model Reduction
We want to investigate the contribution of each parameter to the variation of the model output. A parameter that contributes to the output variance neither singly nor in combination with other parameters can be frozen to any value within its range of variation. This directs us to sensitivity analysis (SA). SA studies how the uncertainty in the output of a model can be apportioned to different sources of uncertainty in the model input [7], and is considered an imperative [8] or a recommendation [9] in any field where models are used.
First, we introduce a global, variance-based SA method [7]. The total output variance V(Y) for a model with d parameters can be decomposed as:

$$V(Y) = \sum_i V_i + \sum_{i<j} V_{ij} + \sum_{i<j<k} V_{ijk} + \cdots + V_{12\ldots d} \qquad (6)$$

where $V_i = V(E(Y|X_i))$, $V_{ij} = V(E(Y|X_i, X_j)) - V_i - V_j$, and so on. These are called partial variances and are mutually orthogonal. The main-effect term for parameter $\beta_i$ can then be defined as:

$$S_i = \frac{V[E(Y|\beta_i)]}{V(Y)} \qquad (7)$$

This measure indicates the relative importance of an individual parameter $\beta_i$ in driving the uncertainty. The total-effect term can be defined as:

$$S_{Ti} = \frac{E[V(Y|\beta_{-i})]}{V(Y)} \qquad (8)$$
where $\beta_{-i}$ denotes all the parameters except $\beta_i$. This measure indicates the sum of all terms in the variance decomposition that include $\beta_i$. Thus, given the input u(t), a typical parameter set $\beta = \{0.54, 1.54, 2.46, 0.98, 0.33, 0.34, 0.02\}$ and uncertainty ranges $\sigma^2 = \{0.1^2, 0.25^2, 0.25^2, 0.25^2, 0.045^2, 0.1^2, 0.005^2\}$ [2], we exploit SA for model simplification. The area plot for the main-effect indices is shown on the left in Figure 1. The parameters with high first-order effects are $V_0$ and $\tau_f$. A high value of $S_i$ indicates a consistent contribution to the model output variance, and flags a good candidate for output uncertainty reduction through new research. Furthermore, we also display the total-effect indices $S_{Ti}$ on the right in Figure 1. This measure indicates the total contribution to the variance of Y due to $\beta_i$, singly or in combination with others; it can thus be employed to identify inessential parameters. We can see that the parameter $\alpha$ is non-influential at any time point. It can therefore be frozen to any value within its range of uncertainty, and we assume $\alpha = 0.33$ in the following parameter estimation. Furthermore, it should be noted that simultaneous estimation of $V_0$ and the other factors is impossible; only their product is identifiable (see Eq. (2)). Thus, we impose a physiologically plausible value $V_0 = 0.02$ throughout the brain.
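To make the N(d + 2)-run scheme behind Figure 1 concrete, here is a minimal sketch (ours, not the authors' code) of standard Saltelli-style estimators for the main- and total-effect indices of Eqs. (7) and (8); `model` is any scalar-output function of the parameter vector, e.g. the BOLD value at one time point:

```python
import numpy as np

def sobol_indices(model, mu, sigma, N=200, seed=0):
    """Main (S_i) and total (S_Ti) effect indices via the Saltelli scheme,
    costing N*(d+2) model runs for d parameters (matching Fig. 1).
    Parameters are drawn independently as N(mu_i, sigma_i^2)."""
    rng = np.random.default_rng(seed)
    d = len(mu)
    A = rng.normal(mu, sigma, size=(N, d))   # first sample matrix
    B = rng.normal(mu, sigma, size=(N, d))   # second sample matrix
    yA = np.array([model(row) for row in A])
    yB = np.array([model(row) for row in B])
    V = np.var(np.concatenate([yA, yB]))
    S, ST = np.zeros(d), np.zeros(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                  # resample only parameter i
        yABi = np.array([model(row) for row in ABi])
        S[i] = np.mean(yB * (yABi - yA)) / V          # main effect, Eq. (7)
        ST[i] = 0.5 * np.mean((yA - yABi) ** 2) / V   # total effect, Eq. (8)
    return S, ST
```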
5 Nonlinear System Identification
We address this nonlinear state-space estimation problem (Eqs. (3), (4)) using the unscented Kalman filter (UKF) [10], so as to retain the nonlinearities present in the biophysical model. The UKF propagates the mean and covariance of the variables through the unscented transformation (UT), and offers high accuracy and robustness for nonlinear models. The UT deterministically chooses a set of weighted sigma points whose first two moments match the prior distribution, and propagates them through the actual nonlinear function. The properties of the transformed
set can then be recalculated from these propagated points. The UT captures the posterior mean and covariance accurately to the 3rd order (of the Taylor series expansion) for any nonlinearity. The standard UKF implementation for state estimation can be found in [11]. In the joint filtering approach (state estimation and parameter identification), the hidden system state and the parameters are concatenated into a single higher-dimensional joint state vector, here $x = \{\dot{f}, f, v, q, \varepsilon, \tau_s, \tau_f, \tau_0, E_0\}^T$; a standard UKF is then run on this joint state space to produce simultaneous estimates of the states and the parameters. Since the differential equations in Eq. (3) are not soluble analytically, we employ a fourth-order Runge-Kutta method to follow the trajectory, with step length h = 0.2 s so that the truncation error involved is sufficiently small. Furthermore, since the initial input is $u_0 = 1$ (Section 7), the states necessarily converge to their equilibrium point $x_0 = [0, \tau_f + 1, (\tau_f + 1)^{\alpha}, \frac{1-(1-E_0)^{1/(\tau_f+1)}}{E_0}(\tau_f + 1)^{\alpha}]^T$ (Section 3); the initial condition was therefore set as $x(0) = [0, 2.328, 1.322, 0.635, 0.54, 1.54, 2.46, 0.98, 0.34]^T$.
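The fourth-order Runge-Kutta discretization used here is standard; the sketch below (our illustration under the notation of Eqs. (1) and (3), with hypothetical function names) integrates the balloon model with the 0.2 s step quoted above:

```python
import numpy as np

def balloon_rhs(x, u, p):
    """Right-hand side of the balloon model, Eq. (1), as four first-order
    ODEs with state x = [s, f, v, q] (s = df/dt) and parameters
    p = (eps, tau_s, tau_f, tau_0, alpha, E0)."""
    eps, tau_s, tau_f, tau_0, alpha, E0 = p
    s, f, v, q = x
    E = 1 - (1 - E0) ** (1 / f)            # oxygen extraction E(f, E0)
    return np.array([
        eps * u - s / tau_s - (f - 1) / tau_f,
        s,
        (f - v ** (1 / alpha)) / tau_0,
        (f * E / E0 - v ** (1 / alpha) * q / v) / tau_0,
    ])

def rk4_step(x, u, p, h=0.2):
    """One classical fourth-order Runge-Kutta step of length h (0.2 s,
    as in the paper), holding the input u constant over the step."""
    k1 = balloon_rhs(x, u, p)
    k2 = balloon_rhs(x + 0.5 * h * k1, u, p)
    k3 = balloon_rhs(x + 0.5 * h * k2, u, p)
    k4 = balloon_rhs(x + h * k3, u, p)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
```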
6 Statistical Test
We want to establish a statistical test to detect activations voxelwise. It is difficult to establish probabilities on individual parameters in such a nonlinear case, since single-factor effects cannot be partitioned, owing to the nonlinear correlation between parameters. However, we can assess the nonlinear model hypotheses and compare models in a hierarchy, where inference is based on an F-statistic. Suppose we have a nonlinear model with parameters $\beta$, and we wish to test $H : h(x, \beta) = 0$. The full model is $y = h(x, \beta) + w$, which reduces to the reduced model $y = \mathrm{const} + w$ when H is true. Denote the residual sums-of-squares for the full and reduced models by $S_f$ and $S_r$, respectively. Under H, $S_f$ and $S_r$ are independent, with $S_f \sim \chi^2_p$ and $S_r \sim \chi^2_q$. The following F-statistic expresses the evidence for H against the alternative $h \neq 0$:

$$F = \frac{S_f/p}{S_r/q} \sim F_{p,q} \qquad (9)$$

that is, the random variable F has Snedecor's F distribution with p and q degrees of freedom. The larger F gets, the more unlikely it is that F was sampled under the null hypothesis H. Significance can then be assessed by comparing this statistic with the appropriate F-distribution.
7 Experiment
Two distinct texture photographs were presented for 2 s each for touch perception, followed by a 14 s rest, starting with the stimulus. A total of 96 acquisitions were made (TR = 2 s), in periods of 16 s, giving 12 16-second cycles. We chose the largest activation blob found with SPM2 as the region of interest.
8 Results and Discussion
Figure 2 shows the estimated behavior of the state functions of the hemodynamic approach for the largest activation blob found with SPM2. All these predictions of
Fig. 2. Time series of the estimated state functions of the hemodynamic response to the touch perception tasks. (a-d): the BOLD signal y (measured signal, red plus signs; filtered estimate, blue line), the blood flow f, the venous blood volume v and the venous dHb content q. Each stimulus event, simulated by a rectangular pulse of width 2 s, is shown as a green strip. The red bounds are given by the square roots of the diagonal of the covariance matrix.
Fig. 3. Activation maps obtained with balloon model and GLM (p < 0.001)
the balloon model concur with the known physiological effects in the fMRI BOLD signal. The values of these parameters are all in the range of previous reports [2] [3]. These physiologically plausible parameter estimates may provide valuable information for evaluating activation. The same joint estimation approach was then applied to all voxels throughout the brain in a voxel-specific fashion to investigate the localization of function, that is, where in the brain a cognitive process of interest is mediated. Figure 3 shows the resulting activation detection overlaid on an anatomical image for the balloon model and the general linear model (GLM). The two methods yielded similar activation maps, but with different maximum activations. In this paper, we developed a general framework for using physiological models in the analysis of fMRI data. It includes dynamical analysis of the system, model reduction, system identification and activation detection. The method can also readily deal with fMRI data under other physiological models and experimental paradigms, including block-design and event-related designs.
References

1. Buxton, R.B., Wong, E.C., Frank, L.R.: Dynamics of blood flow and oxygenation changes during brain activation: The balloon model. Magn. Reson. Med. 39, 855–864 (1998)
2. Friston, K.J., Mechelli, A., Turner, R., Price, C.J.: Nonlinear responses in fMRI: The balloon model, Volterra kernels, and other hemodynamics. NeuroImage 12, 466–477 (2000)
3. Riera, J.J., Watanabe, J., Kazuki, I., Naoki, M., Aubert, E., Ozaki, T., Kawashima, R.: A state-space model of the hemodynamic approach: nonlinear filtering of BOLD signals. NeuroImage 21, 547–567 (2004)
4. Friston, K.J.: Bayesian estimation of dynamical systems: An application to fMRI. NeuroImage 16, 513–530 (2002)
5. Deneux, T., Faugeras, O.: Using nonlinear models in fMRI data analysis: model selection and activation detection. NeuroImage 32, 1669–1689 (2006)
6. Johnston, L.A., Duff, E., Egan, G.F.: Particle filtering for nonlinear BOLD signal analysis. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 292–299. Springer, Heidelberg (2006)
7. Saltelli, A., Ratto, M., Tarantola, S., Campolongo, F.: Sensitivity analysis practices: strategies for model-based inference. Reliability Engineering & System Safety 91, 1109–1125 (2006)
8. EC: European Commission's communication on extended impact assessment, Brussels, COM (2002) 276 final (05/06/2002), http://europa.eu.int/comm/governance/docs/index en.htm
9. EPA: The US Environmental Protection Agency Science Policy Council, white paper on the nature and scope of issues on adoption of model use acceptability guidance (1999), http://www.epa.gov/osp/crem/library/whitepaper 1999.pdf
10. Julier, S.J., Uhlmann, J.K.: Unscented filtering and nonlinear estimation. Proceedings of the IEEE 92, 401–422 (2004)
11. van der Merwe, R., Wan, E.A.: The square-root unscented Kalman filter for state and parameter-estimation. In: ICASSP, vol. 6, pp. 3461–3464 (2001)
Effectiveness of the Finite Impulse Response Model in Content-Based fMRI Image Retrieval

Bing Bai¹, Paul Kantor², and Ali Shokoufandeh³

¹ Department of Computer Science, Rutgers University, [email protected]
² Department of Library and Information Science, Rutgers University, [email protected]
³ Department of Computer Science, Drexel University, [email protected]
Abstract. The thresholded t-map produced by the General Linear Model (GLM) gives an effective summary of activation patterns in functional brain images and is widely used for feature selection in fMRI related classification tasks. As part of a project to build content-based retrieval systems for fMRI images, we have investigated ways to make GLM more adaptive and more robust in dealing with fMRI data from widely differing experiments. In this paper we report on exploration of the Finite Impulse Response model, combined with multiple linear regression, to identify the “locally best Hemodynamic Response Function (HRF) for each voxel” and to simultaneously estimate activation levels corresponding to several stimulus conditions. The goal is to develop a procedure for processing datasets of varying natures. Our experiments show that Finite Impulse Response (FIR) models with a smoothing factor produce better retrieval performance than does the canonical double gamma HRF in terms of retrieval accuracy.
1 Introduction
As a method for watching “how the brain works”, fMRI has become a powerful research tool in many aspects of neuroscience studies in the past decade [1]. More recently, classification of fMRI images, based on similarity between activation patterns, shows promising transition to clinical diagnosis [2,3,4]. These methods usually select features (that is to say, voxels or areas in the brain and their activation levels) and train models to best distinguish uncommon cases from so-called “typical” ones. We investigate content-based indexing of fMRI images. For any “query” fMRI image that is presented, we ask whether we can retrieve images that represent the same or similar cognitive processes (“success”). The potential applications include, but are not limited to, the following: 1) helping doctors to diagnose brain disorders, by looking at the clinical history of persons with similar fMRI
This work is supported by National Science Foundation Grant ITR-0205178. We thank Sven Dickinson, Deborah Silver and Nicu Cornea for significant discussions.
patterns; 2) helping researchers to find similar studies and related research work; 3) helping researchers to discover hidden similarities among superficially different cognitive activities. Our experimental studies are performed in the framework of information retrieval (IR) [5]. This framework is best known for applications such as search engines, which usually have a huge database of documents and images. In an IR framework, as in classification tasks, a dataset is represented in terms of a set of features. However, the IR framework is usually built to retrieve similar datasets from a very large database, in which it is generally difficult to assign class labels to each dataset. In contrast to seeking "class boundaries" optimized for specific classes, IR techniques use a more general "distance" measure. The IR framework is more extensible, and thus is preferable for an anticipated large database of fMRI datasets from miscellaneous sources. In recent years, a number of papers have been published on content-based fMRI retrieval [6,7,8]. These papers present matching methods based on features selected using General Linear Model (GLM) [9] t-maps. In this paper, instead of testing matching methods with given features, we explore possible ways to provide better features. In particular, we find that inaccuracies in the assumed Hemodynamic Response Function (HRF), or in the associated stimulus time series, may increase error in feature selection, and undermine the precision of subsequent processing. For example, in an experimental study of morality and decision-making [10], the subject presses a button when he/she thinks there is a moral issue to be resolved. In the reported analysis of this data, the beginning of the process of "moral reasoning" is set to be 8 seconds before the button is pressed, and the duration of this "stimulus" is set to 16 seconds. This approach works well with the specific method used in [10], but we find it cannot be used in conjunction with the general linear model in other typical settings [7]. In dealing with large heterogeneous data collections, we would not be able to generate either specialized HRFs or stimulus configurations. Instead, we seek an adaptive HRF model, robust in handling cognitive processes with poor time definition, and efficient enough to allow large-scale data processing. The contributions of this paper can be summarized as follows. First, we investigate the smoothing given by the Maximum A Posteriori (MAP) FIR model [11] as an adaptive HRF model for feature selection. This model exhibits better results in our experimental evaluations on real data than does the canonical HRF model. Second, we have extended this MAP FIR model to support multiple stimulus conditions, and propose a bilinear regression approach. This work has potential to be developed in a number of ways, and the preliminary results show that it merits further study.
2 Method

2.1 GLM Based Feature Selection Schemes
In the GLM, observations (the time dependence of the signal at each voxel) y are to be explained by an intermediating variable X as y = Xb. X denotes the
design matrix, and each of its columns is an "Explanatory Variable" (EV), generated by convolving a "condition stimulus" time series with an HRF. A popular choice for the HRF is the so-called "Canonical HRF" [1], represented as the difference of two gamma functions: $H(t) = f(t; 6, 1) - \frac{1}{6} f(t; 16, 1)$, where $f(t; \alpha, \beta) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)} t^{\alpha-1} e^{-t/\beta}$ for $t > 0$. For each voxel, a t-value can be calculated, which indicates the significance of the voxel's activation by the corresponding condition. A 3D image of these t-values for all brain voxels will be referred to as a "t-map". The first step of our feature selection scheme is the construction of t-maps, followed by selection of the subset of voxels with the highest t-values. One straightforward idea is to set a threshold on the t-values themselves, and take all voxels above this threshold as the features. Despite its superficial attractiveness, the large variation of t-values across fMRI images in a large database of diverse experiments makes this mechanism unusable. In our database, for example, the maximum t-value is only about 3 for some experiments, while others can have t-values larger than 10, making it hard to set a reasonable threshold for all experiments. In our experiments, therefore, we uniformly select the 1% of voxels with the most significant t-values. We chose this "magic number" of 1 percent for two reasons: 1) indexing large databases calls for small feature sets, and 2) our main objective is to construct a robust HRF for information retrieval purposes, not to set some optimal threshold. In fact, we tried several different thresholds, which resulted in similar relationships among the different HRF models.
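For concreteness, the canonical double-gamma HRF above can be evaluated with SciPy's gamma density; this is our illustrative sketch (the TR and the boxcar stimulus below are hypothetical), not the authors' code:

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(t):
    """H(t) = f(t; 6, 1) - f(t; 16, 1)/6, with f(t; a, b) the
    Gamma(shape=a, scale=b) density defined in the text."""
    return gamma.pdf(t, 6, scale=1.0) - gamma.pdf(t, 16, scale=1.0) / 6.0

# Build one explanatory variable (EV): convolve a hypothetical boxcar
# stimulus time series with the HRF sampled at TR = 2 s.
TR = 2.0
t = np.arange(0, 32, TR)
stimulus = np.zeros(48)
stimulus[::8] = 1.0
ev = np.convolve(stimulus, canonical_hrf(t))[: len(stimulus)]
```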
2.2 Finite Impulse Response (FIR) Model
Despite its simplicity, the canonical HRF model obviously fails to allow variations across multiple subjects or multiple brain regions. Temporal derivatives of EVs are sometimes included in the design matrix to address very minor time shifts [12], but the timing errors in real data may be much larger. As an alternative, more flexible models such as the Finite Impulse Response (FIR) model have been proposed [11]. In these models, the activation of a certain voxel at time t is the weighted sum of the stimulus values ($s_i$, $i \in [t-n+1, t]$) at the preceding n time points, i.e., $\hat{y}_t(w) = \sum_{i=1}^{n} w_i s_{t-(i-1)} + w_0$. The optimal estimate of $w = [w_0, w_1, w_2, \ldots, w_n]^T$ is taken to minimize the total squared error between the observations and the model. To avoid overfitting problems, Goutte et al. [11] adopted a maximum a posteriori (MAP) parameter estimation similar to ridge regression, $w_{MAP} = (S^T S + \sigma^2 \Sigma^{-1})^{-1} S^T y$, where $\Sigma_{ij} = v \exp(-\frac{h}{2}(i-j)^2)$, h is a smoothing factor, v is the strength, and $\sigma^2$ is the variance of the noise [11]. Such smoothing induces a correlation among the parameters and prevents sudden changes (spikes) in the local form of the HRF. We shall refer to this model as "MAP FIR" in the rest of this paper.
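A minimal sketch of the MAP FIR estimate (our illustration, assuming the smoothness prior $\Sigma_{ij} = v\exp(-\frac{h}{2}(i-j)^2)$ as read above; the constant term $w_0$ is omitted for brevity):

```python
import numpy as np

def map_fir(s, y, n, h=0.3, v=0.1, sigma2=1.0):
    """MAP estimate w = (S'S + sigma^2 Sigma^{-1})^{-1} S'y of a smooth
    FIR filter; hyperparameter values follow Section 3 (h=.3, v=.1,
    sigma^2=1). s is the stimulus series, y the voxel time series."""
    N = len(y)
    # Design matrix of lagged stimuli: column i holds s delayed by i samples.
    X = np.zeros((N, n))
    for i in range(n):
        X[i:, i] = s[: N - i]
    # Smoothness prior over the FIR taps.
    idx = np.arange(n)
    Sigma = v * np.exp(-(h / 2.0) * (idx[:, None] - idx[None, :]) ** 2)
    w = np.linalg.solve(X.T @ X + sigma2 * np.linalg.inv(Sigma), X.T @ y)
    return w
```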
2.3 FIR Model for Multiple Conditions at the Same Time
The aforementioned FIR model can only deal with a single stimulus condition. However, it is quite common that several conditions occur in a single fMRI run.
Although we could deal with this by using each condition separately in a single regression, there is a potential problem with that approach. Suppose several conditions have similar effects on one voxel. If we consider only one condition, then the residual sum of squares (RSS) will be greater than when considering all conditions simultaneously, and this results in a smaller t-value. In other words, voxels whose time series are in fact just noise have a better chance of being selected. An example is shown in Figure 1.
Fig. 1. Upper: 3 conditions, marked as stimulus1 ,stimulus2 and stimulus3 . Lower: voxel V1 responds to all 3 stimuli with strength 1, while voxel V2 to only the first stimulus, with strength 0.5. Both of them are subject to the same noise time series of N(0,4).
In Figure 1, we apply the GLM with multiple conditions or with a single condition to these two voxels, and inspect their t-values for condition stimulus1. As shown in Table 1, the two methods select different voxels. For multiple regression, the t-value of V1 is greater than that of V2; for single regression it is the other way around. This is because, in single regression, the two other stimuli are treated as noise, lowering the confidence level associated with stimulus1.

Table 1. t-values for stimulus1 on voxels V1 and V2, with multiple regression and single regression respectively (generated with SPSS 11.5)

                      V1      V2
Multiple regression   6.983   3.636
Single regression     3.570   3.698
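The effect in Table 1 is easy to reproduce; the following toy sketch (ours, using random boxcar stimuli rather than the exact series of Figure 1) fits both models with statsmodels and prints the stimulus1 t-values:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T = 150
s1, s2, s3 = (rng.random((3, T)) < 0.05).astype(float)  # toy stimulus series
noise = rng.normal(0, 2, T)                              # shared N(0, 4) noise
v1 = 1.0 * (s1 + s2 + s3) + noise   # responds to all three conditions
v2 = 0.5 * s1 + noise               # responds to stimulus 1 only

for y in (v1, v2):
    full = sm.OLS(y, sm.add_constant(np.column_stack([s1, s2, s3]))).fit()
    single = sm.OLS(y, sm.add_constant(s1)).fit()
    # t-value of the stimulus-1 coefficient under each model
    print(full.tvalues[1], single.tvalues[1])
```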
Based on these observations, we propose to combine the FIR model with multiple regression, and explore its effect on retrieval performance. This, in turn, allows us to simultaneously compute estimates for the HRF and for the activation levels. Specifically, we will assume that the shape of the HRF is the same for different stimuli at a given voxel, because the HRF describes a physiological feature of a
certain brain region, and that should not depend on how much the region is engaged in a process, nor on why it is engaged. Specifically, suppose we have c conditions, whose stimulus time series are $s^j_i$, $j \in [1, c]$, $i \in [1, N]$. Then an estimate for the activation at time t can be written in the following parametric form:

$$\hat{y}_t = \sum_{j=1}^{c} a_j \sum_{i=1}^{n} w_i s^j_{t-(i-1)} = a^T S_t w = (a_1, a_2, \ldots, a_c) \begin{pmatrix} s^1_t & s^1_{t-1} & \cdots & s^1_{t-(n-1)} \\ s^2_t & s^2_{t-1} & \cdots & s^2_{t-(n-1)} \\ \vdots & \vdots & & \vdots \\ s^c_t & s^c_{t-1} & \cdots & s^c_{t-(n-1)} \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} \qquad (1)$$
For clarity, we omitted the constant terms from Eq. (1). The optimal values for the entries of the weight vector a and the HRF w can be found using an "alternating" regression. That is, we fix a and w alternately, and calculate the other using linear regression until the process converges, as shown in Algorithm 1.

Algorithm 1. BilinearRegression(S, Y, MAP)
Iteratively find the HRF and the weights of the regressors using alternating regression

a ← 0; a_old ← −∞; w ← 1
iterations ← 0
build the matrices S_t from S
while ‖a − a_old‖₂ > NormThres and iterations < IterThres do
    /* Estimate a using w */
    U ← (S₁w, S₂w, …, S_N w)ᵀ
    a_old ← a
    a ← (UᵀU)⁻ Uᵀ Y
    /* Estimate w using a */
    V ← (S₁ᵀa, S₂ᵀa, …, S_Nᵀa)ᵀ
    if MAP then
        w ← (VᵀV + σ²Σ⁻¹)⁻ Vᵀ Y
    else
        w ← (VᵀV)⁻ Vᵀ Y
    end if
    iterations ← iterations + 1
end while
return w
This algorithm is guaranteed to converge, because each linear regression reduces the least-squares error $\sum_{t=1}^{N}(y_t - \hat{y}_t)^2$, which is non-negative. With respect to the landscape of local and global minima, the convergence behavior is not completely clear at this moment. However, in our validating experiments, we found that longer voxel time series and fewer conditions yield fewer local minima.
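A compact NumPy rendering of Algorithm 1 (our sketch; `S_list` holds the per-time-point matrices S_t, and passing `Sigma` switches on the MAP-regularised update for w):

```python
import numpy as np

def bilinear_regression(S_list, y, n_iter=100, tol=1e-6,
                        Sigma=None, sigma2=1.0):
    """Alternating least squares for Eq. (1): y_t ~ a' S_t w.
    S_list[t] is the c-by-n matrix S_t of lagged stimuli at time t."""
    c, n = S_list[0].shape
    a, w = np.zeros(c), np.ones(n)
    for _ in range(n_iter):
        a_old = a.copy()
        # Fix w, solve for a: rows of U are (S_t w)'.
        U = np.stack([St @ w for St in S_list])
        a = np.linalg.lstsq(U, y, rcond=None)[0]
        # Fix a, solve for w: rows of V are (S_t' a)'.
        V = np.stack([St.T @ a for St in S_list])
        if Sigma is not None:   # MAP-regularised update, as in Algorithm 1
            w = np.linalg.solve(V.T @ V + sigma2 * np.linalg.inv(Sigma),
                                V.T @ y)
        else:
            w = np.linalg.lstsq(V, y, rcond=None)[0]
        if np.linalg.norm(a - a_old) < tol:
            break
    return a, w
```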
3 Results
Our testing scheme is built on a standard information retrieval framework, in which every image is used as a query, and performance is evaluated by checking the returned ranked lists. A retrieved image is considered "relevant" to the query only if both are for the same type of condition. It is possible, of course, that data with different labels contain similar brain processes. In this case, the hidden similarity across conditions will increase the rank of items considered irrelevant, and lower the retrieval performance metric. Thus, the metric that we calculate should be a lower bound for retrieval based on real similarity (including similarities not yet known to cognitive scientists). See [7] for more details about this framework. Since the number of examples for each condition may be quite different, we choose a metric insensitive to data size. We use the "Area Under the ROC Curve" (AUC) to evaluate each retrieval method. If the AUC is 0.5, then the retrieval method is no better than random selection; an AUC of 1 is perfect retrieval. We use each of the datasets as a query against the rest (excluding the same subject), calculate the AUC for each ranked list, and report the average AUC of all queries as the performance indicator. The similarity measure we use between two thresholded t-maps is the Jaccard coefficient. Specifically, the similarity between two sets of selected voxels is simply the size of their overlap divided by the size of their union, similarity(A, B) = |A ∩ B|/|A ∪ B|. The hyperparameters in the MAP FIR model are h = .3, v = .1, and σ² = 1. We have gathered 430 real fMRI datasets from different institutions. Table 2 shows the details of this testing database. These data are preprocessed (motion correction, spatial smoothing, high-pass filtering, and registration to standard brain space) with the software package FSL [13]. To eliminate artifacts introduced by the fact that different brain regions are scanned in different experiments, we specifically consider only those parts of the brain that were scanned in all images. This is similar to the approach used in Mitchell et al. [4]. Our study explores the combination of single or multiple regression with the canonical or finite impulse response models for the HRF. Table 3 shows the average AUC for the four different combinations of these two aspects.

Table 2. Experiments

Experiment                                                      Conditions                 TR(s)  Size
Oddball: Recognition of an out of place image or sound          auditory, visual           2.0    8
Event perception: Watching either a cartoon movie of
  geometric shapes or real film of a human being [14]           studyActive, houseActive   1.5    53
Morality: Making decisions about problem situations having
  or lacking combinations of moral and emotional content [10]   M+E+, M+E-, M-e-           2.0    150
Recall: Study and recall or recognition of faces, objects
  and locations [15]                                            {S,T,R}{Face, Obj, Loc}    1.8    189
Romantic: People in love seeing pictures of their significant
  others, or of non-significant others [16]                     neutralFace, positiveFace  5.0    30
"CAN", "MAP", "SIN", and "Mul" denote "Canonical HRF", "MAP HRF", "Single regression", and "Multiple regression", respectively. "AAUC (raw)" is the average AUC over all 430 queries. Since this metric tends to be dominated by conditions with many samples, we also calculate the mean AUC for each condition, and refer to the mean value of those as the "(Macro-)adjusted AUC".

Table 3. Average AUC for 430 datasets (mean/standard error of the mean)

                  CAN MUL    MAP MUL    CAN SIN    MAP SIN
AAUC (raw)        .662/.007  .719/.006  .677/.007  .715/.007
AAUC (adjusted)   .658/.006  .711/.005  .665/.007  .715/.006
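The retrieval protocol above reduces to a few lines; this sketch (ours; it omits the same-subject exclusion for brevity) ranks datasets by Jaccard similarity of their selected-voxel sets and scores each query with the AUC:

```python
from sklearn.metrics import roc_auc_score

def jaccard(a, b):
    """Jaccard similarity |A intersect B| / |A union B| of two voxel sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def query_auc(query_idx, feature_sets, labels):
    """Rank all other datasets by similarity to the query and score the
    ranking with AUC; 'relevant' means the same condition label."""
    others = [i for i in range(len(labels)) if i != query_idx]
    scores = [jaccard(feature_sets[query_idx], feature_sets[i]) for i in others]
    relevant = [labels[i] == labels[query_idx] for i in others]
    return roc_auc_score(relevant, scores)
```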
We test two hypotheses using these results. H1: "the FIR model performs better than the canonical HRF in retrieval". This hypothesis is clearly accepted, since the differences are very significant for both the single-variate and multi-variate approaches. H2: "for a series of brain scans with multiple conditions, one multiple regression with all conditions performs better than multiple simple regressions". The conclusion for this hypothesis is not yet clear. Figure 2, which shows the AAUC for the separate conditions (see Table 2), provides further detail on this. Each method is better for some of the conditions. We return to this point in the discussion.
Fig. 2. Average of the area under the ROC curve (raw) for the 4 methods
4 Conclusions and Discussions
The results of this study are: confirmation of one hypothesis (H1), and some tantalizing clues regarding the other. Specifically, the FIR model with MAP smoothing, which seems to be a more realistic way to describe the variations across the brain in the anatomy supplying blood, also yields significantly better performance in the retrieval setting. This suggests that it may be worth the added effort to use smoothed FIR analysis when preparing data for retrieval across different experiments and different laboratories. On the other hand, the anticipated superiority of using multiple independent regressors to select the voxels characteristic of several cognitive conditions
occurring in the same run is not confirmed. This led us to a more detailed examination of why it was expected to be better, and to a new hypothesis. Our argument in favor of using multivariate regression relied on the assumption that an individual voxel may be activated by several conditions, all occurring in the same experimental run. Using all but the condition of interest as a contrast has the effect of making the estimates of correlation with the signal less accurate. This makes the t-value smaller, and makes the voxel less likely to be selected as a feature. On the other hand, conditions that activate the same voxels are harder to distinguish within the same run. One or the other of these two contradictory factors may dominate in different experiments. As shown in Figure 2, for some types of experiments the multivariate regression is more effective (e.g., M+E+, M+E- and M-e-), while for some of them (e.g. SFace, SLoc and SObj) it is not. This relationship will be further investigated in future work. Another interesting topic is the distribution of estimated FIR weights. It cannot be included here due to the page limit; please see [17] for a brief report.
References

1. Frackowiak, R., Friston, K., Frith, C., Dolan, R., Price, C., Zeki, S., Ashburner, J., Penny, W.: Human Brain Function, 2nd edn.
2. Ford, J., Farid, H., Makedon, F., Flashman, L., McAllister, T., Megalooikonomou, V., Saykin, A.: Patient classification of fMRI activation maps. In: Ellis, R.E., Peters, T.M. (eds.) MICCAI 2003. LNCS, vol. 2878, Springer, Heidelberg (2003)
3. LaConte, S., Strother, S., Cherkassky, V., Anderson, J., Hu, X.: Support vector machines for temporal classification of block design fMRI data. NeuroImage 26, 317–329 (2005)
4. Mitchell, T., Hutchinson, R., Pereira, R.N., Wang, F.: Learning to decode cognitive states from brain images. Machine Learning 57, 145–175 (2004)
5. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Information Processing and Management 24(5), 513 (1988)
6. Bai, B., Kantor, P., Cornea, N., Silver, D.: IR principles for content-based indexing and retrieval of functional brain images. In: Proceedings of CIKM 2006 (2006)
7. Bai, B., Kantor, P., Cornea, N., Silver, D.: Toward content-based indexing and retrieval of functional brain images. In: Proceedings of RIAO 2007 (2007)
8. Zhang, J., Megalooikonomou, V.: An effective and efficient technique for searching for similar brain activation patterns. In: Proceedings of ISBI 2007 (2007)
9. Friston, K., Jezzard, P., Turner, R.: Analysis of functional MRI time-series. Human Brain Mapping 1, 153–171 (1994)
10. Greene, J., Sommerville, R., Nystrom, L., Darley, J., Cohen, J.: An fMRI investigation of emotional engagement in moral judgment. Science 293 (2001)
11. Goutte, C., Nielsen, F.Å., Hansen, L.K.: Modelling the haemodynamic response in fMRI with smooth FIR filters. IEEE Trans. Med. Imaging 19(12), 1188–1201 (2000)
12. Smith, S.: Overview of fMRI analysis. The British Journal of Radiology (77), S167–S175
13. Smith, S., Bannister, P., Beckmann, C., Brady, M., Clare, S., Flitney, D., Hansen, P., Jenkinson, M., Leibovici, D., Ripley, B., Woolrich, M., Zhang, Y.: FSL: New tools for functional and structural brain image analysis. In: Seventh Int. Conf. on Functional Mapping of the Human Brain. NeuroImage, vol. 13, p. S249 (2001)
14. Zaimi, A., Hanson, C., Hanson, S.: Event perception of schema-rich and schema-poor video sequences during fMRI scanning: Top down versus bottom up processing. In: Proceedings of the Annual Meeting of the Cognitive Neuroscience Society (2004)
15. Polyn, S., Cohen, J., Norman, K.: Detecting distributed patterns in an fMRI study of free recall. In: Society for Neuroscience Conference (2004)
16. Aron, A., Fisher, H., Mashek, D., Strong, G., Li, H., Brown, L.: Reward, motivation, and emotion systems associated with early-stage intense romantic love. J. Neurophysiol. 94, 327–337 (2005)
17. Bai, B., Kantor, P.: A shape-based finite impulse response model for functional brain images. In: Proceedings of ISBI 2007 (2007)
Sources of Variability in MEG

Wanmei Ou¹, Polina Golland¹, and Matti Hämäläinen²

¹ Department of Computer Science and Artificial Intelligence Laboratory, MIT, USA
² Athinoula A. Martinos Center for Biomedical Imaging, MGH, USA
Abstract. This paper investigates and characterizes sources of variability in MEG signals in multi-site, multi-subject studies. Understanding these sources will help to develop efficient strategies for comparing and pooling data across repetitions of an experiment, across subjects, and across sites. In this work, we investigated somatosensory MEG data collected at three different sites and applied variance component analysis and nonparametric KL divergence analysis in order to characterize the sources of variability. Our analysis showed that inter-subject differences are the biggest factor in the signal variability. We demonstrated that the timing of the deflections is very consistent in the early somatosensory response, which justifies a direct comparison of deflection peak times acquired from different visits, subjects, and systems. Compared with deflection peak times, deflection magnitudes have larger variation across sites; modeling of this variability is necessary for data pooling.
1 Introduction
Magnetoencephalography (MEG) is a noninvasive technique for investigating neuronal activity in the living human brain [4]. In contrast to functional magnetic resonance imaging (fMRI), which measures the hemodynamic changes associated with neuronal activity, MEG is directly related to the electric currents in neurons and thus has an excellent temporal resolution of milliseconds. Because of its potential to reveal the precise dynamics of neuronal activations, MEG is popular in neuroscience research, and it has started to move toward clinical applications such as presurgical planning for epileptic patients [8]. Testing interesting neurophysiological hypotheses often requires a large number of subjects. However, the number of subjects or patients with a particular disease available at a certain location is often limited. Pooling data from multiple imaging centers is clearly helpful to overcome this limitation. At present, there are three different MEG systems, employing different sensor coil geometries. Therefore, it is important to assess possible variability in the data obtained from different systems. In this work, we examine the data collected in a multi-site MEG study administered by the MIND institute. The goals of this study parallel the analogous projects in fMRI [14,15]. Before pooling the MEG data, one must study the degree of consistency in the data generated from different systems and model the
system bias in the combined data set. The MIND multi-site MEG project includes a calibration program to assess inter-trial, inter-visit, inter-subject, and inter-site variability, which are quantitatively explored in this paper. The sources of variability can be studied either in the signal space (MEG sensor measurements) or in the source space (after solving the inverse problem). Hämäläinen et al. [5] focused on the inter-scanner variability in the signal space using a minimum-norm-estimate-based extrapolation method. Closely matched extrapolated and true data demonstrated excellent reproducibility of MEG data across the three systems. On the other hand, the source estimates relate more directly to the neuronal phenomenon of interest. Weisend et al. [13] reported consistent source localization when using data from the same subject on different MEG systems. These two approaches were tailored to study inter-system variability only. Zou et al. [15] have performed an in-depth study in a multi-site fMRI project. Framing it as a detection problem, they applied an expectation-maximization algorithm to assess the sensitivity and specificity attributable to run, subject, and scanner. Because of its excellent temporal resolution, we focus on the timing and the magnitude of deflections when analyzing MEG data. The main contribution of our work is the investigation of many possible sources of variability of the estimated current sources underlying the early somatosensory MEG responses. Due to limitations in our current registration algorithms, we defer the spatial characterization to a future study. The rich temporal information in MEG data enables us to extend the comparison to the single-trial level. Compared with the results of the prior work based on the averages of hundreds of trials [5,13], our results reveal stronger consistency between the systems, and within each subject. We employ two approaches in characterizing the sources of variability: variance component analysis (VCA), which assumes a Gaussian model, and nonparametric Kullback-Leibler (KL) divergence analysis, which directly measures the differences between two sets of data. Our results show that the inter-subject difference is the strongest cause of variance. We also conclude that the peak time of early deflections is directly comparable across visits, subjects, and sites, but the variation in deflection magnitude across sites needs to be modeled for data pooling. In the next section, we describe the multi-site MEG data and possible sources of variability. We then present the analysis methods in Section 3 and results in Section 4, followed by conclusions.
2 Data and Sources of Variability
In this work, we analyze MEG data acquired by the MEG Consortium supported by the MIND Institute. Six normal subjects were scanned at three different MEG sites, with two visits to each site (subjects 2 and 4 had scans at two of the three sites). Each visit comprised experiments with three different types of stimuli: auditory, somatosensory, and visual. The three MEG systems employed were the 306-channel Neuromag VectorView system at Massachusetts General Hospital (Boston, MA), the 248-channel 4D Neuroimaging
Fig. 1. SI dipole timecourses in three subjects over two visits at three sites, estimated from the average timecourse for each visit. Solid lines denote visit 1 and dashed lines denote visit 2. Curves of different colors represent signals obtained from the different MEG systems: MGH (black), MIC (blue), and UMN (red).
Magnes 3600 WH system at the University of Minnesota (Minneapolis, MN), and the 275-channel VSM MedTech Omega275 system at the MIND Imaging Center (Albuquerque, NM). We will subsequently refer to the three MEG systems as MGH, UMN, and MIC, respectively. Anatomical images were collected for each subject with a Siemens Avanto 1.5 T scanner at the MGH site. We analyze the data from somatosensory median-nerve stimulation, with on average Ntot = 300 trials per visit after rejecting trials with eye movements and other artifacts. It has been shown that this simple stimulus activates a complex cortical network [6]. The first activation of the contralateral primary sensory cortex (SI) peaks around 20 ms and continues over 100 ms; then the secondary sensory cortex (SII) activates bilaterally around 70 ms and lasts up to 200 ms, during which the posterior parietal cortex may also activate. Whether SI and SII form a sequential or a parallel architecture is still a topic of active debate [7,11]. Although the SI-SII network exhibits robust activation, there is significant variation from trial to trial, especially for SII, due to physiological noise. In this initial study, we focus on two prominent and stable early deflections in SI: N20m and P35m, illustrated in Fig. 1. Due to the structure of the data, variation can be assessed across different trials within a single visit, between visits at a single site, among subjects, and among MEG systems. MEG system variation is a result of hardware differences (number of sensors, sensor type/position, and magnetically shielded rooms) and software differences (methods of noise cancellation and filtering parameters). Subject variation reflects differences in neuronal mechanisms [6] and brain anatomy. Changes in environmental noise and in the relative head position contribute to variation between data obtained from two visits. Variations in the neuronal state and the subject's movement often lead to variation across trials. Assessing the contributions of these different sources of variability will help to improve the design and analysis of future multi-site studies. As an illustration, Fig. 1 presents example SI dipole timecourses, estimated from the average signals of all trials, 300 on average, over a visit. We will describe how to obtain these timecourses in the next section. The peak times and magnitudes of N20m and P35m in subject 3 match across visits and sites. Subject 5's responses match except at one site, where the subject received a stimulus of
different strength due to different stimulation electrodes being used. We believe the magnitude mismatch in subject 6 is due to physiological variation in the signal. Since this physiological variation has a different effect at each site for this subject, it will be summarized as site variation in the VCA. Future studies are needed to separate the physiological variation from the site variation, by including a control group scanned at the same site with a long time interval.
3 Analysis
This section describes preprocessing and the two analysis methods used in this paper: variance component analysis (VCA) and the nonparametric KL divergence analysis. After registering the MEG data to the MRI scan with the help of fiducials on the scalp surface, we fitted the average signals from 18 to 35 ms after the stimulus onset with a single equivalent current dipole (ECD) using the Nelder-Mead simplex algorithm [10] (Fig. 1). In all data sets, the goodness of fit at the major deflections was 70-98%, which is above the standard threshold. We chose dipole fitting rather than distributed source estimation [3,4,12], because it is reasonable to assume a single focal source, SI, in such an early response period. The rich temporal information and consistent deflection timing in the average dipole timecourses encouraged us to investigate the degree of consistency at the single-trial level. We extracted the SI response from each trial by projecting the single-trial data onto the field pattern of the dipole fitted to the averaged data. The resulting single-trial responses were similar to the mean responses shown in Fig. 1. However, their low signal-to-noise ratio (SNR) caused ambiguity in identifying N20m and P35m. To enable reliable automatic detection of N20m and P35m, we employed a random sampling approach: we averaged N randomly sampled timecourses before applying the detector. We inspected the detection results with varying N, and found that N ≥ 4 provided sufficient SNR in the average response for accurate detection. Only a minor distortion in timing and amplitude was introduced by averaging such a small number of trials. In this work, we set N = 5, and refer to the averages of randomly sampled sets of five timecourses as "single-trial" experiments. This is in contrast to the commonly used approach of averaging over hundreds of trials, which loses much of the temporal detail in the resulting timecourses. All the analysis results presented in this paper are based on 10^5 such "single trials" per visit, subject, and site. Our peak detection algorithm searches for extrema in the SI responses. To improve the robustness of the method, we employ high-order derivatives estimated over a broad support. We experimented with several different robust detectors, including wavelet decomposition, arriving at qualitatively similar conclusions. Due to space limitations, we omit the details of the peak detection.

3.1 Model-Based Variance Component Analysis (VCA)
VCA is a common approach to quantifying sources of variability in data [2]. It models the observation as the sum of an unknown true mean μ and errors
introduced by each source. Each error term is assumed to be independently generated by a zero-mean Gaussian distribution with an unknown variance. The variance estimates provide a measure of how much each source contributes to the total variance in the data. Moreover, we compute the relative variability, by normalizing the variance estimates to sum to one, to compare results for different characteristics, such as peak time and magnitude. We set the observation $t_{ijkl}$ to be the peak time or the magnitude of N20m or P35m, corresponding to the observation from site i, subject j, visit k, and trial l (each "trial" again refers to an average of 5 randomly selected trials). Due to the structure of the data, we model $t_{ijkl}$ as a cross-hierarchical combination of sources of variation from trials, visits, subjects, and sites: $t_{ijkl} = \mu + a_i + b_j + c_{ijk} + d_{ijkl}$, where $\mu$ is the true but unknown value of the observation, and $a_i$, $b_j$, $c_{ijk}$, and $d_{ijkl}$ quantify the deviations from site i, subject j, visit k, and trial l, respectively; they are assumed to be samples of independent Gaussian random variables. For example, $\mathcal{N}(0, \sigma_b^2)$ describes the distribution of $b_j$, and $\sigma_b^2$ indicates inter-subject variability. To improve the robustness of the estimation, we take a Bayesian approach with weak priors on the parameters. The prior distributions are $\mathcal{N}(0, 10^{10})$ for $\mu$ (a common weak prior for the mean) and independent Gamma(0.01, 0.01) for the precision parameters $1/\sigma_a^2$, $1/\sigma_b^2$, $1/\sigma_c^2$, and $1/\sigma_d^2$. We estimate the model parameters as the medians of the corresponding posterior distributions. Due to the complex cross-hierarchy in the model, we employ Gibbs sampling, as implemented in the BUGS software [1], to perform inference for the parameters of interest. We use $10^5$ burn-in samples, and inferences are based on another $10^5$ samples.
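To illustrate the generative model (this is our sketch, not the authors' BUGS code; the standard deviations below are hypothetical), the following simulates observations t_ijkl from the cross-hierarchical decomposition; in the paper, the variances are instead inferred from real data by Gibbs sampling:

```python
import numpy as np

rng = np.random.default_rng(1)
n_site, n_subj, n_visit, n_trial = 3, 6, 2, 100
sig_a, sig_b, sig_c, sig_d = 0.2, 0.8, 0.3, 0.5   # hypothetical std devs
mu = 20.0                                          # e.g. N20m peak time (ms)

# t_ijkl = mu + a_i + b_j + c_ijk + d_ijkl (cross-hierarchical model)
a = rng.normal(0, sig_a, n_site)                   # site deviations
b = rng.normal(0, sig_b, n_subj)                   # subject deviations
c = rng.normal(0, sig_c, (n_site, n_subj, n_visit))            # visit
d = rng.normal(0, sig_d, (n_site, n_subj, n_visit, n_trial))   # trial
t = (mu + a[:, None, None, None] + b[None, :, None, None]
     + c[..., None] + d)                           # shape (3, 6, 2, 100)
```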
3.2 Nonparametric KL Divergence Analysis
While VCA is powerful in quantifying variability, it cannot capture variation beyond second-order statistics, due to the Gaussian assumption. To overcome this limitation, we directly compare the distributions of the extracted parameters of the N20m and P35m deflections. We then employ the symmetrized KL divergence [9], $D_{sym}(p_1\|p_2) = \frac{1}{2}(D(p_1\|p_2) + D(p_2\|p_1))$, where $p_1$ and $p_2$ are two probability distributions and D denotes the KL divergence, to quantify differences between two distributions directly from their histograms. Results are presented as a distance matrix. In this work, we construct the distributions separately for the peak time and the magnitude of a deflection. We defer nonparametric KL divergence analysis of the joint distributions of peak time and magnitude to future exploration. To summarize, while VCA provides a generative model to quantify the variability of each source, it is limited by the Gaussian assumption. On the other hand, the nonparametric KL divergence analysis captures differences between distributions beyond second-order statistics. However, it does not separate the variation due to different sources. Applying both approaches better characterizes the variability in the data set.
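The symmetrized KL divergence can be computed directly from normalized histograms; a minimal sketch (ours, with a small additive eps, a choice we make to avoid empty bins):

```python
import numpy as np

def sym_kl(x1, x2, bins=30, eps=1e-12):
    """Symmetrized KL divergence between two samples, computed from
    normalized histograms over a common binning."""
    lo, hi = min(x1.min(), x2.min()), max(x1.max(), x2.max())
    p1, edges = np.histogram(x1, bins=bins, range=(lo, hi))
    p2, _ = np.histogram(x2, bins=edges)
    p1 = p1 / p1.sum() + eps
    p2 = p2 / p2.sum() + eps
    kl12 = np.sum(p1 * np.log(p1 / p2))
    kl21 = np.sum(p2 * np.log(p2 / p1))
    return 0.5 * (kl12 + kl21)
```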
Fig. 2. Relative variability (bars) and estimated variance (numbers at the top of the bars) for N20m peak time (a), P35m peak time (b), N20m magnitude (c) and P35m magnitude (d), respectively. The sources of variability are trial (tr), visit (vi), subject (su), and site (si).
4 Results
This section presents results obtained from VCA and the nonparametric KL divergence analysis.

4.1 Model-Based Variance Component Analysis
Fig. 2 presents VCA results for the peak time and magnitude of N20m and P35m. The numbers on top of the bars denote the estimated variances of the corresponding sources, and the height of the bars is proportional to the relative variability. The relative variability of the peak time is similar for N20m and P35m. While the relative variabilities for subject and trial are about 60% and 25%, the relative site variability is less than 5%. This is not entirely surprising: due to their high temporal resolution, different MEG systems can precisely capture when the deflections occur. Therefore, peak timing is directly comparable for data generated from the three systems, and little or no adjustment is required when pooling data across different systems and visits. Both the estimated variance and the relative variability for the N20m magnitude suggest that the site variability is small. On the other hand, the relatively large site variability in the P35m magnitude suggests that adjustment is needed for data pooling. If the error distributions from each source closely follow a Gaussian distribution, a simple approach is to subtract the estimated site bias $a_i$ obtained from the current calibration study. The variance estimates are larger for P35m than for N20m, for both peak time and magnitude. This observation agrees with the general understanding that deflections tend to vary more the further they are from the stimulus onset, because a more complex network is often involved in their generation and several connections can affect the signal timing and magnitude.

4.2 Nonparametric KL Divergence Analysis
To investigate components of variability that are not captured by the Gaussian model, we applied the nonparametric KL divergence analysis. Fig. 3 presents the normalized histograms of the N20m peak time, each composed of 10^5 random samples, for the data obtained from each visit, subject, and site. The consistency of
Fig. 3. Normalized histograms of N20m peak time from 10^5 random subsamples of the data
single-trial responses is reflected by the matched distributions across sites and visits, which is much stronger evidence than a consistent deflection peak time in the average response of hundreds of trials. For example, subject 2 exhibits consistently skewed distributions. Due to some presently unknown experimental problem, an inconsistency occurs in subject 1's second visit to MIC; this data set is removed from further analysis (it was also excluded from the VCA). There was a change in stimulus strength for subject 5 during his/her visit to MIC. We can observe a small delay in one of the histograms, and note that further investigation is needed to understand the relationship between stimulus strength and deflection peak time. The symmetrized KL divergence between the histograms is depicted in Fig. 4(a). Each row or column corresponds to one visit of a subject to a particular site. There are four or six rows in a su_i-su_j block, depending on whether subject i has MEG scans at two sites or all three sites (the su1 blocks have five rows because subject 1's second visit to MIC was discarded). The small KL divergence in the highlighted blocks along the diagonal further confirms that the N20m peak time is consistent within each subject and is independent of visit days and MEG systems. By capturing higher-order statistics, the KL divergence analysis conveys a stronger message than VCA: N20m peak time is directly comparable across sites and visits. The P35m peak time exhibits very similar behavior (not shown). The site effect is much more pronounced when we consider the deflection magnitude. The change in stimulus strength is clearly reflected by the larger KL divergence between subject 5's data obtained from MIC and all other sites.
Fig. 4. (a) Symmetrized pairwise KL divergence of the histograms presented in Fig. 3. (b), (c) Symmetrized KL divergence for N20m magnitude and P35m magnitude, respectively.
In general, there is more site variability in the P35m magnitude than in the N20m magnitude for a single subject, with subject 1 being the most prominent example. While these results agree with the general trend observed in the VCA, they imply that the site variability may be larger than that estimated by the model-based analysis. This suggests that a refined VCA that relaxes the Gaussian assumption is necessary to accurately capture the variability in this MEG data.
5 Conclusions
Our study demonstrated that the inter-subject effect is the largest contributor to the variability of the MEG data. We analyzed variability due to site, subject, visit, and trial effects in MEG data using variance component analysis and nonparametric KL divergence analysis. The two analysis methods established that we can directly compare deflection peak times across systems and visits. However, system effects on the deflection magnitude should be modeled for data pooling. Subject 6's data suggests that the site effect may originate from that subject's physiological variability. Our random sampling approach illustrated that the timing of deflections is highly consistent even at the single-trial level. Hence, averaging across a large number of trials is not necessary. Histograms built upon averages of a small number of single trials can better capture the inter-subject differences. The increased sensitivity of such an approach can be helpful in studying differences in MEG responses between normal subjects and clinical populations.

Acknowledgments. This work was supported in part by NIH NAMIC U54-EB005149, NCRR mBIRN U24-RR021382, P41-RR13218 and P41-RR14075 grants, by the NSF CAREER Award 0642971, and by U.S. DOE Award Number DE-FG02-99ER62764 to the MIND Institute. Wanmei Ou is partially supported by an NSF graduate fellowship.
References

1. Bayesian inference using Gibbs sampling, http://www.mrc-bsu.cam.ac.uk/bugs/
2. Box, G., Tiao, G.: Bayesian Inference in Statistical Analysis. Wiley, Chichester (1992)
3. Dale, A., Sereno, M.: Improved localization of cortical activity by combining EEG and MEG with MRI cortical surface reconstruction: a linear approach. J. Cog. Neurosci. 5, 162–176 (1993)
4. Hämäläinen, M.S., et al.: Magnetoencephalography - theory, instrumentation, and applications to noninvasive studies of the working human brain. Reviews of Modern Physics 65, 413–497 (1993)
5. Hämäläinen, M.S., et al.: Comparison of signals acquired with different MEG systems using an extrapolation method based on minimum-norm estimates. In: Proc. Biomag. (1994)
6. Hari, R., Forss, N.: Magnetoencephalography in the study of human somatosensory cortical processing. Phil. Trans. R. Soc. Lond. B 354, 1145–1154 (1999)
7. Kass, J., et al.: Multiple representations of the body within the primary somatosensory cortex of primates. Science 204, 521–523 (1979)
8. Knake, S., et al.: The value of multichannel MEG and EEG in the presurgical evaluation of 70 epilepsy patients. Epilepsy Res. 69, 80–86 (2006)
9. Kullback, S., Leibler, R.A.: On information and sufficiency. Annals of Mathematical Statistics 22, 79–86 (1951)
10. Nelder, J.A., Mead, R.: A simplex method for function minimization. Comput. J. 7, 308–313 (1965)
11. Rowe, M., et al.: Parallel organization of somatosensory cortical areas I and II for tactile processing. Clin. Exp. Pharmacol. Physiol. 23, 931–938 (1996)
12. Uutela, K., Hämäläinen, M.S., Somersalo, E.: Visualization of magnetoencephalographic data using minimum current estimates. NeuroImage 10, 173–180 (1999)
13. Weisend, M.P., et al.: Paving the way for cross-site pooling of magnetoencephalography (MEG) data. In: Proc. Biomag. (2006)
14. Yendiki, A., et al.: Multi-site characterization of an fMRI working memory paradigm: reliability of activation indices. In: Proc. HBM (2006)
15. Zou, K.H., et al.: Reproducibility of functional MR imaging: preliminary results of a prospective multi-institutional study performed by the Biomedical Informatics Research Network. Radiology 237, 781–789 (2005)
Customised Cytoarchitectonic Probability Maps Using Deformable Registration: Primary Auditory Cortex

Lara Bailey1, Purang Abolmaesumi1,3, Julian Tam1, Patricia Morosan4, Rhodri Cusack7, Katrin Amunts4,5,6, and Ingrid Johnsrude2

1 Department of Computer Science, Queen's University, Canada
2 Department of Psychology, Queen's University, Canada
3 Department of Electrical and Computer Engineering, Queen's University, Canada
4 Institute of Medicine, Research Center Juelich, Juelich, Germany
5 Department of Psychiatry and Psychotherapy, RWTH University Aachen, Germany
6 Brain Imaging Center West (BICW), Germany
7 MRC Cognition and Brain Sciences Unit, Cambridge, England
[email protected]
Abstract. A novel method is presented for creating a probability map from histologically defined cytoarchitectonic data, customised for the anatomy of individual fMRI volunteers. Postmortem structural and cytoarchitectonic information from a published dataset is combined with high resolution structural MR images using deformable registration of a region of interest. In this paper, we have targeted the three sub-areas of the primary auditory cortex (located on Heschl’s gyrus); however, the method could be applied to any other cytoarchitectonic region. The resulting probability maps show a significantly higher overlap than previously generated maps using the same cytoarchitectonic data, and more accurately span the macroanatomical structure of the auditory cortex. This improvement indicates a high potential for spatially accurate fMRI analysis, allowing more reliable correlation between anatomical structure and function. We validate the approach using fMRI data from nine individuals, taken from a published dataset. We compare activation for stimuli evoking a pitch percept to activation for acoustically matched noise, and demonstrate that the primary auditory cortex (Te1.0) and the lateral region Te1.2 are sensitive to pitch, whereas Te1.1 is not.
1 Introduction
The increased spatial resolution of functional magnetic resonance imaging (fMRI) permits researchers to localize neural activity with respect to an individual subject's anatomy. In order to draw general conclusions about the relationship between brain structure and function, it is generally necessary to pool results across several subjects. This is usually done by performing an additional image registration, which aligns each subject's data to a standard brain template (e.g., the ICBM152 template). This additional 'normalisation' step also allows results to be compared among studies.
Analysing functional data across subjects usually involves assessing the magnitude of brain activity at the voxel level, but high inter-subject variability makes it difficult to achieve a good match from subject to subject. Consequently, the region of pooled activation across subjects is diffuse, and for small regions, statistical analysis may not reveal significant activation. The common solution is to apply spatial smoothing to each subject's functional data. Smoothing not only increases the overlap of activation among subjects, and thus the significance of the results, but also decreases the spatial resolution. Improving registration of gross anatomical features across subjects is not enough, since such features are not a reliable indicator of functionally distinct areas ([2], [3]). Instead, microanatomical features such as cytoarchitecture, which are more directly linked to functional regions ([4], [5]), should be used to increase the registration accuracy and thus decrease the need for spatial smoothing. Unfortunately, it is currently not possible to observe microanatomical structures in routine magnetic resonance (MR) images acquired from human volunteers for most cortical regions. A solution to this dilemma may be found in the use of probability maps of microanatomical regions, derived from histological measurements in postmortem samples. In such maps, the value at each voxel indexes the likelihood of that voxel belonging to a particular cytoarchitectonic region. Such maps, coregistered with the anatomy of each fMRI volunteer, can be used as weighted filters on fMRI data, obviating any need for traditional inter-subject spatial averaging and isotropic smoothing, and permitting functional differentiation of small, adjacent, brain regions.
1.1 Previous Work
Quantitative cytoarchitectonic analysis was conducted on brain sections stained for cell bodies, from ten postmortem human brains. An observer-independent method was used to determine areal borders [1], [7], and [8]. The results were digitized and mapped onto high-resolution structural magnetic resonance images, which had been previously acquired on each postmortem specimen, creating 3D cytoarchitectonic volumes. This work reveals three 'primary-like' auditory regions, Te1.0, Te1.1, and Te1.2, overlapping with Heschl's gyrus in each hemisphere; these are distinguished on the basis of cell architecture across the six layers of cortex (see Fig. 1; [6], [9]). By 1) registering the postmortem structural MR volumes and cytoarchitectonic datasets to match a normal brain template (see Fig. 2(a), step 1), and then 2) averaging the registered cytoarchitectonic datasets (see Fig. 2(a), step 2), a probability map was created for each cytoarchitectonic area [6].
Fig. 1. Topography of cytoarchitectonically defined areas Te 1.0, 1.1, 1.2, Te2, TI1, and Te3 upon the (left) superior temporal plane in humans (From [6])
Fig. 2. Overview of (a) original and (b) proposed probability map generation process for auditory cortex (areas Te1.0, Te1.1, Te1.2)
1.2 Procedure
Probability maps tailored to the anatomy of individual fMRI subjects can be created by registering each cytoarchitectonic dataset directly to each fMRI subject's anatomy, and then averaging the registered datasets. We can use deformable registration algorithms, which produce highly detailed alignment among subjects (but which are computationally intensive), if we restrict our registration efforts to a small volume surrounding the region of Heschl's gyrus (HG) (Fig. 3(a)). The presented method of probability map generation can be used for any of the areas for which there is detailed cytoarchitectonic information coregistered with structural MR data from the same postmortem subjects, such as the data from the Jülich/Düsseldorf series, which is the most comprehensively studied, and largest, dataset of its kind in the world (see [2], [8] for review). We first compare the anatomical precision and extent of the previously published probability maps [6], registered to a standard brain (Colin27), to maps created by using our high-dimensional ROI registration to the same standard brain. We then use fMRI data taken from a published study (Patterson et al. 2002) and use both the custom and published probability maps as weighted filters to investigate fMRI activation in each of the three cytoarchitectonic auditory regions defined in Section 1.1. We compare activation when subjects hear a noise stimulus with sufficient temporal structure to yield a pitch (which is fixed at one value for the duration of a particular stimulus, but varies over the course of the experiment), with an acoustically matched noise stimulus without temporal structure (and without pitch). This contrast is known to yield bilateral activity in lateral HG [1], but previous analyses did not assess whether this pitch-related activation was located in a specific cortical field, or whether it was laterally asymmetric.

Fig. 3. (a) Grey matter of Heschl's gyrus from gross anatomy, and (b) cytoarchitectonically defined Te1 from histological analysis
2 Method
The proposed method of custom probability map generation is illustrated in Fig. 2(b). Section 2.1 (steps A and B in Fig. 2(b)) describes how the region of Heschl's gyrus was preprocessed and extracted from both the high-resolution (1x1x1 mm) postmortem and the live subject's (1x1x1 mm or 2x2x2 mm) structural MR data, while Section 2.2 (step 1) describes the registration step used to align the extracted postmortem cubes to the subject's cubes. Finally, Section 2.3 (step 2) describes how the registered cytoarchitectonic data are averaged to create the subject's custom microanatomical probability map, which can then be used for fMRI analysis, as described in Section 2.4.
2.1 Preprocessing
Heschl's gyrus (HG) was 'painted' on each of the postmortem and fMRI subject's structural volumes using the criteria of Penhune et al. [10]. The grey-matter (GM) segmentation for the in-vivo structurals was obtained using SPM's 'segment' function [11], while the intersection of the cytoarchitectonic data with the painted gyrus defined the postmortem GM segmentation. The segmented volumes in standardized stereotaxic space were then thresholded, downsampled, and a cube surrounding the segmented HG in each hemisphere was extracted. Cubes at the same coordinates were also extracted from the three corresponding cytoarchitectonic volumes defining the sub-areas of Te1.
2.2 Registration
Each of the extracted postmortem structural ROIs was registered to the in-vivo structural ROIs using Insight ToolKit's (ITK) BSplineDeformable registration, with a grid size of 10x10x10. The generated non-linear transformation parameters were then applied to the cytoarchitectonic datasets of each subject, producing cytoarchitectonic datasets all registered to the in-vivo subject.
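The registration step above is fully specified only by the toolkit and the grid size; as a rough, hedged illustration of how such a step can be scripted, the following sketch uses the SimpleITK wrapper of ITK. The file names, the mean-squares metric, and the optimizer settings are assumptions for illustration, not the authors' exact configuration.

```python
import SimpleITK as sitk

# Hypothetical inputs: extracted HG cubes (Sec. 2.1) and a cytoarchitectonic volume.
fixed = sitk.ReadImage("invivo_hg_cube.nii.gz", sitk.sitkFloat32)
moving = sitk.ReadImage("postmortem_hg_cube.nii.gz", sitk.sitkFloat32)
cyto = sitk.ReadImage("postmortem_te10_cube.nii.gz", sitk.sitkFloat32)

# B-spline transform with a 10x10x10 control-point grid, as stated in the text.
tx = sitk.BSplineTransformInitializer(fixed, [10, 10, 10])

reg = sitk.ImageRegistrationMethod()
reg.SetMetricAsMeanSquares()                      # assumed similarity metric
reg.SetOptimizerAsLBFGSB(numberOfIterations=200)  # assumed optimizer settings
reg.SetInitialTransform(tx, inPlace=True)
reg.SetInterpolator(sitk.sitkLinear)
final_tx = reg.Execute(fixed, moving)

# Apply the same non-linear transform to the cytoarchitectonic data.
warped_cyto = sitk.Resample(cyto, fixed, final_tx, sitk.sitkLinear, 0.0)
sitk.WriteImage(warped_cyto, "te10_registered_to_subject.nii.gz")
```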
2.3 Probability Map Generation
A probability map for each of the three sub-regions of Te1 (Fig. 1) was created by summing the registered cytoarchitectonic cubes from each of the ten postmortems for that sub-region:

$$p(i, v) = \frac{\sum_{q} \dot{p}(q, i, v)}{n\,I} \qquad (1)$$

where $p(i, v)$ is the probability of region $i$ containing voxel $v$, $\dot{p}(q, i, v)$ is the probability of region $i$ containing voxel $v$ for postmortem $q$ (the grey-level of the cytoarchitectonic volume), and $n$ is the number of postmortems. Since the warping is non-linear, it is possible for multiple voxels to be transformed to the same destination voxel. Thus, the three previously independent Te regions have the potential to intersect as a result of registration; this was corrected using the intersection factor $I = \sum_{k} p(k, v)$. The probabilistic ROIs were then re-inserted into stereotaxic space to create tailor-made probability maps for the in-vivo subject.
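A minimal numpy sketch of Eqn. (1), assuming the warped cubes are stacked in a single array; rescaling only where the warped regions actually overlap is one reading of the intersection correction described above.

```python
import numpy as np

def probability_maps(cubes):
    """Eqn. (1). cubes: (n_postmortems, n_regions, X, Y, Z) registered
    cytoarchitectonic cubes, grey-levels in [0, 1]."""
    n = cubes.shape[0]
    p = cubes.sum(axis=0) / n            # sum over postmortems q, divide by n
    I = p.sum(axis=0, keepdims=True)     # intersection factor I = sum_k p(k, v)
    return np.where(I > 1.0, p / I, p)   # rescale only overlapping voxels
```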
2.4 fMRI Analysis
The methods described in Sections 2.1-2.3 were used to create custom probability maps for each of nine fMRI participants [1], and then combined with their recorded fMRI activation using the following summary measure $s(i)$:

$$s(i) = \frac{\sum_{v} p(i, v)\, a(v)}{C(i)} \qquad (2)$$

where $p(i, v)$ is the probability of region $i$ containing voxel $v$, $a(v)$ is the activation at voxel $v$, and $C(i) = \sum_{v} p^{2}(i, v)$ is the normalisation constant for region $i$.
Normalisation is necessary to ensure the summary statistic is scaled into units of “activation per voxel” and is no longer biased by the degree of variability among regions. This weighted summary measure was computed for each region (left and right hemisphere separately) in each subject, using both the custom probability maps and the previously published probability maps [6] normalized to a standard brain.
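As a worked illustration of Eqn. (2) and its normalisation constant, a short numpy sketch (array names are hypothetical):

```python
import numpy as np

def summary_activation(p_region, activation):
    """Eqn. (2): probability-weighted mean activation s(i) for one region.
    p_region and activation are voxel arrays of identical shape."""
    C = np.sum(p_region ** 2)                 # normalisation constant C(i)
    return np.sum(p_region * activation) / C  # units of activation per voxel
```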
Fig. 4. Probability maps generated (top) and corresponding section of previous [6] probability maps (bottom), superimposed on the Colin27 structural: (a) Te1.0, (b) Te1.1, (c) Te1.2
3 Results
We compared the accuracy of our registration method with previous results [6], also in Colin27 space, by comparing the area covered by the two sets of probability maps. The maximum span of the voxels in the x, y, and z direction was calculated for each region in each hemisphere (see Table 1). Our registration method provides significantly more focused maps (F(1, 5) = 21.176, p = .006, η² = .809), spanning an average of 8 mm less in each dimension, with no significant difference among the dimensions.
3.1 Validation
Table 1. Estimated mean spanning distance (mm) of previous and custom probability maps in each axial direction

             x    y    z
  previous  28   29   30
  custom    24   20   19

Probability maps of Te1.0, Te1.1, and Te1.2 were created for each of the fMRI subjects; those for subject one are shown in Fig. 5. Summary statistics (Eqn. 2) derived from the original [6] and custom probability maps were evaluated using a repeated-measures ANOVA with three factors: probability map type (original or custom), hemisphere (left or right), and subregion of Te1 (Te1.0, Te1.1, or Te1.2). Of the main effects, only the effect of subregion was significant (F(2, 7) = 18.43, p = .002, η² = .84), although the effect of hemisphere was marginally significant (p = .07). A significant interaction between probability map type and subregion (see Fig. 6) was observed; no other interactions reached statistical significance. Sidak-corrected pairwise comparisons were used to examine the main effect of subarea. This revealed that lateral area Te1.2 and middle area Te1.0 did not differ in activity levels, but that both were significantly more active than medial area Te1.1 (p < .005). The interaction between subregions of Te1 and probability
Fig. 5. Probability maps generated (top) and corresponding section of previous [6] probability maps (bottom), superimposed on subject one's structural: (a) Te1.0, (b) Te1.1, (c) Te1.2
Fig. 6. Estimated marginal means of summary statistics for previous and custom probability maps
map type was also investigated using pairwise comparisons. The custom maps yielded significantly greater activity in area Te1.0 than did the published maps (p = .035); the difference between map types was not significant in the other two areas. Thus, the custom maps do appear to give increased sensitivity to signal change. Furthermore, this analysis demonstrates that Te1.2 and Te1.0 are both sensitive to the presence of temporal structure and pitch in sound, while Te1.1 is not sensitive to this feature. Finally, there is no evidence for hemispheric asymmetry in sensitivity to pitch in these data.
4 Discussion and Conclusions
The probability maps we generated for the standard Colin27 brain were significantly more focused than the published ones, indicating that our registration method is superior when precise hypotheses concerning structure-function relationships in a well-defined brain region are to be tested. Furthermore, this registration method can be applied to any high-resolution structural image, allowing customised registration of any cytoarchitectonic data with the anatomy of individual fMRI volunteers. As evidenced by Fig. 4 and [12], deformable registration is superior to the affine registration techniques used by Morosan et al. [6], yet the limitation of the technique is speed. The increased number of parameters results in significantly increased registration time: it requires 2.5 days to register both hemispheres of each postmortem to the target fMRI subject, using a 2.6 GHz P4 with 1 GB RAM. Future work will involve parallelization and optimization of the registration parameters. The success of the registration does not depend solely on the method and dimension of parameters used, but also on the inherent variability in structural morphology [12]. The use of probability maps registered to a standard brain
increases the effect of inter-subject variability compared to the use of maps customised for a given subject's anatomy. We show that such custom probability maps can be used for the analysis of fMRI data. This has several advantages. First, the fMRI data need not be spatially smoothed; essentially, the probability maps themselves act as filters, yielding a probability-weighted estimate of average activity over a given cytoarchitectonic region. This permits analysis of functional specialization within small, physically adjacent, cortical fields. Our method also permits quantitative comparison of activity in homologous regions in the two hemispheres. Given hemispheric morphological asymmetries, it is not enough to assess signal in homologous voxels, which is how such comparisons are usually accomplished. Our method accounts for morphological asymmetries automatically, and can therefore be used to measure functional laterality accurately and objectively. The results of the fMRI analysis indicate that lateral region Te1.2 and middle region Te1.0 are sensitive to the presence of pitch in a noisy stimulus. This is highly consistent with neurophysiological investigations in marmosets, which reveal that neurons in a region on the border between A1 and RT (the cytoarchitectonic homologues of human Te1.0 and Te1.2, respectively) are sensitive to pitch [13]. However, contrary to previous authors who have suggested a right-hemisphere dominance for pitch sensitivity [14], we find no differences between hemispheres in activation levels in homologous regions. One shortcoming of the fMRI dataset analysed here is that the structural resolution is 2x2x2 mm, which is relatively low. This may have had a negative impact on the resulting probability maps (Fig. 5) when compared to the probability maps for Colin27 (Fig. 4), which has a higher spatial resolution (1x1x1 mm). Future work will include probability-map analysis of auditory fMRI datasets with structural acquisitions of 1x1x1 mm.
References
1. Patterson, R., Uppenkamp, S., Johnsrude, I., Griffiths, T.: The processing of temporal pitch and melody information in auditory cortex. Neuron 36, 767–776 (2002)
2. Amunts, K., Zilles, K.: Advances in cytoarchitectonic mapping of the human cerebral cortex. Neuroimaging Clin. N. Am. 11, 151–169 (2001)
3. Zilles, K., Palomero-Gallagher, N., Grefkes, C., Scheperjans, F., Boy, C., Amunts, K., Schleicher, A.: Architectonics of the human cerebral cortex and transmitter receptor fingerprints: reconciling functional neuroanatomy and neurochemistry. Euro. Neuropsychopharmacology 12, 587–599 (2002)
4. Felleman, D., Van Essen, D.: Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex 1, 1–47 (1991)
5. Passingham, R., Stephan, K., Kotter, R.: The anatomical basis of functional localization in the cortex. Nat. Rev. Neurosci. 3, 606–616 (2002)
6. Morosan, P., Rademacher, J., Schleicher, A., Amunts, K., Schormann, T., Zilles, K.: Human primary auditory cortex: Cytoarchitectonic subdivisions and mapping into a spatial reference system. NeuroImage 13, 684–701 (2001)
7. Schleicher, A., Amunts, K., Geyer, S., Morosan, P., Zilles, K.: Observer-independent method for microstructural parcellation of cerebral cortex: A quantitative approach to cytoarchitectonics. NeuroImage 9, 165–177 (1999)
8. Zilles, K., Schleicher, A., Palomero-Gallagher, N., Amunts, K.: Quantitative analysis of cyto- and receptorarchitecture of the human brain. In: Brain Mapping: The Methods, 2nd edn., pp. 573–602 (2002)
9. Rademacher, J., Morosan, P., Schormann, T., Schleicher, A., Werner, C., Freund, H., Zilles, K.: Probabilistic mapping and volume measurement of human primary auditory cortex. NeuroImage 13, 669–683 (2001)
10. Penhune, V.B., Zatorre, R.J., MacDonald, J.D., Evans, A.C.: Interhemispheric anatomical differences in human primary auditory cortex: Probabilistic mapping and volume measurement from magnetic resonance scans. Cereb. Cortex 6, 661–672 (1996)
11. SPM2 software, http://www.fil.ion.ucl.ac.uk/spm/software/spm2/
12. Crivello, F., Schormann, T., Tzourio-Mazoyer, N., Roland, P.E., Zilles, K., Mazoyer, B.M.: Comparison of spatial normalization procedures and their impact on functional maps. Human Brain Mapp. 16, 228–250 (2002)
13. Bendor, D., Wang, Q.: The neuronal representation of pitch in primate auditory cortex. Nature 436, 1161–1165 (2005)
14. Zatorre, R.J.: Finding the missing fundamental. Nature 436, 1093–1094 (2005)
Segmentation of Q-Ball Images Using Statistical Surface Evolution

Maxime Descoteaux and Rachid Deriche

Odyssée Project Team, INRIA/ENS/ENPC, INRIA Sophia Antipolis, France
Abstract. In this article, we develop a new method to segment Q-Ball imaging (QBI) data. We first estimate the orientation distribution function (ODF) using a fast and robust spherical harmonic (SH) method. Then, we use a region-based statistical surface evolution on this image of ODFs to efficiently find coherent white matter fiber bundles. We show that our method is appropriate to propagate through regions of fiber crossings and we show that our results outperform state-of-the-art diffusion tensor (DT) imaging segmentation methods, inherently limited by the DT model. Results obtained on synthetic data, on a biological phantom, on real datasets and on all 13 subjects of a public QBI database show that our method is reproducible, automatic and brings a strong added value to diffusion MRI segmentation.
1 Introduction
We would like to segment white matter fiber bundles in which diffusion properties are similar and ultimately compare their features to those in other ROIs in the same subject or in multiple subjects. Existing DTI-based segmentation techniques [1,2,3,4,5] are inherently limited by the DT model and are most often blocked in regions of fiber crossings, where DTs are oblate or isotropic. This is why recent high angular resolution diffusion imaging (HARDI) techniques such as QBI [6] have been proposed to aid the inference of crossing, branching or kissing fibers. New methods have thus started to appear to segment bundles in fields of ODFs [4,7]. In [4], the ODF map is reconstructed according to the time-consuming diffusion spectrum imaging (DSI) scheme and the segmentation problem is developed using a level set approach in a non-Euclidean 5-dimensional (5D) position-orientation space. This extension from 3D to 5D space leads to working with huge 5D matrices, and there are important problems with data handling and storage. In [7], the main contribution is to model the ODF with a mixture of von Mises-Fisher distributions and use its associated metric in a hidden Markov measure field segmentation scheme. Thus, both the ODF modeling and the segmentation technique are different from our proposed method. In this paper, we answer the following three questions: 1) How can the segmentation problem be formulated and solved efficiently on a field of diffusion ODFs? 2) What is gained by the ODF with respect to the DT? 3) Is it possible to validate the segmentation results and make the segmentation automatic? To
do so, we propose an efficient region-based level set approach using a regularized and robust spherical harmonics (SH) representation of the ODF [8]. We first show that a better local modeling of fiber crossings improves segmentation results globally. Then, we show that our ODF segmentation is more accurate than the state-of-the-art DTI segmentation [5] in regions of complex fiber configurations from synthetic data, from a biological phantom and from real data. Finally, we show that our Q-ball segmentation is reproducible by segmenting the corpus callosum (CC) of the 13 subjects of a public QBI database [9] automatically.
2 ODF Estimation from QBI
QBI [6] reconstructs the diffusion ODF directly from the N HARDI measurements on a single sphere by the Funk-Radon transform (FRT). The ODF is intuitive because it has its maximum(a) aligned with the underlying population of fiber(s). However, computing statistics on a large number of discrete ODF values on the sphere is computationally heavy and infeasible to integrate into a segmentation algorithm of the whole brain. A more compact representation of the ODF is thus needed. [8,10,11] proposed a simple analytic spherical harmonic (SH) reconstruction of the ODF. Letting $Y_\ell^m$ denote the SH of order $\ell$ and degree $m$ ($m = -\ell, \ldots, \ell$) in the standard basis and $Y_j$ ($j(\ell, m) = (\ell^2 + \ell + 2)/2 + m$) be the SH in the modified real and symmetric basis, the final ODF is

$$\Psi(\theta, \phi) = \sum_{j=1}^{L} \underbrace{2\pi P_{\ell(j)}(0)\, c_j}_{f_j}\, Y_j(\theta, \phi), \qquad (1)$$
where $L = (\ell+1)(\ell+2)/2$, $c_j$ are the SH coefficients describing the input HARDI signal, $P_{\ell(j)}$ is a Legendre polynomial of order $\ell(j)$ ($\ell(j)$ is the order associated with the $j$th element of the SH basis, i.e., for $j = 1, 2, 3, 4, 5, 6, 7, \ldots$, $\ell(j) = 0, 2, 2, 2, 2, 2, 4, \ldots$), and $f_j$ are the coefficients describing the ODF $\Psi$. Here, we use our solution [8] with a Laplace-Beltrami regularization of the SH coefficients $c_j$ to obtain a more robust ODF estimation.
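Eqn. (1) reduces the analytic FRT to a per-coefficient scaling of the signal's SH coefficients. A minimal numpy/scipy sketch of that scaling is shown below; estimating the regularized coefficients $c_j$ themselves [8] is not reproduced here.

```python
import numpy as np
from scipy.special import eval_legendre

def odf_sh_from_signal_sh(c, orders):
    """Eqn. (1): f_j = 2*pi * P_{l(j)}(0) * c_j.
    c      : (L,) SH coefficients of the HARDI signal
    orders : (L,) order l(j) of each basis element"""
    return 2.0 * np.pi * eval_legendre(np.asarray(orders), 0.0) * np.asarray(c)

# Order-4 symmetric basis: L = 15, with l(j) = 0, 2, 2, 2, 2, 2, 4, ..., 4.
orders = np.repeat([0, 2, 4], [1, 5, 9])
```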
3 Statistical Surface Evolution
We want to find a global coherence in the Q-ball field of ODFs. We denote the image of ODFs by $F : \Omega \to \mathbb{R}^L$ so that for all $x \in \Omega$, $F(x)$ is an ODF of order $\ell$ represented by a vector of $L$ real SH coefficients, $F(x) := \{f_1, \ldots, f_L\} \in \mathbb{R}^L$. Now, the question is what is a good metric to compare ODFs?

Distances between ODFs. We want to capture similarities and dissimilarities between two ODFs, i.e., two spherical functions $\Psi, \Psi' \in S^2$ that can be represented by real SH vectors $f, f' \in \mathbb{R}^L$, as shown in the previous section. Since the ODFs come from real physical diffusion measurements, they are bounded and form an
open subset of the space of real-valued $L^2$ spherical functions, with an inner product $\langle \cdot\,, \cdot \rangle$ defined as

$$\langle \Psi, \Psi' \rangle = \int_{\sigma \in S^2} \Psi(\theta, \phi)\, \Psi'(\theta, \phi)\, d\sigma = \int_{\sigma \in S^2} \left( \sum_{i=1}^{L} f_i\, Y_i(\theta, \phi) \right) \left( \sum_{j=1}^{L} f'_j\, Y_j(\theta, \phi) \right) d\sigma. \qquad (2)$$
Because of the orthonormality of the SH basis, the cross terms cancel and the expression is simply $\langle \Psi, \Psi' \rangle = \sum_{j=1}^{L} f_j f'_j$. Therefore, the induced $L^2$ distance metric between two ODFs is simply $\|\Psi - \Psi'\| = \sqrt{\sum_{j=1}^{L} (f_j - f'_j)^2}$. The Euclidean distance was also used successfully for DTI segmentation in [5], even though more appropriate metrics exist, such as the J-divergence [3,5] and Riemannian geodesic distances [5]. Similarly, one can think of choosing another metric to compare ODFs. For instance, since the ODF can be viewed as a probability distribution function (pdf) of fiber orientations, one can use the Kullback-Leibler distance between two pdfs, as done in [6]. However, in that case the problem quickly blows up computationally, because one needs to use all N discrete data points on the sphere instead of the L SH coefficients (L << N; for example, one needs to process N = 200 values instead of L = 15 SH coefficients).

Segmentation by Surface Evolution. Inspired by general works on image segmentation [12], we search for the optimal partition S into two regions S1 and S2 of the image Ω. We maximize the a posteriori frame partition probability p(S|F) of obtaining the desired segmentation for the observed image of ODFs F. The major difference in our approach is that we use order-4 ODFs, with L = 15 real coefficients, whereas in [5] DTs represented by 6D vectors are used as input to the region-based segmentation (the order-2 SH estimation of the ODF has six coefficients and is related to the DT; the Euclidean DTI segmentation [5] is thus a special case of the ODF segmentation). We use the level set framework to represent the optimal partition S as the zero-crossing of the level set function φ. Hence, using Bayes rule, the optimal partition is obtained by maximizing p(φ|F) ∝ p(F|φ)p(φ). At this point, the main assumption is that the probability distributions p1 and p2 of SH coefficients in regions S1 and S2 are Gaussian (this is in fact a reasonable assumption, because we observed "bell-shaped" histograms of each of the L coefficients of the ODFs in the CC of our real data). Hence, we consider a parametric model with an L-dimensional Gaussian. Letting $\bar{F}_r \in \mathbb{R}^L$ be the mean SH ODF vector and $\Lambda_r$ the $L \times L$ covariance matrix of the ODF vectors in region $r = 1, 2$, the likelihood of the ODF $F(x)$ to be part of region $r$ is defined as

$$p_r(F(x) \mid \bar{F}_r, \Lambda_r) = \frac{1}{(2\pi)^{3}\,|\Lambda_r|^{1/2}} \exp\left(-\frac{1}{2}\,(F(x)-\bar{F}_r)^{T}\,\Lambda_r^{-1}\,(F(x)-\bar{F}_r)\right). \qquad (3)$$

The optimal segmentation is then obtained by maximizing $p(F|\phi)\,p(\phi)$, or by minimizing the negative logarithms. Hence, the final energy minimization is

$$E(\phi, p_1, p_2) = -\int_{\Omega} \log p_1(F(x))\,dx - \int_{\Omega} \log p_2(F(x))\,dx + \nu \int_{\Omega} \delta(\phi)\,|\nabla\phi|\,dx, \qquad (4)$$
where the first two terms are the region-based terms and the last term imposes a smoothness constraint ν on the evolving surface. The Euler-Lagrange equations can then be computed and discretized as in [5] to derive the implicit surface evolution of φ. The most coherent partition is thus obtained in an efficient and simple level set implementation.
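A small numpy sketch of the region statistics and likelihood terms that enter Eqns. (3) and (4); the flattened image layout and the covariance regularization constant are assumptions, and the level set update itself is omitted.

```python
import numpy as np

def region_gaussian(F, mask):
    """Mean SH vector and covariance of ODFs inside one region (Eqn. (3) inputs).
    F: (n_voxels, L) SH coefficient image; mask: boolean region membership."""
    coeffs = F[mask]
    mean = coeffs.mean(axis=0)
    cov = np.cov(coeffs, rowvar=False) + 1e-8 * np.eye(F.shape[1])  # regularized
    return mean, cov

def neg_log_likelihood(F, mean, cov):
    """-log p_r(F(x)) up to an additive constant: the region terms of Eqn. (4)."""
    d = F - mean
    maha = np.einsum("vi,ij,vj->v", d, np.linalg.inv(cov), d)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (maha + logdet)
```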
4 Q-Ball Data Generation and Acquisitions
Synthetic Data. We generate synthetic Q-ball data using the multi-tensor model [6], $S(u_i) = \sum_{k=1}^{n} \frac{1}{n} \exp(-b\, u_i^T D_k(\theta)\, u_i) + \text{noise}$, for N encoding directions $i \in \{1, \ldots, N\}$. We use N = 81 directions from a 3rd-order tessellation of the icosahedron, b = 3000 s/mm2, n = 1 or 2, and $D_k(\theta)$ the diffusion tensor with standard eigenvalues [300, 300, 1700] x 10^-6 mm2/s oriented in direction θ [6,8]. The noise is generated as complex Gaussian noise with a standard deviation of 1/35, producing a signal with SNR 35. We generate two synthetic data examples, one with a 2-fiber 90° crossing (Fig. 1) and the other with a 2-fiber branching configuration (Fig. 2). DTs and ODFs are visualized as spherical functions colored according to the Fractional Anisotropy (FA), with a colormap going from red to blue for anisotropic to isotropic profiles. Biological Phantom Data. We obtained the biological phantom from [13]. It was created from two excised rat spinal cords embedded in 2% agar. The acquisition was done on a 1.5T scanner using 90 encoding directions, with b = 3000 s/mm2, TR = 6.4 s, TE = 110 ms, 2.8 mm isotropic voxels and four signal averages per direction. We compare the DT Euclidean and Riemannian [5] and ODF surface evolutions on this dataset. Human Brain Data. First, we use a human brain dataset acquired on a 3T scanner [14] with 60 encoding directions, b = 1000 s/mm2, 72 slices with 1.7 mm thickness, twenty-one b = 0 s/mm2 images, a 128 x 128 image matrix, TE = 100 ms, TR = 12 s. We compare the segmentations of the DT Euclidean and Riemannian [5] and ODF surface evolutions on two well-known fiber bundles: the corpus callosum (CC) and the cortico-spinal tract (CST). Then, we test our ODF segmentation on the public NMR database [9]. The 13 datasets were acquired on a 1.5T scanner with 200 encoding directions, b = 3000 s/mm2, 60 slices with 2 mm thickness, twenty-five b = 0 s/mm2 images, a 128 x 128 image matrix, TE = 93.2 ms, TR = 1.9 s. For each subject, a single voxel in the medial part of the CC is selected (manually) to initialize the flow.
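A minimal numpy sketch of the synthetic multi-tensor signal above; the in-plane fiber rotations and the Rician treatment of the complex noise are assumptions consistent with, but not guaranteed identical to, the generation described in [6,8].

```python
import numpy as np

def synthetic_signal(u, b=3000.0, n_fibers=2, angle=np.pi / 2, snr=35.0):
    """S(u_i) = (1/n) * sum_k exp(-b u_i^T D_k u_i) + complex Gaussian noise.
    u: (N, 3) unit gradient directions."""
    evals = np.diag([1700e-6, 300e-6, 300e-6])     # eigenvalues in mm^2/s
    S = np.zeros(len(u))
    for k in range(n_fibers):
        t = k * angle                               # rotate each fiber in-plane
        R = np.array([[np.cos(t), -np.sin(t), 0.0],
                      [np.sin(t),  np.cos(t), 0.0],
                      [0.0, 0.0, 1.0]])
        D = R @ evals @ R.T
        S += np.exp(-b * np.einsum("ij,jk,ik->i", u, D, u)) / n_fibers
    sigma = 1.0 / snr                               # std of 1/35 gives SNR 35
    return np.hypot(S + np.random.normal(0, sigma, len(u)),
                    np.random.normal(0, sigma, len(u)))
```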
5 Segmentation Results and Discussion
Synthetic Datasets. First, Fig. 1 shows that initialization has a strong influence on the final surface. If the initialization contains strictly anisotropic DTs/ODFs, the final surface is not able to pass through the fiber crossing area, as seen in Fig. 1(a). Similarly, the final surface is trapped in the crossing area when
Fig. 1. First row: the ODFs, and the DTs and ODFs in the 90° crossing area. In (a-c), from left to right: the initialization used, the DT Riemannian [5] segmentation, and the ODF flow segmentation. Last row: the ODF front evolution over time.
initializing strictly in the 2-fiber region (Fig. 1(b)). This is because the statistics of the initial region differ greatly from those of the rest of the DTs/ODFs, and hence the evolving surface is blocked from connecting to the rest of the structure. However, if the initialization contains a mixture of both single-fiber and 2-fiber DTs/ODFs, the DT flow propagates through the crossing region to connect to the similar anisotropic DTs on the other side of the crossing, and the second fiber is completely ignored, as seen in Fig. 1(c). The DTs in the crossing are oblate and there is no information on the second orientation. In contrast, there is information about the second orientation in the ODF flow, and the surface evolution finds the whole 2-fiber structure as coherent. Fig. 2 shows a more complex branching region. In the DT flow, we see that the surface remains trapped in the regions of the initial seeding for all initializations. In contrast, in the ODF case, when the flow is initialized in the bottom and middle part of the branch, the whole branching structure is recovered, because the ODF contains a broader range of orientations in its statistics. Biological Phantom Dataset. Fig. 3 shows that the DT flow with the Euclidean distance is unable to segment the spinal cords. Whereas in [5] the initialization was placed outside the phantom and the flow converged inwards, here we initialized inside the structure, and we see that the surface leaks outside the cords because many DTs are isotropic in the fibers and there are also isotropic DTs outside the structure with mean diffusivity in a similar range. However, our new ODF flow segments the whole structure quite easily. The segmentation agrees with results published using the DT Riemannian flow [5, Fig. 12-13].
Fig. 2. Segmentation on a synthetic branching example. In (a-d), from left to right, the initialization used, the DT Riemannian [5] and the ODF flow segmentation.
Fig. 3. In (a-c), from left to right, the initialization, the DT Euclidean [5] flow at t = 40 starting to leak outside the phantom structure and the segmentation of the ODF flow
Human Brain Datasets. Our ODF segmentation on real datasets recovers more structure than other published results on the CC and CST [1,4,5]. Fig. 4 shows that we are able to reproduce results from [5] with the DT-based flows using both the Euclidean and Riemannian distances. In the DT Euclidean flow, we see that the evolving surface stops near the complex crossing area, where oblate and isotropic DTs (greenish-blue) block the flow. The DT Riemannian flow is able to connect more voxels than the DT Euclidean flow by slightly evolving into the crossing area. However, in the CST, the flow is still unable to recover the branching fiber structure projecting to the cortex. The ODF flow recovers that branching structure to the different sulci and also recovers more of the splenium of the CC. Fig. 5 shows that our new ODF surface evolution is reproducible on many subjects from the same set of parameters. Convergence depends on the subject but was always obtained automatically within 80 to 120 iterations of the flow, where an iteration takes roughly 0.5 seconds on a Dell single-processor, 3.4 GHz, 2 GB RAM machine. We see that for most subjects, we have segmented the full CC with the longer posterior parts of the splenium and the full genu, as in Fig. 4.
Fig. 4. ODF flow segmentations can propagate through crossing regions and go further than other segmentation methods. DT-based segmentations are overlaid on a slice with DTs and the ODF flow is overlaid on the same slice with the ODFs.
Fig. 5. Automatic segmentation of the corpus callosum using the ODF flow on the 13 subjects of the NMR database [9] from a single seed point in the middle of the CC
Overall, the CC structures are similar, but there are some differences across CCs. Hence, it is now important to quantify this multi-subject variability.
6 Conclusion
We have presented an efficient statistical surface evolution framework for the segmentation of Q-ball images. The proposed method combines state-of-the-art SH reconstruction of the ODF from QBI and state-of-the-art region-based surface evolution. To answer the questions of the introduction: 1) The segmentation problem on ODF images can be formulated efficiently with level sets evolving to partition similar ODFs based on their spherical harmonic representation. 2) The ODF flow is able to deal with complex fiber configurations such as crossing and branching fibers better than DT-based segmentation using the Euclidean and Riemannian distances. 3) It is possible to validate the segmentation results. In particular, we obtained sets of globally coherent ODFs agreeing with well-known cerebral anatomical structures in real data, as well as with synthetic and biological phantom datasets where the ground truth was known. Another important contribution was to show the reproducibility of the surface evolution on real datasets with different b-values and also on the 13 subjects from the public NMR database. It is now important to develop a better initialization of the level set front in order to perform a fully automatic segmentation. One can now imagine performing a multi-subject study with segmented fiber bundles to quantify certain diffusion properties and attempt to follow the evolution of white matter diseases.
References
1. Zhukov, L., Museth, K., Breen, D., Whitaker, R., Barr, A.H.: Level set modeling and segmentation of DT-MRI brain data. J. of Electronic Imaging 12, 125–133 (2003)
2. Feddern, C., Weickert, J., Burgeth, B.: Level-set methods for tensor-valued images. In: Proc. Second IEEE Workshop on Geometric and Level Set Methods in Computer Vision, pp. 65–72. IEEE Computer Society Press, Los Alamitos (2003)
3. Wang, Z., Vemuri, B.C.: DTI segmentation using an information theoretic tensor dissimilarity measure. IEEE Trans. Medical Imaging 24(10), 1267–1277 (2005)
4. Jonasson, L.: Segmentation of diffusion weighted MRI using the level set framework. PhD thesis, Ecole Polytechnique Federale de Lausanne (2006)
5. Lenglet, C., Rousson, M., Deriche, R.: DTI segmentation by statistical surface evolution. IEEE Transactions in Medical Imaging 25(6), 685–700 (2006)
6. Tuch, D.: Q-ball imaging. Magnetic Resonance in Medicine 52(6), 1358–1372 (2004)
7. McGraw, T., Vemuri, B., Yezierski, R., Mareci, T.: Segmentation of high angular resolution diffusion MRI modeled as a field of von Mises-Fisher mixtures. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3953, pp. 463–475. Springer, Heidelberg (2006)
8. Descoteaux, M., Angelino, E., Fitzgibbons, S., Deriche, R.: Regularized, fast, and robust analytical Q-ball imaging. Magnetic Resonance in Medicine (to appear)
9. Poupon, C., Poupon, F., Allirol, L., Mangin, J.F.: A database dedicated to anatomo-functional study of human brain connectivity. In: Twelfth Annual Meeting of the Organization for Human Brain Mapping (HBM) (2006)
10. Anderson, A.: Measurements of fiber orientation distributions using high angular resolution diffusion imaging. Magnetic Resonance in Medicine 54, 1194–1206 (2005)
11. Hess, C., Mukherjee, P., Han, E., Xu, D., Vigneron, D.: Q-ball reconstruction of multimodal fiber orientations using the spherical harmonic basis. Magnetic Resonance in Medicine 56, 104–117 (2006)
12. Paragios, N., Deriche, R.: Geodesic active regions: a new paradigm to deal with frame partition problems in computer vision. Journal of Visual Communication and Image Representation 13(1/2), 249–268 (2002)
13. Campbell, J., Siddiqi, K., Rymar, V., Sadikot, A., Pike, B.: Flow-based fiber tracking with diffusion tensor Q-ball data: Validation and comparison to principal diffusion direction techniques. NeuroImage 27(4), 725–736 (2005)
14. Anwander, A., Tittgemeyer, M., von Cramon, D.Y., Friederici, A.D., Knosche, T.R.: Connectivity-based parcellation of Broca's area. Cerebral Cortex 17(4), 816–825 (2007)
Evaluation of Shape-Based Normalization in the Corpus Callosum for White Matter Connectivity Analysis

Hui Sun1, Paul A. Yushkevich1, Hui Zhang1, Philip A. Cook1, Jeffrey T. Duda1, Tony J. Simon2, and James C. Gee1

1 Penn Image Computing and Science Laboratory, Department of Radiology, University of Pennsylvania, Philadelphia, USA
2 Cognitive Analysis and Brain Imaging Laboratory, M.I.N.D. Institute, University of California, Davis, USA
Abstract. Recently, concerns have been raised that the correspondences computed by volumetric registration within homogeneous structures are primarily driven by regularization priors that differ among algorithms. This paper explores correspondence based on geometric models for one such structure, the midsagittal section of the corpus callosum (MSCC), and compares the result with registration paradigms. We use a geometric model called the continuous medial representation (cm-rep) to normalize anatomical structures on the basis of medial geometry, and use features derived from diffusion tensor tractography for validation. We show that shape-based normalization aligns subregions of the MSCC, defined by connectivity, more accurately than normalization based on volumetric registration. Furthermore, shape-based normalization helps increase the statistical power of group analysis in an experiment where features derived from diffusion tensor tractography are compared between two cohorts. These results suggest that cm-rep is an appropriate tool for normalizing the MSCC in white matter studies.
1 Introduction

Group analysis of the "appearance" or "texture" features in medical images requires establishing correspondence between individual images. The problem of correspondence can be stated in terms of "normalization": removing the anatomical differences between individuals so that group comparison can be carried out. This is usually achieved by volumetric registration, which warps each individual image into the template's space. Volumetric registration is usually driven by two terms: a similarity metric and a regularity prior. Regularization priors (elastic, fluid, diffeomorphic, etc.) are used to enforce the smoothness of the map between images. Within homogeneous structures, because of the lack of intensity information, the correspondences computed by volumetric registration can be primarily driven by regularization priors that differ among algorithms [12]. In these cases, geometric models can provide a basis for normalization, especially given the large body of research on correspondence methods for such models [6, 8]. Validating a normalization is usually hard, owing to the difficulty of obtaining a gold standard. In this paper, validation is performed with the help of diffusion tensor MRI (DTI), where the alignment of "hidden" features obtained from diffusion tensor tractography is evaluated after normalization.
This paper is concerned with a particular type of geometric model: the continuous medial representation (cm-rep) [15]. Medial representations (m-reps) [11] describe structures by explicitly defining the medial axis (skeleton) of the structure and deriving boundary geometry from the medial axis. The PDE-based cm-rep approach [15] takes particular care to ensure that the precise geometric relationship between the medial axis and the derived boundary is maintained. 3D cm-rep models have been successfully built and applied in neuroimaging studies [16]. In particular, a closed-form solution is derived for the 2D equivalent of the PDE in [13], allowing efficient generation of cm-rep models for 2D structures. The cm-rep model is especially suitable for normalization of anatomical structures in multivariate datasets, because of its ability to extend boundary-based or skeleton-based correspondences to the interiors of structures [13,15]. For 2D objects, the cm-rep model establishes a shape-based coordinate system tailored to the object, where one coordinate (or two coordinates for a 3D object) follows the skeleton, and the other coordinate goes from the skeleton to the boundary. This paper presents an application where shape-based normalization of the MSCC is used to analyze patterns of commissural connectivity in the human brain as derived from diffusion tensor imaging. Shape-based normalization of the MSCC is compared to registration paradigms, with results favoring the shape-based approach.
2 Method

Sections 2.1 and 2.2 briefly describe the two normalization techniques compared in this paper, and Sec. 2.3 introduces the method used to validate them within the diffusion MRI framework.

2.1 Shape-Based Normalization

There are many techniques in the shape analysis field that establish shape-based correspondences between boundaries of structures. The cm-rep method has a unique property that allows boundary-based or skeleton-based correspondences to be propagated to the interiors of objects, thus enabling shape-based normalization. The cm-rep method has been thoroughly explained in [13, 15]. The most important concept is the medial axis, also called the Blum skeleton, which is defined as the locus of centers and radii of all the maximal inscribed disks (MIDs) inside an object. The Process-Induced Symmetric Axis (PISA) [9] is closely related to the Blum skeleton: it is formed by the midpoints of the chords connecting the corresponding boundary points where the MID is tangent to the object's boundary. Cm-rep models for anatomical structures can be constructed by fitting a deformable cm-rep template to binary segmentations of the structure. The interior of the cm-rep model can then be parameterized by a shape-based coordinate system in which the axis t follows the PISA and the other axis ξ goes from the PISA to the boundary. Example gridlines of this coordinate system are illustrated in Fig. 2(a); the normalization based on it is illustrated in Fig. 2(d).
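As the continuation below notes, correspondence along the t axis uses equal arc-length spacing of the PISA. A minimal numpy sketch of such a reparameterization for a discrete 2D curve (a generic illustration, not the authors' cm-rep fitting code):

```python
import numpy as np

def equal_arclength_resample(points, n_samples=100):
    """Resample an ordered (M, 2) skeleton curve at equal arc-length spacing,
    yielding the t coordinate used for shape-based correspondence."""
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])   # cumulative arc length
    t = np.linspace(0.0, s[-1], n_samples)        # equally spaced targets
    return np.column_stack([np.interp(t, s, points[:, 0]),
                            np.interp(t, s, points[:, 1])])
```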
Evaluation of Shape-Based Normalization in the Corpus Callosum
779
In this paper, equal arc-length parameterization of the PISA is used to establish the correspondence on the t axis, and equal-length parameterization of the line between the PISA and the corresponding boundary is used to establish the correspondence on the ξ axis.

2.2 Volumetric Registration Based Normalization

Deformable registration is used to normalize the same dataset, and the results are compared with shape-based normalization. The symmetric diffeomorphic registration algorithm developed by Avants et al. [1], one of the state-of-the-art high-dimensional large-deformation registration algorithms, is used in this paper. The template used in the registration is constructed from the dataset itself, following the method described in [2]. Considering the homogeneity of the interior of the MSCC, the segmented 2D mask image of the MSCC is used directly for registration, and the DTI-based measurements are then warped to the template space according to the registration results.

2.3 Validation of the Normalization

Diffusion MRI studies provide an attractive framework within which to evaluate the performance of MSCC normalization via cm-reps and other techniques. With the help of diffusion tensor tractography, every location in the MSCC can be associated with a set of features derived from fiber tracts passing through that location. In a multi-subject experiment, these features can be used to detect structural differences between control and patient cohorts. The effect of different normalization methods on the statistical significance of detected differences can then be analyzed. Furthermore, normalization methods can be evaluated by examining how well they align anatomically labeled fiber tracts within cohorts. The following sections describe the approaches used to evaluate normalization from these different standpoints.

Features Derived from Diffusion Tensor Tractography. Diffusion tensor tractography is a tool for studying the white-matter connectivity in the brain. In our study we use two streamline tracking methods implemented in the open-source Camino toolkit [3]. The FACT method, proposed by Mori et al. [10], follows the local fiber orientation in each voxel, changing direction at voxel boundaries. We also track using a fixed, subvoxel step size (0.4 mm), following interpolated orientations taken from the vector field at each step using a simple eight-neighbor trilinear interpolation; we refer to this as the VINT (vector interpolation) method. Both tracking algorithms use a fractional anisotropy (FA) threshold and a curvature threshold to terminate the tracking. We repeat all our experiments for eight different tracking settings to check the stability of our results: the FA threshold is either 0.15 or 0.25, the curvature threshold allows a maximum curvature of either 45 or 60 degrees over the length of a voxel, and the local fiber direction is determined by either the FACT or VINT algorithm. The midsagittal plane is identified automatically according to the symmetry of the left and right hemispheres, and the MSCC is manually segmented in the midsagittal plane. We then segment the fiber pathways of the MSCC in each subject by seeding streamline tractography in every voxel in the diffusion-tensor image and retaining only streamlines that intersect the MSCC. Following a similar method to Corouge et al. [5], we examine the FA of diffusion along the length of each streamline. The FA of each voxel visited by a streamline is weighted by the length of the line segment in the voxel in order to compute a mean, giving each streamline an associated tract FA. The tract FA is plotted on the MSCC, as illustrated in Fig. 2(c); the value of each voxel is the mean tract FA of all streamlines that pass through that voxel.

MSCC Connectivity Labeling. A qualitative examination of the cortical connectivity is complementary to the quantitative tract-based measures. Following [4, 7], we label the streamlines passing through the MSCC according to their cortical connectivity. To label the fibers, an anatomical atlas [14] is used, which divides each hemisphere of the cerebral cortex into four regions: frontal, parietal, temporal and occipital. The atlas is aligned to the T1-weighted image of each subject using a diffeomorphic image registration algorithm [1]. Examples of the warped atlas in the space of the T1-weighted image are shown in Fig. 1. The warped atlas is then further aligned to the space of the diffusion tensor image using the transformation that coregisters the T1-weighted image to the diffusion tensor image. Now each fiber derived from tractography in the diffusion image space can be assigned a label according to the cortical region of its endpoints. An example of labeled fibers is shown in Fig. 1. Finally, we label each voxel in the MSCC according to the connectivity of fibers crossing it: if streamlines passing through the voxel connect to multiple cortical zones, we label the voxel according to the cortical zone that has the most fibers passing through the voxel; if no streamlines pass through a voxel, it is left unlabeled.
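A compact numpy sketch of the majority-vote voxel labeling just described; the data layout and the use of 0 as the "no fibers" label are assumptions for illustration.

```python
import numpy as np

def label_mscc_voxels(shape, streamlines, fiber_lobes, n_lobes=4):
    """Assign each MSCC voxel the lobe with the most fibers passing through it.
    streamlines: list of (K, 2) integer voxel coordinates in the MSCC plane;
    fiber_lobes: per-streamline lobe index in {0, ..., n_lobes - 1}."""
    counts = np.zeros(tuple(shape) + (n_lobes,), dtype=int)
    for line, lobe in zip(streamlines, fiber_lobes):
        for x, y in set(map(tuple, line)):        # count each voxel once per fiber
            counts[x, y, lobe] += 1
    labels = counts.argmax(axis=-1) + 1           # lobes numbered 1..n_lobes
    labels[counts.sum(axis=-1) == 0] = 0          # voxels crossed by no fibers
    return labels
```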
Fig. 1. Left: Examples of warped labeled atlas in the space of T1-weighted image; right: example of color labeled fibers. (Red for frontal lobes, blue for parietal lobes, yellow for temporal lobes and purple for occipital lobes).
3 Results

3.1 Subjects and Data Acquisition

Our evaluation experiments use a dataset from a chromosome 22q11.2 deletion syndrome study. It includes 3.0-T high-resolution structural MRI and diffusion tensor MRI scans for 11 normally developing children and 19 children with the 22q11.2 syndrome. The structural MRI was acquired using a T1-weighted magnetization-prepared rapid gradient echo (MP-RAGE) sequence with the following scanning parameters: repetition
time (TR) 1620 ms, echo time (TE) 3.87 ms, 15◦ flip angle, matrix size = 256 × 192, slice thickness of 1.0 mm, spacing between slices of 1.0 mm, yielding 160 axial slices with in-plane resolution of 0.98 × 0.98 mm. A single-shot, spin-echo, diffusion-weighted echo-planar imaging (EPI) sequence was used for the diffusion tensor MRI. The diffusion scheme was as follows: one image without diffusion gradients (b = 0 s/mm2 ), hereafter referred to as the [b = 0] image, followed by twelve images measured with twelve non-collinear and non-coplanar diffusion encoding directions isotropically distributed in space (b = 1000 s/mm2 ). Additional imaging parameters for the diffusion-weighted sequence were: TR = 6500 ms, TE = 99 ms, 90◦ flip angle, number of averages = 6, matrix size = 128 × 128, slice thickness = 3.0 mm, spacing between slices = 3.0 mm, 40 axial slices with in-plane resolution of 1.72 × 1.72 mm.
Fig. 2. (a) Illustration of the PISA parameterization for one subject (the Dice coefficient between the binary mask and its cm-rep model is given at the top). (b) The mean connectivity map obtained by the cm-rep approach, rendered on the mean MSCC shape. (c) Example of tract FA plotted on the MSCC. (d) Illustration of shape-based normalization by plotting the mean connectivity map on the shape-based coordinate system.
3.2 Matching of the Connectivity Labels

Given the connectivity-based labeling of the MSCC described in Sec. 2.3, it is possible to evaluate normalization algorithms based on how well they align connectivity labels across subjects. To do that, we measure the Dice overlap coefficient between normalized maps. Overlap is computed separately for each of the four labels in the connectivity map and is averaged among all pairs of subjects within each cohort. Table 1 lists the average overlaps for the FACT tracking method with an FA threshold of 0.25 and a curvature threshold of 60 degrees, showing significantly higher overlap for the cm-rep method. The comparison was repeated for the 8 variations of the tracking method, showing similar results; the maximum p-values among all 8 tracking methods are also included in Table 1. The difference in overlaps was statistically significant in almost all cases, with the exception of the temporal lobe (which occupies a very small portion of the connectivity map and is matched poorly by all normalization methods). To help explain the differences between the two normalization methods, we examine the Jacobian determinant maps associated with warping each subject into the normalized space. The average (over all 30 subjects in the study) variance of the Jacobian determinant inside the MSCC is 0.22 for registration-based normalization, and 0.06
for cm-rep based normalization. As shown in Fig. 3, the Jacobian map of cm-rep normalization is much more uniform than that of registration. This is to be expected, since cm-rep correspondences are more global in nature than correspondences based on local regularization priors, which are employed in registration. This difference in deformation fields can help explain better alignment of connectivity maps by the cm-rep method. Table 1. Comparison of deformable registration based normalization and cm-rep based normalization for matching the connectivity label of each lobe. The quality of label matching is quantified using Dice similarity coefficients between pairs of subjects. Statistics are carried out to measure the significance of the differences between normalization methods.
                                      FRONT     PAR      TEMP     OCC
CTRL (11 subjects)
  registration                        0.735     0.481    0.124    0.514
  cm-rep                              0.811     0.553    0.134    0.633
  p-value                             <0.001    0.006    0.387    <0.001
  Max p-value among 8 methods         <0.001    0.030    0.478    0.007
22Q PAT (19 subjects)
  registration                        0.768     0.481    0.264    0.495
  cm-rep                              0.835     0.546    0.272    0.608
  p-value                             <0.001    <0.001   0.362    <0.001
  Max p-value among 8 methods         <0.001    <0.001   0.408    <0.001
All (30 subjects)
  registration                        0.761     0.483    0.209    0.503
  cm-rep                              0.824     0.543    0.215    0.614
  p-value                             <0.001    <0.001   0.334    <0.001
  Max p-value among 8 methods         <0.001    <0.001   0.420    <0.001
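A sketch of the per-label, pairwise Dice computation summarized in Table 1, assuming integer-labeled connectivity maps already in the normalized space:

```python
import numpy as np

def dice(a, b):
    """Dice coefficient between two binary masks."""
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def mean_pairwise_dice(label_maps, label):
    """Average Dice overlap of one connectivity label over all subject pairs."""
    masks = [m == label for m in label_maps]
    scores = [dice(masks[i], masks[j])
              for i in range(len(masks)) for j in range(i + 1, len(masks))]
    return float(np.mean(scores))
```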
Fig. 3. Color-encoded Jacobian determinant maps inside the MSCC for one subject (color scale from 0.2 to 2.0). Left: cm-rep mapping; right: deformable registration.
3.3 Statistics on Tract FA Maps

Finally, we compare the effect of normalization on the power of statistical analysis involving mean FA features derived from diffusion tensor tractography. As described in Sec. 2.3, a tract-wise mean FA value is associated with each point in the MSCC of each subject. Using each normalization method, the tract-wise FA maps are warped to a common template space, in which point-wise statistical analysis (a two-sample unpaired t-test comparing patients and controls) is performed. Regions of statistical significance in the template space are found using cluster analysis with permutation testing: pixels with a t-value higher than 3.13 are selected to build clusters, and permutation testing is used to build an empirical distribution of cluster size under the null hypothesis. For
Evaluation of Shape-Based Normalization in the Corpus Callosum
783
normalization based on region-of-interest registration, no clusters with p-value below 0.05 are detected, regardless of the tracking method used. In contrast, cm-rep normalization finds consistent results across the 8 tracking methods. Fig. 4 summarizes the result for FACT tracking with an FA threshold of 0.25 and a curvature threshold of 60 degrees. A significant cluster in the mid-section of the MSCC is detected. We also collapsed the tract FA data onto the PISA skeleton by summarizing the values of all points with the same coordinate along the skeleton, and plot the summarized values along the PISA skeleton in Fig. 4. The summarized tract FA values for the control group appear to have three peaks along the PISA skeleton, while the patient group does not have an obvious middle peak. The statistical analysis confirms the significance of this difference.
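A schematic version of the cluster analysis with permutation testing described above might look as follows. The t-threshold of 3.13 is taken from the text; the group-array layout and the use of SciPy's connected-component labeling are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage, stats

def max_cluster_size(t_map, t_thresh=3.13):
    """Size of the largest suprathreshold cluster in a t-statistic map."""
    clusters, n = ndimage.label(t_map > t_thresh)
    return max((int((clusters == k).sum()) for k in range(1, n + 1)), default=0)

def cluster_permutation_test(group_a, group_b, n_perm=1000, t_thresh=3.13, seed=0):
    """Empirical null of the max cluster size under group-label permutation.

    group_a, group_b: arrays of shape (subjects, ...) holding warped FA maps.
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([group_a, group_b], axis=0)
    n_a = group_a.shape[0]
    t_obs, _ = stats.ttest_ind(group_a, group_b, axis=0)
    observed = max_cluster_size(t_obs, t_thresh)
    null_sizes = np.empty(n_perm)
    for p in range(n_perm):
        perm = rng.permutation(pooled.shape[0])
        t_perm, _ = stats.ttest_ind(pooled[perm[:n_a]], pooled[perm[n_a:]], axis=0)
        null_sizes[p] = max_cluster_size(t_perm, t_thresh)
    # p-value of the observed cluster against the permutation null
    return observed, float(np.mean(null_sizes >= observed))
```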
Fig. 4. This figure illustrates the differences in tract FA between the control group and the 22q group, obtained using shape-based normalization. Left: result of cluster analysis; areas with significant differences are shown in color. Middle: the mean tract FA map of each group after collapsing onto the PISA skeleton; the blue curves are for the control group and the red curves for the patient group. Right: plot of -log(adjusted p-values) for multiple statistical tests on the difference between the two groups after collapsing to the skeleton; the p-values are corrected for multiple comparisons using step-down permutation, and the blue line is the significance threshold corresponding to adjusted p = 0.05.
4 Conclusion
We have evaluated a shape-based normalization method on the MSCC in the context of a DTI study of 22q11.2 deletion syndrome. Results show that shape-based normalization aligns subregions of the MSCC, defined by connectivity, more accurately, and also helps increase the statistical power of group analysis in an experiment where tract FA is compared between two cohorts. These results suggest that cm-rep can be a more appropriate tool than the more widely used registration-based approach for normalizing the corpus callosum in white matter studies.
Accuracy Assessment of Global and Local Atrophy Measurement Techniques with Realistic Simulated Longitudinal Data
Oscar Camara1, Rachael I. Scahill2, Julia A. Schnabel1, William R. Crum1, Gerard R. Ridgway1, Derek L.G. Hill1, and Nick C. Fox2
1 Centre for Medical Image Computing (CMIC), Department of Medical Physics and Bioengineering, University College London, London WC1E 6BT, UK [email protected]
2 Dementia Research Centre (DRC), Institute of Neurology, University College London, Queen Square, London WC1N 3BG, UK
Abstract. The main goal of this work was to assess the accuracy of several well-known methods which provide global (BSI and SIENA) or local (Jacobian integration) estimates of longitudinal atrophy in brain structures using Magnetic Resonance images. For that purpose, we have generated realistic simulated images which mimic the patterns of change obtained from a cohort of 19 real controls and 27 probable Alzheimer’s disease patients. SIENA and BSI results correlate very well with gold standard data (BSI mean absolute error < 0.29%; SIENA < 0.44%). Jacobian integration was guided by both fluid and FFD-based registration techniques and resulting deformation fields and associated Jacobians were compared, region by region, with gold standard ones. The FFD registration technique provided more satisfactory results than the fluid one. Mean absolute error differences between volume changes given by the FFD-based technique and the gold standard were: sulcal CSF < 2.49%; lateral ventricles < 2.25%; brain < 0.36%; hippocampi < 1.42%.
1 Introduction
Atrophy measurements in some key brain structures, obtained from structural Magnetic Resonance (MR) images, can be used as biomarkers for neurodegenerative diseases in clinical trials [1], giving complementary information to cognitive tests. Computational anatomy methods [2] have been developed to analyse longitudinal and cross-sectional MR data, including quantification of atrophy. Until recently, the evaluation of these methods has been extremely difficult since there was no reliable gold standard. Furthermore, semi-automatic or manually traced measurements of regions of interest suffer from lack of reproducibility and sensitivity, as well as being labor-intensive. Recently, Karacali et al. [3] and Camara et al. [4] proposed two different approaches 1 aiming to answer this
1 Both authors provide tools or gold standard data at the following addresses: https://www.rad.upenn.edu/sbia/; http://www.ixi.org.uk
question. The first technique is based on the generation of topology-preserving deformation fields with Jacobian determinants matching the desired volumetric changes in a specific region of interest. The main drawback of this technique is that it does not take into account the interrelation of different structures. Camara et al. presented a technique in which atrophy in brain structures is simulated with a thermoelastic model of tissue deformation. The main drawback of this method is that it requires a set of segmented structures to build the input of the FEM solver, unlike Karacali's method, which does not necessarily need a segmentation step prior to simulation (the region of interest can be a sphere centered on a manually selected point in the image). On the other hand, in [4], the biomechanical readjustment of structures is modelled using conventional physics-based techniques based on biomechanical tissue properties. In this work, we have analysed the accuracy of some well-known longitudinal atrophy measurement methods against a realistic gold standard, providing very valuable information for their use in a clinical context or in trials. To the best of our knowledge, this is the first time that such an assessment study, using plausible simulated longitudinal brain atrophy to compare several techniques, has been performed. The gold standard data was created using an improved version of the methodology presented in [4]. Two popular methods, SIENA [5] and BSI [6], which provide global estimates of brain atrophy, and Jacobian integration guided by two different nonrigid registration methods (a B-Spline FFD [7] and a fluid-based [8] one), which gives local volume changes, were evaluated. A cohort of pairs of MR scans corresponding to 27 probable Alzheimer's Disease (AD) patients and 19 age-matched controls was used to guide the generation of the gold standard data.
2 Atrophy Simulation
The methodology for the generation of simulated images involves four main steps: generation of a reference labelled 3D mesh; its adaptation to every subject's anatomy; running the FEM solver to simulate regional volumetric change on every subject-specific mesh; and application of the resulting deformation fields to the corresponding baseline MRI.
2.1 Reference Volumetric Mesh
A reference tetrahedral mesh (868404 elements) was built using the procedure detailed in [4]. In our work, we have employed a different set of labels, according to the information we had from the cohort of real images (both hippocampi and whole brain) and the boundary conditions imposed on the FEM model, as explained in Section 2.3. Therefore, we used labels for the whole brain (Grey Matter and White Matter together), the lateral ventricles, left and right hippocampus, the subtentorial area, extra-sulcal and sulcal cerebrospinal fluid (CSF), the last three being relevant for boundary condition purposes. The separation of the extra-ventricular CSF into two different classes was obtained by applying a brain
Fig. 1. From left to right: original MNI Brainweb atlas; image with simulated atrophy using the old boundary conditions; image with simulated atrophy using the new boundary conditions; subtraction between simulated AD images with old and new boundary conditions
cortex segmentation tool, available in Brainvisa 2, on the grey-level version of the MNI Brainweb 3 atlas. The outer interface of the resulting segmentation reaches the brain hull and therefore includes sulcal CSF, which is isolated using the GM and WM labels of the MNI atlas.
2.2 Subject-Specific Meshes
In order to generate a cohort of simulated images with neuroanatomical variation representative of the population, the atlas-based 3D mesh has been adapted to the cohort of real images cited above, thus building a set of corresponding subject-specific meshes, which are subsequently introduced into the FEM solver. T1-weighted volumetric MR images, acquired on a 1.5 Tesla Signa unit (General Electric, Milwaukee) using a 256×256 matrix to provide 124 contiguous 1.5mm coronal slices through the head, were available for every subject. The mesh adaptation was performed by applying a Mesh-Warping (MW) technique [9], in which the transformation resulting from a fluid registration [8] between the grey-level atlas image and every subject MR scan is applied to the atlas-based mesh. The classical Jaccard overlap measure of semi-automatically obtained brain masks [10] after fluid registration was (mean±STD) 0.855±0.024 for probable ADs and 0.886±0.018 for controls, demonstrating a good fit of the subject-specific meshes. The fluid registration generates diffeomorphic transformations, thus theoretically assuring non-negative volume mesh elements after the Mesh-Warping procedure. Nevertheless, we found 8 cases in ADs and 5 in controls with folded elements after MW (a mean of 1.62 negative elements in ADs and 1.20 in controls, out of 868404 elements), due to some local Jacobian values very close to zero and the interpolation step needed to warp the meshes. All negative volume elements were corrected by moving conflicting nodes to their neighbour barycentre.
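The folded-element correction can be sketched as follows, assuming a simple array-based tetrahedral mesh representation; the paper does not specify its actual mesh data structure, so this is illustrative only.

```python
import numpy as np

def tet_signed_volume(p0, p1, p2, p3):
    """Signed volume of a tetrahedron; a negative value means a folded element."""
    return np.linalg.det(np.stack([p1 - p0, p2 - p0, p3 - p0])) / 6.0

def fix_folded_elements(nodes, tets, adjacency):
    """Move the nodes of folded elements to the barycentre of their neighbours.

    nodes: (N, 3) float array; tets: (M, 4) integer index array;
    adjacency: dict mapping a node index to its neighbouring node indices.
    """
    for tet in tets:
        if tet_signed_volume(*nodes[tet]) < 0:
            for idx in tet:
                nodes[idx] = nodes[adjacency[idx]].mean(axis=0)  # neighbour barycentre
    return nodes
```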
2.3 Finite-Element Deformation Model
Model. The subject-specific meshes were introduced into the FEM solver, in which volume changes were simulated with a thermoelastic model of soft tissue
2 http://brainvisa.info
3 http://www.bic.mni.mcgill.ca/brainweb/
deformation, based on the TOAST package, which is freely available 4. After defining the elastic material properties of every region, a set of thermal coefficients, one per structure, that will result in the desired differential regional volume changes is computed. These were based on semi-automatically obtained segmentations [10] of both hippocampi and the whole brain between the pairs of MR scans of the cohort described above. Homogeneous Dirichlet boundary conditions were introduced using a Payne-Irons method to suppress the displacement of the mesh nodes corresponding to the surface of the mesh and the subtentorial area. In Camara et al. [4], there was no restriction on the labels composed of CSF, since they should fill the space left by brain atrophy. This approach could result in unrealistic shifts at the top and the bottom of the brain, as illustrated in Figure 1. We addressed this problem in a different way, by separating the extra-ventricular CSF into sulcal and extra-sulcal CSF and applying Dirichlet boundary conditions to the latter, notably improving the realism of the simulated images (see Figure 1). This raises the question of the behaviour of sulcal and extra-sulcal CSF and their interrelation when atrophy occurs, which is, to the best of our knowledge, not investigated in the literature.
Gold Standard Data. The FEM solution consists of a deformation field, defined at each node of the mesh, which produces the desired volume changes. It can be applied directly to the input mesh or passed through an interpolation step to generate a voxel-by-voxel defined deformation. In [4], the volumetric gold standard data was obtained by integrating, over every region, the volume differences between corresponding elements of the original and warped meshes. Therefore, it did not take into account the interpolation step needed to apply the FEM solution to a grey-level image. In our work, the volumetric gold standard data is obtained directly from the voxel-by-voxel deformation fields after this interpolation step, since these are the fields used to generate the simulated grey-level images, i.e. they constitute a more accurate gold standard. The determinants of the Jacobians of these dense deformation fields were computed and integrated over partial-volume Regions of Interest (ROI) defined with the available information about the percentage of every tissue in each mesh element. Figure 2 shows an example of a gold standard deformation field and its corresponding Jacobian map for a simulated AD subject.
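As a rough sketch, the Jacobian determinants of such a dense deformation field can be computed with finite differences as below; the (3, X, Y, Z) displacement layout and the voxel-spacing argument are assumptions for illustration.

```python
import numpy as np

def jacobian_determinant(disp, spacing=(1.0, 1.0, 1.0)):
    """det(J) of the mapping x -> x + u(x) for a dense 3D displacement field.

    disp: (3, X, Y, Z) array of displacement components (same units as spacing).
    Values above 1 indicate local expansion, below 1 local contraction.
    """
    J = np.empty(disp.shape[1:] + (3, 3))
    for c in range(3):                      # component u_c
        grads = np.gradient(disp[c], *spacing)
        for a in range(3):                  # derivative along axis a
            J[..., c, a] = grads[a]
    J += np.eye(3)                          # d(x + u)/dx = I + du/dx
    return np.linalg.det(J)
```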
3 Atrophy Measurement Techniques
3.1 Global Techniques (BSI/SIENA)
Freeborough and Fox [6] proposed the Boundary Shift Integral (BSI) technique, which computes volume change via the amount by which the boundaries of a given cerebral structure have moved. A region around these boundaries is defined with a series of morphological operations, and, subsequently, volume
4 http://www.medphys.ucl.ac.uk/~martins/toast/index.html
Fig. 2. Gold standard deformation field (left) and zoom into the lateral ventricles (middle, top) and both hippocampi (middle, bottom). The corresponding Jacobian map (right) is also shown (yellow: volume gain; red: volume loss; blue: small volume change).
loss is approximated by integrating the differences in intensities between both MR scans over the defined region, normalized by image intensity means and bounded by predefined upper and lower intensity values. A rigid registration and a semi-automatic segmentation step to delineate the targeted structures are needed as pre-processing steps. The SIENA (Structural Image Evaluation, using Normalization, of Atrophy) technique, proposed by Smith et al. [5], automatically extracts the brain from a pair of MR images, aligns the brain-masked images with an affine transformation constrained by outer skull segmentations, and finally estimates atrophy based on the movement of brain edges. The latter step is based on finding all brain surface edges in both MR scans, integrating the distance between the closest matching edge points, multiplied by voxel volume and normalized by the number of points found.
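A highly simplified sketch of the BSI principle on co-registered, intensity-normalized scans is given below; the boundary band width and the clipping window are placeholder parameters, not the values used by [6].

```python
import numpy as np
from scipy import ndimage

def boundary_shift_integral(base, repeat, mask, i_low=0.25, i_high=0.75, width=1):
    """Schematic BSI: integrate clipped intensity differences over a boundary band.

    base, repeat: rigidly registered, intensity-normalized scans on the same grid;
    mask: binary segmentation of the structure on the baseline scan.
    """
    band = ndimage.binary_dilation(mask, iterations=width) & \
           ~ndimage.binary_erosion(mask, iterations=width)
    b = np.clip(base[band], i_low, i_high)
    r = np.clip(repeat[band], i_low, i_high)
    # with bright tissue on a dark background, positive values indicate the
    # boundary moving inwards, i.e. volume loss (in voxel units)
    return float(np.sum(b - r) / (i_high - i_low))
```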
3.2 Local Techniques (Jacobian Integration)
The main purpose of Jacobian integration strategies is to capture volume changes within the deformation fields resulting from applying a high-dimensional registration technique between pairs of MR scans. The analysis of the deformation fields is usually achieved at a voxel-wise level by computing the determinant of their Jacobian matrix, which gives a point-estimate of volumetric change. Additionally, if regions of interest are available, an integration of the Jacobians over these regions gives an estimate of their local volume change. Therefore, the accuracy will depend on the ability of the chosen registration methodology to cope with the deformations between the images. In this work, we have evaluated, for cerebral atrophy measurement purposes, two well-known and widely used registration techniques: B-Spline Free-Form Deformations [7] and a particular implementation of a fluid-based method [8]. The former deforms an image volume by manipulating an underlying mesh of control points. Displacements are then interpolated using the 3D cubic B-spline tensor. For this work, a multi-level FFD was used [11], by first deforming an isotropic FFD mesh of 5mm resolution, followed by a further refinement step using a 2.5mm resolution mesh. The meshes were adapted to exclude deforma-
Fig. 3. Correlation (left) and mean differences (right) between gold standard and SIENA/BSI results for simulated controls and patients
tions outside of the reference brain masks. The similarity measure of choice was Normalized Mutual Information. For each level, gradient descent optimization was performed for a maximum of 20 iterations, with 4 steps and initial step sizes of 2mm and 1mm, respectively. In fluid registration, the transformation is modelled as a viscous flow which warps the source image to match the target image. The driving force was derived from Intensity Cross-Correlation (ICC). The registration was run at half image resolution, without any masking of structures, for up to 400 iterations, subject to the ICC improving globally at each iteration. Further details on the implementation can be found in [8].
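Given a Jacobian determinant map such as the one sketched earlier for the gold standard fields, the ROI-wise volume changes can be obtained as follows; the partial-volume weight map is an assumption mirroring the gold standard construction.

```python
import numpy as np

def roi_volume_change(jac_det, roi_weight):
    """Percent volume change of an ROI from a Jacobian determinant map.

    roi_weight: per-voxel weights in [0, 1], e.g. partial volume fractions.
    """
    baseline = roi_weight.sum()
    followup = (jac_det * roi_weight).sum()
    return 100.0 * (followup - baseline) / baseline
```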
4 Results
4.1 Global Techniques (BSI/SIENA)
Figure 3 shows a good correlation between the gold standard volume changes and the SIENA/BSI results, both for controls and patients. Average absolute differences in brain volume change with respect to the gold standard (-0.39±0.74 for controls and -1.45±0.94 for patients) were small, both for controls (BSI of 0.14±0.10; SIENA of 0.18±0.14) and probable ADs (BSI of 0.29±0.24; SIENA of 0.44±0.48). SIENA and BSI provided similar accuracy, but their behaviour was different: SIENA tended to overestimate brain volume change, whereas BSI consistently underreported atrophy, as illustrated in Figure 3 (left). A difference can also be observed between the performance of both methods in controls and in probable ADs, the latter being more challenging due to a higher amount of brain volume change.
4.2 Local Techniques (Jacobian Integration)
Volume change values (mean±STD of percentage of the baseline structure volume), for simulated controls and patients, from the gold standard and those provided by the FFD and fluid-based techniques, are shown in Table 1.
Table 1. Volume changes (mean±STD of percentage of the baseline structure volume) provided by gold standard, FFD and fluid techniques

                    ----------- Controls -----------    ----------- Patients -----------
Structures          Gold        FFD         Fluid        Gold        FFD         Fluid
Extra-sulcal CSF    1.43±3.12   0.95±2.65   0.72±1.57    4.52±2.87   3.27±2.92   1.87±1.60
Sulcal CSF          1.95±4.20   1.13±2.50   0.60±1.32    6.11±4.07   3.61±2.94   1.71±1.48
Ventricles          2.24±5.22   1.96±4.91   2.04±4.30    6.02±4.13   5.09±3.94   5.08±3.85
Left Hippocampi    -0.03±1.11  -0.18±0.56  -0.11±0.31   -4.00±2.97  -2.99±2.13  -1.90±1.29
Right Hippocampi   -0.39±1.17  -0.35±0.57  -0.20±0.54   -4.05±3.05  -2.80±2.08  -1.71±1.33
Subtentorial Area   0.09±0.19   0.03±0.18   0.09±0.17    0.30±0.20   0.23±0.20   0.28±0.21
Whole Brain        -0.47±1.01  -0.40±0.78  -0.32±0.61   -1.71±1.12  -1.36±1.02  -1.06±0.82
Fig. 4. Jacobian images of gold standard (left), FFD (centre) and fluid (right) deformation fields (yellow: volume gain; red: volume loss; blue: small volume change)
The FFD-based method provided extremely good accuracy in whole brain and both hippocampi, with larger errors occurring in the sulcal CSF, as was expected due to the complexity of such a region. The fluid algorithm provided less accurate estimates of volume change for all structures involved in our experiment. An additional difference with respect to the FFD-based technique is the smoothness of the fluid-derived Jacobian map, as shown in Figure 4. This figure allows a visual comparison, for one probable AD case, between the Jacobian maps computed from the gold standard, the FFD and fluid deformation fields. The smoothness of the fluid-based Jacobian map is due to its intrinsic distribution of the volume change in homogeneous regions.
5 Conclusions
To the best of our knowledge, this work presents the first report of the accuracy of some well-known atrophy measurement techniques on realistic simulated data. Both global techniques analysed in this paper have been extensively employed by the neuroimaging community, and the results presented here quantify the accuracy obtained by these methods for global volume change estimation. Regarding local methods, the FFD-based one performed better than the fluid registration technique, demonstrating promising results for use in studies such as clinical trials. Future work will focus on the definition of good quantitative measures to evaluate the performance of each method and the
assessment of cross-sectional methods, such as Voxel-Based Morphometry, with realistic simulated data.
Acknowledgement. O. Camara acknowledges support of the EPSRC GR/S48844/01, Modelling, Understanding and Predicting Structural Brain Change. W.R. Crum acknowledges support of the Medical Images and Signals IRC (EPSRC GR/N14248/01 and UK Medical Research Council Grant No. D2025/31). J.A. Schnabel acknowledges support of the EPSRC GR/S82503/01, Integrated Brain Image Modelling project. R.I. Scahill and N.C. Fox acknowledge support from the UK Medical Research Council, G90/86 and G116/143 respectively.
References
1. Fox, N., Black, R., Gilman, S., Rossor, M., Griffith, S., Jenkins, L., Koller, M.: Effects of Aβ immunization (AN1792) on MRI measures of cerebral volume in Alzheimer's disease. Neurology 64, 1563–1572 (2005)
2. Ashburner, J., Csernansky, J., Davatzikos, C., Fox, N., Frisoni, G., Thompson, P.: Computer-assisted imaging to assess brain structure in healthy and diseased brains. Lancet Neurology 2, 79–88 (2003)
3. Karacali, B., Davatzikos, C.: Simulation of tissue atrophy using a topology preserving transformation model. IEEE Transactions on Medical Imaging 25, 649–652 (2006)
4. Camara, O., Schweiger, M., Scahill, R., Crum, W., Sneller, B., Schnabel, J., Ridgway, G., Cash, D., Hill, D., Fox, N.: Phenomenological model of diffuse global and regional atrophy using finite-element methods. IEEE Transactions on Medical Imaging 25, 1417–1430 (2006)
5. Smith, S., De Stefano, N., Jenkinson, M., Matthews, P.: Normalized accurate measurement of longitudinal brain change. Journal of Computer Assisted Tomography 25(3), 466–475 (2001)
6. Freeborough, P., Fox, N.: The boundary shift integral: an accurate and robust measure of cerebral volume changes from registered repeat MRI. IEEE Transactions on Medical Imaging 16(5), 623–629 (1997)
7. Rueckert, D., Sonoda, L.I., Hayes, C., Hill, D., Leach, M., Hawkes, D.: Nonrigid Registration Using Free-Form Deformations: Applications to Breast MR Images. IEEE Transactions on Medical Imaging 18(8), 712–721 (1999)
8. Crum, W., Tanner, C., Hawkes, D.: Anisotropic multi-scale fluid registration: evaluation in magnetic resonance breast imaging. Physics in Medicine and Biology 50, 5153–5174 (2005)
9. Camara, O., Crum, W., Schnabel, J., Lewis, E., Schweiger, M., Hill, D., Fox, N.: Assessing the quality of Mesh-Warping in normal and abnormal neuroanatomy. In: Medical Image Understanding and Analysis (MIUA 2005), pp. 79–82 (2005)
10. Freeborough, P., Fox, N., Kitney, R.: Interactive algorithms for the segmentation and quantitation of 3-D MRI brain scans. Computer Methods and Programs in Biomedicine 53, 15–25 (1997)
11. Schnabel, J., Tanner, C., Castellano-Smith, A., Leach, M., Hayes, C., Degenhard, A., Hose, R., Hill, D., Hawkes, D.: Validation of Non-Rigid Registration using Finite Element Methods. In: Insana, M.F., Leahy, R.M. (eds.) IPMI 2001. LNCS, vol. 2082, pp. 183–189. Springer, Heidelberg (2001)
Combinatorial Optimization for Electrode Labeling of EEG Caps
Mickaël Péchaud1, Renaud Keriven1, Théo Papadopoulo1, and Jean-Michel Badier2
1 Odyssée Lab, École Normale Supérieure, École des Ponts, INRIA. 45, rue d'Ulm - 75005 Paris - France [email protected]
2 INSERM, U751, Marseille, France
Abstract. An important issue in electroencephalography (EEG) experiments is to measure accurately the three-dimensional (3D) positions of the electrodes. We propose a system where these positions are automatically estimated from several images using computer vision techniques. Yet, only a set of undifferentiated points is recovered this way, and there remains the problem of labeling them, i.e. of finding which electrode corresponds to each point. This paper proposes a fast and robust solution to this latter problem based on combinatorial optimization. We design a specific energy that we minimize with a modified version of the Loopy Belief Propagation algorithm. Experiments on real data show that, with our method, a manual labeling of only two or three electrodes is sufficient to obtain the complete labeling of a 64-electrode cap in less than 10 seconds.
1 Introduction
Electroencephalography (EEG) is a widely used method for both clinical and research purposes. Clinically, it is used e.g. to monitor and locate epilepsy, or to characterize neurological disorders such as sleeping or eating disorders and troubles related to multiple sclerosis. Its main advantages are its price compared to magnetoencephalography (MEG), and its very good time resolution compared e.g. to fMRI. Conventionally, EEG readings were directly used to investigate brain activity from the evolution of the topographies on the scalp. Nowadays, it is also possible to reconstruct the brain sources that gave rise to such measurements, solving a so-called inverse problem. To this purpose, it is necessary to find the electrode positions and to relate them to the head geometry recovered from an anatomic MRI. Current techniques to do so are slow, tedious, error prone (they require acquiring each of the electrodes in a given order with a device providing 3D coordinates [17]) and/or quite expensive (a specialized system of cameras is used to track and label the electrodes [23]). Our goal is to provide a cheap and easy system for electrode localization based on computer vision techniques. In modern EEG systems, the electrodes (64, 128 or even 256) are organized on a cap that is placed on the head. Our system takes as inputs multiple pictures of the head wearing the cap from various positions. As a preliminary step,
electrodes are localized and their 3D positions are computed from the images by self-calibration (a technique that recovers the cameras' positions from the image information [8]) and triangulation. These are standard techniques that can provide 3D point coordinates with quite good accuracy. There remains the problem of electrode identification, which labels each 3D position with the name of the corresponding electrode. Finding a solution to this last problem is the focus of this paper. Note that good labeling software can also improve current systems by removing acquisition constraints (such as the recording of the electrodes in a given order) and by providing better user interfaces. We propose a method that recovers this labeling from just a few (two or three) manually annotated electrodes. The only prior is a reference, subject-independent, 3D model of the cap. Our framework is based on combinatorial optimization (namely on an extension of the Loopy Belief Propagation algorithm [21]) and is robust to soft deformations of the cap caused both by sliding effects and by the variability in subjects' head geometry.
2 Problem Definition
The inputs of our method consist of:
– a template EEG cap model providing labeled electrodes, along with their 3D positions (in fact, as we will explain further, an important feature of our method is that only the distances between close electrodes are used). L will denote the set of labels (e.g. L = {Fpz, Oz, ...}), and C = {C_l, l ∈ L} will be their corresponding 3D positions. C_l could be, for example, the average position of electrode l among a variety of prior measures; however, in our experiments, it was simply estimated on one reference acquisition.
– the measured 3D positions of the electrodes to label, obtained by 3D reconstruction from images. We will denote by M = {M_i, i ∈ [1..n]} these n 3D points.
The output will be a labeling of the electrodes, i.e. a mapping φ from [1..n] to L. Note that n can be less than the total number |L| of electrodes in cases where some electrodes of the cap are not used.
3 Motivation
In this section, we discuss other possible approaches to the electrode labeling problem. As will be detailed in section 6, we have tried some of these methods without any success; this motivates our energy-based combinatorial approach. A simple method could consist of a 3D registration step, followed by a nearest-neighbor labeling. Let T be a transformation that sends M into the spatial referential of C. A straightforward labeling could be:

φ(i) = arg min_{l∈L} d(C_l, T(M_i))

where d(A, B) denotes the Euclidean distance between points A and B. Actually, we first tested two direct ways of obtaining an affine transformation T:
– moment-based affine registration: in this case, we computed first- and second-order moments of the sets of points M and C and chose T as an affine transformation which superimposes these moments.
– 4-point manual registration: here, we manually labeled 4 particular electrodes in M and took for T the affine transformation which exactly sends these 4 electrodes to the corresponding positions in C.
As explained in section 6, we observed that these two approaches give very bad average results. One could argue that this might be caused by the quality of the registration. A solution could be to use more optimal affine registration methods, like Iterative Closest Points [26,3]. Yet, a close look at what caused bad labeling in our experiments reveals that this would not improve the results: the main reasons are that (i) the subject whose EEG has to be labeled does not have the same head measurements as the template, and moreover that (ii) the cap is a soft structure that shifts and twists from one experiment to another. It is clear that only a non-rigid registration could send M close to C. The problem can be modeled in terms of space deformation. For instance, a Thin-Plate Spline [5,12] based algorithm follows this approach. Another framework is deformable shape matching: such methods rely on shape deformation and intrinsic shape properties [24], rather than on deforming the ambient space, in order to make the shapes match. However, because of the topology of the electrodes on the cap, relations between points are also of importance. In that sense, our problem is close to the one investigated by Coughlan et al. [7,1], which they solve by recovering both deformations and soft correspondences between two shapes. Yet, in our case, we see two main differences: (i) labeling, rather than shape matching, is the key issue, and (ii) enforcing relational constraints between points is more important than regularizing deformations. For these reasons, we propose a method based on optimal labeling for which the only (soft) constraints are the distances between nearby points, without modeling any deformation. Our formulation also has the advantage of being parameter-free. In the remainder of the article, we first state our model and the associated energy; we then discuss our choice of an energy minimization algorithm. Finally, we validate our method, giving qualitative and quantitative results on real experiments.
4 Proposed Framework
The complete pipeline of our system is depicted in figure 1. As already explained, we do not consider here the 3D reconstruction step, but only the labeling one. From the measured data M, we construct an undirected graph G = (V, E), where V = [1..n] is the set of vertices and E a set of edges which codes the relations between nearby electrodes. As will become clear in the following, the choice of E tunes the rigidity of the set of points M. Practically, the symmetric k-nearest neighbors, or all the neighbors closer than a certain typical distance, are two valid choices. Given an edge e = (i, j) ∈ E for i ∈ V and j ∈ V, we denote by d_ij = d(M_i, M_j) the distance between points M_i and M_j in the measured data and by d̃_ij = d(C_φ(i), C_φ(j)) the reference distance between the
Fig. 1. Complete pipeline: we obtain 3D positions M (bottom left) by reconstruction from several (usually 6) pictures (top left). A graph G is then constructed from these positions (bottom right). Considering a template cap and associated positions C (top right), we label the measured electrodes by estimating φ* = arg min U(φ). In this example, φ(i) = k, φ(j) = l.
electrodes φ(i) and φ(j). In order to preserve the local structure of the cap in a soft way, we propose to simply minimize the following energy:

U(φ) = Σ_{(i,j)∈E} ρ(d_ij, d̃_ij)    (1)
where ρ is a cost function which penalizes differences between the observed and template distances. Note that, whereas the global one-to-one character of φ is not explicitly enforced by this model, the local rigidity-like constraints enforce it. Graph rigidity theory is a very complex domain (see for example [4] as an introduction), beyond the scope of this article. Following the classical framework of Markov Random Fields (MRF) [18,2,10], this can be rewritten as maximizing the following function:

P(φ) = exp(−U(φ)) = Π_{(i,j)∈E} exp(−ρ(d_ij, d̃_ij)) = Π_{(i,j)∈E} Ψ_{i,j}(φ(i), φ(j))    (2)
Normalizing P by dividing by the sum over all possible mappings φ yields a Gibbs distribution over an MRF derived from graph G, with L as the set of possible labels for each vertex. The problem is thus reduced to the classical case of finding a Maximum A Posteriori (MAP) configuration of a Gibbs distribution:

p(φ) = (1/K) Π_{i∈V} ψ_i(φ(i)) Π_{(i,j)∈E} ψ_{i,j}(φ(i), φ(j))    (3)

where K is a normalizing constant. ψ_i represents some extra prior information which can be added to the model. We have ψ_i(φ(i)) = 1 if there is no prior information over the labeling. However, ψ_i can be designed to take into account
various priors. As explained in section 6, we merely impose the label of some electrodes, but one could, for example, imagine using color information obtained from the pictures as a prior for the labeling.
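A direct transcription of energy (1), with hard unary priors ψ_i for the manually labeled electrodes, could read as follows. The dictionary-based data layout is an illustrative choice, and the cost function is the one later used in section 6, with ε a small constant.

```python
import numpy as np

def rho(x, y, eps=1e-3):
    """Pairwise cost: penalizes distance mismatch and very small distances."""
    return x / (y + eps) + y / (x + eps)

def energy(phi, M, C, edges, fixed=None):
    """Energy U(phi) of eq. (1), with hard unary priors for manual labels.

    phi: dict node -> label; M: dict node -> 3D point (measured);
    C: dict label -> 3D point (template); edges: iterable of (i, j) pairs;
    fixed: dict node -> imposed label (psi_i is 0 if violated).
    """
    if fixed and any(phi[i] != l for i, l in fixed.items()):
        return np.inf
    u = 0.0
    for i, j in edges:
        d_obs = np.linalg.norm(M[i] - M[j])
        d_ref = np.linalg.norm(C[phi[i]] - C[phi[j]])
        u += rho(d_obs, d_ref)
    return u
```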
5 Energy Minimization
The problem of finding a MAP configuration of a Gibbs distribution being NP-complete [15], we cannot expect an algorithm that optimally solves every instance of the problem. Since the seminal work of Geman and Geman [10], who proposed an algorithm that guarantees probabilistic convergence toward the optimal solution (however with an unreasonable run-time), several methods have been investigated to maximize general distributions like (3). Among these, minimal-cut based methods (often referred to as GraphCuts), introduced in computer vision and image processing by [11], have received much attention (see [14,6]). These methods can achieve global optimization for a restricted class of energies [13]. For more general energies, approximations were proposed [22]. As we found experimentally [19], these approximations fail to recover a correct labeling in our problem, which belongs to a class of multilabel problems that are not easily tackled by GraphCuts. As a consequence, we opted for a completely different but widely used algorithm, namely Belief Propagation (BP), and more precisely for its variant adapted to graphs with cycles: Loopy Belief Propagation (LBP). See [9] for a recent reference. Briefly, it consists in propagating information through the edges of the graph: each node i sends messages to its neighbors k, measuring the estimated label of k from its own point of view. Messages are passed between nodes iteratively until a convergence criterion is satisfied. This algorithm is neither guaranteed to converge nor to converge to an optimal solution. However, it behaves well in a large variety of early vision problems. Empirical and theoretical convergence of this family of methods was studied for instance in [20,25]. For this work, we designed an original and faster version of LBP, based on the idea of [16]. At the beginning of each iteration, it performs a label pruning at each node, which leads to a slight speedup. However, unlike in [16], a pruned label can reappear in later iterations, hence the non-greedy behavior of our algorithm. Due to lack of space, we refer the reader to a detailed research report [19].
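For concreteness, a generic min-sum loopy belief propagation loop on such a graph is sketched below. This is a plain LBP skeleton, not the authors' modified, label-pruning variant [19]; the message normalization is a common numerical convenience.

```python
import numpy as np

def loopy_bp(n_nodes, n_labels, edges, unary, pairwise, n_iters=50):
    """Min-sum loopy belief propagation (schematic).

    unary: (n_nodes, n_labels) array of unary costs, e.g. -log(psi_i);
    pairwise(i, j): (n_labels, n_labels) cost matrix for edge (i, j),
    with entry [l_i, l_j] = rho(d_ij, d(C_{l_i}, C_{l_j})).
    """
    nbrs = {i: [] for i in range(n_nodes)}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    # message m[(i, j)] is a cost vector over the labels of node j
    msgs = {(i, j): np.zeros(n_labels)
            for a, b in edges for i, j in ((a, b), (b, a))}
    for _ in range(n_iters):
        new = {}
        for i, j in msgs:
            # cost of i's labels, combining its unary term and all
            # incoming messages except the one coming from j
            h = unary[i] + sum(msgs[k, i] for k in nbrs[i] if k != j)
            new[i, j] = (pairwise(i, j) + h[:, None]).min(axis=0)
            new[i, j] -= new[i, j].min()  # normalize for numerical stability
        msgs = new
    beliefs = unary.copy()
    for i in range(n_nodes):
        for k in nbrs[i]:
            beliefs[i] += msgs[k, i]
    return beliefs.argmin(axis=1)  # one label index per node
```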
6 Experiments
We used 6 sets of 63 electrodes. Each set consists of 63 estimated three-dimensional points, acquired on different subjects with the same EEG cap and manually labeled. To test our algorithm as extensively as possible, we ran it on each set, taking successively each of the other sets as the reference. We hence simulated 30 different pairs (M, C). At least one electrode in M was manually labeled (see below).
Method                                         NC      Misclassified labels
Affine registration (moment based)                     48.7%
Affine registration (4 manual points)                  21.3%
Our method - (Fpz, Oz, T8) manually labeled    0%      0%
Our method - (Oz, T8) manually labeled         0%      0%
Our method - 3 random electrodes labeled       0%      0.03%
Our method - 2 random electrodes labeled       0.3%    0.2%
Our method - 1 random electrode labeled        4.2%    3.7%

Fig. 2. Classification errors. NC gives the percentage of instances of the problem for which our method did not converge. Misclassified-label percentages are estimated only when convergence occurs.
Fig. 3. A sample result. M is in red and C in green. Top left: 63 estimated 3D electrode positions. Top center: reference. Bottom left: subset of a labeling with the moment-based algorithm; C4 is wrongly labeled CP4, and F1 is labeled F3 (not shown). Bottom center: a subset of correct correspondences retrieved by our algorithm. Top and bottom right: full labeling retrieved by our algorithm, superimposed with anatomical MRI.
E was chosen the following way: we first estimated a typical neighbor distance by computing the maximum of the nearest-neighbor distance over all electrodes in M, and then considered as belonging to E every pair of distinct electrodes within less than three times this distance. In order to accelerate and enforce convergence, we used the three following technical tricks:
– we used our modified LBP algorithm [19];
– we added a classical momentum term [20];
– denoting by V_f the subset of V of the manually labeled electrodes, we added the set of edges V_f × V to E, allowing accurate information (V_f electrodes' labels being known exactly) to propagate quickly in the graph.
Although not indispensable, this led to a mean running time of less than 11s on a standard 3GHz PC and to a smaller number of non-converging optimizations. The cost function ρ was of the form ρ(x, y) = x/(y + ε) + y/(x + ε), where ε is a small positive constant. We did not notice sensitivity with respect to this choice, as long as the following key conditions are fulfilled: (i) penalizing differences between x and y, and (ii) penalizing small values of x or y. This latter condition enforces (yet does not warrant) a one-to-one mapping φ. Different experiments were carried out. First, the prior consisted in manually labeling electrodes Fpz, Oz, and T8. In that case, our method recovers all the electrodes, which was, as expected, not at all the case with an affine registration + nearest-neighbor approach (see figure 2). Actually, we observed that labeling (Oz, T8) seems sufficient. Yet, without any further data, we do not consider that labeling two electrodes only is reliable. Figure 3 shows a result on a case where affine registration does not work, and the final 3D reconstruction with our method. To demonstrate the robustness of our algorithm, we also tested hundreds of other conditions, in which 1, 2 or 3 randomly chosen electrodes were "manually" labeled. Non-convergence was only observed for unreasonable choices of "manually" labeled electrodes: indeed, if they are chosen on the sagittal median line, there is an indeterminacy due to the left-right symmetry of the cap. This does not occur when the electrodes are set by a human operator. The classification error rates are low (see figure 2 again) but not negligible. This makes us plead for a manual labeling of two or three fixed and easy-to-identify electrodes, e.g. (Fpz, Oz, T8). Finally, we also successfully tested cases for which n < |L|, i.e. when some electrodes are missing (details in [19]).
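The neighborhood graph E described above can be built directly from the estimated electrode positions; a minimal sketch, using the factor of three from the text:

```python
import numpy as np
from scipy.spatial.distance import cdist

def build_edges(points, factor=3.0):
    """Edge set E: all pairs of distinct electrodes closer than `factor`
    times the largest nearest-neighbor distance in the point set."""
    D = cdist(points, points)
    np.fill_diagonal(D, np.inf)
    typical = D.min(axis=1).max()  # typical neighbor distance, as in the text
    ii, jj = np.where(np.triu(D < factor * typical, k=1))
    return list(zip(ii.tolist(), jj.tolist()))
```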
7 Discussion
Experiments show that our framework leads to fast, accurate and robust labeling on a variety of data sets. We plan to provide on the Web, in the near future, a complete pipeline including our algorithm, ranging from 3D reconstruction of the electrodes to their labeling. Such a system would only require a standard digital camera and would imply minimal user interaction (manually labeling three electrodes). Note that the flexibility of our MRF formulation allows different priors. We plan for instance to use the color of the electrodes in the images as a further prior for labeling. This could lead to a fully automated system, where no user interaction would be required.
References
1. Rangarajan, A., Coughlan, J.M., Yuille, A.L.: A Bayesian network framework for relational shape matching. In: 9th IEEE ICCV, pp. 671–678. IEEE Computer Society Press, Los Alamitos (2003)
2. Besag, J.: Spatial interaction and the statistical analysis of lattice systems. Journal Royal Statist. Soc. B 36, 192–236 (1974)
3. Besl, P.J., McKay, N.D.: A method for registration of 3-d shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14(2), 239–256 (1992)
4. Hendrickson, B.: Conditions for unique graph realizations. SIAM J. Comput. 21(1), 65–84 (1992)
5. Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Trans. PAMI 11(6), 567–585 (1989)
6. Boykov, Y., Veksler, O., Zabih, R.: Markov random fields with efficient approximations. In: CVPR 1998, p. 648. IEEE, Washington (1998)
7. Coughlan, J.M., Ferreira, S.J.: Finding deformable shapes using loopy belief propagation. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 453–468. Springer, Heidelberg (2002)
8. Faugeras, O., Luong, Q.T., Papadopoulo, T.: The Geometry of Multiple Images. MIT Press, Cambridge (2001)
9. Felzenszwalb, P., Huttenlocher, D.: Efficient belief propagation for early vision (2004)
10. Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. PAMI 6(6), 721–741 (1984)
11. Greig, D.M., Porteous, B.T., Seheult, A.H.: Exact maximum a posteriori estimation for binary images. J. R. Statist. Soc. B 51, 271–279 (1989)
12. Chui, H., Rangarajan, A.: A new algorithm for non-rigid point matching. In: CVPR, pp. 2044–2051 (2000)
13. Ishikawa, H.: Exact optimization for Markov random fields with convex priors (2003)
14. Ishikawa, H., Geiger, D.: Mapping image restoration to a graph problem (1999)
15. Kolmogorov, V., Zabih, R.: What energy functions can be minimized via graph cuts? In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002 (3). LNCS, vol. 2352, pp. 65–81. Springer, Heidelberg (2002)
16. Komodakis, N., Tziritas, G.: Image completion using global optimization. In: CVPR 2006, IEEE Computer Society Press, Washington (2006)
17. Kozinska, D., Nowinski, K.: Automatic alignment of scalp electrode positions with head MRI volume and its evaluation. In: Engineering in Medicine and Biology, BMES/EMBS Conference, Atlanta (October 1999)
18. Lévy, P.: Chaînes doubles de Markov et fonctions aléatoires de deux variables. C. R. Académie des Sciences 226, 53–55 (1948)
19. Péchaud, M., Keriven, R., Papadopoulo, T.: Combinatorial optimization for electrode labeling of EEG caps. Technical Report 07-32, CERTIS (July 2007)
20. Murphy, K.P., Weiss, Y., Jordan, M.I.: Loopy belief propagation for approximate inference: An empirical study. In: Fifteenth Conference on Uncertainty in Artificial Intelligence, pp. 467–475 (1999)
21. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, Inc., San Francisco (1988)
22. Raj, A., Zabih, R.: A graph cut algorithm for generalized image deconvolution. In: ICCV 2005, pp. 1048–1054. IEEE, Washington (2005)
23. Russell, G.S., Eriksen, K.J., Poolman, P., Luu, P., Tucker, D.M.: Geodesic photogrammetry for localizing sensor positions in dense-array EEG. Clinical Neurophysiology 116, 1130–1140 (2005), http://www.egi.com/c gps.html
24. Sebastian, T.B., Klein, P.N., Kimia, B.B.: Alignment-based recognition of shape outlines. In: 4th International Workshop on Visual Form, pp. 606–618 (2001)
25. Weiss, Y., Freeman, W.T.: On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs. IEEE Trans. Information Theory 47 (2001)
26. Zhang, Z.: Iterative point matching for registration of free-form curves. Technical Report RR-1658, INRIA (1992)
Analysis of Deformation of the Human Ear and Canal Caused by Mandibular Movement
Sune Darkner1,2, Rasmus Larsen1, and Rasmus R. Paulsen2
1 Department of Informatics and Mathematical Modelling, Technical University of Denmark, Denmark [email protected]
2 Oticon A/S, Denmark
Abstract. Many hearing aid users experience physical discomfort when wearing their device. The main contributor to this problem is believed to be deformation of the ear and ear canal caused by movement of the mandible. Physical discomfort results from added pressure on soft-tissue areas in the ear. Identifying features that can predict potential deformation is therefore important for identifying problematic cases in advance. A study on the physical deformation of the human ear and canal due to movement of the mandible is presented. The study is based on laser scannings of 30 pairs of ear impressions from 9 female and 21 male subjects. Two impressions have been taken from each subject, one with open mouth and one with the mouth closed. All impressions are registered using non-rigid surface registration and a shape model is built. From each pair of impressions a deformation field is generated and propagated to the shape model, enabling the building of a deformation model in the reference frame of the shape model. A relationship between the two models is established, showing that the shape variation can explain approximately 50% of the variation in the deformation model. A hypothesis test for the significance of the deformations in each deformation field reveals that all subjects have significant deformation at Tragus and in the canal. Furthermore, a relation between the magnitude of the deformation and the gender of the subject is demonstrated. The results are successfully validated by comparing the outcome to the anatomy using a single set of high-resolution histological sectionings of the region of interest.
1 Introduction
A recent survey has shown that physical comfort and acoustical feedback are among the ten most important issues for hearing aid user satisfaction [1]. It is well known among hearing-aid manufacturers that physical deformation of the human ear canal is connected to problems with both comfort and acoustical feedback. Furthermore, it is known that deformation of the ear canal is closely linked to speaking, chewing, yawning, and movement of the mandible in general. The human ear canal consists of a soft and a bony part. The bony part is
Thanks to 3D Lab at the department of Orthodontics, Panum Institute, Denmark.
Fig. 1. (a) Map of the anatomy of the human ear. (b&c) Histological sectioning of the human ear from [2]. (b) A transversal cut containing the canal. As can be seen, the canal is situated between the mastoid and the mandible before entering the mastoid itself. Furthermore, there are two bends of the canal. The outer bend is called the first bend and the inner, just before the canal enters the mastoid, is called the second bend. (c) A cut in the sagittal plane at the dashed red line of b showing the soft tissue around the ear canal between the mandible and the mastoid.
embedded in the mastoid and, thus, not subject to deformation. However, the soft part of the canal is situated between the mandible and the mastoid, surrounded by skin, cartilage, and fat; all tissues that are highly deformable. Fig. 1(a) shows an anatomical labelling of the human ear. From fig. 1(b) and (c) it is obvious that movement of the mandible will cause deformation of the tissue around it, hence changing the shape of the ear canal. Very little is known about the nature of this deformation seen from a hearing-aid perspective. We believe that systematic knowledge of the shape change of the ear canal can be used in future hearing-aid production, thus creating better and more comfortable hearing aids. In this study, a set of 3D scanned ear impressions (see fig. 2(a)) is used in a non-rigid registration framework to create a shape model and a deformation model. In the following analysis, we try to establish an understanding of where in the ear and canal the significant shape changes occur and whether these changes are related to the shape of the ear and canal. Furthermore, it is examined whether there are any
Fig. 2. (a) A typical scanning of an impression taken from the production. The scanning has been opened at the top, and the lower part has had most artifacts removed manually. Some of the anatomical features have been labelled. (b) The magnitude of the deformation field projected onto the open mouth impression.
gender-related differences in ear-canal dynamics. All such relations will be beneficial in discovering problematic cases.
2 Previous Work
It is only recently that 3D scanners have been introduced in the production of hearing aids. Therefore, most prior work on ear canal shape was done directly on ear impressions using calipers etc. Oliveira et al. [3,4] have analyzed the changes that occur in the ear canal due to movement of the mandible and concluded that there is a deformation and also a change in volume. They claim that the deformation only occurs in the coronal plane. However, Grenness et al. [5] have shown that the deformation is more complex, assuming that the Concha is stable during movement of the mandible. This claim remains to be proven. Finally, Pirzanski [6] has analyzed the dynamics of the ear canal with the goal of increasing hearing-aid users' acceptance rates. However, all of the above is based on manual measurements and manual registration, which is prone to error. A statistical shape model of the static ear canal, based on scanned ear impressions and automated registration, has been presented by Paulsen et al. [7].
3 Data
The data consist of 60 scannings of ear impressions taken from 30 individuals, 21 males and 9 females, ranging from 25 to 65 years of age. Two impressions were taken from each: one with open mouth, using a mouth prop to create a similar opening angle for all subjects, and one with closed mouth. To ensure consistency, all impressions were made by the same audiologist and scanned on a 3D laser scanner by the same operator.
4 Inspection of the Anatomy
The images seen in fig. 1(b) and (c) are a part of a high-resolution histological sectioning study by Sørensen et al. [2]. The data set includes the outer ear, making it possible to investigate the physiology of the human ear and canal. As mentioned, the mandible is situated in front of the ear canal between the first and the second bend. It is known that the tissue surrounding the canal is not directly attached to the mandible. When the mandible moves forward, a small void is created, which is filled by the surrounding tissue. It is expected that this will cause a deformation of the wall of the ear canal on the anterior side between the first and the second bend. As fig. 1(c) shows, the posterior and top side of the canal are situated very close to the mastoid, thus limiting the amount of deformation on this side of the canal. Tragus and Cavum Concha are situated on the soft tissue surrounding the mandible. In fact, careful examination reveals that the whole outer ear is situated on soft deformable tissue, especially
the part below Crus of Helix. Hence, Grenness' assumption of a stable Concha seems to be wrong, and the ear below Crus of Helix can be expected to move inwards, perpendicular to the sagittal plane, as the mouth opens. Inspection of the histological data reveals that the best reference for the data in this study is the Cymba Concha. This part of the ear might also be subject to deformation. However, since it is situated on the outside of the mastoid, contrary to the Cavum Concha, which is situated just beneath it, it is less likely than other parts of the outer ear to displace and deform.
5 Model Generation
To make a consistent data analysis possible, a frame of reference must be established. A fully automatic rigid registration algorithm by Darkner et al. [8], evaluated by Darkner et al. [9], is used to register the Cymba Concha of all impression pairs. A highly constrained non-rigid surface registration is then applied to create the deformation field, see fig. 2(b). Secondly, to establish dense point-to-point correspondence across the population, the non-rigid surface registration algorithm is applied to all closed-mouth impressions. Dense point-to-point correspondence is generated from the resulting registration using the angle-weighted normal method by Bærentzen and Aanæs [10] and ray tracing [11]. The result is then propagated to the deformation field for correspondence between the deformations and shapes. The non-rigid registration is based on the diffeomorphic warp presented by Cootes and Twining [12], extended to 3D by Vester-Christensen et al. [13], using the distance and cost functions of [8]. A steepest descent algorithm implemented as the inverse compositional algorithm by Baker and Matthews [14] reduces the registration time to 4-8 minutes per shape, registered on a 1.7 GHz laptop PC. From the Procrustes-aligned [15] registered shapes, a shape model and a deformation model are created. Generally, the impressions do not depict the exact same part of the ear, only an overlapping region; hence, the models are cropped to their common subset.
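A point distribution model of the kind used here can be built from the corresponded, Procrustes-aligned shapes (or deformation vectors) with a standard PCA; a minimal sketch, where the array layout is an assumption:

```python
import numpy as np

def pca_model(shapes, n_modes=None):
    """Point distribution model from corresponded shapes.

    shapes: (n_subjects, n_points, 3) array of aligned shapes, or of
    deformation vectors in the reference frame. Returns the mean, the
    principal modes, their standard deviations and per-subject scores.
    """
    X = shapes.reshape(shapes.shape[0], -1)
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    std = s / np.sqrt(X.shape[0] - 1)   # per-mode standard deviation
    if n_modes is not None:
        Vt, std = Vt[:n_modes], std[:n_modes]
    scores = (X - mean) @ Vt.T          # mode coefficients, e.g. PC1_shape
    return mean, Vt, std, scores
```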
6 Analysis and Results
Visual inspection of the data reveals that almost all of the subjects have a clearly visible deformation of their ear canal due to movement of the mandible. A deformation occurs in the canal with its maximum on the anterior side of the wall between the first and the second bend. As seen in fig. 3, the mean deformation is exactly that. This confirms our observations from the histological sectioning. Additionally, a deformation of the Tragus, Anti-Tragus and Cavum Concha can be observed in the deformation model, which again corresponds well with the observations from the histological sectioning. The magnitude of the deformation varies among individuals from ≈ 0.2 mm to 2.3 mm. Using the mean shape as reference, the average, minimum and maximum
Fig. 3. The mean deformation and the first 3 modes of deformation variation and the mean shape and the first 3 modes of shape variation. All +2 standard deviations.
deformation over all sets of impressions can be calculated: average = 0.4349 mm, min ≈ 0.0 mm and max ≈ 2.3 mm. It is a well-known fact that ear size and gender are related [7]. We observe that the first mode of variation ($PC_1^{shape}$) of our shape model is related to size (fig. 3). This is confirmed by using a logistic regression model to predict gender from this variable. Let p be the posterior probability of a male. Then the model $\mathrm{logit}(p) = \alpha + \beta\, PC_1^{shape}$ is significant at a level below 7%. Visual inspection of the deformation fields led us to suspect that males tend to have larger deformations than females. Hence, a logistic regression was performed with the mean amount of deformation over the entire shape as predictor of gender. A model without an intercept was chosen, since no deformation should correspond to 50/50 odds between the genders. The resulting model is significant at the 4% level, confirming the hypothesis. The modelled difference between genders in deformation is most likely related to the differences between the male and female mandible. Such discrimination has been reported by Giles [16] and Graw [17] using the size and strength of the mandible.
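The two regressions described above can be reproduced in outline as follows. This is a hedged sketch using statsmodels with synthetic stand-in values, since the per-subject PC scores and mean deformations are not published; only the model structure follows the text.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 30
pc1 = rng.normal(size=n)                        # stand-in PC1 shape scores
is_male = (pc1 + rng.normal(scale=2.0, size=n) > 0).astype(int)

# Gender from size: logit(p) = alpha + beta * PC1_shape
model_size = sm.Logit(is_male, sm.add_constant(pc1)).fit(disp=0)

# Gender from mean deformation, without an intercept so that zero
# deformation corresponds to 50/50 odds between genders, as argued above.
mean_def = np.abs(rng.normal(size=n))           # stand-in mean deformations
model_def = sm.Logit(is_male, mean_def[:, None]).fit(disp=0)

print(model_size.pvalues)                       # significance of alpha, beta
print(model_def.pvalues)
```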
6.1 Shape Related to Deformation
To investigate whether shape and deformation are related, the deformation and shape models are compared (see fig. 3). The first 5 modes, explaining 82% of the total variation of the deformation model, are compared with the first 7 modes, explaining 80% of the total variation of the shape model. The number of modes is found using parallel analysis by Horn [18]. The first mode of deformation variation, containing primarily the size of the deformation, cannot be explained by any of the 7 modes from the shape model. However, the next 3 modes of variation can. The second mode of deformation can be interpreted as a change in the angle of the canal, in the plane bisecting the coronal-transversal angle, relative to the Concha. This mode can be modelled by the second and third
modes from the shape model. These two modes represent the vertical length of the Concha and the angle of the canal in the transversal plane. The third mode of deformation is the angle of the canal relative to the Concha in the transversal plane and the shape of the canal, round or oval. This mode of deformation can be modelled by the 6th mode of shape variation. Both models, $PC_2^{def} = \alpha + \beta_1 PC_2^{shape} + \beta_2 PC_3^{shape}$ and $PC_3^{def} = \alpha + \beta_1 PC_6^{shape}$, are significant at the 1% level. The 4th deformation mode is the bending of the canal in the transversal plane and the deformation of the Intertragic notch, and can be related to the roundness of the Concha and the angle of the canal in the coronal plane. The model has the form $PC_4^{def} = \alpha + \beta_1 PC_4^{shape}$ and is significant at the 1% level. The first mode of shape variation can explain the 5th mode of deformation variation with significance p < 0.08. Combined with the 6th mode, the significance becomes p < 0.06.
6.2 Analysis of the Deformation Field
We now examine the deformation field for an individual ear in more detail; i.e., at every vertex of the triangulated surface representing the ear we test whether a significant deformation occurs as a function of opening the mouth. This involves simultaneous testing of ≈ 10000 hypotheses. To do this we employ Efron's [19] procedure for estimating the empirical null hypothesis for each individual. Using an i.i.d. normal assumption for the deformation vector elements under the null hypothesis $H_i$, the magnitude $Y_i$ of the deformation vector for the $i$th vertex follows a $\sigma\chi(3)$ distribution. For each vertex we can transform the $Y_i$'s to z-values ($\Phi$ is the standard normal cumulative distribution function):

$$z_i = \Phi^{-1}(\mathrm{prob}\{Y_i > y_i\}), \qquad z_i|H_i \in N(0,1) \qquad (1)$$
The latter part is the theoretical null hypothesis. In Fig. 4(a) we show the histogram of z's from an experiment where two impressions were taken from the same ear with closed mouth. The heavy right tail was expected, due to shifting of ear wax, hair, etc. We approximate the histogram with a smoothing spline and extract the maximum point and the full width at half maximum of the (first) major top. This provides us with robust estimates of the mean and standard deviation under the null hypothesis. The empirical null is $z_i|H_i \in N(-0.40, 0.60)$. The reasons for the difference between the theoretical and empirical nulls may be hidden correlations or the presence of genuine but uninteresting small effects. Assuming that for each ear a large proportion of the vertices will exhibit no change due to mouth opening, we can use a similar procedure to identify those vertices where significant changes occur. In Fig. 4(b-c) the null is estimated from the first major peak, and 95% and 99% quantiles are determined. The corresponding vertices where significant changes occur are shown in Fig. 4(d-f). We see that significant changes occur in the Cavum Concha, at the Tragus and Anti-Tragus, and in the canal, in full correspondence with our expectations and precisely
Fig. 4. (a) Histogram of the z-values under the null hypothesis and robust estimation of the normal parameters using a smoothing spline. (b-c) Histograms of z-values, robust null estimation, and 95% and 99% quantiles for two ears. (d) The 95% and 99% quantiles of the ear in (b). (e-f) p-value maps of (b) and (c) respectively.
where most hearing aids are situated in the ear. This confirms that movement of the mandible causes discomfort for some hearing-aid users.
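The empirical-null estimation used above can be sketched as follows. The bin count, the smoothing factor and the assumption that the first major top is the global peak of the spline are illustrative choices, not the authors' settings.

```python
import numpy as np
from scipy import stats
from scipy.interpolate import UnivariateSpline

def empirical_null(magnitudes, sigma, n_bins=60):
    """Estimate the empirical null N(mu0, s0) from deformation magnitudes.

    Each magnitude Y_i is assumed sigma*chi(3)-distributed under the null;
    it is transformed to z_i = Phi^{-1}(prob{Y_i > y_i}) as in eq. (1),
    the z-histogram is smoothed with a spline, and the peak location and
    full width at half maximum give robust null parameters.
    """
    z = stats.norm.ppf(stats.chi.sf(magnitudes / sigma, df=3))
    counts, edges = np.histogram(z, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    spline = UnivariateSpline(centers, counts, s=len(counts))
    fine = np.linspace(centers[0], centers[-1], 2000)
    curve = spline(fine)
    mu0 = fine[np.argmax(curve)]                 # peak of the major top
    half = curve.max() / 2.0
    above = fine[curve >= half]                  # assumes a unimodal main peak
    s0 = (above[-1] - above[0]) / 2.355          # FWHM = 2*sqrt(2 ln 2)*sigma
    return mu0, s0

# mu0, s0 = empirical_null(vertex_deformation_magnitudes, sigma_hat)
```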
7 Conclusion
We have shown that it is possible to consistently locate regions of significant deformation caused by movement of the mandible in all subjects. The occurrence of the deformations corresponds well to the physiology of the human ear, in terms of soft tissue and bony structures. The locations that deform the most are exactly where hearing aids are normally situated in the ear. Hence, we can confirm that movement of the mandible can cause discomfort in the ear when wearing a hearing aid. Furthermore, we have shown that males are in general more prone to deformation of the ear and canal, and that the common assumption that men have larger ears than women seems to hold. Finally, we have shown several significant relations between the shape of the ear and canal and the deformation occurring during movement of the mandible. We can explain 50% of the variation of the deformation using the first 6 modes of variation from the shape model. Our findings are highly significant, even considering the limited size of the data set. The features described by the modes of variation in the shape model can be used as guidelines to detect potentially problematic cases. They point to a problem caused by a specific kind of deformation, thus enabling the dispenser or hearing-aid manufacturer to take appropriate action to eliminate the problem.
References

1. Kochkin, S.: MarkeTrak V: "Why my hearing aids are in the drawer": The consumer's perspective. The Hearing Journal 53(2), 34–39 (2000)
2. Sørensen, M.S., Dobrzeniecki, A.B., Larsen, P., Frisch, T., Sporring, J., Darvann, T.A.: The visible ear: A digital image library of the temporal bone. ORL 64, 378–381 (2002)
3. Oliviera, R., Hammer, B., Stillman, A., Holm, J., Jons, C., Margolis, R.: A look at ear canal changes with jaw motion. Ear and Hearing 13(6), 464–466 (1992)
4. Oliviera, R., Babcock, M., Hoeker, M.V.G., Parish, B.: The dynamic ear canal and its implications: The problem may be the ear, and not the impression. Hearing Review 12(2), 18–19 (2005)
5. Grenness, M.J., Osborn, J., Weller, W.L.: Mapping ear canal movement using area-based surface matching. JASA 111(2), 960–971 (2002)
6. Pirzanski, C.: Despite new digital technologies, shell modelers shoot in the dark. The Hearing Journal 59(10), 28–31 (2006)
7. Paulsen, R.R., Larsen, R., Laugesen, S., Nielsen, C., Ersbøll, B.K.: Building and testing a statistical shape model of the human ear canal. In: Dohi, T., Kikinis, R. (eds.) MICCAI 2002. LNCS, vol. 2488. Springer, Heidelberg (2002)
8. Darkner, S., Vester-Christensen, M., Larsen, R., Paulsen, R.R., Nielsen, C.: Automated 3D rigid registration of open 2D manifolds. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190. Springer, Heidelberg (2006)
9. Darkner, S., Vester-Christensen, M., Larsen, R., Paulsen, R.R.: Evaluating a method for rigid registration. In: SPIE Medical Imaging 2007 (February 2007)
10. Bærentzen, J., Aanæs, H.: Signed distance computation using the angle weighted pseudo-normal. IEEE Transactions on Visualization and Computer Graphics 11(3), 243–253 (2005)
11. Whitted, T.: An improved illumination model for shaded display. Commun. ACM 23(6), 343–349 (1980)
12. Cootes, T., Marsland, S., Twining, C., Smith, K., Taylor, C.: Groupwise diffeomorphic non-rigid registration for automatic model building. In: ECCV 2004, vol. IV, pp. 316–327 (2004)
13. Vester-Christensen, M., Erbou, S.G., Darkner, S., Larsen, R.: Accelerated 3D image registration. In: SPIE Medical Imaging 2007 (February 2007)
14. Baker, S., Matthews, I.: Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision 56(3), 221–255 (2004)
15. Dryden, I.L., Mardia, K.: Statistical Shape Analysis. Wiley, Chichester (1998)
16. Giles, E.: Sex determination by discriminant function analysis of the mandible. American Journal of Physical Anthropology 22(2), 129–135 (1964)
17. Graw, M.: Significance of the classical morphological criteria for identifying gender using recent skulls, vol. 3 (January 2001)
18. Horn, J.L.: A rationale and test for the number of factors in factor analysis. Psychometrika 30, 179–185 (1965)
19. Efron, B.: Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. Journal of the American Statistical Association 99, 96–104 (2004)
Shape Registration by Simultaneously Optimizing Representation and Transformation

Yifeng Jiang1, Jun Xie2, Deqing Sun1, and Hungtat Tsui1

1 Department of Electronic Engineering, The Chinese University of Hong Kong
{yfjiang,dqsun,httsui}@ee.cuhk.edu.hk
2 School of Computer Science, University of Central Florida, USA
[email protected]
Abstract. This paper proposes a novel approach that achieves shape registration by optimizing shape representation and transformation simultaneously; these are modeled by a constrained Gaussian Mixture Model (GMM) and a regularized thin plate spline, respectively. The problem is formulated within a Bayesian framework and solved by an expectation-maximization (EM) algorithm. Compared with the popular methods based on landmark-sliding, its advantages include: (1) it can naturally deal with shapes of complex topology and with 3D shapes; (2) it is more robust against data noise; (3) the registration performance is better in terms of the generalization error of the resultant statistical shape model. These advantages are demonstrated on both synthetic and biomedical shapes.
1 Introduction
Shape registration has a long history as a key problem for shape analysis in various disciplines [1], and it has received increasing interest because of the success of data-driven deformable models [2] in computer vision and medical image analysis, where shape registration is a bottleneck in the model training stage. The construction of such models is the application background of this paper. There has been a lot of previous research on shape registration, and two basic elements are often involved in the various methods:

1. Shape representation. The representation of shape data can directly adopt point sets, or adopt parametric models: parametric curves or surfaces [3], Fourier descriptors or spherical harmonic functions [4], and wavelet transforms. Sometimes non-parametric models are also used: medial axes, M-reps, implicit distance functions [5], and curvature scale spaces.
2. Transformation model. The mapping between shapes is usually parameterized by transformation models. Global transformation models apply to the entire shape, e.g. rigid, similarity (Euclidean), affine, and perspective. Local transformation models can represent non-rigid deformations, e.g. optical flow, Thin Plate Splines (TPS) [6,7], Radial Basis Functions, and Free Form Deformations (FFD) [5]. In many works, partial differential equations (PDE) are also utilized to implicitly represent the transformation model.
To the best of our knowledge, most existing approaches treat representation first and separately. For example, in [4], the shape is first mapped onto a topologically equivalent parameter space before the parameters of a continuous mapping are sought. In [5], the shapes are first represented in the space of distance functions, and then the global and local registrations are sought by maximizing different shape similarity measures. This leaves the search for correspondence to optimizing the transformation only. In other words, the shape representations are chosen to fit individual shapes best, without considering the complexity of the transformation. From the point of view of constructing a shape model, which couples both representation and transformation, this may lead to a sub-optimal result. As an exception, the popular landmark-sliding algorithms [6,3,8,9] revise the shape representation together with the transformation, both of which are established on a set of landmarks, and thus have the potential to obtain the optimal shape model. However, they have two limitations.

1. They require input shapes to be parameterized, so that the trajectories for landmarks can be manipulated. For 3D shapes or shapes with complicated topologies (multiple parts, self-intersection), this is difficult.
2. They are vulnerable to noise, because every piece of landmark trajectory is interpolated from a small number of points. Rough trajectories tend to make the cost function of registration much more nonlinear.

In this paper, we propose a new approach which Simultaneously Optimizes shape Representation and Transformation (SORT for short) to overcome these limitations. It formulates the shape registration problem in a Bayesian framework and solves it by an EM algorithm. SORT is similar to landmark-sliding algorithms in its use of a piece-wise linear representation for shapes, but instead of seeking a set of landmarks, SORT searches for a set of short segments, on which the correspondence is established. SORT works directly on point sets without any need for parameterization, because a set of segments can model (represent) a point set directly in 2D or 3D space.
2 Problem Statement
We are given M shapes {S_m, m = 1...M}, where each shape consists of a set of points S_m = {s_mi, i = 1...N_m} and s_mi is the coordinate of one point. Our goal is to find an appropriate piece-wise representation with L segments for each S_m, such that the segments correspond across the whole shape group. In 2D, each segment can be parameterized by a pair of points (u, v), so S_m is represented by X_m = {(u_mj, v_mj), j = 1,...,L}. In 3D, each segment can be parameterized by three points. Since all the shapes consist of corresponding segments, we assume there is a prototype shape, S_0, that can generate all shapes in the group through certain transformations. S_0 is represented by X_0 = {(u_0j, v_0j), j = 1,...,L}. The representation for each S_m is then approximately recovered by $X_m = T_m(X_0) + \epsilon_t$, where
T_m is the non-rigid transformation between X_0 and X_m, and $\epsilon_t$ is noise. Next, line segments are reconstructed from X_m = {(u_mj, v_mj), j = 1,...,L} by a certain rule L; e.g., if (u_mj, v_mj) represents the two ends of a line segment, the rule is simply to connect them. Then all the points of {S_m} can be obtained by uniformly sampling (denoted U) each segment, plus a certain noise $\epsilon_s$. The above process can be considered as a generative model:

$$S_m = \{s_{mi}\} = U \cdot L \cdot (T_m(X_0) + \epsilon_t) + \epsilon_s, \quad \text{where } \epsilon_t \sim N(0, I\sigma_t),\ \epsilon_s \sim N(0, I\sigma_s). \quad (1)$$

I is the 2 × 2 identity matrix, and σ_s is a scale indicating the data's noise level. The shape registration problem can now be formally stated as finding X_0, {T_m} and {X_m} that make the given {S_m} most probable under the generative model. In a Bayesian framework, we infer them by maximizing the a posteriori probability (MAP). Under the i.i.d. assumption, we have:

$$(\hat{X}_0, \{\hat{T}_m, \hat{X}_m\}) = \arg\max_{X_0, \{T_m, X_m\}} p(X_0, \{T_m, X_m\} \mid S_m) \qquad (2)$$
$$= \arg\max \prod_{m=1}^{M} p(S_m \mid X_m, \theta_m)\, p(X_m \mid X_0, T_m)\, p(T_m),$$
where θ_m represents additional parameters that may be involved. Note that there is no direct statistical relationship between S_m and (X_0, T_m), nor between X_0 and {T_m}. θ_m and X_0 are simply assumed to be uniformly distributed. We discuss the likelihood and prior terms in the following sections; they correspond to the representation and transformation models in shape registration.
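For concreteness, the generative model (1) can be sampled as in the following sketch, which follows the example rule given above where (u_mj, v_mj) are taken as the two ends of a segment; the names and the example shape are illustrative.

```python
import numpy as np

def sample_shape(segments, points_per_segment, sigma_s, rng):
    """Sample a point set from the generative model S = U.L.(segments) + noise.

    segments: (L, 2, 2) array; segments[j] holds the two end points of the
    j-th segment. Points are drawn uniformly along each segment (U, after
    the connecting rule L) and perturbed by isotropic Gaussian noise.
    """
    points = []
    for u, v in segments:
        t = rng.uniform(size=points_per_segment)[:, None]  # uniform sampling U
        points.append(u + t * (v - u))                     # rule L: connect ends
    points = np.concatenate(points)
    return points + rng.normal(scale=sigma_s, size=points.shape)

rng = np.random.default_rng(0)
square = np.array([[[0, 0], [1, 0]], [[1, 0], [1, 1]],
                   [[1, 1], [0, 1]], [[0, 1], [0, 0]]], dtype=float)
S = sample_shape(square, 25, 0.01, rng)
```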
2.1 Likelihood
It is natural to use a mixture model as the likelihood, since we represent the data by a set of segments. Each mixture component is devoted to one segment, with parameters (u_mj, v_mj, σ_s). Recalling that there are N_m points in S_m, we have:

$$p(S_m \mid X_m, \theta_m) = \prod_{i=1}^{N_m} p(s_{mi} \mid X_m, \theta_m) = \prod_{i=1}^{N_m} \sum_{j=1}^{L} \pi_{mj}\, p(s_{mi} \mid u_{mj}, v_{mj}, \sigma_s) \quad (3)$$

where π_mj is the component coefficient, with π_mj ≥ 0 and $\sum_{j=1}^{L} \pi_{mj} = 1$. According to the generative model (1), p(s_mi | u_mj, v_mj, σ_s) could be further decomposed into a Gaussian and a uniform distribution between the two ends of the line segment. However, such a model is not easy to manipulate because of the uniform distribution involved. In this paper, we approximate it with a 2D Gaussian by imposing two constraints on the eigenvalues of its covariance matrix. Denoting the two eigenvalues λ_1 and λ_2 with λ_1 ≥ λ_2, the constraints are: (1) λ_2 = σ_s², which means the width of the segment is equal to σ_s; (2) $2\rho\sqrt{\lambda_1} = l$, where l is the length of the line segment, and $\rho = \sqrt{2}$ in this paper. Because we prefer to use (u_mj, v_mj, σ_s) as the parameters of the likelihood model, and (u_mj, v_mj) should be coordinates of points, we need to establish the relationship between (u_mj, v_mj, σ_s) and the Gaussian's parameters (μ, Σ). It is straightforward to set u_mj = μ, the center of the Gaussian. As for v_mj, referring to Fig. 1, we have:
$$\Sigma_{mj} = f(v_{mj}, \sigma_s) = R^T \Lambda R, \quad \text{where} \quad R = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}, \quad \Lambda = \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma_s^2 \end{pmatrix},$$
$$\theta = \arctan\frac{y}{x}, \quad \sigma^2 = \frac{1}{\alpha}(x^2 + y^2), \quad [x, y]^T = v_{mj} - u_{mj},$$
which means v_mj is a point on the major axis of the 2D Gaussian's equalprob contours (ellipses), at a certain distance from u_mj. The distance is set to ασ², where σ² is the maximal eigenvalue of Σ_mj, and α ∈ (0, 1).
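The construction of Σ_mj from (u_mj, v_mj, σ_s) can be written out directly; the following sketch implements the rotation and eigenvalue assembly above, with the minor eigenvalue σ_s² as in the constraint of this section (arctan2 replaces arctan(y/x) only for numerical robustness).

```python
import numpy as np

def segment_covariance(u, v, sigma_s, alpha=0.5):
    """Constrained 2D covariance Sigma = R^T Lambda R for one segment.

    u: segment (Gaussian) center; v: point on the major axis;
    the major eigenvalue is sigma^2 = (x^2 + y^2) / alpha with
    [x, y] = v - u, and the minor eigenvalue is sigma_s^2.
    """
    x, y = v - u
    theta = np.arctan2(y, x)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, s], [-s, c]])
    Lam = np.diag([(x**2 + y**2) / alpha, sigma_s**2])
    return R.T @ Lam @ R

Sigma = segment_covariance(np.array([0.0, 0.0]), np.array([1.0, 0.5]), 0.05)
```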
Fig. 1. Using a 2D Gaussian to model a segment of a shape. Left: green dots represent the shape points, modeled by the blue line segment. The ellipse is one of the Gaussian's equalprob contours, and (u, v) is located on its major axis. Right: using a set of Gaussians to model the shape of a hand.
As v and v′ in Fig. 1 show, there are two points along the major axis of Σ_mj that satisfy (4), one on either side of u_mj. The selection of v_mj is discussed in section 3.1. We also impose a constraint on v_mj: the ratio of σ to σ_s should not exceed a certain value β, β > 1. So finally, in (3), p(s_mi | X_m, θ_m) is a constrained GMM, where p(s_mi | u_mj, v_mj, σ_s) is a constrained Gaussian:

$$p(s_{mi} \mid u_{mj}, v_{mj}, \sigma_s) = \frac{1}{2\pi|\Sigma_{mj}|^{1/2}} \exp\Big(-\frac{1}{2}(s_{mi} - u_{mj})^T \Sigma_{mj}^{-1} (s_{mi} - u_{mj})\Big)$$
$$\text{s.t.} \quad \min(\mathrm{eig}(\Sigma_{mj})) = \sigma_s, \quad \Sigma_{mj} = f(v_{mj}, \sigma_s), \quad \text{and} \quad \sigma_s^2 \le \frac{1}{\alpha}\|v_{mj} - u_{mj}\|^2 \le \beta \cdot \sigma_s^2. \quad (4)$$

2.2 Priors
According to the generative model (1), p(X_m | T_m, X_0) is a Gaussian distribution:

$$p(X_m \mid T_m, X_0) = \frac{1}{2\pi\sigma_t} \exp\Big(-\frac{1}{2\sigma_t}\|X_m - T_m(X_0)\|^2\Big). \quad (5)$$

As for the transformations {T_m}, we assume them to be smooth, and we do not want any penalty on rotation, translation, or global shear, because we consider these not to be intrinsic variations of a shape. In this case the bending energy of the Thin Plate Spline (TPS), E_tps, can be utilized to construct the prior, as successfully applied in many previous works [6,9]; the density function of T_m can then be defined as $p(T_m) = \frac{1}{Z_t}\exp(-\lambda_t E_{tps}(T_m))$. The total prior density function can be written as

$$p(X_m, T_m, X_0) = \frac{1}{Z} \exp\Big(-\gamma\big(\|X_m - T_m(X_0)\|^2 + \lambda E_{tps}(T_m)\big)\Big), \quad (6)$$
where the coefficients λ and γ are hyperparameters, and Z is a partition constant that guarantees p(X_m, X_0, T_m) is a density function.
3 Optimization
The maximum a posteriori (MAP) problem is now formulated as:

$$(\{\hat{\theta}_m\}, \{\hat{X}_m\}, \{\hat{T}_m\}, \hat{X}_0) = \arg\max \prod_{m=1}^{M} \prod_{i=1}^{N_m} \sum_{j=1}^{L} \pi_{mj}\, p_{mj}(s_{mi} \mid X_m, \sigma_s)\, p(X_m, T_m, X_0). \quad (7)$$
All the parameters, in both the likelihood and prior terms, are optimized together; in other words, the parameters of the representation and transformation models are optimized simultaneously. Since a Gaussian mixture is involved, an EM algorithm is adopted for this task, in which closed-form solutions exist for π_mj, {u_mj}, X_0 and T_m, where T_m is a regularized TPS [10]. There is no analytic solution for {v_mj}, so they are optimized by a gradient-based method in each maximization iteration.

3.1 Implementation
The proposed representation and transformation models each have a parameter that naturally controls how fine the models are, so we embed the EM iterations in a coarse-to-fine scheme to achieve a more stable optimization. For the representation model, this parameter is σ_s, which represents the noise level of the shape data that the constrained GMM can cope with; for the deformation model it is λ, which controls how non-rigid T_m is. The implementation in this paper starts with large σ_s^max and λ^max, then gradually decreases them by certain ratios r_σ and r_λ respectively, until σ_s^(k) is less than σ_s^min. For each σ_s^(k), the models are updated by the embedded EM algorithm, taking the result of the previous step as the initial condition. The initial {T_m^(0)} are set to be identity transforms plus a translation between each shape centroid and their mean. This assumes that at the coarsest stage the representations of all shapes are the same except for a translation. The initial {u_mj^(0)} are set to be evenly placed on a small circle around the shape centroid. Besides, we set the initial covariance matrix of each Gaussian mixture component to be isotropic; the equalprob contour of the Gaussian then becomes a circle, as does the locus of possible positions of v_mj^(0). We pick the rightmost point as v_mj^(0) for all the mixture components. After the first update, v_mj^(k) will always be the one resulting from the EM algorithm and no selection is needed. The initial X_0 is the mean of {X_m^(0)}. In all the experiments presented in this paper, λ^max = N_m × M × σ_s^max, σ_s^max is set to the square of the size of the shape, and the final σ_s^min is set to the mean squared distance between neighboring points. The ratios r_σ and r_λ are both set to 0.95. γ is difficult to decide, and in this paper it is set to 1/σ_s^(k). Of the other parameters, α is not sensitive and α = 0.5; β decides how long the segments will be, and is the only parameter we adjust according to the data. A sketch of this annealing loop is given below.
4 Experimental Results

4.1 Register 2D Shapes
We first demonstrate the registration results of SORT on 2D shapes with simple topologies. Typical landmark-sliding algorithms can also work on these shapes, so a comparison can be conducted. SORT is compared with two of them: the MDL-based algorithm implemented by Thodberg [8], and arc-length parameterization. In all experiments, Thodberg's algorithm runs with 8 active nodes optimized over 40 passes, using 16, 32, or 64 landmarks (SORT works with 8, 16 or 32 segments accordingly). In addition, a popular point cloud matching algorithm, the Super Clustering-Matching (SCM) algorithm proposed by Chui [7], is also included for comparison. All algorithms are implemented in MATLAB 7.1 and run on a 1.66 GHz Intel Centrino CPU. Because our purpose for shape registration is to build shape models, we employ the model generalization error [3] as the evaluation criterion. In particular, the error adopted here is the fitting error between the original shape contour and the contour generated by the resultant model, which is more meaningful in practice than the distance error between landmark points. To evaluate the results of SORT by this criterion, we take X_m = {(u_mj, v_mj), j = 1,...,L} as the "landmarks", so the number of landmarks is twice the number of segments. The shape data in these experiments include 9 synthetic bump boxes, 9 shapes of femur, and 9 silhouette profiles. All of them are obtained from Thodberg's package [8] and are widely tested in the literature. Fig. 2 shows the plain registration result for the 9 bump boxes and the variation of the resulting shape model. The result is produced by SORT using 8 segments. The equalprob ellipse of each segment is drawn on the shape (denoted by dense green dots), and corresponding ellipses on different shapes have the same colors. The black and red dots on each segment denote {u_mj} and {v_mj} respectively. The variation of the resulting model shows that SORT has correctly captured the variation of the bump boxes. The registration process takes 37 seconds. The mean and standard deviation of the shape generalization error for the bump boxes are shown in Fig. 3. They are compared with Thodberg's algorithm, the
Fig. 2. Registration of 9 bump boxes, using 8 segments. Left: plain registration result; Right: the first three modes of variation.
Fig. 3. Shape generalization errors on bump boxes. First row: from left to right, SORT uses 8, 16, and 32 segments, and the other 3 algorithms use 16, 32, and 64 landmarks accordingly. Second row: generalization error with noise of scale 0.1, 0.2, and 0.4 (from left to right). SORT uses 16 segments while the other algorithms use 32 landmarks.
arc-length parameterization, and Chui's SCM, denoted "Thod", "ARC" and "SCM" respectively in the figure. It is observed that SORT has a considerable advantage when a small number of segments is used for registration. Although SCM can find optimal transformations, its performance is not as good in terms of the shape generalization error. This is because SCM is not designed to obtain an optimized representation of the original shapes, which is also true for most point cloud matching algorithms. Fig. 3 also shows the shape generalization error of registration under different scales of Gaussian noise. The scale is defined as the ratio between the noise magnitude and the mean distance between neighboring points of the shape. It is observed that the performance of SORT is clearly less affected by data noise. Similar results are also observed on the femurs and silhouette profiles. To demonstrate SORT's capability of registering shapes with complicated topologies, which is difficult for landmark-sliding methods, a registration is conducted for 9 heart shapes, each consisting of two chambers (multi-part shapes); the results are given in Fig. 4. The registration process takes 173 seconds.

4.2 Register 3D Shapes
In 3D space, the equalprob surfaces of the Gaussian are ellipsoids, and the kernel for the regularized TPS becomes $\phi_i(x) = \|x - X_{0i}\|$. Each segment is parameterized by 3 points (u, v, w), where u is still the center of the Gaussian, while (v, w) are located on the first and second major axes of its equalprob ellipsoid.
Fig. 4. Registration of 9 heart chamber shapes. SORT uses 26 segments. Left: a shape consisting of 2 chambers extracted from an echocardiogram image; Right: the first 3 modes of shape variation.
Fig. 5. Registration of 9 3D bumps. Left: corresponding equalprob surfaces of segments on 4 bump shapes; Right: the shape variation (the surface shown here is interpolated from the ends of the first two major axes of all the segments).
The constraints for the Gaussian are similar to the 2D case in section 2.1: $\min(\mathrm{eig}(\Sigma_{mj})) = \sigma_s$, $\sigma_s^2 \le \frac{1}{\alpha}\|v_{mj} - u_{mj}\|^2 \le \beta \cdot \sigma_s^2$, $\sigma_s^2 \le \frac{1}{\alpha}\|w_{mj} - u_{mj}\|^2 \le \beta \cdot \sigma_s^2$, resulting in a flattened equalprob ellipsoid, analogous to an elongated ellipse in the 2D case. Attention is needed, however, at the ends of open shapes, i.e., the boundaries of open surfaces. In our experiments, we handle this by registering each boundary separately as a 3D curve, whose results are then merged with the surface registration results. Fig. 5 demonstrates the registration results on a group of synthesized 3D bumps. 12 segments are used for each surface, and 30 segments for each surface boundary. As seen, SORT captures the single mode of shape variation in this dataset very well.
5 Summary and Future Work
In this paper, shape registration is formulated as a Bayesian inference problem with a constrained GMM coupled with a regularized-TPS-based prior. This problem is solved by an algorithm called SORT, which is essentially an EM algorithm embedded in a coarse-to-fine scheme. Extensive experimental results demonstrate that SORT has a number of advantages compared with the popular algorithms
based on landmark-sliding. For future work, we plan to study ways to improve the optimization, such as split-and-merge EM.
References

1. Dryden, I.L., Mardia, K.V.: Statistical Shape Analysis. John Wiley and Sons, West Sussex (1998)
2. Cootes, T.F., Taylor, C., Cooper, D., Graham, J.: Active shape models – their training and application. Comput. Vis. Image Underst. 61, 38–59 (1995)
3. Davies, R.: Learning Shape: Optimal Models for Analysing Shape Variability. PhD thesis, University of Manchester (2002)
4. Meier, D., Fisher, E.: Parameter space warping: Shape-based correspondence between morphologically different objects. IEEE Trans. Med. Imaging 21(1), 31–47 (2002)
5. Huang, X., Paragios, N., Metaxas, D.N.: Shape registration in implicit spaces using information theory and free form deformations. IEEE Trans. Patt. Anal. Mach. Intell. 28(8), 1303–1318 (2006)
6. Bookstein, F.: Landmark methods for forms without landmarks: morphometrics of group differences in outline shape. Med. Image Anal. 1, 225–243 (1997)
7. Chui, H., Zhang, J., Rangarajan, A.: Unsupervised learning of an atlas from unlabeled point-sets. IEEE Trans. Patt. Anal. Mach. Intell. 26, 160–173 (2004)
8. Thodberg, H.H.: Minimum description length shape and appearance models. In: Proc. IPMI, BMVA, pp. 51–62 (2003)
9. Richardson, T., Wang, S.: Shape correspondence using landmark sliding, insertion and deletion. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 435–442. Springer, Heidelberg (2005)
10. Wahba, G.: Spline Models for Observational Data. Society for Industrial and Applied Mathematics, Philadelphia (1990)
Landmark Correspondence Optimization for Coupled Surfaces

Lin Shi1,2, Defeng Wang1,2, Pheng Ann Heng1,2, Tien-Tsin Wong1,2, Winnie C.W. Chu3, Benson H.Y. Yeung4, and Jack C.Y. Cheng4

1 Department of Computer Science and Engineering
{lshi,dfwang,pheng,ttwong}@cse.cuhk.edu.hk
2 Shun Hing Institute of Advanced Engineering
3 Department of Diagnostic Radiology and Organ Imaging
[email protected]
4 Department of Orthopaedics and Traumatology,
The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
{byeung,jackcheng}@cuhk.edu.hk
Abstract. Volumetric layers are often encountered in medical images. Unlike solid structures, volumetric layers are characterized by double, nested bounding surfaces. Better statistical models can be expected if the coupleness of the surfaces is exploited, rather than simply applying a landmarking method to each of them separately. We propose an approach to optimizing the landmark correspondence on coupled surfaces by minimizing a description length that incorporates the local thickness gradient. Evaluations are performed on a set of 2-D synthetic closed coupled contours and a set of real-world open surfaces, the skull vaults. Compared with performing landmarking separately on the coupled surfaces, the proposed method constructs models that have better generalization ability and specificity.
1 Introduction
Statistical shape analysis is attracting increasing research interest and effort because of its wide application in model-based image segmentation and the detection of pathological changes. Landmark-based shape analysis methods, such as the active shape model, require labeling landmarks with anatomical equivalence. Although manual landmarking can generate acceptable results in 2-D, it is subjective, error-prone, and time-consuming, which limits its application in 3-D. Bookstein [1] proposed to optimize the positions of corresponding points by minimizing the bending energy between landmarks on two shapes while the landmarks slide along the shape boundary. The landmark correspondence problem can actually be solved in a principled way by interpreting it as an optimization problem. Kotcheff et al. [2] proposed to minimize the determinant of the covariance matrix, while Davies et al. [3] designed an objective function based on the minimum description length (MDL) principle, which assumes that simple descriptions generalize best. Different from landmarking methods that operate
on an individual basis, MDL determines the landmark positions by minimizing the description length of the information needed to transmit the training set, so that a compact description across the whole set can be derived with the desired properties. Compared with manual labeling, SPHARM, and DetCov, MDL performs best, as it results in specific, generalized, and compact models [4]. Ericsson et al. [5] used a gradient descent strategy to improve the convergence speed of MDL, which makes MDL more practical in medical applications. Although MDL is recognized as the "optimal" method for landmark correspondence optimization, applying the MDL principle flexibly and creatively, rather than following a particular existing algorithm, can achieve better results. Our insight is that the shape properties should be well understood and exploited in designing the landmarking algorithm. For example, to find landmarks on shapes with meaningful curvature changes, curvature information should be considered in the optimization [6]. Richardson et al. [7] deal with the landmarking problem for 2-D open curves by introducing a novel tailor-made method, which achieves better performance than generic MDL.
Fig. 1. Examples of coupled-surface structures: (a) the skull; (b) the skull vault, which is the part above the red frame indicated in (a); (c) the cerebral cortex
Volumetric layers are a kind of shape commonly encountered in medical images, such as the skin of internal organs, the myocardium of the left ventricle, and the cerebral cortex. An open coupled-surface structure, the skull vault, and a closed coupled-surface structure, the cerebral cortex, are illustrated in Fig. 1. Because they contain double 3-D boundaries, automatic and accurate landmarking is of great importance in analyzing their shapes. However, existing landmarking techniques, including MDL, are designed for single-surface objects. The local thickness is, in fact, hidden information that reflects the correlation of the two surfaces and facilitates human perception of such coupled-surface shapes. It is thus reasonable to assign landmarks to locations with consistent thickness changes. In this paper, we demonstrate how the coupleness information can be properly incorporated into the description length to solve the automatic landmarking problem for coupled-surface shapes.
2 Automatic Model Building for Coupled Surfaces

2.1 Landmark Initialization
The training surfaces are parameterized for convenient landmark manipulation. It is desirable that, when neighbouring parameterized landmarks are adjusted in the same direction, their corresponding points on the training shape move consistently. Thus the conformal mapping, which preserves local angles, is preferred.
Fig. 2. Conformal mapping of the open and closed surfaces, and landmark initialization: (a) the outer skull vault surface, an open surface; (b) the conformal mapping of (a) to a unit disk; (c) uniform disk subdivision; (d) the GM/CSF interface, a closed surface; (e) the conformal mapping of (d) to a unit sphere; (f) uniform sphere subdivision
Mapping an open surface to a unit disk is achieved by minimizing the string energy of the mesh

$$E(W, \Omega) = \sum_{[v_1, v_2] \in E} w_{v_1,v_2} \|\Omega(v_1) - \Omega(v_2)\|^2, \quad (1)$$

where Ω(v) is the map of vertex v. The weight is determined via $w_{v_1,v_2} = (\tan\frac{\alpha}{2} + \tan\frac{\beta}{2})/\mathrm{dist}(v_1, v_2)$, where α and β are the adjacent angles in the two triangles sharing the edge [v_1, v_2], and dist(v_1, v_2) is the Euclidean distance between v_1 and v_2. Fig. 2 (a) and (b) show an outer skull vault mesh and its map on a unit disk. To map a closed mesh to a unit sphere, the string energy in equation (1) is still the objective to be minimized, but the weight is defined as $w_{v_1,v_2} = \frac{1}{2}(\cot\mu + \cot\nu)$, where μ and ν are the opposite angles in the two triangles with
the common edge [v_1, v_2]. A closed surface, the brain GM/CSF interface, and its map on a unit sphere are shown in Fig. 2 (d) and (e) respectively. After the mapping has been determined, we sample uniformly in the parameter domain and map the sample points back to the surface as the initial landmarks. The planar disk is subdivided recursively into small triangles, e.g., Fig. 2(c), and the vertices of those triangles are the sample points. Subdividing the sphere likewise leads to a uniform sampling, as Fig. 2(f) shows.
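The string energy (1) and the two edge-weight choices can be sketched as follows; the mesh data structures (mapped vertex coordinates and an edge list with precomputed angles) are assumed to be available.

```python
import numpy as np

def string_energy(vertex_map, edges, weights):
    """E(W, Omega) = sum over edges of w * ||Omega(v1) - Omega(v2)||^2.

    vertex_map: (n_vertices, d) mapped coordinates Omega(v);
    edges: (n_edges, 2) integer vertex indices; weights: (n_edges,).
    """
    diff = vertex_map[edges[:, 0]] - vertex_map[edges[:, 1]]
    return np.sum(weights * np.sum(diff**2, axis=1))

def open_surface_weight(alpha, beta, dist):
    """Disk case: (tan(alpha/2) + tan(beta/2)) / dist(v1, v2), with alpha,
    beta the adjacent angles in the two triangles sharing the edge."""
    return (np.tan(alpha / 2) + np.tan(beta / 2)) / dist

def closed_surface_weight(mu, nu):
    """Sphere case: (cot(mu) + cot(nu)) / 2, with mu, nu the angles
    opposite the shared edge."""
    return 0.5 * (1.0 / np.tan(mu) + 1.0 / np.tan(nu))
```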
2.2 Landmark Correspondence Optimization Using MDL
Thickness Definition of Volumetric Layer. The thickness of a volumetric layer at a point on a bounding surface is the distance from that point to the opposite surface. Several definitions of layer thickness exist, such as the closest thickness (T_close) and the normal thickness (T_normal) [8]. T_close is the distance from a point on one surface to the closest point on the other. T_normal is the distance from a point on one surface to the point on the other in the direction of the surface normal. Finding a generic measure that performs reasonably on every type of layer is impractical. The layer thickness in this study is determined as the distance between each pair of corresponding points on the two surfaces with the same polar coordinate. This measure is named the radial thickness (T_radial). We illustrate the measures T_close, T_normal, and T_radial on an axial plane of the skull boundary (see Fig. 3). Unlike T_close and T_normal, which depend on the starting surface, T_radial is unique, and landmarks are grouped in pairs through this measurement.

Description Length Minimization for Coupled-Surface Structures. MDL is recognized as the "optimal" method for generating corresponding landmarks, since it is based on the philosophy that the simplest description generalizes best. Our point is that landmarks can have properties other than spatial locations, and these properties can also be considered in minimizing the description length. Therefore, in our method, the coupled surfaces are treated as a master
Fig. 3. Different thickness definitions: (a) the coupled surfaces; (b) the closest thickness measure; (c) the normal thickness measure; (d) the proposed radial thickness measure
surface and a supplementary surface. The information at each landmark on the master surface consists of both the spatial position and the thickness gradient at that landmark, i.e., [x, y, z, ξt], where ξ is a parameter controlling the importance of the thickness gradient t. The landmarks on the inner surface are obtained naturally through the thickness measurement. For the skull vault, we take the outer surface as the master surface because it is more dominant in determining the shape of the volumetric layer. In cases where the inner surface is more important, the master surface can be switched to the inner surface. The shape of the supplementary surface is not discarded, as it is embedded in the thickness information. Once a landmark position is adjusted, the thickness at that landmark is recomputed. We adopt a simplified version of the description length [6],

$$F = \sum_m L_m \quad \text{with} \quad L_m = \begin{cases} 1 + \log(\lambda_m/\lambda_{cut}) & \text{if } \lambda_m \ge \lambda_{cut} \\ \lambda_m/\lambda_{cut} & \text{if } \lambda_m < \lambda_{cut}. \end{cases} \quad (2)$$

Note that the λ_m are the eigenvalues derived from the landmarks on the master surface. λ_cut can be determined by λ_cut = (σ/r)², where σ is the standard deviation of the noise in the training data and r depends on the resolution of the images from which the training shapes are extracted. The landmark positions are adjusted by locally warping the parameterization inside Gaussian kernel regions; the magnitude of the adjustment is proportional to the distance to each kernel center. The optimization is implemented with the gradient descent strategy. Suppose matrix L contains the landmarks of the training shapes as columns, k is the number of landmarks in each shape, and s is the number of shapes in the training set. Since each vertex on the mesh carries both a spatial position and the thickness gradient value at that point, the dimension of the matrix L is 4k × s. Let $A = \frac{1}{\sqrt{s-1}}(L - \bar{L})$, where $\bar{L}$ is the matrix with all columns set to the mean shape $\bar{x}$. Using the singular value decomposition (SVD), the matrix A can be written as $A = UDV^T$, where U and V are column-orthogonal matrices and D is a diagonal matrix. Since the meshes analyzed in this study are 2-manifolds, two variables (θ, φ) are involved in the disk or sphere parameter domain. Taking the nth landmark of the jth sample for instance, the landmark movement (Δθ, Δφ) is

$$\Delta\theta = \frac{\partial F}{\partial \theta_{nj}} = \sum_m \sum_{i=4n-3}^{4n} \frac{\partial L_m}{\partial a_{ij}} \cdot \frac{\partial a_{ij}}{\partial \theta_{nj}} \quad (3)$$

$$\Delta\phi = \frac{\partial F}{\partial \phi_{nj}} = \sum_m \sum_{i=4n-3}^{4n} \frac{\partial L_m}{\partial a_{ij}} \cdot \frac{\partial a_{ij}}{\partial \phi_{nj}}, \quad (4)$$

where

$$\frac{\partial L_m}{\partial a_{ij}} = \begin{cases} 2\, u_{im} v_{jm} / d_m & \text{if } \lambda_m \ge \lambda_{cut} \\ 2\, d_m u_{im} v_{jm} / \lambda_{cut} & \text{if } \lambda_m < \lambda_{cut}. \end{cases} \quad (5)$$

u_im and v_jm are the elements of the matrices U and V respectively. d_m is the corresponding element of the diagonal matrix D and is equal to $\sqrt{\lambda_m}$. The surface gradients $(\partial a_{ij}/\partial \theta_{nj}, \partial a_{ij}/\partial \phi_{nj})$ are estimated by finite differences.
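A compact sketch of the cost (2) and the gradient term (5) is given below, assuming the 4k × s landmark matrix (positions plus scaled thickness gradients) has already been assembled; the small floor on the eigenvalues is only a numerical guard.

```python
import numpy as np

def description_length(L_mat, lam_cut):
    """Simplified description length F of equation (2).

    L_mat: (4k, s) matrix with one training shape per column.
    """
    s = L_mat.shape[1]
    A = (L_mat - L_mat.mean(axis=1, keepdims=True)) / np.sqrt(s - 1)
    d = np.linalg.svd(A, compute_uv=False)
    lam = np.maximum(d**2, 1e-12)               # model eigenvalues
    return np.sum(np.where(lam >= lam_cut,
                           1.0 + np.log(lam / lam_cut),
                           lam / lam_cut))

def dL_da(U, d, V, lam_cut):
    """dL_m/da_ij of equation (5) for every (i, j, m), from A = U D V^T."""
    lam = d**2
    scale = np.where(lam >= lam_cut, 2.0 / d, 2.0 * d / lam_cut)
    # result[i, j, m] = scale[m] * u_im * v_jm
    return np.einsum('m,im,jm->ijm', scale, U, V)
```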
3 Experimental Evaluation and Comparison
The quality of the landmark correspondence is evaluated by the performance of the resulting model. Given a number of modes M, the generalization ability is the ability to describe objects not included in the training set, since a good shape model should not overfit the training samples. Its error G(M) is calculated as the averaged leave-one-out error. The specificity reflects whether a model generates only samples similar to the training shapes. Its error S(M) can be estimated by the averaged distance between samples newly generated by the model and the closest training shape. In our experiment, 10,000 test samples are generated. The parameter σ is set to 0.3, r is 100, and ξ is 1.0. A sketch of the generalization measure is given below.
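This is a minimal sketch of G(M), assuming the landmarked shapes are stored one per row and a shape-distance function is supplied; S(M) is analogous, with samples drawn from the model and compared against the closest training shape.

```python
import numpy as np

def generalization_error(shapes, M, fit_error):
    """Leave-one-out generalization error G(M).

    For each shape, rebuild the model without it, project the left-out
    shape onto the first M modes, and average the reconstruction error.
    fit_error(a, b) is the chosen shape distance; M <= n_shapes - 2.
    """
    errs = []
    for i in range(len(shapes)):
        train = np.delete(shapes, i, axis=0)
        mean = train.mean(axis=0)
        _, _, modes = np.linalg.svd(train - mean, full_matrices=False)
        b = (shapes[i] - mean) @ modes[:M].T    # coefficients in M modes
        recon = mean + b @ modes[:M]
        errs.append(fit_error(shapes[i], recon))
    return np.mean(errs)
```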
3.1 Results on the Synthetic Dataset
A set of 50 samples of a simple 2-D shape with varying thickness at different positions is generated. Fig. 4 shows the landmarking results of the proposed method (MDL-thickness) and of MDL performed on the two boundaries separately (MDL-separate) on three of them. For each training shape, the number of landmarks on either the inner or the outer surface is set to 14. The results show that landmarks obtained by MDL-thickness come in pairs and are located in the regions where the thickness values reach their extrema. In contrast, landmarks obtained by MDL-separate are equally spaced on the outer contour, while on the inner contour they are located at positions with small spatial group changes. The MDL considering local curvature [6] can place landmarks onto the "peaks" and "valleys" of the inner contour because large curvatures are
Fig. 4. Landmarking results on three training shapes in the synthetic dataset: (a) result of MDL-separate; (b) result of MDL-thickness
detected there. Since the outer contour does not have any curvature change, the result on the outer contour would be the same as that from MDL-separate. Generalization errors and specificity errors of MDL-separate and MDL-thickness using different numbers of modes are plotted in Fig. 5 (a) and (b) respectively. It can be observed that both G(M) and S(M) of MDL-thickness are smaller than those of MDL-separate for all numbers of modes.

3.2 Evaluation on the Real Skull Vault Dataset
The skull volumes of 18 subjects were segmented from head CT data collected at the Prince of Wales Hospital, Hong Kong. The field of view of the CT data is 512 × 512 and the voxel size is 0.49 mm × 0.49 mm × 0.63 mm. The skull vault is the upper part of the skull and is an open coupled-surface structure. A total of 578 corresponding landmarks are determined using MDL-thickness and MDL-separate respectively. We plot the quality measures G(M) and S(M) of the models built with MDL-separate and MDL-thickness for different numbers of modes in Fig. 5 (c) and (d). It can be observed that both G(M) and S(M) of the model built with MDL-thickness are smaller than those of the model built with MDL-separate.
Fig. 5. The generalization error and specificity error of MDL-separate and MDL-thickness on the synthetic dataset and the real skull vault dataset
4 Conclusion
This paper describes a generic automatic landmarking method for structures with coupled surfaces based on minimizing the description length. In this method, the local thickness gradient is treated as an extra property of each landmark, so that positions with group-wise consistent thickness changes are implicitly favored. Once a landmark on one surface is determined, its counterpart on the other surface can be found directly. The optimization converges quickly, as the gradient descent method is used. The quality of the models constructed with the proposed method is evaluated and compared with that of models obtained by treating the coupled surfaces as independent. The evaluation results show the advantage of considering thickness information when landmarking volumetric layers.
Acknowledgement The work described in this paper was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region (Project no. CUHK4453/06M) and CUHK Shun Hing Institute of Advanced Engineering. This work is also affiliated with the Virtual Reality, Visualization and Imaging Research Center at The Chinese University of Hong Kong as well as the Microsoft-CUHK Joint Laboratory for Human-Centric Computing and Interface Technologies.
References

1. Bookstein, F.L.: Landmark methods for forms without landmarks: morphometrics of group differences in outline shape. Medical Image Analysis 1(3), 225–244 (1997)
2. Kotcheff, A., Taylor, C.: Automatic construction of eigenshape models by direct optimization. Medical Image Analysis 2(4), 303–314 (1998)
3. Davies, R., Twining, C., Cootes, T., et al.: A minimum description length approach to statistical shape modelling. IEEE Trans. Med. Imaging 21, 525–537 (2002)
4. Styner, M., Rajamani, K., et al.: Evaluation of 3D correspondence methods for model building. In: Taylor, C.J., Noble, J.A. (eds.) IPMI 2003. LNCS, vol. 2732, pp. 63–75. Springer, Heidelberg (2003)
5. Ericsson, A., Åström, K.: Minimizing the description length using steepest descent. In: British Machine Vision Conference, Norwich, pp. 93–102 (2003)
6. Thodberg, H.H., Olafsdottir, H.: Adding curvature to minimum description length shape models. In: British Machine Vision Conference, Norwich, pp. 251–260 (2003)
7. Richardson, T., Wang, S.: Open-curve shape correspondence without endpoint correspondence. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 17–24. Springer, Heidelberg (2006)
8. MacDonald, D., Kabani, N., et al.: Automated 3-D extraction of inner and outer surfaces of cerebral cortex from MRI. NeuroImage 12(3), 340–356 (2000)
Mean Template for Tensor-Based Morphometry Using Deformation Tensors

Natasha Leporé1, Caroline Brun1, Xavier Pennec2, Yi-Yu Chou1, Oscar L. Lopez3, Howard J. Aizenstein4, James T. Becker4, Arthur W. Toga1, and Paul M. Thompson1

1 Laboratory of Neuro Imaging, UCLA, Los Angeles, CA 90095, USA
2 Asclepios Research Project, INRIA Sophia-Antipolis, 2004 route des Lucioles, 06902 Sophia-Antipolis Cedex, France
3 Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA 15213, USA
4 Department of Neurology, University of Pittsburgh, Pittsburgh, PA 15213, USA
Abstract. Tensor-based morphometry (TBM) studies anatomical differences between brain images statistically, to identify regions that differ between groups, over time, or that correlate with cognitive or clinical measures. Using a nonlinear registration algorithm, all images are mapped to a common space, and statistics are most commonly performed on the Jacobian determinant (local expansion factor) of the deformation fields. In [14], it was shown that the detection sensitivity of the standard TBM approach could be increased by using the full deformation tensors in a multivariate statistical analysis. Here we set out to improve the common space itself, by choosing the shape that minimizes a natural metric on the deformation tensors from that space to the population of control subjects. This method avoids statistical bias and should ease nonlinear registration of new subjects' data to a template that is 'closest' to all subjects' anatomies. As deformation tensors are symmetric positive-definite matrices and do not form a vector space, all computations are performed in the log-Euclidean framework [1]. The control brain B that is already closest to 'average' is found. A gradient descent algorithm is then used to perform the minimization that iteratively deforms this template and obtains the mean shape. We apply our method to map the profile of anatomical differences in a dataset of 26 HIV/AIDS patients and 14 controls, via a log-Euclidean Hotelling's T² test on the deformation tensors. These results are compared to those found using the 'best' control, B. Statistics on both shapes are evaluated using cumulative distribution functions of the p-values in maps of inter-group differences.
1 Introduction
Tensor-based morphometry (TBM) is an increasingly popular method for studying differences in brain anatomy statistically [23], [5], [22]. In TBM, a non-linear registration algorithm is used to align a set of images to a common space, and a statistical analysis is typically performed on the Jacobian determinants (local
expansion factors) of the deformation fields generating the transformation. Most commonly, one of the control subjects' images, or a high-resolution single-subject MRI atlas [10], is selected as the reference to which all the other images are mapped. To avoid biases induced by choosing a single individual as a template, methods for creating an average image using the entire set of controls have also been developed. For instance, in [11], a mean template is defined by transforming one of the control images using the average of the displacement fields resulting from its registration to all other controls. A similar approach was also adopted in [9], where the average was taken with respect to both the deformations and the intensities of the reference images. Other investigators have advocated a more computationally intensive 'targetless' normalization approach, in which all images in a group are matched to each other pairwise, and each image's resulting mean vector field is applied to it before averaging the deformed images across subjects [24], [22], [28], [27], [12]. Groupwise registration is increasingly common, to avoid the systematic confounding effects and bias associated with aligning images to a specific individual brain, which can arise when the geometry and intensities of the target image resemble some members of the population more than others. In [14], the deformation tensors $\sqrt{J^t J}$ were used to perform statistics in TBM, where J is the local Jacobian matrix of the transformation. This method outperformed det J, the most commonly used scalar measure of deformation, for mapping the profile of brain atrophy associated with HIV/AIDS. Specifically, multivariate analysis of the local tensor, using a manifold version of the Hotelling T² test, was much more sensitive to group differences than det J. The determinants represent local volume differences across subjects, while the deformation tensors reflect local differences in shape, orientation, and volume. When statistics are performed on the deformation tensors in TBM, a consistent way to define the average image is as one that minimizes an appropriate norm on the deformation tensors generated using that image as a registration target. For example, when a set of control subjects' images is mapped to a template, it is reasonable to expect the mean deformation tensor to be identically zero everywhere after log transformation, or, if that is not possible, at least to have minimum mean-squared error in a relevant tensor norm. Here we develop an approach to achieve this, using a log-Euclidean metric on the space of tensors; the regularizer then has a form that is consistent with the tensor statistics ultimately used for mapping systematic effects on anatomy. In related work on geodesic shooting [17] and large-deformation diffeomorphic metric mapping (LDDMM) [2], mean templates are defined that minimize the geodesic distance to a population of anatomies. These geodesic distances are Riemannian metrics formulated in terms of integrals of ||Lv||, where L is a self-adjoint (elliptic) differential operator regularizing the deformation, v is its velocity field, and ||.|| is a norm, such as the simple L² norm or the $H^1_\alpha$ norm used in the Camassa-Holm equation for modeling solitons [29]. Lorenzen et al. [15], [16] generated a representative common template from a multimodality image set using large-deformation mappings and registration with the Kullback-Leibler
divergence. Gerig et al. [7] generalized the mean anatomical template estimation to accommodate repeated-measures data, e.g. images collected longitudinally from a pediatric population. In this work, we set out to find a transformation Φ_BA from an initial brain B, selected from a set of control subjects, to an average brain A. The average brain image intensity is defined as I_A = I_B ∘ Φ_BA. B is taken as the reference image, and we seek the transformation of its geometry that minimizes the bias on the deformation tensors:

$$\arg\min_{\Phi_{BA}} E(\Phi_{BA}) \quad (1)$$

where E(Φ_BA) is the total size of the deformation tensors

$$E = \sum_i d(S_i, \mathrm{Id})^2 \quad (2)$$
Here the $S_i$ represent the square of the deformation tensors from image i, d(., .) is the distance, and Id is the identity. In practice, to make calculations easier, we actually compute the inverse transformation, $\Phi_{AB}$. (Note that this formulation could be extended to consider intensity matching as well, as in [15] [16] where the sum of an intensity matching energy and a deformation energy is minimized). The deformation tensors are constrained to be positive-definite matrices, and form a conical submanifold of the space of square matrices. An intrinsic definition of d(., .) is needed for (2). Recently, Arsigny et al. [1] presented a log-Euclidean framework to perform computations in this space. Distances are computed after applying the matrix logarithm transformation, which transports the deformation tensors to the tangent space at the origin, where simple matrix operations can be used. When log transforms are used, even on the scalar Jacobian determinant, several sources of bias are avoided in the resulting statistics (which can lead to skewness and non-zero mean even under the null hypothesis [13]). This method was used in [14] to compute statistics on the deformation tensors in TBM. In the log-Euclidean framework, the distance between two elements of the space $S_1$ and $S_2$ is given by $d(S_1, S_2) = \|\log S_1 - \log S_2\|$, where $\|.\|$ denotes a norm, and log is the matrix logarithm. Here we will use [1]
$$d(S_1, S_2) = \left(\mathrm{Trace}\left((\log S_1 - \log S_2)^2\right)\right)^{1/2} \qquad (3)$$
Taking into account (3), (2) becomes
$$E = \sum_i \int \|\log S_i\|^2 \, d^2x = \sum_i \int \mathrm{Tr}\left((\log S_i)^2\right) d^2x \qquad (4)$$
that is, the size of the Si given a transformation of the chosen image ΦAB . We used a fluid registration algorithm [6] to register the images. The code was accelerated using a convolution filter derived from the Green’s function of
Mean Template for Tensor-Based Morphometry Using Deformation Tensors
829
the differential operator in the fluid equation [3] [8]. ΦAB was then computed using gradient descent. As our initial brain B, we selected the control subject for which (4) was minimal. In the next section, we describe our gradient descent algorithm. Our method is then applied to perform a TBM analysis of the corpus callosum in a group of 26 AIDS patients and 14 matched controls.
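For illustration, the distance (3) can be evaluated directly with a matrix logarithm. The following is a minimal Python sketch (assuming NumPy and SciPy; the function name is illustrative and not part of the implementation described here):

```python
import numpy as np
from scipy.linalg import logm

def log_euclidean_distance(S1, S2):
    """Log-Euclidean distance of Eq. (3):
    d(S1, S2) = (Trace((log S1 - log S2)^2))^(1/2),
    for symmetric positive-definite deformation tensors."""
    D = logm(S1) - logm(S2)          # matrix logs live in a flat (vector) space
    return np.sqrt(np.trace(D @ D))  # Frobenius norm of the difference

# Example: identity tensor vs. an anisotropic stretch
print(log_euclidean_distance(np.eye(3), np.diag([1.5, 1.0, 0.8])))
```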
2 Method
A gradient descent method in the log-Euclidean framework was outlined in [19] and [20] for the log-Euclidean elasticity. Here we use the general philosophy described in those references. However, a major added complication is that our method requires two consecutive registrations, from A to B and from B to i. The transformation $\Phi_{Ai}(r_A)$ from A to image i at point $r_A$ is given as a function of the deformation fields D by [11]
$$\Phi_{Ai}(r_A) = \Phi_{Bi} \circ \Phi_{AB}(r_A) = r_A + D_{AB}(r_A) + D_{Bi}(r_A + D_{AB}(r_A))$$
The value of $S_i$ from A to image i is thus given by
$$S_i(\Phi_{AB}) = \sum_\alpha \partial_\alpha(\Phi_{Bi} \circ \Phi_{AB})\,\partial_\alpha(\Phi_{Bi} \circ \Phi_{AB})^t.$$
Using
$$S_i(\Phi_{AB} + u) = \sum_\alpha \partial_\alpha\big(\Phi_{Bi} \circ (\Phi_{AB} + u)\big)\,\partial_\alpha\big(\Phi_{Bi} \circ (\Phi_{AB} + u)\big)^t$$
$$= \sum_\alpha \partial_\alpha\Big(\Phi_{Bi} \circ \Phi_{AB} + \sum_k u_k (\partial_k \Phi_{Bi}) \circ \Phi_{AB} + \ldots\Big)\,\partial_\alpha\Big(\Phi_{Bi} \circ \Phi_{AB} + \sum_k u_k (\partial_k \Phi_{Bi}) \circ \Phi_{AB} + \ldots\Big)^t,$$
we find the directional derivative of $S_i$ in the direction of the vector field u
$$\partial_u S_i(\Phi_{AB}) = \sum_\alpha \big[\partial_\alpha(\Phi_{Bi} \circ \Phi_{AB})\big]\Big[\partial_\alpha\Big(\sum_k u_k (\partial_k \Phi_{Bi}) \circ \Phi_{AB}\Big)\Big]^t + \Big[\partial_\alpha\Big(\sum_k u_k (\partial_k \Phi_{Bi}) \circ \Phi_{AB}\Big)\Big]\big[\partial_\alpha(\Phi_{Bi} \circ \Phi_{AB})\big]^t$$
The directional derivative of the energy for image i is then:
$$\partial_u \mathrm{Tr}(\log S_i)^2 = 2\,\mathrm{Tr}\big(\log(S_i) S_i^{-1}\,\partial_u S_i\big)$$
$$= 4 \sum_\alpha \mathrm{Tr}\Big(Z\,\big[\partial_\alpha(\Phi_{Bi} \circ \Phi_{AB})\big]\Big[\partial_\alpha\Big(\sum_k u_k (\partial_k \Phi_{Bi}) \circ \Phi_{AB}\Big)\Big]^t\Big) + 4 \sum_\alpha \mathrm{Tr}\Big(Z\,\Big[\partial_\alpha\Big(\sum_k u_k (\partial_k \Phi_{Bi}) \circ \Phi_{AB}\Big)\Big]\big[\partial_\alpha(\Phi_{Bi} \circ \Phi_{AB})\big]^t\Big)$$
where $Z \equiv \log(S_i) S_i^{-1}$. Integrating by parts, we finally obtain
$$\partial_u \mathrm{Tr}(\log S_i)^2 = -4\,\mathrm{Tr}\Big(\Big[\sum_\alpha \partial_\alpha\big(Z\,\partial_\alpha(\Phi_{Bi} \circ \Phi_{AB})\big)\Big]\Big[\sum_k u_k (\partial_k \Phi_{Bi}) \circ \Phi_{AB}\Big]^t\Big) - 4\,\mathrm{Tr}\Big(\Big[\sum_k u_k (\partial_k \Phi_{Bi}) \circ \Phi_{AB}\Big]\Big[\sum_\alpha \partial_\alpha\big(Z\,\partial_\alpha(\Phi_{Bi} \circ \Phi_{AB})\big)\Big]^t\Big)$$
The total derivative term cancels as the image intensity, and thus $\Phi_{AB}$, is zero near enough to the boundary. This can be guaranteed in the general case by padding the image with zeros. Finally, we obtain the gradient of the energy for image i as
$$\nabla E_i = -4\,\Big\langle \sum_\alpha \partial_\alpha\big(Z\,\partial_\alpha(\Phi_{Bi} \circ \Phi_{AB})\big)\ \Big|\ (\nabla \Phi_{Bi}) \circ \Phi_{AB} \Big\rangle. \qquad (5)$$
where $\langle .|. \rangle$ denotes the usual scalar product in $\mathbb{R}^3$.
2.1 Numerical Implementation
As an initial condition for the gradient descent, we moved the chosen template B to the location of the average deformation field from B to all other controls. This definition of a 'vector mean' template has been adopted by others [11], but here we optimize it using a further deformation to yield a template with minimal energy in the multivariate log-Euclidean space. Using a finite difference scheme in the computation of the gradient yields poor results, as a small number of voxels with large gradient values can end up driving the computation, and in such cases most of the image will change very slowly. We remedied this problem using a multi-resolution scheme, in which all derivatives in (5) were computed through convolution with a Gaussian filter, whose variance was reduced at each resolution step. To improve the speed of convergence, the positions were updated after the computation of the descent direction for each i.
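A minimal sketch of this derivative scheme, assuming a NumPy/SciPy environment (the coarse-to-fine kernel widths shown are illustrative, not the values used in the experiments):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smoothed_gradient(field, sigma):
    """Partial derivatives of a 2D scalar field obtained by convolution
    with the derivative of a Gaussian of standard deviation sigma."""
    gx = gaussian_filter(field, sigma, order=(1, 0))
    gy = gaussian_filter(field, sigma, order=(0, 1))
    return gx, gy

# Coarse-to-fine schedule: the kernel width shrinks at each resolution step
field = np.random.rand(64, 64)
for sigma in (4.0, 2.0, 1.0):
    gx, gy = smoothed_gradient(field, sigma)
    # ...compute the descent direction for each image i from these smoothed
    # derivatives, then update the positions before moving to the next scale...
```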
2.2 Data
Twenty-six HIV/AIDS patients (age: 47.2 ± 9.8 years; 25M/1F; CD4+ T-cell count: 299.5 ± 175.7 per μl; log10 viral load: 2.57 ± 1.28 RNA copies per ml of blood plasma) and 14 HIV-seronegative controls (age: 37.6 ± 12.2 years; 8M/6F) underwent 3D T1-weighted MRI scanning; subjects and scans were the same as those analyzed in the cortical thickness study in [25], where more detailed neuropsychiatric data from the subjects is presented. All patients met Center for Disease Control criteria for AIDS, stage C and/or 3 (Center for Disease Control and Prevention, 1992), and none had HIV-associated dementia. All AIDS patients were eligible to participate, but those with a history of recent traumatic brain injury, CNS opportunistic infections, lymphoma, or stroke were excluded. All patients underwent a detailed neurobehavioral assessment within the 4 weeks before their MRI scan, involving a neurological examination, psychosocial interview, and neuropsychological testing, and were designated as having no,
mild, or moderate (coded as 0, 1, and 2 respectively) neuropsychological impairment based on a factor analysis of a broad inventory of motor and cognitive tests performed by a neuropsychologist [25]. All subjects received 3D spoiled gradient echo (SPGR) anatomical brain MRI scans (256×256×124 matrix, TR = 25 ms, TE = 5 ms; 24-cm field of view; 1.5-mm slices, zero gap; flip angle = 40°) as part of a comprehensive neurobehavioral evaluation. The MRI brain scan of each subject was co-registered with a 9-parameter transformation to the ICBM53 average brain template, after removal of extracerebral tissues (e.g., scalp, meninges, brainstem and cerebellum). The corpus callosum of each subject was hand-traced [26], using interactive segmentation software. The traces were treated as binary objects (1 within the CC, 0 outside), as we wished to evaluate anatomical differences in a setting where intensity was held constant (see Lorenzen et al. [15] [16], where a radiometric term based on information theory was included in the template estimation equations, but tensor statistics were not evaluated).
3 Results
The total energy was found to be much lower in the case of the mean template ($E_A = 3.027 \times 10^3$ vs. $E_B = 3.794 \times 10^3$). $T^2$ statistics identifying group differences in our dataset are shown in Fig. 1a. The cumulative distribution function of the p-values is plotted in Fig. 1b against the p-values that would be expected under the null hypothesis, for both templates. For null distributions (i.e. no group difference detected), these are expected to fall along the x = y line, and larger deviations from that curve represent larger effect sizes. Registration to the average brain gives statistics similar to those obtained by registration to one individual; thus we do not sacrifice any of the signal by using our averaging procedure.
Fig. 1. Left: Voxelwise p-values computed from Hotelling's $T^2$ test on the deformation tensors for the average template. The scale shows values of log10(p). Right: Cumulative distribution of p-values vs. the corresponding cumulative p-values that would be expected from a null distribution, for the average shape and the best brain. Pink curve: average brain; blue curve: best individual brain. Dotted line: x = y curve (null distribution).
Furthermore, the average template can be used to remove the potential interaction between registration accuracy and diagnosis that can occur when using an individual brain as a registration target.
4 Conclusion
In this paper, we derive a new way to compute mean anatomical templates by minimizing a distance in the space of deformation tensors. The resulting templates may be used for TBM, in which statistical analyses are performed on the deformation tensors mapping individual brains to the target image [14]. Because the deformation distance to the template is smaller with a tensor-based mean template, there is a greater chance that intensity-based registrations of individual datasets will not settle in non-global minima that are far from the desired correspondence field. In neuroscientific studies, this could be helpful in detecting anatomical differences, for instance in groups of individuals with neurodegenerative diseases, or in designs where the power of a treatment to counteract degeneration is evaluated. Two caveats are necessary regarding the interpretation of these data. First, strictly speaking we do not have ground truth regarding the extent and degree of atrophy or neurodegeneration in HIV/AIDS. So, although an approach that finds greater disease effect sizes is likely to be more accurate than one that fails to detect disease, it would be better to compare these models in a predictive design where ground truth regarding the dependent measure is known (i.e., morphometry predicting cognitive scores or future atrophic change). Second, it may be more appropriate to use the mean shape anatomical template derived here in conjunction with registration algorithms whose cost functions are explicitly based on the log-transformed deformation tensors, such as those found for instance in [4] and [19]. To do this, we are working on a unified registration and statistical analysis framework in which the regularizer, mean template, and voxel-based statistical analysis are all based on the same log-Euclidean metric.
References
1. Arsigny, V., et al.: Log-Euclidean metrics for fast and simple calculus on diffusion tensors. Mag. Res. in Med. 56, 411–421 (2006)
2. Beg, M.F., et al.: Computing large deformation metric mappings via geodesic flow on diffeomorphisms. Int. J. of Comp. Vision 61, 139–157 (2005)
3. Bro-Nielsen, M., Gramkow, C.: Fast fluid registration of medical images. Visualization in Biomedical Computing, 267–276 (1996)
4. Brun, C., et al.: Comparison of Standard and Riemannian Elasticity for Tensor-Based Morphometry in HIV/AIDS. In: MICCAI Workshop on Statistical Registration: Pair-wise and Group-wise Alignment and Atlas Formation (submitted, 2007)
5. Chiang, M.C., et al.: 3D pattern of brain atrophy in HIV/AIDS visualized using tensor-based morphometry. Neuroimage 34, 44–60 (2007)
6. Christensen, G.E., et al.: Deformable templates using large deformation kinematics. IEEE-TIP 5, 1435–1447 (1996)
7. Gerig, G., et al.: Computational anatomy to assess longitudinal trajectory of the brain. In: 3DPVT, pp. 1041–1047 (2006)
Mean Template for Tensor-Based Morphometry Using Deformation Tensors
833
8. Gramkow, C.: Registration of 2D and 3D medical images. Master's thesis, Danish Technical University, Copenhagen, Denmark (1996)
9. Guimond, et al.: Average brain models: a convergence study. Comp. Vis. and Im. Understanding 77, 192–210 (1999)
10. Kochunov, P., et al.: An optimized individual target brain in the Talairach coordinate system. Neuroimage 17, 922–927 (2003)
11. Kochunov, P., et al.: Regional spatial normalization: toward an optimal target. J. Comp. Assist. Tomogr. 25, 805–816 (2001)
12. Kochunov, P., et al.: Mapping structural differences of the corpus callosum in individuals with 18q deletions using targetless regional spatial normalization. Hum. Brain Map. 24, 325–331 (2005)
13. Leow, A.D., et al.: Statistical properties of Jacobian maps and inverse-consistent deformations in non-linear image registration. IEEE-TMI 26, 822–832 (2007)
14. Leporé, N., et al.: Multivariate Statistics of the Jacobian Matrices in Tensor-Based Morphometry and their application to HIV/AIDS. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, Springer, Heidelberg (2006)
15. Lorenzen, P., et al.: Multi-class Posterior Atlas Formation via Unbiased Kullback-Leibler Template Estimation. In: Barillot, C., Haynor, D.R., Hellier, P. (eds.) MICCAI 2004. LNCS, vol. 3216, pp. 95–102. Springer, Heidelberg (2004)
16. Lorenzen, P., et al.: Multi-modal image set registration and atlas formation. Med. Imag. Analysis 10, 440–451 (2006)
17. Miller, M.I.: Computational anatomy: shape, growth and atrophy comparison via diffeomorphisms. Neuroimage 23(Suppl. 1), 19–33 (2004)
18. Nichols, T.E., Holmes, A.P.: Nonparametric permutation tests for functional neuroimaging: a primer with examples. Hum. Brain Map. 15, 1–25 (2001)
19. Pennec, X., et al.: Riemannian elasticity: A statistical regularization framework for non-linear registration. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 943–950. Springer, Heidelberg (2005)
20. Pennec, X.: Left-invariant Riemannian elasticity: a distance on shape diffeomorphisms? In: MFCA, pp. 1–13 (2006)
21. Studholme, C., et al.: Detecting spatially consistent structural differences in Alzheimer's and fronto-temporal dementia using deformation morphometry. In: Niessen, W.J., Viergever, M.A. (eds.) MICCAI 2001. LNCS, vol. 2208, pp. 41–48. Springer, Heidelberg (2001)
22. Studholme, C., Cardenas, V.: A template free approach to volumetric spatial normalization of brain anatomy. Patt. Recogn. Lett. 25, 1191–1202 (2004)
23. Thompson, P.M., et al.: Growth Patterns in the Developing Brain Detected By Using Continuum-Mechanical Tensor Maps. Nature 404, 190–193 (2000)
24. Thompson, P.M., et al.: Mathematical/Computational Challenges in Creating Population-Based Brain Atlases. Hum. Brain Map. 9, 81–89 (2000)
25. Thompson, P.M., et al.: Thinning of the cerebral cortex visualized in HIV/AIDS reflects CD4+ T-lymphocyte decline. Proc. Nat. Acad. Sci. 102, 15647–15652 (2005)
26. Thompson, P.M., et al.: 3D mapping of ventricular and corpus callosum abnormalities in HIV/AIDS. Neuroimage 31, 12–23 (2006)
27. Twining, C.J.: A unified information-theoretic approach to groupwise non-rigid registration and model building. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 190–193. Springer, Heidelberg (2005)
28. Woods, R.P.: Characterizing volume and surface deformation in an atlas framework: theory, applications and implementation. Neuroimage 18, 769–788 (2003)
29. Younes, L.: Jacobi fields in groups of diffeomorphisms and applications. Quart. J. of Appl. Math. 65, 113–134 (2007)
Shape-Based Myocardial Contractility Analysis Using Multivariate Outlier Detection Karim Lekadir1, Niall Keenan2, Dudley Pennell2, and Guang-Zhong Yang1 1
Visual Information Processing, Department of Computing, Imperial College London, UK 2 Cardiovascular Magnetic Resonance Unit, Royal Brompton Hospital, London, UK
Abstract. This paper presents a new approach to regional myocardial contractility analysis based on inter-landmark motion (ILM) vectors and multivariate outlier detection. The proposed spatio-temporal representation is used to describe the coupled changes occurring at pairs of regions of the left ventricle, thus enabling the detection of geometrical and dynamic inconsistencies. Multivariate tolerance regions are derived from training samples to describe the variability within the normal population using the ILM vectors. For new left ventricular datasets, outlier detection enables the localization of extreme ILM observations and the corresponding myocardial abnormalities. The framework is validated on a relatively large sample of 50 subjects and the results show promise in localization and visualization of regional left ventricular dysfunctions.
1 Introduction
Assessment of the left ventricle is important to the management of patients with cardiac disease. With increasing advances in imaging techniques, most modalities now offer routine 4D coverage of the heart, allowing both global and local assessment of left ventricular morphology and function. Several existing semi- and fully-automatic segmentation methods allow rapid and objective delineation of the left ventricle [1-3]. Extracting relevant and reliable indices of myocardial contractile abnormality, however, remains a complex task [4-7]. Global markers such as stroke volume and ejection fraction are widely used in clinical practice but they are not suitable for identifying local abnormalities. Alternative regional assessment based on wall thickening is problematic for a number of reasons. First, important information such as shape, size and endo-cardial displacement is not encoded for dysfunction analysis. Additionally, only end-diastole and end-systole differences are taken into account, whilst certain symptoms such as cardiac dys-synchronization are related to the entire cardiac cycle. The definition of normal ranges is a further challenge as significant overlap exists with abnormal values. Furthermore, local assessment methods do not consider the geometry and motion at other locations of the left ventricle to detect inconsistencies. For these reasons, visual assessment by expert observers remains the gold standard in routine clinical applications, but it is time consuming and can involve significant bias. The proposed method for myocardial abnormality localization is based on inter-landmark motion (ILM) vectors, which represent the simultaneous endo- and epi-cardial changes occurring at two regions of the left ventricle over the entire cardiac
cycle. By combining pairs of locations of the left ventricle in the spatio-temporal representation, geometrical and dynamic inconsistencies can be identified efficiently, whilst the overlap between normal and abnormal values is reduced significantly. Additionally, ILM vectors can implicitly incorporate shape, size, thickness, and endo-cardial displacement for dysfunction analysis. To describe the variability within the normal population, multivariate tolerance regions are derived from training samples in a robust manner. For a given left ventricular dataset, an abnormality likelihood measure is estimated for each location from its associated ILM vectors and an iterative procedure enables the localization of myocardial abnormality. The method is validated with a population of 50 datasets containing normal and abnormal cases.
2 Methods
2.1 Inter-Landmark Motion (ILM) Vectors
Conventional local assessment methods for abnormality localization consider each region of the left ventricle independently. Therefore, they do not take into account the global geometry and dynamics of the left ventricle in the analysis, nor in the definition of the normal ranges, thus causing significant overlap with abnormal values. To overcome these difficulties, this paper introduces inter-landmark motion (ILM) vectors, which describe the coupled motion and geometry over the entire cardiac cycle of pairs of myocardial locations (represented by landmark points). With this approach, each region is analyzed with respect to other locations of the left ventricle, allowing their coupled spatio-temporal relationships to be used for identifying geometrical or dynamic inconsistencies. Although the pairs of locations can be chosen over the entire left ventricle, it is more appropriate to restrict this to be within the same cross-section, where there is a high covariance between the landmarks. For each of the m points within the same cross-section, m − 1 ILM vectors can be derived. In the proposed framework, the required landmark-based representation of the myocardial boundaries is first obtained through delineation or segmentation. For each myocardial location (landmark), two rotation- and translation-invariant descriptors are extracted, i.e., the distances of the endo- and epi-cardial borders to a reference point on the same cross-section plane. The reference point is chosen as the center of the epi-cardial border as it is less susceptible to morphological variations. Invariance to scaling is not considered, to allow the detection of size-related abnormalities, such as dilatation. Each ILM vector, of dimension p = 4F, can be written as follows:
$$v(P_i, P_j) = (a_{i1}, b_{i1}, \ldots, a_{iF}, b_{iF}, a_{j1}, b_{j1}, \ldots, a_{jF}, b_{jF})^T \qquad (1)$$
where F is the number of frames in the cardiac cycle, and a and b denote the endo- and epi-cardial variables, respectively. The ILM vectors provide, in addition to the size and thickness measures encapsulated by these variables, an implicit description of the shape of the myocardial borders.
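For concreteness, Eq. (1) can be assembled as in the sketch below, under the assumption that the two descriptors are stored per landmark as length-F arrays (the array names are ours):

```python
import numpy as np

def ilm_vector(a_i, b_i, a_j, b_j):
    """ILM vector of Eq. (1) for a pair of landmarks (P_i, P_j).
    a_* and b_* are length-F arrays holding, for each cardiac frame,
    the endo- and epi-cardial distances to the reference point.
    The result has dimension p = 4F."""
    vi = np.column_stack((a_i, b_i)).ravel()  # a_i1, b_i1, ..., a_iF, b_iF
    vj = np.column_stack((a_j, b_j)).ravel()  # a_j1, b_j1, ..., a_jF, b_jF
    return np.concatenate((vi, vj))
```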
2.2 Multivariate Tolerance Regions
In this work, the normal myocardial contractility properties are described using multivariate tolerance regions for each ILM vector. Given N training samples, a tolerance region TR in the p-dimensional space can be described as:
$$TR = \{ v \in \mathbb{R}^p \mid d(v, \mu, \Sigma) < L \} \qquad (2)$$
where $\mu$ and $\Sigma$ represent the location and scale of the multivariate distribution, d a distance measure to the center of the distribution, and L a threshold that limits the size of the tolerance region to normal observations. The variability within the normal population can be well approximated by a multivariate normal distribution, in which case the location and scale in Eq. (2) are replaced by the mean observation $\bar{v}$ and the covariance matrix $S_v$, respectively. It was shown that an appropriate distance measure for multivariate normal tolerance regions is the Mahalanobis distance [8], i.e.,
$$d(v, \bar{v}, S_v) = (v - \bar{v})^T S_v^{-1} (v - \bar{v}) \qquad (3)$$
In order to consider only the principal modes of variation, an eigen-decomposition of the covariance matrix can be applied. By rejecting the p − t noisy directions, the distance measure can be simplified to:
$$d(v, \bar{v}, S_v) = \sum_{i=1}^{t} \frac{\left[U_i(v - \bar{v})\right]^2}{E_i}, \quad \text{where } S_v = U E U^T \qquad (4)$$
For the tolerance region limit L in Eq. (2), it was shown that it can be estimated from the critical values of the chi-square distribution as [10]:
$$L = \chi^2_{t,\,(1-\alpha)^{1/N}} \qquad (5)$$
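Putting Eqs. (2)-(5) together, a tolerance region can be fitted and queried as in the following sketch (NumPy/SciPy assumed); it uses the plain sample mean and covariance, in place of the robust estimates derived next:

```python
import numpy as np
from scipy.stats import chi2

def fit_tolerance_region(V, t, alpha=0.05):
    """Fit Eqs. (2)-(5) from an N x p matrix V of normal training ILM
    vectors, keeping the t principal modes; returns a membership test."""
    N = V.shape[0]
    v_bar = V.mean(axis=0)
    E, U = np.linalg.eigh(np.cov(V, rowvar=False))  # S_v = U diag(E) U^T
    idx = np.argsort(E)[::-1][:t]                   # t largest eigenvalues
    E, U = E[idx], U[:, idx]
    L = chi2.ppf((1.0 - alpha) ** (1.0 / N), df=t)  # Eq. (5)

    def inside(v):
        c = U.T @ (v - v_bar)
        return np.sum(c ** 2 / E) < L               # Eq. (4) vs. limit L
    return inside
```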
A training sample of normal subjects is used to capture the normal variability of myocardial contractility. In practice, however, extreme values of the ILM vectors may arise from some unexpected local abnormalities or due to errors in boundary delineation. This can considerably affect the calculation of the mean and covariance matrix. Therefore, a robust estimation of the tolerance region parameters is required. A natural robust estimator for the central observation of the distribution can be found by replacing the mean by the median vector, denoted as $v^*$. A weighted estimation of the covariance can then be achieved in an iterative manner, where the robust covariance matrix $S_v^*$ at iteration t + 1 is calculated as:
$$S_v^*(t+1) = \frac{\sum_{i=1}^{N} w^2\left(v_i, v^*, S_v^*(t)\right)\left(v_i - v^*\right)\left(v_i - v^*\right)^T}{\sum_{i=1}^{N} w\left(v_i, v^*, S_v^*(t)\right)} \qquad (6)$$
where w is a weight calculated from the observation, the median, and the covariance matrix at the previous iteration. The idea behind this formulation is to weight equally and heavily the observations that are close to the median, and to decrease the weights for observations further away. This procedure is repeated until the values of the weights no longer change significantly. A definition of the weights can be written as follows:
$$w(v, v^*, S^*) = \begin{cases} 1 & \text{if } d(v, v^*, S^*) < d_0 \\[4pt] \exp\left(-\dfrac{\left(d(v, v^*, S^*) - d_0\right)^2}{2\sigma_0^2}\right) & \text{elsewhere} \end{cases} \qquad (7)$$
where $d_0$ is a threshold calculated from the median $d^*$ and the robust standard deviation $\sigma^*$ of all distances [9], and $\sigma_0$ specifies the decay rate for distances above the threshold, i.e.:
$$d_0 = d^* + c_1\sigma^* \ (2 \le c_1 \le 3) \quad \text{and} \quad \sigma_0 = c_2\,\sigma^* \ (0 < c_2 \le 1)$$
((
d * = median d v, v * , S * 1≤i ≤N
d (v, v * , S * ) − d * )) and σ* = 1.4826 median 1≤i ≤N
(8)
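A compact sketch of this reweighting loop (Eqs. (6)-(8)) follows; the fixed iteration count and the tuning constants c1 and c2 are illustrative choices within the stated ranges:

```python
import numpy as np

def robust_location_scale(V, n_iter=20, c1=2.5, c2=0.5):
    """Median-based location and iteratively reweighted covariance,
    following Eqs. (6)-(8); V is an N x p matrix of training vectors."""
    v_star = np.median(V, axis=0)
    diff = V - v_star
    S = np.cov(V, rowvar=False)                          # initial scale
    for _ in range(n_iter):
        Sinv = np.linalg.pinv(S)
        d = np.einsum('ij,jk,ik->i', diff, Sinv, diff)   # Eq. (3) distances
        d_star = np.median(d)
        sigma_star = 1.4826 * np.median(np.abs(d - d_star))   # Eq. (8)
        d0 = d_star + c1 * sigma_star
        sigma0 = c2 * sigma_star
        w = np.where(d < d0, 1.0,
                     np.exp(-(d - d0) ** 2 / (2.0 * sigma0 ** 2 + 1e-12)))  # Eq. (7)
        S = (w ** 2 * diff.T) @ diff / w.sum()           # Eq. (6)
    return v_star, S
```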
2.3 Contractile Abnormality Identification
For a given left ventricular dataset with delineated boundaries, the ILM vectors in Eq. (1) are calculated and outlying vectors are identified by using the following measure:
$$f_v(v) = \begin{cases} 1 & \text{if } v \in TR \\ 0 & \text{if } v \notin TR \end{cases} \qquad (9)$$
Because each ILM vector incorporates a pair of myocardial locations, it is not straightforward to identify which of the two landmarks corresponds to an abnormality when the vector in question is outside of the tolerance region. Abnormal landmarks, however, will have a high level of invalid ILM vectors. Therefore, a likelihood measure of abnormality can be calculated for each landmark by summing all measures from Eq. (9) for all the associated ILM vectors, i.e.,
$$f_p(P) = 1 - \frac{1}{m-1} \sum_{i=1}^{m-1} f_v(v_i) \qquad (10)$$
An iterative procedure is then used, in which the landmark with the highest abnormality measure is identified as abnormal. The abnormality measures of the remaining landmarks are updated by subtracting the contribution of the rejected landmark. The procedure is repeated until the highest abnormality measure is close to 0, suggesting that all remaining landmarks correspond to normal myocardial contractility.
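A sketch of this rejection loop, assuming (our convention) an m x m 0/1 matrix `flags` with unit diagonal, where flags[i, j] holds f_v for the ILM vector of landmarks (i, j):

```python
import numpy as np

def localize_abnormal(flags, eps=1e-3):
    """Iterative landmark rejection based on Eq. (10): repeatedly flag the
    landmark with the highest abnormality measure, remove its contribution,
    and stop when the highest remaining measure is close to zero."""
    m = flags.shape[0]
    active = np.ones(m, dtype=bool)
    abnormal = []
    while active.sum() > 1:
        n_other = active.sum() - 1
        # f_p(P) = 1 - mean of f_v over the still-active ILM vectors
        # (the "- 1" removes each landmark's unit diagonal entry)
        fp = 1.0 - (flags[:, active].sum(axis=1) - 1) / n_other
        fp[~active] = 0.0
        worst = int(np.argmax(fp))
        if fp[worst] < eps:          # all remaining landmarks look normal
            break
        abnormal.append(worst)
        active[worst] = False        # subtract the rejected landmark
    return abnormal
```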
2.4 Validation
The validation of the technique is carried out on a relatively large sample of 50 left-ventricular datasets. The subjects were scanned using a 1.5T MR scanner (Sonata, Siemens, Erlangen, Germany) and a TrueFISP sequence (TE = 1.5 ms, TR = 3 ms, slice thickness = 10 mm and pixel size from 1.5 to 2 mm) within a single breath-hold. Retrospective cardiac gating was used to ensure an even coverage of the entire cardiac cycle, and for each subject 25 cine frames were acquired. For all datasets, delineation of the myocardial boundaries was carried out by an expert clinician using a semi-automatic ventricular analysis tool. From the obtained contours, 182 landmarks were uniformly distributed by arc length for each of the endo- and epi-cardial shapes, where point correspondences were determined based on the location of the LV/RV junction points. Ejection fraction, stroke volumes and thickening were calculated, and a detailed visual assessment was carried out by the expert observer for dysfunction localization. The 50 subjects were classified as normal, mildly/intermediately abnormal or severely abnormal. In this study, a total of 28 subjects were identified as normal by the expert observer and used for the tolerance model construction. All datasets were then evaluated using the proposed method, where the normal subjects were assessed on a leave-one-out basis.
3 Results
The percentage of abnormal landmarks was calculated for each dataset and plotted in Fig. 1 against the visual classification ((a) normal, (b) mildly/intermediately abnormal and (c) severely abnormal). It can be seen from the figure that the calculated percentage of abnormality correlates well with the visual classification and that a good separation is achieved for almost all datasets. For numerical assessment of class separation, non-parametric tests were used, and a significant difference between the 3 groups was found using the Kruskal-Wallis test (p<0.001).
Fig. 1. Percentage of abnormality as calculated by the proposed technique for all datasets, plotted against the visual classification by the expert observer
Post-hoc multiple comparisons using the Mann-Whitney test showed significant differences between each of the 3 groups (p<0.001). The average abnormality percentages found for the normal, mildly abnormal and severely abnormal subjects were 1.0 ± 2.0, 14.2 ± 6.0 and 63.8 ± 25.1, respectively. Two normal datasets (shown as crosses) were misclassified by the proposed technique. The first one, characterized by extreme thickening of the myocardium (probably due to stress during the scan), was misclassified because of a training sample that did not include the corresponding variability. The second misclassification is due to right ventricular dysfunction and is discussed in detail in Fig. 4(b).
Fig. 2. Percentage agreement between automatic and manual dysfunction analysis for the normal and abnormal segments
Fig. 3. An example comparing the 17-segment based local assessment achieved by the automatic and manual abnormality analysis methods
Fig. 4. Three examples illustrating the dysfunction analysis achieved by the proposed method. The results are mapped onto the LV surface for abnormality localization and visualization.
For local assessment of the technique, five datasets were selected from each of the three classes and further analyzed using the American Heart Association/American College of Cardiology (AHA/ACC) recommended 17-segment model. The segments were classified by the expert as normal or abnormal, and for the proposed method, the abnormality measures were averaged for each segment. By counting the number of misclassifications, a percentage of agreement between the proposed method and the visual classification was calculated for the fifteen datasets and plotted in Fig. 2. It is evident from the graph that a good agreement was achieved throughout the datasets, with an average percentage equal to 91.3 ± 6.9%. Fig. 3 shows an example of the segmental assessment achieved by both the automatic and manual methods. The identified abnormal segments correspond to a myocardial infarct, as can be seen on the short axis images. The manual and automatic classifications correspond well overall. To visualize the localization of the abnormalities, LV surface maps were constructed using the results of the proposed abnormality detection technique. Three examples are displayed in Fig. 4, where lighter shading corresponds to normal myocardial contractility while darker shading indicates a local abnormality. The example in (a) is a left ventricle with partial dilatation and abnormal ejection fraction (36%). Due to the formation of scar tissue at the antero-lateral region, the wall does not thicken, as shown on the MR images and as identified by the abnormality map. The example in (b) corresponds to the second misclassification from Fig. 1, with normal ejection fraction and thickening measures. The subject, however, has an abnormal right ventricle which is affected by pulmonary hypertension, and thus pushes into the left ventricle. This causes a severe deformation at the septal region of the LV, which is correctly identified by the proposed method. In (c), normal thickening is found overall, but it is less significant at the infero-lateral region than at the other regions of the myocardium. This is usually suggestive of myocardial ischemia.
4 Conclusion
This paper presents a new model-based approach to myocardial contractility analysis, capable of detecting shape and motion inconsistencies between myocardial regions by using inter-landmark motion vectors and multivariate outlier detection. The results obtained on a relatively large sample show promise in the localization and visualization of regional left ventricular dysfunctions. Future work includes the validation of the method on different groups of patients to enhance its clinical value, as well as its application to right ventricular dysfunction analysis.
References
1. Mitchell, S.C., Bosch, J.G., Lelieveldt, B.P.F., Geest, R.J.V.D., Reiber, J.H.C., Sonka, M.: 3-D active appearance models: segmentation of cardiac MR and ultrasound images. IEEE Transactions on Medical Imaging 21, 1167–1178 (2002)
2. Lekadir, K., Merrifield, R., Yang, G.-Z.: Outlier detection and handling for robust 3D active shape models search. IEEE Transactions on Medical Imaging 26, 212–222 (2007)
3. Jolly, M.-P., Duta, N., Funka-Lea, G.: Segmentation of the left ventricle in cardiac MR images. In: International Conference on Computer Vision (ICCV) (2001)
4. Frangi, A.F., Niessen, W.J., Viergever, M.A.: Three-dimensional modeling for functional analysis of cardiac images: a review. IEEE Transactions on Medical Imaging 20, 2–25 (2001)
5. Suinesiaputra, A., Üzümcü, M., Frangi, A.F., Kaandorp, T.A.M., Reiber, J.H.C., Lelieveldt, B.P.F.: Detecting regional abnormal cardiac contraction in short-axis MR images using independent component analysis. In: Barillot, C., Haynor, D.R., Hellier, P. (eds.) MICCAI 2004. LNCS, vol. 3216, Springer, Heidelberg (2004)
6. Declerck, J., Feldmar, J., Ayache, N.: Definition of a 4D continuous planispheric transformation for the tracking and the analysis of LV motion. Medical Image Analysis 2, 197–213 (1998)
7. Shi, P., Sinusas, A.J., Constable, R.T., Ritman, E., Duncan, J.S.: Point-tracked quantitative analysis of left ventricular surface motion from 3D image sequences: Algorithms and validation. IEEE Transactions on Medical Imaging 19, 36–50 (2000)
8. Healy, M.J.R.: Multivariate normal plotting. Applied Statistics 17, 157–161 (1968)
9. Huber, P.J.: Robust statistics. Wiley, New York (1981)
10. Becker, C., Gather, U.: The masking breakdown point of multivariate outlier identification rules. Journal of the American Statistical Association 94, 947–955 (1999)
Orthopedics Surgery Trainer with PPU-Accelerated Blood and Tissue Simulation Wai-Man Pang1, Jing Qin1 , Yim-Pan Chui1 , Tien-Tsin Wong1 , Kwok-Sui Leung2 , and Pheng-Ann Heng1,3 1
Dept. of Computer Science and Engineering, The Chinese University of Hong Kong 2 Department of Orthopaedics and Traumatology, CUHK 3 Shenzhen Institute of Advanced Integration Technology, Chinese Academy of Science/CUHK
Abstract. This paper presents a novel orthopedics surgery training system that covers both the modeling and the efficient simulation and visualization of deformation. By employing techniques such as optimization, segmentation and center-line extraction, the construction of the deformable models can be completed with minimal manual involvement. The trainer can simulate rigid bodies, soft tissue and blood with state-of-the-art techniques, so that convincing deformation and realistic bleeding can be achieved. More importantly, the newly released Physics Processing Unit (PPU) is adopted to meet the high computational requirements of the physics simulation. Experiments show that the acceleration gained from the PPU is significant for maintaining an interactive frame rate in the complex surgical environment of orthopedics surgery.
1 Introduction
Orthopedics surgery simulation is complex due to the co-existence of various kinds of body tissues with heterogeneous mechanical behaviors, such as bones, dermis, fatty tissues and blood vessels. This obviously makes the modeling and simulation more complicated and challenging. Many existing simulators for orthopedics surgeries focus on interactivity and training procedures, while realism is compromised because of limited computational resources and the lack of realistic mechanical and anatomical data. For example, the physical simulation of bleeding and tissue deformation, which are indispensable components of orthopedics surgical simulation, is usually far from realistic. Besides, models of soft tissue and vessels are usually prepared manually, and may not reflect the correct anatomy and appearance of a real organ. To meet these challenges, in this paper, we present a novel virtual reality (VR) system for orthopedics surgery training. The new consumer-level Physics Processing Unit (PPU) is used to accelerate the physical computation involved in bleeding simulation and tissue deformation. By exploiting the power of the PPU, high realism and an interactive frame rate can be simultaneously achieved in a cost-effective way. Moreover, geometry construction and visualization of dermis,
muscles and vessels are mainly based on anatomical information from the Chinese Visible Human (CVH) dataset. This helps greatly in preparing a surgical environment similar to real situations. Finally, trauma surgery on the upper limb is chosen as a particular application to demonstrate our system.
2 Related Work
Many academic and commercial VR-based orthopedics surgery simulators have been proposed for the training of surgeons [1,2,3]. Most of them can be categorized into two classes based on the nature of the surgery involved, namely arthroscopy and trauma surgery. McCarthy et al. [4] proposed the Sheffield Knee Arthroscopy Training System (SKATS), in which trainees are expected to learn the skill of navigating the knee arthroscopically. Haptic devices are incorporated to increase the learning speed of users. Reinig et al. [5] presented the US Military's Thigh Trauma Simulator, which provides a virtual environment with case scenarios of thigh trauma resulting in femur fractures. Our current system is mainly targeted towards trauma surgeries and tries to simulate the kind of open surgery in orthopedics that involves incision. Hardware acceleration for rendering has long been used in many graphics applications, including surgical applications. Similar to rendering, physically based deformation is another essential and computationally intensive component of surgical simulations. Many researchers [6,7] have tried to exploit the power of the programmable GPU for physics computation, but they usually suffer from certain difficulties or limitations, such as restrictions on the topology of the deformable model. Recently, a specialized hardware accelerator for physics known as the Physics Processing Unit (PPU) was released for complex dynamic motion, interaction and physics computation [8]. More details of the capabilities of the PPU in our virtual orthopedics simulator will be introduced in Section 4.
3 Geometric Modeling
In our system, creation of the deformation and visualization models is based on anatomical information provided by the CVH dataset. In order for the models to resemble human organs, we employ a layered soft tissue model. Each layer is modeled as a mass-spring model with different topologies, like tetrahedron or lattice structures. In the modeling of human dermal-muscular tissues, the epidermis is in a tetrahedron structure while the others are in a lattice structure, as shown in Figure 1(a) and (b). Based on the segmented CVH, consistent polygonal surfaces of each tissue layer are extracted in Stereolithography (STL) format. By re-parameterizing the polygonal surface meshes into a quad-based surface, we can easily control the complexity of the surface and skip the manual smoothing procedures. Next, a multi-layered mass-spring model can be automatically produced based on the re-parameterized surface meshes.
Fig. 1. The multilayer topology of the mass-spring model: (a) a side view, (b) a 3D view, (c) the upper limb model from CVH data and (d) the model deformed by external pressure
For the tetrahedron layer, every triangle in the upper surface projects its centroid in the normal direction to hit the lower surface layer. These intersection points, namely projected centroids, form the mass units in the lower layer. Then, springs are added to connect the vertices of the original triangle and its projected centroid. This creates a tetrahedral mesh structure between the two surface layers. Afterwards, triangulation is carried out on these projected centroids. Similar processes are repeated for the lower layers until all layers are constructed. Apart from building the topology of the mass-spring model, some auxiliary modules are used to extract the boundary and texture from the CVH, and to adjust mass density and other mechanical properties. Figure 1(c) shows a textured mass-spring model generated using the CVH's upper limb. To automatically construct the deformable vessel model, the centreline of the entire vascular structure has to be extracted first. Since bifurcations (branches) occur throughout the vascular network, a tree structure is deployed for storing this structure. After centerline extraction, we perform geometric modeling. First, a series of Bezier curves is fitted onto the extracted centreline, where the distribution of control points depends on the local curvature of the vascular pathway. Bifurcations are handled through a modified Bezier triangle. Finally, the external surface structure is modeled through a sweep of the local frame.
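As a rough sketch of the layer construction just described (the ray/surface intersection routine is assumed to be supplied by the caller; all names are illustrative):

```python
import numpy as np

def build_lower_layer(triangles, vertices, normals, intersect_lower):
    """For each upper-surface triangle, project its centroid along the
    averaged vertex normal onto the lower surface.  The hit points become
    the mass units of the lower layer, and each one is connected by springs
    to the three vertices of its originating triangle (a tetrahedron)."""
    masses, springs = [], []
    for tri in triangles:                        # tri: indices of 3 vertices
        centroid = vertices[tri].mean(axis=0)
        n = normals[tri].mean(axis=0)
        n /= np.linalg.norm(n)
        hit = intersect_lower(centroid, n)       # projected centroid
        masses.append(hit)
        springs += [(v, len(masses) - 1) for v in tri]  # (upper idx, lower idx)
    return np.asarray(masses), springs
```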
4 PPU-Accelerated Integrated Orthopedics Simulation
Instead of concentrating on the simulation of bone, our system is a more comprehensive simulation system for different kinds of orthopedics surgery. Both realistic blood flow simulation and soft tissue deformation are supported in our integrated orthopedics simulator. These two components are the most computationally intensive. Many previous simulators try to simplify their models to maintain an interactive frame rate; however, this reduces the realism provided to the user. As a result, our system utilizes the recently released PPU to accelerate physics simulations. In addition to the deformation computation, collision and interaction can be handled inside the PPU, which only requires the construction of a collision mesh model at initialization.
4.1 MSM-Based Soft Tissue Deformation
It is well known that soft tissue exhibits non-linear viscoelasticity, meaning that the stress-strain relation is non-linear. In general, it has a very low stiffness at the beginning and extremely high stiffness after the tissue is stretched beyond a certain extent. Therefore, a biphasic mass-spring system is used to mimic the non-linear behavior of body tissues. However, the latest PPU only implements a linear mass-spring system, even though it allows for the definition of mass-springs in arbitrary topology. As a result, we extend the linear mass-spring system by using two linear springs with different elastic configurations in parallel. The basic idea is to let the first linear spring be effective in the first phase of the stress-strain curve, while the second linear spring becomes activated when the turning point is reached. This construction leads to a simple decomposition of a biphasic spring into two linear springs, and can be done automatically when the deformable model is initialized. In order to obtain a macroscopic mass-spring system that is consistent with the biomechanical properties of body tissue [9], we carry out an optimization process on the (microscopic) individual spring elasticities. We choose simulated annealing for the optimization, because it is well suited to problems with many parameters where the first derivative of the objective function is unavailable. The objective function ψ for the optimization is formulated as Equation 1,
$$\psi = \sum_i w_i (p_i - q_i)^2 \qquad (1)$$
where $p_i$ is the value of the i-th parameter from the experiment and $q_i$ is the corresponding value for that particular tissue coming from data books. $w_i$ is the weighting factor for that parameter. If a biphasic spring is used, there are 3 parameters: the first-phase stiffness (K1), the turning length and the second-phase stiffness (K2); for a linear spring, there is a single parameter, the stiffness alone. In order to evaluate $p_i$, experiments are carried out on the multilayer model repeatedly until a minimum objective is found. The springs in all tissue layers are optimized in a similar manner. Table 1 reports the mechanical properties of real dermis as well as of our optimized model. Notice that the optimization process only needs to be performed once for each tissue. Figure 1(d) shows a deformed result of the upper limb model. Theoretically, higher-order spring models should produce more realistic deformation; however, the visual difference is insignificant and the performance penalty they induce might not be worth it. Relaxation to a linear spring is possible for certain tissues; for example, we used a linear model for the hypodermis.
where pi is the value of the i-th parameter from the experiment and qi is the corresponding value for that particular tissue coming from data books. wi is the weighting factor for that parameter. If biphasic spring is used, i is 3 and pi are the first phase stiffness (K1), turning length and the second phase stiffness (K2); while for a linear spring, i is 1 which referring to stiffness alone. In order to evaluate pi , experiments are carried out on the multilayer model repeatedly until a minimum objective is found. The springs in all tissue layers are optimized in a similar manner. Table 1 reports the mechanical properties of real dermis as well as our optimized model. Notice that the optimization process is only necessary to be performed once for each tissue. Figure 1(d) shows a deformed result of the upper limb model. Theoretically, higher-order spring models should produce more realistic deformation, however, the visual difference is insignificant and the performance penalty induced by them might not worth. The relaxation of using high order spring is possible to certain tissue, for example, we did relax the spring to linear model like hypodermis. Table 1. Comparison of real dermis elastic properties with our optimized result
                  K1 (KPa)   K2 (KPa)   Turning point (in strain)
real dermis       0.1        18.8       0.4
optimized model   0.118      18.51      0.41
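A sketch of the resulting force law follows (with the Table 1 dermis values as an example); this is our reading of the two-parallel-springs construction, not code from the simulator:

```python
def biphasic_spring_force(strain, k1, k2, turning):
    """Biphasic spring built from two linear springs in parallel: the first
    (stiffness k1) acts over the whole range, the second (stiffness k2 - k1)
    engages only beyond the turning strain, so the slope is k1 in the first
    phase and k2 in the second."""
    f = k1 * strain
    if strain > turning:
        f += (k2 - k1) * (strain - turning)
    return f

# Dermis values from Table 1: K1 = 0.1 kPa, K2 = 18.8 kPa, turning = 0.4
print(biphasic_spring_force(0.5, 0.1, 18.8, 0.4))
```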
Fig. 2. The computational model of Smoothed Particle Hydrodynamics (SPH): (a) a particle and its influence kernel with radius r, (b) the blood surface extraction from particles, and (c) the rendered result
4.2 SPH-Based Blood Flow Simulation
Smoothed Particle Hydrodynamics (SPH) is known to be an efficient method for fluid animation in many graphics applications [10,11]. Moreover, it can be accelerated with the PPU, so that sophisticated blood effects can run at high frame rates. SPH is a particle-based method for fluid simulation, in which nearby particles interact with each other according to the following formula,
$$Q(l) = \sum_j a_j Q_j f(l - l_j, r) \qquad (2)$$
where Q defines a scalar quantity at location l, which can be density, pressure or viscosity. It is a weighted sum of contributions from neighboring particles (denoted as j) within radius r, based on the field quantity $Q_j$ and inversely proportional to the distance $l - l_j$. Here $a_j = m_j/\rho_j$, where $m_j$ and $\rho_j$ are the mass and density of particle j, respectively. The function f(l, r) is referred to as the smoothing kernel with radius r, as shown in Figure 2(a). The governing equations for fluid dynamics are the Navier-Stokes equations, which formulate the conservation of momentum as follows,
$$\rho\left(\frac{\partial v}{\partial t} + v \cdot \nabla v\right) = -\nabla p + \rho g + \mu \nabla^2 v \qquad (3)$$
Notice that the above formulation assumes the fluid is Newtonian (e.g. water). According to the biomechanics literature, blood is a non-Newtonian fluid. However, by carefully adjusting the properties, we can achieve a similar visual effect. In our experiments, we use 3 cP for viscosity, 20 N for stiffness and 1.06 g/cm³ for density. To render the surface from a cloud of particles, there are many existing methods, such as level sets, marching cubes or point splatting. For performance reasons, we use the marching cubes method to track the blood surface as a triangular mesh and then render it using the standard graphics pipeline.
$$g = \nabla d \qquad (4)$$
$$|g(r_i)| > h \qquad (5)$$
$$n = -\frac{g(r_i)}{|g(r_i)|}. \qquad (6)$$
The surface tracking starts with the building of a color field d within the space that the particles occupy. Then, we compute the gradient field g using Equation 4; surface elements are located using Equation 5 with h being the threshold. Finally, the surface normal n is obtained by normalizing g according to Equation 6.
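To make Eqs. (2) and (6) concrete, the following sketch evaluates the SPH density estimate and the surface normal; the poly6 kernel shown is the common choice of [10], used here only as an example of a smoothing kernel f(l, r):

```python
import numpy as np

def poly6(r_vec, r):
    """A standard compactly supported SPH smoothing kernel (3D poly6)."""
    q = r * r - float(np.dot(r_vec, r_vec))
    return (315.0 / (64.0 * np.pi * r ** 9)) * q ** 3 if q > 0.0 else 0.0

def density(i, pos, mass, r):
    """Eq. (2) specialized to density: for Q = rho the weight a_j Q_j
    reduces to the particle mass m_j."""
    return sum(mass[j] * poly6(pos[i] - pos[j], r) for j in range(len(pos)))

def surface_normal(g_ri):
    """Eq. (6): unit surface normal from the color-field gradient."""
    return -g_ri / np.linalg.norm(g_ri)
```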
5 Time Performance and Visualization
Experiments are carried out to compare the performance of a PPU-based system and a solely CPU-based system. Our two core components, the mass-spring modeled soft-tissue deformation and the SPH modeled bleeding simulation, are used in the experiments. The test platform is a PC equipped with a Pentium 4 Dual Core 3.2 GHz CPU and 4 GB memory, while the PPU is an AGEIA PhysX processor with 128 MB. Figure 3(a) shows the experimental results for the mass-spring model. A significant speed improvement is observed for all grid sizes with the use of the PPU, especially the two-fold improvement for the 15×5 and 18×5 grids. For fluid simulation using the smoothed particle hydrodynamics method, a real-time frame rate is achieved using the PPU when the number of particles is below 6,000 (see Figure 3(b)). With the same number of particles, the performance of the pure CPU version decreases much faster. The sudden performance drop at 6,000 particles on the PPU may be due to a data transfer bottleneck on the relatively slow PCI bus interface. From our experience, around 3,000 particles is enough for an acceptable bleeding effect. An integrated environment with both MSM (39×9 mesh size) and SPH (5,000 particles) running can maintain an interactive frame rate of above 10 frames per second. We demonstrate our orthopedics training system with an upper limb surgery training programme.
Fig. 3. Comparison of time performance using the PPU and CPU for (a) mass-spring model deformation and (b) SPH bleeding simulation
Fig. 4. Open orthopedic surgery: (a) the mass-spring model of the CVH upper limb, (b) the surface model, (c) the incision fixed by arms, (d) pulling on the incision, (e) an accidental cut at the blood vessel and (f) bleeding from the blood vessel
In this programme, a deformable upper limb is modeled as the proposed mass-spring model and bleeding is simulated by the SPH particle method. By providing the generated mass-spring model, fluid particles and collision models to the PPU, we receive updated coordinates of the masses and fluid particles at each time step. Based on these updated positions, we render the tissue models with texture from the CVH dataset. Figures 4(a) and (b) show the mass-spring model and the surface model generated using an upper limb of the CVH dataset, respectively. Figures 4(c) and (d) show an incision being made on the upper limb; the limb can be deformed by a user for investigation during training. Bleeding is common during surgery, especially when there is an accidental surgical fault, as shown in Figures 4(e) and (f). The user can practice handling bleeding using a draining tool. Realistic bleeding and soft-tissue deformation effects improve the experience given to the trainee.
6 Conclusion and Future Work
An integrated system for orthopedic surgery training is presented. The proposed method for modeling soft tissue and blood by combining the CVH dataset with biomechanics information is shown to be effective and efficient. By accelerating the computation with the PPU, a high level of realism and an interactive frame rate
can be achieved in a virtual environment simulating both soft-tissue deformation and bleeding. Experiments have demonstrated the practical value of our work on an orthopedics trainer. We believe our work will be beneficial to similar surgical simulators as well. As future work, we will try to enhance the visualization of blood and soft tissues by using more robust and advanced rendering techniques.
Acknowledgments The work described in this paper was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region (Project no. CUHK4223/04E). This work is also affiliated with the Virtual Reality, Visualization and Imaging Research Centre at The Chinese University of Hong Kong as well as the Microsoft-CUHK Joint Laboratory for Human-Centric Computing and Interface Technologies.
References
1. Olga, S., Alexei, S., Sen, H.T.: Orthopaedic surgery training simulation. Journal of Mechanics in Medicine and Biology, World Scientific (accepted, 2006)
2. Anderson, B.D., Nordquist, P., Skarman, E., Boies, M.T., Anderson, G.B., Carmack, D.B.: Integrated Lower Extremity Trauma Simulator. In: Westwood, J.D., et al. (eds.) Proceedings of Medicine Meets Virtual Reality 2006, pp. 19–24. IOS Press, Amsterdam (2006)
3. Cannon, W.D., Eckhoff, D.G., Garrett, W.E., Hunter, R.E., Sweeney, H.J.: Report of a group developing a virtual reality simulator for arthroscopic surgery of the knee joint. Clinical Orthopaedics & Related Research 442, 21–29 (2006)
4. McCarthy, A.D., Moody, L., Waterworth, A.R., Bickerstaff, D.R.: Passive haptics in a knee arthroscopy simulator: Is it valid for core skills training? Clinical Orthopaedics & Related Research 442, 13–20 (2006)
5. Reinig, K., Lee, C., Rubinstein, D., Bagur, M., Spitzer, V.: The United States military's thigh trauma simulator. Clinical Orthopaedics & Related Research 442, 45–56 (2006)
6. Georgii, J., Westermann, R.: Mass-spring systems on the GPU. Simulation Modelling Practice and Theory 13(8), 693–702 (2005)
7. Mosegaard, J., Sorensen, T.S.: A GPU accelerated spring-mass system for surgical simulation. In: Proceedings of the 13th Medicine Meets Virtual Reality (MMVR), pp. 342–348 (2005)
8. Nealen, A., Muller, M., Keiser, R., Boxerman, E., Carlson, M.: Physically based deformable models in computer graphics. Computer Graphics Forum 25(4), 809–836 (2005)
9. Silver, F.H., Freeman, J.W., Devore, D.: Viscoelastic properties of human skin and processed dermis. Skin Research and Technology (7), 18–23 (2001)
10. Müller, M., Charypar, D., Gross, M.: Particle-based fluid simulation for interactive applications. In: SCA 2003. Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 154–159. Eurographics Association, Aire-la-Ville, Switzerland (2003)
11. Müller, M., Schirm, S., Teschner, M.: Interactive blood simulation for virtual surgery based on smoothed particle hydrodynamics. Technol. Health Care 12(1), 25–31 (2004)
Interactive Contacts Resolution Using Smooth Surface Representation Jérémie Dequidt1, Julien Lenoir2, and Stéphane Cotin1,3 1
2
SimGroup, CIMIT, Cambridge, USA [email protected] Alcove Project, LIFL/INRIA Futurs, Lille, France [email protected] 3 Harvard Medical School, Boston, USA [email protected]
Abstract. Accurately describing interactions between medical devices and anatomical structures, or between anatomical structures themselves, is an essential step towards the adoption of computer-based medical simulation as an alternative to traditional training methods. However, while substantial work has been done in the area of real-time soft tissue modeling, little has been done to study the problem of contacts occurring during tissue manipulation. In this paper we introduce a new method for correctly handling complex contacts between various combination of rigid and deformable objects. Our approach verifies Signorini’s law by combining Lagrange multipliers and the status method to solve unilateral constraints. Our method handles both concave and convex surfaces by using a displacement subdivision strategy, and the proposed algorithm allows interactive computation times even in very constrained situations. We demonstrate the efficiency of our approach in the context of interventional radiology, with the navigation of catheters and guidewires in tortuous vessels and with the deployment of coils to treat aneurysms.
1 Introduction
Real-time soft tissue modeling has been the focus of a majority of publications in the field of medical simulation [1,2,3,4]. This can be explained by the importance of tissue-tool interactions in the overall realism of a simulation, but also by the complexity of the problem. Accurately modeling the deformation of an anatomical structure during tissue manipulation is a very difficult task, in particular when non-linear stress-strain relationships are required while at the same time maintaining real-time computation [2,4]. However, even complex models cannot correctly describe soft tissue deformations unless the contacts occurring during tissue manipulation are correctly determined. Very often, interactions are limited to a single point of contact [5] or at least a very localized area of contact, that would correspond for instance to grasping the tissue or probing it. There are many medical procedures, however, where such limited interactions are not sufficient. For instance, in surgery, tissue palpation requires much more complex
interactions, and in laparoscopic surgery, many procedures require a combination of sliding and grasping that cannot be modeled by usual approaches. A very illustrative example is found in interventional radiology, with catheter navigation or coil deployment. Such procedures involve inserting or deploying flexible devices in very tight spaces, thus leading to a large number of contacts combined with sliding conditions. Finally, besides contacts occurring during tissue-instrument interactions, it is typical in surgical procedures that organs slide and collide against other anatomical structures. Taking that type of contact into account would certainly have a positive impact on the realism of the simulation. Modeling contacts involves not only detecting the occurrence of a contact but also computing the involved structures' collision response. While the problem of collision detection has often been addressed in Computer Graphics and to some extent in medical simulation, the issue of modeling the collision response has mostly been addressed in the fields of Mechanical Engineering and Robotics. However, when taking into account the particular constraints inherent to real-time simulation and deformable structures, little has been done. Among the most relevant work, Kry et al. [6] propose a technique well suited for evolving contacts on a smooth surface, incorporating both slip and no-slip friction. Their method is very fast but only handles simple contacts between rigid surfaces, which need to be described as a parametrization. Garcia et al. [7] introduce a fast algorithm based on fuzzy logic. Using kinetic and geometric information of a surgical tool interacting with an organ, they compute the new position of the colliding vertices using simple rules. However, this method is limited to collisions between a rigid object and a deformable one, and the collision response, based on a projection technique, does not take into account the physics of the objects in contact. The method we previously introduced in [8] handles contacts by computing the local compliance at the point of contact and by describing it in the contact space using the Delassus operator. Collision detection is performed using a proximity measure with the triangulation of the surface. The problem of multiple contacts is then solved using an iterative solver, in particular a Gauss-Seidel algorithm. Although efficient, this approach has a non-optimal rate of convergence that leads to either implausible behavior or increased computation times. In this paper, we present a real-time algorithm based on Lagrange multipliers to handle multiple complex contact situations between combinations of rigid and deformable objects. After introducing some basic notions of contact mechanics, we present in Section 2.2 a method based on Lagrange multipliers to design a fast, robust and generic algorithm to handle contacts between hundreds of colliding degrees of freedom. We then describe in Section 2.3 a fast collision detection method based on an implicit surface representation, and we conclude with a series of results in the context of interventional radiology.
2 Contact Modeling
Modeling contacts is a well-known topic in Computational Mechanics and has recently been an active research field in Computer Graphics [9,10,11]. In this
section we give an overview of the mechanics of contacts, in particular Signorini's law and its linearization in the contact space, formulated as the Delassus operator. Then we present our approach based on the use of Lagrange multipliers and a displacement subdivision strategy.

2.1 Mechanics of Contacts
Definition. A contact is a unilateral constraint applied on a specific point P (the contact point): g(P) ≥ 0. A mechanical system usually defines a set of degrees of freedom (DOFs) x characterizing its physical state. Given this notation, a contact g_i links several DOFs via a linear relationship g_i(x) = Σ_j h_ij x_j. By extension, a set of contacts g_i(x) ≥ 0, noted g(x), is linked to a set of DOFs via a matrix: g(x) = Hx, where each row of H expresses a contact relative to the DOFs.

Signorini's Law. The conditions of contact are given by Signorini's law, for each point P of the contact area, as: 0 ≤ δ_P^n ⊥ f_P^n ≥ 0, where δ_P^n is the interpenetration distance evaluated at P (shortest Euclidean distance to the other object's surface) and f_P^n the amplitude of the normal force needed to solve the contact. In the case of frictional sliding, a tangential component f_P^t is introduced, leading to a contact force f_P = f_P^n + f_P^t. From a mathematical standpoint, this law expresses the orthogonality between the contact force and the interpenetration distance: δ_P^n f_P^n = 0. This means that either an interpenetration occurs, requiring a non-null normal force to resolve the contact, or the constraint is not violated because the distance to the surface is non-null, and therefore no force is required to correct the position.

Delassus Operator. When dealing with simulations, the motion of the objects is discretized into a series of time steps. Some of the external forces are known at the beginning of a time step (gravity, user-specified forces, etc.) while others only appear during the time step and depend on the current state of the mechanical system. This is the case of the contact forces. Such forces are called implicit, while the known ones are called explicit. Dealing with implicit forces leads in general to solving a non-linear problem. If the deformations are linear (or linearized during the time step), a way of dealing with both implicit and explicit forces is to split the computation in two steps. First we compute a configuration called free motion, noted x^f, in which we take into account only the explicit forces, not the contacts. Second, a collision detection is performed and a corrective motion x^c is computed. The correction x^c is such that the final position x = x^f + x^c verifies the unilateral constraints. Treating explicit and implicit forces separately is a consequence of the superposition principle, and therefore can only be applied if the equations of motion are linearized. After computing x^f we can evaluate the actual contact violation δ_P^{n,free} and solve the contact problem δ_P^n = HCH^T f_P^n + δ_P^{n,free}, where C is the compliance matrix of the mechanical
system, and the matrix HCH^T expresses the contacts' coupling in the contact space^1. This operator is well known in mechanics as the Delassus operator, and the previous equation is called a linear complementarity problem (LCP). An LCP can be solved in different ways, using a Lemke or Gauss-Seidel technique for instance. Once the contact problem is solved, we obtain the contact force f_P^n. Since f_P^n is defined in the contact space, it has to be transformed back to the DOF space before being applied to P. We write f = H^T f_P^n, and the solution x verifying the constraints is then determined by x = x^f + (CH^T) f_P^n.
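To make the LCP solution step concrete, the sketch below implements a projected Gauss-Seidel iteration for the complementarity conditions 0 ≤ δ ⊥ f ≥ 0 with δ = W f + δ^free, where W = HCH^T is the Delassus operator. This is only a minimal illustration of the kind of iterative solver mentioned above, not the authors' implementation; the function and variable names are ours, and the convergence handling is deliberately simplified.

import numpy as np

def projected_gauss_seidel(W, delta_free, iters=100, tol=1e-9):
    # Solve 0 <= (W f + delta_free) _|_ f >= 0, with W = H C H^T
    # (c x c, one row/column per contact) and delta_free the contact
    # violations measured after the free motion.
    c = W.shape[0]
    f = np.zeros(c)
    for _ in range(iters):
        change = 0.0
        for i in range(c):
            # Residual of contact i, holding all other forces fixed.
            r = delta_free[i] + W[i] @ f - W[i, i] * f[i]
            f_i = max(0.0, -r / W[i, i])   # projection enforces f_i >= 0
            change = max(change, abs(f_i - f[i]))
            f[i] = f_i
        if change < tol:
            break
    return f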
2.2 Solving Contacts with Lagrange Multipliers
Lagrange multipliers are a well-known mathematical method to define bilateral constraints, although there exist a few references of work using Lagrange multipliers to solve unilateral constraints, such as [12] for instance. In this section we describe our contact modeling algorithm and show the equivalence between the definition of a contact in classical mechanics (Signorini's law and Delassus operator) and the use of Lagrange multipliers.

Solving Unilateral Contacts. If we assume that several objects are in contact (these objects can be deformable, rigid or even inert), then we can define a mechanical system representing this set of objects. Whether it is static or dynamic, the stiffness or mass matrix of the system will have a similar structure, i.e. a block diagonal matrix where each block is the stiffness or mass matrix of an object within the mechanical system. Without loss of generality, let us assume the system is static, and that its stiffness matrix is K. In the absence of contacts, each block of K is independent of the other ones. When contacts are detected, we introduce Lagrange multipliers in the system, thus creating a dependency between certain DOFs. Then, if we take into account the decomposition x = x^f + x^c, the contact problem can be described as:

  { K x^f = f                      { K x^f = f
  { K x^c = H^T λ           ⇔      { HK^{-1}H^T λ = δ − H x^f
  { H(x^c + x^f) = δ               { x^c = K^{-1} H^T λ

Two steps are required to solve these equations. First we compute the free motion from the explicit forces (x^f = K^{-1} f). Given x^f, we perform a collision detection that allows us to evaluate H, and therefore δ. We then solve HK^{-1}H^T λ = δ − H x^f and obtain λ. Since we are dealing with unilateral constraints, not all constraints are necessarily needed to enforce the inequality condition on x. Redundant constraints (for which the corresponding value in λ is negative) are then deactivated. This is the so-called status method. At this point, we can evaluate the corrective motion x^c = K^{-1} H^T λ and compute the new position x = x^f + x^c. However, this new configuration does not necessarily meet the
^1 In the case of a static system, we have C = K^{-1}. For a dynamic system, the previous equations involve the acceleration ẍ, thus requiring a time integration to determine x. Using an Euler implicit integration scheme, this leads to C = (M/Δt^2 + D/Δt + K)^{-1}.
initial constraints since we use a linear approximation of the local shape at the point of contact. As a consequence, an iterative scheme is introduced, during which a collision detection is performed on the new configuration x to check if some contacts are still violated (lines 18 to 22 in the algorithm below). If it is the case, a new evaluation of H is performed, and a new value of x^c is computed. This is repeated until all current contacts are solved; in most cases, less than 10 iterations are required. Since K^{-1} does not need to be recomputed, if the collision detection is handled efficiently (see section 2.3), these iterations lead to a limited overhead. At this point, all contacts initially detected are solved. However, when solving these contacts, it is likely that new ones will appear. This is typical of any collision response algorithm. In our case we solve all contacts within a given time step, rather than in the next time step. This explains the main loop (lines 7 to 22) in the algorithm below. Checking for new contacts within the same time step adds a computational overhead but ensures a more consistent (and contact-free) configuration at the beginning of the following time step.

 1  Solve K x^f = f
 2  contact = (), done = true
 3  for i ← 1 to n do
 4      if DetectCollision(P_i(x^f)) then
 5          contact += (H_i, δ_i)
 6          done = false
 7  while !done do
 8      repeat
 9          Solve HK^{-1}H^T · λ = δ − H x^f
10          // I) Constraints deactivation using status method
11          done = true
12          if ∃ i | (λ_i < 0) then
13              Remove from contact: contact_j | (λ_j = min(λ_i))
14              done = false
15      until done
16      x^c = K^{-1} H^T λ
17      // II) Constraints activation
18      done = true
19      for i ← 1 to n do
20          if DetectCollision(P_i(x^f + x^c)) then
21              contact += (H_i, δ_i)
22              done = false
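For readers who prefer an executable form, the listing above translates into the following Python sketch. The collision detection routine and the assembly of H and δ are placeholders we introduce for illustration (they are not part of the original system), and a dense inverse of K is used purely for clarity.

import numpy as np

def solve_time_step(K, f_ext, detect_collisions, max_outer=10):
    # detect_collisions(x) -> (H, delta) for constraints violated at x;
    # returns arrays with zero rows when no collision occurs. A real
    # implementation would also avoid duplicating already-active contacts.
    Kinv = np.linalg.inv(K)            # for clarity; factorize in practice
    x_f = Kinv @ f_ext                 # free motion (lines 1-6 above)
    H, delta = detect_collisions(x_f)
    x_c = np.zeros_like(x_f)
    for _ in range(max_outer):         # main loop (lines 7-22)
        while H.shape[0] > 0:          # status method (lines 8-15)
            lam = np.linalg.solve(H @ Kinv @ H.T, delta - H @ x_f)
            if np.all(lam >= 0):
                break
            keep = np.arange(lam.size) != np.argmin(lam)
            H, delta = H[keep], delta[keep]
        if H.shape[0] == 0:
            break
        x_c = Kinv @ (H.T @ lam)       # corrective motion (line 16)
        H_new, d_new = detect_collisions(x_f + x_c)   # activation (17-22)
        if H_new.shape[0] == 0:
            break                      # all contacts resolved
        H = np.vstack([H, H_new])
        delta = np.concatenate([delta, d_new])
    return x_f + x_c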
Equivalence with the Mechanics of Contacts. One can note that HK^{-1}H^T represents the contact coupling, which is exactly the meaning of the Delassus operator defined in section 2.1, with K^{-1} ≡ C the compliance matrix of the mechanical system. Moreover, the Lagrange multipliers λ give the force in the contact space, which is equivalent to f_P^n in the Delassus operator approach. This means the corrective motion computed using Lagrange multipliers is identical to the one derived from the Delassus operator, i.e. x^c = K^{-1}H^T λ = CH^T f_P^n. The
equivalence between Signorini's law / Delassus operator and our approach based on Lagrange multipliers / status method is very important, as it shows that the contacts are modeled accurately with our approach.

2.3 Collision Detection
Implicit Surface Modeling. For organic shapes (i.e. shapes that do not exhibit sharp features), a fast collision detection can be performed by using an implicit description of the surface, rather than a triangulation. The surface is then described using a combination of geometrical primitives and a convolution filter. The primitives can be points, segments, triangles or other simple shapes. The convolution filter h is defined as a function from R^3 → R^+ with a finite support or fast decay to 0, and the resulting surface is given by f(P) = h(P) ⊗ s = iso, where iso is an isosurface value and f(P) the potential at a point P in R^3. In addition, pathologies such as tumors or aneurysms can be modeled by locally modifying the potential field.

Collision Detection. Given a function g defined as g = f − iso and a point P at two different time steps t and t + 1, the collision detection consists in finding where the segment [P_t, P_{t+1}] intersects the surface f. This is equivalent to finding the first root i_0 of g on the interval [P_t, P_{t+1}]. This is achieved using a modified version of the Newton-Raphson algorithm. From i_0 and the surface gradient −∇g(i_0) at i_0, we can compute a linear approximation of the surface. This approximation defines the tangent plane at i_0, with normal −∇g(i_0), whose parameters are used by our contact algorithm. Since this tangent plane is only a valid approximation of the surface around i_0, the correction x^c might not be on the actual surface (see section 2.2). Therefore we need to update the tangent plane based on the corrected position x = x^f + x^c. This is done by estimating the gradient ∇g at x, which provides the direction to reach the surface with a minimal distance. This gives us a new point i_k = x + ν∇g, where ν is a scalar value. This new point and its normal are used as an updated linear approximation of the surface. From a mathematical point of view, such updates of the contact point and its normal are close to the secant method for finding the root of a function (because the gradient is evaluated using finite differences). Indeed, in convex cases, the distance between x^c and the surface decreases through the successive iterations and finally leads to a point on the surface. This method is proven to converge in convex cases, and we use a dedicated strategy to solve the concave cases (see Section 3).
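The sketch below illustrates this detection step on a scalar field g = f − iso: the segment [P_t, P_{t+1}] is marched to bracket the first sign change, which is then refined. For simplicity we use a plain secant iteration rather than the authors' modified Newton-Raphson scheme, g itself is a placeholder supplied by the caller, and the finite-difference gradient mirrors the text above.

import numpy as np

def first_surface_hit(g, p_t, p_t1, samples=16, iters=20, tol=1e-8):
    # Find the first root of s -> g(p_t + s*(p_t1 - p_t)) on [0, 1].
    seg = lambda s: p_t + s * (p_t1 - p_t)
    s_prev, g_prev = 0.0, g(seg(0.0))
    for k in range(1, samples + 1):          # coarse march to bracket root
        s_cur = k / samples
        g_cur = g(seg(s_cur))
        if g_prev * g_cur <= 0.0:            # sign change: first crossing
            a, fa, b, fb = s_prev, g_prev, s_cur, g_cur
            for _ in range(iters):           # secant refinement
                if abs(fb - fa) < tol:
                    break
                c = b - fb * (b - a) / (fb - fa)
                a, fa = b, fb
                b, fb = c, g(seg(c))
                if abs(fb) < tol:
                    break
            return seg(b)
        s_prev, g_prev = s_cur, g_cur
    return None                              # no intersection detected

def gradient_fd(g, p, h=1e-5):
    # Central finite differences; the contact normal is -grad g, normalized.
    e = np.eye(3)
    return np.array([(g(p + h * e[i]) - g(p - h * e[i])) / (2 * h)
                     for i in range(3)])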
3 Results
We have applied our method to a complex simulation: the navigation of a catheter and guidewire inside a reconstructed vascular network to perform a
Fig. 1. Simulation of catheter and guidewire navigation from the aorta to the common carotid artery (left). Simulation of coil deployment inside an aneurysm (right). Contact locations and the corresponding contact planes (in white) are shown.
virtual angiography, and the deployment of a coil inside an aneurysm to perform an embolization. The catheter, guidewire and coil models consist of a series of non-linear deformable beam elements. Each device is composed of 100 to 200 beam elements, while the vascular network consists of more than 4,000 vessels and undergoes periodic deformations due to both cardiac and respiratory motions. When entering the cerebrovascular system, where the diameter of the vessels is very small, the catheter or guidewire is constantly colliding and sliding along the vessel walls (see Figure 1). Similarly, when deployed within an aneurysm, the coil becomes highly constrained, and proper contact modeling becomes of prime importance to guarantee a correct behavior during the simulation (see Figure 1). We have performed a series of simulations on a Dual Core processor machine with 2 GB of memory and obtained real-time computation rates (25 Hz). These timings include the computation and inversion of the system stiffness matrix K at each time step, as well as collision detection and collision response. Since the contacts are solved in the contact space, the size of the system is the number c of contacts (defining n as the number of DOFs, c ≤ n and usually c ≪ n). It is also important to mention that, in order to enforce the convergence and stability of the contact algorithm, we use a subdivision strategy where each time step is subdivided into a variable number of sub-steps. The initial time step is subdivided if not all contacts have been solved after N iterations of the main loop of the algorithm (see section 2.2). This subdivision strategy allows us to solve complex contact configurations and to handle concave cases as a succession of convex cases.
4 Conclusion
In this paper we have proposed an efficient method for solving complex contacts between various types of physics-based objects, in particular deformable structures. The proposed algorithm is accurate, since the exact forces required to solve the contacts are computed and the coupling between contacts is taken into account. Computational efficiency is achieved by solving contacts in the contact space to reduce the size of the system of equations. The approach uses an iterative scheme to solve all contacts at each time step, and a subdivision strategy ensures the robustness of the algorithm even in the case of complex contact configurations.
References
1. Cotin, S., Delingette, H., Ayache, N.: Real-time elastic deformations of soft tissues for surgery simulation. IEEE Transactions on Visualization and Computer Graphics 5(1), 62–73 (1999)
2. Picinbono, G., Delingette, H., Ayache, N.: Real-time large displacement elasticity for surgery simulation: Non-linear tensor-mass model. In: Delp, S.L., DiGoia, A.M., Jaramaz, B. (eds.) MICCAI 2000. LNCS, vol. 1935, pp. 643–652. Springer, Heidelberg (2000)
3. Mosegaard, J., Herborg, P., Sørensen, T.: A GPU accelerated spring-mass system for surgical simulation. In: MMVR. Proceedings of the 13th Medicine Meets Virtual Reality conference, pp. 342–348 (2005)
4. Miller, K., Joldes, G., Lance, D., Wittek, A.: Total Lagrangian explicit dynamics finite element algorithm for computing soft tissue deformation. Communications in Numerical Methods in Engineering 23(2), 121–134 (2007)
5. Chou, W., Wang, T.: Human-computer interactive simulation for the training of minimally invasive neurosurgery. In: IEEE International Conference on Systems, Man and Cybernetics, vol. 2, pp. 1110–1115. IEEE Computer Society Press, Los Alamitos (2003)
6. Kry, P.G., Pai, D.K.: Continuous contact simulation for smooth surfaces. ACM Transactions on Graphics 22(1), 106–129 (2003)
7. Garcia-Perez, V., Munoz-Moreno, E., de Luis-Garcia, R., Alberola-Lopez, C.: A 3D collision handling algorithm for surgery simulation based on feedback fuzzy logic. In: International Conference on Information Technology in Biomedicine (2006)
8. Cotin, S., Duriez, C., Lenoir, J., Neumann, P., Dawson, S.: New approaches to catheter navigation for interventional radiology simulation. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3750, pp. 534–542. Springer, Heidelberg (2005)
9. Renouf, M., Acary, V.: Comparison and coupling of algorithms for collisions, contact and friction in rigid multi-body simulations. In: Proceedings of ECCM - Solids, Structures and Coupled Problems in Engineering, Lisbon, Portugal (2006)
10. Le Garrec, J., Andriot, C., Merlhiot, X., Bidaud, P.: Virtual grasping of deformable objects with exact contact friction in real time. In: Int. Conf. in Central Europe on Computer Graphics, Visualization and Computer Vision, pp. 87–92 (2006)
11. Duriez, C., Andriot, C., Kheddar, A.: Signorini's contact model for deformable objects in haptic simulations. In: International Conference on Intelligent Robots and Systems, IROS, IEEE/RSJ, pp. 3232–3237 (2004)
12. Lenoir, J., Fonteneau, S.: Mixing deformable and rigid-body mechanics simulation. In: Computer Graphics International, Hersonissos, Crete, Greece, pp. 327–334 (2004)
Using Statistical Shape Analysis for the Determination of Uterine Deformation States During Hydrometra

M. Harders and G. Székely

Virtual Reality in Medicine Group, Computer Vision Lab, ETH Zurich, CH-8092 Zürich, Switzerland
{mharders, szekely}@vision.ee.ethz.ch
Abstract. A fundamental prerequisite of hysteroscopy is the proper distension of the uterine cavity with a fluid, also known as hydrometra. For a virtual reality based simulation of hysteroscopy, the uterus deformation process due to different pressure settings has to be modeled. In previous work we have introduced a hybrid method, which relies on precomputed deformation states to derive the hydrometra changes during runtime. However, new offline computations were necessary for every newly introduced organ mesh. This is not viable if a new surgical scene is to be generated for every training session. Therefore, we include the deformation states during hydrometra into our previously developed statistical shape model for undeformed organ instances. This allows deriving the hydrometra steps together with new undeformed uterus meshes. These can then be used during the interactive simulation for predicting uterus deformation without time-intensive precomputation steps.
1 Introduction
During hysteroscopy [1] – the endoscopic inspection of the uterus – a hydrometra is maintained, i.e. the uterus is distended with liquid media to access the uterine cavity. In- and outflow of the distension fluid is accomplished through the rigid endoscope and controlled via valves, while the pressure of the liquid is provided by a pump. An essential element of the procedure is the selection of correct pressure settings for the hydrometra according to muscle tone and uterine wall thickness [2]. In addition to uterus distension, the fluid circulation also improves visibility during the intervention by reducing obscurations caused by endometrial bleeding, small floating tissue fragments or air bubbles. According to [3], a number of complications can be encountered related to the distension fluid handling, for instance fluid overload due to absorption and intravasation, which can have serious adverse effects, possibly leading to cerebral edema or even death. Correct handling of pressure and flow induced by the liquid is therefore a key skill required from gynecological surgeons, thus necessitating appropriate training approaches.
Fig. 1. Side view of uterus triangle mesh (with transparent outer surface) showing computed hydrometra states at increasing pressure of the distension fluid
Virtual reality based surgical simulation [4] offers a promising complementary teaching tool to today's educational paradigms. Our current research targets the development of a high-fidelity simulator for procedural training of hysteroscopy [5], addressing key elements such as proper cavity inflation and correct management of fluid pressure settings. A core module of the simulation system is the real-time uterine deformation during the varying hydrometra stages. The almost instantaneous response of the uterine cavity to pressure adjustments of the distension fluid exceeds by far the real-time capabilities of most known deformation models. Therefore, we have suggested in [6] a novel approach which combines accurate Finite Element Method (FEM) computations with Free-Form Deformation (FFD) approaches to achieve physically realistic real-time distension of the uterine cavity. In the proposed method, offline precomputations are first carried out to determine an accurate response of the tissue model to the fluid pressure. Thereafter, these data are used during real-time interaction for model adjustments based on given pressure states. Unfortunately, merging this method with our training scene generation framework [7] posed some problems. The underlying idea of the latter is to prevent the trainee from repeatedly being confronted with the same surgical scene and thus to avoid becoming acquainted with a specific anatomy. Therefore, the generation of a new scenario for each single training session considers the full natural variability of the healthy anatomy. Based on a predefined set of anatomical gynecological measurements, new surface meshes of the uterus can be intuitively derived prior to the training. However, following the previously mentioned strategy, every new geometry would require the offline precomputation of the varying hydrometra states via FEM. This usually requires significant manual processing to set up the new load cases. Moreover, depending on the mesh resolution, considerable computation times can result. For instance, in [8] it is reported that 40 load steps for the inflation of a 56'000 element uterus mesh required about 17 hours of CPU time on a Superdome HP 9000 Enterprise application server (2.2 GFLOPS peak). Finally, it should also be mentioned that appropriate FE software needs to be available to perform the actual computations. In order to avoid all these problems, we propose to extend our statistical shape model of the uterus to also include uterine deformation states during hydrometra. By
Fig. 2. Influence of number of interpolation states on mean (left) and maximum (right) Euclidean error [cm]. Mean and standard deviation for all organ instances at all deformation states are depicted.
extending the parameter vector with the deformed shapes, new instances of the hydrometra steps can be generated directly via the statistical shape model. Related activities have been carried out using statistical models in cardiac motion analysis (e.g. [9]) or longitudinal human bone growth studies (e.g. for mandibles [10]). In [11] a similar technique has been applied to estimate the shape of a prostate phantom deformed during transrectal ultrasound probe insertion. This work focuses, however, on deriving a patient-specific model for intra-operative support and the input data are based only on a single simplified anatomical phantom.
2 Method

2.1 Computation of Deformation States
The first step in our approach is the computation of the deformation states. As a starting point we use N = 16 triangle meshes of healthy organ geometries (given by polygonal models p̂_i = [x_i^(1), y_i^(1), z_i^(1), ..., x_i^(M), y_i^(M), z_i^(M)]^T, with M = 630 vertices) which were segmented from MRI data obtained in a volunteer study. During the segmentation step, correspondences between mesh vertices are implicitly ensured by starting with the same coarse organ mesh for all uteri, initialized according to surface landmarks and following a fixed subdivision scheme during segmentation [12], resulting in identical mesh topologies Θ. All shapes were translated and rotated into a common organ coordinate system; however, no normalization was applied in order to also capture organ size differences. Consistent tetrahedral meshes for these surface representations are then automatically generated using the commercial ANSYS ICEM CFD Tetra Mesher tool. Thereafter, FEM computations of the hydrometra states are carried out based on the resulting tetrahedral models. Note that our statistical model would work with arbitrary deformation algorithms and that the presented FEM computation is only one possible approach to obtain the hydrometra states. The uterine
Fig. 3. Shape variability captured by the first eigenmodes, plotted as k ↦ Σ_{l=1}^{k} λ_l / Σ_{l=1}^{N−1} λ_l (left), and normalized variances of the decorrelated shape parameters, plotted as k ↦ λ_k/λ_1 (right)
soft tissue is represented as a homogeneous, isotropic and nonlinear hyper-elastic material, with the polynomial strain energy function given by

W = μ_1(J_1 − 3) + μ_2(J_1 − 3)^3 + (1/2) κ(J_3 − 1)^2    (1)
where the J_i are the reduced invariants of the right Cauchy-Green deformation tensor, the μ_i [N/m²] are the material parameters and κ [N/m²] is the bulk modulus [13]. The material parameters are determined according to the in-vivo tissue aspiration experiment described in [13], while the bulk modulus is set to κ = 10^7 N/m² in order to model quasi-incompressible behavior. The validity of this material description has been experimentally shown in [8] by comparison with in-vivo measurements. The vertices around the outer cervical and tubal ostia are fixed to provide appropriate boundary conditions, while the remaining nodes are free to move. Hydrostatic pressure face loads are applied to all surfaces of the uterine cavity. Note that due to the correspondences between the meshes, coherent boundary conditions can be applied to all segmented models. 20 organ deformation states {q_i^j | j ∈ {1, ..., 20}} are obtained for increasing hydrostatic pressures in discrete steps of 1 kPa up to 20 kPa (about 150 mmHg) for all organ models p̂_i. These FEM computations are carried out with the commercial package MarcMentat™. A subset of the resulting hydrometra stages is depicted for one exemplary uterus model in Figure 1.
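As a quick plausibility check of the energy function, the following sketch evaluates W from a deformation gradient. The function name and the way the material parameters are passed are our own choices, and we use the incompressibility term (J_3 − 1)² exactly as printed in Eq. (1) above.

import numpy as np

def strain_energy(F, mu1, mu2, kappa):
    # Polynomial strain energy of Eq. (1), evaluated from a deformation
    # gradient F. Material parameters are placeholders; the paper takes
    # the mu_i from in-vivo aspiration experiments [13] and kappa = 1e7 N/m^2.
    C = F.T @ F                            # right Cauchy-Green tensor
    J = np.sqrt(np.linalg.det(C))          # volume ratio
    J1 = np.trace(C) * J ** (-2.0 / 3.0)   # first reduced invariant
    J3 = J                                 # third reduced invariant
    return mu1 * (J1 - 3) + mu2 * (J1 - 3) ** 3 + 0.5 * kappa * (J3 - 1) ** 2

At the undeformed configuration F = I, the invariants evaluate to J_1 = 3 and J_3 = 1, so the energy vanishes, as it should for a stress-free reference state.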
2.2 Reduction of Parameter Space
As described in [6], the final goal of our endeavor is to import the precomputed deformation states into our hysteroscopy simulator, in order to determine at runtime the shape of the uterine cavity by interpolating between them according to the interactively modified fluid pressure. Qualitative findings of our earlier work indicated that the overall deformation process could be approximated sufficiently well based on a small subset of the computed hydrometra states. In
Fig. 4. The first eigenmode of the extended statistical uterus model captures the overall size of the uterus. Both the undeformed (left) and the deformed states (middle: j_1 = 10, right: j_2 = 20) are obtained by evaluating p̄ + ω √λ_1 u_1 with ω ∈ {−2, 0, 2} (top to bottom).
order to quantify the influence of the number of deformation states included in the piecewise linear interpolation, we examined the Euclidean error between the vertices of interpolated hydrometra states and the computed deformations. Figure 2 shows mean and standard deviation of the average and maximum errors of all organ instances at all deformation states, where state 0 denotes the undeformed organ mesh. When compared to the mean of the average edge lengths of all meshes of 0.403 cm (σ = 0.0503 cm) and the mean of all minimal edge lengths of 0.0331 cm (σ = 0.0054 cm), using three states already appears to be sufficiently precise. Therefore, in the following only a reduced set of hydrometra states will be taken into account. Nevertheless, the described method is of course also valid for a larger number of deformation states. Finally, it should be noted that slightly larger errors occur in the early stages of the uterus' extension. The interpolation errors could therefore potentially be further decreased by an appropriate selection of less uniformly distributed samples. However, due to the already small error, this issue was not investigated further.
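To illustrate how such a reduced set of states can be used at runtime, the sketch below blends vertex positions piecewise-linearly between neighboring precomputed states for an arbitrary fluid pressure. The pressure grid and the array layout are assumptions made for this example only.

import numpy as np

def interpolate_state(pressure, pressures, states):
    # pressures: sorted 1D array of the retained pressure samples [kPa],
    #            including 0 kPa for the undeformed mesh.
    # states:    array of shape (len(pressures), M, 3) holding the vertex
    #            positions of the corresponding precomputed meshes.
    p = np.clip(pressure, pressures[0], pressures[-1])
    j = np.searchsorted(pressures, p)          # first sample >= p
    if j == 0:
        return states[0]
    w = (p - pressures[j - 1]) / (pressures[j] - pressures[j - 1])
    return (1.0 - w) * states[j - 1] + w * states[j]

One array blend per pressure update keeps the cost linear in the number of vertices, which is consistent with the real-time update of large meshes reported in the discussion.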
2.3 Statistical Model of Hydrometra States
In the next step we combine the vertices of an undeformed uterus shape and the reduced set of hydrometra states into one instance-specific shape vector p_i = [p̂_i q_i^{j_1} ... q_i^{j_s}]^T, where S = {j_1, j_2, ..., j_s} denotes the reduced set of indices of selected deformation states. By computing the instance-specific differences from the respective mean shapes, we obtain the sM × N matrix ΔP = [Δp_1 ... Δp_N] with Δp_i = p_i − p̄. The matrix ΔP is rank deficient, since sM ≫ N; only a set
Fig. 5. The second eigenmode of the extended statistical uterus model represents the change of angle between cervix and fundus. Both the undeformed (left) and the deformed states (middle: j_1 = 10, right: j_2 = 20) are obtained by evaluating p̄ + ω √λ_2 u_2 with ω ∈ {−2, 0, 2} (top to bottom).
of N − 1 eigenvalues Λ of the corresponding covariance matrix will be non-zero. Therefore, we follow the alternative approach discussed in [14] to determine the corresponding set of eigenvectors U, by performing the singular value decomposition of the reduced N × N covariance matrix

Σ̃^{PCA} = (1/(N − 1)) ΔP^T ΔP = Ũ Λ̃ Ũ^T .    (2)

The sought-after eigenvectors can then be determined according to U = ((N − 1)Λ̃)^{−1/2} ΔP Ũ. The scaling factor is necessary to normalize the vectors. The corresponding eigenvalues are given directly by Λ = Λ̃. As depicted in Figure 3, the first four eigenmodes already capture 94% of the overall variation in the test collection. Thus, our set of undeformed and deformed organ triangle meshes can be compactly described by a reduced set of eigenvectors Ŭ ⊂ U. By evaluating p̄ + Ŭb, vertices of new organ instances and their associated hydrometra states can be derived. Due to space limitations, only a subset of the variation of the two most dominant eigenmodes of the considered population is displayed in Figures 4 and 5.
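A compact numerical sketch of this decomposition, and of the instance generation p̄ + Ŭb, is given below. The array shapes follow the notation above; data assembly and the choice of retained modes are left to the caller, and all function names are ours.

import numpy as np

def build_shape_model(P):
    # P: (sM, N) matrix whose columns stack an undeformed shape and its
    # selected hydrometra states, one column per training instance.
    sM, N = P.shape
    p_mean = P.mean(axis=1, keepdims=True)
    dP = P - p_mean
    # Eigen-decompose the reduced N x N covariance (Eq. 2) instead of the
    # huge sM x sM one; both share the same non-zero eigenvalues.
    S = dP.T @ dP / (N - 1)
    lam, U_t = np.linalg.eigh(S)
    order = np.argsort(lam)[::-1][: N - 1]     # drop the zero eigenvalue
    lam, U_t = lam[order], U_t[:, order]
    U = dP @ U_t / np.sqrt((N - 1) * lam)      # normalized eigenvectors
    return p_mean[:, 0], U, lam

def sample_instance(p_mean, U, lam, omega):
    # New instance with the leading modes weighted by omega (e.g. in
    # [-2, 2]), evaluating p_mean + sum_k omega_k sqrt(lam_k) u_k.
    k = len(omega)
    return p_mean + U[:, :k] @ (np.asarray(omega) * np.sqrt(lam[:k]))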
3 Discussion
In order to assess the performance of our approach, we derived new instances of organ shapes (N̄ = 11) and the associated hydrometra states by determining
Fig. 6. Comparison of automatically derived (middle) and computed (right) hydrometra deformation states for two undeformed uterine meshes (left). The Euclidean error between the meshes is shown color-coded (red = 0 cm). Maximum errors of 0.03996 cm and 0.04827 cm result.
p̄ + Σ_{k=1}^{4} ω_k √λ_k u_k with weighting parameters −2 ≤ ω_k ≤ 2. The latter are then compared to FEM deformation calculations based on the undeformed meshes. A mean maximal vertex distance error of 0.0353 cm (σ = 0.0081 cm) results, which is on the order of the minimal edge lengths in the meshes. Figure 6 shows two examples of uterine shapes for which the maximum hydrometra distension is derived with our method and compared to a computed deformation state. The maximum error is 0.03996 cm and 0.04827 cm, respectively. It should be mentioned that the proposed approach based on precomputation of deformation states is only valid if the boundary conditions do not change considerably during the intervention. This is usually the case in hysteroscopic procedures, since the surrounding tissues are not directly accessible. Moreover, extreme modifications of the cavity are not possible in our system, since cutting into the myometrium is limited to a thin layer during endometrial ablation. Therefore, we can assume that the uterus' response remains relatively constant throughout the intervention. The piecewise linear interpolation allows updating large meshes consisting of more than 50'000 tetrahedra in real time.
4 Conclusion and Future Work
We have presented an extension to our statistical shape model of the uterus to include uterine deformation states during hydrometra. By extending the training set with the deformed shapes, new instances of the hydrometra states can be generated directly by the statistical shape model. This avoids explicit time-intensive
precomputations for newly derived organ instances. Our validation study demonstrated that the deformations predicted by the extended statistical model are in good agreement with the results of nonlinear FEM calculations. The behavior of the uterus under changing fluid pressure settings can be well approximated by linear interpolation in real-time, even for large meshes.
Acknowledgment This work has been performed within the frame of the Swiss National Center of Competence in Research on Computer Aided and Image Guided Medical Interventions (NCCR CO-ME) supported by the Swiss National Science Foundation.
References
1. Bulletin, A.T.: Hysteroscopy. Int. J. Gyn. Obstet. 45(2), 175–180 (1994)
2. Petrozza, J.: Hysteroscopy (2004), http://www.emedicine.com/med/topic3314.html
3. Mencaglia, L., Hamou, E.: Manual of Gynecological Hysteroscopy - Diagnosis and Surgery. Endo-Press, Germany (2001)
4. Basdogan, C., Sedef, M., Harders, M., Wesarg, S.: Virtual reality supported simulators for training in minimally invasive surgery. IEEE Computer Graphics and Applications 27, 54–66 (2007)
5. Harders, M., Bajka, M., Spaelter, U., Tuchschmid, S., Szekely, G.: Highly-realistic, immersive training environment for hysteroscopy. In: Proc. of Medicine Meets Virtual Reality, pp. 176–181 (2006)
6. Sierra, R., Zátonyi, J., Bajka, M., Székely, G., Harders, M.: Hydrometra simulation for VR-based hysteroscopy training. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 575–582. Springer, Heidelberg (2005)
7. Sierra, R., Bajka, M., Karadogan, C., Székely, G., Harders, M.: Coherent scene generation for surgical simulators. In: Cotin, S., Metaxas, D.N. (eds.) ISMS 2004. LNCS, vol. 3078, pp. 221–229. Springer, Heidelberg (2004)
8. Weiss, S., Bajka, M., Nava, A., Mazza, E., Niederer, P.: A finite element model for the simulation of hydrometra. Technology and Health Care 12(3), 259–267 (2004)
9. Chandrashekara, R., Rao, A., Sanchez-Ortiz, G., Mohiaddin, R., Rueckert, D.: Construction of a statistical model for cardiac motion analysis using nonrigid image registration. Inf. Process. Med. Imaging, 599–610 (2003)
10. Andresen, P., Bookstein, F., Conradsen, K., Kreiborg, S.: Surface-bounded growth modeling applied to human mandibles. IEEE Transactions on Medical Imaging 19 (2000)
11. Mohamed, A., Davatzikos, C., Taylor, R.: A combined statistical and biomechanical model for estimation of intra-operative prostate deformation. In: Dohi, T., Kikinis, R. (eds.) MICCAI 2002. LNCS, vol. 2488, pp. 452–460. Springer, Heidelberg (2002)
12. Sierra, R., Zsemlye, G., Szekely, G., Bajka, M.: Generation of variable anatomical models for surgical training simulators. Medical Image Analysis 10(2), 275–285 (2006)
13. Kauer, M., Vuskovic, V., Dual, J., Szekely, G., Bajka, M.: Inverse finite element characterization of soft tissues. Medical Image Analysis 6(3), 275–287 (2002)
14. Cootes, T., Cooper, D., Taylor, C., Graham, J.: Active shape models - their training and application. Computer Vision and Image Understanding 61(1), 38–59 (1995)
Predictive K-PLSR Myocardial Contractility Modeling with Phase Contrast MR Velocity Mapping Su-Lin Lee, Qian Wu, Andrew Huntbatch, and Guang-Zhong Yang Institute of Biomedical Engineering, Imperial College London, UK {su-lin.lee,q.wu,a.huntbatch,g.z.yang}@imperial.ac.uk
Abstract. With the increasing versatility of CMR, further understanding of intrinsic contractility of the myocardium can be achieved by performing subject-specific modeling by integrating structural and functional information available. The recent introduction of the virtual tagging framework allows for visualization of the localized deformation of the myocardium based on phase contrast myocardial velocity mapping. The purpose of this study is to examine the use of a non-linear, Kernel-Partial Least Squares Regression (K-PLSR) predictive motion modeling scheme for the virtual tagging framework. The method allows for the derivation of a compact non-linear deformation model such that the entire deformation field can be predicted by a limited number of control points. When applied to virtual tagging, the technique can be used to predictively guide the mesh refinement based on the motion of the coarse grid, thus greatly reducing the search space and increasing the convergence speed of the algorithm. The effectiveness and numerical accuracy of the proposed technique are assessed with both numerically simulated data sets and in vivo phase contrast CMR velocity mapping from a group of 7 subjects. The technique presented has a distinct advantage over the conventional mesh refinement scheme and brings CMR myocardial contractility analysis closer to routine clinical practice.
1 Introduction

Heart failure due to coronary artery disease has considerable morbidity and poor prognosis. An understanding of the underlying mechanics governing myocardial contraction is a prerequisite for interpreting and predicting changes induced by heart disease. Gross changes in contractile behavior of the myocardium are readily detected with existing techniques. Detecting the more subtle changes of early-stage cardiac dysfunction, however, requires a sensitive method for measuring, as well as a precise criterion for quantifying, normal and impaired myocardial function. For this purpose, Cardiovascular Magnetic Resonance (CMR) imaging has taken a key role in investigating regional contractile function, as it provides a non-invasive and relatively easy way of assessing the intramural motion of the myocardium. Thus far, the main CMR techniques for measuring myocardial contraction include tagging, displacement encoding (DENSE), and myocardial velocity mapping. In vivo studies using tagging and velocity mapping have demonstrated the value of
CMR in illustrating the direct link between diseases such as ischemic heart disease and cardiomyopathies and intrinsic myocardial contractility. Diffusion tensor imaging augmented with strain rate from velocity mapping has also been shown to play an important role in elucidating the relationship between fiber orientation and fiber shortening. With the increasing versatility of CMR, a natural step towards the further understanding of intrinsic myocardial contractility is to perform subject-specific biomechanical modeling by effectively integrating the structural and functional information available. To this end, 3D modeling of the heart with the use of different deformation tracking methods has been investigated. Currently, most of the work on tracking motion of the heart has incorporated MR tagging [1]. Although tagging has been shown to be a reliable means of determining the intramural motion of the myocardium, the issue of tag fading and the complicated post-processing steps involved in deriving strain distributions have imposed significant limitations on the technique. There has also been much work on biomechanical modeling of the heart, incorporating parameters such as material properties and fiber orientation. Biomechanical modeling, however, requires extensive a priori data that are difficult to obtain for each subject. It is usual for a general model of the heart to be developed, which is then mapped to each subject. The alternative of myocardial velocity imaging has traditionally been riddled with problems such as blood flow artifacts, limitations on velocity sensitivity, and low SNR. However, recent work on improved pulse sequence design with blood saturation to limit the blood flow artifact has shown great promise for the technique in providing reliable, detailed myocardial contractility information that can be directly used for cardiac modeling [2]. Previous research has shown that it is possible to use a virtual tagging framework to derive strain distributions from the myocardial velocity data [3]. The concept of virtual tagging is based on the superimposition of an artificial tag pattern onto the CMR velocity data and the observation of its subsequent deformation. At a given interval of the cardiac cycle, if the deformation of the virtual tags reflects the true motion of the myocardium, the associated displacement, and hence the velocity distribution, should be identical to the directly measured CMR velocity data in a least-mean-squares sense. With this framework, the deformation of the myocardium is controlled by the underlying velocity vectors, and it therefore removes the need for a priori information for myocardial contractility analysis. While the technique is promising, it is computationally demanding. The purpose of this study is to examine the use of a non-linear, Kernel-Partial Least Squares Regression (K-PLSR) predictive motion modeling scheme for the virtual tagging framework. The basic idea of the technique is to use K-PLSR to derive a compact non-linear deformation model such that the entire deformation field can be predicted by a limited number of control points. When applied to virtual tagging based on myocardial velocity mapping, the technique can be used to predictively guide the mesh refinement based on the motion of the coarse grid, thus greatly reducing the search space and increasing the convergence speed of the algorithm.
For assessing the effectiveness and numerical accuracy of the proposed technique, both numerically simulated data sets and in vivo phase contrast CMR velocity mapping of the left ventricle are used.
2 Methods

2.1 K-PLSR Predictive Model for Virtual Tagging

The Virtual Tagging framework [3] combines the advantages of both the MR tagging and velocity mapping techniques. The superimposition of an artificial tag pattern onto the CMR velocity data is the basis of this framework; the tag pattern can take any desired configuration. In this work, the 3D tag configuration is defined by the subdivision solids volume model. The deformation of the virtual tags is controlled by the underlying velocity data and provides easy visualization and direct assessment of myocardial motion. During optimization, the objective function directly minimizes the difference between the velocity vectors, and the application of a mass conservation constraint ensures physically meaningful results. This results in the following cost function:

E = Σ_i ||Δv_i||² Δt² + λ Σ_i (ξ_i − 1)² S_i    (1)

where Δv_i is the difference between the measured velocity and the virtual tagging simulation velocity, Δt is the time interval between consecutive timeframes, ξ_i is the ratio of volumes between the measured and virtual tagging simulation elements, and S_i is the surface area of each element. In Eq. (1), λ is a weighting factor that controls how much the virtual tags deform, causing either the velocity error measure or the volume control to have a more significant impact on the deformation. One of the significant problems of virtual tagging is its computational complexity. To circumvent this problem, predictive motion modeling based on non-linear regression is applied. Regression analysis is a statistical tool that examines the dependence of a number of response variables on a number of predictors. K-PLSR is a technique to construct a nonlinear regression model in high dimensional feature spaces. This technique is an extension of PLSR, which can extract correlations between input and output data that are highly collinear – making it ideal for problems inappropriate for multi-linear or principal components regression. In PLSR, a simultaneous decomposition of X and Y, the predictors and response variables respectively, is performed, i.e.,

X = TP^T + E,   Y = UQ^T + F
(2)
where T and U are extracted score vectors, P and Q are loading matrices, and E and F are residuals; the decomposition is chosen to give the maximal covariance of X and Y. The response variables are predicted as:
Ŷ = TDQ^T    (3)
where D is a diagonal matrix containing the regression weights. For describing non-linear motion models, K-PLSR can be used, in which the input variables are first mapped by a function Φ to a feature space where a linear PLSR is performed [4]. In the original space, this results in a nonlinear regression
model. In this study, as we are dealing with more observed variables than measured objects, the kernel Gram matrix K = ΦΦ^T between all mapped input data points {Φ(x_i)}_{i=1}^N is calculated as:

K = [ K(x_1, x_1)  K(x_1, x_2)  ...  K(x_1, x_N)
      K(x_2, x_1)  K(x_2, x_2)  ...  K(x_2, x_N)
      ...          ...          ...  ...
      K(x_N, x_1)  K(x_N, x_2)  ...  K(x_N, x_N) ]    (4)
where each element of the matrix is a calculation of a kernel function K(x_1, x_2) based on Mercer's theorem [5]. For this application, a Gaussian kernel

K(x, y) = exp(−||x − y||² / d)    (5)
is used, where d is the width of the Gaussian function. While it is possible to use a single value of d for each left ventricle model, we examine the distribution of d across the left ventricle for optimal prediction. Predictions are made by:
Ŷ = Φ_t B = K_t U (T^T KU)^{−1} T^T Y    (6)
where K_t is the "test" Gram matrix whose elements are calculated from both the testing and training points. A modification of the NIPALS algorithm is used to calculate the K-PLSR [6]. In our application, X and Y are the coarse mesh points and fine mesh points, respectively, describing the myocardium of the left ventricle.

2.2 Data Acquisition and Model Creation

For validating the proposed predictive modeling scheme, short-axis velocity mapping images of the heart were acquired from 7 normal subjects using a gradient-echo phase-contrast protocol (TR = 53 ms, TE = 7.1 ms, in-plane pixel resolution = 1.17×1.17 mm, FOV = 30×30 cm, VENC = −15 to +15 cm/s) on a 1.5T Siemens Sonata MRI scanner. The sequence consisted of a specially designed black-blood RF pulse applied every other time frame, followed by the imaging pulse. It also incorporated a k-space view-sharing scheme to reduce the total scan time needed, hence allowing one reference image and three orthogonally encoded velocity images to be acquired. Diaphragmatic navigator echoes were implemented, permitting free-breathing data acquisition and ensuring geometrical and functional consistency of the 3D cine myocardial velocity data. A total of 12 to 14 short axis slices were obtained for each subject, with 10 to 17 timeframes spanning the entire cardiac cycle. Both a rigid body motion correction [7] and a Total Variational (TV) restoration technique [5] were applied to restore the data. The TV technique optimizes the variance of the image and constrains the total amount of variance removed from the image to be equal to the estimated variance of the noise component.
For the 7 subjects studied, the epicardial and endocardial borders of the left ventricle were semi-automatically segmented using CMRtools (CVIS, London, UK) from the magnitude images, from which the corresponding velocity maps were extracted. The resultant surface mesh was used to build a volumetric model of hexahedral elements. To generate elements of sufficient size for a smoother mesh, the elements were divided using subdivision solids [8], an extension of the subdivision surfaces refinement rules developed by Catmull and Clark [9]. During this division, each hexahedral element was subdivided into a further eight elements. Under the existing virtual tagging framework, the refinement of the mesh is usually performed by coarse-to-fine mesh propagation. Linear interpolation is first used to define the initial position of the finer mesh control points, followed by locally searching for the optimal mesh control points by minimizing Eq. (1). In the presence of large, nonlinear deformation, the initial control points defined by interpolation are usually far away from the desired location, leading to poor convergence of the optimization process due to local minima. With the proposed K-PLSR predictive motion modeling, the refined mesh location is much closer to the desired configuration as dictated by the underlying myocardial velocity data, and thus the convergence of the algorithm is greatly improved. In order to assess the performance of the proposed algorithm with known ground truth data, numerical simulation representing a twisting cube and an artificial left ventricle was also developed. The simulated left ventricle exhibits the properties inherent in a normal left ventricle with radial thickening and longitudinal shortening. Furthermore, the volume of the simulated set obeys the mass conservation rule, similar to the properties of cardiac muscle. A leave-one-out analysis was performed on the simulated and in vivo left ventricles, with each training set consisting of the entire series save one, and the prediction performed on the left ventricle left out.
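As a rough, self-contained sketch of the prediction machinery of Section 2.1, the code below assembles the Gaussian Gram matrices of Eq. (5) and applies the prediction formula of Eq. (6). The score matrices T and U are assumed to come from a kernel NIPALS routine such as that of [6], which is not reproduced here; centering of the Gram matrices is also omitted for brevity, and all names are ours.

import numpy as np

def gaussian_gram(A, B, d):
    # Gram matrix of the kernel K(x, y) = exp(-||x - y||^2 / d), Eq. (5).
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / d)

def kplsr_predict(X_train, Y_train, X_test, T, U, d):
    # Eq. (6): Y_hat = K_t U (T^T K U)^{-1} T^T Y, with K the training
    # Gram matrix and K_t the test/train Gram matrix. T and U are the
    # score matrices extracted by kernel NIPALS (assumed given).
    K = gaussian_gram(X_train, X_train, d)
    K_t = gaussian_gram(X_test, X_train, d)
    B = U @ np.linalg.solve(T.T @ K @ U, T.T @ Y_train)
    return K_t @ B

Here X_train holds the flattened coarse mesh points of the training instances and Y_train the corresponding fine mesh points, matching the roles of X and Y stated above.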
3 Results

Fig. 1 illustrates the result of applying the proposed K-PLSR motion modeling scheme to the twisting cube data set. During simulation, the coarse mesh points were used as input and the fine mesh points as output. It is evident that, despite the presence of large non-linear distortion, the proposed K-PLSR method is able to predict the distortions involved. With K-PLSR, the error involved is represented as a false colormap in Fig. 1(b). The average error involved was 3.6×10^-5 ± 2.4×10^-5 (normalized to the width of the cube). This translated to less than 2% of a voxel width when the object is represented on a 512×512×512 grid. To facilitate the visualization of the error distribution, the color map in Fig. 1(b) was normalized to the scale of the average error, despite the fact that it is negligible. For the simulated left ventricular data set, the graph in Fig. 2(b) shows the effect of the number of latent vectors used for prediction on the simulated left ventricle model. The maximum average error across the cycle is approximately 0.08 mm. The same analysis performed with linear PLSR showed a maximum average error of 0.19 mm, highlighting the necessity of a nonlinear regression. The simulated data set has a 75 mm diameter and is 64 mm in length.
Fig. 1. An example of the predictive properties of the proposed K-PLSR modeling technique. From a training set consisting of a series of distorted cubic shapes, predictions were made on intermediate shapes (top). The associated prediction errors between the predicted shapes and the ground truth data are shown (bottom). The maximum color scale corresponds to 8.3×10^-5, normalized to the width of the cube.
Fig. 2. (a) The effect of the number of latent vectors used for the K-PLSR model on the prediction accuracy on the distorted cubic shapes. (b) The mean prediction error in the leave-one-out analysis of the simulated left ventricle versus the number of latent vectors used.
The proposed technique was subsequently applied to the myocardial velocity data for the 7 subjects studied. Fig. 3(top) provides an example of the short axis image at each phase of the cardiac cycle and the corresponding myocardial velocity components along the x, y, and z axes. Fig. 3(middle) shows the virtual tagging deformation of that in vivo left ventricle from diastole to systole, with the longitudinal strain overlaid. In Fig. 3(bottom), plots of the error at each node of the predicted left ventricle shape are shown for the example in vivo set.
[Fig. 3 image panels: short-axis frames at 120 ms, 212.5 ms, 352.5 ms, 487.5 ms and 577.5 ms; color scales −0.8 to 0.8 and 0 to 0.8]
Fig. 3. (top) Short axis slices of the myocardium at five selected phases of the cardiac cycle, and the corresponding myocardial velocity components along the x, y and z directions. (middle) The longitudinal strain as derived from the virtual tagging framework by using the proposed predictive motion modeling scheme, and the corresponding prediction error at each node in mm (bottom).

Table 1. The mean, maximum and standard deviation of the prediction error for the 7 subjects studied by using the proposed K-PLSR predictive motion modeling scheme

Subject   Mean Error (mm)   Max Error (mm)   Std Error (mm)
1         0.119             0.585            0.071
2         0.153             0.748            0.259
3         0.127             1.213            0.076
4         0.159             0.832            0.006
5         0.489             2.493            0.282
6         0.346             1.978            0.197
7         0.187             1.035            0.110
Finally, Table 1 summarizes the mean, maximum, and standard deviation of the errors of the proposed method for the 7 subjects studied. It is evident that the predictions are close to the ground truth, with mean errors across all subjects of less than 1 mm. Finding a single optimal kernel parameter and the predicted nodes for a single shape takes on average 2 minutes. This time will increase should a distribution of optimal kernel parameters be required, but it is still significantly less than the time required by virtual tagging alone.
4 Discussions and Conclusions

In summary, we have presented a predictive framework for the examination of myocardial contractility by combining the advantages of both the K-PLSR and virtual tagging frameworks. The main strength of the virtual tagging framework remains the flexibility of the technique in establishing material correspondence of the myocardium across the entire cardiac cycle. The results have shown that the proposed non-linear predictive model is a promising technique for improving the performance of the virtual tagging framework. The use of K-PLSR allows the derivation of a compact non-linear deformation model such that the entire deformation field can be predicted by a limited number of control points. The technique can be used to predictively guide the mesh refinement based on the motion of the coarse grid, thus greatly reducing the search space and increasing the convergence speed of the algorithm. The technique presented has a distinct advantage over the conventional mesh refinement scheme and brings CMR myocardial contractility analysis closer to routine clinical practice.

Acknowledgments. The authors would like to thank Karim Lekadir for discussions on statistical modeling and Prof. David Firmin for his help with image acquisition.
References
[1] Axel, L., Dougherty, L.: MR imaging of motion with spatial modulation of magnetization. Radiology 171(3), 841–845 (1989)
[2] Jung, B., Schneider, B., Markl, M., et al.: Measurement of left ventricular velocities: phase contrast MRI velocity mapping versus tissue-Doppler-ultrasound in healthy volunteers. J. Cardiovasc. Magn. Reson. 6(4), 777–783 (2004)
[3] Masood, S., Gao, J., Yang, G.-Z.: Virtual tagging: numerical considerations and phantom validation. IEEE Transactions on Medical Imaging 21(9), 1123–1131 (2002)
[4] Ablitt, N.A., Gao, J., Keegan, J., et al.: Predictive cardiac motion modeling and correction with partial least squares regression. IEEE Transactions on Medical Imaging 23(10), 1315–1324 (2004)
[5] Ng, Y.-H.P., Yang, G.-Z.: Vector-valued image restoration with applications to magnetic resonance velocity imaging. Journal of WSCG 11(1) (2003)
[6] Rosipal, R., Trejo, L.J.: Kernel partial least squares regression in reproducing kernel Hilbert space. Journal of Machine Learning Research 2, 97–123 (2002)
[7] Arai, A.E., Gaither III, C.C., Epstein, F.H., et al.: Myocardial velocity gradient imaging by phase contrast MRI with application to regional function in myocardial ischemia. Magnetic Resonance in Medicine 42(1), 98–109 (1999)
[8] MacCracken, R., Joy, K.I.: Free-form deformations with lattices of arbitrary topology. In: the 23rd Annual Conference on Computer Graphics and Interactive Techniques, pp. 181–188 (1996)
[9] Catmull, E., Clark, J.: Recursively generated B-spline surfaces on arbitrary topological meshes. Computer-Aided Design 10(6), 350–355 (1978)
A Coupled Finite Element Model of Tumor Growth and Vascularization

Bryn A. Lloyd, Dominik Szczerba, and Gábor Székely

Computer Vision Laboratory, ETH Zürich, Switzerland
{blloyd, domi, szekely}@vision.ee.ethz.ch
Abstract. We present a model of solid tumor growth which can account for several stages of tumorigenesis, from the early avascular phase to the angiogenesis driven proliferation. The model combines several previously identified components in a consistent framework, including neoplastic tissue growth, blood and oxygen transport, and angiogenic sprouting. First experiments with the framework and comparisons with observations made on solid tumors in vivo illustrate the plausibility of the approach. Explanations of several experimental observations are naturally provided by the model. To the best of our knowledge this is the first report of a model coupling tumor growth and angiogenesis.
1 Introduction
Modeling of tumor development including angiogenesis is of great clinical relevance, since it can potentially increase our understanding of the processes taking place and eventually allow predictions on reactions to external influences such as anti-angiogenic drugs. We present a new, consistent framework to simulate solid tumor growth from the early avascular phase to the later angiogenesis-driven uncontrolled growth. First results demonstrate the plausibility of the model and its capability to account for various behaviors observed in real tumors. The literature on cancer is enormous and we can only give a brief overview of the subject here. Tumor growth is characterized by a dangerous change in the control mechanisms which normally maintain a balance between the rate of proliferation and the rate of apoptosis (controlled cell death). It is commonly believed that at the beginning of tumor development, the transformed cells proliferate and form a cluster which gradually increases in size. In this early phase, tumors growing in vivo tend to develop without a dedicated vascular network, relying on diffusion from the neighboring healthy vascularized tissue for the supply of oxygen and nutrients and the removal of wastes (e.g. CO2). The diffusion principle is very inefficient, though, and eventually is not sufficient to support the developing neoplasm. Probably because of low oxygen tension, the tumor cells and their neighbors start producing various signals, which cause a cascade of growth factors, the so-called tumor angiogenesis growth factors. This triggers endothelial cells to multiply and migrate towards the tumor, thereby creating a specialized vascular network, which is able to provide the tumor with oxygen
and nutrients, and to remove wastes. This formation of a vascular system is called tumor-induced angiogenesis. The biological background on the transport mechanisms involved in solid tumor growth and their influence on the effectiveness of drug delivery is given by Jang et al. [1]. A good review of the role of angiogenesis in general is given by Carmeliet and Jain [2]. In this paper we present a first attempt to create a model of a growing tumor. After giving an overview of previous approaches to simulate tumor development, we present our own simulation framework, which includes several major bio-physical and bio-chemical processes in a completely coupled fashion.
2 Tumor Model Literature Overview
Methods to simulate tumor development have been extensively studied for the last three decades, and a detailed overview of all the methods proposed so far is beyond the scope of this paper. A review of different models was presented by Araujo and McElwain [3]. We selectively group the most popular methods into mathematical models, cellular automata, and finite element methods. Diffusion models usually simulate the movement of tumor growth factors and nutrients by a diffusion process. An early example of a spherical tumor growth model based on the diffusion principle was proposed by Greenspan in 1976 [4]. This preliminary work addresses the very first phase of tumor growth, the so-called multi-cell spheroid stage, controlled by the availability of diffusing nutrients. More recent reports have associated diffusion models with the spread of infiltrating cells in metastatic tumors [5,6]. Cellular automata have often been applied to model cell dynamics, and also tumor development as a stochastic process. A very recent hybrid approach is described by Mallet and De Pillis [7]. The authors present a model using cellular automata and partial differential equations to describe the interactions between a growing tumor next to a nutrient source and the immune system of the host organism. In general, such models are very interesting for studying cellular metabolism and the temporal and spatial dynamics of tumor development. Typically, however, they do not address oxygen and nutrient delivery by means of explicitly modeled vascular systems, instead treating them in a rudimentary, implicit manner as (possibly evolving) concentration maps. Additionally, the important mechanical interactions between the tumor and the healthy tissue are not included. A bio-mechanical tissue growth model was employed by Davatzikos and co-workers [8,9] using methods originating from structural mechanics. The growth was prescribed as a local expansion, mathematically resembling linear expansion due to a temperature increase. Clatz et al. [10] extended this approach by a reaction–diffusion equation regulating cell movement, in order to simulate the mass effect. A general treatment of elastic growth is given in [11]. In terms of tumor growth, the pre-strain should be related to the actual cell activity, e.g., the average number of inert, necrotic, apoptotic and proliferative cells, depending on various factors such as the availability of oxygen and nutrients. Although Wasserman et al. [12] proposed to explicitly address these dependencies of the
mitotic rate in a unified framework, they eventually used a constant rate in their simulations, i.e., the growth rate was not coupled to any other process. Non-tumor and tumor-induced angiogenesis has been modeled by many authors. Apart from the above-mentioned implicit representations (density maps), there have been several attempts to treat the evolving vascular systems explicitly. A group of approaches describes the formation of blood vessels by sprouting angiogenesis in response to chemical stimuli (e.g., [13,14]). Plank and Sleeman [15] proposed a non-lattice model to simulate sprouting angiogenesis. Compared to previous strategies, which restricted endothelial cell movement to the four directions of a Cartesian grid, they allow continuous directions, while still restricting their analysis to very simple domains (squares). In [16] we extended this approach to a 3D finite element model, embedded in a domain represented by a tetrahedral mesh. Although several models of tumor-induced angiogenesis exist, there are only a few papers which model the coupling between angiogenesis and the growing tissue in both directions. The very recent work by Macklin and Lowengrub [17] is an exception in this respect, as it aims to address the coupling between the vascular blood supply and growth. Like its predecessors (e.g., [5]), it does not, however, include any bio-mechanical aspects, and it treats the vasculature implicitly as an evolving density map. Many of the previously published models integrate only one specific aspect of tumor growth, e.g., pure expansion, or angiogenesis in a static tissue domain with constant (non-evolving) distributions of growth factors. This paradigm should, however, be re-considered, since various phenomena at different scales are coupled and should be combined into a complete model framework.
3 Model Description
In this section we introduce the components of our simulation framework and explain how they are integrated into a consistent description of the involved biological phenomena. Our modeling starts at an early phase, representing the multi-cell spheroid stage. To initialize the pathology growth inside the hosting tissue, we place a small avascular ball inside a larger domain representing the healthy tissue. The domain is meshed using NETGEN [18], while maintaining an interface separating the pathology from the healthy tissue. The following bio-physical and bio-chemical principles are included in our model:
1. Tissue grows as long as it receives sufficient oxygen, which is consumed continuously.
2. Oxygen diffuses from the healthy tissue, or from an explicitly modeled vessel, into the tumor.
3. A low oxygen level (hypoxia) causes the production of tumor angiogenesis growth factors (TAF).
4. Endothelial cells proliferate and cause capillary sprouting (angiogenesis) in the direction of higher TAF concentration. This in turn improves the nutrient and oxygen supply to the tumor cell cluster.
The time scales at which the diffusion of molecules takes place (ns–s) and at which the cells proliferate (days) allow us to treat the processes in a quasi-static manner, i.e., at a given time step we compute a concentration map of oxygen given the sources and the consumption by the tissue, based on the actual tumor and vascular geometry. In this first version of our tumor growth framework we use a linear elasto-plastic expansion model. The growth process is relatively slow; therefore, we assume that over time the neighboring tissue will accommodate the stretching by increased growth or cell migration. The expansive growth is prescribed by the pre-strain term ε0. In our implementation, the deformation in one iteration is independent of the stresses computed in the previous iteration. This constitutes an elasto-plastic growth law. In each simulation step we compute the new deformation caused by a volume expansion in the tumor domain using the finite element method (FEM). In a first approximation, the volume expansion due to increased proliferation is

$$\epsilon_0(t) = \int_t^{t+\Delta t} \frac{1}{N}\,\frac{\partial N}{\partial t}\, dt \cong \frac{N(t + \Delta t, x, \theta) - N(t, x, \theta)}{N(t, x, \theta)} \qquad (1)$$
where N(t, x, θ) is the number of cells in a finite element at time t and position x. For a constant growth rate the strain is Δt/T2, where T2 is the cell-population doubling time. The proliferation rate and, therefore, the number of cells depend on certain environmental factors θ such as oxygen availability. Based on descriptions in the literature [5] we have defined this dependency with the piecewise linear function in ε0 = (Δt/T2) f(PO2(t)), which depends on the oxygen partial pressure at time t. The function f(PO2) is depicted in Fig. 2 and relates the deviation from normal oxygen pressure, ΔPO2, to the amount of growth. For values above the threshold h1 the cells display a high proliferative potential, while this decreases with the availability of oxygen. Between h1 and h2 there is no growth, since most of the cells are quiescent. Finally, for PO2 deviations below h2 the cells cannot survive and become necrotic. Since the actual growth also depends on the stress exerted by the non-proliferating surrounding tissue as it is deformed, the actual strain in the tumor will be smaller than the prescribed strain. Oxygen originates from the blood, which is transported by the vascular system. Our previous work on tumor-induced angiogenesis [16] described a method to compute the pressures and flow in a functional vessel network. We follow [16] in computing the pressures at the nodes connecting separate vessel elements (pipes) with the Hagen–Poiseuille law. The pressures can in turn be used to compute other relevant properties, such as the flow. In order to calculate the oxygen delivered by the vessels we need to account for the transport through the vessel wall, which depends on the transmural pressure and the vessel wall thickness. By integrating over the surface of the vessels inside each tetrahedron of the tissue domain, we can estimate the total amount of oxygen delivered by the explicit vessel network. Since we are mainly interested in the development of the tumor and not the healthy tissue, we have chosen to use a dual representation: on the one hand explicitly modeled vessels growing into the tumor, and on the other a postulated homogeneous source of oxygen from an implicit pre-existing regular vascular system in the healthy tissue.
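To make the growth law above concrete, the piecewise-linear response f(ΔPO2) can be written in a few lines. The following is a minimal Python sketch; the numerical threshold values and the unit plateau are illustrative assumptions — the paper fixes only the ordering h3 < h2 < h1 and the three qualitative regimes (proliferation, quiescence, necrosis).

```python
import numpy as np

def growth_response(dp_o2, h1=-5.0, h2=-15.0, h3=-25.0):
    """Piecewise-linear growth response f(dPO2), cf. Fig. 2a.

    dp_o2: deviation of the oxygen partial pressure from its normal value.
    h1, h2, h3: illustrative thresholds (assumed values); only their
    ordering and the three regimes come from the paper.
    """
    dp = np.asarray(dp_o2, dtype=float)
    f = np.zeros_like(dp)                  # h2 <= dPO2 < h1: quiescent, no growth
    f[dp >= h1] = 1.0                      # mild or no hypoxia: full proliferation
    necrotic = (dp >= h3) & (dp < h2)      # strong hypoxia: cell loss, negative strain
    f[necrotic] = -(h2 - dp[necrotic]) / (h2 - h3)
    f[dp < h3] = -1.0                      # below h3: maximal necrosis
    return f

# Prescribed pre-strain over one step (constant-rate form of Eq. 1), dt/T2 = 0.09:
eps0 = 0.09 * growth_response([0.0, -10.0, -20.0, -30.0])
```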
In the implicit representation, oxygen diffuses into the tumor from the boundary. The problem can be stated as a reaction–diffusion equation with constant boundary conditions at the tumor surface, due to the homogeneous source of oxygen in the healthy tissue:

$$\frac{\partial c}{\partial t} = D_{O_2} \nabla^2 c + R_{O_2}, \qquad (2)$$
where c is the oxygen partial pressure. The diffusion constant of oxygen in tissue is typically estimated at around DO2 = 1.5 · 10⁻⁵ cm²/s [19], while the reaction rate RO2 depends on the tissue type and can vary with the metabolic activity; typical values here are approximately 1.9 · 10⁻⁴ cm³ O2/(cm³ s) [19]. The oxygen delivered by these two sources is added to obtain the total oxygen supply in the tumor and the neighboring tissue. If oxygen levels are below a threshold, angiogenesis growth factors are produced by the affected cells. A prominent group of angiogenesis growth factors are the vascular endothelial growth factors (VEGF). Currently we treat the multitude of angiogenesis growth factors as a single generic growth factor. We set the threshold below which the tissue starts producing TAF to an oxygen partial pressure of h2 (see above). Again, a diffusion equation similar to the one above is solved, with the hypoxic cell clusters (finite elements) as sources of TAF, which diffuses into the neighboring tissue. For the TAF diffusion constant we use the estimated value for VEGF (DTAF = 1.0 · 10⁻⁶ cm²/s [20]). Based on this quasi-stationary distribution of TAF we model the formation of capillaries by endothelial cell proliferation and migration as suggested in our previous work [16]. In this method the growing vessel tree is treated explicitly by defining the dynamics of the vessel tips, which move as a consequence of diffusive (random) motility, directed motility along the gradient of the growth factors (convection), and inertial motility reducing the tendency to rapidly change direction. This model can account for the sprouting form of angiogenesis and is easily included in our framework. Specifically, it generates the network of pipes necessary for the computation of the oxygen supply. Currently, we do not address wall re-modeling in capillaries, e.g., due to shear stress.
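For intuition, Eq. (2) can be reduced to its 1-D steady state and relaxed numerically. The sketch below uses the diffusivity and consumption values quoted above [19]; the grid, the boundary oxygen level, and the omitted unit conversion between consumption rate and partial pressure are assumptions of this illustration, not part of the paper's 3D finite element solver.

```python
import numpy as np

# 1-D steady state of Eq. (2): D_O2 * c'' = R_O2 inside the tumor, with the
# healthy-tissue oxygen level imposed at both ends. Constants follow [19];
# grid, boundary value (40 mmHg), and units are schematic assumptions.
D_O2 = 1.5e-5            # cm^2/s, oxygen diffusivity in tissue
R_O2 = 1.9e-4            # cm^3 O2 / (cm^3 s), consumption rate (magnitude)
L, n = 0.08, 81          # 0.8 mm tumor diameter in cm, number of grid points
h = L / (n - 1)
c = np.full(n, 40.0)     # boundary/initial oxygen level (assumed)

for _ in range(20000):   # Gauss-Seidel sweeps over the interior points
    for i in range(1, n - 1):
        # discrete D*(c[i-1] - 2c[i] + c[i+1])/h^2 = R, solved for c[i]
        c[i] = 0.5 * (c[i - 1] + c[i + 1] - h * h * R_O2 / D_O2)
c = np.maximum(c, 0.0)   # clamp: a necrotic core would sit where supply fails
```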
4 Results
In this section we present an exemplary simulation of a developing solid tumor. Starting from a small avascular tumor of 0.8 mm diameter, the simulation progresses in relatively small time steps (Δt/T2 = 0.09). Figure 1 captures the morphology of the tumor, its neo-vascularization, and some growth-relevant physiological factors. Figure 1 a) depicts a slice through the tumor, showing the endothelial cell density, computed as the surface area of the vessel system in each tetrahedral element. The TAF concentration shown in Fig. 1 b) is high in tumor regions where the vasculature has not yet developed sufficiently. Finally, the vasculature at this time step is shown in Fig. 1 c). It consists of more than 100,000 vessel segments.
Fig. 1. Intermediate stage of a developing solid tumor. a) depicts the endothelial cell density, b) shows the TAF distribution. In c) the explicit vessel system is displayed.
Additional images and animations of an in silico growing tumor can be found at: https://www.vision.ee.ethz.ch/~blloyd/miccai07
5 Discussion
We have investigated the dependency of the developing tumor on the individual components of the framework. The growth of a tumor has been simulated with angiogenesis (the full model) and without angiogenesis, relying solely on diffusion. In the beginning, both models grow approximately equally fast. However, when the critical mass is reached, the full model continues to grow, whereas the avascular model finally reaches a plateau. The volumetric growth of the diffusion model is very similar to a Gompertz growth curve [21]. Gompertzian growth has been used to describe the growth rate of solid avascular tumors and has been successfully applied in specific clinical applications [10]. In the second curve, depicting the growth of the full model, there is a long phase of exponential growth with a sudden change at t/T2 ≈ 4.
[Fig. 2 plots: (a) the growth response f(ΔPO2) over the oxygen pressure deviation ΔPO2, ranging from −1 to 1 with thresholds h3 < h2 < h1; (b) tumor volume [cm³] over normalized time t/T2 ∈ [0, 8] for the full model, the diffusion model, and a Gompertz fit y(x) = k·exp(−e^(a−bx)) with k = 0.78, a = 2.26, b = 0.55.]
Fig. 2. In a) the dependency of expansive growth on the oxygen partial pressure is shown; the x-axis measures the deviation of the oxygen partial pressure from its normal state. b) depicts the growth of a diffusion-dependent tumor and a vascularized tumor (T2 is the doubling time).
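The Gompertz fit quoted in Fig. 2 b) is easy to reproduce; a minimal sketch using the constants read off the figure, with time on the normalized axis t/T2:

```python
import numpy as np

# Gompertz curve fitted to the avascular (diffusion-only) growth in Fig. 2b.
k, a, b = 0.78, 2.26, 0.55

def gompertz(t):
    """Tumor volume [cm^3] versus normalized time t = t/T2."""
    return k * np.exp(-np.exp(a - b * t))

t = np.linspace(0.0, 8.0, 81)
v = gompertz(t)
# The curve saturates near k = 0.78 cm^3: the plateau the avascular model
# reaches once diffusion alone can no longer supply the growing spheroid.
```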
At this point the gradient is reduced noticeably, before the tumor again starts to grow exponentially. This effect can occur if the vessels have grown such that they supply sufficient oxygen for the tissue to support its growth, turning off the angiogenic switch. Eventually, however, the tumor reaches a size at which hypoxia levels pass the threshold and the angiogenic switch is turned on again. This effect must be investigated further in order to understand its relation to the underlying model assumptions. We found close similarities to several morphological observations on leiomyomas, which are the most common type of benign tumor affecting women above age 30. The vascularity generated is highly reminiscent of the corrosion casts presented in [22]. Because of the tissue growth, the vasculature tends to be compressed at the boundary of the tumor. This effect has been observed in real tumors and is referred to as the vascular capsule. We could also reproduce the migration of a myoma when the seed is placed closer to the surface: the tumor is effectively pressed out of the tissue, eventually forming an acute angle with the surface, as often observed intraoperatively.
6 Conclusion and Future Work
We have presented a framework to model tumor growth, including sprouting angiogenesis. Some of the major processes, i.e., chemical transport, mechanical deformation due to growth, and the explicit development of a vascular system, have been treated in a coupled way. First results demonstrate the feasibility of the approach. We are, however, aware of the limitations of the current model. The description of the tissue as elasto-plastic allows us to simulate the growth with relatively small geometrical errors. However, it does not allow us to account for the increased stress in grown tumors and the stress dependency of cell proliferation. For this reason a nonlinear hyperelastic model, including a treatment of residual stress, will be necessary. Since the time scale at which growth takes place is typically much larger than the time of viscoelastic relaxation, it is reasonable to assume that viscoelastic effects can be neglected for solid tumor growth. Several other extensions and improvements are planned, including vessel re-modeling due to shear stress and a more detailed representation of the metabolic activity and treatment of individual growth factors. As part of an effort to validate our model, we will extend it to cover the development of malignant penetrating tumors. This will allow us to use existing animal models for validation by in vivo observations based on MR imaging.

Acknowledgment. This work has been performed within the frame of the Swiss National Center of Competence in Research on Computer Aided and Image Guided Medical Interventions (NCCR Co-Me) supported by the Swiss National Science Foundation.
References
1. Jang, S.H., Wientjes, M.G., Lu, D., Au, J.L.S.: Drug delivery and transport to solid tumors. Pharmaceutical Research 20(9), 1337–1350 (2003)
2. Carmeliet, P., Jain, R.K.: Angiogenesis in cancer and other diseases. Nature 407(6801), 249–257 (2000)
3. Araujo, R.P., McElwain, D.L.S.: A history of the study of solid tumour growth: the contribution of mathematical modelling. Bull. Math. Biology 66(5), 1039–1091 (2004)
4. Greenspan, H.P.: On the growth and stability of cell cultures and solid tumors. Journal of Theoretical Biology 56(1), 229–242 (1976)
5. Cristini, V., Lowengrub, J., Nie, Q.: Nonlinear simulation of tumor growth. Journal of Mathematical Biology 46(3), 191–224 (2003)
6. Tracqui, P.: From passive diffusion to active cellular migration in mathematical models of tumour invasion. Acta Biotheoretica 43(4), 443–464 (1995)
7. Mallet, D.G., de Pillis, L.G.: A cellular automata model of tumor-immune system interactions. Journal of Theoretical Biology 239(3), 334–350 (2006)
8. Kyriacou, S.K., et al.: Nonlinear elastic registration of brain images with tumor pathology using a biomechanical model. IEEE Trans. Med. Imaging 18(7), 580–592 (1999)
9. Mohamed, A., Davatzikos, C.: Finite element modeling of brain tumor mass-effect from 3D medical images. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3750, pp. 400–408. Springer, Heidelberg (2005)
10. Clatz, O., et al.: Realistic simulation of the 3D growth of brain tumors in MR images coupling diffusion with mass effect. IEEE Trans. Med. Imaging 24(10), 1334–1346 (2005)
11. Goriely, A., et al.: Elastic growth models. In: Proc. of BIOMAT-2006 (2007)
12. Wasserman, R., et al.: A patient-specific in vivo tumor model. Mathematical Biosciences 136(2), 111–140 (1996)
13. Anderson, A.R., Chaplain, M.A.: Continuous and discrete mathematical models of tumor-induced angiogenesis. Bull. Math. Biology 60(5), 857–899 (1998)
14. Levine, H.A., et al.: Mathematical modeling of capillary formation and development in tumor angiogenesis: penetration into the stroma. Bull. Math. Biology 63(5), 801–863 (2001)
15. Plank, M.J., Sleeman, B.D.: Lattice and non-lattice models of tumour angiogenesis. Bull. Math. Biology 66(6), 1785–1819 (2004)
16. Szczerba, D., Székely, G.: Simulating vascular systems in arbitrary anatomies. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3750, pp. 641–648. Springer, Heidelberg (2005)
17. Macklin, P., Lowengrub, J.: Nonlinear simulation of the effect of microenvironment on tumor growth. Journal of Theoretical Biology 245(4), 677–704 (2007)
18. Schöberl, J.: NETGEN – an advancing front 2D/3D-mesh generator based on abstract rules. Computing and Visualization in Science 1, 41–52 (1997)
19. Salathe, E.P., Xu, Y.H.: Non-linear phenomena in oxygen transport to tissue. Journal of Mathematical Biology 30(2), 151–160 (1991)
20. Gabhann, F.M., Popel, A.S.: Interactions of VEGF isoforms with VEGFR-1, VEGFR-2, and neuropilin in vivo. Am. J. Physiol. Heart Circ. Physiol. 292(1), 459–474 (2007)
21. Winsor, C.P.: The Gompertz curve as a growth curve. Proc. National Academy of Sciences 18, 1–8 (1932)
22. Walocha, J.A., et al.: Vascular system of intramural leiomyomata revealed by corrosion casting and scanning electron microscopy. Hum. Reprod. 18(5), 1088–1093 (2003)
Autism Diagnostics by 3D Texture Analysis of Cerebral White Matter Gyrifications
Ayman El-Baz¹, Manuel F. Casanova², Georgy Gimel'farb³, Meghan Mott², and Andrew E. Switala²
¹ Bioengineering Department, University of Louisville, Louisville, KY, USA
² Department of Psychiatry and Behavioral Science, University of Louisville, USA
³ Department of Computer Science, University of Auckland, New Zealand
Abstract. The importance of accurate early diagnostics of autism, which severely affects personal behavior and communication skills, cannot be overstated. Neuropathological studies have revealed an abnormal anatomy of the cerebral white matter (CWM) in autistic brains. We explore the possibility of distinguishing between autistic and normal brains by a quantitative shape analysis of CWM gyrifications in 3D proton density MRI (PD-MRI) images. Our approach consists of (i) segmentation of the CWM in a 3D brain image using a deformable 3D boundary; (ii) extraction of gyrifications from the segmented CWM; and (iii) shape analysis to quantify the thickness of the extracted gyrifications and classify autistic and normal subjects. The boundary evolution is controlled by two probabilistic models of the visual appearance of 3D CWM: the learned prior and the current appearance model. Initial experimental results suggest that the proposed 3D texture analysis is a promising supplement to the current techniques for diagnosing autism.
1 Introduction
Autistic Spectrum Disorder (A.S.D.), or autism, is a complex neurological disability that typically appears during the first three years of life and impacts the development of social interaction and communication skills. Each individual is affected differently and to varying degrees, from milder forms in which intellectual ability is high but social interaction is low, to the most severe cases typified by unusual, self-injurious, and aggressive behaviors. The latter may persist throughout life and inflict a heavy burden on those who interact with autistic persons. Cognitive impairments may also last over time and often result in mental retardation in the majority of autistic individuals [1]. Neuropathological studies of autism have shown that children with autism have ordinary-size brains at birth, but experience an acceleration of brain growth resulting, between two and four years of age, in an increased brain volume relative to the normal (control) group [2,3,4]. By adolescence and adulthood, differences in the mean brain size between the two groups diminish, largely as a result of increased relative growth in the control group; nonetheless, there exists an abnormal anatomy of the cerebral white matter (CWM) in autistic brains [3,4]. In
addition, deficits in the size of the corpus callosum and its sub-regions are well established in patients with autism relative to controls. To overcome the limitations of conventional volumetric-based diagnostics, we propose to quantitatively analyze the shapes of CWM gyrifications, considered as a texture in 3D PD-MRI images, and to use the CWM abnormalities found for a robust classification of autistic vs. normal subjects. To the best of our knowledge, such an approach to automatically diagnosing autism by 3D texture analysis of CWM gyrifications is the first of its kind. Our objective is to quantify differences between the shapes of CWM gyrifications for autistic and normal subjects. The proposed diagnostics is based on a three-step texture analysis of 3D PD-MRI brain images that is detailed in Sections 2 and 3: (i) CWM segmentation from a 3D PD-MRI image using an evolving deformable boundary guided by probabilistic models of the current and learned prior visual appearance of the CWM; (ii) extraction of gyrifications from the segmented CWM; and (iii) quantification of the thickness of the CWM gyrifications to perform classification. Experimental results and conclusions are given in Section 4.
2 Cerebral White Matter Segmentation
Accurate CWM segmentation from a 3D PD-MRI image is a challenging problem, because the intensities in the CWM and the surrounding organs are not clearly distinguishable. Thus, we segment the PD-MRI image using a conventional 3D parametric deformable boundary [6], but control its evolution with two original probabilistic models of visual appearance, namely, a learned CWM appearance prior accounting for translation- and rotation-invariant pairwise voxel interaction, and a mixed model of voxel intensities in the current CWM and surrounding tissues. The appearance prior is a Markov–Gibbs random field (MGRF) with multiple pairwise interaction having analytical identification (parameter estimation) from training data. The voxel-wise model of the current CWM appearance is extracted from the multi-modal mixed intensity distribution by its precise approximation with a linear combination of discrete Gaussians (LCDG), similar to the LCG approximation in [7]. Let (x, y, z) be Cartesian coordinates of 3D points. A finite 3D arithmetic lattice R = [(x, y, z) : x = 0, . . . , X − 1; y = 0, . . . , Y − 1; z = 0, . . . , Z − 1] supports a 3D image g : R → Q and its 3D region map m : R → L, where Q = {0, 1, . . . , Q − 1} and L = {cwm, bg} are finite sets of intensities and region labels, respectively. Each label mx,y,z indicates whether a voxel gx,y,z in the corresponding intensity data set g belongs to the CWM or to the background. We use a conventional deformable boundary [6] that evolves in the direction minimizing its energy E, which depends on internal, ζint(b), and external, ζext(b), forces:

$$E = E_{int} + E_{ext} = \int_b \big( \zeta_{int}(b) + \zeta_{ext}(b) \big)\, db \qquad (1)$$
where b = [Pk : k ∈ K = {1, . . . , K}] is a parametric surface with K vertices Pk = (xk, yk, zk). We introduce a new type of external energy involving the learned and on-going (current) visual appearance of the CWM.
Fig. 1. An initial (left) and normalized (right) PD-MRI image
Fig. 2. Central-symmetric 2D (left) and 3D (right) neighborhoods for the eight distance ranges [dν,min = ν − 0.5, dν,max = ν + 0.5); ν ∈ N = {1, . . . , 8}
Each image is normalized by mapping the signal range [qmin, qmax] of each 3D data set to [0, 255], as in Fig. 1, in order to account for global contrast and offset deviations of intensities due to different sensors. The normalized images are considered as samples of a prior MGRF model of the CWM appearance. To exclude any image alignment before segmentation, we use a generic translation- and rotation-invariant MGRF with only voxel-wise and central-symmetric pairwise voxel interaction. The latter is specified by a set N of characteristic central-symmetric voxel neighborhoods {nν : ν ∈ N} on R and a corresponding set V of Gibbs potentials, one per neighborhood. A central-symmetric voxel neighborhood nν embraces all voxel pairs such that the (x, y, z)-coordinate offsets between a voxel (x, y, z) and its neighbor (x′, y′, z′) belong to an indexed semi-open interval [dν,min, dν,max); ν ∈ N ⊂ {1, 2, 3, . . .} of inter-voxel distances:

$$d_{\nu,\min} \le \sqrt{(x - x')^2 + (y - y')^2 + (z - z')^2} < d_{\nu,\max}.$$

Figure 2 illustrates the neighborhoods nν for the uniform distance ranges [ν − 0.5, ν + 0.5); ν ∈ N = {1, . . . , 8}.

Learning the appearance prior. Let S = {(gt, mt) : t = 1, . . . , T} be a training set of 3D images with known region maps, and let Rt = {(x, y, z) : (x, y, z) ∈ R ∧ mt;x,y,z = cwm} denote the part of R supporting CWM in the t-th training pair (gt, mt); t = 1, . . . , T. Let Cν,t be the family of voxel pairs in R²t with coordinate offset (ξ, η, γ) ∈ nν for a particular neighborhood. Let Fvox,t and Fν,t be the joint empirical probability distributions of voxel intensities and of intensity co-occurrences, respectively, in the training CWM from gt:

$$F_{vox,t} = \left[ f_{vox,t}(q) = \frac{|R_{t,q}|}{|R_t|} : q \in Q \right] \quad \text{and} \quad F_{\nu,t} = \left[ f_{\nu,t}(q, q') = \frac{|C_{\nu,t;q,q'}|}{|C_{\nu,t}|} : (q, q') \in Q^2 \right]$$
where Rt,q = {(x, y, z) : (x, y, z) ∈ Rt ∧ gx,y,z = q} is the subset of voxels supporting the intensity q, and Cν,t;q,q′ is the subset of voxel pairs cξ,η,γ(x, y, z) = ((x, y, z), (x + ξ, y + η, z + γ)) ∈ R²t supporting the intensity co-occurrence (q, q′) in the training CWM from gt. Let Vvox = [Vvox(q) : q ∈ Q] be a potential function of voxel intensities that describes the voxel-wise interaction, and let Vν = [Vν(q, q′) : (q, q′) ∈ Q²] be a potential function of intensity co-occurrences in the neighboring voxel pairs that describes the pairwise interaction in the neighborhood nν; ν ∈ N.
The MGRF prior model of the t-th training pair is specified by the joint Gibbs probability distribution on the sublattice Rt:

$$P_t = \frac{1}{Z_t} \exp\!\Big( |R_t| \big( V_{vox}^{\mathsf{T}} F_{vox,t} + \textstyle\sum_{\nu \in N} \rho_{\nu,t} V_{\nu}^{\mathsf{T}} F_{\nu,t} \big) \Big) \qquad (2)$$
where ρν,t = |Cν,t|/|Rt| is the average cardinality of nν with respect to Rt. To simplify notation, let the CWM volumes in the training images be similar, so that |Rt| ≈ Rcwm and |Cν,t| ≈ Cν,cwm for t = 1, . . . , T. Here, Rcwm and Cν,cwm are the average cardinalities over the training set S = {(gt, mt) : t = 1, . . . , T}. Assuming independent samples, the joint probability distribution of intensities for all the training CWM images is:

$$P_S = \frac{1}{Z} \exp\!\Big( T R_{cwm} \big( V_{vox}^{\mathsf{T}} F_{vox} + \textstyle\sum_{\nu \in N} \rho_{\nu} V_{\nu}^{\mathsf{T}} F_{\nu} \big) \Big) \qquad (3)$$
where ρν = Cν,cwm/Rcwm, and the marginal empirical distributions of intensities Fvox,cwm and of intensity co-occurrences Fν,cwm now describe all the CWM images from the training set. To identify the MGRF model of Eq. (3), we have to estimate the Gibbs potentials V. In this paper we introduce a new analytical maximum likelihood estimate of the Gibbs potentials; the mathematical proof of this new estimator is given on our web site (http://louisville.edu/speed/bioengineering/faculty/bioengineering-full/dr-ayman-el-baz/dr-ayman-el-bazs-lab.html):

$$V_{vox,cwm}(q) = \log f_{vox,cwm}(q) - \frac{1}{Q} \sum_{\kappa \in Q} \log f_{vox,cwm}(\kappa); \quad q \in Q \qquad (4)$$
$$V_{\nu,cwm}(q, q') = \lambda \rho_{\nu} \big( f_{\nu,cwm}(q, q') - f_{vox,cwm}(q)\, f_{vox,cwm}(q') \big); \quad (q, q') \in Q^2$$

where the common factor λ is also computed analytically.

LCDG models of the current appearance. Non-linear intensity variations in the data acquisition system, due to the scanner type and scanning parameters, affect the visual appearance of the CWM in each data set g to be segmented. Thus, in addition to the learned appearance prior, we describe the on-going CWM appearance with a marginal intensity distribution within an evolving boundary b in g. This distribution is considered as a dynamic mixture of two probability distributions that characterize the CWM and its background, respectively, and is partitioned into these two models using the EM-based approach in [7].

Boundary evolution under the two appearance models. We guide the boundary evolution in such a way that the following external energy term in Eq. (1), combining the learned prior and current appearance models within the boundary, is minimized:

$$\zeta_{ext}(P_k = (x, y, z)) = -p_{vox,cwm}(g_{x,y,z})\, \pi_P(g_{x,y,z}|S) \qquad (5)$$
where pvox,cwm(q) is the marginal probability of the intensity q in the above LCDG model for the CWM, and πP(q|S) is the prior conditional probability of q, given the fixed current intensities in the characteristic central-symmetric neighborhood of Pk, for the MGRF prior model of Eq. (3):
$$\pi_P(g_{x,y,z}|S) = \frac{\exp\big(E_P(g_{x,y,z}|S)\big)}{\sum_{q \in Q} \exp\big(E_P(q|S)\big)}$$

where EP(q|S) is the conditional Gibbs energy of pairwise interaction for the voxel P, provided that the intensity q is assigned to it while the current intensities in all its neighbors over the characteristic neighborhoods nν; ν ∈ N, remain fixed:

$$E_P(q|S) = V_{vox,cwm}(q) + \sum_{\nu \in N} \sum_{(\xi,\eta,\gamma) \in n_\nu} \big( V_{\nu,cwm}(g_{x-\xi,y-\eta,z-\gamma},\, q) + V_{\nu,cwm}(q,\, g_{x+\xi,y+\eta,z+\gamma}) \big)$$
The evolution terminates when the total energy Er of the 3D region r ⊂ R inside the boundary b no longer changes:

$$E_r = \sum_{\forall P = (x,y,z) \in r} E_P(g_{x,y,z}|S) \qquad (6)$$
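For illustration, the analytical estimates of Eq. (4) reduce to a few lines of code once the empirical marginals have been collected. A minimal Python sketch; the closed form of the common factor λ is given on the authors' web page and not reproduced here, so λ = 1 below is a placeholder:

```python
import numpy as np

def gibbs_potentials(f_vox, f_pair, rho, lam=1.0):
    """Analytical MGRF potential estimates of Eq. (4).

    f_vox : (Q,) empirical intensity distribution over the training CWM.
    f_pair: (Q, Q) empirical co-occurrence distribution for one neighborhood.
    rho   : average neighborhood cardinality rho_nu.
    lam   : common analytical factor lambda (placeholder value).
    """
    f_vox = np.asarray(f_vox, dtype=float)
    # Voxel-wise potential: log-marginal centered by its mean over Q.
    v_vox = np.log(f_vox) - np.mean(np.log(f_vox))
    # Pairwise potential: scaled deviation of co-occurrences from independence.
    v_pair = lam * rho * (np.asarray(f_pair, dtype=float) - np.outer(f_vox, f_vox))
    return v_vox, v_pair

# Toy 4-level alphabet; an independent co-occurrence matrix yields zero v_pair.
f_vox = np.array([0.1, 0.4, 0.3, 0.2])
v_vox, v_pair = gibbs_potentials(f_vox, np.outer(f_vox, f_vox), rho=6.0)
```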
3 Quantitative Analysis of CWM Gyrifications
Our main hypothesis is that the thickness of the gyral CWM for normal subjects is greater than that for autistic subjects, as Fig. 3 suggests. To quantify this feature, we first need to automatically extract the CWM gyrifications from the segmented CWM and then analyze their differences in order to classify normal and autistic persons.
Fig. 3. Segmented CWM for a control (left) and an autistic (right) patient. Note that the CWM gyrifications of the autistic person appear thinner than those of the normal subject.
[Fig. 4 right panel: empirical density f(d) of distances d (cm), with class densities pi(d) inside and po(d) outside the cortex gyrifications and the separating threshold near t = 2.35.]
Fig. 4. Coronal section (left) in the 3D distance map for the segmented CWM (the boundary found by segmentation is shown in green) and the estimated class densities (right) obtained from the mixed empirical distance density for the segmented CWM
To extract gyrifications, we calculate the distance map inside the segmented 3D CWM by the fast marching level set method in [8]. The map gives the minimum Euclidean distance from each inner point of the segmented object to the object boundary (Fig. 4). Using the EM-based approach in [7], the mixed empirical marginal distribution of these distances is partitioned into two probability models: one for the CWM gyrifications (class 1) and one for all other CWM tissues (class 2), as shown in Fig. 4.
Fig. 5. 2D (a,b; red) and 3D (c, d; pink) visualization of the extracted CWM gyrifications
Then the gyrifications are extracted using the optimum threshold that separates the two classes (see Fig. 5). We propose to use the cumulative distribution function (CDF) of distances in the distance map inside the extracted CWM gyrifications as a generalized shape feature of the CWM structure. Figure 6 shows the CDFs for 14 subjects (7 autistic and 7 normal) selected for training. It is evident that the two classes, autistic and normal, are clearly separable using these CDFs.
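A minimal sketch of this CDF shape feature, with a plain Euclidean distance transform standing in for the fast-marching distance map of [8], and assuming the binary mask already contains only the extracted gyrification voxels (distances are in voxel units here, not cm):

```python
import numpy as np
from scipy import ndimage

def gyrification_cdf(mask, bins):
    """Empirical CDF of the distance map inside a binary gyrification mask."""
    d = ndimage.distance_transform_edt(mask)  # distance to the mask boundary
    d = d[mask]                               # keep distances of inside voxels only
    hist, _ = np.histogram(d, bins=bins)
    return np.cumsum(hist) / hist.sum()       # CDF sampled on the bin grid

# Illustrative call on a synthetic 3-D blob (not real CWM data):
mask = np.zeros((32, 32, 32), dtype=bool)
mask[8:24, 8:24, 8:24] = True
cdf = gyrification_cdf(mask, bins=np.linspace(0.0, 8.0, 33))
```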
[Fig. 6 c,d panel annotations: the unknown normal subject lies at ρ = 0.069 from the control template vs. ρ = 0.13 from the autistic one; the unknown autistic subject lies at ρ = 0.008 from the autistic template vs. ρ = 0.231 from the control one.]
Fig. 6. Cumulative distance distributions (a) inside the segmented training distance maps for 14 (seven normal and seven autistic) subjects; the average CDFs for autistic and normal subjects (b); and the proposed classification (c,d) of unknown subjects (shown in green) by using the Levy distance (ρ) to the average CDFs: (c) the normal and (d) the autistic subject
To classify the CDFs, the Levy distance between a CDF F = [Fu : u = 0, 1, . . . , dmax] in question, computed for the distance map inside the extracted CWM gyrifications, and each of the average CDFs FA/N in Fig. 6, serving as the templates of autistic and normal subjects, is calculated [10]:

$$\rho(F, F_{A/N}) = \min_{\alpha > 0} \{ \alpha : F_{A/N}(d - \alpha) - \alpha \le F(d) \le F_{A/N}(d + \alpha) + \alpha \;\; \forall d \}.$$
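A brute-force sketch of this Levy-distance computation for CDFs sampled on a common distance grid; the search grid over α is an implementation choice of this illustration, not the authors' code:

```python
import numpy as np

def levy_distance(f, g, d, alphas=np.linspace(1e-3, 1.0, 1000)):
    """Smallest alpha with g(d-a)-a <= f(d) <= g(d+a)+a for all grid points d."""
    for alpha in alphas:
        lo = np.interp(d - alpha, d, g, left=0.0, right=1.0) - alpha
        hi = np.interp(d + alpha, d, g, left=0.0, right=1.0) + alpha
        if np.all((lo <= f) & (f <= hi)):
            return alpha
    return alphas[-1]

# Classification rule from the paper: assign the label of the nearer template.
# rho_a = levy_distance(f_subject, cdf_autistic_template, d)
# rho_n = levy_distance(f_subject, cdf_normal_template, d)
# label = "autistic" if rho_a < rho_n else "control"
```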
4 Experimental Results and Conclusions
The proposed approach has been tested on 39 PD-MRI images of postmortem brains: 23 autistic patients (mean interval between death and autopsy: 25.8 hours) and 16 controls (mean interval between death and autopsy: 20.4 hours), obtained from the Autism Tissue Program (ATP). The brain tissues were scanned on a 1.5 Tesla GE MRI system with a voxel resolution of 0.3125 × 0.3125 × 1.6 mm³ using a proton density weighted imaging sequence protocol [5].
Fig. 7. Results of 3D CWM segmentation projected onto 2D axial (A), coronal (C), and sagittal (S) planes for visualization: 2D profiles of the original PD-MRI images (a), pixel-wise Gibbs energies for ν ≤ 8 (b), our segmentation (c), the segmentation with the algorithm in [9] (d), and the radiologist's segmentation (e)
The "ground truth" diagnosis used to evaluate the classification accuracy for each patient was given by the Autism Diagnostic Interview–Revised (ADI-R). Figure 7 demonstrates results of the CWM segmentation. The Gibbs energies for the CWM voxels are higher than for any other brain tissue, which is why the proposed approach is very accurate. The boundary evolution terminated after 226 iterations, due to close-to-zero changes in the total energy. The error of our segmentation with respect to the radiologist's "ground truth" is 1.49%. To highlight the advantages of our approach, we compare it to the popular level-set-based segmentation by Vese and Chan [9], where the level set evolution is controlled by region statistics, e.g., mean and variance. The segmentation error of their approach in the experiment in Fig. 7 is 9.6%. The main problem with the Vese and Chan segmentation is that the errors usually occur just at the CWM gyrifications, which are the main features for discriminating between autistic and normal subjects. The motivation behind our segmentation was to exclude such errors as far as possible. The training subset for classification (14 persons, shown in Fig. 6) was arbitrarily selected among all the 39 subjects. The accuracy of classification of both the training and test subjects was evaluated using the χ²-test at three confidence levels – 85%, 90% and 95% – in order to examine significant differences in the Levy distances. As expected, the 85% confidence level yielded the best results: 22 out of 23 autistic subjects were correctly classified (96% accuracy), as were 15 out of 16 control subjects (94% accuracy). At the 90% confidence level, 22 out of 23 autistic subjects were still classified correctly; however, only 14 out of 16 control subjects were correct, bringing the accuracy rate for the control group down to 88%. The 95% confidence level gives smaller accuracy rates for both groups, namely 20 out of 23 correct answers for autistic subjects (87%) and still 14 out of 16 control subjects (88%). Our preliminary
explanation of the cases that are misclassified by the proposed approach is that there is a distortion in the geometry of the extracted cerebral white matter gyrifications. This distortion is due to fixation problems and the removal of the brain from the skull, because large deep cuts create distortions commonly revealed in MRI scans. The classification based on the traditional volumetric approach correctly identified 11 out of 23 autistic subjects (48% accuracy) and 6 out of 16 control subjects (38% accuracy) at the 85% confidence level; these results highlight the advantage of the proposed diagnostic approach. To show that the proposed approach is general and not limited to ex-vivo data, we also tested it on in-vivo data (Savant series [11]). The complete description of the Savant series and the classification results are available on our web site (http://louisville.edu/speed/bioengineering/faculty/bioengineering-full/dr-ayman-el-baz/dr-ayman-el-bazs-lab.html). In total, these preliminary results show that 3D texture analysis of PD-MRI brain images is able to accurately discriminate between autistic and normal subjects. Our proposal substantially differs from the known diagnostic techniques that exploit only volumetric descriptions of different brain structures and thus are, in principle, more sensitive to the selection of ages and to segmentation errors [3,4]. In contrast, the proposed approach derives efficient quantitative classification features from the 3D shapes of different brain structures. Our experiments demonstrate statistically significant differences in the proposed generalized geometric characteristics of CWM gyrifications for the 39 normal and autistic subjects under consideration. In the future, we are going to investigate different brain structures in order to quantitatively characterize the development and changes of an autistic brain over time. Our investigation will not be limited to the CWM but will also study the gray matter. Also, to validate and possibly modify the proposed approach, we will test it on larger data sets with known ground truth (doctors' diagnoses).
References
1. Minshew, N., Payton, J.: New perspectives in autism, part I: the clinical spectrum of autism. Curr. Probl. Pediatr. 18, 561–610 (1988)
2. Kanner, L.: Autistic disturbances of affective contact. Nervous Child 2, 217–250 (1943)
3. Aylward, E., Minshew, N., Field, K., Sparks, B., Singh, N.: Effects of age on brain volume and head circumference in autism. Neurology 59(2), 175–183 (2002)
4. Courchesne, R., Carper, R., Akshoomoff, N.: Evidence of brain overgrowth in the first year of life in autism. JAMA 290, 337–344 (2003)
5. Schumann, C., Buonocore, M., Amaral, D.: Magnetic resonance imaging of the post-mortem autistic brain. J. Autism Dev. Disord. 31(6), 561–568 (2001)
6. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active contour models. International Journal of Computer Vision 1, 321–331 (1987)
7. Farag, A., El-Baz, A., Gimel'farb, G.: Precise segmentation of multimodal images. IEEE Transactions on Image Processing 15(4), 952–968 (2006)
8. Adalsteinsson, D., Sethian, J.: A fast level set method for propagating interfaces. Journal of Computational Physics 118, 269–277 (1995)
9. Vese, L., Chan, T.: A multiphase level set framework for image segmentation using the Mumford and Shah model. International Journal of Computer Vision 50(3), 271–293 (2002)
10. Lamperti, J.W.: Probability. J. Wiley & Sons, New York (1996)
11. Hermelin, B.: Bright Splinters of the Mind: A Personal Study of Research with Autistic Savants. Jessica Kingsley Publishers Ltd (2001)
3-D Analysis of Cortical Morphometry in Differential Diagnosis of Parkinson's Plus Syndromes: Mapping Frontal Lobe Cortical Atrophy in Progressive Supranuclear Palsy Patients
Duygu Tosun¹, Simon Duchesne², Yan Rolland³, Arthur W. Toga¹, Marc Vérin⁴, and Christian Barillot²
¹ Laboratory of Neuro Imaging, UCLA School of Medicine, Los Angeles, CA, USA
² Visages U746, INSERM-INRIA-CNRS-Univ-Rennes1, IRISA, Rennes, France
³ Département de Radiologie et d'Imagerie Médicale, CHR Hôpital Sud, Rennes, France
⁴ Clinique Neurologique, Unité de Recherche Universitaire "Comportement et Noyaux Gris Centraux", CHU de Rennes, Rennes, France
Abstract. With the ability to study brain anatomy in vivo using magnetic resonance imaging, studies of regional brain atrophy suggest possible improvements in the differential diagnosis of movement disorders with parkinsonian symptoms. In this study, we investigate the effects of different parkinsonian syndromes on the cortical gray matter thickness and the geometric shape of the cerebral cortex. The study consists of a total of 24 patients with a diagnosis of probable progressive supranuclear palsy (PSP), multiple system atrophy (MSA) or idiopathic Parkinson's disease (IPD). We examine dense estimates of cortical gray matter thickness, sulcal depth, and measures of curvature in a surface-based cortical morphometry analysis framework. Group difference results indicate a higher cortical atrophy rate in the frontal lobe in PSP patients when compared to either MSA or IPD. These findings are indicative of the potential use of routine MRI and cortical morphometry in performing differential diagnosis in PSP, MSA and IPD.

Keywords: Cortical morphometry; Differential diagnosis; Parkinsonian syndromes; Sulcal atrophy.
1 Introduction
Neurodegenerative brain diseases possess unique morphological signatures; detection of such signs may prove useful in improving diagnosis, particularly for diseases for which there are few other diagnostic tools. Differential diagnosis of patients with parkinsonian syndromes is a challenging but clinically important task, necessary to guide prognosis and treatment strategies. For instance, studies based on volumetric morphometry or visual atrophy ratings on MR images have
shown differences in the regional rates of atrophy in the brainstem (midbrain and pons), the striatum, the cerebellum, the lateral and third ventricles, as well as frontal and posterior inferior cortical regions [1,2]. Midbrain atrophy is useful in differentiating progressive supranuclear palsy (PSP) from idiopathic Parkinson's disease (IPD) [3], while striatal abnormalities and cerebellar atrophy are more common in multiple-system atrophy (MSA) [4]. Even though these studies suggest that regional atrophy rates are potential morphological indexes with which to establish an accurate diagnosis and follow disease progression, a rigorous differential diagnosis framework must account for a number of confounding factors. First, regional atrophy rates are scalar measures that do not take into account the intrinsic geometry of the structures of interest. This is of particular concern when studying diseases that affect the cerebral cortex, because of its elaborate inward and outward folds in three-dimensional space. Second, quantitative rather than qualitative measurements must be performed for an objective assessment of atrophy. In this study, we attempted to address these concerns by engineering an automated surface-based 3-D cortical morphometry framework that focuses on group differences in the geometry of the cerebral cortex. A battery of algorithms has been integrated into the proposed framework; these algorithms have been described and validated in detail in various publications [5,6,7,8]. We specifically aimed to test the hypothesis that there exist group differences in cortical atrophy patterns among PSP, MSA, and IPD. We tested this hypothesis by investigating the cortical gray matter thickness and the amount of dilation and filling of the cortical sulcal spaces with cerebro-spinal fluid, as reflected in the cortical geometry.
2 Materials and Methods
2.1 Subjects and Data Acquisition
The study group consisted of 8 patients with a diagnosis of probable IPD without dementia (4 females and 4 males; age (sd) 58.5 (9.0)), 8 patients with a diagnosis of probable MSA (3 females and 5 males; age (sd) 58.6 (7.1)), and 8 patients with a diagnosis of probable PSP (4 females and 4 males; age (sd) 65.1 (9.4)). All groups were age-matched (Tukey–Kramer HSD, alpha = 0.05). All patients were seen at the Movement Disorder clinic of the Centre Hospitalier Universitaire de Rennes (Rennes, FR). Diagnosis was established following the NNIPPS (Neuroprotection and natural history and biology in parkinsonian plus syndromes¹) study clinical and imaging protocol, which includes the clinical criteria
¹ The NNIPPS project is investigating the neuroprotective efficacy and safety of riluzole in MSA and PSP in the setting of a large, multi-center (49 sites), randomized, parallel-group, placebo-controlled trial in several European countries (France, UK, Germany). The imaging component of the NNIPPS project sought to understand structural MRI changes in these diseases, to define and validate a standardized acquisition protocol, and to construct prospectively validated image assessment instruments on MRI.
from Gilman et al. [9] for MSA and from Litvan et al. [10] for PSP. The diagnosis was also supported by long-term (>36 months) clinical and neurological follow-up and, in the case of IPD patients, by the response to treatment. MR brain images of each participant were acquired on a GE Signa 1.5 Tesla MR scanner using a T1-weighted spoiled gradient echo (SPGR) pulse sequence with the following parameters: TE = 5 ms; TR = 24 ms; 45° flip angle; matrix size = 256 × 256; FOV = 240 mm × 240 mm; slice thickness = 1.2 mm, with 124 contiguous sagittal cross-sections.

2.2 Cortical Surface Reconstruction
The skull, scalp, extra-cranial tissue, cerebellum, and brain stem (at the level of the diencephalon) were removed from each image data set using a template-based segmentation with competitive level sets and fuzzy controls [8]. The Colin27 average brain from the Montreal Neurological Institute database (the average of 27 T1-weighted MRI acquisitions from a single subject) served as the high-definition structural brain template to define our anatomical target (i.e., the cerebrum). The software yielded good cerebrum extraction in most cases; in several cases, however, additional manual editing was required to remove retained non-cerebral tissue. Cross-sectional views of a T1-weighted MR image and the extracted cerebral volume are shown in Figs. 1(a) and (b). After correcting for intensity non-uniformity in the MR data using Nonparametric Non-uniform intensity Normalization (N3) [11], each individual's cortical surface was extracted using the cortical reconstruction using implicit surface evolution (CRUISE) technique developed by Han et al. [5], shown to yield an accurate and topologically correct representation that lies at the geometric center of the cortical gray matter tissue [6]. CRUISE is a data-driven method combining a robust fuzzy segmentation method, an efficient topology correction algorithm, and a geometric deformable surface model. Each resulting cortical surface was represented as a triangle mesh comprising approximately 300,000 mesh nodes. The reconstructed cortical surface for a sample brain is shown in Fig. 1(c).

2.3 Cortical Morphometry
Geometrically, the cerebral cortex is a thin, folded sheet of gray matter, 1–5 mm thick, with an average thickness of approximately 2.5 mm [12,13]. Cerebral degeneration can result in dilation of cortical sulcal openings and compensatory filling of the freed space by cerebrospinal fluid (CSF), affecting the 3-D geometry of the cerebral cortex. This phenomenon is thought to be the combined result of a reduction in thickness of the gyral grey matter (GM) mantle and atrophy of the gyral white matter (WM). We consider four measurements: cortical gray matter thickness, to examine cortical gray matter tissue atrophy; and geodesic sulcal depth along with two curvature characteristics of the cortex, to examine sulcal dilation. While cortical thickness is a metric of the three-dimensional geometry of the gray matter tissue sheet, the geodesic sulcal depth and curvature measures rely on an overall two-dimensional approximation to this three-dimensional cortical sheet (i.e., the central cortical surface; cf. Section 2.2).
Fig. 1. Cross-sectional view of (a) T1-weighted MR image and (b) extracted cerebral volume; (c) Reconstructed cortical surface; Cross-sectional view of (d) gray matter tissue classification and (e) cortical gray matter thickness volume; (f) Cortical thickness displayed on the cortical surface; Cross-sectional view of (g) outer cortical surface and (h) geodesic sulcal depth volume for left hemisphere; (i) Sulcal depth displayed on the cortical surface; Curvature features (j) curvedness and (k) shape index displayed on the cortical surface
These measures are depicted in Figs. 1(d)–(k) and explained in detail herein.

Cortical Gray Matter Thickness. The cortical GM is bounded by the CSF on the outside and by the WM on the inside. Adopting an Eulerian approach, the cortical thickness at each point in the GM tissue sheet can be defined as the sum of the geodesic distances from the point to the GM/WM and GM/CSF interfaces. Let Ω ⊂ ℝ³ (i.e., the GM tissue sheet) be a spatial region with simply connected inner boundary ∂₀Ω (i.e., the GM/WM interface) and outer boundary ∂₁Ω (i.e., the GM/CSF interface). Then the geodesic distances from each point x ∈ Ω to the boundaries are computed by solving the following pair of PDEs:

$$\langle \nabla D_0(x), C_0(x) \rangle = 1, \ \text{with } D_0(\partial_0 \Omega) = 0; \qquad \langle \nabla D_1(x), C_1(x) \rangle = 1, \ \text{with } D_1(\partial_1 \Omega) = 0. \qquad (1)$$
According to these PDEs, constructed from the geometry of the problem, D₀(x) (and D₁(x)) is defined as the length of the correspondence trajectory C₀ (and C₁) that travels from x through Ω up to ∂₀Ω (and ∂₁Ω). Accordingly, the thickness T(x) of Ω at x is

$$T(x) = D_0(x) + D_1(x). \qquad (2)$$
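A voxel-grid illustration of Eq. (2) is sketched below, with Euclidean distance transforms standing in for the paper's hybrid Eulerian–Lagrangian PDE solver; correspondence trajectories and sub-voxel boundary placement are therefore ignored, and the toy geometry is purely illustrative.

```python
import numpy as np
from scipy import ndimage

def cortical_thickness(gm, wm, voxel_mm=1.0):
    """Approximate T(x) = D0(x) + D1(x) of Eq. (2) on a voxel grid.

    gm, wm: boolean volumes for gray and white matter; everything else is CSF.
    """
    csf = ~(gm | wm)
    d0 = ndimage.distance_transform_edt(~wm)   # distance to the GM/WM interface
    d1 = ndimage.distance_transform_edt(~csf)  # distance to the GM/CSF interface
    return np.where(gm, (d0 + d1) * voxel_mm, 0.0)

# Toy example: WM core, GM shell, CSF outside (illustrative geometry only).
wm = np.zeros((16, 16, 16), dtype=bool); wm[6:10, 6:10, 6:10] = True
gm = np.zeros_like(wm); gm[4:12, 4:12, 4:12] = True; gm &= ~wm
t = cortical_thickness(gm, wm)
```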
The boundaries ∂₀Ω and ∂₁Ω have sub-voxel resolution — i.e., partial voluming — and are usually represented as level sets of scalar functions. We use a Lagrangian approach to compute the values D₀ and C₀ at grid points in Ω adjacent to the boundary ∂₀Ω. At the remaining grid points in Ω, we adopted a fast marching framework to compute the values D₀ and C₀ simultaneously [14].
The values of D₁ and C₁ were computed in a similar way. This hybrid Eulerian–Lagrangian framework yields cortical thickness estimates with the sub-voxel accuracy of a Lagrangian approach and the speed of an Eulerian approach. A cross-sectional view of the GM tissue classification and the corresponding cortical thickness volume of a sample subject, together with the resulting cortical thickness map on the surface, are shown in Figs. 1(d)–(f).

Geodesic Sulcal Depth. Depth within sulcal regions is defined as the length of the geodesic path connecting any point on the cortical surface through the sulcal opening to the outer cortical surface (see Fig. 1(g)). Changes in sulcal depth are associated with the combined effect of a non-uniform rate of GM thickness reduction across the cerebrum and atrophy of the gyral WM. To compute the geodesic sulcal depth, we first generate an outer cortical surface that tightly surrounds the cortical surface without entering the sulcal folds. The left and right cortical hemispheres were automatically identified by defining a cut around the corpus callosum, using the knowledge of the location of the midsagittal plane after a rigid alignment to the template brain's cortical surface. Focusing on one cortical hemisphere — which also allows us to study sulcal regions on the medial surface — the outer cortical surface was computed by deforming the cortical surface according to an evolution equation:

$$\Phi_t(x, t) = R(x)\,\kappa(x)\,|\nabla \Phi(x, t)|. \qquad (3)$$
In this context, the deforming surface is implicitly embedded as the zero level set of a scalar function Φ(x, t) (i.e., a signed distance function). Φₜ(x, t) and ∇Φ(x, t) are the time derivative and the spatial gradient of Φ(x, t), respectively. κ(x), the mean curvature function, and R(x), a barrier force, form the speed function that controls the evolution of the level set function. While the mean curvature component of the speed function aims to unfold the sulcal folds, the barrier force term anchors the gyral crown points (i.e., the cortex visible from outside) by preventing the surface from evolving in the inward direction. The result of this procedure is illustrated in Fig. 1(g), where a cross-section of the outer cortical surface (in red) is shown along with the original cortical surface (in blue). The geodesic sulcal depth from the original cortical surface to the outer cortical surface can be calculated by the approach used for the cortical thickness computation. We set ∂₀Ω and ∂₁Ω at the original cortical surface and at the outer surface, respectively, and Ω ⊂ ℝ³ was defined as the spatial region between these boundary surfaces (i.e., the sulcal openings). A cross-section from the resulting sulcal depth volume of the left cortical hemisphere of a sample subject and the corresponding surface map are shown in Figs. 1(h) and (i).

Curvature Characteristics of the Cortex. Enlargement of the cerebral sulci (i.e., dilation of cortical sulcal openings and compensatory filling of the freed space by CSF) has a direct effect on the 3-D folding geometry of the cerebral cortex. An analysis of the curvature characteristics of the cortex assesses changes in the type and magnitude of cortical folds. We consider two curvature measures:
shape index and curvedness. Given the two principal curvatures κ₁ and κ₂, where κ₁ ≤ κ₂, the shape index (SI) and curvedness (CN) are defined as [15]

$$SI = \frac{2}{\pi} \arctan\!\left(\frac{\kappa_2 + \kappa_1}{\kappa_2 - \kappa_1}\right), \qquad CN = \sqrt{\frac{\kappa_1^2 + \kappa_2^2}{2}}. \qquad (4)$$

The scaling of the shape index is such that it provides a space of shapes with the topology of a line segment, i.e., [−1, 1]. In this shape space, each surface point is classified as a spherical cup point (SI = −1), a saddle point (SI = 0), a spherical cap point (SI = 1), or a smooth transition between these shapes. This shape classification is invariant under global scaling of the surface. The missing scale (or size) information is captured by the curvedness measure. SI and CN maps of a sample individual are shown in Figs. 1(j) and (k), respectively. The SI successfully distinguishes cortical features such as sulci and gyri, and the CN gives the size of the folding.
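Eq. (4) is straightforward to evaluate per vertex once the principal curvatures are available; a minimal sketch (estimating the curvatures themselves from the surface mesh is outside its scope):

```python
import numpy as np

def shape_index_curvedness(k1, k2):
    """Shape index and curvedness of Eq. (4) for principal curvatures k1 <= k2.

    With k1 <= k2 the denominator is non-negative, so arctan2 reduces to the
    arctan of the ratio; umbilic points map to SI = +/-1 and flat points to 0.
    """
    k1, k2 = np.asarray(k1, dtype=float), np.asarray(k2, dtype=float)
    si = (2.0 / np.pi) * np.arctan2(k2 + k1, k2 - k1)
    cn = np.sqrt((k1 ** 2 + k2 ** 2) / 2.0)
    return si, cn

# A perfect saddle (k1 = -k2) gives SI = 0; a spherical cap (k1 = k2 > 0) gives SI = 1.
si, cn = shape_index_curvedness([-1.0, 0.5], [1.0, 0.5])
```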
2.4 Cortical Normalization
We used a surface-based 3-D cortical warping technique [7] to establish a dense point correspondence between anatomically homologous surface points of individual surfaces on a common reference surface coordinate system. This common coordinate system is defined by the template brain's cortical surface. The pairwise surface warping algorithm aims to align the geometric features of two cortical surfaces in a multi-scale framework using the sphere as a canonical joint coordinate system. Without requiring any manually identified landmark curves, this approach defines an anatomical homology between two cortical surfaces based on the similarity between geometric features densely defined at each cortical surface point. Therefore, each subject's cortical surface was spatially normalized with respect to the geometry of a representative template. Population averages as well as point-based statistical comparisons were then computed within the common cortical coordinate system of the template brain.
2.5 Statistical Analysis of the Cortical Shape Measures
The cortical shape measures (i.e., cortical thickness, geodesic sulcal depth, shape index, and curvedness) were computed for the 24 individuals in the study group. The shape measures were smoothed using a surface-based intrinsic isotropic smoothing filter of radius 10 mm. The smoothing was performed to reduce computational noise in the shape measures. The statistical analysis involved hypothesis testing to determine whether 3-D shape changes in the cerebral cortex could differentiate the Parkinson's-plus syndromes MSA and PSP from IPD. We tested the equality of group means for each shape measure using Student's t-test. The significance level of each individual test was corrected for multiple comparisons using the False Discovery Rate (FDR).
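As an illustration of this analysis step, the sketch below runs vertex-wise two-sample t-tests with Benjamini-Hochberg FDR control; it assumes SciPy, and the array shapes and function name are our own, not the authors' code.

```python
import numpy as np
from scipy import stats

def groupwise_fdr_test(measure_a, measure_b, q=0.05):
    """Per-vertex two-sample t-tests with Benjamini-Hochberg FDR control.
    measure_a: (n_subjects_a, n_vertices) smoothed shape-measure maps.
    measure_b: (n_subjects_b, n_vertices)."""
    t, p = stats.ttest_ind(measure_a, measure_b, axis=0)  # one test per vertex
    m = p.size
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m          # BH step-up thresholds
    below = p[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    significant = np.zeros(m, dtype=bool)
    significant[order[:k]] = True                  # reject the k smallest p-values
    return t, p, significant
```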
3 Results and Discussion
The mean difference maps on the template brain surface are shown in the first three columns of Fig. 2, and the regions with a significant mean difference (p < 0.05 after FDR correction) are highlighted in the last three columns of Fig. 2. The population average maps revealed different profiles of cortical thickness, sulcal depth, and curvedness across groups. Subtle yet significant differences in cortical folding shape (i.e., shape index) were found, particularly in the frontal lobe, when comparing PSP patients to IPD patients. GM thickness maps displayed tissue loss in the frontal lobe and motor cortex, reaching significance in the motor cortex region when comparing PSP to MSA patients. When comparing MSA and PSP to IPD separately for differences in sulcal depth and curvedness maps, we observed reversed patterns of significance. These were more pronounced in the MSA versus PSP significance maps, especially in the left frontal lobe area. Although our statistics on cortical thickness did not reveal a strongly distinguishing cortical atrophy pattern between the MSA, PSP, and IPD groups, our strong findings on the curvedness and sulcal depth measures in the frontal lobe suggest that atrophy of gyral WM in PSP patients is significantly different from that in MSA as well as IPD patients.
Fig. 2. Statistical analysis on (a) cortical thickness, (b) sulcal depth, (c) curvedness, and (d) shape index measures. Columns show population average maps for IPD, MSA, and PSP, followed by significant mean difference maps for IPD vs MSA, IPD vs PSP, and MSA vs PSP. Color coding for significance maps: Population 1 > Population 2 colored in red/yellow and Population 2 > Population 1 colored in blue/cyan (p < 0.05, corrected for multiple comparisons using the False Discovery Rate (FDR)).
We interpret these findings as indicative of increased sulcal atrophy in the frontal lobe in PSP patients when compared to either MSA or IPD. Our findings in Brodmann areas 4 and 6 (i.e., the primary motor and premotor cortex areas) and Brodmann area 8 (i.e., the frontal cortex area including the frontal eye field) agree with the cognitive decline in PSP patients that may, in part, be explained by the associated visual-motor deficit and frontal-lobe dysfunction [16]. These preliminary results confirm our hypothesis of distinct patterns of sulcal atrophy and GM tissue loss, demonstrating the potential of routine MRI and cortical morphometry for performing differential diagnosis in PSP, MSA and IPD. These findings encourage further investigation on a larger data set toward the development of a differential diagnosis framework. Our morphometry analysis benefits from the use of different shape measures in studying cortical atrophy. Future work includes systematic studies of how different shape measures correlate in measuring change in 3-D cortical geometry, and longitudinal evaluation of cerebral morphological changes.
References

1. Paviour, D.C., Price, S.L., Jahanshahi, M., Lees, A.J., Fox, N.C.: Longitudinal MRI in progressive supranuclear palsy and multiple system atrophy: rates and regions of atrophy. Brain 129(4), 1040–1049 (2006)
2. Oba, H., Yagishita, A., Terada, H., Barkovich, A.J., Kutomi, K., Yamauchi, T., Furui, S., Shimizu, T., Uchigata, M., Matsumura, K., Sonoo, M., Sakai, M., Takada, K., Harasawa, A., Takeshita, K., Kohtake, H., Tanaka, H., Suzuki, S.: New and reliable MRI diagnosis for progressive supranuclear palsy. Neurology 64(12)
3. Warmuth-Metz, M., Naumann, M., Csoti, I., Solymosi, L.: Measurement of the midbrain diameter on routine magnetic resonance imaging: a simple and accurate method of differentiating between Parkinson disease and progressive supranuclear palsy. Arch. Neurol. 58, 1076–1079 (2001)
4. Bhattacharya, K., Saadia, D., Eisenkraft, B., et al.: Brain magnetic resonance imaging in multiple-system atrophy and Parkinson disease: a diagnostic algorithm. Arch. Neurol. 59, 835–842 (2002)
5. Han, X., Pham, D.L., Tosun, D., Rettmann, M.E., Xu, C., Prince, J.L.: CRUISE: Cortical reconstruction using implicit surface evolution. NeuroImage 23(3) (2004)
6. Tosun, D., Rettmann, M.E., Resnick, S.M., Pham, D.L., Prince, J.L.: Cortical reconstruction using implicit surface evolution: Accuracy and precision analysis. NeuroImage (2005)
7. Tosun, D., Prince, J.L.: Cortical surface alignment using geometry driven multispectral optical flow. In: Christensen, G.E., Sonka, M. (eds.) IPMI 2005. LNCS, vol. 3565, Springer, Heidelberg (2005)
8. Ciofolo, C., Barillot, C.: Brain segmentation with competitive level sets and fuzzy control. In: Christensen, G.E., Sonka, M. (eds.) IPMI 2005. LNCS, vol. 3565, pp. 10–15. Springer, Heidelberg (2005)
9. Gilman, S., Low, P.A., Quinn, N., Albanese, A., Ben-Shlomo, Y., Fowler, C.J., Kaufmann, H., Klockgether, T., Lang, A.E., Lantos, P.L., Litvan, I., Mathias, C., Oliver, E., Robertson, D., Schatz, I., Wenning, G.K.: Consensus statement on the diagnosis of multiple system atrophy. J. Neurol. Sci. 163, 94–98 (1999)
10. Litvan, I., Agid, Y., Calne, D., Campbell, G., Dubois, B., Duvoisin, R.C., Goetz, C.G., Golbe, L.I., Grafman, J., Growdon, J.H., Hallett, M., Jankovic, J., Quinn, N.P., Tolosa, E., Zee, D.S.: Clinical research criteria for the diagnosis of progressive supranuclear palsy (Steele-Richardson-Olszewski syndrome): report of the NINDS-SPSP international workshop. Neurology 47, 1–9 (1996)
11. Sled, J.G., Zijdenbos, A.P., Evans, A.C.: A non-parametric method for automatic correction of intensity non-uniformity in MRI data. IEEE Trans. Medical Imaging 17, 87–97 (1998)
12. Griffin, L.D.: The intrinsic geometry of the cerebral cortex. J. Theoretical Biology 166(3), 261–273 (1994)
13. Beatty, J.: The Human Brain: Essentials of Behavioral Neuroscience. Sage Publications, Inc., Thousand Oaks (2001)
14. Prados, E., Soatto, S.: Fast marching method for generic shape from shading. In: Paragios, N., Faugeras, O., Chan, T., Schnörr, C. (eds.) VLSM 2005. LNCS, vol. 3752, Springer, Heidelberg (2005)
15. Koenderink, J.J., van Doorn, A.J.: Surface shape and curvature scales. Image and Vision Computing 10(8), 557–565 (1992)
16. Pillon, B., Dubois, B., Lhermitte, F., Agid, Y.: Heterogeneity of cognitive impairment in progressive supranuclear palsy, Parkinson's disease, and Alzheimer's disease. Neurology 36(9), 1179 (1986)
Tissue Characterization Using Fractal Dimension of High Frequency Ultrasound RF Time Series

Mehdi Moradi1, Parvin Mousavi1, and Purang Abolmaesumi1,2

1 School of Computing, Queen's University, Kingston, Canada
2 Department of Electrical and Computer Engineering, Queen's University
{moradi,pmousavi,purang}@cs.queensu.ca
Abstract. This paper is the first report on the analysis of ultrasound RF echo time series acquired using high frequency ultrasound. We show that variations in the intensity of one sample of RF echo over time are correlated with tissue microstructure. To form the RF time series, a high frequency probe and a tissue sample were fixed in position and the RF signals backscattered from the tissue were continuously recorded. The fractal dimension of the RF time series was used as a feature for tissue classification. Feature values acquired from different areas of one tissue type were statistically similar. For animal tissues with different cellular microstructure, we successfully used the fractal dimension of RF time series to distinguish segments as small as 20 microns with accuracies as high as 98%. The results of this study demonstrate that the analysis of RF time series is a promising approach for distinguishing tissue types with different cellular microstructure.
1 Introduction
Ultrasound-based tissue characterization techniques rely on the different scattering patterns of ultrasound in tissues with dissimilar cellular microstructures. Although the exact physical mechanisms that govern these patterns are not well understood [1], microstructure-induced differences in ultrasound-tissue interaction are documented both at clinical (2-10 MHz) frequencies [2] and at higher frequencies [1,3]. In other words, ultrasound Radio-Frequency (RF) echoes contain information about tissue characteristics. However, it is challenging to disentangle this information from the variations in the signal caused by system-dependent effects, such as the mechanical and electrical properties of the transducer and diffraction effects due to the finite aperture of the transducer. This fundamental restriction of ultrasound-based tissue characterization techniques limits their sensitivity and specificity in the diagnosis of cancer lesions [4,5]. In a new approach to the analysis of RF echoes for tissue characterization, we have recently proposed that if a specific location in tissue undergoes continuous interactions with ultrasound, the time series of the RF echo signals (see Figure 1) from that location carries "tissue characterizing" information [6,7]. In other words, although variations in the intensity of one sample of RF echo over time are partly due to the electronic noise of the ultrasound machine or the errors caused during
Fig. 1. RF time series: Sequential echoes received from one location of tissue
the beam-forming process [8], they depend on the tissue type as well. Specifically, the fractal dimension (FD) of the RF time series was successfully used to detect cancerous lesions in prostate tissue [7], as well as to distinguish different animal tissues [6]. These tissue-characteristic variations might be due to vibrations at the microstructural level induced by the continuous emission of ultrasound beams. However, more studies on the origin of this phenomenon are necessary. Specifically, in the current paper we report the results of our new research that addresses three fundamental questions about the RF time series: 1) Are the results of tissue classification based on the RF time series correlated with the microstructure of the tissue? In other words, is this approach more successful in separating tissue types with significantly different microstructures? 2) What is the effect of utilizing high frequency ultrasound on the outcome of the method? It is well known that at very high frequencies the scattering of ultrasound is primarily caused by the cellular microstructure [1] as opposed to tissue macrostructure. Therefore, the dependence of the FD of RF time series on cellular microstructure should be more evident in high frequency data. 3) Are the tissue characterizing properties of RF time series emphasized in A-mode? Some of the system-related alterations in the RF time series acquired in B-mode, including the beam-forming, are absent in A-mode imaging. To answer these questions, we analyzed RF echo time series acquired using high frequency ultrasound A-mode probes. Based on microscopic studies, we showed that at these frequencies, the separability of tissues based on the FD of RF time series is closely related to differences in tissue microstructures. We used the FD of the RF time series to successfully distinguish segments as small as 20 microns of animal tissues of dissimilar microstructures with accuracies as high as 98%. Furthermore, the FD values calculated from the RF time series of different tissues showed statistically significant differences, far beyond the variations in FD values within one tissue type. These experiments suggest the presence of microstructure-related information in the RF time series. Some pathologic conditions, including cancer, are characterized by a dramatic change in the microstructure of the affected tissue. Therefore, the proposed tissue characterization approach could lead to an effective method for the diagnosis of certain types of cancer [7]. The rest of this paper is organized as follows: Section 2 introduces the data
collection method and the feature extraction and classification approaches; Section 3 presents our results and discussion; and Section 4 provides a summary and conclusions.
2 Methods
To study the tissue characterizing capabilities of RF echo time series acquired at higher frequencies, we used four different tissue types: bovine liver, pig liver, bovine muscle, and chicken breast. As illustrated in Figure 2, the cellular structure of both bovine and pig liver is characterized by hepatocyte cells (of slightly different shape and density), whereas bovine muscle and chicken breast both have fibrous structures formed by sarcomeres. The high frequency ultrasound RF time series in this study were collected using a Vevo 770 high resolution ultrasound system (VisualSonics Inc., Toronto, Canada) with RMV706 and RMV711 scanheads (see Table 1 for the specifications). The depth of scanning was about 1 mm, which corresponded to 512 signal samples. While the tissue and the probe were fixed in place, we continuously acquired 500 ultrasound A-lines (frame rate: 60 fps) from one spot of the tissue. In other words, we formed a time series of length 500 from each sample of the ultrasound A-line. With each probe type, we acquired two separate lines of RF time series from two different areas of each tissue type.

Table 1. Specifications of the high frequency ultrasound scanheads

Model    Broadband frequency   Center frequency   Axial resolution
RMV711   up to 82.5 MHz        55 MHz             30 μm
RMV706   up to 60 MHz          40 MHz             40 μm
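A minimal sketch of how the recorded frames map to RF time series and 20-micron ROIs (our own illustration of the data layout; variable names are hypothetical):

```python
import numpy as np

def rf_time_series_rois(frames, roi_samples=10):
    """Turn repeated A-line frames into per-depth-sample RF time series and
    group adjacent samples into ROIs (10 samples ~ 20 microns here).
    frames: (n_frames, n_depth) array, e.g. 500 frames x 512 samples."""
    series = np.asarray(frames).T        # row j: time series of depth sample j
    n_rois = series.shape[0] // roi_samples
    return [series[i * roi_samples:(i + 1) * roi_samples] for i in range(n_rois)]
```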
2.1 Feature Extraction
In this study, tissue types were characterized by the average of the FDs computed for all the RF time series in a Region of Interest (ROI). Our high frequency data was acquired in A-mode; therefore, ROIs were simply segments of RF lines (each signal sample covered a depth of almost 2 microns). The FD of time series originating from natural processes has been extensively studied as a parameter that quantifies the nonlinear internal dynamics of complex systems [9,10]. In such systems, the mechanisms of interaction that give rise to the output time series are not well understood. FD has been shown to have low sensitivity to noise-induced variations [11]. In RF time series analysis, the microstructural information is received along with noise-related variations; therefore, we chose FD to characterize the RF time series. We used Higuchi's algorithm [12] for computation of the FD of time series, which can be summarized as follows: each sample of the RF data forms a time series {X(1), X(2), ..., X(N)}
Fig. 2. Images of the cellular structure of the tissue types used in this study — (a) bovine liver, (b) pig liver, (c) chicken breast, (d) bovine muscle — at 200X magnification (acquired from H&E stained slides with a Zeiss AxioImager M1 microscope)
over sequential ultrasound frames, where N = 500 for our high frequency RF data. From this time series, we first construct k new time series of the form:

X_k^m : {X(m), X(m + k), X(m + 2k), ..., X(m + ⌊(N − m)/k⌋ · k)}    (1)

where k is the sampling time interval (which determines the scale, k < N) and m = 1, 2, ..., k. Both m and k are integers. The length of each time series, L_m(k), is defined as:

L_m(k) = (1/k) · [ Σ_{i=1}^{⌊(N−m)/k⌋} |X(m + ik) − X(m + (i − 1)k)| ] · (N − 1)/(⌊(N − m)/k⌋ · k)    (2)

The average value of L_m(k) over the k sets, L(k), is the so-called length of the time series at scale k. This procedure is repeated for each k ranging from 1 to kmax. A line is fitted to the values of ln(L(k)) versus ln(1/k), and the slope of this line is taken as the FD. The number of samples, N, and the nature of the time series determine the optimal value of the parameter kmax. For the current study, the value of kmax was optimized based on the average classification accuracy obtained; we examined kmax values between 4 and 56. Feature extraction for each A-line involved computation of the FD of 512 time series of length 500. We call the output of this process an FD vector.
2.2 Classification
All classification results reported in the current paper were acquired with a Bayesian approach. If ω1 and ω2 represent ROIs from the two categories of tissue involved in one of our classification experiments, and x represents the feature value of a given ROI (whose category is unknown), Bayes' rule states that the classification can be performed based on the following inequality:

P(x|ω1) P(ω1) ≷ P(x|ω2) P(ω2)    (3)
P(ω1) and P(ω2) are the a priori probabilities (which can simply be calculated as the ratio of the number of ROIs in each category to the total number of ROIs in the two categories). P(x|ω1) and P(x|ω2) are the probability density functions (pdfs) of the feature values in categories 1 and 2, respectively. We fit a Gaussian pdf to the distribution of the feature in each category. We followed a leave-10%-out approach to validate the classification procedure. In other words, we randomly partitioned the data in each category into 10 folds. We evaluated the pdfs on 90% of the data samples, classified the remaining 10% based on the evaluated pdfs, and repeated the procedure for all 10 portions of the data. We repeated the whole leave-10%-out process 200 times (each time with a random partitioning of the ROIs into 10 folds). The mean accuracies and standard deviations reported in our results were recorded over these 200 trials.
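The classification and validation procedure can be sketched as follows (assuming 1-D FD features and NumPy; this mirrors, but is not, the authors' implementation):

```python
import numpy as np

def gaussian_bayes_classify(x, train1, train2):
    """Decide between two ROI categories via Eq. (3), with Gaussian pdfs
    fitted to the training features and priors from the class sizes."""
    def log_posterior(train):
        mu, sd = train.mean(), train.std(ddof=1)
        prior = train.size / (train1.size + train2.size)
        return -0.5 * ((x - mu) / sd) ** 2 - np.log(sd) + np.log(prior)
    return 1 if log_posterior(train1) >= log_posterior(train2) else 2

def leave_10_percent_out(feats1, feats2, trials=200, seed=0):
    """Repeated random 10-fold validation, as described above."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(trials):
        folds1 = np.array_split(rng.permutation(feats1), 10)
        folds2 = np.array_split(rng.permutation(feats2), 10)
        correct, total = 0, 0
        for f in range(10):
            tr1 = np.concatenate([folds1[i] for i in range(10) if i != f])
            tr2 = np.concatenate([folds2[i] for i in range(10) if i != f])
            for x in folds1[f]:
                correct += gaussian_bayes_classify(x, tr1, tr2) == 1
                total += 1
            for x in folds2[f]:
                correct += gaussian_bayes_classify(x, tr1, tr2) == 2
                total += 1
        accuracies.append(correct / total)
    return np.mean(accuracies), np.std(accuracies)
```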
3 Results and Discussions
FD vectors from the same tissue types: The first step in our analysis was to perform one-way ANalysis Of VAriance (ANOVA) tests on pairs of FD vectors from the same tissue types. ANOVA is a statistical test in which the null hypothesis is the equality of means in samples from two different populations. As Table 2 illustrates, when two FD vectors from the same tissue type were compared, the p-values in the ANOVA tests were relatively large, and the samples from the two lines could not be distinguished (classification accuracies close to 50%). The ROI size used for classification was 20 microns (10 samples) and Kmax = 16.

Table 2. Comparison of two FD vectors from two RF lines of one tissue type

Tissue type      ANOVA p-value (RMV711)   Accuracy in separating ROIs from the two lines on RMV711 — mean (STD)
Bovine liver     0.47                     52% (3.7)
Pig liver        0.007                    47% (3.9)
Chicken breast   0.0001                   59% (3.1)
Bovine muscle    0.68                     53% (4.3)
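For reference, the same-tissue comparison of Table 2 amounts to a one-way ANOVA on two FD vectors; a minimal SciPy sketch (with placeholder data in place of the real FD vectors):

```python
import numpy as np
from scipy import stats

# Synthetic placeholders for the two FD vectors of one tissue type.
rng = np.random.default_rng(0)
fd_line1 = 1.5 + 0.1 * rng.standard_normal(512)
fd_line2 = 1.5 + 0.1 * rng.standard_normal(512)
f_stat, p_value = stats.f_oneway(fd_line1, fd_line2)
print(f"ANOVA p-value: {p_value:.3f}")  # a large p-value suggests one population
```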
Table 3. Comparison and classification of data from different tissue types

Tissue types                     ANOVA p-value (RMV711)   Mean (STD), Res: 2μm (RMV711)   Mean (STD), Res: 20μm (RMV711)   Mean (STD), Res: 20μm (RMV706)
Bovine liver - chicken breast    0                        81.1% (2.5)                     92.2% (5.8)                      96.9% (3.5)
Bovine liver - bovine muscle     0                        84.1% (2.3)                     95.5% (4.0)                      93.7% (5.3)
Chicken breast - pig liver       0                        84.6% (2.3)                     96.0% (4.2)                      92.3% (5.3)
Pig liver - bovine muscle        0                        89.2% (2.1)                     98.2% (3.1)                      90.0% (6.4)
Bovine liver - pig liver         0                        73.7% (3.0)                     83.7% (7.1)                      65.1% (7.9)
Chicken breast - bovine muscle   5.7 × 10⁻¹³              64.1% (3.1)                     72.2% (8.9)                      63.1% (8.1)
Average                                                   79.5%                           89.6%                            83.2%
FD vectors from different tissue types (Kmax = 16): We performed ANOVA tests on FD vectors of different tissue types (Table 3). The p-values were all virtually zero and showed that the vectors were statistically different in all six pairs. Two separate FD vectors from each tissue type, computed from the data acquired on the RMV711 scanhead, were available. We combined the two vectors of each tissue type to acquire a single vector of length 1000 and used the Bayesian approach described in the previous section to perform pairwise classifications. The results of these classification trials, which were at single-RF-sample resolution, are reported in column 3 of Table 3. It is interesting to note that even at this resolution, we were successful in classification when the two involved tissue types were from different microstructural categories (rows 1–4); however, when pig liver was compared with bovine liver (row 5) or the two fibrous tissue types were compared (row 6), the classification at this extremely high resolution produced low accuracies. Furthermore, we examined the performance of our approach at a lower resolution. We averaged 10 samples of each FD vector to acquire vectors of length 50 (100 after combining the two lines from RMV711). Each element of these vectors represented an ROI of size 20 microns. The results of the pairwise classification experiments at this level of resolution are presented in column 4 of Table 3. In general, the classification accuracy is significantly higher at this lower resolution (overall accuracy of 89.6% on RMV711). The mean accuracy for tissues in different microstructural categories was around 95% (rows 1–4) and for tissues with similar microstructures was around 80% (rows 5–6). For validation purposes, the classification process (at 20 micron resolution) was repeated on a similar dataset that was acquired on scanhead RMV706 (which operates at a lower frequency and axial resolution). The results are reported in column 5 of Table 3. In general, the overall outcome declined in comparison with the RMV711 data (average over all: 83.2%). However, the same pattern of performance (excellent on different microstructures, moderate on similar microstructures) was observed. The overall decrease in the classification results can be explained by the lower axial resolution of RMV706.
Fig. 3. (a) The average classification accuracy over the six pairs of tissue types for different values of Kmax (at a resolution of 10 samples). (b) The average classification accuracy over the six pairs of tissue types for different numbers of samples in an ROI (Kmax = 16).
Optimal Kmax value: We examined different possible values for Kmax (the maximum scaling level of the signal) in Higuchi's algorithm. In Figure 3(a), the average accuracy of tissue classification over the six pairs of tissue types is plotted against values of Kmax between 4 and 56. Values between 10 and 32 resulted in very similar outcomes. The Higuchi algorithm becomes increasingly computationally expensive for large values of Kmax. We chose Kmax = 16 as a reasonably small number that also resulted in maximum accuracy. This is also in agreement with previous findings about the optimal Kmax value on RF time series acquired from human prostate specimens [7]. Optimal ROI size: As Figure 3(b) illustrates, the classification accuracy increased when ROIs of larger size were used. However, we were limited by the size of the dataset. Increasing the size of the ROI to over 10 samples meant that the Gaussian pdfs were estimated on fewer than 100 data points and tested on fewer than 10 points in our leave-10%-out classification approach. It appears that this low number of samples was not sufficient for training and testing the classifiers, and therefore we witnessed an unexpected and irregular decrease in the accuracy of classification for ROIs larger than 10 samples. Comparison with results at 6.6 MHz: As previously reported, even at frequencies normally utilized on clinical machines (2–10 MHz), the RF time series contain tissue characterizing information [7,6]. However, the maximum resolution is much lower. For comparison, we used a Sonix RP (Ultrasonix Inc., Vancouver, Canada) ultrasound machine to collect RF time series at 6.6 MHz from the same specimens that we had scanned at high frequencies. The temporal length of the time series (number of frames taken from each cross-section) was 255, and the data was collected with a BPSL9-5/55/10 probe at a rate of 22 frames per second. ROIs of size 8 × 44 RF samples (equivalent to 0.03 cm²) of the tissue were used in classification; 150 ROIs from each tissue type were available. The results reported in Table 4 show an overall accuracy of around 76.5%.
Table 4. Results of applying the proposed tissue classification approach to the data acquired on a clinical ultrasound machine (probe center frequency: 6.6 MHz)

Tissue types                       Classification accuracy (STD)
Bovine liver - chicken breast      82.9% (6.4)
Bovine liver - bovine muscle       80.7% (6.8)
Chicken breast - pig liver         71.4% (6.7)
Pig liver - bovine muscle          74.8% (7.5)
Bovine liver - pig liver           69.3% (5.3)
Chicken breast - bovine muscle     79.6% (5.9)
Average over all six tissue pairs  76.5%
4 Conclusions
In this paper we reported the exploitation of high-frequency RF time series for tissue characterization. We used the fractal dimension of RF echo time series, acquired with ultrasound probes operating at center frequencies of 55 MHz and 40 MHz, to successfully characterize tissue types of different microstructure at a resolution of only 20 microns. The correlation of variations in the RF time series with tissue microstructure was evident. FD vectors acquired from different areas of one tissue type were very similar (the Bayesian approach resulted in only around 50% successful separation of ROIs from two areas of one tissue type). For two different tissue types from a similar category of microstructure (both mammalian liver or both fibrous muscle), the ROIs could be separated with accuracies around 80%. For tissue types from different microstructural categories, the classification of ROIs based on the Gaussian approach was nearly perfect (up to 98% accuracy). The same approach was also applied to data acquired on a clinical ultrasound machine, and an average accuracy of around 77% was observed at a resolution of 0.03 cm². These findings strongly suggest that the microstructure of the tissue has an effect on the variations of the RF time series. This concept can potentially be used in ultrasound-based detection of pathologic conditions such as cancer. Acknowledgement. The authors would like to thank Mr. G. Leney from VisualSonics Inc. and Mr. R. Watering for their help in data collection. This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Institute of Robotics and Intelligent Systems (IRIS).
References

1. Foster, F.S., Pavlin, C.J., Harasiewicz, K.A., Christopher, D.A., Turnbull, D.H.: Advances in ultrasound biomicroscopy. Ultrasound in Med. & Biol. 26, 1–27 (2000)
2. Akashi, N., Kushibiki, J., Chubachi, N., Dunn, F.: Acoustic properties of selected bovine tissues in the frequency range 20-200 MHz. J. of Acoust. Soc. Am. 98(6), 3035–3039 (1995)
3. Goss, S.A., Johnston, R.L., Dunn, F.: Compilation of empirical ultrasonic properties of mammalian tissues. II. J. of Acoust. Soc. Am. 68, 93–108 (1980)
4. Scheipers, U., Ermert, H., Garcia-Schurmann, M., Sommerfeld, H.J., Senge, T., Philippou, S.: Ultrasonic multifeature tissue characterization for prostate diagnosis. Ultrasound Med. Biol. 20(8), 1137–1149 (2003)
5. Lizzi, F.L., Feleppa, E.J., Astor, M., Kalisz, A.: Statistics of ultrasonic spectral parameters for prostate and liver examination. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 44(4), 935–942 (1997)
6. Moradi, M., Mousavi, P., Isotalo, P.A., Siemens, D.R., Sauerbrei, E.E., Abolmaesumi, P.: A new approach to analysis of RF ultrasound echo signals for tissue characterization: results of animal studies. In: Proceedings of SPIE Conference on Medical Imaging, vol. 6513, pp. 65130P1–65130P10 (2007)
7. Moradi, M., Abolmaesumi, P., Isotalo, P.A., Siemens, D.R., Sauerbrei, E.E., Mousavi, P.: A new feature for detection of prostate cancer based on RF ultrasound echo signals. In: IEEE Ultrasonics Symposium, pp. 2084–2087. IEEE Computer Society Press, Los Alamitos (2006)
8. Thomenius, K.: Evolution of ultrasound beamformers. In: Proc. IEEE Intl. Ultrasonics Symp., pp. 1615–1622. IEEE Computer Society Press, Los Alamitos (1996)
9. Accardo, A., Affinito, M., Carrozzi, M., Bouquet, F.: Use of the fractal dimension for the analysis of electroencephalographic time series. Biological Cybernetics 77(5), 339–350 (1997)
10. Henderson, G., Ifeachor, E., Hudson, N., Goh, C., Outram, N., Wimalaratna, S., Del Percio, C., Vecchio, F.: Development and assessment of methods for detecting dementia using the human electroencephalogram. IEEE Transactions on Biomedical Engineering 53(8), 1557–1568 (2006)
11. Shono, H., Peng, C.K., Goldberger, A.L., Shono, M., Sugimori, H.: A new method to determine a fractal dimension of non-stationary biological time-serial data. Computers in Biology and Medicine 30(4), 237–245 (2000)
12. Higuchi, T.: Approach to an irregular time series on the basis of the fractal theory. Physica D 31(2), 277–283 (1988)
Towards Intra-operative 3D Nuclear Imaging: Reconstruction of 3D Radioactive Distributions Using Tracked Gamma Probes

Thomas Wendler1, Alexander Hartl1, Tobias Lasser1, Joerg Traub1, Farhad Daghighian2, Sibylle I. Ziegler3, and Nassir Navab1

1 Computer Aided Medical Procedures (CAMP), TUM, Munich, Germany
2 IntraMedical Imaging LLC, Los Angeles, California, USA
3 Nuclear Medicine Department, Klinikum rechts der Isar, TUM, Munich, Germany
Abstract. Nuclear medicine imaging modalities commonly assist in surgical guidance given their functional nature. However, when used in the operating room they present limitations. Pre-operative tomographic 3D imaging can only serve as a vague guidance intra-operatively, due to movement, deformation and changes in anatomy since the time of imaging, while standard intra-operative nuclear measurements are limited to 1D or (in some cases) 2D images with no depth information. To resolve this problem we propose the synchronized acquisition of position, orientation and readings of gamma probes intra-operatively to reconstruct a 3D activity volume. In contrast to conventional emission tomography, here, in a first proof-of-concept, the reconstruction succeeds without requiring symmetry in the positions and angles of acquisition, which allows greater flexibility. We present our results in phantom experiments for sentinel lymph node localization. The results indicate that 3D intra-operative nuclear images can be generated in such a setup with an accuracy equivalent to conventional SPECT systems. This technology has the potential to advance standard procedures towards intra-operative 3D nuclear imaging and offers a novel approach for robust and precise localization of functional information to facilitate less invasive, image-guided surgery.
1 Introduction
Nuclear medicine has become one of the most dynamic branches of medicine in today's diagnostic field. It is based on the use of radio-labeled, highly specific tracers that target functions in the body and can be imaged using gamma cameras, SPECT (single photon emission computed tomography) or PET (positron emission tomography) [1, 2]. In the particular case of SPECT, a 3D gamma-emitting volume is reconstructed from several radial 2D orthographic projections acquired with gamma cameras [2]. As in CT (computed tomography), the 3D volume reconstruction requires a complete set of projections and cylindrical symmetry among them. This and further issues like bulky equipment (≈ 2 × 2 × 3 [m³]), resolution (≈ 5 [mm]) and acquisition time (≈ 20 [min] per bed position) make 3D nuclear imaging systems unsuitable for intra-operative use, so they are mostly restricted to diagnosis and planning.
In order to overcome these limitations, hand-held non-imaging radioactivity detectors like gamma probes were introduced [3]. These are common diagnostic devices nowadays, especially during surgery and sentinel lymph node determination [4]. The main advantages of these devices lie in their portability (≈ 200 [g], ≈ 1 × 1 × 10 [cm³]), simplicity, and the possibility of miniaturizing them for intra-operative use. Hand-held gamma cameras have also entered the field recently [5]; however, the restrictions imposed by size (≈ 10 × 10 × 50 [cm³]) and weight (≈ 3 [kg]) limit their usefulness. The combination of intra-operative nuclear devices with position and orientation ('pose') tracking has extended the use of this technology further. Wendler et al. introduced tracking of beta-probes for activity surface reconstruction, visualization and intra-operative guidance [6]. The beta radiation considered there was emitted from superficial tissue, and consequently, only an activity surface reconstruction was proposed. Benlloch et al. proposed tracking of 2D gamma cameras and the use of 2D acquisitions for limited 3D intra-operative functional imaging [7]. In that approach the 3D position is reconstructed from two 2D images taken with an angle close to 90 degrees or three 2D images with relative angles close to 120 degrees. However, these constraints greatly reduce the flexibility of this approach due to the size of the devices and the requirements on the acquisition angles. Moreover, the detectable information is limited to point-like sources, since a computer vision approach based on triangulation of corresponding points is taken for the 3D reconstruction. In this work we introduce a novel approach employing tracked gamma probes and algorithms for 3D tomographic reconstruction based on gamma readings and the synchronized 3D pose of the detector. Thus, most of the limitations complicating intra-operative nuclear imaging are removed. This approach includes the advantages of previous systems and surpasses them by allowing easier handling, faster acquisition times and, most importantly, 3D intra-operative functional imaging of general distributions. Applications like partial lymphadenectomy and sentinel lymph node detection [8, 9] will greatly profit from the additional flexibility and the 3D nature of this kind of imaging. The added depth information allows detection of occlusions, aiding the resection of lymph nodes around a tumor, and increases the possibility of identifying additional marked nodes not visible in pre-operative images, which are difficult to distinguish using conventional gamma probes or cameras (which additionally suffer from accessibility issues). Thus the proposed imaging modality will permit better intra-operative control of the operation, yielding less invasive and more reliable procedures.
2 Materials and Methods

2.1 Hardware Components
Image acquisition is facilitated by a setup comprising a standard, collimated gamma probe together with an optical tracking system, as outlined in figure 1; a detailed description can be found in [10].
Fig. 1. The system setup consists of a 4-camera optical tracking system (exemplary camera marked as A), a gamma probe with a custom-made collimator and infrared markers to facilitate tracking (B) attached to a control unit (C), sending the data to a central workstation which also gathers the tracking data. A foam phantom with injected radioactivity (D) was placed inside the abdomen phantom (E), again with attached infrared markers. A tracked laparoscope (F) is used to generate an augmented reality visualization.
For validation, PET/CT images of the phantoms were generated with a Biograph 16 PET/CT scanner by Siemens Medical Solutions (Erlangen, Germany). Intra-operative 3D visualization as well as data synchronization was implemented by extending a framework for medical augmented reality systems [11]. A tracked and calibrated camera (Telecam SL NTSC by Karl Storz, Tuttlingen, Germany) was also added to the setup for augmented reality visualization. The calibration of this camera was done using the procedure described in [12]. The visualization itself includes the rendering of the pre-operative PET/CT onto the registered image of the camera using 3D textures, where each voxel is rendered with a color and transparency as a function of its value. The most adequate choice of visualization was beyond the scope of this work; however, it is the subject of evaluation within our group [13]. During the scan, the positions of the acquired measurements are also augmented onto the scene as a point cloud (figure 2(a)), color-coded according to the measured activity (no activity in green, red for positions with activity). For better 3D perception the points can optionally be displayed as vectors, adding the missing orientation component to the visualization. Afterwards, a volume rendering of the reconstructed volume is overlaid onto the image (figure 2(b)). The user can acquire more points at any time if the reconstructed image is not of the desired quality. The different reconstruction methods may be switched on-the-fly.
2.2 Modelling and Reconstruction
The reconstruction problem in emission tomography consists of determining the activity distribution in a finite volume in space based on the readings of a sensor
Fig. 2. Augmented reality views: (a) green and red points mark the acquisition path of the gamma probe (red for active, green for inactive), with the reconstruction grid marked in cyan; (b) the reconstructed activity distribution marked in yellow.
array, for example a SPECT device. Let xj, j = 1, ..., M, denote an equidistantly spaced discrete 3D grid of voxels, where each voxel xj has a total activity of fj. Further let gi denote the reading of each sensor i in the array, i = 1, ..., N. As is standard practice, we assume that each reading gi is a linear function of the activity fj of all voxels xj; in short,

gi = Σ_{j=1}^{M} Hij fj.    (1)
For a given sensor i, Hij denotes a vector correlating the effect of the activities fj at positions xj, j = 1, ..., M, to the sensor reading gi. The entire matrix Hij is also known as the system matrix, or as the forward model describing radiation propagation through the volume to the detector positions. The process of reconstruction then solves the ill-posed problem of determining the set (fj)j=1,...,M given the readings of the N sensors, or in terms of equations, the solution of the linear equation system formed by equation (1) for i = 1, ..., N. In the case of intra-operative gamma probes, each probe readout gi is accompanied by a position and orientation vector pair (pi, d̂i) provided by the tracking system. These readings can be considered independently, assigning to each of them its own row in the system matrix. Hij represents in this case the influence of the activity in voxel j on the readout of the probe at the pose (pi, d̂i). Care has to be taken to ensure a synchronized readout and that the coordinate systems of the tracked probe and the reconstruction grid are properly co-calibrated. To determine the system matrix (Hij) a forward model has to be developed, describing the propagation of gamma radiation through space and characterizing the detection process at the sensor. In our model we consider the field of view and the sensitivity of the probe, the stochastic nature of radioactive decay and detection, as well as the absorption in the detector, geometrical attenuation and
background noise. More details of this modelling process can be found in [14]. To determine the unknown variables, a set of measurements was acquired at different distances and positions from the probe, and the parameters were fitted to these measurements using a best-neighbor optimizer. To solve the ill-posed equation system (1), many algorithms have been proposed for SPECT and PET [2]. In our implementation we used both algebraic (randomized algebraic reconstruction technique, ART) and stochastic approaches (maximum-likelihood expectation maximization, ML-EM) to obtain an approximated solution. We also evaluated other numerical tools such as SVD with Tikhonov regularization [15] to approximately solve the linear system.
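To make the inversion step concrete, a minimal sketch of the ML-EM update for the system of Eq. (1) follows. The system matrix H would be filled by evaluating the fitted forward model at each tracked probe pose; note this is the textbook multiplicative update (which needs a positive initial estimate), not the authors' exact implementation.

```python
import numpy as np

def ml_em(H, g, n_iter=14, eps=1e-12):
    """ML-EM for g = H f (Eq. (1)). H: (N, M) system matrix mapping voxel
    activities to probe readings; g: (N,) synchronized probe readouts."""
    f = np.ones(H.shape[1])              # positive start required by EM
    sensitivity = H.sum(axis=0) + eps    # back-projection of unit readings
    for _ in range(n_iter):              # 14 iterations, as chosen in the text
        projection = H @ f + eps         # forward-project the current estimate
        f *= (H.T @ (g / projection)) / sensitivity
    return f
```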
3 Experiments
To validate the developed theory, a set of phantom experiments was performed. A foam phantom of an organ was injected at 4 positions with a double-marked solution containing Tc-99m and F-18 (50 [kBq] and 20 [kBq] respectively per milliliter) to simulate marked lymph nodes as one would expect in a sentinel lymph node localization procedure. F-18 was used to enable imaging with PET/CT and did not influence the gamma probe readings (sensitivity of 5.6 [cps/MBq] for a point source of F-18 at a distance of 5 [cm] along the axis of the probe). After acquiring a PET/CT (5 [min] per bed position) of the phantom, several gamma probe scans were performed for each experiment by three different test persons. Slight movement and deformation of the phantom during the scan was permitted to introduce realistic artifacts. The test persons were able to do reconstructions on-line to assess the quality of the generated images and plan further acquisition paths. The points already acquired were also augmented onto the image for this reason. The voxel size used was 5.3 × 5.3 × 6 [mm³] (similar to that of a conventional SPECT) for a volume of 20 × 7 × 17 (width × height × depth). The numbers of iterations for ML-EM and ART were fixed empirically to yield qualitatively good results (14 and 20 iterations, respectively) and the initial guess was set to 0.
4 Results
The experiments yielded promising results. The qualitative comparison of the reconstructed images with PET is satisfactory to the extent that the reconstructions may be considered for intra-operative image guidance (figures 3(a) vs. 3(b)). The quantitative comparison using normalized cross-correlation and the deviation of the centroids of the detected blobs from the ones visible in PET is outlined in Table 1. These measurements include not only the reconstruction error (when solving the linear system) but also the registration error of the probe and the reconstruction grid as well as tracking inaccuracy, and as such may be used as a system performance measure. The variant of randomized ART employed did not perform well as an inversion routine for our forward model, both quantitatively and qualitatively, as seen
Fig. 3. Comparison of one reconstructed slice (a) and the corresponding PET slice (b). ML-EM was used for inversion; the acquisition consisted of 5198 readings (≈ 6 [min]).

Table 1. Evaluation of the quality of the reconstructed images versus PET for different inversion schemes. The first column contains the normalized cross-correlation of both modalities; the remaining columns display the mean deviation in [mm] of the centroids of the detected objects in the reconstruction from the ones visible in PET. Values are given as mean ± standard deviation over the experiments.

Method   NCC             Deviation in x   Deviation in y   Deviation in z
ART      0.252 ± 0.199   2.686 ± 9.020    3.488 ± 8.836    1.251 ± 5.570
ML-EM    0.512 ± 0.077   0.778 ± 2.973    3.502 ± 3.674    2.717 ± 4.174
SVD      0.521 ± 0.014   0.167 ± 4.775    2.017 ± 3.798    6.136 ± 4.156
in Table 1. Reconstructions using regularized SVD inversion yield good quantitative results, while being visually adequate. Our clear method of choice is ML-EM. This algorithm is fast (less than 20 seconds on an up-to-date standard PC) and the reconstructions evince good quantitative and qualitative performance. Deviations in the y axis (dorsal direction) are more marked than in the other dimensions. This is due to the nature of the acquisition, where readings are mainly acquired on top of the abdomen phantom and thus lack the projections needed to better resolve the dorsal axis. Overall, however, deviations are on the order of the voxel size of the reconstruction grid.
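The two quality measures of Table 1 can be sketched as below (assuming SciPy; the 50% blob threshold and the single-centroid matching are our simplifications of the per-blob evaluation):

```python
import numpy as np
from scipy import ndimage

def compare_to_pet(recon, pet, thresh=0.5):
    """Normalized cross-correlation of two volumes and the centroid
    deviation of their thresholded activity blobs (in voxel units)."""
    r = (recon - recon.mean()) / recon.std()
    p = (pet - pet.mean()) / pet.std()
    ncc = (r * p).mean()
    c_recon = ndimage.center_of_mass(np.where(recon > thresh * recon.max(), recon, 0))
    c_pet = ndimage.center_of_mass(np.where(pet > thresh * pet.max(), pet, 0))
    return ncc, np.subtract(c_recon, c_pet)
```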
5 Discussion
Gamma probes have been used for sentinel lymph node determination for years [3, 4]. The high sensitivity and specificity values achieved speak of a robust technique, especially for melanoma and breast cancer [8, 9]. Hand-held gamma cameras enhance this technique even further by adding imaging [5]. The inclusion of 3D imaging thus would not improve the standard technique significantly in terms of sensitivity and specificity. However, we believe that the major
contribution of this work is the gain of full 3D perception of the node distribution and thus the capability of precise localization in depth, which would be almost impossible with the current technology. This is especially true in the case of a partial lymphadenectomy, where current technology only allows a rough distinction of the affected nodes, resulting in suboptimal resection and higher morbidity. The presence of background activity is an issue to be investigated in the future, mostly for molecular markers like F-18-FDG. Here, most probably, the sparse information would not be enough to guarantee a valid reconstruction that is comparable to pre-operative imaging systems. A solution for this could be the use of compressed sensing approaches like the ones proposed in [16]. In the case of the proposed applications (partial lymphadenectomy and sentinel lymph node localization), this however does not play any role, since the marking is achieved by injecting radioactivity before resection, which validates the assumption that at a certain instant in time only downstream lymph nodes will present radioactive uptake (several hot spots and almost no background). The inclusion of tracking in the operating room should not change the workflow dramatically, nor add much complexity. A relevant issue is the robustness of optical tracking systems, mostly in terms of occlusion problems. We believe that this will not be a major issue in this application if several cameras are placed in the operating room. Furthermore, the scan can be performed by one person while the surgical team remains at a proper distance from the patient, thus avoiding occlusions and guaranteeing better tracking. The influence of the proposed changes on the workflow is the subject of current research. In regard to the patient dose, our system would require no activity beyond that used for radio-guided lymphadenectomy or sentinel lymph node localization (≈ 2 [mSv] and < 1 [mSv] respectively, versus the 5–7 [mSv] of pre-operative imaging). We believe that this burden can be neglected when considering the improvement in therapy we aim at. The reconstructed images are valid as long as the reconstructed region containing the activity does not move or deform. The intra-operative, on-the-fly nature of the reconstruction makes it valuable for correcting and deforming the pre-operative imaging data and thus updating the surgical plan. This will enable more precise intra-operative localization. The imaging process is short and can be repeated as many times as needed. A final issue is the fact that the reconstruction obtained shows a blurring of the activity blobs in the dorsal direction. This effect can be explained by the missing information along that axis in the reconstruction. However, since the blobs can clearly be recognized in the images and their upper border is placed correctly when compared with the PET images, we strongly believe that the images are sufficient for precise localization in the suggested applications, which was previously not possible at all. The overall accuracy of the reconstructions is within theoretical system limits and on par with conventional SPECT systems
in terms of resolution and contrast. Determining whether the current implementation meets the specifications of particular clinical applications is part of our ongoing work.
6 Conclusions
This paper presents an approach towards intra-operative 3D nuclear imaging employing a tracked gamma probe. The resulting reconstruction accuracy is comparable to SPECT and suggests further development and clinical evaluation of this technique. In addition, further advantage can be taken of the tracking system by using it to track surgical instruments, enabling navigation within the same coordinate system as the reconstruction. For the proposed applications in particular, this would enable guided biopsy of the sentinel lymph nodes and precise node resection in partial lymphadenectomy.
References

1. Phelps, M.E.: PET: The merging of biology and imaging into molecular imaging. J. Nuc. Med. 41, 661–681 (2000)
2. Wernick, M.N., Aarsvold, J.N.: Emission Tomography: The Fundamentals of PET and SPECT. Academic Press, London (2004)
3. Hoffman, E.J., et al.: Intraoperative probes and imaging probes. Eur. J. Nucl. Med. Mol. Imaging 26, 913–935 (1999)
4. Harish, K.: Sentinel node biopsy: concepts and current status. Front. Biosci. 10, 2618–2644 (2005)
5. Pitre, S., et al.: A hand-held imaging probe for radio-guided surgery: physical performance and preliminary clinical experience. Eur. J. Nucl. Med. Mol. Imaging 30, 339–343 (2003)
6. Wendler, T., et al.: Navigated three dimensional beta probe for optimal cancer resection. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 561–569. Springer, Heidelberg (2006)
7. Benlloch, J.M., et al.: The gamma functional navigator. IEEE Trans. Nucl. Sci. 51, 682–689 (2004)
8. Focht, S.L.: Lymphatic mapping and sentinel lymph node biopsy. AORN J. 69, 802–809 (1999)
9. Reintgen, D., et al.: Lymphatic mapping and sentinel lymph node biopsy for breast cancer. Cancer J. 8 (Suppl. 1), 15–21 (2002)
10. Wendler, T., et al.: Real-time fusion of ultrasound and gamma probe for navigated localization of liver metastases. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, Springer, Heidelberg (2006)
11. Sielhorst, T., et al.: CAMPAR: A software framework guaranteeing quality for medical augmented reality. International Journal of Computer Assisted Radiology and Surgery 1 (Suppl. 1), 29–30 (2006)
12. Feuerstein, M., et al.: Automatic patient registration for port placement in minimally invasive endoscopic surgery. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 287–294. Springer, Heidelberg (2006)
Towards Intra-operative 3D Nuclear Imaging
917
13. Traub, J., et al.: Hybrid navigation interface for orthopedic and trauma surgery. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4190, pp. 373–380. Springer, Heidelberg (2006)
14. Hartl, A.: Gamma-probe modeling for reconstruction. Technical report, Computer Aided Medical Procedures (CAMP), TUM, Munich, Germany (2007)
15. Hansen, P.C.: Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion. SIAM, Philadelphia (1998)
16. Wakin, M., et al.: An architecture for compressive imaging. In: Proc. ICIP 2006, Atlanta, GA (2006)
Instrumentation for Epidural Anesthesia

King-wei Hor1, Denis Tran1, Allaudin Kamani2, Vickie Lessoway3, and Robert Rohling1

1 Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC, Canada {kingh, denist, rohling}@ece.ubc.ca
2 Department of Anesthesia, B. C. Women's Hospital, Vancouver, BC, Canada
3 Department of Ultrasound, B. C. Women's Hospital, Vancouver, BC, Canada
Abstract. A low-cost, sterilizable and unobtrusive instrumentation device was developed to quantify and study the loss-of-resistance technique in epidural anesthesia. In the porcine study, the rapid fall of the applied force, plunger displacement and fluid pressure, and the oral indication of the anesthesiologists were shown to be consistent with the loss-of-resistance. A model based on fluid leakage was developed to estimate the pressure from the force and displacement measurements, so that the pressure sensor could be omitted in human studies. In both human (in vivo) and porcine (in vitro) subjects, we observed that the ligamentum flavum is less amenable to saline injection than the interspinous ligament.
1 Introduction
Epidural anesthesia (or epidural) is an important and widely accepted analgesia technique in obstetrics to effectively alleviate labor pain. To facilitate the delivery of the local anesthetic, a catheter is inserted through a needle into the epidural space, a narrow space surrounding the dura mater within the spinal column. A widely accepted method known as the loss-of-resistance technique is used to indicate entry of the needle tip into the epidural space, located anterior to the ligamentum flavum. As the needle advances through the supraspinous ligament, interspinous ligament and then the ligamentum flavum, the anesthesiologist continuously feels a high resistance to injection of saline or air. Upon entry into the epidural space, the ease of injection causes the anesthesiologist to feel the loss-of-resistance, and needle advancement is then halted. Like all other obstetric interventions, epidural anesthesia involves risks. Complications can include backache, headache, shivering, hypotension, bladder dysfunction and inadequate pain relief. Rarer are inadvertent dural puncture, fetal distress, neurologic injury, cardiac arrest, allergic shock and maternal death. Although the use of conventional epidurals has increased over a few decades, the technique continues to have a failure rate in the range of 6–25% [1,2]. One study shows a success rate of 60% after 10 attempts and 84% after 60 attempts [3]. Epidurals are considered more difficult than other regional anesthetic techniques [4]. Although residents can practice on cadavers or simulated tissues
and ligaments, none provide accurate haptic feedback [5]. Much of the experience is gained by performing epidural anesthesia on actual human patients, but this involves patient risk. Furthermore, the risk of complication is increased by anatomical variations arising from patient variability (such as age, height, BMI and ethnicity). Improving the learning curve while avoiding patient risks would be beneficial, so there have been attempts to construct epidural simulators [5,6]. These simulators provide force feedback with sub-optimal realism [5] and have not found wide acceptance. This may be due to subtleties and dynamic interactions that exist only while performing the actual epidural on human subjects in vivo. Anesthesiologists continue to rely on the loss-of-resistance technique as the only feedback mechanism to indicate entry into the epidural space, but the technique is not completely reliable [7]. There are no external physical characteristics of the patient that can provide information on the exact location of the epidural space. Having to rely solely on this technique means the patient is exposed to all its associated risks. Recently, ultrasound has been used in a limited fashion to help visualize the involved anatomy, but further scientific evaluation and validation is required [8]. Since the loss-of-resistance is a crucial technique, there is a need for good instrumentation. Specifically, the first goal is to determine whether the loss-of-resistance can be measured accurately with unobtrusive sensors. Another goal is to determine whether any tissue properties can be derived from the measurements. The overall goal is to gain a deeper understanding of the loss-of-resistance method, which may help further research on more accurate simulators and computer models. To quantifiably detect the loss-of-resistance, the force of the thumb acting on the plunger of the syringe Fa, the displacement of the plunger relative to the barrel D, and the pressure of the saline fluid P were instrumented, as shown in Figure 1. These measured quantities were used to investigate the influence of the tissue type in both laboratory and clinical settings. Ultrasonography was used to validate the loss-of-resistance technique by comparing the depth of the depicted epidural space to the length of the inserted portion of the needle. Consistency between the location of the epidural space measured in the ultrasound image and that found by the loss-of-resistance technique is essential for furthering fundamental research towards the goal of real-time ultrasound-assisted guidance with sensory feedback.
2 Methods

2.1 Instrumentation
Fig. 1. The loss-of-resistance was instrumented by using three sensors that measured the applied force Fa, the plunger displacement D and the saline fluid pressure P

Instrumenting the three physical quantities, Fa, D and P, required three individual sensors. The SLB-25 force sensor (Transducer Techniques, Temecula, CA) was used to measure Fa, the CSPR IP65 displacement sensor (MTS Systems, Cary, NC) was used to measure D, and the PX302 pressure sensor (Omega Engineering, Stamford, CT) was used to measure P, as illustrated in Figure 1. A custom-built stainless steel harness, fitted to the anesthesiologist's thumb, was used to mount the force sensor. The pressure sensor was connected to a three-way stopcock that was attached to the needle seat of the syringe. The ring magnet of the magnetostrictive displacement sensor was attached to the plunger and its transducer rod was attached to the barrel by a custom-built stainless steel harness. The ring did not touch the rod, so friction was negligible and the anesthesiologist retained the full feeling of loss-of-resistance. Glass syringes (JH-0550 Epidural Catheterization Kit, Arrow International, Reading, PA) were used for the studies. The Q8 data acquisition board (Quanser, Markham, ON) was used to capture the sensor signals to a PC at a sampling period of 0.01 s (100 Hz). For calculations, a moving average filter with an interval size of 0.1 s was used to remove the majority of the noise.
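To make the preprocessing concrete, here is a minimal sketch of the 0.1 s moving-average filter applied to a 100 Hz sensor stream; the function and variable names, and the use of NumPy, are our illustrative assumptions rather than the authors' acquisition software.

```python
import numpy as np

def moving_average(signal, fs=100.0, window_s=0.1):
    """Smooth a sampled sensor signal with a moving-average filter.

    fs       : sampling frequency in Hz (0.01 s period, as above)
    window_s : averaging interval in seconds (0.1 s, as above)
    """
    n = max(1, int(round(window_s * fs)))   # samples per window
    kernel = np.ones(n) / n
    return np.convolve(signal, kernel, mode="same")  # aligned with input

# Example with a simulated noisy force trace sampled at 100 Hz
t = np.arange(0.0, 30.0, 0.01)
fa_raw = 3.0 + 0.2 * np.random.randn(t.size)
fa_smooth = moving_average(fa_raw)
```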
2.2 Modeling
Three models relating the measurement variables were investigated: the static, dynamic and decay models. Each model incorporates different physical and empirical properties to relate the pressure to the force and displacement measurements. The static model describes a non-dynamic system with no motion or fluid flow. The fundamental relationship describing the pressure P(t), varying over time t, of an incompressible static fluid is P(t) = F(t)/A, where F(t) is the force acting on the fluid over an area A. For the epidural syringe, it was observed that some of the force exerted by the thumb was lost through several factors such as friction, viscosity and off-axis force. Therefore, a coefficient ka was introduced to account for such losses. Hence, the equation for the static model is

$$P(t) = k_a \frac{F_a(t)}{A} \qquad (1)$$
The dynamic model accounts for fluid motion since fluid is continuously injected into the tissue to detect the loss-of-resistance. Given that the ratio of the cross-sectional areas of the barrel and needle is approximately 19, and the speed of the plunger is 10 mm/s (far exceeding speeds observed in epidurals), Bernoulli's
equation and the continuity of flow equation for the fluid flowing through two connecting tubes imply that the difference in pressure is approximately 20 Pa. The resulting pressure difference is relatively small and not measurable by the instrumentation. Therefore, it was assumed that the pressures as the fluid flows from the barrel to the needle and through other cylindrical connections were approximately constant, and the pressure losses from dynamic flow were omitted from modeling. The decay model includes fluid leakage at the plunger–barrel interface. Two cases were examined: a stationary and a moving plunger. When the plunger was stationary, the pressure, caused by Fa, was observed to decay exponentially from the initial pressure (at the time when the plunger stopped moving). Since the plunger was motionless, small changes in Fa did not affect the initial pressure because they were countered by static friction. If the change in Fa was large, it caused the plunger to move. When the plunger was in motion, the pressure did not decay (although some leakage still occurred) because it was continually and directly affected by Fa. Thus, the pressure is expressed as

$$P(t) = \begin{cases} k_a \frac{F_a(t)}{A} & \text{if } dD/dt \neq 0 \\ k_a \frac{F_a(t_i)}{A}\, e^{-(t - t_i)/\tau} & \text{if } dD/dt = 0 \end{cases} \qquad (2)$$

where τ is the exponential time constant and ti is the time at which the plunger stops moving. The plunger was considered stationary if the speed from the displacement profile was less than 0.18 mm/s, a threshold chosen to be just beyond the noise level of the displacement sensor. Both ka and τ were determined empirically in bench tests by measuring the pressure values for a set of constant forces: ka was found to be 0.900 ± 0.005 and τ was determined to be 23 ± 8 s. Although a more sophisticated model may be derived, this model addresses the main characteristics of the glass syringe, which was designed specifically for this low-friction application.
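The following sketch shows how measured force and displacement can be combined under Equations 1 and 2 to estimate the pressure. The constants and the stationarity test follow the values quoted above, while the plunger area, function and variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def estimate_pressure(fa, disp, dt=0.01, area=2.0e-4, ka=0.900, tau=23.0,
                      v_thresh=0.18e-3):
    """Estimate fluid pressure (Pa) from force (N) and displacement (m).

    area     : plunger cross-sectional area in m^2 (an assumed value)
    v_thresh : stationarity threshold, 0.18 mm/s as in the text
    """
    speed = np.gradient(disp, dt)
    p = np.empty_like(fa)
    p0, t0 = None, None
    for k in range(fa.size):
        if abs(speed[k]) >= v_thresh:       # plunger moving: Equation 1
            p[k] = ka * fa[k] / area
            p0, t0 = None, None
        else:                               # plunger stationary: Equation 2
            if p0 is None:                  # remember pressure at stop time
                p0, t0 = ka * fa[k] / area, k * dt
            p[k] = p0 * np.exp(-(k * dt - t0) / tau)
    return p
```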
2.3 Porcine Study
Epidurals were performed by an experienced anesthesiologist in a manner consistent with clinical practice. The subject was a pig (Sus scrofa domestica) that had been culled and prepared according to guidelines for human consumption on the same day as the experiments. The hold on the syringe was slightly different to compensate for the instrumentation device, but the loss-of-resistance technique remained unchanged. Ten trials were performed and the punctures took place either between the L2-L3 or L3-L4 vertebrae. During the procedure, an operator monitored the device and acquired the data. When the loss-of-resistance was felt, the anesthesiologist immediately communicated it orally to the operator so that the time was recorded. Once the needle breached the epidural space, it was marked at the surface of the puncture and ultrasound (GE Voluson 730 Pro, GE Healthcare, Chalfont St. Giles, Buckinghamshire, UK) was used to image the needle and the epidural space. The software-based ruler was used to measure the puncture path length, the distance between the base of the puncture and
the tip of the needle, in the ultrasound image. A caliper was used to measure the actual puncture path length (the mark to the needle tip). The loss-of-resistance was determined by the times of the minimum slopes of the pressure and displacement measurements. The force profiles near the loss-of-resistance tended to vary depending on the anesthesiologist's actions, so the loss-of-resistance was calculated by averaging the time between 90% of the local extrema. The paired Student's t-test (α = 0.05) was used to compare all three times obtained from the force, displacement, and pressure profiles. The mean time of the three estimated times was compared with the time verbally indicated by the anesthesiologist. The physical models (Equations 1 and 2) were used to estimate the pressure from the force and displacement measurements. The mean error and standard deviation between the estimated and actual pressures were calculated. The paired Student's t-test (α = 0.05) was performed on the ten paired depth measurements (ultrasound and actual needle), and the mean and standard deviation were calculated. Just prior to the loss-of-resistance, two regions were also observed (see Section 3) in the displacement profile: a sloped region indicating the needle was in the interspinous ligament and a near-flat region indicating the needle was in the stiffer ligamentum flavum. The mean flow rate, the mean and maximum applied force, the mean and maximum actual pressures, and the mean and maximum calculated pressures were calculated over all trials for each region. In a second, smaller study, the epidural space depth was directly estimated by first manually identifying the epidural space in the ultrasound image without the needle. The depth was then measured vertically to the skin surface (shortest distance) since the needle path was unknown. That depth was compared to a depth indirectly determined by estimating the length of the needle tip to the skin surface using the puncture path length and its angle in the ultrasound image. Three trials were performed for each of the L3-L4 and L4-L5 interspaces on a second pig. The mean and standard deviation were used for comparison.
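As a sketch of how the loss-of-resistance time can be extracted from a profile, the snippet below finds the time of the minimum slope of a smoothed signal; this is our illustrative reading of the procedure described above, not the authors' analysis code.

```python
import numpy as np

def lor_time(profile, dt=0.01):
    """Time (s) of the steepest drop in a smoothed profile.

    The loss-of-resistance appears as a rapid fall in the pressure and
    displacement profiles, so the minimum of the first derivative gives
    a simple estimate of its time.
    """
    return np.argmin(np.gradient(profile, dt)) * dt
```

The three per-profile estimates can then be averaged and compared with the time verbally indicated by the anesthesiologist.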
2.4 Human Study
The clinical study was performed on eleven consenting pregnant women who were in labor or prior to Cesarean section.¹ The epidural was performed using the instrumentation device without the pressure sensor, to avoid contamination. Sterility was maintained by wearing the force sensor under a sterile glove and covering the displacement sensor with a sterile drape prior to mounting it on the sterilized harness attached to the syringe barrel. The needle was initially advanced into the interspinous ligament at either the L2-L3 or L3-L4 interspace. Data were then captured until the loss-of-resistance was achieved. Successful delivery of the epidural anesthesia was confirmed by medical assessment of the patient. Measurements for the interspinous ligament and ligamentum flavum regions were performed (except for the actual pressure) over all subjects, as described in Section 2.3.

¹ Approved by the Clinical Review and Ethics Board of the University of British Columbia and B.C. Women's Hospital.
Fig. 2. The top graphs show the applied force, plunger displacement and fluid pressure for a typical epidural procedure. The solid vertical lines indicate the time of loss-of-resistance as determined by each profile (see Section 2.3). The dashed vertical lines indicate the time of loss-of-resistance verbally communicated by the anesthesiologist. The bottom graphs show the estimated pressures from the static and decay models, together with the measured pressure (shown again for comparison). The solid vertical line in these plots indicates the time of loss-of-resistance determined by the pressure profile.
3 Results and Discussion

3.1 Porcine Study
A typical set of force, displacement and pressure profiles is shown in Figure 2. There are no significant differences between any of the three times of loss-of-resistance estimated from the profiles. Thus, the mean times are calculated and compared with the times verbally indicated by the anesthesiologist. The times indicated by the anesthesiologist are significantly larger, by an average of 0.8 ± 0.3 s, than the mean times. This discrepancy is consistent with the time it takes for the anesthesiologist to conclude entry of the needle into the epidural space and to orally communicate the information to the operator. The estimated pressures from the static and decay models for the same typical trial are shown in Figure 2. The decay model accounts for leakage with a single time constant (23 s), but the actual pressure profile shows small variations in the decay rate with an average time constant of 22 ± 7 s. Although it is possible that leakage may have occurred in the ligaments, the time constants from the pressure profile and the decay model are nearly the same, implying that little or no significant leakage occurs in the dense ligaments for a stationary plunger. For the static model, the average mean error is 2 ± 5 kPa. For the decay model, the average mean error is 0 ± 3 kPa. The decay model is significantly more accurate than the static model, and its standard deviation represents approximately 9% that of the peak calculated pressure averaged over all subjects from the clinical study.
Table 1. Summary of measurements, averaged over all trials, for the interspinous ligament (ISL) and ligamentum flavum (LF). The calculated pressure values were determined by using the decay model. Pressure was not measured directly in the human study (see Section 2.4).

Region         Flow Rate (mm³/s)  Fa (N)     Max Fa (N)  P (kPa)  Max P (kPa)  Calc. P (kPa)  Max Calc. P (kPa)
Porcine - ISL  29 ± 9             2.7 ± 1.6  4.5 ± 1.6   20 ± 10  31 ± 13      20 ± 11        34 ± 13
Porcine - LF   9 ± 7              3.3 ± 1.4  4.1 ± 1.5   27 ± 6   30 ± 7       25 ± 7         32 ± 10
Human - ISL    60 ± 30            2.0 ± 1.4  4.6 ± 1.7   –        –            15 ± 12        35 ± 17
Human - LF     12 ± 13            5 ± 3      6 ± 3       –        –            30 ± 30        40 ± 30

Fig. 3. The graphs show the applied force, plunger displacement and the estimated fluid pressure (based on the decay model) for a typical epidural procedure. The solid vertical lines indicate the time of loss-of-resistance determined by each profile.
The measurements for the needle while in the interspinous ligament and ligamentum flavum are summarized in Table 1. We observe low-to-medium forces and measurable plunger movement in the interspinous ligament, changing to increasing forces with little plunger movement in the ligamentum flavum, followed by a rapid fall upon entry into the epidural space. Ultrasound was used to image the needle once it had entered the epidural space. The mean error of the puncture path length is 0.0 ± 0.5 mm. There is no significant difference between the two measurements, confirming that the ultrasound measurements are consistent with the actual measurements when the needle itself was visible in the ultrasound. The depiction of the epidural space is characterized by a “doublet”, a horizontal line pair at the interface between the ligamentum flavum and epidural space. In the second study, the average direct and indirect measurements of the epidural space depth for the L3-L4 interspace are 29.5 ± 0.8 mm and 29 ± 2 mm, respectively, and for the L4-L5 interspace are 35.7 ± 1.7 mm and 37 ± 4 mm, respectively. We conclude the direct and indirect measurements are consistent with the instrumented loss-of-resistance technique.

3.2 Human Study
In the clinical study, a typical trial is shown in Figure 3. The measurements for the needle while in either the interspinous ligament or the ligamentum flavum are summarized in Table 1. Although there is large patient variability, the profiles and ligament properties are similar to the ones from the porcine study
(see Section 3.1). Additionally, the human interspinous ligament and ligamentum flavum in vivo are less amenable to saline injection than those of the porcine spine in vitro.
4 Conclusion
A low-cost, sterilizable and unobtrusive instrumentation device for the loss-of-resistance was developed for both porcine and human subjects. The loss-of-resistance is easily visible and consistent among the force, displacement and pressure profiles, and the oral indication by the anesthesiologist. Furthermore, the location of the epidural space detected by the loss-of-resistance is validated using ultrasound measurements. The decay model relating the pressure to the applied force and plunger displacement has a standard deviation of approximately 9% that of the peak calculated pressure in the clinical study. When the plunger was stationary, there was negligible leakage into the interspinous ligament and ligamentum flavum. The measurements also show the ligamentum flavum is generally less amenable to saline injection than the interspinous ligament. The instrumentation of loss-of-resistance will allow further study into tissue properties, patient variability, and operator performance.
References

1. Le Coq, G., Ducot, B., Benhamou, D.: Risk factors of inadequate pain relief during epidural analgesia for labour and delivery. Can. J. Anaesth. 45, 719–723 (1998)
2. Watts, R.W.: A five-year prospective analysis of the efficacy, safety, and morbidity of epidural anaesthesia performed by a general practitioner anaesthetist in an isolated rural hospital. Anaesth. Intensive Care 20(3), 348–353 (1992)
3. Grau, T., Bartusseck, E., Conradi, R., Martin, E., Motsch, J.: Ultrasound imaging improves learning curves in obstetric epidural anesthesia: a preliminary study. Can. J. Anaesth. 50, 1047–1050 (2003)
4. Konrad, C., Schupfer, G., Wietlisbach, M., Gerber, H.: Learning manual skills in anesthesiology: Is there a recommended number of cases for anesthetic procedures? Anesth. Analg. 86, 635–639 (1998)
5. Magill, J., Anderson, B., Anderson, G., Hess, P., Pratt, S.: Multi-axis mechanical simulator for epidural needle insertion. In: Cotin, S., Metaxas, D.N. (eds.) ISMS 2004. LNCS, vol. 3078, pp. 267–276. Springer, Heidelberg (2004)
6. Dang, T., Annaswamy, T.M., Srinivasan, M.A.: Development and evaluation of an epidural injection simulator with force feedback for medical training. Stud. Health Technol. Inform. 81, 97–102 (2001)
7. Carden, E., Ori, A.: The bip test: a modified loss of resistance technique for confirming epidural needle placement. Pain Physician 9(4), 323–325 (2006)
8. Marhofer, P., Willschke, H., Greher, M., Kapral, S.: New perspectives in regional anesthesia: the use of ultrasound – past, present, and future. Can. J. Anaesth. 52(6), R1–R5 (2005)
Small Animal Radiation Research Platform: Imaging, Mechanics, Control and Calibration

Mohammad Matinfar¹, Owen Gray¹,², Iulian Iordachita¹, Chris Kennedy², Eric Ford², John Wong², Russell H. Taylor¹, and Peter Kazanzides¹

¹ Dept. of Computer Science, Johns Hopkins University, Baltimore, MD
² Dept. of Radiation Oncology, Johns Hopkins Medical Institution, Baltimore, MD
Abstract. In cancer research, well characterized small animal models of human cancer, such as transgenic mice, have greatly accelerated the pace of development of cancer treatments. The goal of the Small Animal Radiation Research Platform (SARRP) is to make those same models available for the development and evaluation of novel radiation therapies. In combination with advanced imaging methods, small animal research allows detailed study of biological processes, disease progression, and response to therapy, with the potential to provide a natural bridge to the clinical environment. The SARRP will realistically model human radiation treatment methods in standard animal models. In this paper, we describe the mechanical and control structure of the system. This system requires accurate calibration of the x-ray beam for both imaging and radiation treatment, which is presented in detail in the paper.
1 Introduction
The tremendous advances in medical imaging technologies over the last decade are revolutionizing the management of patient treatment and care. In cancer therapy, high-speed, high-resolution anatomical imaging [1,2] in combination with functional imaging [3,4] significantly improves our ability to stage the disease, localize the tumor, and evaluate the treatment process. It has also become apparent that by applying advanced imaging methods to study small animals, such as mice or rats, much can be gained in our understanding of disease processes and the development of new treatment strategies. Micro-animal systems capable of very high resolution have been developed in positron emission tomography [5,6], X-ray CT [7,8], magnetic resonance imaging [9,10], magnetic resonance spectroscopic imaging [11] and ultrasound imaging [12]. Modern image guided conformal beam radiotherapy aims to deliver a high therapeutic dose to cancerous tissue while minimizing the dose to healthy tissue. Delivering the therapeutic dose from multiple poses during a single session allows a concentrated dose to be delivered to the tumor, while reducing the dose to surrounding tissue. To do this effectively, the location of the tumor must be known, requiring pretreatment scans and registration of the treatment beam.
This work was supported by NIH 1 RO1 CA108449-01.
Because there is no safe dose of radiation, it is critical that the risks of radiation exposure are balanced against the efficacy of the treatment. Given the size, complexity, and expense of imaging equipment and therapeutic linear accelerators, most novel techniques and protocols are first tested on human subjects. This poses a number of technical and ethical dilemmas because there is only limited pre-clinical validation of most techniques and the number of subjects in clinical trials is typically small, with participants having some of the most dire prognoses after standard techniques have failed. The SARRP aims to address these issues by providing a platform that can perform high-resolution imaging and accurate conformal beam therapy on standard animal models of human cancers. Currently, radiotherapy trials on animal models use gamma cells or similar devices. The SARRP offers several advantages over the gamma cell, including portability and the ability to deliver radiation over a conformal arc rather than as single beams. Many mouse models of human cancer are currently available, but existing imaging and therapeutic systems are ill suited to such small subjects, and the equipment is also in high demand and seldom available for lengthy laboratory trials. The SARRP provides cone beam CT imaging and radiation therapy, and correlates these with treatment dose and efficacy. We believe that this is the first system being developed for the purpose of radiation therapy research on small animals. Section 2 discusses the mechanics and control structure. Sections 3 and 4 describe the imaging and treatment subsystems, respectively, with a focus on calibration methods.
2 Mechanics and Control Structure
The SARRP integrates imaging, radiation delivery and treatment planning capabilities. The mechanical structure is designed to meet the system requirements, which are to attain a CT voxel resolution of 0.5 mm or better with a 1 cGy imaging dose, a localized radiation dose at a FWHM of 1.5 mm, and a dose rate of 200 cGy per minute. The following paragraphs describe the major components.

Robotic Positioner: The robotic positioner θXYZ consists of three modular off-the-shelf subassemblies: a rotating table, an X-Y cross table, and a vertical stage (Fig. 1). The rotary table is a preloaded, anti-backlash worm assembly that provides exceptional angular accuracy (0.05 degrees) and repeatability (0.007 degrees). It is actuated by an encoded DC motor. This stage provides unlimited angular positioning but, due to the cabling to the XYZ axes, the range of motion is limited to ±190 deg via software. The X-Y motions are realized by an X-Y cross table. Each axis consists of a ball way table actuated by an encoded DC motor driven lead-screw (65 μm/axis accuracy), with a monolithic center. The unit incorporates anti-backlash friction nuts to achieve a high repeatability (6 μm). The motion range for each axis is ±50 mm. For the vertical (Z) motion, we employed a stage with a scissors mechanism, motorized with an encoded DC motor. The travel length is 38 mm and the anti-backlash lead-screw assembly has good repeatability (0.125 mm). Although less precise than the other stages, it is sufficient to satisfy the design requirements.
Fig. 1. Different components of the Small Animal Radiation Research Platform: the x-ray source on its manually rotated arm, the image detector, and the 4-axis (X, Y, Z, rotary) positioner
X-Ray Tube and Arm: The X-ray source has a variable output with a maximum beam energy of 225 kVp. Images are acquired using 100 kVp photons and a spot size of 0.2 mm. In treatment mode, 225 kVp photons are generated from a 2 mm focal spot with up to 13 mA beam current to deliver clinically useful dose rates (∼2 Gy/min) down to a 0.5 mm FWHM beam. The X-ray source is mounted on a rotary arm which can be manually set at nine different positions. These positions are located 15 degrees apart to create a 120 degree motion range. Any location of the arm can be used for radiation therapy, whereas cone beam imaging is only possible when the source is in the horizontal position.

Collimator and Shutter: There are three collimators aligned with the x-ray beam axis. The primary collimator, fixed to the x-ray source, reduces the size of the beam to 200×200 mm at the isocenter and is permanently attached. The secondary collimator has a fixed position, but can be easily removed; it reduces the beam size to 60×60 mm at the isocenter. The third collimator has a variable position along the x-ray axis (min 230 mm, max 310 mm), is easily removable (together with the second one), and can be chosen to set the beam diameter as low as 0.5 mm. The shutter is a motor-driven linear stage carrying two brass pieces to block the x-ray radiation during the tube power-up phase.

Image Detector Panel: Image acquisition is accomplished with a flat panel digital x-ray detector. The current system is a 512×512 pixel array with 0.4 mm² pixel size and 16 bit resolution. The detector frame rate is 7 Hz, providing rapid acquisition of images for cone beam tomography. Preprocessing of acquired images is performed in hardware on the frame grabber card to maximize the acquisition rate and allow concurrent reconstruction of CT volumes on a standard PC workstation. Dark current and gain correction images are acquired prior to each imaging run and used to correct pixel intensities as each image is acquired.
Control Structure: The control structure of the SARRP has two main components: a PC with a graphical user interface connected, via Ethernet, to an intelligent motion control board. The motion control board contains a 32-bit microprocessor that provides PID control of up to 6 motors, with a loop update period of 500 microseconds.

Laser Alignment System: The SARRP provides a laser alignment system to facilitate accurate, reproducible setup of subjects. A removable line laser is mounted on the x-ray tube and a cross-hair laser is permanently mounted on the gantry pivot. These lasers converge at the nominal isocenter and enable visual setup of large treatment fields, where an accuracy of several millimeters is sufficient. Frequently, tumors are implanted in the extremities (flank or dorsal fold) of subjects, and the laser alignment provides a rapid means of assessing setup accuracy without the need to acquire x-ray images.

Animal Support Fixture: While the laser system is intended for relatively coarse alignment, several mouse carriers are being developed for immobilization and registration, where millimeter or submillimeter accuracy is required. These mouse carriers incorporate gas anesthesia, temperature control to prevent hypothermia, and stereotactic frames for accurate delivery of treatment beams to the cranium. The devices are MR compatible, and provide fiducials for coregistration of PET, MR, and CBCT volumes. Moreover, the SARRP provides integrated portal imaging using a standard x-ray film cassette mounted below the subject in the anterior-posterior plane. The portal image may be used to confirm setup accuracy post treatment as well as to perform the initial setup.
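To illustrate the kind of fixed-period servo loop such a board runs, here is a minimal discrete PID update at a 500 μs period; the gains, class name and the use of Python are illustrative assumptions, not details of the SARRP controller firmware.

```python
# Minimal discrete PID position loop at a 500 microsecond period.
# Gains (kp, ki, kd) are placeholders, not SARRP values.
DT = 500e-6  # loop period in seconds

class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, setpoint, measured):
        err = setpoint - measured
        self.integral += err * DT
        deriv = (err - self.prev_err) / DT
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

axis = PID(kp=2.0, ki=0.5, kd=0.01)
command = axis.step(setpoint=10.0, measured=9.7)   # one 500 us cycle
```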
3 Imaging Subsystem
The imaging system uses a novel geometry to acquire cone beam CT sequences. The animal rests on a rotating platform and is rotated around an anterior-posterior axis. Images are acquired with the x-ray source in the horizontal position. As the subject is rotated, a series of projection radiographs is acquired. Volumetric reconstruction is accomplished using filtered back projection [13]. This geometry poses a problem due to the large disparity in path lengths when the beams traverse the long axis of the subject relative to the lateral axis. Due to the additional attenuation, ringing and “cupping-like” artifacts are present in the reconstructed volume. It is believed that this is due to beam hardening, and additional simulation studies are planned to develop a correction scheme to provide accurate density information for treatment planning and dose calculation. The SARRP imaging beam operates at 100 kVp with 0.5 mm Cu and 2 mm Al filtration. The pre-hardening of the beam is necessary to reduce artifacts from beam hardening. The aim of the imaging subsystem is not high resolution CT, but rapid, low dose acquisition and quantitative CT for dose calculation and treatment planning. Existing micro CT scanners use relatively low beam energies on the order of 20 kVp. While this provides excellent contrast and resolution, the
Fig. 2. Imaging subsystem of the SARRP: the projection of the axis of rotation (AOR), initially misaligned with the beam center, is virtually aligned with it by a weak perspective transform applied to the image plane
imaging dose to the subject is high relative to the SARRP system. The SARRP imaging dose is roughly 10 cGy, which is an order of magnitude lower than a typical therapeutic dose. High imaging doses confound any analysis of subsequent radiotherapy, and in many cases are sufficient to induce immunosuppression.

3.1 Camera Calibration
Camera calibration is accomplished using 2D-3D point correspondences with the method described by Bopp [14]. A fiducial object composed of point-like features (BBs or holes in a thin copper plate) with known geometry is imaged, and 6 or more landmarks are used to calculate the intrinsic (focal length and detector offset) and extrinsic (detector position in world coordinates) camera parameters. Input images are thresholded, connected components are extracted, and the center of mass of each component is calculated. The user then manually inputs the known positions of each object in world coordinates (the Z axis is vertical, the X axis is from source to detector, and the Y axis is transverse).
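The blob-localization step (threshold, connected components, center of mass) could look like the following SciPy sketch; the relative threshold and the names are assumptions for illustration, not the SARRP software.

```python
import numpy as np
from scipy import ndimage

def fiducial_centroids(image, threshold=0.5):
    """Locate point-like fiducials in a projection radiograph.

    Thresholds the image, labels connected components, and returns the
    center of mass of each blob in pixel coordinates.
    """
    mask = image > threshold * image.max()     # assumed relative threshold
    labels, n = ndimage.label(mask)
    return ndimage.center_of_mass(image, labels, range(1, n + 1))
```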
3.2 Geometric Calibration
For accurate cone beam reconstruction using the Feldkamp algorithm [13], it is essential that the axis of rotation of the subject intersect the beam axis (the ray normal to the detector surface passing through the x-ray source). The axis of rotation of the theta stage is calculated by rotating a fiducial object composed of point-like objects (Fig. 2). Using the known camera geometry, the axis of rotation in world coordinates and its projection on the detector may be calculated. In practice, perfect alignment of the system is not possible, so an additional correction may be applied to the projection radiographs. To correct any residual misalignment, a weak perspective transform is applied to the projection images to create a virtual alignment of the axis of rotation and beam axis. The projection of the axis of rotation is shifted, and an intensity correction is applied to all pixels to account for the inverse-square drop-off in intensity for the aligned image.
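The residual-misalignment correction could be sketched as a horizontal shift of each projection plus an inverse-square intensity rescaling; the function below is an illustrative assumption of how that might look, not the SARRP implementation.

```python
import numpy as np
from scipy import ndimage

def align_projection(proj, aor_col, beam_col, d_old, d_new):
    """Virtually align the AOR projection with the beam center.

    proj        : 2D projection radiograph
    aor_col     : detector column of the axis-of-rotation projection
    beam_col    : detector column of the beam center
    d_old/d_new : source-to-pixel distances before/after the virtual
                  shift, used for the inverse-square intensity correction
    """
    shifted = ndimage.shift(proj, (0.0, beam_col - aor_col), order=1)
    return shifted * (d_old / d_new) ** 2      # inverse-square rescaling
```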
4 Treatment (Therapy) Subsystem
Radiation therapy is often delivered from multiple poses that are intended to intersect at a specified point (the treatment isocenter). Positioning the target (tumor) at this point will cause it to receive a higher radiation dose than the surrounding healthy tissue. With the SARRP, rotation about a target can be obtained via the rotary (theta) axis or the x-ray arm. At a given x-ray arm orientation,¹ the isocenter can be defined by the closest intersection of the two 3D lines defined by the x-ray beam and the rotary axis center of rotation (in general, two 3D lines will not intersect, but a closest intersection can be defined on the shortest line segment that is perpendicular to both). Because the x-ray arm is manually positioned, it is feasible to define nine different isocenters – one for each position. Figure 4 (right) illustrates the concept, where only 5 treatment positions (P0-P4) and 2 isocenters (C1, C4) are shown for visual clarity. If the offsets between the nine isocenters are measured, the system can compensate by moving the XYZ axes whenever the x-ray arm is moved. This ensures that the target receives the maximum dose regardless of the position of the x-ray arm. If the beam axis and rotation (theta) axis do not exactly intersect, and the treatment plan requires rotation of the theta axis, it is possible to further compensate via coordinated motion of the XY axes with the theta axis. The following sections describe the techniques that were used to measure, and correct, the misalignments and offsets between the different axes.

¹ Except when the x-ray arm is in the vertical position; in this case, the isocenter is defined at a specified distance along the beam axis.

4.1 Offline Calibration
For offline calibration, we used an optical tracking system to collect points on different components of the SARRP, as shown in Fig. 3. The collected data were analyzed to determine parameters such as the rotary stage and x-ray arm centers of rotation. Because the optical tracking system cannot sense the x-ray beam, all measurements were performed with the beam off. We estimated the beam axis by digitizing the x-ray opening window frame and taking into account the mechanical dimensions of the tube (e.g., from the manufacturer drawings). We performed three mechanical adjustments to improve the alignment. After these adjustments, the shortest distance between the rotary stage center of rotation and the x-ray arm center of rotation was reduced to 0.30 mm. The shortest distance between the estimated beam axis and the rotary stage axis of rotation was reduced to 0.31 mm.

4.2 Online Calibration
Fig. 3. Mechanical isocenter and data collection on different components of the SARRP

Offline calibration is useful during initial construction and testing, but is not practical for online (periodic) use due to its requirement for a large and expensive optical tracking system. In contrast, the online calibration setup consists of a collimated beam (e.g., 1 mm) and an x-ray camera, as shown in Fig. 4 (left). This
method does not require precise positioning or calibration of the x-ray camera. As a first step, we measure the axis of rotation of the robotic positioner. This is accomplished by placing the x-ray arm in the vertical position and rotating the stage through a set of angles. The x-ray camera captures an image at each angle; these images are superimposed (added), as shown in Fig. 4 (center). The center of rotation, Cr (in camera coordinates), is given by the center of gravity of the final image. Once Cr is determined, we measure the isocenter at each x-ray arm position as follows:

1. Capture an image with the x-ray camera and compute its center of gravity, Ci.
2. Move the positioner (in XYZ) and repeat Step 1 until Ci equals Cr. The amount of XYZ motion provides the coordinates of one point on the beam axis.
3. Move the positioner along the nominal beam axis by a “reasonable” amount (e.g., 10 mm) and repeat Step 2. This provides a second point on the beam axis.
4. The isocenter is given by the closest intersection of the beam axis and the axis of rotation (see the sketch below).

The above method provides the relative position (offset) between the treatment isocenters in the positioner coordinate system. For convenience, we define the origin when the x-ray arm is in the horizontal (imaging) position. If the target is defined in the image coordinate system (e.g., subsequent to a cone beam CT scan), then the location of the isocenter in image coordinates is known (from the image calibration) and an isocenter offset is applied when the treatment beam is not in the imaging position. If lasers are used to align the target (e.g., with skin markers), then it is necessary to determine the offset between the laser isocenter and the beam isocenter.
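A sketch of the closest-intersection computation referenced in Step 4 is given below; the midpoint formulation is standard 3D geometry, and the function and variable names are our own illustrative choices.

```python
import numpy as np

def closest_intersection(p1, d1, p2, d2):
    """Closest 'intersection' of two 3D lines p1 + s*d1 and p2 + t*d2.

    Returns the midpoint of the shortest segment connecting the lines,
    which serves as the isocenter when the beam axis and rotation axis
    do not exactly intersect. Assumes the lines are not parallel.
    """
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    b = d1 @ d2
    w = p2 - p1
    denom = 1.0 - b * b
    s = (w @ d1 - b * (w @ d2)) / denom
    t = (b * (w @ d1) - w @ d2) / denom
    return 0.5 * ((p1 + s * d1) + (p2 + t * d2))
```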
5 Conclusion and Future Work
In this paper, we presented a novel system to integrate imaging, radiation delivery, and treatment planning for small animal research. The system was designed and constructed for the development and evaluation of novel radiation therapies. The mechanical and control structure of the system was described, and the calibration methods for x-ray beam imaging and radiation treatment were presented.
Fig. 4. Calibration setup, x-ray image used to find the axis of rotation, and isocenter definition for the SARRP
Future work will include validation of the calibration method, with a central focus on determining the absolute accuracy of beam delivery. This will include characterizing and correcting for errors in the imaging system, beam alignment, and subject setup. The first phase will be to determine the accuracy with which a lesion observed in a reconstructed CT volume can be targeted, given the known error of the various subsystems. This will be followed by phantom studies to validate the accuracy of beam targeting.
References

1. Leter, E., et al.: Definition of a moving gross target volume for stereotactic radiation therapy of stented coronary arteries. Jour. of Rad. Oncol. Biol. Phys. 52 (2002)
2. Schoef, U., et al.: Multi-slice computed tomography as a screening tool for colon cancer, lung cancer and coronary artery disease. Jour. of Eur. Rad. 11 (2001)
3. DiBiase, S., et al.: Magnetic resonance spectroscopic imaging-guided brachytherapy for localized prostate cancer. Jour. of Rad. Oncol. Biol. Phys. 52 (2002)
4. Nelson, S.: Analysis of MRI and MR spectroscopic imaging data for the evaluation of patients with brain tumors. Jour. of Mag. Res. Imag. 46 (2001)
5. Jacob, R., Cherry, S.: Complementary emerging techniques: high-resolution PET and MRI. Current Opinion in Neurobiology 11 (2001)
6. Rubins, D., et al.: Evaluation of a stereotactic frame for repositioning of the rat brain in serial PET imaging studies. Jour. Neuroscience Methods 107 (2001)
7. Medynsky, A., et al.: Elastic response of human iliac arteries in-vitro to balloon angioplasty using high-resolution CT. Jour. of Biomech. 31 (1998)
8. Wan, S., et al.: Multi-generational analysis and visualization of the vascular tree in 3D micro-CT images. Comput. Biol. Med. 32 (2002)
9. Allport, J., Weissleder, R.: In vivo imaging of gene and cell therapies. Exp. Hematol. 29 (2001)
10. Franconi, F., et al.: In vivo quantitative microimaging of rat spinal cord at 7T. Jour. of Mag. Res. Imag. 44 (2000)
11. Zhang, X., Ugurbil, K., Chen, W.: Microstrip RF surface coil design for extremely high-field MRI and spectroscopy. Jour. of Mag. Res. Imag. 46 (2001)
12. Turnbull, D., et al.: Ultrasound backscatter microscope analysis of mouse melanoma progression. Ultra. Med. Biol. 22 (1996)
13. Feldkamp, L.A., Davis, L.C., Kress, J.W.: Practical cone-beam algorithm. Journal of the Optical Society of America A 1(6), 612–619 (1984)
14. Bopp, H., Krauss, H.: An orientation and calibration method for non-topographic applications. Optical Engineering 44(9), 1191–1196 (1999)
Proof of Concept of a Simple Computer-Assisted Technique for Correcting Bone Deformities

Burton Ma¹, Amber L. Simpson², and Randy E. Ellis¹,²

¹ Human Mobility Research Centre, Kingston General Hospital, Kingston, Ontario, Canada
² Queen's University, Kingston, Ontario, Canada
mab,simpson,[email protected]
Abstract. We propose a computer-assisted technique for correcting bone deformities using the Ilizarov method. Our technique is an improvement over prior art in that it does not require a tracking system, navigation hardware and software, or intraoperative registration. Instead, we rely on a postoperative CT scan to obtain all of the information necessary to plan the correction and compute a correction schedule for the patient. Our laboratory experiments using plastic phantoms produced deformity corrections accurate to within 3.0° of rotation and 1 mm of lengthening.
1 Introduction

Ilizarov's method, an orthopaedic surgery used to correct deformities of the long bones, uses external fixation apparatus to apply controlled stress to a cut or fractured bone; the body's response is to regenerate the bone and soft tissues and grow in the direction of the applied stress. By maintaining and controlling the direction of the tensile load over a period of time, a wide range of deformities can be corrected [1]. We propose a computer-assisted approach that can be used at any institute having modest computing capabilities, as it does not require preoperative CT, intraoperative registration, or intraoperative navigation. The Taylor Spatial Frame (Smith & Nephew, Memphis, TN) is one type of external fixator used for Ilizarov's method. It is made up of two rings connected by six telescoping struts; changing the strut lengths causes relative motion between the rings. The rings are fixed to the patient using thin, tensioned Kirschner wires or wider Steinman pins drilled into the bone through the skin and surrounding soft tissues. Conventional planning of the correction is performed using radiographs and measurements of deformity assessed in clinic. Thirteen parameters must be measured or set by the surgeon when using the Taylor Spatial Frame. The goal when using the Taylor Spatial Frame is to mount the frame on the patient so that it mimics the shape of the deformity (i.e., the distal ring should be fixed parallel to the plane of the most distal joint, and similarly for the proximal ring and joint). If this is done properly, the frame will be in a neutral configuration (all struts equal in length) when the deformity is corrected. If the frame is not mounted properly then there will be a residual deformity when the frame is restored to its neutral configuration. Two computer-assisted techniques using three-dimensional planning have previously been described. Iyun and colleagues [2] described a system that used surface models
computed from CT to plan the correction and the placement of the Steinman pins for mounting the rings. They proposed to use intraoperative registration and navigation to implant the pins. The primary drawback of this approach is that the preoperative choice of pin placement may be incompatible with intraoperative conditions. Simpson and coauthors [3] described a CT-based technique that required intraoperative registration to establish the relationship of the rings with respect to the bone. Using this information in conjunction with a preoperative plan allowed them to compute a correction schedule for the patient. Their approach eliminated the two major sources of error in the conventional technique (planning on radiographs and intraoperative mounting of the rings to mimic the deformity) but required an intraoperative tracking system and segmentation of the CT scan to produce the models for planning and registration. Their technique has been used clinically [4]. We use a postoperative CT scan of the bone and frame to (1) establish the relationship between the bone and the rings, and (2) plan the required correction. We avoid segmenting the CT scan by using direct volume visualization for planning purposes. Kirschner wires should be preferred over Steinman pins when using our proposed technique; the relatively wide stainless steel Steinman pins tend to produce excessive noise in the CT scan. The pins do not interfere with planning if they are located far from the joint lines.
2 Method

Our proposed technique would be applied in seven steps:

1. Preoperative patient care would proceed as with the conventional technique. The surgeon would not need to plan the procedure on radiographs, but may do so if desired.
2. Intraoperative patient care would proceed as with the conventional technique. If the surgeon chooses to follow a conventional preoperative plan then the rings should be mounted to mimic the deformity. If the surgeon chooses to rely on the computer-assisted technique then the rings could be mounted in any reasonable configuration.
3. Postoperative patient care would be modified to include a CT scan of the limb and frame. The scan can occur at any time during the correction phase if preoperative planning has been performed and a correction schedule obtained; it must occur before the correction phase begins if using only the computer-assisted technique, because we compute the correction schedule as the last step of our method. The usual 10-day period of callus formation (before manipulation of the frame begins) provides ample time to perform the scan.
4. Sets of at least 3 landmarks {Fp} and {Fd} on the proximal and distal rings are located in the CT scan; our tests used a central point on the tabs of the rings (see Figure 1), but other landmarks such as the strut attachment points or fiduciary markers machined into or attached onto the rings could be used. Models of the rings would be registered to the CT landmarks {Fp} and {Fd} to yield the rigid transformations Tp and Td (see Section 2.1).
5. A reference point xref is identified in the CT scan; typically, xref is chosen as the point in the middle of the bone on the plane of the osteotomy. The reference point is used to control the rate of the correction schedule.
Fig. 1. Proximal ring landmarks. We use the central point on the outer diameter of the three tabs where the struts attach to the ring. The distal ring landmarks are similar.
6. Direct volume rendering of the CT scan can be used for planning purposes. Only simple operations, such as selecting a region of interest and cutting the volume into proximal and distal fragments, are required. The proximal fragment is moved to plan the correction because orthopaedic convention views the distal end as stationary; however, our technique can easily be modified to allow movement of both fragments. The rigid transformation Tplan of the proximal fragment in CT coordinates is recorded (see Section 2.2).
7. Using the reference point xref and the transformations Tp, Td, and Tplan, a correction schedule is calculated as described in Section 2.3. Note that xref and Tplan are defined in CT coordinates, and Tp and Td map model points into CT coordinates; thus, the calculation of the correction schedule takes place in a single (CT) coordinate system.

2.1 Ring Registration

Our models of the proximal and distal rings and the centers of rotation of the universal joints of the struts (the strut end points) relative to the rings are shown in Figure 2, and numerical details are given in Table 1. Registration of the model proximal and distal landmarks {Mp} and {Md} to the fiduciary CT landmarks {Fp} and {Fd} can be estimated by any absolute orientation solver; we use Horn's method [5]. Using the registration transformations Tp and Td, we can compute the current strut lengths as

$$s_i = \left[ (T_p p_i - T_d d_i) \cdot (T_p p_i - T_d d_i) \right]^{1/2}, \quad i = 1, \ldots, 6 \qquad (1)$$

where Tp pi and Td di are the proximal and distal model strut end points, respectively, registered to CT coordinates. The estimated current strut lengths can be compared against the physical strut lengths to validate the registration process.

Fig. 2. Models of the 180 mm proximal and distal rings and the strut end points

Table 1. Cylindrical coordinates (radius, angle, z) of ring landmarks and strut end points for the 180 mm ring

                  Proximal Ring                    Distal Ring
Landmarks         (116.4 mm, 0°, 0 mm)             (116.4 mm, 60°, 0 mm)
                  (116.4, 120°, 0)                 (116.4, 180°, 0)
                  (116.4, 240°, 0)                 (116.4, 300°, 0)
Strut End Points  (109.5 mm, 0 − 6.67°, −16 mm)    (109.5 mm, 300 + 6.67°, 16 mm)
                  (109.5, 0 + 6.67°, −16)          (109.5, 60 − 6.67°, 16)
                  (109.5, 120 − 6.67°, −16)        (109.5, 60 + 6.67°, 16)
                  (109.5, 120 + 6.67°, −16)        (109.5, 180 − 6.67°, 16)
                  (109.5, 240 − 6.67°, −16)        (109.5, 180 + 6.67°, 16)
                  (109.5, 240 + 6.67°, −16)        (109.5, 300 − 6.67°, 16)

2.2 Planning

We use direct volume visualization of the CT scan to avoid segmentation. The rings, wires, and pins can, for the most part, be removed from the images by using simple region-of-interest selection. The scan can be separated easily into two volumes of slices that are, respectively, proximal and distal to the osteotomy plane. The proximal volume is manipulated to achieve the desired correction. Our planning software, written using VTK (www.vtk.org), provides three orthogonal views of the two volumes (Figure 3). On a PC with modest computing power,¹ it maintains interactive rates with the volumes we used in this study. The output of the planning software is the rigid transformation Tplan, in CT coordinates, of the proximal fragment.

2.3 Correction Schedule

The correction schedule specifies the daily strut lengths needed to achieve the desired correction. The patient adjusts each strut to the specified length once each day. The current (time t = 0) and final (time t = n) locations of the model proximal strut end points in CT coordinates are given by
$$p^{CT}_{i,t=0} = T_p\, p_i \qquad (2)$$

$$p^{CT}_{i,t=n} = T_{plan}\, T_p\, p_i \qquad (3)$$

¹ CPU: AMD 3800 X2, GPU: NVIDIA 7800GS.
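To make Equations 1–3 concrete, here is a minimal sketch of the landmark registration and strut-length computation. It uses an SVD-based least-squares absolute-orientation solver, which for this problem yields the same result as the quaternion method of Horn [5] used in the paper; all variable names are illustrative assumptions.

```python
import numpy as np

def absolute_orientation(model_pts, ct_pts):
    """Least-squares rigid transform mapping model_pts onto ct_pts.

    SVD (Kabsch) solution; equivalent in result to Horn's quaternion
    method for corresponding 3D point sets.
    """
    mc, cc = model_pts.mean(0), ct_pts.mean(0)
    H = (model_pts - mc).T @ (ct_pts - cc)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, cc - R @ mc
    return T

def strut_lengths(Tp, Td, p, d):
    """Equation 1: distances between registered strut end points."""
    pw = (Tp @ np.c_[p, np.ones(len(p))].T).T[:, :3]
    dw = (Td @ np.c_[d, np.ones(len(d))].T).T[:, :3]
    return np.linalg.norm(pw - dw, axis=1)
```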
Fig. 3. Volume rendered planning images for phantom 3 before (top) and after (bottom) deformity correction with the distal end held stationary; note the large axial rotation deformity. Images were generated using a linear opacity transfer function and a constant color transfer function; no attempts were made to optimize the transfer functions.
Normally, the duration n, measured in days, is not given. Instead, the reference point xref is displaced by a specified distance each day; typically, the magnitude of the displacement is 1 mm per day. We compute the correction schedule by converting Tplan to its screw representation [6]. A screw transformation is a rotation of angle Θ about an axis with direction b passing through the point c, followed by a translation of magnitude M along the same axis. We use a brute-force, discrete search over Θ and M to find the strut lengths:

– find the screw parameterized by b, c, Θ and M corresponding to Tplan
– set dθ = Θ/N, dM = M/N for some large N (say N = 1000)
– set xold = xref
– set t = 1
– for i = 1..N
    • set θi = i × dθ
    • set mi = i × dM
    • set Ti = S(b, c, θi, mi)
    • set xi = Ti xref
    • if ‖xi − xold‖ ≥ 1 mm
        ∗ compute strut lengths for day t (Equation 1, substituting Ti Tp for Tp)
        ∗ set t = t + 1
        ∗ set xold = xi
    • end if
– end for
– compute strut lengths for the last day t (Equation 1, substituting Tplan Tp for Tp)

where S(b, c, θ, m) is the 4 × 4 rigid transformation matrix corresponding to the screw transformation with angle θ, translation magnitude m, and axis with direction b passing through the point c.

2.4 Experimental Validation

Three plastic tibia phantoms were obtained (Sawbones, Pacific Research Laboratories, Inc., Vashon, WA, USA). Two phantoms were of deformed bones with approximately 10° varus proximal deformity at the level of the fibular head. The third bone was a normal tibia modified to have an approximately 45° axial deformity at the distal end. Each phantom was scanned using CT for validation purposes and polygonal surface models were created. A Taylor frame with 180 mm rings was mounted on each phantom using Kirschner wires on the ring closest to a joint line and Steinman pins on the other ring. The frame and phantom were scanned using CT with a slice spacing of 2.5 mm. Our seven-step technique described at the beginning of the Methods section was used on phantoms 1 and 2 (varus proximal deformity) a total of six times, and on phantom 3 (axial distal deformity) four times. We compared the physical and computed strut lengths to validate the ring registration process (see Section 2.1). The CT scan of the first phantom was performed with the phantom cut and distracted part-way through a correction schedule, which simulated a partial correction using the conventional technique. The other phantoms were scanned before they were cut. The accuracy of each correction was computed by registering the distal and proximal fragments to the surface model of the intact phantom. An Optotrak optical tracking system (Northern Digital Inc., Waterloo, Ontario, Canada) was used to acquire the registration data. A dynamic reference body was attached to the distal fragment of the phantom. Registration points were collected from the surface of the phantom using a calibrated stylus. Let Treg,p and Treg,d be the registration transformations from the tracker coordinate system to the CT coordinate system of the proximal and distal fragments, respectively. Then we have

$$T_{achieved} = T_{reg,d}\, T^{-1}_{reg,p} \approx T_{plan} \qquad (4)$$

for the proximal fragment. The error between the achieved correction and the plan is

$$\Delta = T_{plan}\, T^{-1}_{achieved} \qquad (5)$$
The total angular error of the achieved correction can be computed by finding the quaternion or screw representation of Δ. The translational error can be computed by locating a relevant anatomic landmark t in the CT images and calculating

$$\delta = \| T_{plan}\, t - T_{achieved}\, t \| \qquad (6)$$

We set t to be the location of the tibial intercondylar eminence in the CT images.
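The schedule computation of Section 2.3 can be sketched in a few lines. The screw extraction below, via SciPy's rotation utilities and a least-squares solve for the axis point, is our substitute for the (unspecified) decomposition in [6]; all function and variable names are illustrative, and a nonzero rotation angle is assumed.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def screw_of(T):
    """Extract screw parameters (b, c, theta, M) from a 4x4 rigid transform."""
    R, p = T[:3, :3], T[:3, 3]
    rotvec = Rotation.from_matrix(R).as_rotvec()
    theta = np.linalg.norm(rotvec)          # assumes theta != 0
    b = rotvec / theta
    M = b @ p
    # point on the axis: solve (I - R) c = p - M b in the least-squares sense
    c = np.linalg.lstsq(np.eye(3) - R, p - M * b, rcond=None)[0]
    return b, c, theta, M

def S(b, c, theta, m):
    """4x4 screw transform: rotate theta about axis (b, c), translate m along b."""
    T = np.eye(4)
    R = Rotation.from_rotvec(theta * b).as_matrix()
    T[:3, :3] = R
    T[:3, 3] = (np.eye(3) - R) @ c + m * b
    return T

def schedule(T_plan, x_ref, step=1.0, N=1000):
    """Daily interpolating transforms along the screw of T_plan (units: mm)."""
    b, c, theta, M = screw_of(T_plan)
    days, x_old = [], x_ref
    for i in range(1, N + 1):
        Ti = S(b, c, i * theta / N, i * M / N)
        xi = Ti[:3, :3] @ x_ref + Ti[:3, 3]
        if np.linalg.norm(xi - x_old) >= step:
            days.append(Ti)                 # strut lengths via Eq. 1 with Ti Tp
            x_old = xi
    days.append(T_plan)                     # last day
    return days
```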
Table 2. Errors of the achieved corrections

                        Femur 1                  Femur 2      Femur 3
Total rotation error    1.7°, 3.0°, 2.8°, 2.4°   3.0°, 2.2°   3.3°, 3.0°, 3.0°, 3.2°
Lengthening error (mm)  −0.6, 2.6, 0.6, 1.5      1.0, 1.0     0.3, 0.5, 1.2, 1.4
3 Results

The residual rotation and translation errors of the achieved correction are summarized in Table 2. The average rotational error was 2.8° with no error greater than 3.3°. The average lengthening error was 1.0 mm with no error greater than 2.6 mm. Our correction error results are comparable to clinical results reported by Feldman [7]. We did not experience any stress loading of the phantoms that would have caused unusual translational errors as reported in [3]. We did not observe differences between the computed and physical strut lengths of greater than 3 mm in any of the trials.
4 Discussion

Our technique aims to remove the most significant sources of error experienced with the conventional technique. Our method does not require preoperative measurements on radiographs and it allows the surgeon to fix the rings on the patient in any reasonable configuration. Achieving these goals requires the use of a postoperative CT scan, which is conventionally not required. The CT-based methods described by [2] and [3] also address the errors associated with the conventional technique. The residual errors we measured were comparable to those reported by both [2] and [3]. Like their methods, ours yields the current strut lengths that can be compared to the physical strut lengths to ensure that the necessary measurements have been performed accurately. Unlike the other methods, ours does not require a tracking system, navigation hardware and software, intraoperative registration, or segmentation of the CT scan. Our method also has the advantage of being usable anytime before the correction schedule has been completed; thus our method can be used as the primary means of achieving a correction, or as a method to possibly recover from a failing conventional procedure. Because finding the ring landmarks and planning the correction can be accomplished quickly, it is possible to use our method for trauma cases if CT is available within approximately ten days of frame attachment. The major disadvantage of our method is that the fidelity of the postoperative CT scan is compromised by the stainless steel hardware of the pins, wires, and small components of the frame. The imaging artifacts do not adversely affect the localization of the frame landmarks, but they can be an impediment to precise planning of the correction. The imaging artifacts are especially problematic if one of the bone fragments is small because it becomes difficult to visualize the correction in the degraded images; Steinman pins must be avoided in these circumstances. It is possible to use radiographs and clinical evaluation of axial rotation to assist in the planning process, but we have not yet evaluated the efficacy of such an approach.
We used Horn’s method to register the models of the rings to the ring landmarks localized in the CT images. A method such as the one described by Ohta and Kanatani [8] may prove to be more accurate because the CT voxel spacing is often anisotropic. The accuracy of our technique is dependent on the localization error of the ring landmarks, which is affected by the CT slice spacing. The errors introduced by the CT resolution could be minimized by designing special fiducial markers or registering the image of the rings (instead of identifying point landmarks). A weakness of our current study is that it was performed using polyurethane phantoms that were very radiolucent. Imaging of biological specimens needs to be performed to determine if the fidelity of the CT scan is sufficient for reliable planning. Allowing for these limitations, our navigation-free technique provides clinicians with a novel application of computer-assisted orthopaedic surgery. Potential future work might include estimating the relationship of the rings with respect to the bone by using radiographs instead of the postoperative CT scan.
References
1. Lieberman, J.R., Friedlaender, G.E. (eds.): The Ilizarov technique for bone regeneration and repair. In: Bone Regeneration and Repair: Biology and Clinical Applications. Humana Press, Totowa (2005)
2. Iyun, O., Borschneck, D.P., Ellis, R.E.: Computer-assisted correction of bone deformities using a 6-DOF parallel spatial mechanism. In: Niessen, W.J., Viergever, M.A. (eds.) MICCAI 2001. LNCS, vol. 2208, pp. 232–240. Springer, Heidelberg (2001)
3. Simpson, A.L., Ma, B., Borschneck, D.P., Ellis, R.E.: Computer-assisted deformity correction using the Ilizarov method. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 459–466. Springer, Heidelberg (2005)
4. Simpson, A.L., Ma, B., Slagel, B., Borschneck, D.P., Ellis, R.E.: Computer-assisted distraction osteogenesis using the Taylor frame: initial clinical experiences. In: The 2nd Annual Congress of The British Society for Computer Aided Orthopaedic Surgery (2007)
5. Horn, B.K.P.: Closed-form solution of absolute orientation using unit quaternions. Journal of the Optical Society of America A 4, 629–642 (1987)
6. McCarthy, J.M.: An Introduction to Theoretical Kinematics. MIT Press, Cambridge (1990)
7. Feldman, D.S., Shin, S.S., Madan, S., Koval, K.: Correction of tibial malunion and nonunion with six-axis analysis deformity correction using the Taylor Spatial Frame. Journal of Orthopaedic Trauma 17(8), 549–554 (2003)
8. Ohta, N., Kanatani, K.: Optimal estimation of three-dimensional rotation and reliability evaluation. IEICE Transactions on Information and Systems E82-D(11), 1247–1252 (1998)
Global Registration of Multiple Point Sets: Feasibility and Applications in Multi-fragment Fracture Fixation

Mehdi Hedjazi Moghari¹ and Purang Abolmaesumi¹,²

¹ Department of Electrical and Computer Engineering, Queen's University, Canada
² School of Computing, Queen's University, Canada
[email protected]
Abstract. An algorithm to globally register multiple 3D data sets (point sets) within a common reference frame is proposed. The algorithm uses the Unscented Kalman Filter (UKF) to simultaneously compute the registration transformations that map the data sets together and to calculate the variances of the registration parameters. The data sets are either randomly generated or collected from a set of fractured bone phantoms using Computed Tomography (CT) images. The algorithm converges robustly in the presence of isotropic Gaussian noise perturbing the point coordinates in the data sets. It is also computationally efficient and enables real-time global registration of multiple data sets, with applications in computer-assisted orthopaedic trauma surgery.
1 Introduction
Global registration of multiple 3D data sets is an essential and complex problem in medical imaging and computer vision, where three or more data sets must be aligned in a common reference frame. One application of such registration is computer-assisted orthopaedic trauma surgery, where surgeons are interested in reassembling bone fractures back into the original solid bone (Figure 1). Generally, the registration problem is divided into two sub-problems: finding corresponding points among the data sets, and deriving the transformation parameters that minimize the distance among the estimated correspondences. In this article, we concentrate on the registration task and assume that the corresponding points are either available or that the data sets are close enough that the closest points among the data sets are reliable candidates for the corresponding points. In real applications, the corresponding points are not precisely localized and are therefore perturbed by some error (noise), which could be caused by, for example, sampling, image segmentation, and reconstruction errors. The existence of noise in the data sets is another issue that must be considered in registration algorithms. The statistical properties of the noise are usually unknown; however, if they are identified, they can be incorporated into the registration process to improve accuracy. Here, we assume that the data sets are corrupted by isotropic Gaussian noise with a distribution that may differ for each data set.
Fig. 1. Multiple data set registration of two femur bone fractures to the whole bone model (template). T1, T2 and T12 are the rigid transformations mapping Fracture 1 to the bone template, Fracture 2 to the bone template and Fractures 2 and 1 to each other, respectively.
A final point is to distinguish pair-wise registration from global registration algorithms. The former registers only two data sets at a time, whereas global registration techniques simultaneously register multiple data sets. This means that, as shown in Figure 1, all the overlapping areas can be considered simultaneously in the global registration algorithms to estimate the transformation parameters. One may argue that pair-wise registration algorithms could also be used to solve the global registration problem for multiple data sets, by sequentially registering pairs of data sets. However, this leads to an uneven spread of registration error among the data sets [1]. In our earlier work, the Unscented Kalman Filter (UKF) algorithm was proposed for pair-wise, point-based registration [2]. In this paper, we use the UKF algorithm to globally register multiple data sets and to simultaneously estimate the transformation parameters. Two algorithms are developed: one for scenarios where point correspondences are known, and one where correspondences are not available but the data sets are roughly aligned, so that the closest point metric [3] can be used to obtain a reliable estimate of the correspondences. Both algorithms are verified on simulated and real data sets.
2 Mathematical Problem Definition and Previous Work
To mathematically define the global registration problem based on known correspondences, assume there are M data sets (S^1, ..., S^M) that may overlap with each other. The problem is to determine M rigid transformations (T^1, T^2, ..., T^M) such that the cost function C is minimized:

\[
C(T^1, \dots, T^M) = \sum_{\alpha=1}^{M-1} \sum_{\beta=\alpha+1}^{M} \sum_{k=1}^{N^{\alpha,\beta}}
\left(T^\alpha p^{\alpha,\beta}_k - T^\beta p^{\beta,\alpha}_k\right)^T \left[\Lambda^{\alpha,\beta}_k\right]^{-1} \left(T^\alpha p^{\alpha,\beta}_k - T^\beta p^{\beta,\alpha}_k\right), \tag{1}
\]

where the weight matrix Λ^{α,β}_k is given and models the variance of the point localization error; p^{α,β}_k and p^{β,α}_k are the corresponding points between the S^α and S^β data sets, respectively; and N^{α,β} is the number of corresponding points between the two data sets. It should be noted that this problem is degenerate: applying the same transformation to all the data sets makes no difference to the registration. This degeneracy can be removed by constraining one of the transformations, such as T^1, to be the identity; therefore, M − 1 transformations remain to be estimated. Thus far, many global registration algorithms have been proposed to solve the multiple data set registration problem, using the gradient descent technique [4], a "mean shape" [5], quaternions [6], the Expectation Maximization (EM) algorithm [7], the classic Gauss method [8], and first-order kinematics [9]. All these algorithms iteratively minimize the cost function C by considering all the corresponding points at once. Instead, we propose an iterative minimization approach, using the Unscented Kalman Filter (UKF) algorithm, that simultaneously estimates the transformation parameters by incrementally (one by one) feeding the corresponding points to the registration algorithm. Since the optimization usually converges well before all the corresponding points have been used, the algorithm is computationally efficient and can potentially make real-time clinical applications feasible. Furthermore, the proposed approach can compute the variance of the estimated transformation parameters, which can be used as a measure of registration accuracy.
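As a concrete illustration of Eq. (1), the following sketch evaluates the cost C for a given set of transformations. This is not the authors' code; the data layout (dictionaries of transformations and correspondence lists) is an assumption made for the example.

```python
import numpy as np

def global_cost(R, t, pairs):
    """Evaluate the global registration cost C of Eq. (1).

    R, t  : dicts mapping data-set index alpha -> 3x3 rotation / 3-vector.
    pairs : dict mapping (alpha, beta) with alpha < beta to a list of
            (p_ab, p_ba, Lam) tuples: corresponding points in S^alpha and
            S^beta plus the 3x3 covariance of their localization error.
    """
    C = 0.0
    for (a, b), corr in pairs.items():
        for p_ab, p_ba, Lam in corr:
            # Residual between the two points mapped into the common frame.
            r = (R[a] @ p_ab + t[a]) - (R[b] @ p_ba + t[b])
            C += r @ np.linalg.solve(Lam, r)   # r^T Lambda^{-1} r
    return C
```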
3 Method
Let us assume there are M − 1 rigid transformations (T^2, ..., T^M) to be estimated from M potentially overlapping data sets. Data set one is assumed to be in the reference frame (T^1 = I_{3×3}). Each rigid transformation T^α consists of a rotation matrix R^α(θ^α_x, θ^α_y, θ^α_z) (Euler angles are used to represent the rotation matrix) and a translation vector t^α = [t^α_x, t^α_y, t^α_z]. Therefore, (M − 1) × 6 parameters are needed to estimate the M − 1 rigid transformations. Define the state vector x comprising these transformation parameters as:

\[
x = [t^2_x, t^2_y, t^2_z, \theta^2_x, \theta^2_y, \theta^2_z, \dots, t^M_x, t^M_y, t^M_z, \theta^M_x, \theta^M_y, \theta^M_z]^T
  = [x^{2\,T}_t, x^{2\,T}_\theta, \dots, x^{M\,T}_t, x^{M\,T}_\theta]^T, \tag{2}
\]

where x^α_t = [t^α_x, t^α_y, t^α_z]^T and x^α_θ = [θ^α_x, θ^α_y, θ^α_z]^T. The process model is assumed to be governed by a linear function:

\[
x_k = x_{k-1} + N(0, \Sigma_Q), \tag{3}
\]

where x_k is the state vector at time k (when the kth pair of corresponding points is fed into the algorithm), with initial value x_0 and covariance matrix P^x_0. N(0, Σ_Q) is a zero-mean Gaussian random vector with covariance matrix Σ_Q. The goal is to incrementally estimate the state vector x_k from the observation model:

\[
y_{1:k} =
\begin{cases}
p^{1,\beta}_{1:k} = R(x^\beta_\theta)\, p^{\beta,1}_{1:k} + \bar{t}(x^\beta_t) + N(0, \Lambda^{1,\beta}_{1:k}), & \alpha = 1,\\[4pt]
0 = R(x^\alpha_\theta)\, p^{\alpha,\beta}_{1:k} + \bar{t}(x^\alpha_t) - R(x^\beta_\theta)\, p^{\beta,\alpha}_{1:k} - \bar{t}(x^\beta_t) + N(0, \Lambda^{\alpha,\beta}_{1:k}), & \alpha = 2, \dots, M-1,
\end{cases}
\tag{4}
\]
where β = α + 1, ..., M; p^{α,β}_{1:k} = [p^{α,β}_1, ..., p^{α,β}_k]; \bar{t}(x^α_t) = [t(x^α_t), ..., t(x^α_t)]_{3×k}; and N(0, Λ^{α,β}_{1:k}) is a zero-mean Gaussian random vector with covariance matrix Λ^{α,β}_{1:k} = diag(Λ^{α,β}_1, ..., Λ^{α,β}_k) that models the point localization error in the data sets. y_{1:k} is a column vector with maximum size [3kM(M − 1)/2] × 1. To estimate the state vector x, the following algorithm iterates until it converges to the final solution:

1) Predict the state vector x and its covariance matrix P_x from the state model, Equation (3):
\[
\hat{x}^-_k = \hat{x}_{k-1}, \qquad P_{\hat{x}^-_k} = P_{\hat{x}_{k-1}} + \Sigma_Q.
\]

2) Append the kth pair of corresponding points from the overlapping data sets to the set of already selected pairs, and predict the corresponding points' positions in the reference frame using the state vector computed in Step 1:
\[
\hat{y}_{1:k} =
\begin{cases}
R(\hat{x}^\beta_\theta)\, p^{\beta,1}_{1:k} + \bar{t}(\hat{x}^\beta_t),\\[4pt]
R(\hat{x}^\alpha_\theta)\, p^{\alpha,\beta}_{1:k} + \bar{t}(\hat{x}^\alpha_t) - R(\hat{x}^\beta_\theta)\, p^{\beta,\alpha}_{1:k} - \bar{t}(\hat{x}^\beta_t).
\end{cases}
\tag{5}
\]

3) Calculate the distance error of the corresponding points in the reference frame and update the state vector and its covariance matrix:
\[
\hat{x}_k = \hat{x}^-_k + K_k\,(y_{1:k} - \hat{y}_{1:k}), \qquad
P_{\hat{x}_k} = P_{\hat{x}^-_k} - K_k\, E[y_{1:k} y_{1:k}^T]\, K_k^T, \tag{6}
\]
where K_k = E[x_k y_{1:k}^T]\, E[y_{1:k} y_{1:k}^T]^{-1} is called the Kalman gain. This procedure iterates through all pairs of corresponding points until convergence to a solution.

For the case where correspondences among the data sets are not available, the proposed algorithm can be extended to use the closest point metric [3] to obtain the correspondences. Here, it is assumed that the data sets are roughly aligned such that the closest point metric returns reliable candidates for the corresponding points to initiate the optimization process. The correspondences are then updated while the data sets are being registered to each other, using the following algorithm: 1) Choose three points from each data set and find their corresponding points in the overlapping areas of the other data sets; 2) Use the correspondences and the proposed UKF registration algorithm to register the data sets; 3) Apply the estimated transformations to the data sets; 4) Append another point from each data set to the set of points already selected from that data set, update the locations of the corresponding points (using the closest point metric) for all the points considered so far, and go to Step 2. This procedure iterates until it converges to a solution.
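For illustration, the measurement-update step of Eq. (6) can be sketched with a generic unscented transform as follows. This is a simplified, self-contained NumPy sketch using Julier's basic symmetric sigma-point set; the exact sigma-point parameters and implementation details of the UKF used in the paper [2] are not reproduced here, and all names are ours.

```python
import numpy as np

def unscented_update(x, P, y_obs, h, R_noise, kappa=0.0):
    """One UKF measurement update in the spirit of Eq. (6).

    x, P    : state mean (n,) and covariance (n, n)
    y_obs   : stacked observations (m,)
    h       : observation model, h(state) -> (m,), cf. Eqs. (4)-(5)
    R_noise : observation noise covariance (m, m)
    """
    n = x.size
    # Sigma points (Julier's symmetric set).
    S = np.linalg.cholesky((n + kappa) * P)
    sigmas = np.vstack([x, x + S.T, x - S.T])         # (2n+1, n)
    w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    # Propagate the sigma points through the observation model.
    Y = np.array([h(s) for s in sigmas])              # (2n+1, m)
    y_mean = w @ Y
    dY, dX = Y - y_mean, sigmas - x
    Pyy = (w[:, None] * dY).T @ dY + R_noise          # innovation covariance
    Pxy = (w[:, None] * dX).T @ dY                    # cross covariance
    K = Pxy @ np.linalg.inv(Pyy)                      # Kalman gain
    x_new = x + K @ (y_obs - y_mean)
    P_new = P - K @ Pyy @ K.T
    return x_new, P_new
```

Feeding the correspondence pairs one by one amounts to calling this update with a growing observation vector, which is why the optimization can stop as soon as the state estimate stabilizes.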
4 Results
To verify the proposed algorithm, three simulations were performed in MATLAB on a 2 GHz PC with 512 MB of RAM. We first performed an experiment on randomly generated data sets with given correspondences. Three overlapping data sets in the range of ±100 mm were generated. Each data set consisted of 100 points, with 50 points overlapping between each pair of data sets. Two random rigid transformations, in the range of [±50°, ±50 mm], were applied to two of the data sets to bring them out of alignment. Three zero-mean Gaussian noise distributions with different variances were added to the three overlapping areas among the data sets.
Fig. 2. Distribution of the estimated transformation parameters, using the proposed algorithm, over 500 trials.

Table 1. Surface registration error using the proposed and Pennec's algorithms over 500 trials for three different experiments (Ov.: overlapping area)

Mean square surface registration error
Experiment             Ov. 1&2    Ov. 1&3    Ov. 2&3
Exp. 1  Noise var.     1 mm²      1 mm²      1 mm²
        Pennec's       0.96 mm²   0.98 mm²   1 mm²
        UKF            0.97 mm²   0.99 mm²   0.97 mm²
Exp. 2  Noise var.     3 mm²      2 mm²      0.025 mm²
        Pennec's       2.91 mm²   1.93 mm²   0.14 mm²
        UKF            2.97 mm²   1.95 mm²   0.027 mm²
Exp. 3  Noise var.     3 mm²      2 mm²      0.001 mm²
        Pennec's       2.9 mm²    1.92 mm²   0.12 mm²
        UKF            2.96 mm²   1.95 mm²   0.004 mm²
Then, the proposed algorithm was used to register the two data sets back to their original places. The procedure was repeated for 500 trials. On average, each trial took 3 seconds to finalize the registration. Different noise levels were added to the data sets to verify the performance of the algorithm. We also compared the performance of the proposed technique with the one reported by Pennec [5]. Table 1 displays the variances of the noise added to each overlapping area among the data sets, and the mean square surface registration error after global registration. As shown, the proposed algorithm registers the data sets more accurately than Pennec's algorithm, since the mean squared surface registration error of the proposed algorithm is closer to the variance of the noise perturbing the data sets. Figure 2 depicts the distance error histograms of the rotation and translation parameters for one of the trials where the variance of the noise added to each overlapping area was 1 mm². Furthermore, in this simulation, to show the superiority of the proposed global registration algorithm over sequential registration algorithms, we sequentially registered data sets 2 and 3 to data set 1. In this case, the mean squared surface registration errors between data sets 2 and 3 were 1.1 mm², 0.3 mm², and 0.3 mm², respectively, for the different noise variances (as above) perturbing that region. As expected, these errors are farther from the noise variances than the ones reported by the global registration technique, demonstrating the better accuracy of the proposed technique. In the next experiment, three data sets collected from a fractured femur bone phantom were used. The first data set was generated by taking CT images of the femur bone before it was fractured. The CT images were captured using a LightSpeed Plus CT scanner. Then, Mesher software developed at our institution was used to semi-automatically segment the CT images and to create a 3D surface model (bone template) containing 36,508 points using a marching cubes algorithm.
Fig. 3. Normalized histogram of surface registration error (in mm) among the registered areas for the femur fractures over 250 trials, where the range of generated random transformations is [±2.5°, ±2.5 mm]. Panels (top to bottom): Fracture 1 and bone template; Fracture 2 and bone template; Fracture 1 and Fracture 2.

Table 2. Convergence rate of the proposed algorithm in globally registering multiple femur bone fractures to the bone template (Fracture 1 to the template, Fracture 2 to the template, and Fractures 2 and 1 to each other) over 250 trials for different ranges of random transformations, with and without outliers in the data sets

Range of trans.       Convergence rate in %
(mm, deg)        with outliers        no outliers
                 1     2    1&2       1     2    1&2
±2.5            99   100    99      100   100   100
±5              97   100    96      100   100   100
±10             82    97    87       98    98    98
±15             60    93    63       97    92    89
±20             48    86    46       90    84    76
To produce the other two data sets, the femur bone phantom was fractured into two pieces. Another set of CT images was taken of each piece and a 3D surface model was then generated from it. The generated mesh of the bone and its fractures are shown in Figure 1. Before fracturing the femur bone phantom, fiducials were implanted on the bone surface. Those fiducials were used to register the bone fracture mesh data sets to the bone template. Two random transformations were applied to the two fracture data sets to take them out of alignment. Then, the proposed UKF-based global registration algorithm, based on unknown correspondences, was used to globally register the transformed fracture data sets back to the bone template. To do so, 100 points were randomly chosen from each overlapping area among the fractures and the bone template. Those points were then used to globally register the fracture data sets back to their original locations for two cases: one where there were no outliers among the fracture data sets (100 points of each fracture registered to 100 points of the other fracture or the bone template), and one where there were outliers among the data sets (100 points of each fracture registered to all the points of the other fracture or bone template). After registration, the remaining fracture points, which were not employed in the registration algorithm, were used to compute the surface registration error. This procedure was repeated for 250 trials for different ranges of random transformations. On average, each trial took 29 seconds to complete. Table 2 displays the convergence rate of the proposed algorithm in different simulation runs with different ranges of transformations. The algorithm is considered to have converged for an overlapping area when the mean surface registration error for that area is less than 2.5 mm. Figure 3 shows the normalized histogram of surface registration error (in mm) for the three overlapping areas after registration over 250 trials for the range of transformations [±2.5°, ±2.5 mm].
Fig. 4. Global registration of the femur bone fractures
Fig. 5. Global registration of the pelvis bone fractures
Figure 4 depicts the fractured bone data sets registered to the bone template using the proposed algorithm for one of the simulation runs. In the last simulation, the above experiment was repeated on a pelvis bone phantom. Using CT images, a 3D mesh containing 83,562 points was generated from the phantom. Then, the bone was fractured into three pieces. CT images were used to generate a surface mesh for each fractured piece. Before fracturing the phantom, fiducials were mounted on the bone. Those markers were used to register the fractures back to their original locations on the bone template. Three random transformations in the range of [±10°, ±10 mm] were applied to the fractures to bring them out of alignment. Then, the proposed algorithm was used to simultaneously register 80 points, randomly selected from each fractured surface, to the bone template. This procedure was repeated for 250 trials. On average, each trial took 22 seconds to complete. Figure 5 displays the pelvis bone fractures registered to the pelvis bone template for one simulation run. Before and after registration, the mean surface registration errors over all the overlapping areas across 250 trials were 14.61 mm and 1.41 mm, respectively (with outliers), and 14.6 mm and 0.83 mm, respectively (with no outliers).
5 Discussion and Conclusion
The goal of this work was to present a new global registration algorithm for registering multiple overlapping data sets with known and unknown correspondences. In the case of known correspondences, the proposed algorithm was used to globally register randomly generated data sets, and its performance was compared with Pennec's algorithm. As shown in Table 1, the proposed algorithm was able to reduce the mean square surface registration error among the overlapping areas to the level of the variance of the noise added to each data set, and was more accurate than Pennec's technique.
Where correspondences among the data sets were not available, the closest point metric was utilized to obtain a reliable estimate of the correspondences. Results on fractured femur and pelvis bone phantoms, with and without outliers in the registration process, show that the algorithm is robust and can converge even when relatively large initial alignment errors exist. As future work, it would be of interest to increase the robustness of the technique to initial alignment errors. One solution would be a feature matching technique, such as the robust point matching (RPM) algorithm [10], to automatically find at least three pairs of correspondences between each pair of data sets. Furthermore, we would like to compare the proposed registration algorithm with the sequential ICP registration algorithms existing in the literature. Extending the proposed algorithm to register multiple data sets perturbed by anisotropic Gaussian noise is another issue to be investigated.
Acknowledgment

We would like to thank Dr. Maarten Beek for assisting us with CT data collection and 3D surface mesh generation of the femur and pelvic bone phantoms.
References
1. Sharp, G., Lee, S., Wehe, D.: Multiview registration of 3D scenes by minimizing error between coordinate frames. PAMI, 1037–1050 (2004)
2. Moghari, M., Abolmaesumi, P.: Point-based rigid-body registration using an unscented Kalman filter. IEEE Trans. Medical Imaging (to appear)
3. Besl, P., McKay, N.D.: A method for registration of 3D shapes. PAMI 14(2), 239–256 (1992)
4. Stoddart, A., Hilton, A.: Registration of multiple point sets. In: ICPR, vol. 2 (1996)
5. Pennec, X.: Multiple registration and mean rigid shapes - application to the 3D case. Image Fusion and Shape Variability Techniques, 178–185 (1996)
6. Benjemaa, R., Schmitt, F.: A solution for the registration of multiple 3D point sets using unit quaternions. In: Proc. Eur. Conf. Comp. Vis., pp. 34–50 (1998)
7. Goldberger, J.: Registration of multiple point sets using the EM algorithm. Comp. Vis. 2, 730–736 (1999)
8. Williams, J., Bennamoun, M.: A multiple view 3D registration algorithm with statistical error modeling. Inf. Sys., 1662–1670 (2000)
9. Pottmann, H., Leopoldseder, S., Hofer, M.: Simultaneous registration of multiple views of a 3D object. Photo. Remote Sens. Spatial Inf. Sci., 265–270 (2002)
10. Rangarajan, A., Chui, H., Bookstein, F.L.: The softassign Procrustes matching algorithm. In: Duncan, J.S., Gindi, G. (eds.) IPMI 1997. LNCS, vol. 1230, pp. 29–42. Springer, Heidelberg (1997)
Precise Estimation of Postoperative Cup Alignment from Single Standard X-Ray Radiograph with Gonadal Shielding

Guoyan Zheng¹, Simon Steppacher², Xuan Zhang¹, and Moritz Tannast²

¹ MEM Research Center, University of Bern, Stauffacherstrasse 78, CH-3014 Bern, Switzerland
[email protected]
² Department of Orthopaedic Surgery, Inselspital, University of Bern, Switzerland
Abstract. This paper addresses the problem of estimating postoperative cup alignment from a single standard X-ray radiograph with gonadal shielding. The widely used procedure of evaluating cup orientation following total hip arthroplasty using a single standard anteroposterior radiograph is known to be inaccurate, largely due to the wide variability in individual pelvic position relative to the X-ray plate. 2D-3D image registration methods have been introduced to estimate the rigid transformation between a preoperative CT volume and postoperative radiograph(s) for an accurate estimation of the postoperative cup alignment relative to an anatomical reference extracted from the CT data. However, these methods require either multiple radiographs or a radiograph-specific calibration, neither of which is available for most retrospective studies. Furthermore, these methods have only been evaluated on X-ray radiographs without gonadal shielding. In this paper, we propose a hybrid 2D-3D registration scheme combining an iterative landmark-to-ray registration with a 2D-3D intensity-based registration to estimate the rigid transformation for a precise estimation of cup alignment. Quantitative and qualitative results evaluated on clinical and cadaveric datasets are given, which indicate the validity of our approach.

Keywords: postoperative cup alignment, radiograph, 2D-3D registration, iterative landmark-to-ray registration, intensity-based registration.
1 Introduction
Two-dimensional (2D) anteroposterior (AP) pelvic radiographs, despite their inferior accuracy in comparison to three-dimensional (3D) techniques based on computed tomography [1], are the standard imaging method for the evaluation of cup orientation following total hip arthroplasty (THA) [2][3], largely due to the simplicity, availability, and minimal expense associated with acquiring these images. While plain pelvic radiographs are easily obtained, their accurate interpretation is complicated by the wide variability in individual pelvic position relative to the X-ray plate [1] (see Fig. 1(a) for a detailed explanation). In THA, increased pelvic tilt results in a significant decrease in apparent prosthetic cup anteversion, and vice versa [4].
Fig. 1. (a) Two angles (α = anteversion, β = inclination) describe the cup orientation (left image). Due to the large variation of individual pelvic orientation, they have to be measured relative to an anatomical reference (the anterior pelvic plane). These standardized values (α, β) significantly differ from the cup orientation calculated from the AP pelvic radiograph (α′, β′). (b) Method from [2] for determining radiographic acetabular cup orientation with measurement lines drawn on a plain X-ray with gonadal shielding (right image). A line is drawn at 1/5 of the maximum diameter l_AB, perpendicular to the major axis AB of the projected ellipse. It intersects line AB at point C and the visible arc at point D. Then, α′ = arcsin(l_CD / (0.4 l_AB)).
These position variations affect the accuracy of studies correlating cup position to instability, wear, and osteolysis. 2D-3D image registration methods [5][6] have been introduced to estimate the rigid transformation between a preoperative CT volume of a patient and postoperative radiograph(s) for an accurate estimation of the postoperative cup alignment relative to an anatomical reference, which is a plane called the anterior pelvic plane (APP) defined by the anterior superior iliac spines (ASIS) and pubic tubercles. The transformation chain in such a method can be summarized by Eq. (1). The estimations of CupOrientation_{X-ray} and T^{CT}_{APP} are trivial; the challenge lies in the estimation of T^{X-ray}_{CT}, particularly when a single standard radiograph with gonadal shielding is used.

\[
CupOrientation_{APP} = T^{CT}_{APP} \cdot T^{X\text{-}ray}_{CT} \cdot CupOrientation_{X\text{-}ray} \tag{1}
\]

In this work, we use the method introduced in [2] to find CupOrientation_{X-ray} and the method published in [7] to find T^{CT}_{APP}. We then propose a hybrid 2D-3D registration scheme combining an iterative landmark-to-ray registration [8] with a 2D-3D intensity-based registration [9] to find the rigid transformation T^{X-ray}_{CT} between a preoperative CT volume and a single standard X-ray radiograph. Our method does not require a radiograph-specific calibration and can work with an X-ray radiograph with gonadal shielding. The only information that we assume to know about the radiograph is the image scale (pixel/mm) and the distance from the focal point to the imaging plane or to the film. As long as the radiograph is acquired in a standardized way, which is the case in clinical routine [3], these can be estimated by performing a one-time calibration [10].
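As a small worked example of the radiographic measurement of Fig. 1(b), the planar anteversion follows directly from the two measured lengths. The sketch below is our illustration with made-up measurements, not data from the paper.

```python
import math

def radiographic_anteversion(l_AB, l_CD):
    """Planar cup anteversion from a plain AP radiograph (Pradhan [2]):
    l_AB is the major-axis length of the projected ellipse, l_CD the length
    of the perpendicular chord at 1/5 of l_AB (same units for both)."""
    return math.degrees(math.asin(l_CD / (0.4 * l_AB)))

# Illustrative measurements in mm (hypothetical, not from the paper):
print(radiographic_anteversion(l_AB=50.0, l_CD=6.5))  # ~19 degrees
```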
2 The Proposed Approach

2.1 Cup Alignment Estimation Protocol
Our cup alignment estimation protocol includes the following five steps:

1. CT data processing and landmark extraction. We first extract the surface models of both pelvis and femur from the CT volume data and then define a region of interest, which is used in step 4 to exclude the contribution of most of the femur to the digitally reconstructed radiograph (DRR), a projection image obtained from the CT volume data by simulating X-ray projection. We then interactively extract two sets of landmarks from the CT data using a custom-made planning tool [13]: (a) landmarks for measuring cup orientation, including the left and right ASIS and the pubic tubercles, which define the APP; and (b) landmarks for registration, including the left and right acetabular centers (by interactive sphere fitting), the pubic symphysis, and the middle of the sacrococcygeal joint (see Fig. 2(a) for details).

2. X-ray radiograph landmark extraction. Two sets of landmarks are interactively picked from the radiograph (see Fig. 2(b) for details): (a) three landmarks for measuring cup orientation as described in [2], which are used to calculate the radiographic cup orientation (see Fig. 1(b)); and (b) the corresponding projections of the CT registration landmarks, including the left and right acetabular centers, the upper border of the symphysis, and the middle of the sacrococcygeal joint. The local coordinate reference and the cone-beam projection model of the radiograph are then established as follows (see Fig. 2(c) for details). The intersection of the line connecting the middle of the sacrococcygeal joint and the upper border of the symphysis with the line connecting the acetabular centers is assumed to be the cone-beam projection center and is taken as the coordinate origin. The central projection line is perpendicular to the radiograph plane and its opposite direction is taken as the Z-axis. With gonadal shielding, either the middle of the sacrococcygeal joint or the upper border of the symphysis may be occluded; in such a case, a rough estimate is used.

3. Iterative landmark-to-ray registration. Using the landmarks picked from the CT data and from the radiograph, we perform an iterative landmark-to-ray registration. The estimated rigid transformation is then treated as the starting value for the next step.

4. Intensity-based 2D-3D registration. The rigid transformation obtained from the last step is fine-tuned by an intensity-based 2D-3D registration that uses a similarity measure derived from Gibbs random field theory [9].

5. Cup alignment estimation. We estimate the cup alignment relative to the anterior pelvic plane of the patient.
2.2 Iterative Landmark-to-Ray Registration
Let us denote the landmarks defined in the CT volume, i.e., the left and right acetabular centers, the pubic symphysis, and the middle of the sacrococcygeal joint, as v^1_CT, v^2_CT, v^3_CT, and v^4_CT, respectively, and their corresponding landmarks interactively picked from the radiograph as v^1_X-ray, v^2_X-ray, v^3_X-ray, and v^4_X-ray, respectively. For each X-ray landmark, we can calculate a projection ray emitting from the focal point to the landmark. We then calculate the length l^{1,2}_CT between v^1_CT and v^2_CT, and the shortest distance from v^3_CT (or v^4_CT, if the upper border of the symphysis on the radiograph is occluded) to the line v^1_CT v^2_CT, denoted l^{3,1-2}_CT (or l^{4,1-2}_CT). Using the known image scale, we also calculate the length l^{1,2}_X-ray between v^1_X-ray and v^2_X-ray. Then, we proceed as follows.

Fig. 2. (a) Landmarks extracted from the CT volume; (b) landmarks extracted from the radiograph; and (c) the radiograph coordinate system and the cone-beam projection model.
Initialization. In this step, we assume that the line connecting the acetabular centers is parallel to the AP pelvic radiograph plane. Using this assumption and the correspondences between the landmarks defined in the CT volume and those picked from the radiograph, we first compute two points \bar{v}^1_X-ray and \bar{v}^2_X-ray on the projection rays of v^1_X-ray and v^2_X-ray, respectively, which satisfy:

\[
\bar{v}^1_{X\text{-}ray}\bar{v}^2_{X\text{-}ray} \,/\!/\, v^1_{X\text{-}ray}v^2_{X\text{-}ray}; \quad\text{and}\quad |\bar{v}^1_{X\text{-}ray} - \bar{v}^2_{X\text{-}ray}| = l^{1,2}_{CT} \tag{2}
\]

We then find a point \bar{v}^3_X-ray on the projection ray of v^3_X-ray whose distance to the line \bar{v}^1_X-ray \bar{v}^2_X-ray is equal to l^{3,1-2}_CT. A paired-point matching [11] based on {v^i_CT; i=1,2,3} and {\bar{v}^i_X-ray; i=1,2,3} is performed to calculate the initial rigid transformation \bar{T}^{X-ray}_CT (see Fig. 3(a) for details). From now on, we assume that all information defined in the radiograph coordinate frame has been transformed into the CT coordinate frame using \bar{T}^{X-ray}_CT. We denote the transformed X-ray landmarks as {\tilde{v}^i_X-ray} and the transformed X-ray focal point as \tilde{f}_X-ray.
Iteration. The following steps are iteratively executed until convergence:

1. For each point v^i_CT, we find the point on the corresponding projection ray of \tilde{v}^i_X-ray that has the shortest distance to v^i_CT and denote it \bar{v}^i_CT (see Fig. 3(b)). We then perform a paired-point matching [11] using the extracted point pairs to compute a rigid transformation \Delta\tilde{T}^{X-ray}_CT.
2. We update the radiograph coordinate frame using \Delta\tilde{T}^{X-ray}_CT.

Fig. 3. Iterative landmark-to-ray registration. (a) Schematic view of the initialization; (b) schematic view of finding 3D point pairs.
Intensity-Based 2D-3D Registration
Without the use of fiducial markers, the iterative landmark-to-ray registration can not fullfill the accuracy requirements of our application and is complemented by an intensity-based 2D-3D registration. The challenge here is the big area occlusion caused by gonadal shielding which creates large differences between the X-ray radiograph and the DRR obtained from the CT volume data by simulating X-ray projection given the current estimation of the rigid transformation, and contains very little useful information to aid registration. Let us denote L = {(i, j) : 1 i I, 1 j J}, an I × J integer lattice, as the pixel sites of X-ray radiograph and the image value at pixel site (i, j) of the X-ray radiograph as IX−ray (i, j). Similarly, we denote the image value of the DRR at pixel site (i, j) as IDRR (i, j). Our 2D-3D registration scheme is based on a recently introduced spline-based multi-resolution 2D-3D registration scheme [12] but with different similarity measure. We use a similarity measure that is derived from Gibbs random field theory [9]. It allows us to effectively incorporate spatial inforation and has following form: S=
I,J i,j
d2i,j +
I,J i,j
1 r ) card(Ni,j
r where Ni,j is a neighborhood defined by
r (i ,j )∈Ni,j
(di,j − di ,j )2
(3)
956
G. Zheng et al. r Ni,j = {(i , j )|(i , j ) ∈ L, (i , j ) = (i, j), |(i , j ) − (i, j)| r}
(4)
and r is a positive integer that determines the size of the neighborhood system. di,j is the local normalization based difference image value at pixel site (i, j) and is computed by: di,j = I¯X−ray (i, j) − I¯DRR (i, j) I¯X−ray (i, j) =
IX−ray (i,j)−mX−ray (Rri,j ) ; σX−ray (Rri,j )
I¯DRR (i, j) =
IDRR (i,j)−mDRR (Rri,j ) σDRR (Rri,j )
(5)
r r r r ), σX−ray (Ri,j ) and mDRR (Ri,j ), σDRR (Ri,j ) are the mean where mX−ray (Ri,j value and the standard deviation calculated from the intensity values of all sites r of the X-ray radiograph and of the associated DRR, in the local region Ri,j r r respectively. Ri,j has the same size as Ni,j and is defined by: r = {(i , j )|(i , j ) ∈ L, |(i , j ) − (i, j)| r} Ri,j
(6)
To accelerate the registration process, we use the cubic-spline data model described in [12] to compute multi-resolution data pyramids for both the CT and X-ray images, the DRRs, as well as the gradient and the Hessian of the similarity measure. The registration is then performed from the coarsest resolution to the finest. To improve the capture range, we use two different sizes of neighborhood systems: r = 15 and r = 3. Starting from the rigid transformation obtained by the iterative landmark-to-ray registration, the similarity measure with the larger neighborhood system is first minimized via a Levenberg-Marquardt non-linear least-squares optimizer. The estimated rigid transformation is then treated as the starting value for optimizing the similarity measure with the smaller neighborhood system.
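A minimal sketch of the similarity measure of Eqs. (3)-(6) is given below, using uniform box filters for the local means and standard deviations. It assumes SciPy's ndimage module and approximates the neighborhood term by including the center pixel, which the paper's N^r_{i,j} excludes; it is our illustration, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_normalize(img, r):
    """Local normalization of Eq. (5): subtract the mean and divide by the
    standard deviation over the (2r+1) x (2r+1) region R^r_ij."""
    img = np.asarray(img, dtype=float)
    size = 2 * r + 1
    m = uniform_filter(img, size)
    var = uniform_filter(img * img, size) - m * m
    return (img - m) / np.sqrt(np.maximum(var, 1e-12))

def gibbs_similarity(xray, drr, r):
    """Similarity measure of Eq. (3) on the difference of locally normalized
    images; lower is better."""
    d = local_normalize(xray, r) - local_normalize(drr, r)
    first = np.sum(d * d)
    # mean over the neighborhood of (d_ij - d_i'j')^2 equals
    # d^2 - 2*d*mean(d) + mean(d^2); the center pixel is included here,
    # a small approximation relative to Eq. (4).
    md = uniform_filter(d, 2 * r + 1)
    md2 = uniform_filter(d * d, 2 * r + 1)
    second = np.sum(d * d - 2.0 * d * md + md2)
    return first + second
```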
3 Experimental Results
We designed and conducted experiments on two clinical datasets and a cadaveric pelvis dataset. As there is no ground truth available for the two clinical datasets, we used them to qualitatively evaluate the effectiveness of the iterative landmark-to-ray registration and the accuracy of the hybrid 2D-3D registration scheme. Fig. 4 shows one example. The input X-ray radiograph is shown in Fig. 4(a). Fig. 4(b) shows the end of the iterative landmark-to-ray registration and the beginning of the intensity-based 2D-3D registration; both the X-ray radiograph and the CT volume data are downsampled to 1/8 of their original sizes, and the edges extracted from the DRR are superimposed onto the X-ray radiograph, demonstrating the effectiveness of the iterative landmark-to-ray registration. Fig. 4(c) shows the end of the intensity-based 2D-3D registration; an accurate matching between the X-ray radiograph and the DRR was observed. To quantitatively evaluate the measurement accuracy of the proposed approach, a cadaveric pelvis and an all-polyethylene acetabular component (Charles F. Thackray, Leeds, UK) were used. Before the prosthesis was implanted, we did a CT scan of the cadaveric pelvis.
Fig. 4. (a) X-ray radiograph with gonadal shielding; (b) the beginning of the intensity-based 2D-3D registration; and (c) the end of the intensity-based 2D-3D registration.

Table 1. Experimental results: differences between the measured angles and the ground truths

angle             img01  img02  img03  img04  img05  img06  img07  img08  img09
Difference between radiographic measurements and the ground truths
anteversion (°)    13.4    1.1    2.4   20.8   10.4   19.2   17.8   12.5    9.1
inclination (°)     0.5    5.1    3.6    1.2    0.6    1.0    0.5    1.3    2.5
Difference between the estimation results of the first study and the ground truths
anteversion (°)     3.4    0.4    0.4    3.1    0.9    0.9    2.3    2.2    1.8
inclination (°)     1.4    0.2    1.6    0.9    1.0    2.0    0.1    1.0    0.5
Difference between the estimation results of the second study and the ground truths
anteversion (°)     5.9    0.2    0.3    5.3    1.4    1.8    3.6    2.1    3.2
inclination (°)     1.6    0.1    1.8    1.0    1.0    2.1    0.2    1.2    0.7
After the prosthesis was implanted, we took 9 radiographs, putting the pelvis in different tilt and rotation positions relative to the X-ray plate. To obtain the ground truth of the prosthesis orientation relative to the anterior pelvic plane of the cadaveric pelvis, we did another CT scan of the pelvis after the prosthesis was implanted. Custom-made software [13] was used to extract the ground truth from the second CT scan. Using these data, we performed two studies. In the first study, each of the 9 radiographs was used together with the CT scan to estimate the prosthesis orientation, and the estimated results were compared to the ground truth. In the second study, to simulate the occlusion caused by gonadal shielding, we intentionally set a region covering 1/5 to 1/3 of the valid image area of each radiograph to a constant gray value. We then used each of these radiographs together with the first CT scan to estimate the prosthesis orientation. The purpose was to assess the effect of gonadal shielding on the accuracy of the proposed approach. The differences between the radiographic measurements and the ground truths, and the differences between the estimated angles in both studies and the ground truths, are presented in Table 1. Differences of 11.9° ± 7.0° were found for the anteversion and differences of 1.8° ± 1.6° for the inclination when the radiographic measurements were compared to the ground truths.
Our finding is consistent with a recently published finding [1] that radiographic measurement of anteversion is unreliable. With the help of the hybrid 2D-3D registration scheme, the differences were reduced to 1.7° ± 1.1° for anteversion and 1.0° ± 0.6° for inclination, which demonstrates the accuracy of the proposed approach. With the simulated gonadal shielding, the differences were slightly higher but still in the acceptable range [1]: 2.6° ± 2.0° for anteversion and 1.1° ± 0.7° for inclination.
4 Conclusions
In this paper, we proposed a hybrid 2D-3D registration scheme for accurately estimating postoperative cup alignment using a preoperative CT volume and a standard AP pelvic radiograph. Our method is more appropriate for long-term retrospective studies than those previously reported [5][6], which require either multiple radiographs [5] or a radiograph-specific calibration [6], neither of which is available for most retrospective studies. Furthermore, those methods were only evaluated on X-ray radiographs without gonadal shielding, which may pose a challenge for them.
References
1. Kalteis, T., et al.: Position of the acetabular cup - accuracy of radiographic calculation compared to CT-based measurement. Eur. J. Radiol. 58, 294–300 (2006)
2. Pradhan, R.: Planar anteversion of the acetabular cup as determined from plain anteroposterior radiographs. J. Bone Joint Surg. Br. 81-B, 431–435 (1999)
3. Della Valle, C.J., et al.: Primary total hip arthroplasty with a flanged, cemented all-polyethylene acetabular component: evaluation at a minimum of 20 years. J. Arthroplasty 19, 23–26 (2004)
4. Sellers, R.G., et al.: The effect of pelvic rotation on alpha and theta angles in total hip arthroplasty. Contemp. Orthop. 17, 67–70 (1988)
5. LaRose, D., et al.: Post-operative measurement of acetabular cup position using X-ray/CT registration. In: Delp, S.L., DiGioia, A.M., Jaramaz, B. (eds.) MICCAI 2000. LNCS, vol. 1935, pp. 1104–1113. Springer, Heidelberg (2000)
6. Jaramaz, B., Eckman, K.: 2D/3D registration for measurement of implant alignment after total hip replacement. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4191, pp. 653–661. Springer, Heidelberg (2006)
7. DiGioia, A., et al.: Image guided navigation system to measure intraoperatively acetabular implant alignment. Clin. Orthop. Rel. Res. 355, 8–22 (1998)
8. Wunsch, P., Hirzinger, G.: Registration of CAD-models to images by iterative inverse perspective matching. In: ICPR 1996, vol. 1, pp. 78–83 (1996)
9. Zheng, G., et al.: Point similarity measures based on MRF modeling of difference images for spline-based 2D-3D rigid registration of X-ray fluoroscopy to CT images. In: Pluim, J.P.W., Likar, B., Gerritsen, F.A. (eds.) WBIR 2006. LNCS, vol. 4057, pp. 186–194. Springer, Heidelberg (2006)
10. The, B.: Digital radiographic preoperative planning and postoperative monitoring of total hip replacements - techniques, validation and implementation. Doctoral dissertation, University Medical Center Groningen, the Netherlands (2006)
11. Veldpaus, F.E., et al.: A least-squares algorithm for the equiform transformation from spatial marker coordinates. J. Biomech. 21, 45–54 (1988)
12. Jonić, S., et al.: An optimized spline-based registration of a 3D CT to a set of C-arm images. Int. J. Biomed. Imaging, 1–12 (2006) (Article ID 47197)
13. Zheng, G., et al.: A hybrid CT-free navigation system for total hip arthroplasty. Computer Aided Surgery 7, 129–145 (2002)
Fully Automated and Adaptive Detection of Amyloid Plaques in Stained Brain Sections of Alzheimer Transgenic Mice

Abdelmonem Feki¹,², Olivier Teboul¹,², Albertine Dubois¹, Bruno Bozon³, Alexis Faure³, Philippe Hantraye¹, Marc Dhenain¹, Benoit Delatour³, and Thierry Delzescaux¹

¹ MIRCen, URA CEA-CNRS 2210, Orsay, France
[email protected]
² Ecole Centrale Paris, Grande Voie des Vignes, Chatenay-Malabry, France
³ Laboratoire NAMC, CNRS, UMR 8620, Université Paris Sud, Orsay, France
Abstract. Automated detection of amyloid plaques (AP) in post mortem brain sections of patients with Alzheimer disease (AD), or in mouse models of the disease, is a major step toward quantitative, standardized and accurate assessment of neuropathological lesions as well as of their modulation by treatment. We propose a new segmentation method to automatically detect amyloid plaques in Congo Red stained sections, based on adaptive thresholds and a dedicated amyloid plaque/tissue model. A set of histological sections focusing on anatomical structures was used to validate the method in comparison to expert segmentation. Original information concerning global amyloid load has been derived from 6 mouse brains, which opens new perspectives for the extensive analysis of such data in 3-D and the possibility to integrate in vivo and post mortem information for diagnosis purposes.
1 Introduction
Alzheimer disease (AD) is a progressive neurodegenerative disorder that affects a large proportion of the elderly population [1]. One of the characteristics of histological sections from the brains of patients who had the disease is the presence of many amyloid plaques. These lesions are extracellular deposits of an amyloid protein. They can be detected after histological processing such as tissue staining by the Congo Red method. Each plaque measures approximately 20 to 200 μm. Transgenic mice presenting amyloid plaques (AP) are widely studied to improve our understanding of the pathophysiology of AD, but also to investigate new therapeutics. The detection of amyloid plaques on large datasets of histological sections remains mainly manual or semi-automatic, based on basic image processing methods using histogram or color image analysis [2]. More recently, an approach using a statistical model based on prior operator expertise has been proposed [3]. Nevertheless, most of these approaches remain tedious and time consuming (manual delineation, threshold adjustment).
In this paper, we propose a fully automated method to detect amyloid plaques in Congo Red stained sections of transgenic mouse brains. The first step consists in separating the tissue from the background based on an expectation maximization (EM) algorithm, followed by the extraction of seeds corresponding to the plaques using global/local adaptive thresholds. The second step exploits each seed and its corresponding region to fit a mathematical model that iteratively estimates a plane with uniform variance for the tissue and a Gaussian surface for the amyloid plaque. The results are then filtered using a priori area and shape constraints. The processing is implemented in 2-D and could be extended to series of stacked sections (3-D volumes). Several informative parameters, such as amyloid load (in each section as well as in the whole brain) and plaque number, area, distribution and R, G, B intensities, can be robustly and reproducibly assessed with the proposed method.
2 Materials and Methods

2.1 Biological Materials
In our studies, image analysis was performed on histological sections from hemibrains of APP/PS1 and APP/PS1KI mouse models of AD. The brains were cut into 40 μm-thick coronal sections on a freezing microtome. Brain sections were stained by the Congo Red method. All sections were then digitized using a Nikon Coolscan 4000 ED scanner as RGB images with a 4000 dpi in-plane digitization resolution (pixel size 6.35e-3 × 6.35e-3 mm²), generating large images (1300 × 1300 pixels) (Fig. 1a). Image analysis was conducted in two successive experiments. First, the method was tested in selected brain regions (hippocampus and frontal part of the brain) from 5 APP/PS1 and 5 APP/PS1KI mice. The APP/PS1 mice displayed large amyloid plaques, while plaques were smaller in APP/PS1KI animals (Fig. 1c). This experiment was conducted to evaluate the robustness of the automatic detection under a variety of histological contexts. In a second analysis, plaque detection was evaluated on histological sections covering the whole brain of 6 APP/PS1 animals, which allowed the study to be extended to different brain regions. For this experiment, one histological brain section out of twelve was extracted, so as to keep 12-14 sections for each brain with an associated inter-section spacing of 0.48 mm.
2.2 Overall Protocol
From the RGB images, the G color component was extracted and the corresponding gray scale was inverted (Fig. 1b). This channel is predominantly used in histological image processing due to its lower level of noise and its high natural contrast [4]. A binary mask of the brain section was computed with an EM procedure [5], which estimated a threshold value under the hypothesis that the intensities of the background and the tissue can be modelled as Gaussian distributions. The definition of seeds corresponding roughly to amyloid plaques (AP) was performed in a two-step scheme. To discriminate plaques from tissue, we assumed that tissue was the majority class in the section.
Fig. 1. a) Coronal section stained using Congo Red, b) inverted green color component and c) large/small amyloid plaques respectively in APP/PS1 (top) and APP/PS1KI (bottom) hippocampus
For a defined region of interest (mask and corresponding image), the mean m and standard deviation σ were calculated, and pixels with a value over T = m + σ were automatically outlined and labelled as seeds. To select the optimal contrast between plaque and tissue stainings, this approach was applied twice: 1) on the globally masked section, to generate a rough estimation of the seeds, which was used to create a first generalized Voronoi's partition of the image domain [6,7] (a calculation based on a distance criterion that attributes each pixel to the nearest seed); and 2) on each Voronoi's partition obtained in 1), to refine both the seed estimation and, consequently, the Voronoi's partitions. We then propose a mathematical model of AP and brain tissue, and use an iterative EM-like approach to separate these two classes; a sketch of the seed extraction and partitioning step is given below.
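As an illustration of the adaptive thresholding and generalized Voronoi partitioning described above, the following sketch uses NumPy and SciPy; it is our reconstruction under stated assumptions (connected-component seeds, Euclidean nearest-seed attribution), not the authors' implementation.

```python
import numpy as np
from scipy import ndimage

def adaptive_seeds(img, mask):
    """Threshold T = m + sigma computed over the masked region: pixels
    above T inside the mask become candidate plaque seeds."""
    vals = img[mask]
    T = vals.mean() + vals.std()
    return (img > T) & mask

def voronoi_partition(seeds, mask):
    """Generalized Voronoi partition of the mask: each pixel is attributed
    to the nearest connected seed component (Euclidean distance)."""
    labels, n = ndimage.label(seeds)
    # Indices of the nearest seed pixel for every image location.
    _, inds = ndimage.distance_transform_edt(labels == 0, return_indices=True)
    part = labels[tuple(inds)]
    part[~mask] = 0
    return part, n

def refine_seeds(img, mask):
    """Two-pass scheme: global seeds first, then re-threshold in each cell."""
    part, n = voronoi_partition(adaptive_seeds(img, mask), mask)
    refined = np.zeros_like(mask)
    for lab in range(1, n + 1):
        cell = part == lab
        if cell.any():
            refined |= adaptive_seeds(img, cell)
    return refined
```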
Mathematical Model
The divide-and-conquer strategy chosen lead to an easier segmentation problem: separating a plaque (P) from the tissue (T ). As a consequence, we introduce a model for each class (tissue, AP). The classification algorithm works in two steps: estimate the parameters of the models and classify the pixels until convergence. Plaque Model. We are looking for an amyloid plaque, which is actually a load of amyloids. As the Congo Red stains these proteins, the intensity of Congo Red is proportional to the density of amyloid. Thus, we consider the intensity of Congo Red in a pixel as a measure of the amyloid load in this pixel’s area. Moreover, we suppose that in a plaque, the proteins are distributed according to a spatial gaussian law around the center of the plaque:
Fully Automated and Adaptive Detection of AP in Stained Brain Sections
\[
p(X \mid \mathcal{P}) = \frac{1}{2\pi \det(\Sigma_p)^{1/2}} \exp\!\left(-\frac{1}{2}(X - \mu_p)^T \Sigma_p^{-1} (X - \mu_p)\right) \tag{1}
\]

where X = (x, y)^T.
This means that the location X of an amyloid protein is a random variable with probability density p(X). Besides, we assume that the observed intensity I at location X corresponds to I observations of that random variable. However, we do not observe only amyloid protein stained with Congo Red, but also the brain tissue, which has to be taken into account.

Tissue Model. We assume that the brain tissue intensity follows a Gaussian law:
\[
p(I \mid \mathcal{T}) = \frac{1}{\sqrt{2\pi}\,\sigma_b} \exp\!\left(-\frac{(I - \mu_b)^2}{2\sigma_b^2}\right) \tag{2}
\]
This means that the probability of observing an intensity I in the tissue is independent of the location and equals p(I). The main difficulty of this formulation is that the observations do not belong to the same spaces: in the plaque model, the observations are the locations of proteins, whereas in the tissue model, the observations are the pixel intensities. As a consequence, we use a heuristic to classify the pixels between the two classes, based upon the posterior probabilities of the pixels belonging to each class. The algorithm works in two steps: estimate and classify.

Initialization. Let τ_n be the ownership of a pixel n to the plaque: τ = 1 if the pixel is in the plaque and 0 otherwise. We initialize τ to 1 in the Voronoi seed and 0 elsewhere.

Parameter estimation. The estimation of the parameters is based upon the maximum likelihood of the two classes:

\[
\mu_b = \frac{\sum_n (1-\tau_n)\, I_n}{\sum_n (1-\tau_n)}, \qquad
\sigma_b^2 = \frac{\sum_n (1-\tau_n)(I_n - \mu_b)^2}{\sum_n (1-\tau_n)} \tag{3}
\]

\[
\mu_p = \frac{\sum_n \tau_n I_n X_n}{\sum_n I_n \tau_n}, \qquad
\Sigma_p = \frac{\sum_n \tau_n I_n (X_n - \mu_p)(X_n - \mu_p)^T}{\sum_n I_n \tau_n} \tag{4}
\]
(5) (6)
where S_p(v_i) and S_t(v_i) are the scores of the voxel v_i with respect to the two classes, and π is the prior probability of the AP class (equal to the number of pixels with τ = 1 in the previous iteration, divided by the total number of pixels).
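A minimal sketch of this estimate/classify loop for a single Voronoi cell is given below, assuming pixel coordinates X, intensities I and an initial seed ownership tau; it is an illustration of Eqs. (3)-(6), not the authors' C++ implementation.

```python
import numpy as np

def fit_plaque_cell(X, I, tau, n_iter=10):
    """Iterative estimate/classify loop for one Voronoi cell.
    X: (N, 2) pixel coordinates, I: (N,) inverted-green intensities,
    tau: (N,) initial plaque ownership (1 inside the seed, 0 elsewhere)."""
    tau = tau.astype(float)
    for _ in range(n_iter):
        # --- Estimate (Eqs. 3-4): tissue mean/variance, plaque center/spread.
        wt = 1.0 - tau
        mu_b = (wt * I).sum() / wt.sum()
        var_b = (wt * (I - mu_b) ** 2).sum() / wt.sum()
        wp = tau * I                                   # intensity-weighted
        mu_p = (wp[:, None] * X).sum(0) / wp.sum()
        D = X - mu_p
        Sigma_p = (wp[:, None, None] *
                   (D[:, :, None] * D[:, None, :])).sum(0) / wp.sum()
        # --- Classify (Eqs. 5-6): compare posterior scores.
        pi = tau.mean()                                # plaque-class prior
        inv = np.linalg.inv(Sigma_p)
        q = np.einsum('ni,ij,nj->n', D, inv, D)        # squared Mahalanobis
        Sp = pi * np.exp(-0.5 * q) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma_p)))
        St = (1 - pi) * np.exp(-0.5 * (I - mu_b) ** 2 / var_b) / np.sqrt(2 * np.pi * var_b)
        tau_new = (Sp > St).astype(float)
        if tau_new.sum() == 0 or np.array_equal(tau_new, tau):
            break                                      # converged or plaque vanished
        tau = tau_new
    return tau.astype(bool), mu_p, Sigma_p
```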
2.4 Data Analysis
The amyloid load, i.e., the ratio between the quantity of amyloid plaques and the tissue, was the main information computed to characterize AD-related pathology. Since the processing is performed on 2-D data, we assumed that the ratio of the surfaces is proportional to the ratio of the volumes, enabling us to assess the amyloid load both for individual sections (validation part) and for series of sections (whole brain), according to the Delesse principle validated in stereology. In addition, shape and position features and the intensity of each individual plaque were also computed. In a first analysis, the proposed methodology was applied to a subset of 20 images focusing on two anatomical structures manually delineated by an expert: the hippocampus (n=10) and the frontal region (n=10). The reference amyloid load for each region was computed by an expert biologist using Visilog software and a dedicated image processing protocol (R, G, B color component adjustment, global automated threshold based on an entropy criterion, and morphometric filtering according to Feret's diameter). A correlation analysis was performed to test the linear relationship between the measurements of the two methods. The second part of the work consisted in analysing 6 mouse brains (representing 77 sections) in order to derive global parameters and to perform an inter-subject comparison. The AP and tissue areas were automatically calculated for each section and brain along the antero-posterior axis. Finally, a visual inspection of the results, based on an image of the original section superimposed with its segmentation as well as a 3-D volume rendering of the plaques, was performed.
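The load computation itself is a simple area ratio; the following sketch (our illustration, with hypothetical mask inputs) makes the per-section and whole-brain variants explicit.

```python
def amyloid_load(plaque_masks, tissue_masks):
    """Amyloid load as the area ratio of plaques to tissue, in percent.
    By the Delesse principle, the 2-D area fraction summed over sections
    estimates the 3-D volume fraction, so the same formula serves both
    per-section and whole-brain estimates."""
    ap = sum(int(m.sum()) for m in plaque_masks)       # plaque pixels
    tissue = sum(int(m.sum()) for m in tissue_masks)   # tissue pixels
    return 100.0 * ap / tissue

# Per-section loads along the antero-posterior axis:
# loads = [100.0 * p.sum() / t.sum()
#          for p, t in zip(plaque_masks, tissue_masks)]
```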
3 Results
Figure 2 describes the sequential results obtained at the different steps of the AP segmentation in one section (Fig. 2a). Following the automated extraction of the section performed on the inverted green color component, holes corresponding to the ventricles and vessels can be observed (Fig. 2b). The definition of subregions including a single seed was performed in a two-step process: the first step provided a rough estimation of the seeds, which generated several large Voronoi's partitions coded with a rainbow colormap (Fig. 2c), while the second step produced an oversegmentation, leading to supplementary detected seeds as well as numerous small Voronoi's partitions (Fig. 2d). This information was then treated with the proposed mathematical model in order to filter real AP from artefacts (border, dust, vessels, stria) and to assess quantitative parameters (location, area, intensity). We propose a pseudo-3D representation of the detected AP as spheres (with a size and a color texture relative to their mean radius, facilitating the visualization of the detection on several sections), which were superimposed on the mask of the section (Fig. 2e). Masked plaques were also superimposed on the original section to allow visual assessment of the reliability of the method (Fig. 2f).
Fig. 2. a) Coronal section stained using Congo red; b) inverted and masked green color component; c) first and d) second seed estimation and Voronoi computation; e) final AP extracted with the automated method; and f) image fusion of the original section with the detected AP (bright spots).
Masked plaques were also superimposed on the original section to visually assess the reliability of the method (Fig. 2f). To validate the method, this methodology was applied to two selected brain regions from mice with large and small amyloid plaques (n = 5 per group). The amyloid load values estimated in the targeted anatomical regions fell within close intervals: [0.75; 5.03] for the reference method and [1.82; 5.63] for the automated method (Table 1). The expert and automated methods showed a high and significant correlation: R = 0.96, p < 0.0001. The algorithm was implemented in C++, and the computation was performed on a Linux workstation (Intel Xeon 3.2 GHz processor) with 2 GB of RAM; it required two minutes to process a 1300 × 1300 pixel image and half an hour for a brain comprising 15 sections. A few hours were sufficient to process the 6 brains of the study. Curves representing plaque and brain-tissue areas for each section along the antero-posterior axis are presented in Figure 3a.
Table 1. Results of amyloid load computed on hippocampus and frontal regions

Region              1     2     3     4     5     6     7     8     9     10
Frontal (expert)   2.2   1.6   1.2   1.2   1.4   3.8   4.1   5.0   4.1   4.0
Frontal (method)   2.7   2.1   1.9   2.1   2.0   5.6   4.5   5.4   4.6   5.3
Hippoc. (expert)   1.6   0.9   1.3   0.8   1.4   3.6   3.5   4.8   4.3   4.4
Hippoc. (method)   2.9   1.8   2.6   2.1   2.7   4.4   4.3   5.1   5.0   5.0
Consistent values were obtained for the 6 brains for both the brain-tissue and AP classes. The increase of the areas observed for the brain reflects the variability of section dimensions across the whole brain. Moreover, AP constituted from 2.5% up to 6% of the global area of the section, which corroborates the assumption in our modelling that tissue is the predominant class in the histological section. The histogram of Figure 3b displays the number of AP detected as a function of their area for each brain. Despite the varying number of sections in the different brains (12, 13 or 14 sections), the resulting information is characteristic, consistent with the literature, and underlines the predominance of small AP over large AP in AD.
Fig. 3. a) Curves of the brain-tissue and AP areas calculated along the antero-posterior sections of the 6 mouse brains. b) Histogram of the number of AP detected according to their area.
A 3-D surface-rendering view representing the AP as spheres for the sections of a single brain is presented in Figure 4, highlighting the ability of the method to deal with 3-D data. To facilitate the visualization, the inter-section spacing was artificially increased to 2 mm. It is interesting to note that AP were mainly located in the cortical areas, but also in internal structures such as the thalamus and hippocampus (section 10, Fig. 4).
Fig. 4. 3-D surface rendering of the AP detected in a whole brain of a transgenic mouse (sections numbered 1-13 along the antero-posterior axis).
4 Discussion and Conclusion
This paper presents an original method to detect AP in stained brain sections of Alzheimer transgenic mice. All the steps that classically require operator intervention (background removal, threshold definition) have been automated, and the processing time is compatible with biological studies. The adaptive extraction of the seeds and of their corresponding Voronoi partitions allowed the detection of small plaques, which are often missed by global-threshold approaches. The oversegmentation of the seeds is well suited to the proposed mathematical model because, on the one hand, it provides a valuable initialization and, on the other hand, the estimation and classification steps are able to handle both falsely detected seeds and twin AP. A few cases of non-detection were noticed visually. They corresponded to large aggregates in which individual AP could not be separated, or to small Voronoi cells in which the two expected classes were not both present. Further work may resolve these limitations but will probably not alter our preliminary results. The proposed model has the advantages of being easy to implement, of converging rapidly (fewer than 10 iterations on average were needed per Voronoi cell), and of correlating highly with the reference segmentation performed by an expert. The higher amyloid-load values obtained by our method compared to the expert segmentation (Table 1) can be attributed to its ability to detect the diffuse region of the AP and to take the partial volume effect into account, which the expert method (entropy-based thresholding) does not. Moreover, the differences noticed between large and small amyloid plaques can be imputed to the major differences in the images (Fig. 1c). The use of our methodology could improve the reproducibility of the segmentation and allow studies performed in different laboratories to be compared, which constitutes a significant advance. The field of application of our method is large and includes dedicated analyses of delineated anatomical regions, brain sections, or whole brains. Such a tool enables group studies and makes possible the extraction of
a vast amount of information from large datasets, improving their analysis and comparison. Perspectives of this work include the use of anatomical atlases [8] to process very large datasets, and the integration of complementary information acquired post mortem (autoradiography, histology) [9,10] or in vivo (high-resolution MRI) [11] in three dimensions.
Acknowledgements. The first two authors contributed equally to this work. We would like to thank Nikos Paragios for his help, advice and availability, and Xavier Pennec for his advice on medical imaging. The authors would also like to thank the Sanofi-Aventis neurodegenerative group for the generous gift of the animals involved in this study.
References

1. Cummings, B.J., Cotman, C.W.: Image analysis of beta-amyloid load in Alzheimer's disease and relation to dementia severity. Lancet 346, 1524–1528 (1995)
2. Defigueiredo, R.J., Cummings, B.J., Mundkur, P.Y., Cotman, C.W.: Color image analysis in neuroanatomical research: application to senile plaque subtype quantification in Alzheimer's disease. Neurobiol. Aging 16, 211–223 (1995)
3. Chubb, C., Inagaki, Y., Sheu, P., Cummings, B., Wasserman, A., Head, E., Cotman, C.: BioVision: an application for the automated image analysis of histological sections. Neurobiol. Aging 27, 1462–1476 (2006)
4. Annese, J., Sforza, D.M., Dubach, M., Bowden, D., Toga, A.W.: Postmortem high-resolution 3-dimensional imaging of the primate brain: blockface imaging of perfusion stained tissue. Neuroimage 30, 61–69 (2006)
5. Carson, C., Belongie, S., Greenspan, H., Malik, J.: Blobworld: image segmentation using expectation-maximization and its application to image querying. IEEE PAMI 24, 1026–1038 (2002)
6. Duyckaerts, C., Godefroy, G.: Voronoi tessellation to study the numerical density and the spatial distribution of neurones. J. Chem. Neuroanat. 20, 83–92 (2000)
7. Prodanov, D., Nagelkerke, N., Marani, E.: Spatial clustering analysis in neuroanatomy: applications of different approaches to motor nerve fiber distribution. J. Neurosci. Methods 160, 93–108 (2007)
8. Chan, E., Kovacevic, N., Ho, S.K.Y., Henkelman, R.M., Henderson, J.T.: Development of a high resolution three-dimensional surgical atlas of the murine head for strains 129S1/SvImJ and C57BL/6J using magnetic resonance imaging and micro-computed tomography. Neuroscience 144, 604–615 (2007)
9. Ourselin, S., Roche, A., Subsol, G., Pennec, X., Ayache, N.: Reconstructing a 3D structure from serial histological sections. Image and Vision Computing 19, 25–31 (2001)
10. Dubois, A., Dauguet, J., Herard, A.S., Besret, L., Duchesnay, E., Frouin, V., Hantraye, P., Bonvento, G., Delzescaux, T.: Automated three-dimensional analysis of histologic and autoradiographic rat brain sections: application to an activation study. J. Cereb. Blood Flow Metab. (in press, 2007)
11. Dhenain, M., Delatour, B., Walczak, C., Volk, A.: Passive staining: a novel ex vivo MRI protocol to detect amyloid deposits in mouse models of Alzheimer's disease. Magn. Reson. Med. 55, 687–693 (2006)
Non-rigid Registration of Pre-procedural MR Images with Intra-procedural Unenhanced CT Images for Improved Targeting of Tumors During Liver Radiofrequency Ablations N. Archip, S. Tatli, P. Morrison, F. Jolesz, S.K. Warfield, and S. Silverman
Abstract. In the United States, unenhanced CT is currently the most common imaging modality used to guide percutaneous biopsy and tumor ablation. The majority of liver tumors, such as hepatocellular carcinomas, are visible on contrast-enhanced CT or MRI obtained prior to the procedure. Yet these tumors may not be visible, or may have poor margin conspicuity, on the unenhanced CT images acquired during the procedure. Non-rigid registration has been used to align images accurately, even in the presence of organ motion. However, to date it has not been used clinically for radiofrequency ablation (RFA), since it requires significant computational infrastructure and the existing methods are often not sufficiently robust. We have previously introduced a finite-element-based method (FEM) demonstrated to achieve good accuracy and robustness on the problem of brain shift in neurosurgery. In the current study, we adapt it to fuse pre-procedural MRI with intra-procedural CT of the liver, and compare its performance with conventional rigid registration and with two non-rigid registration methods, b-spline and demons, on 13 retrospective datasets from patients who underwent RFA at our institution. The FEM non-rigid registration technique was significantly better than the rigid (p < 10⁻⁵), non-rigid b-spline (p < 10⁻⁴) and demons (p < 10⁻⁴) registration techniques. The results of our study indicate that this novel technology may be used to optimize placement of the RF applicator during CT-guided ablations.

Keywords: non-rigid registration, biomechanical model, b-splines, demons, radiofrequency ablation, targeting.
1 Introduction

In the last decade, percutaneous image-guided tumor ablation techniques such as radiofrequency (RF) ablation and cryoablation have become an alternative, minimally invasive way to treat primary and metastatic liver malignancies in select groups of patients. In particular, radiofrequency ablation has emerged as effective and practical [1], particularly in tumors smaller than 3 cm in diameter. CT is the most common imaging modality used to guide biopsy and tumor ablation procedures. Typically, unenhanced CT images are obtained intermittently to plan the procedure, to guide biopsy needle and ablation applicator placement, and to monitor the ablation. Contrast agent may be administered only once during a CT-guided
intervention. Most liver tumors are visible on contrast-enhanced CT or MRI obtained prior to the procedure. However, these tumors are either invisible or not demonstrated optimally on the unenhanced CT images obtained during the procedure. This may increase the procedure time, lead to non-diagnostic cytopathologic assessment requiring repeat biopsy, or cause sub-optimal ablation applicator placement. Particularly in percutaneous ablations, accurate applicator placement has a direct impact on treatment outcome: the applicator has a limited effective range, and suboptimal placement ultimately results in an inadequate ablation (Figure 1).
Fig. 1. Current RFA procedure. Patient with hepatocellular carcinoma of the left hepatic lobe. The patient was a poor surgical candidate due to other co-morbidities and underwent RFA under CT guidance. (a) The tumor margin can be clearly seen on the pre-procedural MRI. (b) The intra-procedural CT carries no information about the tumor margins; the RFA electrode was placed by estimating the tumor location from other anatomical landmarks. (c) The aligned pre-procedural MR and intra-procedural CT show that the RFA applicator is not placed accurately. The tumor margin was drawn based on the information provided by the pre-procedural MRI.
Image registration has been extensively addressed in the literature, and non-rigid registration is often required in practice. A survey of elastic registration methods for medical images, with emphasis on landmark-based schemes, has been presented in [2]. Several clinical studies have demonstrated the feasibility of rigid registration for prostate and pelvic MR volumes [3] and for RF liver ablation based on an intra-procedural MRI scanner [4]. Pre-procedural MRI has been matched with intra-procedural ultrasound [5] under the assumption of a rigid-body transformation; a similar approach is presented in [6]. Liver motion and deformation have been modeled from gated MR images [7] using both rigid and non-rigid registration algorithms; however, execution time is the main limitation of this approach. Multimodality interventions are feasible by allowing a real-time, updated display of previously acquired functional or morphologic imaging during angiography, biopsy, and ablation; however, this entails the use of an electromagnetic tracking device in the interventional room, and the existing instruments need modifications [8].
The utility of enhanced visualization during image-guided therapy procedures is clearly demonstrated in the literature. To date, commercial systems can only rigidly align the pre-procedural, high-resolution imaging data with the intra-procedural imaging data, and rigid alignment can result in errors as high as 20 mm. Yet no fully volumetric, non-rigid registration of liver deformations has been demonstrated in a clinical environment during interventional procedures. In this study we retrospectively assessed three different methods for non-rigid registration between pre-procedural MRI and intra-procedural unenhanced liver CT. One was adapted from our previous work on the problem of brain shift; the other two are available in open source as part of ITK (www.itk.org). Our goal is to establish the feasibility of non-rigid registration for CT-guided RF ablation. We measure the accuracy, robustness, and execution time of all three registration techniques.
2 Material and Methods

2.1 Materials

The study was conducted on 13 retrospective datasets of patients who underwent CT-guided RF ablation of liver tumors (8 metastases and 5 hepatocellular carcinomas; see Table 1). For all patients, the images used were: (1) pre-procedural MR images obtained with a 1.5 T MRI scanner (Signa LX, GE Medical Systems, Milwaukee, WI), matrix 256 × 256, voxel size 1.36 × 1.36 × 2.5 mm; and (2) intra-procedural images acquired with an interventional CT scanner (SOMATOM Plus 4, Siemens Medical Solutions, Erlangen, Germany), matrix 512 × 512, voxel size 0.61 × 0.61 × 2.5 mm.

2.2 Methods

Rigid and non-rigid registration. In an initial step, a rigid registration between the pre-procedural MRI and the intra-procedural CT was performed, based on an ITK implementation of mutual-information registration; a minimal sketch of this initialization is given below. Three non-rigid registration methods were then compared: one introduced by our group [9], and two standard techniques available as open source in ITK, b-splines and demons. The following sub-sections present each of these non-rigid registration techniques, together with the validation strategy used.
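As an illustration of this initialization step, the sketch below performs a mutual-information rigid registration. The authors used a C++ ITK implementation; this sketch instead relies on the SimpleITK wrapper, and the file names and optimizer settings are hypothetical.

```python
import SimpleITK as sitk

# Hypothetical file names for the pre-procedural MRI and intra-procedural CT.
fixed = sitk.ReadImage("intra_ct.nii.gz", sitk.sitkFloat32)
moving = sitk.ReadImage("pre_mri.nii.gz", sitk.sitkFloat32)

reg = sitk.ImageRegistrationMethod()
reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
reg.SetMetricSamplingStrategy(reg.RANDOM)
reg.SetMetricSamplingPercentage(0.01)
reg.SetInterpolator(sitk.sitkLinear)
reg.SetOptimizerAsRegularStepGradientDescent(
    learningRate=2.0, minStep=1e-4, numberOfIterations=200)
reg.SetInitialTransform(
    sitk.CenteredTransformInitializer(
        fixed, moving, sitk.Euler3DTransform(),
        sitk.CenteredTransformInitializerFilter.GEOMETRY),
    inPlace=False)

rigid = reg.Execute(fixed, moving)   # 6-DOF transform maximizing MI
resampled = sitk.Resample(moving, fixed, rigid, sitk.sitkLinear, 0.0)
```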
2.2.1 Finite Element Method (FEM) Based. The algorithm we propose has been extensively validated on brain MR images; details are presented in [10], and a brief overview is given in this section. Recently, the algorithm was also validated on patients enrolled prospectively for image-guided neurosurgery [11]. The images used in our study are acquired at the same point of the respiratory cycle, so the liver deformations are relatively small (up to 20 mm); the linear elastic model previously used [10] can therefore be readily employed for the registration of the liver.

The algorithm can be decomposed into three main parts:
• The first part, in the pre-procedural phase, builds the patient-specific liver model from the pre-procedural MRI.
• The second part, during the CT-guided procedure, is the block-matching computation for selected blocks, which estimates a set of displacements across the volume (a sketch is given after Table 1).
• The third part, also during the procedure, is an iterative hybrid solver that estimates the 3-D volumetric deformation field.

Table 1. Details of the RFA patients enrolled in our retrospective study
          Sex   Age   Pathology                         Location (liver segment)
Case 1    M     63    Colon metastasis                  7
Case 2    F     60    Breast metastasis                 7
Case 3    F     55    Ovary metastasis                  8
Case 4    F     64    HCC                               6
Case 5    F     46    Metastasis from unknown primary   3
Case 6    M     49    Colorectal metastasis             4a
Case 7    F     53    HCC                               4a
Case 8    M     60    HCC                               6
Case 9    F     75    HCC                               4a
Case 10   F     68    Colorectal metastasis             5
Case 11   M     50    Colorectal metastasis             4a
Case 12   M     58    GIST                              5
Case 13   M     60    HCC                               2
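To illustrate the second part of the algorithm, a minimal 2-D sketch of the block matching follows. Normalized cross-correlation is used here as the block similarity, which may differ from the measure actually used by the authors, and the block centers are assumed to lie away from the image border.

```python
import numpy as np

def block_match(pre, intra, centers, block=7, search=10):
    """Estimate a sparse displacement for each selected block by
    maximizing normalized cross-correlation within a local search
    window (a 2-D sketch of the paper's 3-D block matching).

    pre, intra : 2-D arrays (corresponding slices)
    centers    : list of (y, x) block centers in `pre`
    """
    b, s = block // 2, search
    matches = []
    for (y, x) in centers:
        ref = pre[y - b:y + b + 1, x - b:x + b + 1].astype(float)
        ref = (ref - ref.mean()) / (ref.std() + 1e-9)
        best, disp = -np.inf, (0, 0)
        for dy in range(-s, s + 1):
            for dx in range(-s, s + 1):
                cand = intra[y + dy - b:y + dy + b + 1,
                             x + dx - b:x + dx + b + 1].astype(float)
                cand = (cand - cand.mean()) / (cand.std() + 1e-9)
                ncc = (ref * cand).mean()       # block similarity score
                if ncc > best:
                    best, disp = ncc, (dy, dx)
        matches.append(((y, x), disp, best))
    return matches
```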
A key aspect of the deformation estimation is our formulation of the displacement estimation as a continuum between approximation and interpolation strategies. Interpolation strategies ensure that the estimated field exactly matches the displacement identified for each block, but because such matches are noisy, a pure interpolation is prone to error. An approximation strategy does not require the estimated displacement to fit each block displacement exactly, and so is better able to reject noise than an interpolation scheme, but it is also guaranteed never to recover an estimated displacement exactly, which is an undesirable property. In our formulation, we first carry out an approximation solution; we then compare the block displacements with the approximate solution and rank-order the blocks according to the magnitude of the difference. We reject outlier blocks that are likely noisy matches by removing the blocks with the largest error. We then re-estimate the displacement using a more stringent approximation criterion, and repeat the procedure. As we increase the strength of the requirement that the approximation solution match the block displacements, we shift from approximating the displacement field to interpolating it. This makes it possible to match the true deformation exactly, which cannot occur with an approximating solution. Furthermore, since we iteratively reject blocks with large displacement error, we reject noisy matches and gain a robustness to noise and spurious matches that a pure interpolating solution cannot have. A minimal sketch of this scheme follows.
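The sketch below writes the scheme as a dense regularized linear system for brevity; a real implementation would use sparse FEM matrices, and the trade-off parameter lam, the rejection fraction, and the tightening schedule are illustrative choices, not the authors' settings.

```python
import numpy as np

def approximate_to_interpolate(K, H, d, lam=1.0, n_outer=5, reject=0.05):
    """Iterative hybrid solver sketch: solve the regularized system
        (K + lam * H^T H) u = lam * H^T d,
    where K is the FEM stiffness matrix, H samples the nodal field u at
    the block centers (one row per displacement component), and d stacks
    the block-matching displacements. After each solve, the worst-fitting
    observations are rejected and lam is increased, moving the solution
    from approximation toward interpolation."""
    for _ in range(n_outer):
        u = np.linalg.solve(K + lam * H.T @ H, lam * H.T @ d)
        residual = np.abs(H @ u - d)                 # per-observation error
        keep = residual <= np.quantile(residual, 1.0 - reject)
        H, d = H[keep], d[keep]                      # drop likely noisy matches
        lam *= 10.0                                  # tighten the fit requirement
    return u
```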
High-performance computational architecture. Non-rigid registration algorithms are typically computationally expensive and have often proven impractical for solving clinical problems. We employed a cluster of computers to achieve near-real-time performance. Our implementation addresses three aspects: (1) load balancing, (2) fault tolerance, and (3) ease of use for parallel and distributed registration procedures. With dynamic load balancing we improved by 50% the performance of the most computationally intensive part, the parallel block matching. Our two-level fault tolerance introduced a moderate 6% overhead due to additional communication. With web services, and by hiding pre-processing overheads, we developed a faster and easier-to-use remote registration procedure. Details about this technology can be found in [12].

2.2.2 B-Spline Deformable Registration. An ITK-based implementation of the free-form deformation algorithm [13] is used. After affine initialization of the transformation, a displacement field modeled as a linear combination of B-splines is estimated by maximization of the mutual information between the images to be registered. A regular grid of uniformly distributed control points and a gradient-descent optimizer were used, within a coarse-to-fine pyramidal approach: at each pyramid level, both the resolution of the images and the number of control points in each dimension were doubled.

2.2.3 Demons Deformable Registration. An ITK-based implementation of the multi-resolution intensity-based algorithm [14], built on the concept of optical flow, is used. Image alignment is approached as a diffusion process: the object boundaries in the reference image are viewed as semi-permeable membranes, and the moving image, considered as a deformable grid, diffuses through these interfaces driven by the action of effectors, called demons, situated within the membranes. The smoothness of the displacement field is controlled by filtering it at each iteration with a Gaussian kernel of fixed standard deviation.
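For illustration, a minimal 2-D sketch of the demons iteration described above; the force expression follows Thirion's classical formulation, and the iteration count and smoothing width are arbitrary choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def demons_2d(fixed, moving, n_iter=50, sigma=2.0):
    """Minimal 2-D Thirion demons sketch: the update force is
    (f - m) grad(f) / (|grad(f)|^2 + (f - m)^2), and the accumulated
    displacement field is Gaussian-smoothed at each iteration."""
    gy, gx = np.gradient(fixed)
    uy = np.zeros_like(fixed)
    ux = np.zeros_like(fixed)
    yy, xx = np.mgrid[0:fixed.shape[0], 0:fixed.shape[1]].astype(float)
    for _ in range(n_iter):
        warped = map_coordinates(moving, [yy + uy, xx + ux], order=1)
        diff = warped - fixed
        denom = gx ** 2 + gy ** 2 + diff ** 2
        denom[denom == 0] = 1.0                       # avoid division by zero
        uy = gaussian_filter(uy - diff * gy / denom, sigma)   # demons force
        ux = gaussian_filter(ux - diff * gx / denom, sigma)   # + smoothing
    return uy, ux
```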
2.2.4 Validation of Non-rigid Registration Algorithms. We employ two standard methods for the validation of the registration algorithms: (i) an overlap-invariant entropy measure of 3-D medical image alignment, and (ii) a distance between edges of anatomical landmarks. We describe both briefly in the following, with a minimal sketch of the two measures after this section.

(i) Normalized Mutual Information (NMI). NMI is an entropy-based measure of alignment between different 3-D medical modalities. In practice, direct quantitative measures of information derived from the overlap of a pair of images are affected by local image statistics. To provide invariance to the overlap statistics, a normalized entropy-based measure of registration was proposed: the ratio of the sum of the marginal entropies to the joint entropy. The method was proposed in [15] and used, among others, by [16].

(ii) Edge-distance based. We employ a Canny edge detector to extract edges of the liver and its internal anatomical structures from the CT and MR images. The edges are discretized and represented as a set of points, and the 95% Hausdorff distance is measured between the points on the edges extracted from the two images (pre- and intra-procedural). Ideally, with no registration error, this distance is 0; using the 95% Hausdorff distance ensures that outliers are rejected. The method is used in [11].
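A minimal sketch of both validation measures, assuming the two images are already resampled onto a common grid and the Canny edges are given as point sets:

```python
import numpy as np
from scipy.spatial import cKDTree

def nmi(a, b, bins=64):
    """Normalized mutual information of two aligned volumes:
    (H(A) + H(B)) / H(A, B), computed from the joint histogram."""
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = hist / hist.sum()
    px, py = pxy.sum(1), pxy.sum(0)
    h = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))   # Shannon entropy
    return (h(px) + h(py)) / h(pxy.ravel())

def hausdorff95(edges_a, edges_b):
    """95% Hausdorff distance between two edge point sets (N, 3):
    the 95th percentile of nearest-neighbour distances, symmetrized,
    so that a small fraction of outlier points is ignored."""
    d_ab, _ = cKDTree(edges_b).query(edges_a)
    d_ba, _ = cKDTree(edges_a).query(edges_b)
    return max(np.percentile(d_ab, 95), np.percentile(d_ba, 95))
```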
3 Results

The rigid and non-rigid registration algorithms were successfully applied to all 13 retrospective datasets. The mean execution times were 1 minute for rigid registration, 10 minutes for b-spline, 6 minutes for demons, and 5 minutes for the biomechanical technique. The mean NMI values were 0.13 for rigid registration, 0.25 for b-spline, 0.18 for demons, and 0.44 for the biomechanical non-rigid registration technique. The mean distance between the edges of anatomical landmarks of the liver was 12.2 mm for the rigid, 2.4 mm for the b-spline, 3.0 mm for the demons, and 1.64 mm for the biomechanical non-rigid registration methods.
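As a sanity check on the reported significance, the per-case edge-distance errors transcribed from Table 2 can be compared with a paired test; the paper does not name the statistical test used, so the paired t-test below is an assumption.

```python
import numpy as np
from scipy import stats

# Edge-distance errors (mm) per case, transcribed from Table 2.
rigid = np.array([9.7, 12.4, 13.8, 8.8, 9.5, 7.5, 9.2,
                  15.1, 10.4, 15.4, 14.6, 9.2, 22.5])
fem = np.array([1.9, 2.1, 1.4, 1.7, 1.3, 0.7, 2.1,
                1.9, 2.0, 2.1, 2.1, 1.2, 0.9])

t, p = stats.ttest_rel(rigid, fem)   # paired comparison across the 13 cases
```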
Fig. 2. Accuracy results for our retrospective RFA study: (a) mutual information (0 = min, 1 = max); (b) distance between edges.
Fig. 3. Axial images of the same patient as in Figure 1. Contrast-enhanced pre-procedural MRI is overlaid on the intra-procedural CT by non-rigid registration in (a) and (b). The position of the RFA electrode with respect to the tumor margins can be evaluated in (c).
The FEM non-rigid registration technique was significantly better than the rigid (p < 10⁻⁵), non-rigid b-spline (p < 10⁻⁴) and demons (p < 10⁻⁴) registration techniques. Details of the accuracy comparison between the techniques are presented in Figure 2 and Table 2, and examples of registered images are shown in Figure 3.

Table 2. Registration results between the pre-procedural MRI and intra-procedural CT images for the retrospective data (error: 95% Hausdorff edge distance in mm; MI: normalized mutual information)
          Rigid           B-spline        Demons          Biomechanical
          Error    MI     Error    MI     Error    MI     Error    MI
          (mm)            (mm)            (mm)            (mm)
Case 1     9.7    0.24     2.4    0.27     2.9    0.25     1.9    0.51
Case 2    12.4    0.17     3.5    0.29     4.2    0.18     2.1    0.35
Case 3    13.8    0.08     2.2    0.27     2.8    0.14     1.4    0.41
Case 4     8.8    0.20     2.3    0.27     3.1    0.25     1.7    0.57
Case 5     9.5    0.07     2.1    0.34     3.2    0.35     1.3    0.60
Case 6     7.5    0.12     1.9    0.20     3.4    0.14     0.7    0.40
Case 7     9.2    0.07     2.5    0.16     2.7    0.09     2.1    0.48
Case 8    15.1    0.21     2.7    0.28     2.8    0.20     1.9    0.45
Case 9    10.4    0.05     2.6    0.13     2.9    0.06     2.0    0.25
Case 10   15.4    0.18     2.3    0.26     3.2    0.17     2.1    0.31
Case 11   14.6    0.14     1.9    0.27     2.8    0.22     2.1    0.38
Case 12    9.2    0.15     2.1    0.19     2.7    0.14     1.2    0.42
Case 13   22.5    0.10     2.1    0.40     2.9    0.23     0.9    0.61
Avg.      12.16   0.13     2.35   0.25     3.04   0.18     1.64   0.44
4 Conclusions

Our study demonstrates that non-rigid registration techniques can be used to register pre-procedural contrast-enhanced MR images with intra-procedural unenhanced CT scans of the liver within the time constraints imposed by the RFA procedure, and with an accuracy found satisfactory by radiologists. Our FEM non-rigid registration technique substantially (7 times) improved the accuracy of the currently used rigid registration. The new algorithm runs on an HPC system and therefore requires significant computational resources; nevertheless, it is demonstrably fast enough for clinical application, and it achieves better accuracy than b-spline and demons registration for the images used in our study. For clinical trials, further validation studies are necessary to assess the accuracy of the registration algorithms in the vicinity of tumors. Phantom experiments will be conducted, together with a correlation of our predicted outcomes with pathology reports. The results of our study indicate that this novel technology may be used to optimize placement of applicators during CT-guided RF ablations.

Acknowledgments. This investigation was supported in part by NSF ITR 0426558, and by NIH grants R03 EB006515, U41 RR019703, P01 CA067165, R01 021885.
References

1. Silverman, S.G., Tuncali, K., Morrison, P.: MR imaging-guided percutaneous tumor ablation. Acad. Radiol. 12(9), 1100–1109 (2005)
2. Rohr, K.: Elastic registration of multimodal medical images: a survey. 14(3) (2000)
3. Fei, B., Duerk, J.L., Boll, D.T., Lewin, J.S., Wilson, D.: Slice-to-volume registration and its potential application to interventional MRI-guided radio-frequency thermal ablation of prostate cancer. IEEE Trans. Med. Imaging 22(4), 515–525 (2003)
4. Carrillo, A., Duerk, J.L., Lewin, J.S., Wilson, D.L.: Semiautomatic 3-D image registration as applied to interventional MRI liver cancer treatment. IEEE TMI 19(3) (2000)
5. Penney, G.P., et al.: Registration of freehand 3D ultrasound and magnetic resonance liver images. Med. Image Anal. 8(1), 81–91 (2004)
6. Bao, P., Warmath, J., Galloway, R., Herline Jr., A.: Ultrasound-to-computer-tomography registration for image-guided laparoscopic liver surgery. Surg. Endosc. 19(3), 424–429 (2005)
7. Rohlfing, T., Maurer Jr., C.R., O'Dell, W.G., Zhong, J.: Modeling liver motion and deformation during the respiratory cycle using intensity-based nonrigid registration of gated MR images. Med. Phys. 31(3), 427–432 (2004)
8. Banovac, F., Wilson, E., Zhang, H., Cleary, K.: Needle biopsy of anatomically unfavorable liver lesions with an electromagnetic navigation assist device in a computed tomography environment. J. Vasc. Interv. Radiol. 17(10) (2006)
9. Clatz, O., Delingette, H., Talos, I.F., Golby, A.J., Kikinis, R., Jolesz, F.A., Ayache, N., Warfield, S.K.: Robust nonrigid registration to capture brain shift from intraoperative MRI. IEEE Trans. Med. Imaging 24(11), 1417–1427 (2005)
10. Archip, N., Clatz, O., Whalen, S., Kacher, D., Fedorov, A., Kot, A., Chrisochoides, N., Jolesz, F., Golby, A., Black, P.M., Warfield, S.: Non-rigid alignment of pre-operative MRI, fMRI, and DT-MRI with intra-operative MRI for enhanced visualization and navigation in image-guided neurosurgery. Neuroimage (2007)
11. Chrisochoides, N., Fedorov, A., Kot, A., Archip, N., Black, P., Clatz, O., Golby, A., Kikinis, R., Warfield, S.K.: Toward real-time, image guided neurosurgery using distributed and grid computing. In: Löwe, W., Südholt, M. (eds.) SC 2006. LNCS, vol. 4089. Springer, Heidelberg (2006)
13. Rueckert, D., Sonoda, L.I., Hayes, C., Hill, D.L., Leach, M.O., Hawkes, D.: Nonrigid registration using free-form deformations: application to breast MR images. IEEE Trans. Med. Imaging 18(8), 712–721 (1999)
14. Thirion, J.: Image matching as a diffusion process: an analogy with Maxwell's demons. Med. Image Anal. 2(3), 243–260 (1998)
15. Studholme, C., Hill, D.L.G., Hawkes, D.J.: An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognition 32(1) (1999)
16. Soza, G., Grosso, R., Nimsky, C., Hastreiter, P., Fahlbusch, R., Greiner, G.: Determination of the elasticity parameters of brain tissue with combined simulation and registration. Int. J. Medical Robotics and Computer Assisted Surgery 1(3)