Medical Images Segmentation Operations

Extracting valuable medical information from head MRI and CT series is one of the most important and challenging tasks in medical image analysis. Because many of these tasks are not automated, they require meticulous preprocessing by medical experts. Some of these problems have semi-automatic solutions, but those still depend on the operator's competence. The main goal of our research project is to create an instrument that maximizes the degree of automation in series processing. Our project consists of two parts: a set of algorithms for medical image processing and tools for interpreting their results. In this paper we present an overview of the best existing approaches in this field, as well as a description of our own algorithms, based on convolutional neural networks, for similar tissue segmentation problems such as eye bony orbit and brain tumor segmentation. We investigate the performance of different neural network models for both tasks, as well as of neural network ensembles applied to brain tumor segmentation. We also introduce our software, named "MISO Tool", which was created specifically for this type of problem. It supports tissue segmentation using pre-trained neural networks, DICOM pixel data manipulation and 3D reconstruction of segmented areas.


Introduction
Modern radiological diagnostics is still developing, and different organs require completely different settings and methods: X-ray, MRI, CT and ultrasound are supplemented with invasive contrast methods. Only the doctor can see everything necessary for a correct diagnosis and subsequent treatment. However, all of these methods share common tasks: visualizing the selected zone as accurately as possible and obtaining as much data as possible from the results of the examination. In 3D methods (CT and MRI) these tasks are essentially the same, despite the differences in both physical principles and additional settings. Since the goal of our work is to create a tool that visualizes isolated structures from raw MRI and CT data as accurately as possible, this complex work can be decomposed into separate logical components. To isolate complex structures, we formulated the problem of segmenting tumor processes in MRI images. MRI visualizes soft tissue better, supports various sequences, allows the basic settings of the method to be changed over a wide range, and permits the use of contrast agents. To determine the volume and delineate the edges of structures, we singled out the problem of determining the volume of the bony orbits on CT. In CT the bone structures have high contrast, the distance between slices is very small, and the method itself is widely available and fast, which makes it possible to study a large volume of data. From the point of view of medical informatics these problems are not completely dissimilar and could be solved in a unified manner. Moreover, a single instrument that solves all of these challenging tasks autonomously would not only save doctors' time but also reduce the number of errors. To the best of our knowledge, no instrument for automatic segmentation of different body tissues has been introduced so far.
We came to the conclusion that, while segmentation tasks on different body parts may seem different, they can all be derived from a core solution based on deep neural networks. In this work, we explored state-of-the-art solutions based on deep neural networks for brain tumor segmentation and created an ensemble to see whether their performance can be improved and whether they can be used not only for brain segmentation but also for complicated bony structures of the head in general. We use the results of this research as a first step towards a convenient and powerful instrument for all medical specialties.

Overview
Interest in medical image segmentation has increased during the last decade, and many different approaches have been explored. However, only a few research efforts have evolved into complete, useful tools for medicine. Commonly used software that allows semi-automatic segmentation includes Brainlab IPlan (commercial) and ITK-SNAP (open source). The main feature of IPlan, which has already been used in several studies [1,2], is atlas-based segmentation. An atlas is a set of shape variations of the ROIs (regions of interest) described and sketched out by experts. Because of the complexity of human body structure, the accuracy of a delineated atlas is often problematic. ITK-SNAP performs segmentation via the active contour evolution method: preplaced bubbles are smoothly inflated into the desired region of interest [3]. Although these instruments solve many tasks, specialists still constantly face many problems that await improvement, and segmentation is still performed by manual or semi-automatic methods.
For the brain tumor segmentation problem many different approaches have been explored and evaluated. These algorithms fall mainly into two classes: methods that require training on a dataset in advance and methods that do not. Early works in this area treated brain tumor segmentation as an anomaly detection problem on the image; representative works are [4] and [5]. The main advantage of these works is that the presented solutions do not need to be trained beforehand; however, this makes it harder to improve detection quality, especially for smaller tumors. Another class of approaches is based on supervised learning methods, such as random forests [6] or support vector machines [7]. These models can learn a powerful set of features and work quite well on the most common cases, but because of the highly variable nature of brain tumors it is hard to find the correct feature set and create a good model. As a result, recent approaches to segmentation rely on deep neural networks. They are a powerful instrument capable of extracting new features during training and hence may outperform the pre-defined feature sets of the supervised learning methods above. The results of these algorithms may also be used for different kinds of medical images.
We are developing our own tool, Medical Images Segmentation Operations (MISO), which uses neural networks as a back end for solving various segmentation tasks in medicine. In the next sections we separately review the application of neural networks to brain tumor and bony orbit segmentation as they were trained and used in MISO.

Brain Tumor Segmentation
For this task we chose to review two CNNs (convolutional neural networks) with different architectures that have proven to be the best in this field: DeepMedic [8], an 11-layer deep, multi-scale 3D CNN with a fully connected conditional random field, and WNet [9], a fully convolutional neural network with anisotropic and dilated convolutions.

Implementation Details
For WNet we used the configuration described in the original papers and the BraTS 2017 dataset for training. For DeepMedic we trained two versions of the network on different datasets and introduced some changes into its original architecture. The first version was trained only on T1 and T2 images, because these are the most common MRI sequences; a network trained only on this data will be usable in more hospitals in the future. Also, instead of the PReLU nonlinearity used in the original model, we use SELU [12], which improved both performance and training time. The second version of DeepMedic also uses SELU, but it was trained only on T1 images, because we wanted to explore how the network copes with a single source. For all three networks we split the initial dataset into three parts: training (about 80%), validation (10%) and test (10%). The performance of these networks on the test data is shown in Table 1. In the studies reviewed, the authors aimed not only to detect the tumor but also to segment it into three categories: whole tumor, tumor core and enhancing tumor core. In our work, however, we are interested only in the whole tumor detection problem.
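The 80/10/10 split described above can be illustrated with a minimal sketch. The function name `split_cases`, the case identifiers and the fixed seed are assumptions made for illustration; the text does not say how cases were assigned to each part.

```python
import random

def split_cases(case_ids, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle case identifiers and split them into train/validation/test lists.

    Whatever remains after the training and validation parts goes to the test part.
    """
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    n_train = round(len(ids) * train_frac)
    n_val = round(len(ids) * val_frac)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test

# Hypothetical identifiers; real ones would come from the dataset itself.
cases = [f"case_{i:03d}" for i in range(100)]
train, val, test = split_cases(cases)
```

Splitting by case rather than by slice keeps all images of one patient in the same part, which avoids leaking near-identical slices from training into the test set.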

Detecting the Percentage of False Negative Segments
The original works analyse CNN performance using the Dice and Hausdorff measures, which are good for segmentation problems in general but hide necessary details about misclassifications. For that reason, we examined the output of the considered networks to determine the percentage of false positive versus false negative results. Our main goal was to examine whether these methods are more prone to predicting false positives than false negatives.
Since the final decision during diagnosis and treatment always rests with the doctor, our main goal is to indicate where pathological tissue may be present and to draw the surgeon's attention to that area. Our system aims to find all suspicious areas and send them to a medical specialist for re-evaluation. Hence, the quality of this system that should be optimized first is not the number of false positives but the number of false negatives, because a tumor that goes unnoticed may not receive essential medical care and may lead to further proliferation of tumor cells. The results of this experiment are shown in Table 2.
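As an illustration of why the Dice score alone hides the kind of error, the following sketch computes the Dice coefficient together with the share of false positive and false negative voxels among all misclassified voxels. This is one possible definition of the percentages, assumed here for illustration; the function name `error_breakdown` and the toy masks are likewise hypothetical.

```python
import numpy as np

def error_breakdown(pred, truth):
    """Return Dice plus the FP/FN share (in percent) of all misclassified voxels."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))    # tumor predicted, tumor present
    fp = int(np.sum(pred & ~truth))   # tumor predicted, healthy tissue
    fn = int(np.sum(~pred & truth))   # tumor missed
    total = 2 * tp + fp + fn
    dice = 2 * tp / total if total else 1.0
    errors = fp + fn
    return {
        "dice": dice,
        "fp_percent": 100.0 * fp / errors if errors else 0.0,
        "fn_percent": 100.0 * fn / errors if errors else 0.0,
    }

# Toy 1D example: one voxel found, one spurious voxel, three voxels missed.
truth = [1, 1, 1, 1, 0, 0]
pred = [1, 0, 0, 0, 1, 0]
stats = error_breakdown(pred, truth)
```

Two predictions with the same Dice score can have very different false negative shares, which is the detail the experiment above is meant to expose.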

Neural Network Ensembles
We wanted to find out whether the overall performance of these three networks can be improved when they are used together, so we formed a neural network ensemble [13] out of them. We implemented the following voting scheme: for each voxel we obtain the individual result of every neural network, using their pre-trained models, and we classify the voxel as part of the tumor if and only if the majority of the networks classify it as tumor; otherwise it is considered healthy tissue. The results of this experiment are shown in Table 3.
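The voting scheme above can be sketched as follows, assuming each network has already produced a binary whole-tumor mask of the same shape. The function name `majority_vote` and the toy masks are illustrative assumptions, not part of MISO.

```python
import numpy as np

def majority_vote(masks):
    """Label a voxel as tumor only if more than half of the masks mark it as tumor."""
    stacked = np.stack([np.asarray(m, dtype=bool) for m in masks], axis=0)
    votes = stacked.sum(axis=0)           # number of networks voting "tumor" per voxel
    return votes * 2 > stacked.shape[0]   # strict majority

# Toy 2x2 masks standing in for the outputs of the three networks.
mask_a = [[1, 1], [0, 0]]
mask_b = [[1, 0], [1, 0]]
mask_c = [[0, 1], [1, 0]]
ensemble = majority_vote([mask_a, mask_b, mask_c])
```

With three networks a strict majority means at least two votes, so no tie-breaking rule is needed; an even number of networks would require one.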

Methods
Our approach to bony orbit segmentation consists of two steps. First, the images in the initial dataset are classified into two groups: «contains orbit» and «does not contain orbit». Next, the orbit is segmented in the images that the classifier marked as containing it. In this paper the first step is described in detail, whereas the second step is introduced only briefly, as it is the subject of further research.

Data Collection
Raw CT scans were provided by the Faculty of Medicine of Saint Petersburg State University. Five series were acquired on a Toshiba scanner using helical image acquisition and then anonymized. The initial image dimensions were 512 × 512, with each pixel stored as a short (2-byte) integer representing radiation intensity, using the Grayscale Standard Display Function. Orbits occupy less than 1/4 of the image, so we reduced the original size from 512 × 512 to 256 × 256 in order to decrease computational complexity (fig. 2 b). Slices containing the orbit were labeled, and some of them were manually segmented by an expert (fig. 2 c). In total, 601 sinus and 80 head CT images were marked as «contains orbit» and 1414 were marked as «doesn't contain orbit»; 150 images were segmented.

Model Choosing
To achieve the best classification performance of the first CNN, important parameters such as the number of layers and the convolutional kernel size must be chosen. We therefore evaluated several kernel sizes and numbers of layers for classification accuracy. The quantitative assessments are shown in Table 4. As a result, the model used for training consisted of eight layers, of which four were convolutional layers and four were fully connected layers. The output of the last fully connected layer is fed to a sigmoid function, a standard classification layer for neural networks [14]. The initial images were cropped and compressed in order to reduce training time, so the network accepts grayscale images of dimension 128 × 128 as inputs. The first layer filters the input with 32 kernels of size 5 × 5. In our experiments, the rectified linear unit (ReLU) [15] nonlinearity applied to the outputs of all convolutional layers gave the best result compared with other activation functions. The (n+1)th convolutional layer takes as input the output of the nth layer after it has passed through the ReLU nonlinearity and a max pooling layer, in that order, and processes it with F_{n+1} filters. The filter configurations are shown in Table 4. All fully connected layers have the same number of neurons, namely 256.
For the second CNN the U-net architecture [16] was chosen, as it has already proven its suitability for segmentation in general. Several layer sequences were evaluated to find the most fitting model. To reduce overfitting and improve generalization, two dropout layers with a dropout rate of 0.2 were added.
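To make the layer chaining of the classification network concrete, the sketch below traces the feature map size through four convolution, ReLU and max pooling stages. Only the 128 × 128 input, the first layer (32 kernels of 5 × 5) and the 256-neuron fully connected layers come from the text. The remaining filter counts and kernel sizes, the stride of 1, the unpadded ("valid") convolutions and the 2 × 2 pooling are assumptions for illustration; the actual configurations are those listed in Table 4.

```python
def trace_feature_maps(input_size, conv_layers, pool=2):
    """Return (spatial_size, channels) after each conv + ReLU + max-pool stage.

    Assumes square inputs, stride-1 convolutions without padding and
    non-overlapping pooling windows. ReLU does not change the shape.
    """
    size = input_size
    shapes = []
    for filters, kernel in conv_layers:
        size = size - kernel + 1   # unpadded convolution shrinks each side
        size = size // pool        # max pooling halves the resolution
        shapes.append((size, filters))
    return shapes

# (filters, kernel size) per layer; only the first entry is taken from the text.
conv_layers = [(32, 5), (32, 5), (64, 5), (64, 5)]
shapes = trace_feature_maps(128, conv_layers)

# Size of the flattened vector fed to the first 256-neuron fully connected layer,
# and the number of weights and biases in that layer.
last_size, last_filters = shapes[-1]
flat_features = last_size * last_size * last_filters
first_dense_params = flat_features * 256 + 256
```

Under these assumptions most of the trainable parameters sit in the first fully connected layer, which is one reason why the input resolution and the width of the fully connected layers dominate computation time.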

Training Details
The classification CNN was implemented, trained and evaluated in Python 3.6 on an NVIDIA GTX740M GPU with CUDA Toolkit 9.0 and cuDNN 7.0.5. Keras 2.1.* (the version was continuously updated during development) was chosen as the neural network framework, running on top of TensorFlow 1.5.*. We trained and evaluated CNNs with a range of filter configurations (number of filters in each convolutional layer), kernel sizes and numbers of neurons in the fully connected layers. Experiments with dropout layers [17] were also performed.

Output Image Visualization
After segmentation, the series of marked images is converted to a voxel grid using the initial DICOM metadata, and a 3D model is then created with the Marching Cubes algorithm by means of MISO Tool and the Visualization Toolkit (VTK) library. The result is presented in fig. 3.
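The conversion from marked slices to a voxel grid can be sketched as follows. The example stacks per-slice binary masks and uses the in-plane pixel spacing and the distance between slices (values that would typically be read from DICOM attributes such as Pixel Spacing) to estimate the segmented volume. The function names and the spacing values are assumptions for illustration, and surface extraction with Marching Cubes is not shown.

```python
import numpy as np

def stack_masks(slice_masks):
    """Stack 2D binary masks (ordered by slice position) into a 3D voxel grid."""
    return np.stack([np.asarray(m, dtype=bool) for m in slice_masks], axis=0)

def segmented_volume_mm3(voxels, pixel_spacing_mm, slice_spacing_mm):
    """Estimate the segmented volume as voxel count times the volume of one voxel."""
    row_mm, col_mm = pixel_spacing_mm
    voxel_mm3 = row_mm * col_mm * slice_spacing_mm
    return float(voxels.sum()) * voxel_mm3

# Toy series: three 4x4 slices, each with a 2x2 segmented square.
slice_mask = np.zeros((4, 4), dtype=bool)
slice_mask[1:3, 1:3] = True
voxels = stack_masks([slice_mask, slice_mask, slice_mask])

# Hypothetical spacing: 0.5 mm x 0.5 mm pixels, 1.0 mm between slices.
volume = segmented_volume_mm3(voxels, (0.5, 0.5), 1.0)
```

Keeping the physical spacing alongside the voxel grid matters for the 3D model as well, since CT voxels are usually not cubic and the reconstructed surface would otherwise appear stretched or flattened.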

Image Cropping
The main purpose of our work is to create an instrument that can run on our servers and serve multiple clients. To deliver the best performance to the customers and reduce waiting time, computational complexity must be decreased as much as possible. To that end, we performed experiments with cropped and resized images. When the image was reduced to less than 128 × 128, we were unable to achieve the required accuracy. Under the condition "accuracy > 0.95", the best result was obtained by cutting a 256 × 256 piece out of the image and then compressing it to 128 × 128. Because the head position is highly similar across CT scans, it was not necessary to move the cropping window.
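The crop-then-compress step can be sketched as below. The window position (`top`, `left`) and the use of 2 × 2 block averaging for compression are assumptions for illustration; the text states only that a fixed 256 × 256 window was cut out and compressed to 128 × 128.

```python
import numpy as np

def crop_and_downsample(image, top, left, crop=256, factor=2):
    """Cut a fixed crop x crop window and shrink it by averaging factor x factor blocks."""
    image = np.asarray(image, dtype=np.float32)
    window = image[top:top + crop, left:left + crop]
    if window.shape != (crop, crop):
        raise ValueError("cropping window falls outside the image")
    out = crop // factor
    # Split rows and columns into blocks, then average inside each block.
    blocks = window.reshape(out, factor, out, factor)
    return blocks.mean(axis=(1, 3))

# Synthetic 512 x 512 slice; the offsets (128, 128) are hypothetical.
ct_slice = np.arange(512 * 512).reshape(512, 512)
small = crop_and_downsample(ct_slice, top=128, left=128)
```

Because the window is fixed, the same offsets can be applied to every slice of a series, so the step adds almost no cost at inference time.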

Performance
For the first CNN we evaluated kernel sizes from 3 to 11 pixels, different model configurations, activation functions and numbers of epochs to determine which of these properties help the CNN reach the highest level of performance. The data was split between training and validation in a 4:1 ratio. Our model performs best after 115 training epochs, reaching a validation accuracy of 99%, and then stabilizes. Dropout layers with a dropout rate below 0.4 do not affect accuracy significantly, and rates above 0.4 reduce accuracy to about 85%, so we decided to exclude dropout layers from the final model. It is worth noting that models with 512 neurons in each fully connected layer showed approximately the same result as a model with 256 neurons but took up to 1.4 times more computation time, so 256 was chosen as the less resource-consuming option.