Cancer research
Ross et al. (2000) introduced the NCI60, an important dataset for the molecular classification of different types of cancer.
The data correspond to gene expression measured in 64 cell lines using DNA microarrays robotically spotted with 9,703 cDNAs. The cDNAs included approximately
8,000 different genes. When the dataset was presented, 3,700 of the genes represented previously characterized human proteins and 2,400
were identified only as expressed sequence tags (ESTs). Our work is based on the supplement on the authors' website, which gives the expression of 6,831 genes.
There are several good reasons to use this instance for our studies. In their original paper, Ross et al. identified several groups
of genes that correspond to some of the tissue characteristics of the cell lines. Of particular interest for our objectives
are two groupings named "Leukaemia Cluster" and "Melanoma Cluster".
These were identified visually from a hierarchical clustering
as groups of highly expressed genes in the leukaemia-derived and in most of the melanoma-derived cell lines. It is, however, very difficult
to identify from a hierarchical clustering an analogous group of genes that is strongly under-expressed, that is a robust and significant
marker of differential expression within the same type of cell line, and that at the same time discriminates it well from all other types of cell lines.
The approach used in this study was designed to uncover such groups if they exist. To our knowledge, no other method has
been able to identify some of the key genes that allow such an interpretation, which links groups of highly expressed genes with groups of under-expressed
genes on this dataset.
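To make concrete what such a group looks like, here is a minimal sketch in Python. It is not the integer programming approach used in this study. It applies a much simpler, assumed criterion: a gene is a candidate marker for a class if its expression in every cell line of that class lies strictly above (or strictly below) its expression in every other cell line. The function names, the gene names and the toy values are hypothetical and are not taken from the NCI60 data.

```python
def split_by_class(values, labels, target):
    """Separate one gene's expression values into target-class and other-class lists."""
    inside = [v for v, lab in zip(values, labels) if lab == target]
    outside = [v for v, lab in zip(values, labels) if lab != target]
    return inside, outside


def candidate_markers(expression, labels, target):
    """Return genes that are uniformly over- or under-expressed in the target class.

    expression: dict mapping gene name -> list of values, one per cell line
    labels:     list of class labels, one per cell line (same order as the values)
    target:     the class label of interest

    Assumed criterion (illustrative only): all target values lie strictly above,
    or strictly below, all values from the other classes.
    """
    over, under = [], []
    for gene, values in expression.items():
        inside, outside = split_by_class(values, labels, target)
        if not inside or not outside:
            continue  # the class is absent or is the only class; nothing to compare
        if min(inside) > max(outside):
            over.append(gene)
        elif max(inside) < min(outside):
            under.append(gene)
    return {"over": over, "under": under}


# Toy data, made up for illustration (not NCI60 measurements).
labels = ["leukaemia", "leukaemia", "melanoma", "colon"]
expression = {
    "gene_a": [2.1, 1.8, -0.3, 0.1],   # high only in the leukaemia lines
    "gene_b": [-1.9, -2.4, 0.2, 0.5],  # low only in the leukaemia lines
    "gene_c": [0.4, -0.2, 0.3, -0.1],  # does not separate the classes
}
result = candidate_markers(expression, labels, "leukaemia")
print(result)
```

On this toy input, gene_a is reported as over-expressed and gene_b as under-expressed in the leukaemia lines, while gene_c is discarded. A strict separation rule like this is brittle on noisy microarray data, which is one reason a more robust selection method is needed in practice.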
In addition to finding large signatures for the Leukaemia and Melanoma cell lines, we also found signatures for three other types
of cancer represented in the NCI60 dataset: Central Nervous System, Renal and Colon. They are shown in the figure below.
For more information about the results and the list of genes in each of the signatures, please refer to the publication:
Molecular Classification of Cancer using Integer Programming Models and Algorithms
R. Berretta, A. Mendes and P. Moscato, submitted to the
Journal of Research and Practice in Information Technology
Or go directly to the
supplementary material webpage.