Big Data Project at UPMC Reveals Patterns in Breast Cancer Tumors

Using infrastructure from IBM, Oracle and others, University of Pittsburgh researchers have detected genetic changes in the makeup of breast cancers.

The University of Pittsburgh Medical Center (UPMC) has announced progress in a big data project involving breast cancer research.

Pitt researchers examined de-identified data on 140 patients using an enterprise data warehouse. The warehouse was built as part of a five-year, $100 million investment in IT infrastructure from dbMotion, IBM, Informatica and Oracle, announced on Oct. 1, 2012. The researchers contributed this tumor data to The Cancer Genome Atlas (TCGA), a federally funded project that aims to produce comprehensive genomic maps of the most common cancers.

"In the patient's tumor, we try to understand what are the drivers of the tumor and what are the genetic changes that cause it to be a cancer," Dr. Adrian V. Lee, director of the Women's Cancer Research Center at the University of Pittsburgh Cancer Institute and Magee-Womens Research Institute, told eWEEK. "Once we found them, we can target them."

Pitt researchers used high-performance computing (HPC) to integrate the patients' clinical data from electronic health records (EHRs) with their genomic information, then compared the combined data against age, tumor size and nodal status.
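As a rough illustration of what this kind of integration involves, the sketch below joins clinical records to genomic measurements on a shared de-identified patient ID and then compares the measurement across clinical groups. It is only a sketch under stated assumptions: the field names, the patient IDs, the expression values and the helper functions are all hypothetical, and nothing here reflects UPMC's actual warehouse, schema or analysis methods.

```python
from statistics import mean

# Hypothetical de-identified clinical records keyed by a study ID.
# All values are invented for illustration only.
clinical = {
    "P001": {"age": 42, "tumor_size_cm": 2.1, "node_positive": True},
    "P002": {"age": 67, "tumor_size_cm": 1.4, "node_positive": False},
    "P003": {"age": 38, "tumor_size_cm": 3.0, "node_positive": True},
    "P004": {"age": 71, "tumor_size_cm": 1.8, "node_positive": False},
}

# Hypothetical expression level of one made-up gene per patient.
# "P999" has no clinical record and is dropped by the join.
expression = {"P001": 5.0, "P002": 2.0, "P003": 7.0, "P004": 3.0, "P999": 4.0}


def join_records(clinical, expression):
    """Inner-join clinical and genomic data on the shared patient ID."""
    return [
        {"patient_id": pid, **fields, "expression": expression[pid]}
        for pid, fields in sorted(clinical.items())
        if pid in expression
    ]


def mean_expression_by(rows, key):
    """Average the expression value within each group produced by key(row)."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row["expression"])
    return {label: mean(values) for label, values in groups.items()}


rows = join_records(clinical, expression)

# Compare the genomic measurement against nodal status and an age cutoff.
# The cutoff of 50 is an arbitrary placeholder, not a clinical definition.
by_nodal = mean_expression_by(rows, lambda r: r["node_positive"])
by_age = mean_expression_by(
    rows, lambda r: "under_50" if r["age"] < 50 else "50_plus"
)

print(by_nodal)
print(by_age)
```

Once both kinds of data sit in one joined structure, each new comparison is a one-line grouping rather than a separate extraction from a separate system.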

Integrating clinical information from EHRs with big data projects has been a goal for researchers, Lee said. Those clinical systems are usually separated from the research setting, where large amounts of data are measured, he explained.

HPC analytics will allow researchers to mine large amounts of genomic and proteomic data; the latter comes from examining the proteins that cells or tissue produce. Researchers will also eventually be able to mine data from radiology images and the costs of providing care, UPMC reported. When clinical and financial data reside in separate databases, researchers have difficulty integrating and analyzing variables in the data, according to Lee.

"The specific goal of the UPMC project is to try and integrate these large amounts of data we can generate into the clinical record," Lee said. "That historically has been a huge stumbling block."

Researchers analyzed two types of breast cancer "omic" data, gene expression and copy number variant data, and will study additional types.

In the project announced June 19, UPMC and Pitt researchers were able to use the big data tools to detect molecular differences in the makeup of pre-menopausal and post-menopausal breast cancer.

"Having a single source system allows us to ask rapid-fire, translational questions around personalized medicine," Lee said.

Women who get pre-menopausal breast cancer usually have worse health outcomes compared with those who have post-menopausal breast cancer. More research is required to find out why, he said.

Lee would also like to use big data analytics to address whether breast cancer is different for elderly patients and whether women should be receiving therapy around age 80.

He also noted variations in women of different races and how big data could help researchers understand these patterns.

"African-American women tend to get aggressive breast cancer and also die from their breast cancers earlier than Caucasian women," Lee said. "Once you have it in this single infrastructure and database, you can start to ask all these relational questions you just simply couldn't before because all this data is spread out in all these different information systems."

After further research, it's possible doctors will be able to prescribe individual therapy based on changes in a person's tumor, Lee explained.

"Each of us has unique genetic attributes, and what we want to do is find subgroups of patients where their tumors behave differently so that we can target them specifically," Lee said.

It's not the data that's important but the ability to aggregate it, learn from it and gain knowledge about patients, according to Lee.

"We're producing data much faster than we can create knowledge," Lee said. "The data is starting to become overwhelming."

Researchers are trying to sequence tumors to make sense of the big data, according to Lee. "When we sequence a tumor there are lots of changes we find in the tumor that are noise," he said. "You have to find it and sift through all this noise to get to it."

That takes computational power, he added.

Researchers started with only 140 breast cancer patients, whereas studies usually have 1,000 or 2,000 participants, Lee noted. If the trial goes well, UPMC will move on to larger research projects on other types of cancer and diseases. UPMC plans to complete the first phase of its multi-year analytics project by the spring of 2014.