Research Highlights

Biomedical Big Data Training Collaborative: Creating the Go-To Place for Information, Tools, and Training


For the last 5-10 years, a lot has been said and written about “big data” and the insight it’s reputed to hold – if only you can find an efficient, effective way to get at it. But big data can be a bewildering topic, as there seem to be as many types and formats of big data as there are methods to analyze it.

So if you’re a biomedical researcher, how do you get started? How do you find the experts? (But you worry they haven’t been tasked to help you…) Where can you find the most useful information? (But it has to relate to your particular purpose and be sufficiently technical. Google searching is not going to cut it.) Where can you find the right resources so you can help yourself? And how can you get trained quickly on relevant tools?

These needs are just the kind of thing the Biomedical Big Data Training Collaborative (BBDTC, https://biobigdata.ucsd.edu) at UC San Diego was designed to address. BBDTC is compiling, organizing, integrating, and making easily searchable the body of human knowledge about big data. BBDTC’s vision is to cultivate, by encouraging large-scale community collaboration, a technically accurate, comprehensive, evolving, and freely accessible knowledge and data repository for biomedical big data. Researchers will be able to access it for their own benefit and contribute to it, so its value will continue to grow over time. The result will be the “go-to place” for biomedical big data information, tools, and training.

Ilkay Altintas and Rommie Amaro, collaborators in the National Biomedical Computation Resource (NBCR), are project PIs. Altintas is Chief Data Science Officer at the San Diego Supercomputer Center (SDSC), and Amaro is Associate Professor of Chemistry and Biochemistry at UC San Diego and director of NBCR. Altintas, based on her cyberinfrastructure and collaborative data background, has responsibility for technical development of the resource. Amaro, as a well-known researcher in the biomedical community, serves as the interface to that community in promoting the resource and soliciting contributions.

“One of the problems BBDTC is intended to address,” says Altintas, “is how to teach students with a wide variety of backgrounds and specialties taking a given class. Faculty teaching big data courses can augment their materials with BBDTC to help students come up to speed on particular topics, relevant to the class they’re taking, in which they have no background.”

Amaro takes this key point one step further: “In effect, BBDTC is helping level the playing field among the students’ knowledge base so the instructor can focus on the content he or she wants to teach. The students of course have to take responsibility for doing whatever extra work might be needed to keep up with course content that’s initially unfamiliar to them.”

In the parlance of NIH, the team’s specific aims are to develop biomedical big data curricula, an Open Online Course (OOC) framework, a software toolbox, and repository interfaces to engage diverse community stakeholders. The team, employing best practices and building on current research and training efforts, is focusing on example courses, lecture content, and application use cases. They also plan to disseminate those best practices for developing and delivering course content, complemented by adaptive learning approaches and methods to assess what’s been learned. In particular, they will deliver portable and customizable virtual machines (VMs) that include course materials, hands-on tools, and example data.

The entry point to this wealth of material is through the BBDTC website (Figure 1). Here you can become part of the BBDTC community by creating a login, then customizing your environment to suit your needs. Once your login is created, you can sign up for courses offered in a variety of scientific areas related to biomedical big data (e.g., look for an introductory course that explains what biomedical big data is and where it comes from). You will also be able to upload your own course materials (including lecture slide decks, links to YouTube videos, and required software tools) to share them with the community.

Once training materials are uploaded, after a quick approval process, they become available as public resources to anyone who has a site login, and can be used as part of online courses. Faculty can use these materials to augment, even update, traditional classroom courses. Original providers can also track the views and downloads of their materials. And, best of all, BBDTC content is tagged and searchable, providing an easy way to find relevant materials.

Courses are broken down into modules, which can be mixed and matched for use across multiple courses and reshuffled, just as you might shuffle a song playlist on a smartphone, producing new courses dynamically. Users may create playlists to suit their learning requirements and share them with individual users, their research groups, their departmental colleagues, or the public at large.

Tags, like keywords, help you find content, events, and members with common or similar interests. Tags can be added to groups, your profile, resources, wiki pages, and events. When creating or editing content, you can add or remove tags as you wish. If a tag doesn’t exist, you simply type it in the “Tags” form field to create it, which makes it available to all users. Tag topics include software applications, scientific areas of investigation, “how-to” type topics, organizations (e.g., NBCR), and training opportunities. The home page (Figure 1, right side) displays existing tags, listed alphabetically; click any tag to access the content filed under it.

Questions & Answers operates like a bulletin board where users post and answer questions. But, as one of the especially innovative features of the website, it applies the idea of “market” value to reflect the level of community interest that a particular question has generated. This value is a weighted sum of the question’s answers, recommendations, and answer votes: for example, each additional answer increases the value of the question by 10 points and each recommendation by 2 points. When the question asker selects an answer as “most helpful,” the question receives 20 bonus points and is “closed.” At that point, the accumulated value of the question is distributed among the participants: 1/3 to the asker, 1/3 to the user providing the best answer, and 1/3 split among all users who answered the question, provided that their responses received at least 3 community votes and more than half of those are positive (that is, users found the responses “helpful”).
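The point arithmetic described above can be made concrete with a small sketch. This is not the site's actual code, only an illustration built on stated assumptions: the weight of an answer vote is not given in the text, so ANSWER_VOTE_POINTS = 1 is a placeholder; all votes (positive and negative) are counted toward the value; the best answerer may also share in the final third if their answer qualifies; and if no answer qualifies, that third is simply left undistributed. All names (Answer, question_value, distribute) are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction

ANSWER_POINTS = 10          # stated in the article
RECOMMENDATION_POINTS = 2   # stated in the article
BEST_ANSWER_BONUS = 20      # stated in the article
ANSWER_VOTE_POINTS = 1      # ASSUMPTION: the article does not give this weight


@dataclass
class Answer:
    author: str
    positive_votes: int = 0
    negative_votes: int = 0

    @property
    def total_votes(self) -> int:
        return self.positive_votes + self.negative_votes

    def qualifies(self) -> bool:
        # At least 3 community votes, and more than half of them positive.
        return self.total_votes >= 3 and 2 * self.positive_votes > self.total_votes


def question_value(answers, recommendations, closed=False):
    """Weighted sum of answers, recommendations, and answer votes."""
    votes = sum(a.total_votes for a in answers)  # ASSUMPTION: all votes count
    value = (ANSWER_POINTS * len(answers)
             + RECOMMENDATION_POINTS * recommendations
             + ANSWER_VOTE_POINTS * votes)
    if closed:
        value += BEST_ANSWER_BONUS
    return value


def distribute(asker, answers, best_author, recommendations):
    """Split a closed question's value into thirds, as the article describes."""
    third = Fraction(question_value(answers, recommendations, closed=True)) / 3
    payout = {asker: third}
    payout[best_author] = payout.get(best_author, 0) + third
    eligible = [a.author for a in answers if a.qualifies()]
    for author in eligible:
        payout[author] = payout.get(author, 0) + third / len(eligible)
    return payout


if __name__ == "__main__":
    answers = [Answer("bo", 3, 0), Answer("cy", 1, 2), Answer("di", 2, 1)]
    for user, points in distribute("ana", answers, "bo", recommendations=2).items():
        print(user, float(points))
```

Exact fractions are used so that the three shares always add back up to the question's value; a real system would also need a rounding rule, which the article does not describe.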

In addition, you get monthly “royalty” payments for your questions and answers based on the total of question recommendations and answer votes. The more community interest generated by your question or the more positive votes your answer receives, the more points you earn. You can boost the value of your question by assigning a point reward for best answer, but you have to have sufficient “funds” in your account to cover the value you set.

From a technical standpoint, BBDTC leverages the design and maturity of the HUBzero content-management platform. To facilitate migration of existing content, BBDTC supports importing and exporting course material from the edX platform. Migration tools will be extended in the future to support other platforms.

Software packages that can be used for hands-on training, what the team calls toolboxes, are supported as downloadable lightweight VirtualBox images, providing a standardized tool environment with software packages and test data on your personal machine. They are also remotely accessible via Amazon EC2 virtual machines.

This project is a collaboration with Maryann Martone, a professor in the UC San Diego Neurosciences Department, who is focusing on knowledge management for biomedical big data, and Florin Vaida, a biostatistics expert who teaches online M.S. courses. The collaboration also includes computer scientist Judy Qiu at Indiana University, whose group brings expertise in online training and MOOC-based development. In addition, UC San Diego is supporting development of the platform through several campus research institutes – SDSC, the California Institute for Telecommunications and Information Technology, and the Center for Research in Biological Systems – and the Department of Chemistry and Biochemistry. Besides Altintas and Amaro, the project depends on two technical staff members, one focused on course development and playlist integration and the other on course data management and integration with other platforms.

“We’ve just finished Year 1 of our grant,” says Altintas, “and we’re already seeing encouraging results. We filmed the teachers at two of our 2015 NBCR workshops. One instructor who teaches tomography data processing works with 20 students a year and is now sending them to BBDTC to watch the tutorials so the students can learn the full range of material on their own time.”

Researchers: UCSD: Rommie E. Amaro, Ilkay Altintas