Gene Families hacked by Microsoft Excel

A few months ago I was playing with some data provided by the Sanger Institute: Copy Number Variation (CNV) data for 417 genes across a panel of 780 cell lines of interest in oncology. The data came in an Excel file where each column represents a gene and each row represents a cell line.
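The title alludes to a well-known Excel behaviour: some gene symbols are silently converted to dates (SEPT2, for example, can become 2-Sep). Below is a minimal sketch of how date-like column names could be flagged, assuming the header row of the spreadsheet has already been read into a list of strings. The function name, the pattern and the sample header are illustrative assumptions, not part of the original data set.

```python
import re

# Month abbreviations Excel uses when it turns text into a date.
_MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
# Matches strings such as "2-Sep" or "1-Mar" (day-month), case-insensitive.
_DATE_LIKE = re.compile(rf"^\d{{1,2}}-({_MONTHS})$", re.IGNORECASE)


def find_date_like_columns(header):
    """Return (index, name) pairs for column names that look like Excel dates."""
    return [(i, name) for i, name in enumerate(header)
            if _DATE_LIKE.match(str(name).strip())]


# Hypothetical header row: one cell-line column followed by gene columns.
header = ["cell_line", "TP53", "2-Sep", "KRAS", "1-Mar"]
print(find_date_like_columns(header))
```

A check like this only detects the damage. Recovering the original symbol still requires the source gene list, because several symbols can map to the same date.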

The Bioinformatics Open Space : Get/Give help with Biostar

This article kicks off a series of articles called "Bioinformatics Open Space". The series will mainly cover social web resources that can improve your productivity in bioinformatics. By productivity I mean that they will help you fix a problem, keep you informed, or give you inspiration for your daily bioinformatics tasks so that you won't feel alone. Today I will talk about a great resource for getting answers to your bioinformatics questions: "Biostar, Questions and Answers in Bioinformatics".

Cool data visualization using augmented reality animation

Every Friday my colleague Steve sends his colleagues a nice "week-end reading" newsletter on interesting topics in bioinformatics or biostatistics. Just before the Christmas vacation it featured an interesting BBC video of Hans Rosling, a professor of global health at Sweden's Karolinska Institute.

It is funny because it reminded me that Rosling is cited in Garr Reynolds's book "Presentation Zen" on page 207: "Conventional wisdom says never never stand between the screen and the projector. Generally this is good advice. But as you can see from the photo, Rosling at times defies conventional wisdom and gets involved with the data in an energetic way that engages his audience with the data and his story".

This video is a nice example of how Rosling brings his presentations to life. This time he also uses augmented reality animation, which takes him one step deeper into the heart of his presentation. In about 4 minutes he shows the evolution of life expectancy against income in 200 countries over the last 200 years.

You can see more cool presentations by Rosling or other good speakers at the TED website. The annual TED (Technology, Entertainment, Design) conference brings together the world's most fascinating thinkers and doers, who are invited to give insanely great talks on stage in only 18 minutes.

My first Bioinformatics Zen Presentation

A year ago I discovered two very interesting books about PowerPoint/Keynote presentations: "Presentation Zen" and "Presentation Zen Design". In these books Garr Reynolds gives advice on delivering engaging and appealing oral presentations. I found them very interesting because it is not always easy to keep scientists and biologists interested in the bioinformatics application we are presenting. In June 2010 I started to adapt an old Excel plug-in designed to help the scientists in our pharmacology group analyze their Meso Scale raw data. Meso Scale assay technology provides a rapid and convenient method for measuring one or more protein targets within a single small-volume sample. Their 96-well plates supply a platform for the development of sandwich immunoassays.

When I was preparing my presentation on the plug-in, I decided to spend some of my spare time looking for ideas, pictures and sentences in order to produce a Presentation Zen style talk that would keep researchers awake even if it took place right after lunch!

So this article shows the pictures that helped me illustrate most of the messages and ideas I wanted to convey to the future users of the application. Hopefully it will give you inspiration for your future bioinformatics presentations. At the end you will find some good book(s) and iPhone application(s) related to the Presentation Zen philosophy.

The Body Browser By Google

Google has released a new Labs project called the Body Browser. It uses WebGL for 3D rendering (instead of Flash), so you can explore the human body much as you would explore the Earth with Google Earth. You can show or hide several layers such as skin, bones and muscles, and click on objects to see their labels. To try this new tool you need a recent WebGL-capable browser such as Chrome 9 or Firefox 4.

New design

Six years ago, I started for fun and also in order to improve my HTML, CSS, SQL, PHP and JavaScript skills. I also wanted to play with Google services like Google Analytics, AdSense, FriendConnect and Search. Over that period, however, my experience in ergonomics and in the design of websites and web applications improved, and at the beginning of 2010 I was shocked by how ugly the interface of was.

SSFA-GPHR

Thyrotropin receptor (TSHR), follitropin receptor (FSHR) and lutropin/chorionic gonadotropin receptor (LHCGR) belong to the glycoprotein hormone receptors (GPHRs). They are a subgroup of family A GPCRs. This database and website have been designed to function as an information source on GPHR-related topics, collating and linking data from studies on i) naturally occurring mutations and site-directed mutations, ii) structures/structural models. Our aim is to facilitate the focused investigation of GPHRs to reveal new insights into the function and malfunction of these important receptors.

SSFA-GPHR...
* is a database for semi-quantitative Sequence- Structure- and Function- Analysis of GPHRs,
* provides a condensed overview of available information such as mutagenesis data for GPHRs,
* functional data are converted into unified scaled values to compare and to classify mutagenesis data from GPHR subtypes and different experimental approaches,
* provides analyses of data by: focused extraction, comparison, projection and mapping on three-dimensional receptor structures and models.

General Aims...
* linking functional data with structural data for GPHR investigation,
* contribution to new hypotheses regarding molecular interactions and activation mechanisms,
* evaluation of data availability (including lack of information) and consistency.

New features in the second database release...
* a structure-based search for mutation data using three-dimensional structures or homology models,
* structural morphings between basal and activated receptor conformations allow changes in amino acid interactions during activation to be traced,
* inclusion of double and triple mutations,
* improved handling and features for data analyses.

RCDI - eRCDI

The RCDI server is a web application that calculates the Relative Codon Deoptimization Index (RCDI) and an expected value of the RCDI for a set of query sequences by generating random sequences with similar G+C content and amino acid composition to the input. This expected RCDI therefore provides a direct threshold value for discerning whether the differences in the RCDI value are statistically significant and arise from the codon preferences or whether they are merely artifacts that arise from internal biases in the G+C composition and/or amino acid composition of the query sequences.
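The idea of comparing an observed RCDI with an expected value from randomized sequences can be sketched as follows. This is a rough illustration under several assumptions and not the server's actual procedure. The `rcdi` function uses the definition as I understand it (the sum over codons of query frequency divided by reference frequency, weighted by codon count and divided by the total number of codons), which should be checked against the original publication. The random sequences are drawn by picking a synonymous codon at each position and keeping only candidates whose G+C content falls within a tolerance of the query. The sequences, the pseudocount and the tolerance are hypothetical.

```python
import random
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = defaultdict(list)
for _codon, _aa in CODON_TO_AA.items():
    SYNONYMS[_aa].append(_codon)


def codons(seq):
    """Split an in-frame coding sequence into complete codons."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_freqs(codon_list, pseudocount=0.0):
    """Frequency of each codon among the synonymous codons of its amino acid."""
    counts = Counter(codon_list)
    freqs = {}
    for syn in SYNONYMS.values():
        total = sum(counts[c] + pseudocount for c in syn)
        for c in syn:
            freqs[c] = (counts[c] + pseudocount) / total if total else 0.0
    return freqs


def rcdi(seq, ref_freqs):
    """Assumed definition: sum_i (query_freq_i / ref_freq_i) * N_i / N."""
    codon_list = codons(seq)
    counts = Counter(codon_list)
    query_freqs = relative_freqs(codon_list)
    weighted = sum(query_freqs[c] / ref_freqs[c] * n for c, n in counts.items())
    return weighted / len(codon_list)


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def expected_rcdi(seq, ref_freqs, n=200, gc_tol=0.05, seed=0):
    """Mean RCDI of random sequences with the same protein and similar G+C."""
    rng = random.Random(seed)
    target = gc_content(seq)
    values = []
    for _ in range(100 * n):  # cap attempts so the loop always ends
        cand = "".join(rng.choice(SYNONYMS[CODON_TO_AA[c]]) for c in codons(seq))
        if abs(gc_content(cand) - target) <= gc_tol:
            values.append(rcdi(cand, ref_freqs))
            if len(values) == n:
                break
    if not values:
        raise ValueError("no random sequence matched the G+C tolerance")
    return sum(values) / len(values)


# Hypothetical query and reference sequences, for illustration only.
query = "ATGGCTGCAGCGGCCAAAAAGCTGCTTTTAGAAGAGGGTGGC"
reference = "ATGGCCGCCAAGAAGCTGCTGGAGGAGGGCGGCGCCCTGAAG"
ref_freqs = relative_freqs(codons(reference), pseudocount=1.0)
print(rcdi(query, ref_freqs), expected_rcdi(query, ref_freqs))
```

Under this definition a query whose codon usage matches the reference exactly scores 1. Comparing the observed value with the randomized mean indicates how much of the deviation could be explained by G+C and amino acid composition alone.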


Description :
The PubMed database is available on the Entrez retrieval system, and was developed by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM). PubMed provides free access to MEDLINE, NLM's database of more than 13 million bibliographic citations and abstracts in the fields of biomedicine, nursing, dentistry, veterinary medicine, health care systems, and preclinical sciences. PubMed also includes access to additional selected life sciences journals not in MEDLINE. PubMed's LinkOut feature provides access to a wide variety of relevant web-accessible online resources, including full-text publications, biological databases, consumer health information, and research tools. Links are also available to the molecular biology databases maintained by NCBI. New citations are typically added to PubMed Tuesday through Saturday.

Recent developments :
Two new NCBI databases, Journals and MeSH, were created to provide additional search features for PubMed. The Related Articles algorithm was modified to better calculate citations that are closely related to a selected article. Icons were added to the PubMed Summary page display to indicate whether or not a link to the free full-text article is available. An e-mail feature, cancer subset, and several new indexes were added including: Corporate Author, Comments/Corrections, Place of Publication, Grant Number, and Ahead of Print. In addition, the LinkOut library program now includes print as well as electronic holdings data.

NCBI continued development of PubMed, with enhancements including the addition of over 1.7 million OLDMEDLINE citations. These citations were originally printed in the hardcopy indexes published from 1950 through 1965. The MeSH database was enhanced with terms that are identified by MeSH as pharmacological actions and a direct link was added to the Clinical Queries page. The Clinical Queries page was also revised and the filter strategies were updated. The History page now includes a menu from the search statements number to provide an easier way to combine, delete and retrieve History statements. The truncation limit in PubMed was increased from 150 variations of a truncated term to 600. Entrez programming software used by PubMed was enhanced to improve the way in which PubMed interprets users' queries. A new Entrez NCBI database, NLM Catalog, was developed to provide additional search features for books, journals and audiovisuals in the NLM collections.
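Besides the web interface, PubMed can be queried programmatically. The sketch below only builds a search URL for NCBI's E-utilities ESearch endpoint, which the description above does not mention and is therefore an assumption on my part, as are the helper name and the example term. It makes no network call. The trailing asterisk in the term is PubMed's truncation operator, the feature whose limit is discussed above.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term, retmax=20):
    """Build an ESearch URL that searches PubMed for `term`."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


# "bioinformat*" asks PubMed to match every term starting with "bioinformat".
print(pubmed_search_url("bioinformat*[Title]"))
```

Fetching the URL returns the matching PubMed identifiers. NCBI's usage guidelines on request rates apply to any real client.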

Viralzone

General website dedicated to viruses, covering all viral families and genera.


Description :
GenBank is a comprehensive database that contains publicly available nucleotide sequences for more than 300 000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage using the link provided.


Description :
Database for dinucleotide properties


Description :
Evolutionary conservation of DNA sequences provides a tool for the identification of functional elements in genomes. We have created a database of evolutionary conserved regions in vertebrate genomes, entitled ECRbase, which is constructed from a collection of whole-genome alignments produced by the ECR Browser. ECRbase features a database of syntenic blocks that recapitulate the evolution of rearrangements in vertebrates and a comprehensive collection of promoters in all vertebrate genomes generated using multiple sources of gene annotation. The database also contains a collection of annotated transcription factor binding sites in evolutionary conserved and promoter elements. ECRbase currently includes human, rhesus macaque, dog, opossum, rat, mouse, chicken, frog, zebrafish, and Fugu genomes. It is freely accessible at

Acknowledgement :
ECRbase was supported in part by grants from the Lawrence Livermore National Laboratory

References :
G.G. Loots and I. Ovcharenko, ECRbase: Database of Evolutionary Conserved Regions, Promoters, and Transcription Factor Binding Sites in Vertebrate Genomes, Bioinformatics, 23(1):122-4 (2007)
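The principle behind the ECRbase entry above, that evolutionary conservation points to candidate functional elements, can be illustrated with a sliding-window identity scan over a pairwise alignment. This is a toy sketch and not the ECR Browser's algorithm: the alignment is assumed to be precomputed (gaps written as "-"), and the window size and identity threshold are arbitrary illustrative values.

```python
def conserved_windows(aligned_a, aligned_b, window=10, min_identity=0.8):
    """Return merged (start, end) alignment intervals whose windows meet min_identity."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # A column counts as a match only if both bases agree and neither is a gap.
    matches = [a == b and a != "-" for a, b in zip(aligned_a, aligned_b)]
    regions = []
    for start in range(len(matches) - window + 1):
        identity = sum(matches[start:start + window]) / window
        if identity >= min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the previous region
            else:
                regions.append((start, end))
    return regions


# Hypothetical alignment: two conserved blocks separated by a divergent stretch.
seq_a = "ACGTACGTAC" + "TTTTTTTTTT" + "GGCCGGCCGG"
seq_b = "ACGTACGTAC" + "AAAAAAAAAA" + "GGCCGGCCGG"
print(conserved_windows(seq_a, seq_b))
```

Intervals are half-open alignment coordinates. Mapping them back to genome positions would require the alignment's coordinate information, which is outside the scope of this sketch.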


Description :
