Similar to: If you use conventional data compression on …

To the extent that genomes can be thought of as compressed encodings of biological structures, they are spectacularly efficient. All the trillions of cells in the human body - not just the tens of billions in the brain - are guided in one way or another by the information contained in 30,000 or so genes. The best high-quality set of pictures of the body - the National Institutes of Health Visible Human Project, a series of high-resolution digital photos of slices taken from volunteer Joseph Paul Jernigan (deceased) - takes up about 60 gigabytes, enough (if left uncompressed) to fill about 100 CD-ROMs - and still not enough detail to capture individual cells. The genome, in contrast, contains only about 3 billion nucleotides, the equivalent (at two bits per nucleotide) of less than two-thirds of a gigabyte, or a single CD-ROM.
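The "two bits per nucleotide" figure in the quote above can be made concrete with a small sketch. The bit pattern assigned to each base and the helper names `pack` and `genome_bytes` are illustrative assumptions, not anything taken from the quoted source.

```python
# Illustrative 2-bit encoding; which base gets which bit pattern is an
# arbitrary choice made for this sketch.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four nucleotides per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so every byte has four 2-bit slots.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def genome_bytes(nucleotides: int, bits_per_nucleotide: int = 2) -> int:
    """Bytes needed for the given number of nucleotides, rounded up."""
    return -(-nucleotides * bits_per_nucleotide // 8)


print(pack("ACGT").hex())           # one byte holding four bases
print(genome_bytes(3_000_000_000))  # 3 billion nucleotides at 2 bits each
```

At 3 billion nucleotides this comes to 750 million bytes, which is 0.75 GB in decimal units or roughly 0.70 GiB in binary units.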

The beauty in the genome is of course that it's so small. The human genome is only on the order of a gigabyte of data...which is a tiny little database. If you take the entire living biosphere, that's the assemblage of 20 million species or so that constitute all the living creatures on the planet, and you have a genome for every species the total is still about one petabyte, that's a million gigabytes - that's still very small compared with Google or the Wikipedia and it's a database that you can easily put in a small room, easily transmit from one place to another. And somehow mother nature manages to create this incredible biosphere, to create this incredibly rich environment of animals and plants with this amazingly small amount of data.

[T]o encode a brain genetically, based on the hardware that we are using, we need something like at least 500 kilobytes of code... actually... it's going to be a little more, I guess. It sounds like surprisingly little... but in terms of scientific theories this is a lot. ...The universe, according to the core theory of quantum mechanics... it's like half a page of code... to generate the universe. ...[I]f you want to understand evolution, it's like a paragraph... a couple lines, really, to understand an evolutionary process. ...[T]here's lots ...of details that you get afterwards, because this process itself doesn't define what all the animals are going to look like. In a similar way, the code of the universe doesn't tell you what this planet is going to look like and you... are going to look like. It's just defining the rule book.

If I read the genome out to you at the rate of one word per second for eight hours a day, it would take me a century. If I wrote out the human genome, one letter per millimetre, my text would be as long as the River Danube. This is a gigantic document, an immense book, a recipe of extravagant length, and it all fits inside the microscopic nucleus of a tiny cell that fits easily upon the head of a pin.
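The two comparisons in the quote above are simple arithmetic and can be checked with a short sketch. The word and letter counts below are assumptions (about 3 billion letters, read as roughly 1 billion three-letter words); the quote itself does not state them.

```python
# Assumed sizes, not stated in the quote itself.
WORDS = 1_000_000_000    # assumption: ~1 billion three-letter words
LETTERS = 3_000_000_000  # assumption: ~3 billion letters

# Reading aloud: one word per second, eight hours a day.
seconds_per_day = 8 * 60 * 60
reading_days = WORDS / seconds_per_day
reading_years = reading_days / 365

# Writing out: one letter per millimetre, converted to kilometres.
text_km = LETTERS / 1_000_000

print(f"{reading_years:.1f} years of reading")
print(f"{text_km:.0f} km of text")
```

Under these assumptions the reading takes about 95 years and the text runs to 3,000 km, which is comparable to the length of the Danube (roughly 2,850 km).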

Exponential Growth in Storage

Consider data storage, which is critical for the genomics world today. The 3.2 billion base pairs of your genome correspond to about 725 megabytes of data, or 0.75 gigabytes of storage. In 1981, if you were to store your uncompressed genome, a 1-gigabyte hard drive of storage cost half a million dollars. Today, it’s 50 million times cheaper at under 1 cent per gigabyte.
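The storage and price figures in this passage can be worked through as follows. This is a rough sketch that assumes two bits per base pair. The passage does not say how its 725 megabyte figure was computed, so the sketch does not try to reproduce it exactly.

```python
BASE_PAIRS = 3_200_000_000  # from the passage
BITS_PER_BASE_PAIR = 2      # assumption: four bases, two bits each

size_bytes = BASE_PAIRS * BITS_PER_BASE_PAIR // 8
size_mb = size_bytes / 1e6     # decimal megabytes
size_mib = size_bytes / 2**20  # binary mebibytes

PRICE_PER_GB_1981 = 500_000.0  # dollars, from the passage
CHEAPER_BY = 50_000_000        # factor, from the passage
price_per_gb_today = PRICE_PER_GB_1981 / CHEAPER_BY

# Cost of holding one uncompressed genome at today's price per gigabyte.
genome_cost_today = price_per_gb_today * size_bytes / 1e9

print(f"{size_mb:.0f} MB ({size_mib:.0f} MiB)")
print(f"${price_per_gb_today:.2f} per GB, ${genome_cost_today:.3f} per genome")
```

Dividing half a million dollars by 50 million gives one cent per gigabyte, in line with the passage's "under 1 cent" figure. The two-bit calculation gives 800 MB (about 763 MiB), the same order of magnitude as the quoted 725 megabytes.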

Your synapses store all your knowledge and skills as roughly 100 terabytes’ worth of information, while your DNA stores merely about a gigabyte, barely enough to store a single movie download.

How difficult is it to define a brain? We know that the brain must be somewhere hidden in the genome [which] fits in a CD-ROM. It's not that complicated. It's easier than Microsoft Windows. ...[A]bout 2% of the genome is coding for proteins, and maybe about 10%... tells you when to express which protein, and the remainder is mostly garbage. It's old viruses that are left over and it's never been properly deleted [etc.] because there are no real code revisions in the genome. ...How much of this 10%, [i.e.,] about 75 megabytes code for the brain, we don't really know. What we do know is that we share almost all of this with mice. Genetically speaking, a human is a pretty big mouse, with a few bits changed to fix some of the genetic expressions. ...Most of the stuff there is going to code for cells and metabolism and what your body looks like, [etc.]...

Biology doesn't know in advance what the end product will be; there's no Stuffit Compressor to convert a human being into a genome. But the genome itself is very much akin to a compression scheme, a terrifically efficient description of how to build something of great complexity - perhaps more efficient than anything yet developed in the labs of computer scientists (never mind the complexities of the brain, there are trillions of cells in the rest of the body, and they are all supervised by the same 30,000-gene genome). And although there is no counterpart in nature to a program that compresses a picture into a compact description, there is a natural counterpart to the program that decompresses the compressed encoding, and that's the cell. Genome in, organism out. Through the logic of gene expression, cells are self-regulating factories that translate genomes into biological structure.

"From the very beginning of time until the year 2003," says Google Executive Chairman Eric Schmidt, "humankind created five exabytes of digital information. An exabyte is one billion gigabytes - or a 1 with eighteen zeroes after it. Right now, in the year 2010, the human race is generating five exabytes of information every two days. By the year 2013, the number will be five exabytes produced every ten minutes … It's no wonder we're exhausted.

The CD-ROM's worth of information in the genome really wouldn't be enough to paint a bitmapped picture of an embryo, but it is enough to describe a process for building one. An artist who only wants to paint a picture that looks like a kind of tree has much less to remember than an artist who wants to paint a particular Ponderosa Pine from memory; in a similar way, if some alien's genome had to encode every cell in a body, it would need much more information (many more nucleotides) than our genomes do, because ours specify a general way to build a creature rather than an exact picture of every detail of the finished product. Our genomes are lossy because they specify methods rather than pictures, but it is precisely that lossiness that allows them to so efficiently supervise the construction of complex biological structure.

You contain a trillion copies of a large, textual document written in a highly accurate, digital code, each copy as voluminous as a substantial book. I'm talking, of course, of the DNA in your cells.

a typical chromosomal DNA molecule in a human being is composed of about five billion pairs of nucleotides… But since there are four different kinds of nucleotides, the number of bits of information in DNA is four times the number of nucleotide pairs. Thus if a single chromosome has five billion (5 X 10^9) nucleotides, it contains twenty billion (2 X 10^10) bits of information… We also see that if more than some tens of billions (several times 10^10) of bits of information are necessary for human survival, extragenetic systems will have to provide them: the rate of development of genetic systems is so slow that no source of such additional biological information can be sought in the DNA.
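The bit count in the quote above can be set beside the usual information-theoretic count. The quote multiplies the number of nucleotide pairs by four, whereas a choice among four equally likely symbols carries log2(4) = 2 bits. The sketch below computes both, with variable names chosen for illustration.

```python
import math

NUCLEOTIDE_PAIRS = 5 * 10**9  # figure used in the quote

# The quote's own calculation: four times the number of nucleotide pairs.
quoted_bits = 4 * NUCLEOTIDE_PAIRS

# Information-theoretic count: log2(4) = 2 bits per four-way choice.
bits_per_symbol = math.log2(4)
entropy_bits = int(bits_per_symbol * NUCLEOTIDE_PAIRS)

print(f"quoted:  {quoted_bits:.0e} bits")
print(f"log2(4): {entropy_bits:.0e} bits")
```

Either way the total is on the order of 10^10 bits, so the quote's conclusion about "some tens of billions" of bits does not change.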
