23andme: The power of genetic correlations

Science brings you far; a ton of data brings you farther.

23andme was founded in 2006 by Linda Avey, Paul Cusenza and Anne Wojcicki, the ex-wife of Google co-founder Sergey Brin. Its purpose: to provide genetic testing and interpretation directly to the consumer (B2C), with no complexity.

The process is simple. First, payment: $99 for an ancestry report, or $199 for the full health and ancestry report. Within a few days, a package with a small tube is delivered to your doorstep. A few milliliters of saliva later, the tube can be shipped back to the lab for analysis. A couple of weeks later, you get a report back.

 

Value Creation

23andme is based partially on SNP genotyping. Basically, 99.5% of human DNA is identical from person to person, but the small differences (“variants”) are what make us unique. Without going into too much detail (mostly because I would get it wrong), specific sequences and structures in the DNA are significantly associated with an individual's observable traits.

Understanding DNA sequences and what they tell us about where we come from, or what risks we are prone to, is part science and part Big Data, and the king among them all is his majesty, Correlation. If a sequence X appears more often in people with symptom Y than in people without it, it is taken as a useful predictor for others with the same sequence who want to know whether they are likely to have the same symptom. One of the earliest correlations found links a genetic variant to the photic sneeze reflex, the kind that makes you sneeze in the sun. Yes, I have this variant.
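To make the correlation idea concrete, here is a minimal Python sketch using invented counts (hypothetical, not 23andme data). It compares how often trait Y appears among carriers and non-carriers of variant X, and reports the relative risk plus the phi coefficient, which is the Pearson correlation of two yes/no variables. The function name `association` and all numbers are illustrative assumptions.

```python
import math

def association(carrier_with, carrier_without, noncarrier_with, noncarrier_without):
    """Summarize a hypothetical 2x2 table: variant X (carrier or not) vs. trait Y (present or not)."""
    a, b, c, d = carrier_with, carrier_without, noncarrier_with, noncarrier_without
    risk_carriers = a / (a + b)      # P(Y | carries X)
    risk_noncarriers = c / (c + d)   # P(Y | does not carry X)
    relative_risk = risk_carriers / risk_noncarriers
    # Phi coefficient: Pearson correlation between two binary variables
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return risk_carriers, risk_noncarriers, relative_risk, phi

# Made-up counts for illustration only: 1,000 carriers and 2,000 non-carriers
r1, r0, rr, phi = association(300, 700, 200, 1800)
print(f"P(Y|X) = {r1:.2f}, P(Y|not X) = {r0:.2f}, relative risk = {rr:.1f}, phi = {phi:.2f}")
# prints: P(Y|X) = 0.30, P(Y|not X) = 0.10, relative risk = 3.0, phi = 0.25
```

In this made-up example carriers are three times as likely to have the trait, yet the correlation is weak (phi of about 0.25) and 70% of carriers never show it: a correlation yields a probability, not a diagnosis.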

“The information encoded in your DNA determines your unique biological characteristics, such as sex, eye color, age and Social Security number.”

-Dave Barry, Pulitzer Prize-winning author and columnist

However, value creation doesn’t stop at science and correlations drawn from training sets. Once the report is delivered, 23andme strongly encourages users (see image below) to add themselves to the growing training set by answering a short questionnaire about who or what they really are and the conditions they know they have. The data is then stored carefully (without the individual's name) for further research.

If this isn’t pressure, I don’t know what is.

 

Security must be taken seriously, or people will refrain from using the product.

 

Value Capture

As mentioned, customers pay for the reports, but their data also intrinsically contributes to the stability and quality of the final product. Persuading customers to disclose what is arguably the most private thing they possess, their biological identity, is done by appealing to two of our most basic desires: to know where we come from, and where we are going.

 

The two main reports that rely on clustering and analytics are the ancestry and health reports. The ancestry report tells customers where their ancestors came from, and the health report informs them of potential health risks they may have.

(Many photos of the reports are available online; for the sake of privacy, I will not post my own. I will, however, share that I found out I am 0.4% Scandinavian, which is not surprising given my love for [internet] trolls.)
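For a feel of how a figure like 0.4% could arise, here is a deliberately simplified Python sketch. It is not 23andme's method and it is not clustering: it swaps in windowed maximum-likelihood assignment, where each window of SNPs is assigned to whichever reference population makes the observed genotypes most likely (assuming Hardy-Weinberg proportions), and each percentage is the share of windows assigned to that population. The populations, allele frequencies, window size and function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical alternative-allele frequency at each of six SNPs, per reference population.
REFERENCE = {
    "Population A": [0.9, 0.8, 0.1, 0.2, 0.7, 0.9],
    "Population B": [0.1, 0.2, 0.9, 0.8, 0.3, 0.1],
}

def genotype_log_likelihood(genotypes, freqs):
    """Log-probability of genotypes (0, 1 or 2 alt-allele copies) under Hardy-Weinberg proportions."""
    total = 0.0
    for g, p in zip(genotypes, freqs):
        probs = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}
        total += math.log(probs[g])
    return total

def ancestry_proportions(genotypes, reference, window=3):
    """Assign each window of SNPs to its most likely population; return the share of windows per population."""
    counts = Counter()
    n_windows = 0
    for start in range(0, len(genotypes), window):
        chunk = genotypes[start:start + window]
        best = max(
            reference,
            key=lambda pop: genotype_log_likelihood(chunk, reference[pop][start:start + window]),
        )
        counts[best] += 1
        n_windows += 1
    return {pop: counts[pop] / n_windows for pop in reference}

print(ancestry_proportions([2, 2, 0, 2, 0, 0], REFERENCE))
# prints: {'Population A': 0.5, 'Population B': 0.5}
```

Real ancestry estimation works with far more SNPs and much larger reference panels, so treat this only as an illustration of turning per-segment assignments into percentages.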

 

Challenges

In 2013 the FDA issued a warning letter to 23andme. Part of the warning concerned health regulation, but since this post is about data analytics, I will not discuss it here. A main concern of the FDA was that 23andMe had failed to provide adequate evidence that its product delivers accurate results. In addition, many people raised the issue of correlation versus causation, and questioned whether diagnoses or health reports can and should be issued based on correlation alone.

Later, 23andme changed its reports so that they carefully state the probability of the risks associated with a specific genetic variant, rather than diagnose them. A legal acknowledgement signed by participants confirms that they understand the risks and the uncomfortable news such reports may bring (see here about how 23andme outed parents who gave their baby up for adoption).

The Road Ahead

Many competitors are trying to tap into the B2C genetic testing market. One particular alternative, Rthm, “the world’s first automated, mobile DNA analysis technology”, lets users go a step further: through a mobile application, they can turn the insights from their genetic test into changes to their everyday routine, all in real time. A quote on the Rthm site reads “Beyond analyzing 41 genetic traits related to the brain, immune system, and more, riDNA sends you personalized tips based on your genetics, heart health, sleep, and exercise, directly to your phone, everyday.”

 

Conclusion

While 23andme enjoys a first-mover advantage and a strong, well-connected founding team, it remains to be seen whether a competitor will overtake it. One thing is for sure: more users mean more big data power. Combining big data analytics with the power of a community can make the product almost indispensable and create a large moat that makes it harder for competitors to enter the market. And then there’s the issue of privacy, but that’s for the next blog post…

 

Sources

 

https://www.fastcompany.com/3018598/for-99-this-ceo-can-tell-you-what-might-kill-you-inside-23andme-founder-anne-wojcickis-dna-r

http://www.newyorker.com/tech/elements/the-f-d-a-vs-personal-genetic-testing

https://www.forbes.com/sites/matthewherper/2013/11/25/23andstupid-is-23andme-self-destructing/#642e23f111c0

The changes at 23andMe

https://www.scientificamerican.com/article/23andme-is-terrifying-but-not-for-the-reasons-the-fda-thinks/

http://www.businessinsider.com/i-tried-the-new-23andme-genetic-test-2015-12/

http://content.time.com/time/specials/packages/article/0,28804,1852747_1854493,00.html

https://www.23andme.com/dna-health-ancestry/

https://www.23andme.com/howitworks/

https://en.wikipedia.org/wiki/SNP_genotyping

Advances in AI and ML are reshaping healthcare

https://www.findrthm.com/
