May 20, 2019, midnight

Analysis of my gut microbiome

I finally sequenced my poop. I had wanted to do this for a long time: I have been watching the microbiome scene evolve since around 2011, and when the opportunity came up I got very excited to know what lives in my gut, you know... "for science".

The method used was amplicon sequencing of the V3/V4 region of the 16S gene. In case you don't know what that means, don't worry: I will give a short explanation in the next paragraph. But beware, the explanation contains gross oversimplifications to keep things short and easy to digest.

The next paragraph

The 16S gene is a long string of letters (A, C, T, G) that exists in almost all prokaryotic organisms. It's commonly used to identify those microbes because it is "highly conserved", which is another way of saying that it doesn't change much over time. If we isolate some bacterium, let's say Yersinia pestis (the one that causes the plague), and sequence its entire genome, it will have somewhere between 500 and 7500 genes. One of those genes will be the 16S, and it will likely be the same for all Yersinia pestis around the globe.

This is what the 16S gene of Yersinia pestis looks like:

ATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGGCAGCGGGAAGTAGTTTACTACTTTGCCGGCGAGCGGCG
GACGGGTGAGTAATGTCTGGGGATCTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATGACCTCG
CAAGAGCAAAGTGGGGGACCTTAGGGCCTCACGCCATCGGATGAACCCAGATGGGATTAGCTAGTAGGTGGGGTAATGGC
TCACCTAGGCGACGATCCCTAGCTGGTCTGAGAGGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGG
GAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTGTGAAGAAGGCCTTCGGGTT
GTAAAGCACTTTCAGCGAGGAGGAAGGGGTTGAGTTTAATACGCTCAATCATTGACGTTACTCGCAGAAGAAGCACCGGC
TAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCG
GTTTGTTAAGTCAGATGTGAAATCCCCGCGCTTAACGTGGGAACTGCATTTGAAACTGGCAAGCTAGAGTCTTGTAGAGG
GGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACAAA
GACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCTGTAAACGATGTCGACT
TGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAA
AACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTA
CTCTTGACATCCACAGAATTTGGCAGAGATGCTAAAGTGCCTTCGGGAACTGTGAGACAGGTGCTGCATGGCTGTCGTCA
GCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCACGTAATGGTGGGAA
CTCAAGGGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGAGTAGGGCTAC
ACACGTGCTACAATGGCAGATACAAAGTGAAGCGAACTCGCGAGAGCCAGCGGACCACATAAAGTCTGTCGTAGTCCGGA
TTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTAGATCAGAATGCTACGGTGAATACGTTCCCGG
GCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTAC
CACTTTGTGATTCATGACTGG


After my poop was sampled, it was handled by a laboratory that extracted the DNA of a small subset of the 16S gene from all the bacteria living in there and sequenced it on an Illumina MiSeq. This small subset is called V3/V4 and is about 250bp.

My poop resulted in 123778 reads. I picked a random one so you can see what it looks like:

CACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCGCAATGGGCGAAAGCCTGACGGAGCAACGCCGCGTGAGTGAAGAAG
GCCTTCGGATCGTAAAACTCTGTTATTAGGGAAGAACAAATGTGTAAGTAACTATGCACGTCTTGACGGTACCTAATCAG
AAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAACACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGC
GCGCGTAGGCGGTTTTTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAA

If this read had matched the full gene of Yersinia pestis shown above, I should probably pay a visit to my doctor. Luckily, it matches the 16S gene of Staphylococcus capitis, which is usually harmless.
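To make the idea of "finding a read inside a gene" concrete, here is a minimal sketch of the simplest possible check: an exact substring search that also tries the reverse complement, since a read can come from either DNA strand. The function names are made up for illustration, the reference is only the first 40 bases of the Yersinia pestis sequence above, and the short test reads are invented. Real classifiers use alignment that tolerates mismatches, so this is not how an actual pipeline does it.

```python
# Translation table that swaps each base for its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def read_in_reference(read: str, reference: str) -> bool:
    """True if the read, or its reverse complement, occurs exactly in the reference."""
    read, reference = read.upper(), reference.upper()
    return read in reference or reverse_complement(read) in reference


# First 40 bases of the Yersinia pestis 16S sequence shown earlier in the post.
reference = "ATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGC"

# Invented toy reads (not real sequencing output).
print(read_in_reference("GGCAGGCCTAAC", reference))  # exact substring of the reference
print(read_in_reference("GTTAGGCCTGCC", reference))  # reverse complement of the read above
print(read_in_reference("GGCAGGTTTAAC", reference))  # contains mismatches, so no exact hit
```

An exact match is a very strict criterion: a single sequencing error would make it fail, which is one reason real tools score approximate alignments instead.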

What lives in my poop

The analysis pipeline that I chose (Shaman) gave me a taxon for 74% of my reads. For the remaining 26%, it either wasn't able to infer a taxonomy at any level or the reads didn't pass one of the quality control steps in the pipeline.

Let's take a look at the Phylum data:

# PHYLUM         # READS
Firmicutes       50268
Bacteroidetes    40042
Proteobacteria   2172
Actinobacteria   62
Thaumarchaeota   51
Verrucomicrobia  21
Lentisphaerae    16
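As a quick sanity check, the table can be turned into relative abundances. This is only a sketch: the variable names are made up, and it assumes that the phylum table lists every read that received a taxon, which the post does not state explicitly.

```python
TOTAL_READS = 123778  # total number of reads reported for the sample

# Read counts per phylum, copied from the table above.
phylum_reads = {
    "Firmicutes": 50268,
    "Bacteroidetes": 40042,
    "Proteobacteria": 2172,
    "Actinobacteria": 62,
    "Thaumarchaeota": 51,
    "Verrucomicrobia": 21,
    "Lentisphaerae": 16,
}

# Reads that were assigned to some phylum, and their share of all reads.
classified = sum(phylum_reads.values())
print(f"classified at phylum level: {classified} reads ({classified / TOTAL_READS:.1%})")

# Relative abundance of each phylum among the classified reads.
for phylum, reads in phylum_reads.items():
    print(f"{phylum:<16}{reads / classified:8.2%}")
```

Under that assumption, the classified share comes out just under 75%, roughly in line with the figure reported by the pipeline.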

From this data we can get one insight. You see, there is this thing called the F/B ratio. There is no real consensus about it, but the idea is that if your number is bigger than 1, you are probably overweight.

The F/B ratio is the ratio between Firmicutes and Bacteroidetes, which are usually the two most abundant phyla in any poop you can get your hands on.

50268 (Firmicutes) / 40042 (Bacteroidetes) ≈ 1.26
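The same division as a tiny helper, in case you want to plug in your own numbers. The function name `fb_ratio` is made up for this sketch, and the threshold of 1 is just the rule of thumb mentioned above, not an established cutoff.

```python
def fb_ratio(firmicutes_reads: int, bacteroidetes_reads: int) -> float:
    """Ratio of Firmicutes reads to Bacteroidetes reads."""
    if bacteroidetes_reads <= 0:
        raise ValueError("need at least one Bacteroidetes read to compute the ratio")
    return firmicutes_reads / bacteroidetes_reads


# Read counts from the phylum table in this post.
ratio = fb_ratio(50268, 40042)
print(f"F/B ratio: {ratio:.3f}")
print("above 1" if ratio > 1 else "1 or below")
```

Since both counts come from the same sample, using raw read counts or relative abundances gives the same ratio.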

Yes, I'm a little overweight.

What about the species?

Remember that we sequenced only a small (~250bp) subset of the 16S gene? Unfortunately, this short amplicon is not enough to classify everything at the species level. Shaman gave me only 31119 reads distributed among 13 species.

# SPECIES                                    # READS
Faecalibacterium prausnitzii                 16600
Prevotella stercorea                         7340
Bacteroides uniformis                        3914
Bacteroides eggerthii                        1584
Prevotella copri                             1086
Parabacteroides distasonis                   234
Dorea formicigenerans                        90
Lactobacillus salivarius                     83
Bacteroides ovatus                           72
Bifidobacterium adolescentis                 51
Eubacterium biforme                          35
Oxalobacter formigenes                       19
Collinsella aerofaciens                      11
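A short sketch to put these numbers in perspective: it sums the species-level reads, compares them with the total number of reads in the sample, and picks the most abundant species. The variable names are made up for illustration; the counts are copied from the table above.

```python
TOTAL_READS = 123778  # total number of reads reported for the sample

# Read counts per species, copied from the table above.
species_reads = {
    "Faecalibacterium prausnitzii": 16600,
    "Prevotella stercorea": 7340,
    "Bacteroides uniformis": 3914,
    "Bacteroides eggerthii": 1584,
    "Prevotella copri": 1086,
    "Parabacteroides distasonis": 234,
    "Dorea formicigenerans": 90,
    "Lactobacillus salivarius": 83,
    "Bacteroides ovatus": 72,
    "Bifidobacterium adolescentis": 51,
    "Eubacterium biforme": 35,
    "Oxalobacter formigenes": 19,
    "Collinsella aerofaciens": 11,
}

# How many reads reached species level, and what share of the sample that is.
species_total = sum(species_reads.values())
print(f"{len(species_reads)} species, {species_total} reads "
      f"({species_total / TOTAL_READS:.1%} of all reads)")

# Most abundant species and its share of the species-level reads.
top = max(species_reads, key=species_reads.get)
print(f"top species: {top} ({species_reads[top] / species_total:.1%} of species-level reads)")
```

In other words, only about a quarter of all reads could be resolved down to a species.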

The one with the most reads is Faecalibacterium prausnitzii, which is nice, since this bacterium seems to play a role in fighting inflammation and is usually decreased in Crohn's disease.

Another fun bacterium that I'm happy to have is Oxalobacter formigenes. One function attributed to it is oxalate degradation, which helps prevent the formation of kidney stones. Sadly, oxalate-degrading activity cannot be detected in the gut of some individuals; one of the theories that explains this absence is the indiscriminate use of quinolones (antibiotics like cipro).

Indeed, my last kidney sonogram didn't show any sign of stones, but if there is a bacterium that prevents gallstones, I may be missing it...

There is a lot of literature about almost all of my bacteria, but from my data alone those findings are mostly irrelevant. Even comparing my results with other big studies would be difficult, since differences in primers, sequencing methods, pipelines and reference databases could introduce a lot of bias into my values.

But still, there is room for more knowledge extraction. I will definitely get back to it in a future article, but only after I write my own analysis pipeline.

Stay tuned.
