Where did SARS-CoV2 originate?

Spoiler alert: lab involvement very likely

Dec 04, 2021

Welcome to Science with Shrike! Today, we will delve into a politicized topic: the origins of SARS-CoV2. Shrike plans to avoid the political aspect to look at the evidence available and what information we would need to have higher confidence in our conclusions. You may already have a clear belief on this topic. That’s ok; political messaging does a very good job of polarizing you to one belief or another. If you disagree with the conclusions or have other questions, drop them in the comments!

What is Gain-of-Function Research?

Before we get into the virus biology, let’s talk a bit about NIH funding and gain-of-function research. Gain-of-function research is when we modify a pathogen to make it deadlier (usually to humans, but also crop pathogens). Sometimes this is intentional, sometimes this is unintentional. For example, if someone is studying Streptococcus pyogenes, which is a flesh-eating bacterium that also causes Scarlet Fever and strep throat, they might want to add or delete a gene to test a mechanism of interest (eg ‘Gene X enhances anaerobic growth’). Even if it’s unrelated to virulence, the potential for accidentally making the bacterium deadlier needs to be considered, and mitigated to the extent possible. For example, with S. pyogenes, it might be possible to use a strain that isn’t as deadly to humans. However, sometimes this is not possible (need to work with a particular clinical strain) or the question directly relates to the virulence.

Intentional gain-of-function research is important because it helps us understand how pathogens work and confirm specific mechanisms. Gain-of-function work is not automatically ‘bioweapon’ work. ‘Bioweapon’ type of work falls under ‘dual-use’ research because it has both civilian and direct military application. The difference is that gain-of-function refers to any work potentially enhancing a pathogen; for military use, some pathogens are not worth using. To use a conventional weapons analogy, making a knife longer or better at holding an edge will improve its ability to hurt people, but it’s nowhere near the change in threat profile of upgrading from firing .22LR to 5.56mm.

Did NIH fund gain-of-function research in Wuhan, China?

From 2014 to 2017, NIH generally did not fund gain-of-function work on flu or on the deadly human coronaviruses SARS and MERS. After 2017, an additional review committee was set up to approve gain-of-function work. The US was also concerned about a pandemic respiratory virus, so there was a USAID-Emerging Pandemic Threats-PREDICT program aimed at identifying emerging pathogens and working out how they might behave before they jump into humans and cause a pandemic. This is a foreign aid program where the US partners with other countries to promote global health. This is also the program Trump was criticized for ending. USAID is not NIH.

So what about coronavirus gain-of-function research? Menachery et al 2015 Nat Med 21(12): 1508–1513 is one example. In this paper, the authors stick a novel Spike protein into a lab-generated coronavirus, and it improves infectivity in human cells. First, the paper being published in 2015 does not mean the gain-of-function work was performed then. Notably, the authors state “Experiments with the full-length and chimeric SHC014 recombinant viruses were initiated and performed before the GOF research funding pause and have since been reviewed and approved for continued study by the NIH.” Second, lots of funding sources, affiliations and authors means it is hard to say who did which parts of the paper where. Note that Dr Zhengli Shi at the Wuhan lab was funded by USAID, not NIH here. Other authors were funded by NIH, and the gain-of-function work was approved. However, if you check out projects using NIH Reporter, you can pull up research from any PI. Instead of looking up sub-projects (info not always given), you can check publications to figure out potential subawards. For example, Hu et al 2015 Virol J 12: 221 cites support for Dr Shi from an NIH grant. No US authors on there.

Does this mean the NIH funded gain-of-function research in China? Technically, it probably did not authorize it, but the question is directionally true. The NIH funded gain-of-function research, but not in Wuhan. The NIH funded Wuhan, but not specifically for gain-of-function research. The FOIA lawsuit from the Intercept confirms that Wuhan was funded by NIH to screen coronaviruses, including some recombinant ones made previously. The core contention for ‘gain-of-function’ is that EcoHealth failed to notify NIH that the recombinant bat coronaviruses had improved infectivity in mice, which would have triggered an additional review to determine if it should receive gain-of-function review. The recombinant coronavirus screen did not include anything that could give rise to SARS-CoV2. In Shrike’s opinion, this is weak support for ‘funding gain-of-function’ research because the review likely would have passed. If it was determined to be ‘gain-of-function’, the most likely outcome is more paperwork. Depending on what other funding was available, the “gain-of-function” part could also have been officially moved to another source of funding. So on one hand, the statement that the “NIH did not fund gain-of-function research in Wuhan” is likely a true statement, so long as it is carefully worded like that. On the other hand, the NIH (and USAID) funded the lab that is most likely the point of origin for SARS-CoV2 for novel coronavirus screening in order to prevent something like the SARS-CoV2 pandemic. Alternatively, Dr Shi could have performed gain-of-function research using NIH funds without telling the NIH (or even EcoHealth).

Teasing this out for a well-funded investigator is hard: which source of funding covered the experiments? Suppose Shrike has a box of 200 pipettes and uses 5 of those for gain-of-function. Did the funder pay for gain-of-function research? If Shrike uses 5000 pipettes annually and orders them from 5 different grants (none of which are for gain-of-function research), which one paid for those 5 pipettes? If Shrike only needs 100 pipettes for a project but can only buy 200 at a time, who’s responsible for the extra 100 pipettes? Now multiply this by every reagent in the lab. It gets even worse for equipment. This is why Shrike believes that a carefully worded statement can be true and still not relevant to the overarching issue of lab funding. The main points in Shrike’s opinion are that the NIH funded Dr Shi for pandemic coronavirus screening, and NIH itself (including Fauci) was not aware of any research that could have directly led to SARS-CoV2.

Did SARS-CoV2 come from a lab?

With that background covered, the next question is the origin of SARS-CoV2. Did it come from a wet market in Wuhan, which reportedly was not even serving bats that day, or from the local BSL-3 lab dedicated to collecting and screening novel coronaviruses?

If the lab keeps up-to-date notebooks, those detail what experiments were done and how they were done. This would contain all the information needed to answer these questions. Since this lab is funded by the NIH, NIH has authority to visit the site and request additional documentation for the experiments supported by the funded work, and safety records. Shrike can no longer find the Chinese government statement about encouraging better biosafety from December 2020, so it is just rumor/recollection now. The Chinese also sanitized the lab early on during the pandemic. This is only circumstantial evidence of a coverup. However, it does mean that the evidence needed to decide one way or the other has been compromised. It is impossible to determine for certain exactly what happened. The problem with a “reasonable doubt” standard is that what counts as reasonable changes depending on whether one is pro- or anti-China. Also, when it comes to geopolitics, scientific proof is unnecessary for accomplishing foreign policy goals (see Gulf of Tonkin, and Iraq WMDs). Therefore, a focus on “proof”, or even an “investigation”, is deflection, because China compromised any evidence implicating or exonerating the lab with its actions. The failure of international investigators to obtain the primary evidence gives everyone an excuse for disbelieving whatever is found (China faked it for reason x). Therefore, deciding “truth” in this situation is based on your feelings, not facts.

This is also why Shrike may be using modifiers to reduce the certainty of some of these considerations. Lab leak seems to be the most likely hypothesis to Shrike. A bat coronavirus escaping from a place dedicated to collecting and working with bat coronaviruses is strong rationale. On the other hand, hypotheses with strong rationales can be, and often are, wrong. You have to decide for yourself which you think is more likely. Shrike does see the irony in the USAID-PREDICT program directly funding the very pandemic it was intended to prevent.

Was SARS-CoV2 made in a lab?

Another closely related, but distinct, question is whether SARS-CoV2 was made in the lab. If it was, that would settle the source question more definitively. However, wording is important because ‘not made in a lab’ is not the same as ‘lab is not the source of the outbreak’. In Shrike’s opinion, the evidence is insufficient to distinguish between ‘naturally occurring coronavirus that escaped’ and ‘lab-modified coronavirus that escaped’. Fully lab-synthesized is unlikely, and there are some origins that were correctly debunked. We’ll start with those.

Early reports asserted that SARS-CoV2 had a natural origin. These were overstated because they did not consider all the ways a virus could be engineered. Notably for this discussion, the SARS-CoV2 genome is most similar to bat coronaviruses, except for the Spike protein, which is more similar to pangolin coronaviruses. There are also a couple of small, unique features in SARS-CoV2 that are not seen in closely related coronaviruses. Consequently, it is correct that SARS-CoV2 was not made the same way as the chimeric virus in the Nat Med 2015 paper referenced above. Genome analysis also shows that the older molecular biology techniques using restriction enzymes were not used. Claims about HIV sequences were wrong. Genome evolution studies that compared the rate of mutations in SARS-CoV2 between pangolin and bat portions suggested that the pangolin part was inserted 20 years ago. This would mean that the pangolin part was transferred prior to the coronavirus ever coming to Wuhan.

The problem with these analyses is that they left out a few other possibilities. For example, the genome for SARS-CoV2 could have been synthesized entirely from scratch. This is unlikely due to cost, but possible. There are also modern cloning methods that do not leave restriction sites in the final product, which could have been used in modifying the virus (perhaps by inserting a pangolin sequence into a bat coronavirus). From a scientific standpoint this is unlikely due to rationale. If you are adding sequences, you start with a coronavirus that you already have well-characterized. So if you are testing different Spike proteins, you would start with your favorite bat coronavirus, not the newest one around. The only exception would be if the new one was special in some way. However, this is rationale, not hard evidence. To get a better idea of these possibilities, you can check the codon optimization.

Codon optimization refers to the fact that most amino acids can be encoded by more than one three-letter mRNA sequence. The mRNA sequences CGG and CGA are called codons, and they both code for the same amino acid: Arginine. It turns out that organisms do not use each of these codons equally. Therefore, each organism has a set of codon preferences that are distinct. There is also a second level of codon optimization, which is codon pairs. If an organism likes CGG for Arginine best and needs to code two Arginines in a row, you would assume it would use CGGCGG. This turns out to be wrong. There is a separate set of codon pair preferences, which also differ by organism. It needs to be emphasized that codon usage is rarely 100%/0%, so this needs to be analyzed for a large series of codons. Finding one unusual Arginine codon, for example, does not mean anything. This means codon analysis is useless for very short sequences (like the polybasic insertion in SARS-CoV2).
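For readers who want to see what ‘codon usage’ looks like in practice, here is a minimal Python sketch that counts Arginine codons and adjacent codon pairs in an in-frame sequence. The function names and the toy sequence are made up for illustration; this is not the method used in any published SARS-CoV2 analysis.

```python
from collections import Counter

# The six codons that encode arginine in the standard genetic code.
ARG_CODONS = {"CGU", "CGC", "CGA", "CGG", "AGA", "AGG"}

def split_codons(rna):
    """Split an in-frame sequence into codons; a trailing partial codon is dropped."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]

def arginine_usage(rna):
    """Fraction of the arginine codons in the sequence that use each synonym."""
    counts = Counter(c for c in split_codons(rna) if c in ARG_CODONS)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: counts[codon] / total for codon in sorted(ARG_CODONS)}

def codon_pair_counts(rna):
    """Count adjacent codon pairs, the second level of preference."""
    codons = split_codons(rna)
    return Counter(zip(codons, codons[1:]))

# Hypothetical toy sequence: Met-Arg-Arg-Arg-stop
toy = "AUGCGGCGGAGAUAA"
usage = arginine_usage(toy)
pairs = codon_pair_counts(toy)
```

With only three Arginine codons in the toy sequence, the fractions swing wildly if a single codon changes, which is the reason a short insertion cannot be meaningfully analyzed this way.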

In contrast, codon optimization for something larger, like the Spike protein, can be useful. If SARS-CoV2 has a bat origin, it should have codon usage more similar to bats, even for the sequences more closely related to pangolins. If it was a cut-and-paste, Spike should be pangolin-optimized, while the rest is bat-optimized. If it was synthesized in the lab, it would most likely be codon-optimized for humans or mice. Codon optimizing for bats would be possible, but most gene synthesis services only offer human, mouse, E coli and yeast, so it would need to be done manually. Given that differences between mammalian codon optimizations are not huge, it’s not as important to get bat vs human or mouse, so it is less likely researchers would go this route, given the extra work (doing the optimization yourself vs clicking a button that says ‘optimize for human’). Since bat and pangolin codon usage is similar to each other and to human codon usage, codon usage analysis is hard to interpret. However, the SARS-CoV2 codon usage is not optimized for humans or mice. Spike also does not jump out as pangolin-optimized over bat-optimized. That suggests a purely synthetic origin, or even a pangolin Spike cut-and-paste in the lab, is unlikely. So far, it looks like natural origin is most likely.
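The ‘does Spike look different from the rest of the genome’ comparison can be sketched roughly as follows. This is a simplified illustration, not a real analysis: it compares raw codon frequencies, which mix amino acid composition with synonymous codon preference, whereas real studies use measures such as relative synonymous codon usage or the codon adaptation index against host reference tables. The function names and the distance measure (total variation distance) are Shrike’s choices for the example.

```python
from collections import Counter
from itertools import product

# All 64 RNA codons.
CODONS = ["".join(p) for p in product("ACGU", repeat=3)]

def codon_frequencies(rna):
    """Fraction of each of the 64 codons in an in-frame sequence."""
    rna = rna.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3
    counts = Counter(rna[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in CODONS)  # skips codons with ambiguous bases
    if total == 0:
        raise ValueError("sequence contains no complete, unambiguous codons")
    return {c: counts[c] / total for c in CODONS}

def usage_distance(freq_a, freq_b):
    """Total variation distance: 0 means identical usage, 1 means no overlap."""
    return 0.5 * sum(abs(freq_a[c] - freq_b[c]) for c in CODONS)

# Hypothetical toy inputs standing in for 'Spike' and 'rest of genome'.
spike_like = codon_frequencies("CGGCGA")
rest_like = codon_frequencies("CGGCGG")
distance = usage_distance(spike_like, rest_like)
```

In the cut-and-paste scenario, you would expect the distance between Spike and the rest of the genome to be clearly larger than the distance between any two other stretches of the genome of similar length. Because mammalian codon usage is so similar, that gap may be small even when the scenario is true.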

There is one last way by which SARS-CoV2 could be modified in the lab. That is by culturing the virus in vitro using monkey or human cell lines. Collecting viruses is one thing, propagating them is another. To propagate a virus in bats, you need an active bat colony, and lots of money/containment to keep the colony going. Instead of growing the virus in bats, if the virus can be grown in human or monkey cells, it becomes easier, safer and cheaper to handle. One challenge with this approach is that it may cause differences in infectivity and severity of disease depending on the source. To control for this, one often only infects bats as the last step of virus propagation before using the bat-derived virus for experiments.

Growing viruses in cells avoids the trouble of using bats. However, there are two challenges. First is that the virus may not be adapted to growing in human or monkey cells. This problem is solved by throwing large amounts of virus on the cells and picking out the ones that survive, growing those to large amounts, and repeating the process many times until you get viruses that grow as fast as you want in the cells. This becomes a human (or monkey) cell adapted strain. In this process, it is not surprising that the virus adds the best things to improve infection, like furin cleavage sites. This is going to show accelerated evolution because of the selection process, so the 20-year evolutionary measurement is not reliable for serial passage through cell lines.

The second is attenuation. As you realized from the previous paragraph, the population of viruses is not homogeneous. The most competitive subpopulations proliferate faster and take over. Therefore, there is intense pressure to become as efficient as possible. In this setting, viruses ditch many genes that do not help them survive and optimize for the environment they are in (cultured cells). However, cultured cells lack organ systems and an immune system. That means losing immune evasion genes, and potentially organ navigation genes, confers an advantage in vitro. This process is called attenuation, and it makes a virus weaker. This approach was used to create the Sabin polio vaccine (oral polio vaccine or OPV), which is used in third world countries for polio vaccinations. The attenuated virus can still spread, but it induces protective immunity. Thus, when you show up with polio vaccines in the middle of nowhere Pakistan, you only need to vaccinate the one person willing to take the risk that you are not a CIA spy, and you can still potentially vaccinate the entire village. It also reverts to neurovirulence in ~1 in 2,500,000 cases, which is why the US uses the inactivated polio vaccine (IPV) instead. This illustrates the power of attenuated viruses, along with reasons why we prefer other methods for vaccine generation today.

Back to random bat coronaviruses. If you want to grow them in cells, these two processes (adaptation and attenuation) are both in play because the virus does not normally grow in human cells. The adaptation will outweigh the attenuation because ‘can’t grow in human cells’ is a bigger problem for the virus than ‘organ dissemination/immune clearance’. Suppose someone passaged a non-human virus in cell culture just for maintenance. Notice that ‘biowarfare’ is not necessary here, and that for ‘biowarfare’ it would need to be passaged in animals. Once the virus adapted, it’s likely going to be more virulent. It will accumulate mutations faster than expected (so the 20-year analysis fails), and might come up with things like furin cleavage sites and other tricks to optimize infectivity without a human ever inserting them. Any loss of bat-specific immune evasion may or may not change survival in the host. Thus, adapting a bat virus to human cells is expected to increase infectivity.
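To get a feel for how quickly selection works during serial passage, here is a toy Python model with two variants: the parental virus and a rare variant that grows better in the cultured cells. The numbers (a one-in-a-million starting variant with a 2x growth advantage per passage) are hypothetical and chosen only for illustration. The model is deterministic and ignores new mutations, so it only shows the ‘most competitive subpopulation takes over’ step.

```python
def passage(fraction_adapted, relative_fitness):
    """One round of growth and dilution for a two-variant population.

    relative_fitness is how many times more offspring the adapted variant
    leaves per passage compared with the parental virus.
    """
    adapted = fraction_adapted * relative_fitness
    parental = 1.0 - fraction_adapted
    return adapted / (adapted + parental)

def serial_passage(start_fraction, relative_fitness, n_passages):
    """Fraction of adapted virus before passaging and after each passage."""
    history = [start_fraction]
    for _ in range(n_passages):
        history.append(passage(history[-1], relative_fitness))
    return history

# Hypothetical inputs: 1 adapted virion per million, 2x advantage, 30 passages.
history = serial_passage(1e-6, 2.0, 30)
```

Under these assumptions the adapted variant goes from undetectable to just over half of the population around passage 20, and is nearly the whole population by passage 30. A molecular clock calibrated on evolution in wild bats would badly overestimate how long that change took.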

So how do we tell between these possibilities? We can’t based on the information that we have. Examination of well-kept lab books and other records in the lab would sort out 1) if SARS-CoV2 was kept in the Shi lab, known to them prior to the outbreak, and any natural source of the virus, 2) what bat coronaviruses had been identified vs left unidentified in the lab and 3) which viruses were passaged in human or monkey cells, either to generate viruses that could infect human/monkey cells or just as part of routine passage.

Preserving and sorting out this information would be Shrike’s priority for a regulatory agency, with oversight authority (ie Chinese government, NIH and WHO). The Chinese government has control of that information now, and they claimed the US was behind releasing SARS-CoV2. If the Chinese government opted to, they could alter and/or fabricate these documents. A less nefarious alternative is that the lab notebooks were poorly kept, and the necessary information is not present. That is very plausible. While Shrike does not favor an investigation at this point because it would be superficial and/or politicized, adding a good patent attorney to any investigative team would be a promising sign. Patent attorneys do a great job of looking for (and finding) lab notebook fraud because academics are terrible at keeping lab notebooks up to date.

To summarize what we know about SARS-CoV2 origin:

1) We cannot tell if it was an accidental or intentional release from the lab.

2) The virus shows no signs of being engineered, but we also cannot rule out serial passage.

3) If serial passage was used, we cannot distinguish between using the technique to expand the virus vs deliberately trying to improve infectivity in human/monkey/mouse cells.

4) Distinguishing between all of these alternatives has been compromised by the Chinese government.

Shrike favors the ‘accidental release from the lab while experimentally infecting a bat with a coronavirus that had been propagated in cell lines’ hypothesis, because Shrike believes in the triumph of human mistakes over human malice. Shrike would not be surprised to be wrong on this, though. Importantly, beware people who claim any of these options as definitive (or even beyond reasonable doubt), without additional evidence. They might be opinions, and even strongly-held opinions, but we’re missing the data to tell for sure.
